Linux and the I2C and SPI interfaces (part 2)

Intro

Note: because I had to recover the db and there wasn’t a backup of this post, the images are only thumbnails and the comments were lost 🙁

In the previous stupid project I implemented a very simple I2C and SPI device with an Arduino that was interfaced with a raspberry pi, and I ran some tests using various ways to communicate with the Arduino. There I pretty much showed that for those two buses it doesn’t really matter whether you write a kernel driver or just access the devices from user space. Well, if you think about it, spidev is also a kernel driver, and when you access the I2C from user space there’s still a kernel driver that does the work. So writing a driver for a specific subsystem, instead of just interfacing the device with a custom user-space tool, has its uses, but it’s not always necessary and needs some consideration before you decide to go either way.

Still, there was something interesting missing from that stupid project. What about speed and performance? What are your options? How do these options affect the general performance? Is there a magic solution for all cases? Well, with this stupid project I will try, and fail, to answer those questions, and make things even worse by not giving any valuable information.

Components

NANOPI NEO

For the last stupid project I used a raspberry pi, but I also provided the sources so you could use the nanopi neo. This time I won’t be that kind, so I’ll only use the nanopi neo. The reason is that I like its small format, it’s quite capable for the task and I didn’t want to use a powerful board for this. So, it’s a low-mid tier board and also dirt cheap.

STM32F103

Last time I used the Arduino nano. Well, the nano is an excellent board to implement an I2C/SPI slave device in… minutes. But it lacks performance, so it’s completely incapable of stressing the nanopi neo. Therefore, we need something much faster, and here comes the STM32F103, as it has all the good stuff in there: 72MHz (or up to 128MHz if you overclock it, which I did, of course), DMA for both the I2C and SPI, and it’s also very, very, very cheap (I’m talking about the famous blue-pill). So I’ve implemented an I2C and an SPI slave that both use DMA for fast data transfers.

OTHER COMPONENTS

The rest of the components are exactly the same as in the previous stupid project. So we have a whatever photo-resistor (it doesn’t really matter) and a whatever LED.

Project

This stupid project is actually focused on the Linux kernel. As everyone learns now in school, the kernel comes in two main flavors, the SMP and the PREEMPT-RT kernel. The first is the normal mainline kernel and the second is the real-time patched version of the mainline. I won’t get into the details, but just to simplify, the main difference is that the PREEMPT-RT kernel actually guarantees that any process that runs on the CPU will get a fair and predictable time of execution, which minimizes the latency of the application. Oversimplified, but this is not a post about the Linux kernel.

Therefore, what happens if you have a couple of fast devices that you want to interface under various conditions, like when the CPU has a low or heavy background load? To find this out we actually need a fast slave, and the stm32f103 is just right for that, as we’ve seen in this stupid project that the SPI can achieve up to 63MHz by using DMA, which is way faster than the Arduino nano (and probably even the nanopi neo, actually). So, having made sure that the slave device won’t be our bottleneck, we’re good to go. Here you’ll find the repo for the project:

https://bitbucket.org/dimtass/linux-stm32-spi-i2c/src/master/

In order to build the nanopi neo image for both SMP and RT kernel you need Yocto, but again I won’t get into the details on that (you can read the README.md file in the repo for more detailed info). Therefore, to switch between the SMP and RT kernel you need to use either of the following combinations in the build/conf/local.conf file:

PREFERRED_PROVIDER_virtual/kernel = "linux-stable"
PREFERRED_VERSION_linux-stable = "4.19%"

Or for the PREEMPT-RT:

PREFERRED_PROVIDER_virtual/kernel = "linux-stable-rt"
PREFERRED_VERSION_linux-stable-rt = "4.19%"

Also, you should build the `arduino-test-image`, as in the previous project (it’s actually the same image).

So, now let’s go to some fancy stuff. In the repo there is a tool in the linux-app folder. You need to build this with the Yocto SDK or any other ARM toolchain. This tool opens the I2C and SPI devices and reads/writes data from them in a loop, according to the options you pass on the command line.

To make it a bit different compared to the previous project, this time the SPI slave is the photo-resistor ADC and the I2C slave is the PWM LED (it was the opposite in the previous one). Anyway, that doesn’t really matter; you can change that in the source code of the stm32f103, which is also available in the repo, and which you also need to build and flash on the MCU. If you’ve read the previous README file, it’s pretty much the same thing.

Benchmarks

I’ve performed the benchmarks with the stm32f103 running at both 72MHz and 128MHz, but there wasn’t really any difference, as I’ve limited the SPI clock to 30MHz. The reason for that was the cabling, which was causing a lot of errors above that frequency, and it seems that the problem was on the nanopi neo side and not the stm32f103. Still, the results are interesting and I was able to get valuable information.

I’ve performed various tests. First with two different kernels, SMP and PREEMPT-RT. Then for each kernel I’ve tested a range of SPI frequencies (50KHz, 1MHz, 2MHz, 5MHz, 10MHz, 20MHz, 30MHz). The provided tool actually does that automatically. Then for all the above cases I’ve tested the kernel with no load, then with a light load and then with a heavy load. The light load was, guess what? printf of course. Well, printf might sound silly, but in a while loop it does the trick, because the UART buffer fills up pretty quickly and then the kernel has to flush the buffer and send the data. For the heavy load I’ve just used the Linux stress tool. I’ve also included a calc file in the repo with my results.

So let’s get to the fun stuff. No. Before that I also want to say that there were two kinds of benchmarks. The first was a pin on the stm32f103 that toggles every time a full read/write cycle is performed. That means that the Linux app reads the ADC value of the photoresistor from the stm32f103 and then writes that value to the I2C PWM LED on the stm32f103. Every time this cycle is performed, a pin on the stm32f103 toggles its state. Therefore, by measuring the time of a single pulse you actually get the time of the cycle. The second kind is the benchmark mode of the user-space tool, which comes later.

Before proceeding, these are the kernel versions for the SMP and PREEMPT-RT kernels:

Linux nanopi-neo 4.19.91-allwinner #1 SMP ... 15:26:48 UTC 2019 armv7l GNU/Linux
Linux nanopi-neo 4.19.82-rt30-allwinner #1 SMP PREEMPT RT ... 20:12:29 UTC 2019 armv7l GNU/Linux

SMP KERNEL

Let’s see the first three images. These are some oscilloscope captures with the SMP kernel and the SPI frequency at 500KHz, which is a very low frequency.

The first image is a zoom in on the stm32f103’s toggle pin. As I’ve said, two toggles (or a pulse if you prefer) is a full read/write cycle, which means that in this time a 16-bit word on the SPI and 3 bytes on the I2C are transferred. Because the I2C is much slower (100KHz) it has a strong effect on the speed compared to the SPI, but we’re trying to emulate a real-life scenario here. In this case, the average cycle time is 475 μsecs (you see that the average is calculated for both the low and the high pulse).
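To put some numbers on how much the I2C dominates, here is a rough estimate of the minimum time the two transfers need on the wire. It’s only a sketch under assumptions that are not from the measurements: each I2C byte is counted as 9 clock periods (8 data bits plus the ACK), only the 3 bytes mentioned above are counted, and start/stop conditions and any software overhead are ignored. The wire_time_us name is made up for this example.

```shell
# Minimum wire time in microseconds for one read/write cycle.
# Assumptions (not from the post): 9 clocks per I2C byte, no start/stop
# conditions, no driver or scheduling overhead.
# usage: wire_time_us <spi_hz> <spi_bits> <i2c_hz> <i2c_bytes>
wire_time_us() {
    awk -v shz="$1" -v sbits="$2" -v ihz="$3" -v ibytes="$4" 'BEGIN {
        spi_us = sbits / shz * 1000000
        i2c_us = ibytes * 9 / ihz * 1000000
        printf "spi=%.2f i2c=%.2f total=%.2f\n", spi_us, i2c_us, spi_us + i2c_us
    }'
}

wire_time_us 500000 16 100000 3     # SPI at 500KHz, I2C at 100KHz
```

Under these assumptions the I2C part alone needs about 270 μsecs and the SPI part about 32 μsecs at 500KHz, so the bus time is only a lower bound for the 475 μsecs seen on the scope; the rest is overhead that this estimate doesn’t model.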

The second and third screenshots display the toggle pin output when we’re running the light load with the printf. Wow! What’s going on there? You see there are large gaps between the read/write cycles. Why? Well, that’s because of printf and the UART interface. UART is the dinosaur of the comm peripherals and it’s sloooow. In this case it only has a 115200 bps baudrate. And why are there gaps, you’ll think. There are gaps because the printf is inside a while loop, which means that it fills the kernel UART buffer very fast, and at some point the kernel needs to flush this buffer in order to create more space for the next bytes. During the flush the CPU is occupied with the UART peripheral, which doesn’t support DMA in this case, and there you have it… Huge gaps where the kernel flushes the UART buffer. We can extract valuable information from this, though. The middle picture shows that the kernel spends 328 ms to empty the UART buffer (see the BX-AX value on the top left corner). During this time you see a gap. Then in the last picture you see that for the next 504 ms it performs read/writes on the I2C/SPI. This behavior is with the default CPU affinity and scheduler priority per process.
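The gap length can be turned into a rough byte count. This is a back-of-the-envelope sketch that assumes 8N1 framing (10 bits on the wire per byte), which is an assumption and not something the captures show; the function names are made up for the example.

```shell
# Bytes a UART can push out in a given time.
# Assumption (not from the post): 8N1 framing, i.e. 10 bits per byte.
# usage: uart_bytes <baud> <milliseconds>
uart_bytes() {
    echo $(( $1 / 10 * $2 / 1000 ))
}

# Share of wall-clock time (integer percent) left for the SPI/I2C transfers.
# usage: busy_percent <transfer_ms> <flush_ms>
busy_percent() {
    echo $(( $1 * 100 / ($1 + $2) ))
}

uart_bytes 115200 328    # bytes drained during the 328 ms gap
busy_percent 504 328     # percent of time spent on read/write cycles
```

With these assumptions the 328 ms gap corresponds to roughly 3.7 KB of printf output, and the transfers get about 60% of the wall-clock time.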

Now let’s see the same output when the SPI is set to 30MHz which for the current setup seems to be the maximum frequency without getting errors.

In the first picture we now see that a full SPI/I2C read/write cycle takes 335 μsecs, which is much faster compared to the 500KHz SPI speed. You can also see that the printf time is 550 ms and the SPI/I2C read/write time is 218 ms, which means that the kernel uses the CPU for almost the same amount of time to empty the printf buffer, but the SPI/I2C transactions use the CPU for almost half the time. That suggests the CPU time the kernel hands out is tied to the SPI/I2C timings.
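The scope readings can be converted to cycles per second with a one-liner, which makes them easier to compare with the benchmark-mode numbers further down. A minimal sketch; the function name is just for illustration.

```shell
# Convert an average cycle time in microseconds to cycles per second.
# usage: cycles_per_sec <microseconds>
cycles_per_sec() {
    awk -v us="$1" 'BEGIN { printf "%.0f\n", 1000000 / us }'
}

cycles_per_sec 475    # SPI at 500KHz
cycles_per_sec 335    # SPI at 30MHz
```

For 335 μsecs this gives about 2985 cycles per second, which is in the same ballpark as the ~2987 average that the benchmark mode reports for the SMP kernel at 20MHz and 30MHz.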

Now let’s use the user-space tool to get some different numbers. In this case I’ll run the benchmark mode of the tool, which counts the SPI/I2C read/write cycles per second. Each second is an iteration, so the tool also takes a parameter for how many iterations it will run. For example, the following command means that the tool will use /dev/i2c-0 and /dev/spidev0.0 in benchmark mode (-m 1) for 20 iterations/runs/seconds (-r 20), with the printf (light load) disabled (-p 0).

./linux-app -i /dev/i2c-0 -s /dev/spidev0.0 -m 1 -r 20 -p 0

After this test runs it will print some results, for example:

        SPI speed: 1000000 Hz (1000 KHz)
2249
2235
2243
2264
2256
2250
2246
2248
2250
2237
2231
2246
2241
2237
2253
2262
2260
2261
2261
2264
        SPI speed: 2000000 Hz (2000 KHz)
2257
2261
2253
2246
2261
2265
2259
2254
2251
2246
2241
2229
2261
2270
2269
2250
2244
2258
2263
2246
        SPI speed: 5000000 Hz (5000 KHz)
2269
2253
2281
2278
2284
2287
2277
2270
2263
2273
2256
2273
2266
2270
2285
2292
2272
2268
2276
2280
        SPI speed: 10000000 Hz (10000 KHz)
2272
2268
2275
2263
2278
2283
2295
2273
2269
2274
2280
2280
2274
2291
2265
2286
2294
2310
2290
2309
        SPI speed: 20000000 Hz (20000 KHz)
2291
2291
2317
2266
2294
2291
2306
2260
2289
2305
2285
2286
2298
2281
2288
2294
2278
2250
2298
2270
        SPI speed: 30000000 Hz (30000 KHz)
2307
2301
2271
2296
2304
2312
2292
2296
2301
2278
2296
2317
2309
2305
2282
2315
2290
2272
2305
2308

There you see that for each SPI speed the number of SPI/I2C read/write cycles is counted and printed once per second. I won’t paste other data here; I’ll only use the average values instead. You can have a look at the second sheet of the calc (.ods) file for all the data.
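If you want to reproduce the averages without the spreadsheet, something like the following can be piped after the tool. It’s a sketch that assumes the output format shown above (a “SPI speed:” header followed by one count per line) and that the tool prints to stdout; avg_per_speed is a name made up for this example.

```shell
# Average the per-second cycle counts of each "SPI speed:" section.
# Reads the benchmark output on stdin, prints "<Hz> <average>" per section.
# Assumed usage (stdout output is an assumption):
#   ./linux-app -i /dev/i2c-0 -s /dev/spidev0.0 -m 1 -r 20 -p 0 | avg_per_speed
avg_per_speed() {
    awk '
        /SPI speed:/ {
            # A new section starts: print the previous one, then reset.
            if (n > 0) printf "%s %.2f\n", hz, sum / n
            hz = $3; sum = 0; n = 0
            next
        }
        NF == 1 && $1 ~ /^[0-9]+$/ { sum += $1; n++ }
        END { if (n > 0) printf "%s %.2f\n", hz, sum / n }
    '
}
```

Fed with the 1MHz and 2MHz sections printed above, it gives 2249.70 and 2254.20, which are the values that show up in the PREEMPT-RT table later on.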

So let’s see the average values when we use the benchmark mode, for 20 runs and the printf on and off.

SMP      -m 1 -r 20 -p 0    -m 1 -r 20 -p 1
1MHz     2750.75            1561.55
2MHz     2843.75            1499.45
5MHz     2938.78            1427.05
10MHz    2936.2             1450.65
20MHz    2987               1902.6
30MHz    2986.6             1902.65

From the above table I draw the following conclusions. With the SMP kernel there are almost twice as many SPI/I2C read/write cycles when the printf in the loop is disabled as when it’s enabled. Wow, nobody expected that… Also, when there’s no printf there’s not much difference in the number of cycles above 5MHz, but there is a difference when the printf is enabled, especially from 20MHz up. Anyway, as expected, the faster the clock the higher the cycle count.

But let’s now also enable a quite heavy load in the background and re-run those tests to see what happens. The full command I’ve used for the stress tool is:

stress --cpu 4 --io 4 --vm 2 --vm-bytes 128M

The above command means that the stress tool spawns 4 CPU workers that spin on sqrt(), 4 I/O workers that spin on sync() and 2 memory workers that spin on malloc()/free() of 128MB each. And these are the averages:

SMP      -m 1 -r 20 -p 0    -m 1 -r 20 -p 1
1MHz     1733.7             1155.5
2MHz     1874.95            1186.9
5MHz     1760.65            1196.9
10MHz    1731.4             1154.65
20MHz    1698.7             1170.2
30MHz    1826.7             1298.75

Now, with the heavy background load, we of course see a huge drop in the performance for the SMP kernel, both with the printf on and off. Here the frequency doesn’t really have a great impact, but the fastest clock still does a bit better than the slowest one. Any increase in the performance that is more than the statistical error is welcome. Therefore even those 100-200 full read/write cycles are a performance gain; it just doesn’t scale as uniformly as in the previous example, where there was no background load.

Now let’s see the PREEMPT-RT kernel…

PREEMPT-RT KERNEL

Let’s have a look at a couple of oscilloscope captures, like we did in the case of the SMP kernel.

In this case the average time for a full SPI/I2C cycle is 476 μsecs. You can also see that the kernel performs read/write cycles for 504 ms and spends 324 ms flushing the UART buffer. I’ll draw the conclusions about how the SMP and the PREEMPT-RT compare in the next section, so I’ll continue with the rest of the benchmarks.

These are the two tables for the PREEMPT-RT kernel with the benchmark result as the previous example.

PREEMPT-RT    -m 1 -r 20 -p 0    -m 1 -r 20 -p 1
1MHz          2249.7             1448.55
2MHz          2254.2             1444.65
5MHz          2273.89            1447.55
10MHz         2281.45            1457.95
20MHz         2286.9             1457.55
30MHz         2297.85            1458.7

So, this means that the light load that printf adds to the CPU has a huge effect on the performance, even though the kernel is the real-time kernel. That’s expected though, because real-time doesn’t mean that each process will get the performance it would get if it were the only process running on the CPU; it just means that the scheduler will be fair and the process is guaranteed to get a minimum amount of time to execute frequently. Therefore, the overall performance is affected by any additional load, as in the case of the SMP.

Now let’s see what’s happening when there’s a heavy background load as before. So I’ve used the exact same parameters for the stress tool as before and these are the results I’ve got.

PREEMPT-RT    -m 1 -r 20 -p 0    -m 1 -r 20 -p 1
1MHz          1815.15            1398.7
2MHz          1930.25            1443.35
5MHz          1963.55            1399.9
10MHz         1929.9             1441.5
20MHz         2045.65            1472
30MHz         2002.05            1442.65

So, what do we see here? Wow, right? There’s really no big difference compared to the previous table. It seems that although the load is now much higher, the performance impact is quite low. Why’s that? Well, that’s what the RT kernel does: it makes sure that your process will get a fair time to run and it will preempt other processes frequently, so there’s no process that occupies the CPU for more time than the others. Again, the printf has a great impact, because the problem lies in the implementation and there’s no DMA to offload the CPU from the task of sending bytes over the UART.

Results

So let’s compare the results that we have from the two kernels and see what we got. I’ll create two new tables that combine the results for the light and the heavy load. This is the table without the heavy background load.

         SMP        RT         SMP+pr     RT+pr
1MHz     2750.75    2249.7     1561.55    1448.55
2MHz     2843.75    2254.2     1499.45    1444.65
5MHz     2938.79    2273.89    1427.05    1447.55
10MHz    2939.2     2281.45    1450.65    1457.95
20MHz    2987       2286.9     1902.6     1457.55
30MHz    2986.6     2297.85    1902.65    1458.7

In this table we see that without any load, the SMP kernel is much faster compared to the RT. That happens because the scheduler is not really fair, but gives as much processing time as possible to the SPI/I2C and the benchmark tool, as the rest of the processes are idle. Pretty much the same happens for the RT without the load, but there the CPU is still forced to switch to other tasks and processes that don’t have much to do, so the scheduler is more “fair”.

In the next two columns, the printf in the while loop has a huge effect on both kernels. Nevertheless, the SMP kernel gives more processing time to the benchmark tool and the SPI/I2C interfaces, therefore the SMP has approx. 450 more read/write cycles at the higher frequencies.

Another thing that is obvious from the table is that on the SMP kernel the SPI/I2C read/writes scale with the frequency, while on the RT kernel they don’t. So for the RT kernel it doesn’t matter if the SPI bus is running at 1MHz or 30MHz. Cool, right? So that means that if you’re running on an RT kernel you don’t have to worry about optimizing your SPI to achieve the max frequency, because it doesn’t make any difference. But on the SMP you should definitely do any optimizations.

So in this case, it seems that the SMP kernel is much, much better for such use scenarios. What are those scenarios? Well, SPI displays are definitely one of those, for example. And this is most probably the same for every other peripheral that demands a high throughput (e.g. PCIe, USB, etc.).

Now let’s go to the next table, where the benchmark is running with a heavy load in the background.

         SMP        RT         SMP+pr     RT+pr
1MHz     1733.7     1815.15    1155.5     1398.7
2MHz     1874.95    1930.25    1186.9     1443.35
5MHz     1760.65    1963.55    1196.9     1399.9
10MHz    1731.4     1929.9     1154.65    1441.5
20MHz    1698.7     2045.65    1170.2     1472
30MHz    1826.7     2002.05    1298.75    1441.65

Wait, what? What happened here? In all benchmarks the RT kernel not only scores higher, but if you look at the full table in the calc file, you’ll also see that there is smooth and consistent performance between each SPI/I2C read/write cycle for the RT kernel. The SMP kernel, on the other hand, has a great variation between the cycles and also the average performance is lower. The performance difference between the SMP and RT is not huge, but it’s substantial. Who doesn’t want 100, 200 or even 300 more SPI/I2C read/write cycles per second, right?
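To see where those extra cycles come from, the difference between the RT and SMP columns (printf off) can be computed per row. A small sketch using the averages from the table above; rt_gain is a made-up helper name.

```shell
# Difference (RT minus SMP) in cycles per second for each row.
# Input lines on stdin: <label> <smp_average> <rt_average>
rt_gain() {
    awk '{ printf "%s %.2f\n", $1, $3 - $2 }'
}

rt_gain <<'EOF'
1MHz 1733.7 1815.15
2MHz 1874.95 1930.25
5MHz 1760.65 1963.55
10MHz 1731.4 1929.9
20MHz 1698.7 2045.65
30MHz 1826.7 2002.05
EOF
```

The gain ranges from about 55 cycles per second at 2MHz to about 347 at 20MHz.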

So what happened here? Well, as I’ve mentioned before, the RT scheduler is fair. Therefore, with the RT kernel you get almost the same performance as you get with a lower load, because the kernel will more or less assign the CPU to your process for the same amount of time. But the performance on the SMP takes a big hit, because now the kernel needs to assign more time to other processes that may need the CPU for longer. Hence the difference between the last two tables.

OK, so what’s better then? What should I use? Which is better? Well… that depends. What are your needs? What is your device doing? For example, if you want to drive an SPI display with the max framerate possible then forget about RT, but at the same time make sure that there are no other processes in your system that load the CPU that much, because then your framerate will drop even lower than with the RT kernel. Then why use the RT kernel? You would use the RT kernel if your system needs to perform specific tasks in a predictable way, even under heavy load. An example of that is audio, or let’s say that you drive motors and you need minimum latency under every circumstance (no load, mid load, high load). In most cases the SMP kernel is what you need when a lot of IOs and a high throughput are needed, and in almost every other case too, except when you need low latency and predictable execution.

Another thing worth noticing here is that the RT kernel is not just a plug n play thing that you boot in your OS and everything works just as fine as with the SMP. Instead, there may be a lot of underlying issues and bugs in there that cause undefined behaviour which is not triggered with the SMP kernel. This means that some drivers, subsystems, modules or interfaces, even some hardware, may not be stable with the RT kernel. Of course, the same goes for the SMP, but at least the SMP is used much more widely, so those issues come to the surface and get fixed sooner compared to the RT kernel. Also, if your kernel is not a mainline kernel then it’s a hard and long process to convert it to a fully PREEMPT-RT kernel, as the patches for the RT kernel target the mainline kernel only. So until all the PREEMPT-RT patches become mainline, and we get to the point where your hardware supports those mainline versions, it might take a looong time.

This post is just a stupid project and is not meant to be an extensive review, benchmark or versus between the SMP and the PREEMPT-RT. Don’t forget where you are. This is a stupid projects blog. And for that reason let’s see the SPI photoresistor and I2C PWM LED in action.

Linux and the I2C and SPI interfaces

Intro

Most of the people that read this blog, apart from not having anything more interesting to do, are probably more familiar with the lower-level embedded stuff. I like the baremetal embedded stuff. Everything is simple and straightforward, you work with registers close to the hardware and you avoid all the bloatware between the hardware and the more complicated software layers, like an RTOS. Also, the code is faster and more efficient, and you have full control of everything. When you write firmware, you’re the God.

And then an RTOS comes and says, well, I’m the god and I may give you some resources and time to spend with my CPU to run your pitiful firmware or buggy code. Then you become a semi-god, at best. So Linux is one of those annoying OSes that demotes you to a simple peasant and allows you to use only a part of the resources and only when it decides to. On the other hand, though, it gives you back a lot more benefits, like supporting multiple architectures and having an API for both the kernel and the user space that you can re-use among those different architectures and hardware.

So, in this stupid project we’ll see how we can use a couple of hardware interfaces like I2C and SPI to establish a communication between the kernel and some external hardware. This will unveil the differences between those two worlds, and you’ll see how those interfaces can be used in Linux in different ways and whether one of those ways is better than the other.

Just a note here: I’ve tested this project on two different boards, a raspberry pi 3 model B+ and a nano-pi neo. Using the rpi is easier for most people, but I prefer the nano-pi neo as it’s smaller and much cheaper and it has everything you need. Therefore, in this post I will explain (not very thoroughly) how to make it work on the rpi, but in the README.md file in the project repo you’ll find how to use Yocto to build a custom image for the nano-pi and do the same thing. In the case of the nano-pi you can also use a distro like armbian and build the modules using the sources of the armbian build. There are so many ways to do this, so I’ll only focus on one way here.

Components

Nanopi-neo

I tried to keep everything simple and cheap. For the Linux OS I’ve chosen to use the nanopi-neo board. This board costs around $12 and it has an Allwinner H3 CPU @800 or @1200MHz and 512MB of RAM. It also has various other interfaces, but we only care about the I2C and the SPI. This is the board:

You can see the full specs and pinout description here. You will find the guide on how to use the nano-pi neo in the README.md file of the repo:

https://bitbucket.org/dimtass/linux-arduino-spi-i2c/src/master/

Raspberry pi

I have several raspberry pi boards flying around here, but in this case I’ll use the latest Raspberry Pi 3 Model B+. That way I can justify to myself that I bought it for a reason and feel a bit better. In this guide I will explain how to make this stupid project with the rpi, as most people have access to this rather than to a nano pi.

Arduino nano

The next piece of hardware is the arduino-nano. Why arduino? Well, it’s fast and easy, that’s why. I think the arduino is both a blessing and a curse. If you have worked a lot with the baremetal stuff, arduino is like a miracle. You just write a few lines of code and everything works. On the other hand, it’s also a trap. If you write a lot of code in there, you end up losing touch with the reality of the baremetal and you become more and more lazy and forget about the real hardware. Anyway, because there’s not much time now, the Arduino API will do just fine! This is the Arduino nano:

Other stuff

You will also need a photoresistor, a LED and a couple of resistors. I’ll explain why in the next section. The photoresistor I’m using is a pre-historic component I’ve found in my inventory and the part name is VT33A603/2, but you can use whatever you have, it doesn’t really matter. Also, I’m using an orange led with a forward voltage of around 2.4V @ 70mA.

Project

OK, that’s fine. But what’s the stupid project this time? I’ve thought of the most stupid thing you can build that would be a nice addition to my series of stupid projects. Let’s take a photo-resistor and the Arduino nano and use an ADC to read the resistance, and then use the I2C interface as a slave to send the raw resistance value. This value is actually the amount of light energy that the resistor senses, and therefore if you have the datasheet you can convert this raw value to something more meaningful like lumens or whatever. But we don’t really care about this; we only care about the raw value, which will be something between 0 and 1023, as the AVR mega328p (Arduino nano) has a 10-bit ADC. Beautiful.

So, what do we do with this photo-resistor? Well, we also use a PWM channel from the Arduino nano and we will drive a LED! The duty cycle of the PWM will control the LED brightness and we feed the mega328p with that value by using the SPI bus, so the Arduino will also be an SPI slave. The SPI word length will be 16 bits, and of those only 10 bits will be effective (same length as the ADC).

Yes, you’ve guessed right. From the Linux OS, we will read the raw photo-resistor value using the I2C interface and then feed this value back to the PWM LED using the SPI interface. Therefore, the LED will be brighter when we have more light and less bright in dark conditions. It’s like the auto-brightness of your mobile phone’s screen. Stupid, right? Useless, but let’s see how to do that.
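As an illustration of the data format, this is one way a raw reading could be limited to 10 bits and split into the two bytes of the 16-bit SPI word. It’s only a sketch: the byte order (most significant byte first) and the clamping are assumptions, so check the Arduino sketch in the repo for what the firmware really expects. pwm_word is a made-up name.

```shell
# Clamp a value to the 10-bit range (0..1023) and print it as the two bytes
# of a 16-bit word, in hex.
# Assumption (not from the post): most significant byte first.
# usage: pwm_word <value>
pwm_word() {
    v=$1
    if [ "$v" -lt 0 ]; then v=0; fi
    if [ "$v" -gt 1023 ]; then v=1023; fi
    printf '%02x %02x\n' "$(( v >> 8 ))" "$(( v & 255 ))"
}

pwm_word 878     # a mid-range reading, 0x036e
pwm_word 2000    # out of range, gets clamped to 1023
```

A reading of 878 (0x036e) becomes the bytes 03 6e, and anything above 1023 is clamped to 03 ff.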

Connections

First you need to connect the photo-resistor and the LED to the Arduino nano as it’s shown in the next schematic.

As you can see the D3 pin of the Arduino nano will be the PWM output that drives the LED and the A3 pin is the ADC input. In my case I’ve used a 75Ω resistor to drive the LED to increase the brightness range, but you might have to use a higher value. It’s better to use a LED that can handle a current near the maximum current output of the Arduino nano. The same goes for the resistor that creates a voltage divider with the photo-resistor; use a value that is correct for your components.

Next you need to connect the I2C and SPI pins between the Arduino and the rpi (or nanopi-neo). Keep in mind that the nano-pi neo has an rpi-compatible pinout, so it’s the same pins. These are the connections:

Signal   Arduino   RPI (and nano-pi neo)
/SS      D10       24 (SPI0_CS)
MOSI     D11       19 (SPI0_MOSI)
MISO     D12       21 (SPI0_MISO)
SCK      D13       23 (SPI0_CLK)
SDA      A4        3 (I2C0_SDA)
SCL      A5        5 (I2C0_SCL)

You will need to use two pull-up resistors for the SDA and SCL. I’ve used 10kΩ resistors, but you may have to check with an oscilloscope to choose the proper values.

Firmwares

You need to flash the Arduino with the proper firmware and also boot up the nanopi-neo with a proper Linux distribution. For the second thing you have two options that I’ll explain later. So, clone the repo from here:

https://bitbucket.org/dimtass/linux-arduino-spi-i2c/src/master/

There you will find the Arduino sketch in the arduino-i2c-spi folder. Use the Arduino IDE to build and to upload the firmware.

For the rpi, download the standard raspbian stretch lite image from here and flash an SD card with it. Don’t use the desktop image, because you won’t need any GUI for this guide and the lite image is also faster. After you flash the image and boot the board there are a few debug tweaks you can do, like removing the root password and allowing passwordless root connections. Yeah, I know it’s against the safety guidelines, don’t do this at your home and blah blah, but who cares? You’re not supposed to run your NAS with your dirty secrets on this thing, it’s only for testing this stupid project.

To remove the root password run this:

passwd -d root

Then, just for fun, also edit your /etc/passwd file and remove the x from the root:x:... line. Then edit your /etc/ssh/sshd_config so that the following lines look like this:

PermitRootLogin yes
#PubkeyAuthentication yes
PasswordAuthentication yes
PermitEmptyPasswords yes
UsePAM no

Now you should be able to ssh easily to the board like this:

ssh root@192.168.0.33

Anyway, just flash the arduino firmware and the raspbian distro and then do all the proper connections and boot up everything.

Usage

Here’s the interesting part. How can you retrieve and send data from the nanopi-neo to the Arduino board? Of course, by using the I2C and the SPI, but how can you use those things inside Linux? Well, there are many ways and we’ll see a few of them.

Before you boot the raspbian image on the rpi3, you need to edit the /boot/config.txt and add this at the end of the file, in order to enable the UART console.

enable_uart=1

Raw access from user space using bash

Bash (or any other shell) is cool. You can use it to read and write raw data from almost any hardware interface. From the bitbucket repo, have a look at bash-spidev-example.sh. That’s a simple bash script that is able to read data from the I2C and then send data to the SPI using the spidev module. To do that, the only thing you need to take care of is to load the spidev overlay and install the spi-tools. The problem with the debian stretch repos is that spi-tools is not in the available packages, so you need to build it yourself. To do this, just log in as root and run the following commands in the shell:

apt-get install -y git
apt-get install -y autotools-dev
apt-get install -y dh-autoreconf
apt-get install -y i2c-tools
git clone https://github.com/cpb-/spi-tools.git
cd spi-tools
autoreconf -fim
./configure
make
make install

Now that you have all the needed tools installed, you need to enable the I2C and the spidev modules on the raspberry pi. To do that, run the raspi-config tool, browse to the Interfacing options, enable both I2C and SPI and then reboot. After that you will be able to see that there are these devices:

/dev/spidev0.0
/dev/i2c-1

This means that you can use one of the bash scripts I’ve provided in the repo to read the photo-resistor value and also send a PWM value to the LED. First you need to copy the scripts from your workstation to the rpi using scp (I assume that the IP is 192.168.0.36).

cd bash-scripts
scp *.sh root@192.168.0.36:/root

Before you run the script you need to properly set up the SPI interface, using the spi-config tool that you installed earlier, otherwise the default speed is too high. To get the current SPI settings run this command:

spi-config -d /dev/spidev0.0 -q

If you don’t get the following then you need to configure the SPI.

/dev/spidev0.0: mode=0, lsb=0, bits=8, speed=10000000, spiready=0

To configure the SPI, run these commands:

spi-config -d /dev/spidev0.0 -m 0 -l 0 -b 8 -s 1000000
./bash-spidev-example.sh

With the first command you configure the SPI so that it’s consistent with the Arduino SPI bus configuration. Then you run the script. If you look at the script, you’ll see that it sleeps for 0.05 secs, or 50ms, in every iteration. We do that for the benchmark. I’ve used the oscilloscope to measure the time between SPI packets and the average is around 66ms (screenshots later on), instead of 50. Of course, that includes the time to read from the I2C and also send to the SPI. Also, I’ve seen a few I2C failures with the following error:

mPTHD: 0x036e
mError: Read failed
PTHD: 0x
n\PTHD: 0x036d
xPTHD: 0x036e

Anyway, this way we’ve seen that we are able to read from the I2C and write to the SPI, without the need to write any custom drivers. Instead we used the spidev module, which is available in the mainline kernel, and also a couple of user-space tools. Cool!
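The gap between the 50ms sleep and the ~66ms measured on the oscilloscope is the cost of everything else that happens in each loop iteration (spawning the tools, the I2C read, the SPI write). A rough way to see that overhead without a scope is to time the loop body from the shell. This is only a sketch: `avg_period_ms` is a hypothetical helper that is not part of the repo, and it assumes GNU date (for the `%N` nanoseconds format).

```shell
# Run a command N times and print the average wall-clock period in ms.
# Assumes GNU date, which supports %N (nanoseconds).
avg_period_ms() {
    local n=$1
    shift
    local i=0
    local start
    local end
    start=$(date +%s%N)
    while [ "$i" -lt "$n" ]; do
        "$@"
        i=$((i + 1))
    done
    end=$(date +%s%N)
    echo $(( (end - start) / n / 1000000 ))
}

# Usage: time 20 iterations of a body that only sleeps for 50ms.
#   avg_period_ms 20 sleep 0.05
```

Whatever the result is above 50 for the sleep-only case is pure shell and process overhead. Timing the real loop body the same way would also include the bus transfers.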

Using a custom user-space utility in C

Now we’re going to write a small C utility that opens the /dev/i2c-1 and the /dev/spidev0.0 devices and reads/writes to them as if they were files. To do that you need to compile a small tool. You can do that on the rpi, but we’ll need to build some kernel modules later, so let’s use a cross-toolchain instead.

The toolchain I’ve used is the `gcc-linaro-7.2.1-2017.11-x86_64_arm-linux-gnueabihf` and you can download it from here. You may have noticed that this is a 32-bit toolchain. Yep, although the rpi has a 64-bit cpu, raspbian by default is 32-bit. Then just extract it to a folder (this folder in my case is /opt/toolchains) and then you can cross-build the tool on your desktop with these commands:

export CC=/opt/toolchains/gcc-linaro-7.2.1-2017.11-x86_64_arm-linux-gnueabihf/bin/arm-linux-gnueabihf-gcc
${CC} -o linux-app linux-app.c
scp linux-app root@192.168.0.36:/root

Then on the rpi terminal run this:

./linux-app /dev/i2c-1 /dev/spidev0.0

Again, in the code I’ve used a sleep of 50ms and this time the oscilloscope shows that the typical time between SPI packets is ~50ms (screenshots later on), which is a lot faster compared to the bash script. In the picture you’ll see that the average is 83, but that’s because sometimes the OS adds delays of 200+ms, which is quite expected in a non PREEMPT-RT kernel. Also, I’ve noticed that there was no missing I2C packet with the executable. Nice!

You won’t need the spidev anymore, so you can run the raspi-config and disable the SPI, but leave the I2C as it is. Also, make sure that you have these lines in the /boot/config.txt:

dtparam=i2c_arm=on
#dtparam=i2s=on
#dtparam=spi=off
enable_uart=1

Now reboot.

Use custom kernel driver modules

Now the more interesting stuff. Let’s build some drivers and use the device-tree to load the modules and see how the kernel really handles these types of devices using the IIO and the LED subsystems. First let’s build the IIO module. To do that you first need to set up the rpi kernel and the cross-toolchain on your workstation, which means getting the kernel from git and running some commands to prepare it.

git clone https://github.com/raspberrypi/linux.git
cd linux

Now you need to checkout the correct hash/tag. To find it, run this command on the rpi console.

uname -a

In my case I get this:

Linux raspberrypi 4.14.79-v7+ #1159 SMP Sun Nov 4 17:50:20 GMT 2018 armv7l GNU/Linux

That means that this kernel was built on 04.11.2018. Then in the kernel repo on your workstation, run this:

git tag

And you will get a list of tags with dates. In my case the `raspberrypi-kernel_1.20181112-1` tag seems to be the correct one, so check out the one that is appropriate for your kernel, e.g.

git checkout raspberrypi-kernel_1.20181112-1
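Matching the build date to a tag by eye works fine, but it can also be sketched in shell. The two helpers below are hypothetical (they're not part of the repo) and they rest on two assumptions: that the build stamp printed by `uname -v` ends with "month day time timezone year", like the one shown above, and that the tag you want is the first `raspberrypi-kernel_1.YYYYMMDD-N` tag dated on or after the build date. The second one is just a heuristic that happens to fit my case, so double check the result.

```shell
# Turn the build stamp printed by `uname -v` into YYYYMMDD.
# Expected shape: "#1159 SMP Sun Nov 4 17:50:20 GMT 2018"
kernel_build_date() {
    printf '%s\n' "$1" | awk '{
        months = "JanFebMarAprMayJunJulAugSepOctNovDec"
        # Fields are counted from the end: year, timezone, time, day, month.
        m = (index(months, $(NF-4)) + 2) / 3
        printf "%04d%02d%02d\n", $NF, m, $(NF-3)
    }'
}

# Read tag names on stdin and print the earliest tag dated on or after $1.
first_tag_on_or_after() {
    sed -n 's/^raspberrypi-kernel_1\.\([0-9]\{8\}\)-.*$/\1 &/p' |
        sort -n |
        awk -v d="$1" '$1 >= d { print $2; exit }'
}

# Usage (the build stamp comes from the rpi, the tags from the workstation):
#   d=$(kernel_build_date "#1159 SMP Sun Nov 4 17:50:20 GMT 2018")
#   git tag | first_tag_on_or_after "$d"
```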

Then run these commands:

export KERNEL=kernel7
export ARCH=arm
export CROSS_COMPILE=/opt/toolchains/gcc-linaro-7.2.1-2017.11-x86_64_arm-linux-gnueabihf/bin/arm-linux-gnueabihf-
export KERNEL_SRC=/rnd2/linux-dev/rpi-linux
make bcm2709_defconfig
make -j32 zImage modules dtbs

This will build the kernel, the modules and the device-tree files and it will create the Module.symvers that is needed to build our custom modules. Now on the same console (where the above variables are exported) run this from the repo top level.

cd kernel_iio_driver
make
dtc -I dts -O dtb -o rpi-i2c-ard101ph.dtbo rpi-i2c-ard101ph.dts
scp ard101ph.ko root@192.168.0.36:/root/
scp rpi-i2c-ard101ph.dtbo root@192.168.0.36:/boot/overlays/

Now do the same thing for the led driver, again starting from the repo top level:

cd kernel_led_driver
make
dtc -I dts -O dtb -o rpi-spi-ardled.dtbo rpi-spi-ardled.dts
scp ardled.ko root@192.168.0.36:/root/
scp rpi-spi-ardled.dtbo root@192.168.0.36:/boot/overlays/

And then run these commands on the rpi terminal:

mv ard101ph.ko /lib/modules/$(uname -r)/kernel/drivers/iio/light/
mv ardled.ko /lib/modules/$(uname -r)/kernel/drivers/leds/
depmod -a

And finally, edit the /boot/config.txt file and make sure that these lines are in there:

dtparam=i2c_arm=on
#dtparam=i2s=on
dtparam=spi=off
enable_uart=1
dtoverlay=rpi-i2c-ard101ph
dtoverlay=rpi-spi-ardled

And now reboot with this command:

systemctl reboot

After the reboot (and if everything went alright) you should be able to see the two new devices and also be able to read and write data like this:

cat /sys/bus/iio/devices/iio\:device0/in_illuminance_raw
echo 520 > /sys/bus/spi/devices/spi0.0/leds/ardled-0/brightness

Keep in mind that this is the 4.14.y kernel version and if you’re reading this and you have a newer version then a lot of things might be different.

So, now that we have our two awesome devices, we can re-write the bash script in order to use those kernel devices now. Well, the script is already in the bash-scripts folder so just scp the scripts and run this on the rpi:

./bash-modules-example.sh

The oscilloscope now shows an average period of 62.5ms, which is a bit faster compared to the raw read/write from bash in the first example, but the difference is too small to be significant.
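With the two drivers loaded, the whole read/write cycle becomes plain file I/O on sysfs. The following is a minimal sketch of that idea and not the actual bash-modules-example.sh from the repo. `copy_light_to_led` is a hypothetical helper and the sysfs paths in the usage comment are the ones shown earlier.

```shell
# Copy one reading from a sensor attribute file to an LED brightness file.
# Returns non-zero if the sensor file can't be read.
copy_light_to_led() {
    local value
    value=$(cat "$1") || return 1
    printf '%s\n' "$value" > "$2"
}

# Usage on the rpi (the 50ms sleep matches the benchmark):
#   while true; do
#       copy_light_to_led \
#           /sys/bus/iio/devices/iio:device0/in_illuminance_raw \
#           /sys/bus/spi/devices/spi0.0/leds/ardled-0/brightness
#       sleep 0.05
#   done
```

Since the helper only takes two file paths, the user-space side doesn’t have to know anything about the I2C or the SPI bus anymore.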

Conclusions

Let’s see some pictures of the various oscilloscope screenshots. The first one is from the bash script and the spidev module:

 

The second is from the linux-app program that also used spidev and the /dev/i2c.

And the last one is by using the kernel’s iio and led subsystems.

So let’s make some conclusions about the speed in those different cases. It’s pretty clear that writing an app in the user space using the spidev and the /dev/i2c is definitely the way to go, as it’s the best option in terms of speed and robustness. Then the difference between using a bash script to read/write to the bus with different types of drivers, [spidev & /dev/i2c] vs [leds & iio], is very small.

Then, why write a driver for iio and leds in the first place if there’s no difference in performance? Exactly. In most cases it’s much easier to write a user-space tool to control this kind of device instead of writing a driver.

Then are those subsystems useless? Well, not really. They are useful if you use them right.

Let’s see a few bullet points, why writing a user-space app using standard modules like the spidev is good:

  • No need to know the kernel internals
  • Independent from the kernel version
  • You just compile on another platform without having to deal with hardware specific stuff.
  • Portability (pretty much a generalization of the above)
  • If the user space app crashes then the kernel remains intact
  • Easier to update the app than updating a kernel module
  • Less complicated compared to kernel drivers

On the other hand, writing or having a subsystem driver also has some nice points:

  • There are already a lot of kernel modules for a variety of devices
  • You can write user-space software which is independent from the hardware. For example if the app accesses an iio device then it doesn’t have to know how to handle the device and you can just change the device at any time (as long as it’s compatible in iio terms)
  • You can hide the implementation if the driver is not a module but fixed in the kernel
  • It’s better for time critical operations with tight timings or interrupts
  • Might be a bit faster compared to a generic driver (e.g. spidev)
  • Keeps the hardware isolated from the user-space (that means that the user-space devs don’t have to know how to deal with the hardware)

These are pretty much the generic points that you may read about generally. As a personal preference I would definitely go for a user-space implementation in most cases, even if a device requires interrupts, as long as it’s not time critical. I would choose to write a driver only for very time critical systems. I mean, OK, knowing how to write kernel drivers is a nice skill to have today and it’s not difficult. Bottom line, I believe that most of the time, even in the embedded domain, when it comes to I2C and SPI you don’t have to write a kernel driver, unless we’re talking about ultra fast SPI that runs at more than 50MHz with DMA and stuff like that. But there are very few cases that really need that, like audio/video or a lot of data. In 95% of the remaining cases the spidev and the user-space are fine. The spidev is even used for driving framebuffers, so it’s proven already. If you work in the embedded industry, then you probably know how to do both and choose the proper solution every time; but most of the time on a mainstream product you may choose to go with a driver because it’s “proper” rather than “needed”.

Anyway, in this stupid project you’ve pretty much seen how the SPI and I2C devices are used, how to implement your own I2C and SPI device using an Arduino and then interface with it in the Linux kernel, either by using the standard available drivers (like spidev) or by writing your own custom subsystem driver.

Finally, this is a video where both kernel modules are loaded and the bash script reads the photo-resistor value every 50ms via I2C and then writes the value to the brightness value of the led device.

Have fun!