Adding armbian-supported boards to meta-sunxi Yocto (updated)

Intro

Yocto is a necessary evil. Well, this post is quite different from the others, because it’s more related to the stuff I do for a living rather than for fun. Still, sometimes work stuff can be fun, too; especially if you do something other than supporting new BSPs. Anyway, this post is not much about Yocto. If you don’t know what Yocto is, then you need to find another resource, as this is not a tutorial. On the other hand, if you’re here because the title made you smile, then read on.

You probably already know about the allwinner meta layer, meta-sunxi. Although sunxi is great and they’ve done a great job, the supported boards are quite limited. On the other hand, armbian supports so many boards! But if you’re a Yocto person, then you know that this doesn’t help much. Therefore, I thought: why not port the u-boot and kernel patches to the meta-sunxi layer and build images that support the same allwinner boards as armbian?

And the result is this repo, which does exactly that; though it’s still a work in progress.

https://bitbucket.org/dimtass/meta-allwinner-hx/

This repo is actually a mix of meta-sunxi and armbian and it only supports the H2, H3 and H5 boards from nanopi and orange-pi. The README.md is quite detailed, so you don’t really need to read the rest of this post to bring it up and build your images.

More details please?

Yes, sure. Let’s see some more details. Well, most of the hard work is already done in armbian and meta-sunxi. In the armbian build system, they have implemented a script that automatically patches the u-boot and the kernel, and all the patches are compatible with their patch system. Generally, the trick with armbian is that it actually deletes most of the files that it touches and applies new ones, instead of patching each file separately. Therefore, the patches are larger in size, but on the other hand they’re much easier to maintain. It’s a neat trick.

The script that is used in armbian to apply the patches is lib/compilation.sh. There you’ll find two functions, advanced_patch() and process_patch_files(), and these are the ones we’d like to port to the meta-sunxi layer. Other than that, armbian uses the files in config/boards/*.conf to apply the proper patch family (e.g. default, next, dev). Those refer to the folders under patch/kernel; there, for example, you’ll find that sunxi has the sunxi-dev, sunxi-next and sunxi-next-old folders, and inside each folder there are some patches. If you build u-boot and the kernel for a sunxi-supported board, then you’ll find in output/debug/output.log and output/debug/patching.log which patches are used for each board.

Therefore, I’ve just taken the u-boot and kernel patches from armbian and implemented the patching system in the meta layer. To keep things simple I’ve added the patch functions in both the u-boot and kernel recipes, instead of implementing a bbclass that could handle both. Yeah, I know, Yocto has a lot of nice automations, but sometimes it’s not worth the trouble… So in my branch you’ll find the patch script in both recipes-bsp/u-boot/do_patches.sh and recipes-kernel/linux/linux-stable/do_patch.sh. Both scripts share the same code, which is also the code used by armbian. The patches for u-boot and the kernel are in a folder called patches, in the same path as the scripts.
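
To give you a rough idea, the core of such a patch script is just a loop over the patch files. This is a simplified, hypothetical sketch of the logic (the SCRIPT_DIR variable and the exact flags are illustrative; the real do_patch.sh in the repo handles logging and ordering):

for p in "${SCRIPT_DIR}"/patches/*.patch; do
    echo "Applying: ${p}"
    # dry-run first, so an already-applied or broken patch doesn't leave the tree half-patched
    if patch --batch --silent -p1 -N --dry-run < "${p}" > /dev/null; then
        patch --batch --silent -p1 -N < "${p}"
    else
        echo "Skipping: ${p}"
    fi
done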

Last but not least, I’ve also added the option to create .wic.bz2 and .bmap images. Please use them if you want lightning-fast speed when you flash images to an SD card or eMMC.

Conclusion

If you want to use Yocto to build custom distributions/images for the allwinner H2, H3 and H5, then you can use this meta layer. It’s just a mix of the meta-sunxi layer and the patch system from armbian, which offers much wider board support. For now I’ve ported most of the nano-pi boards that use the H2, H3 and H5 cpus, and soon I’ll do the same for the orange-pi boards (update: done). For the rest of the boards (A10 etc.) you can still use the same layer.

Also, support for wic images and bmap-tools is good to have, so use it wherever you can.

Have fun!

Driving an ILI9341 LCD with an overclocked stm32f103 (updated)

Intro

LCDs… I think LEDs and LCDs are probably the most common wet dream of people who like playing with embedded. I mean, who doesn’t like blinking LEDs, and furthermore, who doesn’t like drawing graphics with a microcontroller on an LCD? If you don’t, then please press Alt+F4 now.

LEDs are easy. Toggling a pin or flashing a LED is the first breath of every project. With it you know that your microcontroller’s heart is beating properly and it doesn’t have arrhythmia. Then, driving a character LCD is a bit harder, but still easy; driving a graphic LCD, though, is definitely harder, especially if you’re starting from scratch. Do you have to start from scratch, though? Nah… There are many projects out there, and this is just another one.

OK, so what’s the motivation behind this, if there are so many projects out there? For me it was the fact that I don’t like to use all this arduino-like stuff with my STMs, I don’t like HAL, and I couldn’t find a proper cmake project that builds out of the box, without special dependencies like specific IDEs, compilers etc. With this project you just download cmake and a gcc compiler, point the cmake toolchain to your gcc compiler and run build. Then it works… (maybe)

Of course, if you’re a regular customer here, there’s no need to say that this is a completely stupid project. It does nothing. No, really. You won’t see any of the fancy graphics that other people post with their STMs on youtube. You’ll see just a yellow screen. Why? Because I just wanted to benchmark and have a template to use for any other project.

Note: I’ve updated the code and the post, because I’ve added support for the xpt2046/ads7843 touch controller. I’ve used SPI2 with DMA to read the touch sensor, and I’m also using the /PENIRQ interrupt pin and not polling.

Overclocking, SPI, DMA and other fancy buzzwords

If you found this by searching the web, then you’re probably here because you know exactly what you want. SPI & DMA!!1! The reason that I like the bluepill stm32 boards is that they have a lot of DMA channels and they’re dirt-cheap. On top of that, you can overclock them up to 128MHz!

So, why is DMA important? Well, I won’t bother you with the details here; if you need to know more than “it’s much, much faster”, then do some web-searching for the specifics, as there are people that already explain this stuff better than me. The fact is that by using DMA on the SPI’s tx/rx, the transfer speed sky-rockets and you can achieve the maximum available bandwidth.
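
For reference, this is roughly what a blocking DMA TX transfer looks like with the StdPeriph driver names (a minimal sketch under my own assumptions, not the exact code from the repo; on the stm32f103, SPI1_TX is wired to DMA1 channel 3):

#include "stm32f10x.h"

/* Send a buffer to the LCD over SPI1 using DMA1 channel 3 (SPI1_TX) */
void spi1_dma_send(const uint8_t *buf, uint16_t len)
{
    DMA_InitTypeDef dma;

    DMA_DeInit(DMA1_Channel3);
    DMA_StructInit(&dma);
    dma.DMA_PeripheralBaseAddr = (uint32_t)&SPI1->DR;
    dma.DMA_MemoryBaseAddr = (uint32_t)buf;
    dma.DMA_DIR = DMA_DIR_PeripheralDST;       /* memory -> peripheral */
    dma.DMA_BufferSize = len;
    dma.DMA_MemoryInc = DMA_MemoryInc_Enable;  /* walk through the buffer */
    DMA_Init(DMA1_Channel3, &dma);

    SPI_I2S_DMACmd(SPI1, SPI_I2S_DMAReq_Tx, ENABLE);
    DMA_Cmd(DMA1_Channel3, ENABLE);
    while (!DMA_GetFlagStatus(DMA1_FLAG_TC3)); /* wait for transfer complete */
    DMA_ClearFlag(DMA1_FLAG_TC3);
    DMA_Cmd(DMA1_Channel3, DISABLE);
}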

On the other hand, overclocking is more interesting. The stm32f103 can easily be overclocked by just changing the PLL value. That way you can increase the main clock from 72MHz to 128MHz. Not bad, right? Especially if you consider that a project which drives a graphic LCD will benefit a lot from this speed increase. I assume that you’re not crazy enough to do that in a commercial project, but if you do, then you’re my hero.
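
A minimal sketch of the idea, using register and bit names from the CMSIS stm32f10x.h header (this is illustrative; the actual overclock_stm32f103() in the repo may differ, and bus prescalers like APB1 are assumed to be configured elsewhere):

#include "stm32f10x.h"

/* Sketch only: 8MHz HSE x16 = 128MHz. Assumes we're still running from HSI. */
uint32_t overclock_stm32f103(void)
{
    RCC->CR |= RCC_CR_HSEON;                    /* start the external 8MHz crystal */
    while (!(RCC->CR & RCC_CR_HSERDY));
    FLASH->ACR = FLASH_ACR_PRFTBE | FLASH_ACR_LATENCY_2;   /* prefetch + 2 wait states */
    RCC->CFGR |= RCC_CFGR_PLLSRC | RCC_CFGR_PLLMULL16;     /* PLL source = HSE, x16 */
    RCC->CR |= RCC_CR_PLLON;                    /* start the PLL... */
    while (!(RCC->CR & RCC_CR_PLLRDY));         /* ...and wait for it to lock */
    RCC->CFGR = (RCC->CFGR & ~RCC_CFGR_SW) | RCC_CFGR_SW_PLL;  /* switch SYSCLK to the PLL */
    while ((RCC->CFGR & RCC_CFGR_SWS) != RCC_CFGR_SWS_PLL);
    return 128000000;
}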

In this project I’ve done some benchmarking with two different system clocks, @72MHz and @128MHz, and there’s a significant difference, as you’ll see later in the benchmarks.

Components

STM32

I’m using an stm32f103c8t6 board (bluepill). These modules cost less than €2 on ebay and you may have already seen me using them in other stupid projects.

ILI9341

This is a very standard LCD module that you can find on ebay for around $7. It’s a 2.8″ TFT that supports a 240×320 resolution, has a touch interface and an sd card holder. The part name is TJCTM24028-SPI and it’s the following one:

It’s a beauty, right?

USB-uart module

You need this to print the FPS count every second and also if you want to add your own custom commands. You can find these on ebay for less than €1.50 and it looks like this:

ST-Link

Finally, you need an ST-Link programmer to upload the firmware, like this one:

Or whatever programmer you like to use.

Pinout connections

As the LCD is not wireless, you need to connect it somehow to your stm32. We’re lucky in this case, because both the LCD and the stm32 have conductive pins and if they’re connected to each other in the proper way, then it may work. The connections you need to make are:

STM32       ILI9341 (LCD)
PA0         LED
PA2         RESET
PA3         D/C
PA4         CS
PA5         SCK
PA6         SDO (MISO)
PA7         SDI (MOSI)
3.3V        VCC
GND         GND

STM32       ILI9341 (touch controller)
PB8         T_IRQ
PB14        T_DO
PB15        T_DIN
PB12        T_CS
PB13        T_CLK

STM32       UART module
PA9 (TX)    RX
PA10 (RX)   TX
GND         GND

You can power the stm32 from the USB connector.

Project source code

You can download the project source code from here:

https://bitbucket.org/dimtass/stm32f103-ili9341-dma/src/master/

All you need to do is install (or have already installed) a gcc toolchain for ARM. I’m using the gcc-arm-none-eabi-7-2017-q4-major, which you can find here. Just scroll down a bit, because there are newer toolchains; but from the tests I’ve done here, it seems this one produces the most compact code. Then, depending on your OS, change the path of the toolchain in the TOOLCHAIN_DIR variable in the project’s cmake/TOOLCHAIN_arm_none_eabi_cortex_m3.cmake file. Last, run ./build.sh on Linux or build.cmd on Windows to build, and then flash the bin/hex on the stm32.

Results

The stm32f103, by using DMA, can achieve an SPI clock of up to 36MHz when the mcu is running at its default highest frequency, which is 72MHz. That’s really fast already, compared to other Cortex-M3 mcus running at the same (or even higher) frequency. To use the default 72MHz clock you need to comment out line 47 in main.c here; otherwise the clock will be set to 128MHz.

This is a capture of CLK/MOSI when sending the byte 0xAA.

With the above clock settings, the stm32f103 achieves 29 FPS when drawing all the pixels of the screen. That’s about what the math promises: a full 320×240 frame at 16bpp is ~1.23Mbit, and 36Mbit/s ÷ 1.23Mbit ≈ 29 frames per second.

By overclocking the mcu to 128MHz, the SPI/DMA speed gets much higher. To enable the overclocking you need to un-comment line 47 here (SystemCoreClock = overclock_stm32f103();), which is already enabled by default.

This is a capture of CLK/MOSI when sending the byte 0xAA.

Now you can see that the SPI clock frequency is ~63MHz on the scope (the SPI runs at SYSCLK/2 = 64MHz), which is almost double the previous one. That means that updating all the pixels of the screen can be done at a rate of 52 fps (64Mbit/s ÷ ~1.23Mbit per frame), which is quite amazing for this €2 board.

Touch controller (UPDATE)

I’ve lost my sleep knowing that I hadn’t implemented the touch control interface; therefore, I’ve updated the code and added support for the touch controller, plus a calibration routine.

So, now there are two operation modes. The first one (and the default) is the benchmark mode and the other is the calibration mode. In the calibration mode you can calibrate the sensor, and you need to do that if you want to retrieve pixel x/y values. Without calibration you’ll only get the raw ADC sensor values, which may or may not be what you want.

To enter the calibration mode you need to send a command to the uart port. The supported commands are the following:

MODE=<MODE>
     BENCH : Benchmark mode
     CALIB : Calibration mode

FPS=<MODE>
    0 : Disable FPS display
    1 : Tx FPS on UART
    2 : Tx FPS on UART and on display

TOUCH=<MODE>
      0 : Do not Tx X/Y from touch to UART
      1 : Tx X/Y from touch to UART

The default values are MODE=BENCH, FPS=0, TOUCH=0. Therefore, to enter the calibration mode, send this command to the UART: MODE=CALIB.

The calibration routine is very simple, so don’t expect fancy handling in there. Even the switch statement in `ili9341_touch_calib_start()` is not needed, as it’s a completely serial process. I was about to implement a state machine, but it wasn’t worth it, so I just left the switch in there.

So, when you enable the calibration mode, you’ll get this screen.

Then you need to press the center of the cross. Behind the scenes, this code is in the ili9341_touch_calib.c file, inside the ili9341_touch_calib_start() function, which at the state STATE_DRAW_P1 draws the screen and then waits in STATE_WAIT_FOR_P1 for the user to press the cross. I’ve added de-bounce control in the xpt2046_polling() function, but generally the xpt2046 library doesn’t have any. So the xpt2046_update() function, which updates the static X/Y variables of the lib, doesn’t do de-bouncing. The reason is that this keeps the lib generic; de-bouncing is not always wanted, and if it is, it can be implemented easily, as in the sketch below.
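
For example, a simple counter-based de-bounce on top of the polled pen state could look like this (a hypothetical sketch, not the repo’s code):

#include <stdint.h>

#define TOUCH_DEBOUNCE_CNTR 5   /* consecutive polls needed for a stable press */

static uint8_t touch_debounce(uint8_t raw_pressed)
{
    static uint8_t cntr = 0;
    if (raw_pressed) {
        if (cntr < TOUCH_DEBOUNCE_CNTR)
            cntr++;
    } else {
        cntr = 0;   /* any release resets the counter */
    }
    return (cntr == TOUCH_DEBOUNCE_CNTR);
}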

Anyway, after pressing the point on the screen, another 2 points will appear, and after that the code will calculate the calibration data. The calibration is needed if you want to get the touch presses expressed in pixels that correspond to the screen pixels. Otherwise, the touch sensor only returns 12-bit ADC values, which are not very useful if you need the pixel location. Therefore, by calibrating the touch sensor surface to the LCD, you can get the exact pixels that lie under the pressure point.

The algorithm for that is in the touch_calib.c file and it actually derives from a TI application note called “Calibration in touch-screen systems”, which you can probably find at this link. The only thing worth noting is that there are two main calibration methods, the 3-point and the 5-point. Most of the time you’ll find the 5-point calibration method, but the 3-point also gives a good result. There’s a calc file in the sources (source/libs/xpt2046-touch/calculations.ods) that you can use to emulate the algorithm results. In this file, (X’1,Y’1), (X’2,Y’2), (X’3,Y’3) are the points that you read from the sensor (12-bit ADC values) and (X1,Y1), (X2,Y2), (X3,Y3) are the pixels at the center of each cross drawn on the screen. In the ods file I’ve just implemented the algorithms of the pdf file; the same algorithms are in the code.
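
For the curious, the 3-point method boils down to solving two small linear systems for the six coefficients of an affine transform X = a·x + b·y + c, Y = d·x + e·y + f that maps raw ADC readings to screen pixels. A rough sketch (the types and names here are mine, not the ones used in touch_calib.c):

#include <stdint.h>

typedef struct { float x, y; } point_t;
typedef struct { float a, b, c, d, e, f; } calib_t;

/* r[3]: the raw ADC points, s[3]: the known screen points (the cross centers) */
static int touch_calc_calib(const point_t r[3], const point_t s[3], calib_t *k)
{
    float det = (r[0].x - r[2].x) * (r[1].y - r[2].y)
              - (r[1].x - r[2].x) * (r[0].y - r[2].y);
    if (det == 0.0f)
        return -1;  /* the three points must not be collinear */
    k->a = ((s[0].x - s[2].x) * (r[1].y - r[2].y)
          - (s[1].x - s[2].x) * (r[0].y - r[2].y)) / det;
    k->b = ((r[0].x - r[2].x) * (s[1].x - s[2].x)
          - (s[0].x - s[2].x) * (r[1].x - r[2].x)) / det;
    k->c = s[0].x - k->a * r[0].x - k->b * r[0].y;
    k->d = ((s[0].y - s[2].y) * (r[1].y - r[2].y)
          - (s[1].y - s[2].y) * (r[0].y - r[2].y)) / det;
    k->e = ((r[0].x - r[2].x) * (s[1].y - s[2].y)
          - (s[0].y - s[2].y) * (r[1].x - r[2].x)) / det;
    k->f = s[0].y - k->d * r[0].x - k->e * r[0].y;
    return 0;
}

/* every touch is then mapped with: px = a*adc_x + b*adc_y + c, py = d*adc_x + e*adc_y + f */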

After you finish the calibration, the calibration data are printed out on the serial port and also stored in RAM, where they’re used for the touch presses. After a reset, those settings are gone. If you want to print the ADC and pixel values (after the calibration) on the serial port, you can send the TOUCH=1 command on the UART. After that, every touch will display both values.

Finally, if you enable showing the FPS on the LCD, you’ll see something weird.

This displays the FPS rate of drawing all the pixels on the screen, and you can see tearing in the number, which is 51. This happens because the number print overwrites that part of the screen when it’s displayed, and it also gets overwritten by the full display draw. I left it like that to keep the process simple and to avoid inserting more delay while doing the maths.

Generally, I think the code is straightforward, and it serves just as a template for someone to start writing their own code.

Conclusion

stm32f103 rocks. That’s the conclusion. Of course, getting there is not as easy as using an arduino or a teensy board, but the result pays off the difficulty of getting there. And once you get there, it’s not difficult anymore.

Again, this is a really stupid and completely useless project. You won’t see any fancy graphics with this code, but nevertheless it’s a very nice template for creating your own graphics, using optimized SPI/DMA code for the stm32f103, with the overclocking included. That’s a good deal.

Of course, I didn’t write the whole code from scratch; I’ve ported some code from here, which is itself a port of the Adafruit library for the arduino. So it’s a port of a port. FOSS is a beautiful thing, right?

Have fun!

Added macros to CuteCom

Intro

For the last few years I’ve been using only Linux at my workplace, and since I’ve started using only Linux at home too, I’ve found myself missing some tools that I was using on Windows. That’s pretty much the case for everyone that at some point tries or tried the same thing. Gladly, because more and more people have been doing this over the last years, there are alternatives for most of the tools. Alternatives can be either better or worse than the tool you were using, of course, but the best thing with FOSS is that you can download the code and implement any missing functionality yourself. And that’s great.

Anyway, one of the tools I’ve really missed is br@y’s terminal. I assume that every bare-metal embedded firmware developer is aware of this amazing tool. It’s everything you need when you develop firmware for a micro-controller. (For embedded Linux I prefer putty for the serial console, though.) Anyway, this great tool is Windows-only, and although you can use Wine to run it on Linux, you’ll soon find out that when you develop USB CDC devices the whole wine/terminal combination doesn’t work well.

CuteCom

There are many alternative (console and gui) terminal apps for Linux and I’ve used most of those you can find in the first 7 pages of google results. But after using Bray’s terminal for so many years, only one of them seemed close enough to it, and that’s CuteCom. The nice thing about CuteCom is that it’s written in Qt, so it’s a cross-platform app; and Qt is easy and nice to write code in, too.

Tbh, I’ve been familiar with Qt since the Trolltech era and then the Nokia era. I’ve written a lot of code in Qt, especially for the Nokia n900 phone and the Maemo OS. But since Nokia abandoned Maemo and also MeeGo (which later became Tizen), I’ve moved on to other stuff. I was really disappointed back then, because I believed the n900 and Maemo could be the future, until everything went wrong and Nokia abandoned everything and adopted Windows for their mobiles. I’ll moan another time about how much Microsoft loves Linux.

Anyway, Qt may have also affected my decision to go with CuteCom, but the problem was that the functionality I was using most in Bray’s terminal wasn’t there; and I mean the macros. Let me explain what macros are. Macros are just predefined data that you can send over the serial port by pressing the corresponding macro button. You can also attach a timer to every macro and send it automatically at a programmable interval in milliseconds. That’s pretty much all you need when developing firmware. But this functionality was not implemented in CuteCom yet.

Therefore, I had to implement it myself and also find an excuse to write some Qt again.

Result

I’ve forked CuteCom on github and added the macro functionality here:

https://github.com/dimtass/CuteCom

I’ve done a pull request, but I can’t tell if it will get merged or not. But anyway, if you’re a macro lover like myself, you can download it from the above branch.

Edit: Macros are now merged to the master git branch, thanks to Meinhard Ritscher.

https://github.com/neundorf/CuteCom

I’ll add here a couple of notes on how to build it, because it’s not very clear from the README file. You can either clone the repo and use Qt Creator to load the project and build it, or you can use cmake. In case you use cmake, you need the Qt libs and headers (version >= 5) on your system.

If you don’t have Qt installed then you need to do the following (tested on Ubuntu 18.04):

git clone https://github.com/neundorf/CuteCom
cd CuteCom
sudo apt install cmake qtbase5-dev libqt5serialport5-dev
cmake .
make
sudo make install

This builds and installs cutecom in /usr/local/bin/cutecom. Then you can create a desktop launcher:

gedit ~/.local/share/applications/CuteCom.desktop

And add the following:

#!/usr/bin/env xdg-open
[Desktop Entry]
Version=1.0
Type=Application
Terminal=false
Icon[en_US]=cutecom
Name[en_US]=CuteCom
Exec=/usr/local/bin/cutecom
Comment[en_US]=Terminal
Name=CuteCom
Comment=Terminal
Icon=cutecom

If you have installed another Qt SDK then you can just point cmake there and build like this:

cmake . -DCMAKE_PREFIX_PATH=/opt/Qt/5.x/gcc_64
make
sudo make install

This will be installed in `/usr/local/bin/cutecom` (run also `which cutecom` just to be sure…)

Finally, you’ll need a desktop icon or a launcher. For Ubuntu you can create a `CuteCom.desktop` file in your `~/.local/share/applications` path and paste the following:

#!/usr/bin/env xdg-open
[Desktop Entry]
Version=1.0
Type=Application
Terminal=false
Icon[en_US]=cutecom
Name[en_US]=CuteCom
Exec=env LD_LIBRARY_PATH=/opt/Qt/5.11.1/gcc_64/lib /usr/local/bin/cutecom
Comment[en_US]=Terminal
Name=CuteCom
Comment=Terminal
Icon=cutecom

The result should look like this:

Have fun!

Joystick gestures w/ STM32

Intro

Time for another stupid project, which adds no value to humanity! This time I got my hands on one of these dirt-cheap analog joysticks on ebay that cost less than €1.5. I guess you can make a lot of projects with them by using them as normal joysticks, but for some reason I wanted to do something more pointless than that. And that was to make a joystick that outputs gestures via USB.

By gestures I mean that, instead of sending the ADC values in realtime, it supports only the basic directions (up, down, left, right) and the button press, and then sends the gesture combinations through USB. If you’re using mouse gestures in your web browser, then you know what I mean.

OK, let’s see the components and result.

Components

STM32

I’m using a stm32f103c8t6 board (bluepill). These modules cost less than €2 on ebay and you may have already seen me using them in other stupid projects.

Joystick

You can find those joysticks on ebay if you search for a joystick breakout for arduino. Although they’re cheap, the quality is really nice and the stick feel is nice, too. This is how it looks:

As you can see from the image, there is a +5V pin, which of course you need to connect to your micro-controller’s Vcc (the stm32 in this case), which is 3V3 and not +5V. The VRx pin is the x-axis variable resistor, the VRy is the y-axis variable resistor and SW is the button. The switch output is activated when you press the joystick down. The orientation of the x,y axes is valid when you hold the joystick in your palm so that you can read the pin descriptions.

ST-Link

Finally, you need an ST-Link programmer to upload the firmware, like this one:

Or whatever programmer you like to use.

USB-uart module

You don’t really need this for the project, but if you like to debug or add some debugging messages of your own, then you’ll need it. You can find these on ebay for less than €1.50 and it looks like this:

Making the stupid project

I’ve built the project on a breadboard. You can use a prototyping board if you want to make this a permanent build. I’ve added support for both USB and UART in the project. The USB, of course, is the easiest and preferred way to connect the device to your computer, but the UART port can be used for debugging. So, let’s start with a simple schematic of how everything is connected. This is a screenshot from KiCad.

Therefore, PA0 and PA1 are connected to VRx and VRy and they are set as ADC inputs. In the source code I’m using both the ADC1 and ADC2 channels at the same time. The ADC1 channel is also using DMA, which is not really necessary as the conversion rate doesn’t need to be that fast, but I’m re-using code that I’ve already written for other projects. The setup of the ADCs is in the hw_config.c file in the source code. The ADCs continuously convert the VRx and VRy inputs in the background (they’re interrupt-driven), but only every JOYS_UPDATE_TMR_MS milliseconds does the joys_update() function update the algorithm with the last valid values. The default update rate is 10ms, but you can trim it down to 1ms if you like. You can also have a look at the joys_update() function in joystick.c and trim JOYS_DEBOUNCE_CNTR and JOYS_RECOGNITION_TIME_MS to your needs. The first one controls the debounce sensitivity and the second one the gesture timeout, i.e. the time in ms after the joystick is released back to the center position at which the recognition timer expires and the recorded gesture is sent.
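
To give an idea of what the recognition algorithm works with, here’s a rough sketch of how the two ADC readings could be quantized into one of the four basic directions (the names and thresholds are illustrative, not the actual ones from joystick.c):

#include <stdint.h>
#include <stdlib.h>

enum joys_dir { DIR_NONE, DIR_UP, DIR_DOWN, DIR_LEFT, DIR_RIGHT };

#define ADC_MID   2048  /* mid-scale of the 12-bit ADC */
#define ADC_DEAD  800   /* dead-zone around the stick's center position */

static enum joys_dir joys_direction(uint16_t adc_x, uint16_t adc_y)
{
    int dx = (int)adc_x - ADC_MID;
    int dy = (int)adc_y - ADC_MID;
    if (abs(dx) < ADC_DEAD && abs(dy) < ADC_DEAD)
        return DIR_NONE;                       /* stick is resting in the center */
    if (abs(dx) > abs(dy))                     /* pick the dominant axis */
        return (dx > 0) ? DIR_RIGHT : DIR_LEFT;
    return (dy > 0) ? DIR_UP : DIR_DOWN;
}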

The source code can be found here:
https://bitbucket.org/dimtass/stm32f103-usb-joystick

To build the code you need cmake, and to flash it you need an ST-Link. Have a look at the README.md file in the repo for details. You also need to point to your arm toolchain.

Because I’m using the -flto -O3 flags, you need to make sure that you use a GCC version newer than 4.9.

I’ve tested the code with this version:

arm-none-eabi-gcc (GNU Tools for Arm Embedded Processors 7-2017-q4-major) 7.2.1 20170904 (release) [ARM/embedded-7-branch revision 255204]

The code size is ~14.6KB.

Finally, this is an example video of how the joystick gesture performs.

Have fun!

GCC compiler size benchmarks

Intro

Compilers, compilers, compilers…

The black magic behind producing executable binaries for different kinds of processors. All programmers use them, but most don’t care about their internals and their differences. Anyway, this post is not about the compiler internals, but about how different versions perform regarding the size of the binary they produce.

I’ve made another benchmark a few months ago here, but that one was using different compilers (GCC and clang) and different C libraries. Now I’m using only GCC, but different versions of it.

Size doesn’t matter!

Well, don’t get me wrong here, but sometimes it does. A typical scenario is when you have a small microcontroller with a small flash size and your firmware is getting bigger and bigger. Another scenario is that you need to sacrifice some flash space for a DFU bootloader, and then you realize that 4-12K are gone without writing a single line of code for your actual app.

Therefore, size does matter.

Compiler Flags

Compilers come with different optimisation flags, and the -Os flag tells the compiler to optimize specifically for size.

OK, so the binary size matters only when you use -Os!

No, no, no. The binary size matters whatever optimisation flag you use. For example, your main need may be to optimise for performance. An example is if you’re using fast gpio toggling, e.g. implementing a custom bit-banging bus to program and interface an FPGA (like Xilinx’s selectmap). In this case you may need the -O1/2/3 optimisation more than -Os, but the size still matters because you’re limited in flash space. So, two different compiler versions may show even a 1KB difference at the same optimization level, and that 1KB may be critical someday for one of your projects!

And don’t forget about -flto! This is an important flag if you need size optimisation; therefore, all the benchmarks are done both with and without it.
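
Keep in mind that -flto must be passed both when compiling and when linking to get the benefit. Something like this (illustrative command lines, not the project’s exact flags):

arm-none-eabi-gcc -mcpu=cortex-m3 -mthumb -Os -flto -c main.c -o main.o
arm-none-eabi-gcc -mcpu=cortex-m3 -mthumb -Os -flto -T stm32.ld main.o -o main.elf
arm-none-eabi-size main.elf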

Benchmarking

I’ve benchmarked the following 9 different GCC compiler versions:

  • gcc-arm-none-eabi-4_8-2013q4
  • gcc-arm-none-eabi-4_9-2014q4
  • gcc-arm-none-eabi-5_3-2016q1
  • gcc-arm-none-eabi-5_4-2016q2
  • gcc-arm-none-eabi-5_4-2016q3
  • gcc-arm-none-eabi-6_2-2016q4
  • gcc-arm-none-eabi-6-2017-q1-update
  • gcc-arm-none-eabi-6-2017-q2-update
  • gcc-arm-none-eabi-7-2017-q4-major

It turned out that all the GCC6 compilers performed exactly the same; therefore, without reading the release notes, I assume that the changes between them were fixes rather than optimisations.

The code I’ve used for the benchmarks is here:
https://bitbucket.org/dimtass/stm32f103-usb-periph-expander

This is my next stupid project and it’s not completed yet, but it still compiles and, without optimisations, creates a ~50KB binary. To use your toolchain, just change the toolchain path in the `TOOLCHAIN_DIR` variable in the `cmake/TOOLCHAIN_arm_none_eabi_cortex_m3.cmake` file and run ./build.bash on Linux or build.cmd on Windows.

Results

These are the results from compiling the code with different compilers and optimisation flags.

gcc-arm-none-eabi-4_8-2013q4

flag size in bytes size in bytes (-flto)
-O0 51908 build failed
-O1 32656 build failed
-O2 31612 build failed
-O3 39360 build failed
-Os 27704 build failed

gcc-arm-none-eabi-4_9-2014q4

flag size in bytes size in bytes (-flto)
-O0 52216 56940
-O1 32692 23984
-O2 31496 22988
-O3 39672 31268
-Os 27563 19748

gcc-arm-none-eabi-5_3-2016q1

flag size in bytes size in bytes (-flto)
-O0 51696 55684
-O1 32656 24032
-O2 31124 23272
-O3 39732 30956
-Os 27260 19684

gcc-arm-none-eabi-5_4-2016q2

flag size in bytes size in bytes (-flto)
-O0 51736 55724
-O1 32672 24060
-O2 31144 23292
-O3 39744 30932
-Os 27292 19692

gcc-arm-none-eabi-5_4-2016q3

flag size in bytes size in bytes (-flto)
-O0 51920 55888
-O1 32684 24060
-O2 31144 23300
-O3 39740 30948
-Os 27292 19692

gcc-arm-none-eabi-6_2-2016q4, gcc-arm-none-eabi-6-2017-q1-update, gcc-arm-none-eabi-6-2017-q2-update

flag size in bytes size in bytes (-flto)
-O0 51632 55596
-O1 32712 24284
-O2 31056 22868
-O3 40140 30488
-Os 27128 19468

gcc-arm-none-eabi-7-2017-q4-major

flag size in bytes size in bytes (-flto)
-O0 51500 55420
-O1 32488 24016
-O2 30672 22080
-O3 40648 29544
-Os 26744 18920

Conclusion

From the results it’s pretty obvious that the -flto flag makes a huge difference in all versions except GCC4.8 where the code failed to compile at all with this flag enabled.

Also, it seems that when no optimisations are applied with -O0, then -flto, instead of optimising for size, actually creates a larger binary. I don’t have an explanation for that, but anyway it doesn’t really matter, because there’s no point in using -flto in such a case anyway.

OK, so now let’s get to the point. Is there any difference between GCC versions? Yes, there is, but you need to look at it from different angles. So, for the -Os flag, it seems that GCC7-2017-q4-major produces a binary which is ~380 bytes smaller without -flto and ~550 bytes smaller with -flto compared to the second best GCC version (GCC6). That means that GCC7 will save you from changing your part to another one with a bigger flash only if your firmware exceeds the size by that much with GCC6. But what are the chances, right? We’re not talking about an 8051 here…

But wait… let’s see what happens with -O3, though. In this case, using the -flto flag, GCC7 creates a binary which is ~1KB smaller compared to the GCC6 version. That’s big enough and it may save you from changing to a larger part! Therefore, size matters for the other optimisation levels like -O3, too. This also means that if your code size is getting larger and you need the maximum performance optimisation, then the compiler version may be significant.

So, why not always use the latest GCC version?

That’s a good question. Well, if you’re writing your software from scratch now, then you probably should. But if you have an old project which compiles with an old GCC version, that doesn’t mean it will also compile with -Wall on the newer version. That’s because between those two versions there might be new warnings and errors that don’t allow the build. Hence, you need to edit your code and fix all the warnings and errors. If the code is not that big, the effort may not be much; but if the code is large, you may need to spend a lot of time on it. It’s even worse if you’re porting code that is not yours.

Therefore, the compiler version does matter for the binary size at all the available optimisation levels, and depending on your code size and processor, you may need to choose between versions according to your needs.

Have fun!

Why I moved away from github

I bet in every language in the world there’s a phrase similar to this one:

Opinions are like butt holes. Everyone has one, but nobody wants to hear about yours.

Anyway, that’s my blog, so that’s my butt-hole-opinion.

I don’t really hate Microsoft. I mean, I may speak dirty about them sometimes, but that’s my temperament; I don’t have anything against them. I’ve been using Microsoft products since 1985. My first OS was MS-DOS 2.0, running on a Schneider PC with 640KB RAM and a 3.5″ 720K FDD. It was a beast! I was coding GW-Basic on my yellow-black CRT and enjoying my animations on the Hercules video card. Since then, I’ve seen and used almost every MS product. I enjoyed Windows 95, 98, NT, XP and Office 2003 a lot. Nowadays, I’m not using their products that much, except VS Code. Also, Windows 10 seems OK’ish, but there are many things that I don’t like about it. To me, it seems that when you need to do more advanced things, it’s like an NT-relic with a 2018 makeup. So, generally, I have a meh opinion of them, but that’s all.

So, why did I decide to leave GitHub? Well, firstly, it was how it felt when I read the headline. When I read the news I was like, “WTF? Why?” It left a bitter taste on the tip of my tongue. Usually, that’s enough for me to make logical decisions. But this time I said to myself: OK, let’s analyze this a bit.

Therefore, I asked myself:

Let’s say that at this moment you decide to create your new repository and you have 3-4 options, and one of them is Microsoft. Would you choose it?

Well, the answer for me is no. No, I wouldn’t choose a Microsoft service. I don’t want to. I don’t like to add another Microsoft product/service to my life. Thanks.

Then, why remain on GitHub at all?

Also, there are other important things, like the kind of information Microsoft will gain access to. I don’t want Microsoft to trace what I do in my free time and which projects I wish to develop or contribute to. I’m most certain that they will do data-mining on all the data they have available. It’s even worse for Microsoft employees, because now their employer will have access to that information.

Anyway, there are so many different butt holes around this story that there’s not enough room for all of them. But for me personally, it was the first reaction when I heard the news that mostly affected my decision, and that was:

Oh, no! No more Microsoft in my life.

My repos are now located here:
https://bitbucket.org/dimtass

container_of()

Intro

A small post for one of the most beautiful and useful macros in C. I consider the container_of() macro to be the equivalent of Euler’s identity for the C language. Like Euler’s identity is considered an example of mathematical beauty, container_of is considered an example of C programming beauty. It has everything in there and it’s simpler than it looks. So, let’s see how it works.

Explanation

Actually, there are many explanations of how container_of works out there; some of them are good, some just waste internet space. I’ll give my own, hoping that it won’t waste more internet space.

The container_of macro is defined in several places in the Linux kernel. This is the macro:

#define container_of(ptr, type, member) ({ \
    const typeof(((type *)0)->member) * __mptr = (ptr); \
    (type *)((char *)__mptr - offsetof(type, member)); })

That’s a mess, right? So, let’s start breaking things up. First of all, there are two other things in there that need explanation: typeof and offsetof.

The typeof is a compiler extension. It’s not a function and it’s not a macro. All it does is evaluate, at compile time, to the type of the variable you give it. For example, consider this code:

int tmp_int = 20;

Then typeof(tmp_int) is int; therefore, everywhere you use typeof(tmp_int), the compiler will replace it with the int keyword. So, if you write this:

typeof(tmp_int) tmp_int2 = tmp_int;

then, the compiler will replace the typeof(tmp_int) with int, so the above will be the same as writing this:

int tmp_int2 = tmp_int;

The offsetof is another beautiful macro that you’ll find in the Linux kernel, and it’s also defined in several places. This is the macro:

#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)

The purpose of this macro is to retrieve the offset of a member variable of a structure. Let’s make that simpler. Consider the following image.

This is a struct that has a green and a blue member. We can write this as:

struct x {
     green a;
     blue  b;
};

Suppose that the tape measure is the RAM and each cm of the tape is a byte; then if I ask what’s the offset of the blue member in the RAM, the answer is obvious: it’s 120. But what’s the offset of the blue member inside the structure? In that case you need to calculate it by subtracting 118 from 120, because 120 is the offset of the blue member on the tape (RAM) and 118 is the offset of the structure on the tape (RAM). So you need a subtraction to get the relative offset, which is 2.
Now, lets do this.

What happened now? You see, it’s the same structure, but now I’ve slid the tape measure so that the offset of the struct starts from zero. Now, if I ask what’s the offset of the blue member, the answer is obvious: it’s 2.

Now that the struct offset is “normalized”, we don’t even care about the size of the green member or the size of the structure, because the absolute offset is the same as the relative offset. This is exactly what &((TYPE *)0)->MEMBER does: it dereferences the struct at the zero offset of the memory.

Generally, dereferencing a null pointer is not a clever thing to do, but in this case the code is never executed or evaluated. It’s just a trick, like the one I’ve shown above with the tape measure. The offsetof() macro will just return the offset of the member relative to zero. It’s just a number and you don’t access that memory. Therefore, with this trick the only thing you need to know is the type of the structure.

Also, note that the 0 dereference doesn’t declare a variable.
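
Here’s a tiny, self-contained demo of the above that you can compile and run (I’m using a 2-byte short as the “green” member, so that the blue member lands on offset 2 like in the tape example):

#include <stdio.h>
#include <stddef.h>

struct x {
    short a;    /* the 'green' member: 2 bytes */
    char  b;    /* the 'blue' member: lands on offset 2 */
};

int main(void)
{
    /* no instance of struct x is needed; offsetof is evaluated at compile time */
    printf("offset of b: %zu\n", offsetof(struct x, b));   /* prints 2 */
    return 0;
}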

OK, so now let’s go back to the container_of() macro and have a look at this line:

const typeof(((type *)0)->member) * __mptr = (ptr);

Here, ((type *)0)->member doesn’t declare or point to a variable, and it’s not an instance. It’s a compiler trick that refers to the member at that offset, as I’ve explained before. The compiler, by knowing the offset of the member in the structure and the structure type, also knows the type of the member at that offset. Therefore, using the tape measure example, the typeof() of the member at offset 2, when the struct is dereferenced at 0, is blue.

So the code for the example with the tape measure becomes:

const typeof(((type *)0)->member) * __mptr = (ptr);
const blue * __mptr = (ptr);

and

(type *)((char *)__mptr - offsetof(type, member));

becomes

(struct x *)((char *)__mptr - 2);

The above means that the address of the blue member minus the relative offset of blue is cast to a pointer to struct x. If you subtract the relative offset of the blue member from the address of the blue member, you get the absolute address of the struct x.

So, let’s see the container_of() macro again.

#define container_of(ptr, type, member) ({ \
    const typeof(((type *)0)->member) * __mptr = (ptr); \
    (type *)((char *)__mptr - offsetof(type, member)); })

Think about the tape measure example and try to evaluate this:

container_of(120, struct x, blue)

This means that we want to get a pointer to the struct x, knowing that at position 120 we have a blue member. The container_of() macro will return the offset of the blue member (which is located at 120) minus the relative offset of blue in the x struct. That evaluates to 120-2=118, so we get the offset of the x struct by knowing the offset of the blue member.
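
And here’s a small, self-contained program that verifies the whole thing with GCC (typeof and the statement expression are GNU extensions; the struct here is my own stand-in for the green/blue example):

#include <stdio.h>
#include <stddef.h>

#define container_of(ptr, type, member) ({ \
    const typeof(((type *)0)->member) * __mptr = (ptr); \
    (type *)((char *)__mptr - offsetof(type, member)); })

struct x {
    short a;
    char  b;
};

int main(void)
{
    struct x obj;
    char *p_b = &obj.b;    /* all we have is a pointer to the member... */
    struct x *p = container_of(p_b, struct x, b);
    /* ...and we get back the address of the containing struct */
    printf("container: %p, original: %p\n", (void *)p, (void *)&obj);
    return 0;
}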

Issues

Well, there are a few issues with the container_of() macro. These issues have to do with some versions of the gcc compiler. For example, let’s say that you have this structure:

struct person {
    int age;
    char* name;
};

If you try to do this:

struct person somebody;
somebody.name = (char*) malloc(25);
if (!somebody.name) {
    printf("Malloc failed!\n");
    return;
}
strcpy(somebody.name, "John Doe");
somebody.age = 38;
char* person_name = &somebody.name;
struct person * v = container_of(person_name, struct person, name);

Then, if you have a GCC compiler with version 5.4.0-6, you’ll get this error:

error: cannot convert ‘char*’ to ‘char* const*’ in initialization
     const typeof(((type *)0)->member) * __mptr = (ptr);

Instead, if you do this:

int * p_age = &somebody.age;
struct person * v = container_of(p_age, struct person, age);

then the compiler will build the source code. Also, if you use a later compiler, both examples will build (note that &somebody.name is really a char**, so declaring person_name as char** is the proper fix anyway). Therefore, keep in mind that the type checking is mainly a compiler trick and needs the compiler to handle it right.

Conclusion

container_of() and offsetof() are a beautiful piece of code. They’re compact, simple and have everything in there, in 3 lines of beauty. Of course, you can use container_of() without knowing how it works, but where’s the fun in that?

C++ vs C for embedded

Intro

Hi all! Well… I won’t write about a stupid project this time. I’ll just write my opinion about this endless debate between embedded engineers (usually old and new ones). The debate usually starts with:

Which language is better for embedded?

And then the flame begins and ends up in a battle with endless references and assembly code outputs.

Conclusion

I’ll answer this for you straight away. Use whatever language you like and are really good at when you do your own personal projects; and when working as a professional, use the language of your company’s codebase. See? Simple! Why debate at all?

Details

Now, let’s cut a few things into pieces and get into more details. First things first, we need definitions. Without definitions there’s no reason to even bother chatting. Therefore, the first definition is:

What is embedded?

Well, that’s a tough one. Embedded was much easier to describe 10-20+ years ago. You had an 8 or 16-bit microcontroller (if you were lucky) and you had to write either assembly or C to make it do something meaningful. Then microprocessors became more powerful and dirt-cheap, and they could easily run Linux. Now you can buy an Orange Pi Zero or an RPi Zero for less than $7.

So now, embedded has a much wider meaning, because you can develop applications for an embedded board that use a much larger code base written in C++, have threads and all the goodies that the Linux kernel can provide. Therefore, it’s important to bring the level of an embedded project into the discussion.

It’s much easier to answer the main question when talking about high-level embedded. On a high-level embedded design that runs Linux, you can develop a user-space app in C, C++ and a dozen more languages, you name them. In this case use whatever you like, but use your tools smartly. Does it really matter if you use a bash script, python, or your own code in C/C++ or whatever language to access SPI or I2C? Well, not really… If, for example, you just need to update the PLL configuration in an I2C clock chip, then it really doesn’t matter. I would use Python there: open a jedec file, parse it and then configure the chip in 20 lines of code. Doing the same in C/C++ would take dozens of code lines. Is execution speed an important factor here? Nah… So, don’t be dogmatic about your tools and use them smartly. Especially for high-level embedded stuff, you need to check your specs, weigh your solutions and then pick the easiest solution that does the job and doesn’t affect other processes. Also, using the STL and libs like Boost may be much faster for implementing your app than using C. So, why bother using C there?

The lower embedded answer is easy, too; but it can also be a bit complicated on some occasions. In the lower embedded domain I include every microprocessor that doesn’t have enough horsepower to run a full-blown Linux kernel with enough storage for a rootfs. So, let’s say that’s every MCU from the Cortex-M7 and lower. You might find a few projects with M4 and M7 cores running some micro-Linux distros (which are slow, though), so I still classify these cores as low embedded MCUs. Enough with the definitions.

Therefore, in the lower embedded domain there are many things that need considering, like the supported tools, the existing code base, your fluency in a language and the vendor support.

Do your MCU tools really support C++? If you’re using GCC then the answer is yes, but that’s not always the case. There are still vendors that don’t have C++ compilers for their MCUs, or their C++ compilers are not ones to trust. In this case you need to go with C. Also, the vendor may supply libraries only for C; in that case, go with C, too.

Sometimes the vendor may support both C and C++ and have libraries for both languages, but one of them might be badly written, obscure or buggy. For example, the HAL framework for the STM32 belongs to that category; so in this case, if you’re limited to C++, choose another MCU or find alternative 3rd-party C++ libs (if they exist), and if you’re not limited to C++, then go with StdPeriph and C.

What is your existing code base? If you’re working for a company (or even have a personal code base, which is usual after a few years), then go with your code base. It’s almost impossible, dangerous and time/money-consuming to convert your whole code base from C to C++ or the opposite. Why do that? Don’t! Use the language that your code base uses. That way you’ll use well-tested, probably bug-free code, and you’ll only need to add the fragments and snippets you don’t already have. I’ve tried to convert projects from C to C++. Of course, it’s fun in the beginning, but when you proceed to more complex code, you may realize that you’re too bored to do it and you’ve already spent too much time on it. This sucks, but tbh you need to try it at least once, on a small project. Therefore, go with your code base!

In which language are you really good? I mean, really, really good; you feel like a pro. If you feel that way about a specific language, then go with it, at least for a professional project. Yes, of course learning new languages is cool and you should do it anyway, but when it comes to a project with deadlines, don’t choose the language you’re fancying at the moment; choose your best weapon. Do I really need to elaborate on this? It should be crystal clear. You don’t want to come close to a deadline and realise that you don’t know your tools well enough to overcome issues. Oh no, that would be a nightmare. I think experience in a language is not knowing the syntax and the standard procedures, but all the little bits under the hood that you’ll only need once in a single project, and if you don’t have the experience, they will hit you back. Hard. So, if you’re fluent in C++ and not in C, then go with C++.

Do you need to learn C++ for embedded?

Right now, C is enough to do everything on almost any embedded platform. So, do you really need to learn C++? Well, it wouldn’t hurt! Actually, you should do it at some point. It’s not necessary, but it’s a plus for your skills. Also, you need to do it for the right reasons. For example, I’ve heard from other engineers that C++11 is nice because it has auto and lambdas. So what? That’s not a reason to learn C++11 just for that. Besides, there’s no difference for a good compiler like GCC between a normal function and a lambda. Also, some people don’t like the readability or the obscure layer that an auto variable adds to the code while reading a program; which is fair. Learn C++ to complement your skills, not to follow the hype or substitute C in all embedded development. If you’re not using the STL at all, then maybe just learn C++ for future use or for fun; otherwise you probably don’t need it.

As I see the embedded development world right now, it’s good to know both. You can develop high-level embedded Linux apps that use the STL in C++, and use C for the kernel, u-boot and whatever else you like. Also, for Linux user space, given that you have enough horsepower, use whatever language you like; but always be moderate. I find using Python or bash scripts an effective way to do difficult tasks in less time, especially in cases where the processing speed is not critical. But I don’t like to see resources like CPU cycles and RAM spent carelessly, without optimisations and well-thought-out decisions.

I’ll close this post with the following quote:

Use your tools smart, use whatever does the job right and always be moderate.

Hacking an RS232/485 to ETH board

Intro

It’s time for another stupid project. This time we’ll do some hacking, and by that I mean that we’ll take an existing device and alter it to do something else. A couple of years ago, I found on ebay some nice and cheap RS232/485-to-ethernet modules from usriot and bought them to play with at some point, as they have an LPC1114 controller, a DM9000 ethernet MAC and a UART interface configured as RS232 or RS485. You can do a lot of great stuff with this combo. Anyway, I decided to implement an ArtNet DMX512-A controller to control DMX loads via Ethernet. To do that I needed a TCP/IP stack and also to implement the basics of the ArtNet protocol. For the TCP/IP stack I’ve chosen uIP, as it’s the most light-weight stack and it’s also easy to extend, since it doesn’t support everything that’s needed (like UDP broadcast and multicast). Also, ArtNet is an open and well-defined protocol that can be found here (the link always points to the latest version, which is now 4, but at the time I wrote the code it was at v.3).

Components

USR-TCP232-24

All you actually need is a USR-TCP232-24 board. Although I bought mine a couple of years ago, you can still find many of those, especially on alibaba, and quite cheap. It’s a nice board no matter what, and it can be used for several other stupid projects (for example, a KNXnet/ip device would be a good candidate). The company that manufactures these modules seems to change their hardware quite often. At some point they replaced the USR-TCP232-24 (LPC1114+DM9000) with the USR-TCP232-410S and USR-TCP232-410-PCBA (TM4C129EKCPD with integrated MAC), and now they don’t officially sell the bare PCB anymore, only the device with the casing. I’ve also bought two of the USR-TCP232-410-PCBA to play with at some point, as they have a more powerful Cortex-M4 processor. Anyway, this is the board.

PX24506 DMX512 Decoder Driver

You don’t really need this for the project if you already have a DMX512A-compatible device. If you don’t, then you can buy a cheap RGB LED strip from ebay and a DMX512A decoder driver and have fun. A nice and quite cheap driver is the PX24506. It has 3x 3A channels for RGB, DMX-in and DMX-out to connect drivers in series, and you can set the DMX address with a binary switch. I really like this driver. Uber Chinese tech stuff here. This is the driver.

You can find this beauty sold in tons on ebay.

Also, there are two other alternatives which I haven’t tested yet. They’re much cheaper, but you can’t program the DMX address easily. You may find them as DM-103 and DM-104 on ebay, or if you search for generic DMX512 decoder boards. This is how they look:

Their difference is that the DM-104 is rated at 144W. The processor on these is an STM8, so you may also write a custom firmware for them (just saying).

Making the stupid project

To make this project you need the free version of the MDK-ARM Keil compiler, the source code from my bitbucket repo here, and the Flash Magic tool to flash the compiled HEX file. The size of this project is not that large, therefore the evaluation MDK-ARM license is just fine. So, you need to load the project, build it and then flash the board. I’ve used an older uVision version (v4) to build this project, so in case of v5 you need to install the legacy packages that support the LPC1114 controller and then import the project file.

So, the first step is to install the evaluation version of MDK-ARM and Flash Magic. Then clone the source code repo from bitbucket, open dmx_tcp.uvproj and compile the project. The output dmx_tcp.hex will be created in the ./Flash folder; use Flash Magic to flash it on the device. To do that, remove the power and place the jumper in the UPD position. The LPC series have a serial bootloader, so the next time you apply power the bootloader will load and you can use Flash Magic to flash the board. After that, place the jumper again in the GND position.

When the jumper is in the GND position, the device functions in DMX512A mode, using RS485 to send the data. When the jumper is placed in the CFG position, the device enters the debug mode. You can see what the debug mode does in the source code, and you can also see the supported RS232 commands in the UART_Handler() function in main.c. In the debug/configuration mode you can use an RS232 terminal or console to configure the device. I’ve also created a file (terminal_macros.tmf) which contains a few macros that you can use with Br@y’s terminal. Connect an RS232 cable to the device and send HELP followed by the <Enter> key to print the available commands. Anyway, use the macros and Br@y’s terminal; trust me, it’s the easiest way.

In the normal mode (jumper in GND) the device runs in one of the two supported modes that you can set in the configuration mode. Mode 0 (DMX_TCP_MODE_ASCII) is a plain ASCII TCP socket; you can connect to it and handle the DMX by sending simple ASCII commands. The supported commands are in the connection_handle_ascii() function in the connectio.c file.

Mode 1 (DMX_TCP_MODE_BINARY) is the default ArtNet protocol mode. To use this mode you need software that can control DMX universes using ArtNet, such as the ArtNetominator. I’ve also implemented the ArtNet configuration protocol, so you can configure the IP and the device name by using the ArtNetominator.
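
For reference, this is roughly how the ArtDmx packet that carries a DMX frame looks on the wire. This is my own sketch based on the Art-Net 3 spec; the names (and possibly the layout) in the repo’s code may differ:

#include <stdint.h>

struct artnet_dmx {
    uint8_t  id[8];       /* "Art-Net" followed by a null byte */
    uint16_t opcode;      /* OpDmx = 0x5000, little-endian */
    uint8_t  proto_hi;    /* protocol version, high byte first */
    uint8_t  proto_lo;
    uint8_t  sequence;    /* 0x00 disables the sequence feature */
    uint8_t  physical;    /* input port number, informational only */
    uint16_t universe;    /* 15-bit port-address, little-endian */
    uint8_t  length_hi;   /* DMX payload length, big-endian, 2..512 */
    uint8_t  length_lo;
    uint8_t  data[512];   /* the DMX512 channel values */
} __attribute__((packed));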

Conclusion

This is a nice stupid project if you want to hack an RS232/485-to-Ethernet device to implement a 1-universe ArtNet DMX gateway. There’s absolutely no point in doing that, though! I mean, you can buy a Chinese 4x-DMX-universe device that supports E1.31 (sACN) for ~$80, which is dirt cheap. There’s absolutely no reason to really make this stupid project. In case you want to, though, grab the code and play around with it.

Have fun!

FPGA digital clock

Intro

It’s been quite a long time since the last stupid project. This is why summer holidays should be prohibited. Holidays cunningly change your mindset so fast and so deeply that it’s really hard to switch back to the slave-robot mindset and start doing some work. Anyway, during this period I spent some quality time learning Verilog and playing around with FPGAs, and as always, the best way to learn something new is to do something stupid with it. So, I’ve bought a dirt-cheap dev board from ebay and started writing some code.

If you’re new to FPGA development, then you’ll probably face the same difficulty learning how Verilog works. If you already have a hardware background, it will be easier. There are so many great guides and tutorials out there on the internet that you don’t really need a book or anything. From my short experience, if you understand what ‘wire’, ‘reg’, ‘<=’ and ‘=’ do and mean, and you understand that you can’t electrically drive the same input with two outputs at the same time, then you’re good to go. That’s the base for proceeding to build rockets and spaceships.

Components

FPGA development board (EP4CE6E22C8N)

There are quite a few FPGA development boards out there, and the two main FPGA manufacturers are Xilinx and Altera. You can buy cheap dev boards for both manufacturers on ebay for ~$40. In this project I’ve used a development board with an Altera Cyclone IV FPGA device (EP4CE6E22C8N), like this one:

This board has plenty of peripherals, like VGA output, PS/2, SPI flash, SDRAM, a 4-digit 7-segment display, 4 tact buttons, A/D, IrDA, LEDs, a buzzer and USB-to-UART. So you can build several stupid projects with this beauty.

USB Blaster (programmer)

Finally, you’re going to need a programmer. Again, ebay has plenty of cheap USB programmers. Search for ‘Altera USB Blaster’ and you’ll find many for around $3-4, like this one:

Quartus Prime Lite Edition

To write your awesome software you’ll need the Quartus Prime Lite Edition IDE. This IDE will look a bit strange if you’re coming from the embedded software world. FPGA engineers have quite a different taste in how an IDE should look, but after spending some time with it, everything makes sense and it’s fun to work with.

Making the stupid project

Well, not much to say here. I mean, you don’t have to make any connections; no soldering, nothing. Just plug the USB power cable into the board, connect the USB Blaster to the 10-pin header and to your computer’s USB port, download and install Quartus, then git clone the following project and open it in Quartus.

https://bitbucket.org/dimtass/fpga_clock

I’ve used Quartus on both Linux and Windows and I’ve programmed the board just fine. The result is a digital clock. Actually, it’s a powerful FPGA on a board with a couple of awesome peripherals, and all it does is flash a few LEDs every second and display the time on the 7-segment display. That’s the definition of a stupid project.

This is a short video of the clock:

Conclusion

FPGAs are fun. Especially when it’s not your everyday job and it’s just a hobby. From the few things that I’ve seen, it’s easy to make simple projects, it’s nice to know how they work and how you program them, and it can be an agonising pain if it’s your job. There are so many things that can go wrong in there that sometimes it’s a miracle a project works at all. Timing constraints, parallelisation and several other things in the silicon that are sensitive to so many parameters. It’s hell. But now I have a $40 digital clock and I can shout to the world that I’m also an expert on FPGAs. FPGAs are addictive when you start with them, but when you scratch the surface and go a bit deeper, you’re glad it’s not your everyday job.

Have fun!