Added macros to CuteCom


The last years I’m only using Linux to my workplace and since I’ve also started using only Linux at home, too; I’ve found my self missing some tools that I was using with Windows. That’s pretty much the case with everyone that at some point tries or tried the same thing. Gladly, because many people already do this more and more the last years, there are many alternatives for most of the tools. Alternatives can be either better or worse than the tool you we’re using, of course, but the best thing with FOSS is that you can download the code and implement all the functionality you’re missing yourself. And that’s great.

Anyway, one the tools I’ve really missed is br@y’s terminal. I assume that every bare metal embedded firmware developer is aware of this amazing tool. It’s everything you need when you develop firmware for a micro-controller. For embedded Linux I prefer putty for serial console, though. Anyway, this great tool is only for Windows and although you can use Wine to run it on Linux, soon you’ll find out that when you develop USB CDC devices then the whole wine/terminal thing doesn’t work well.


There are many alternatives for Linux (console and gui) terminal apps and I’ve used most of those you can find in the first 7 pages of google results. But after using Bray’s terminal for so many years, only one of them seem to be close enough to that; and that’s CuteCom. The nice thing with CuteCom is that it’s written with Qt, so it’s a cross-platform app and also Qt is easy and nice to write code.

Tbh, I’m familiar with Qt since Trolltech era and after the Nokia era. I’ve written a lot of code in Qt, especially for the Nokia n900 phone and Maemo OS. But since Nokia abandoned Maemo and also Meego (former Tizen), I’ve started doing other stuff. I was really disappointed back then, because I believe n900 and Maemo could be the future until everything went wrong and Nokia abandoned everything and adopt Windows for their mobiles. I’ll moan another time for how much Microsoft loves Linux.

Anyway, Qt may also affected my decision to go with CuteCom, but the problem was that the functionality that I was using most from Bray’s terminal wasn’t there. And I mean the macros. Let me explain what macros are. Macros are just predefined data that you can send over the serial port by pressing the corresponding macro button. And also you can use a timer for every macro and send it automatically every x programmable intervals in milliseconds. That’s pretty much all you need when you developing a firmware. But this functionality was not implemented yet in CuteCom.

Therefore, I had to implement it myself and also find an excuse to write some Qt again.


I’ve branched CuteCom from github and added the macro functionality in here:

I’ve done a pull request, but I can’t tell if it gets merged or not. But anyways if you are a macro lover like myself, then you can download it from the above branch.

Edit: Macros are now merged to the master git branch, thanks to Meinhard Ritscher.

I’ll add here a couple of notes how to build it, because it’s not very clear from the README file. You can either clone the repo and use QtDesigner to load the project and and build it, or you can use cmake. In case you use cmake you need the Qt libs and header (version >= 5) in your system.

If you don’t have Qt installed then you need to do the following (tested on Ubuntu 18.04):

git clone
cd CuteCom
sudo apt install cmake qtbase5-dev libqt5serialport5-dev
cmake .
make install

This build and install cutecom in /usr/local/bin/cutecom. Then you can create a desktop launcher

gedit ~/.local/share/applications/CuteCom.desktop

And add the  following:

#!/usr/bin/env xdg-open
[Desktop Entry]
Name=CuteCom Comment=Terminal

If you have installed another Qt SDK then you can just point cmake there and build like this:

cmake . -DCMAKE_PREFIX_PATH=/opt/Qt/5.x/gcc_64
sudo make install

This will be installed in `/usr/local/bin/cutecom` (try also `which` just to be sure…)

Finally, you’ll need a desktop icon or a launcher. For Ubuntu you can create a `CuteCom.desktop` file in your `~/.local/share/applications` path and paste the following:

#!/usr/bin/env xdg-open
[Desktop Entry]
Exec=env LD_LIBRARY_PATH=/opt/Qt/5.11.1/gcc_64/lib /usr/local/bin/cutecom

The result should look like this:

Have fun!

Joystick gestures w/ STM32


Time for another stupid project, which adds no value to the humanity! This time I got my hands on one of these dirt-cheap analog joysticks on ebay that cost less than €1.5. I guess that you can make a lot of projects with them by using them as normal joysticks, but for some reason I wanted to do something more pointless than that. And that was to make a joystick that outputs gestures via USB.

By gestures I mean, that instead of sending the ADCs in realtime, instead support only the basic directions like up, down, left, right and button press and then send the gesture combinations through USB. If you’re using mouse gestures to your web browser, then you know what I mean.

OK, let’s see the components and result.



I’m using a stm32f103c8t6 board. These modules cost less than €2 in ebay and you may have already seen me using them also in other stupid projects.


You can find those joysticks in ebay if you search for a joystick breakout for arduino. Although they’re cheap the quality is really nice and also the stick feeling is nice, too. This is how it looks like

As you can see from the image there is +5V pin, which of course you need to connect to your micro-controller’s (the stm32 in this case) Vcc; which 3V3 and not only to +5V. The VRx pin it the x-axis variable resistor, the VRy is the y-axis variable resistor and the SW is the button. The switch output is activated if you press the joystick down. The orientation of the x,y axis is valid when you place the joystick to your palm while you can read the pin descriptions.


Finally, you need an ST-Link programmer to upload the firmware, like this one:

Or whatever programmer you like to use.

USB-uart module

You don’t really need this for the project, but if you like to debug or add some debugging message your own, then you’ll need this. You can find these on ebay with less than€1.50 and it looks like that

Making the stupid project

I’ve build the project on a breadboard. You can use a prototype breadboard if you want to make this a permanent board. I’ve added support for both USB and UART in the project. The USB, of course, is the easiest and preferred way to connect the device to your computer, but the UART port can be used for debugging. So, let’s start with a simple schematic of how everything is connected. This a screenshot from KiCad.

Therefore, the PA0 and PA1 are connected to VRx and VRy and they are set as ADC inputs. In the source code I’m using both ADC1 and ADC2 channels at the same time. The ADC1 channel is also using DMA, which is not really necessary as the conversion rate doesn’t need to be that fast, but I’m re-using code that I’ve already written for other projects. The setup of the ADCs is in the hw_config.c file in the source code. The ADCs are continuously convert the VRx and VRy inputs in the background as they are based on interrupts, but only every JOYS_UPDATE_TMR_MS milliseconds the function joys_update() updates the algorithm with the last valid values. The default update rate is 10ms, but you can trim it down to 1ms if you like. You can also have a look in the joys_update() function in joystick.c and trim the JOYS_DEBOUNCE_CNTR and JOYS_RECOGNITION_TIME_MS to your needs. The first one controls the debounce sensitivity and the second one the timeout of the gesture. That means the time in ms that the recognition timer will expire after the joystick is released in the center position and then send the recorded gesture.

The source code can be found here:

To build the code you need cmake and to flash it you need ST-Link. Have a look in the file in the repo for details. Also you need to point to your arm toolchain.

Because I’m using the -flto -O3 flags, you need to make sure that you use a GCC version newer that 4.9

I’ve tested the code with this version:

arm-none-eabi-gcc (GNU Tools for Arm Embedded Processors 7-2017-q4-major) 7.2.1 20170904 (release) [ARM/embedded-7-branch revision 255204]

The code size is ~14.6KB.

Finally, this is an example video of how the joystick gesture performs.

Have fun!

GCC compiler size benchmarks


Compilers, compiliers, compilers…

The black magic behind producing executable binaries for different  kind of processors. All programmers use them, but most of them don’t care about the internals and their differences. Anyway, this post is not about the compiler’s internals though, but how the different versions perform regarding the size that is produced.

I’ve made another benchmark few months ago here, but that was using different compilers (GCC and clang) and different C libraries. Now I’m using only GCC, but different versions.

Size doesn’t matter!

Well, don’t get me wrong here, but sometimes it does. Typical scenario is when you have a small microcontroller with a small flash size and your firmware is getting bigger and bigger. Another scenario is that you need to sacrifice some flash space for the DFU bootloader and then you realize that 4-12K are gone without writting any line of code for you actual app.

Therefore, size does matter.

Compiler Flags

Compililers come with different optimisation flags and the -Os flag commands the compiler to optimize specifically for size.

OK, so the binary size matters only when you the -Os!

No, no, no. The binary size matters whatever optimisation flag you use. For example your main need may be to optimise for performance. An example is if you’re using a fast toggle gpio, e.g. implementing a custom bit-banging bus to program and interface an FPGA (like the Xilinx’s selectmap). In this case you may need the -O1/2/3 optimisation more than -Os, but still the size matters because you’re limited in flash space. So, two different compiler versions may have even 1KB difference for the same optimization level and that 1KB may be critical someday to one of your projects!

And don’t forget about the -flto! This is an important flag if you need size optimisation; therefore, all the benchmarks are done with and without this flag also.


I’ve benchmarked the following 9 different GCC compiler versions:

  • gcc-arm-none-eabi-4_8-2013q4
  • gcc-arm-none-eabi-4_9-2014q4
  • gcc-arm-none-eabi-5_3-2016q1
  • gcc-arm-none-eabi-5_4-2016q2
  • gcc-arm-none-eabi-5_4-2016q3
  • gcc-arm-none-eabi-6_2-2016q4
  • gcc-arm-none-eabi-6-2017-q1-update
  • gcc-arm-none-eabi-6-2017-q2-update
  • gcc-arm-none-eabi-7-2017-q4-major

It turned out that all the GCC6 compilers performed exactly the same; therefore, without reading the release notes I assume that the changes have to do with fixes rather optimisations.

The code I’ve used for the benchmards is here:

This is my next stupid project and it’s not completed yet, but still it compiles and without optimisations creates a ~50KB binary. To use your toolchain, just change the toolchain path in the `TOOLCHAIN_DIR` variable in the `cmake/TOOLCHAIN_arm_none_eabi_cortex_m3.cmake` file and run ./build.bash on Linux or build.cmd on Windows.


These are the results from compiling the code with different compilers and optimisation flags.


flag size in bytes size in bytes (-flto)
-O0 51908
-O1 32656
-O2 31612
-O3 39360
-Os 27704


flag size in bytes size in bytes (-flto)
-O0 52216 56940
-O1 32692 23984
-O2 31496 22988
-O3 39672 31268
-Os 27563 19748


flag size in bytes size in bytes (-flto)
-O0 51696 55684
-O1 32656 24032
-O2 31124 23272
-O3 39732 30956
-Os 27260 19684


flag size in bytes size in bytes (-flto)
-O0 51736 55724
-O1 32672 24060
-O2 31144 23292
-O3 39744 30932
-Os 27292 19692


flag size in bytes size in bytes (-flto)
-O0 51920 55888
-O1 32684 24060
-O2 31144 23300
-O3 39740 30948
-Os 27292 19692

gcc-arm-none-eabi-6_2-2016q4, gcc-arm-none-eabi-6-2017-q1-update, gcc-arm-none-eabi-6-2017-q2-update

flag size in bytes size in bytes (-flto)
-O0 51632 55596
-O1 32712 24284
-O2 31056 22868
-O3 40140 30488
-Os 27128 19468


flag size in bytes size in bytes (-flto)
-O0 51500 55420
-O1 32488 24016
-O2 30672 22080
-O3 40648 29544
-Os 26744 18920


From the results it’s pretty obvious that the -flto flag makes a huge difference in all versions except GCC4.8 where the code failed to compile at all with this flag enabled.

Also it seems that when no optimisations are applied with -O0 then the -flto instead of doing size optimisation, actually created a larger binary. I have no explain for that, but anyways it doesn’y really matter, because there’s no point in using -flto at all in such cases.

OK, so now let’s get to the point. Is there any difference between GCC versions? Yes, there is, but you need to see that in different angles. So, for the -Os flag it seems that the GCC7-2017-q4-major produces a binary which is ~380 bytes smaller without -flto and ~550 bytes with -flto from the second better GCC version (GCC6). That means that GCC7 will save you from changing part to another one with a bigger flash, only if your firmware exceeds the size by those sizes with GCC6. But, what are the changes, right? We’re not talking about 8051 here…

But wait… let’s see what happens with the -O3 though. In this case using the -flto flag GCC7 creates a binary which is 1KB smaller compared to the GCC6 version. That’s big enough and that may save you from changing to a larger part! Therefore, the size matters also for other optimisation levels like the -O3. This also means that if your code size getting larger and you need the max performance optimisation, then the compiler version may be significant.

So, why not use always the latest GCC version?

That’s a good question. Well, if you’re writing your software from the scratch now, then probably you should. But if you have an old project which is compiling with an old GCC version, then this doesn’t mean that it will also compile with -Wall in the newer version. That’s because between those two versions there might be some new warnings and errors that doesn’t allow the build. Hence, you need to edit your code and correct all the warnings and errors. If the code is not that big, then the effort may not be that much; but if the code is large then it means that you may need to spend much time on it. It’s even worse if you’re porting code that is not yours.

Therefore, the compiler version does matter for the binary size for all the available optimisation flags and depending your code size and processor you might need to choose between those versions depending your needs.

Have fun!

Why I moved away from github

I bet in every language in the word there’s a phrase similar to that.

Opinions are like butt holes. Everyone has one, but nobody wants to hear about yours.

Anyway, that my blog, so that’s my butt-hole-opinion.

I don’t really hate Microsoft. I mean, I may speak dirty about them sometimes; but that’s my temperament, I don’t have anything against them. I’m using microsoft products since 1985. My first OS was MS-Dos 2.0, running on a Schneider PC with 640KB Ram and a 3.5″ 720K FDD. It was a beast! I was coding GW-Basic on my yellow-black CRT and I was enjoying my animations on the Hercules video card. Since then, I’ve seen and used almost every MS product. I enjoyed Windows 95, 98, NT, XP and Office 2003 a lot. Nowadays, I’m not using their products that much, except VS Code. Also Windows 10 seems to be OK’ish, but there are also many things that I don’t like in them. To me, it seems that when you need to do more advance things, then it’s like an NT-relic with a 2018 makeup. So, generally I have a meh opinion for them, but that’s all.

So, why I decided to leave GitHub? Well, firstly it was how it felt when I’ve read the headline. When I’ve read the news I was like “WTF? Why?” It left a bitter taste in the tip of my tongue. Usually, that’s enough for me to make logical decisions. But this time I said to my self, OK,  let’s analyze this a bit.

Therefore, I’ve asked my self.

Let’s say that this moment you decide to create your new repository and you have 3-4 options and one of them is Microsoft. Would you choose that?

Well, the answer for me is, No. No, I wouldn’t choose a Microsoft service. I don’t want to. I don’t like to add another Microsoft product/service to my life. Thanks.

Then, why also remain to GitHub?

Also, there are other things that are also important. Like the kind of information that will gain access. I don’t want Microsoft to trace what I do in my free time and which projects I wish to develop or contribute. I’m most certain that they will do data-mining in every data that they have available. It’s even worse for Microsoft employees, because now their employer will have access to that information.

Anyway, there are so many different butt holes around this story, that there’s not enough room for all. But for me personally, it was the first reaction when I’ve heard the news that mostly affect my decision and that was:

Oh, no! No more Microsoft in my life.

My repos are now located here:



A small post for one of the more beautiful and useful macros in C. I consider the container_of() macro to be the equivalent of the Euler’s identity for the C language. Like the Euler’s identity is considered to be an example of mathematical beauty, the container_of is considered to be an example of C programming beauty. It has everything in there and it’s more simple than it looks. So, let’s see how it works.


Actually there are many good explanations of how container_of works, some of them are good, some just waste internet space. I’ll give my explanation hoping that it won’t waste more internet space.

The container_of macro is defined in several places in the Linux kernel. This is the macro.

#define container_of(ptr, type, member) ({ \
    const typeof(((type *)0)->member) * __mptr = (ptr); \
    (type *)((char *)__mptr - offsetof(type, member)); }h

That’s a mess right. So, lets start breaking up things. First of all, there are other two things in there that need explanation. These are the typeof and the offsetof.

The typeof is a compiler extension. It’s not a function and it’s not a macro. All it does is that during compile type evaluates or replaces the type of the variable that the typeof() has. For example, consider this code:

int tmp_int = 20;

Then typeof(tmp_int) is int, therefore everywhere you use the typeof(tmp_int) the compiler will place that with the int keyword. Therefore, if you write this:

typeof(tmp_int) tmp_int2 = tmp_int;

then, the compiler will replace the typeof(tmp_int) with int, so the above will be the same as writing this:

int tmp_int2 = tmp_int;

The offsetof is another beautiful macro that you will find in the Linux kernel and it’s also defined in several places. This is the macro

#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
The purpose of this function is to retrieve the offset of the address of a member variable of a structure. Let’s make that more simple. Let’s consider the following image.
This is a struct that has a green and a blue member. We can write this as
struct x {
     green a;
     blue  b;
Suppose that the tape measure is the RAM and each cm of the tape is a byte; then if I ask what’s the offset of blue member in the RAM then the answer is obvious and it’s 120. But what’s the offset of the blue member in the structure? In that case you need to calculate it by subtract 118 from 120, because 120 is the offset of the blue member in the tape (RAM) and 118 is the offset of the structure in the tape (RAM). So this needs to do a subtraction to calculate the relative offset, which is 2.
Now, lets do this.

What happened now? You see it’s the same structure but now I’ve slide the tape measure so the offset of the struct starts from zero. Now if I ask, what’s the offset of the blue member, then the answer is obvious and it’s 2.

Now that the struct offset is “normalized”, we don’t even care about the size of the green member or the size of the structure because it’s easy the absolute offset is the same with relative offset. This is exactly what &((TYPE *)0)->MEMBER does. This code dereferences the struct to the zero offset of the memory.

This generally is not a clever thing to do, but in this case this code is not executed or evaluated. It’s just a trick like the one I’ve shown above with the tape measure. The offsetof() macro will just return the offset of the member compared to zero. It’s just a number and you don’t access this memory. Therefore, doing this trick the only thing you need to know is the type of the structure.

Also, note that the 0 dereference doesn’t declare a variable.

Ok, so now let’s go back to the container_of() macro and have a look in this line

const typeof(((type *)0)->member) * __mptr = (ptr);

Here the ((type *)0)->member doesn’t declare or point to variable and it’s not an instance. It’s a compiler trick to point to the member type offset, as I’ve explained before. The compiler, by knowing the offset of the member in the structure and the structure type, knows also the type of the member in that index. Therefore, using the example of the tape measure, the typeof() the member on the offset 2 when the struct is dereferenced to 0, is blue.

So the code for the example with the tape measure becomes:

const typeof(((type *)0)->member) * __mptr = (ptr);
const blue * __mptr = (ptr);


(type *)((char *)__mptr - offsetof(type, member));


(x *)((char *)__mptr - 2);

The above means that the address of the the blue member minus the relative offset of the blue is dereferenced to the struct x. If you subtract the relative offset of the blue member from the address of the blue member, you get the absolute address of the struct x.

So, let’s see the container_of() macro again.

#define container_of(ptr, type, member) ({ \
    const typeof(((type *)0)->member) * __mptr = (ptr); \
    (type *)((char *)__mptr - offsetof(type, member)); }

Think about the tape measure example and try to evaluate this:

container_of(120, x, blue)

This means that we want to get a pointer in the absolute address of struct x when we know that in the position 120 we have a blue member. The container_of() macro will return the offset of the blue member (which is located in 120) minus the relative offset of blue in the x struct. That will evaluate to 120-2=118, so we’ll get the offset of the x struct by knowing the offset of the blue member.


Well, there are a few issues with the container_of() macro. These issues have to do with the some versions of gcc compilers. For example, let’s say that you have this structure:

struct person {
    int age;
    char* name;
If you try to do this:
struct person somebody; = (char*) malloc(25);
if (! {
    printf("Malloc failed!\n");
strcpy(, "John Doe");
somebody.age = 38;
char* person_name = &;
struct person * v =container_of(person_name , struct person, name);h

Then if you have a GCC compiler with version 5.4.0-6 then you’ll get this error:

error: cannot convert ‘char*’ to ‘char* const*’ in initialization
     const typeof(((type *)0)->member) * __mptr = (ptr);

Instead if you do this:

int * p_age = &somebody.age;
struct person * v =container_of(p_age, struct person, age);

Then the compiler will build the source code. Also, if you use a later compiler then both examples will be built. Therefore, have that in mind that the type checking thing is mainly a compiler trick and needs the compiler to handle this right.


container_of() and offsetof() macros are a beautiful piece of code. They are compact, simple and have everything in there. In 3 lines of beauty. Of course, you can use container_of() without know how it works, but where’s the fun then?

C++ vs C for embedded


Hi all! Well… I won’t write about a stupid project this time. I’ll write just my opinion about this endless debate between embedded engineers (usually old and new ones). The reason to debate usually is:

Which language is better for embedded?

And then the flame begins to end up to a battle with endless references and assembly code outputs.


I’ll answer this for you straight away. Use whatever language you like and you’re really good at when you do your own personal projects; and when working as a professional use the language of your company’s codebase. See? Simple! Why debate at all?


Now, lets cut a few things in to pieces and get into more details. First things first, we need definitions. Without definitions there is no reason to even bother chat about. Therefore, the first definition is:

What is embedded?

Well, that’s a tough one. Embedded could be described much easier 10-20+ years ago. You had a 8 or 16-bit microcontroller (if you were lucky) and you had to write either assembly or C to make it do something meaningful. Then microprocessors became more powerful, dirt chip and they could easily run Linux. Now, you can by a Orange Pi Zero or RPi Zero for less than $7.

So now, embedded has a much more wider meaning, because now you can develop applications for an embedded board that can use a much larger code base written in C++, have threads and all the goodies that the Linux kernel can provide. Therefore, it’s important to distinguish the level of an embedded project to the discussion.

It’s much easier to give an answer to the main question when talking about high level embedded. At high level embedded design that runs Linux, to develop a user space app you can use C, C++ and a dozen more languages, you name them. In this case use whatever you like, but use your tools smart. Does it really matter if you use a bash script or python or write you own code in C/C++ or whatever language to access SPI or I2C? Well, not really… If for example you need to just update the configuration for the PLLs in an I2C clock chip, then it really doesn’t matter. I would use Python there. Open a jedec file, parse it and then configure the chip in 20 lines of code. To do the same in C/C++ for example it would take dozens of code lines. Is execution speed an important factor here? Nah.. So, don’t be dogmatic about your tools and use your tools smart. Therefore, especially for high level embedded stuff, you need to check you specs, weight your solutions and then pick up the easiest solution that does the job and doesn’t affect other processes. Also using STL and libs like Boost, may be much faster to implement your app instead of using C. So, why bother using C there?

The lower embedded answer is easy, too; but it also might be a bit complicated in some occasions. In the lower embedded domain I include every microprocessor that doesn’t have enough horsepower to run a full blown Linux kernel with enough storage for a rootfs. So, let’s say it’s every MCU from Cortex-M7 and lower. You might find a few projects with M4 and M7 running some micro-Linux distros, which are slow, though; so still I classify these cores as low embedded MCUs. Enough with definitions.

Therefore, in the lower embedded domain there are many things that need considering, like the supported tools, the existing code base, your fluency in a language and the vendor support.

Does your MCU tools really support C++? If you’re using GCC then the answer is yes, but that’s not always the case. There are still vendors that they don’t have C++ compilers for their MCUs or these C++ compilers are not ones to trust. In this case you need to go with C. Also, the vendor may supply libraries only for C then in this case go with C.

Some times, the vendor may support both C and C++ and have libraries for both languages but one of them might be bad written, obscure or buggy. For example, the HAL framework for the STM32 belongs to that category, so in this case if you’re limited with C++ choose another MCU or find alternative 3rd party C++ libs (if they exist) or if you’re not limited with C++ then go with StdPeriph and C.

What is your existing code base? If you’re working for a company (or even if you have a personal code base, which is usual after a few years) then go with your code base. It’s almost impossible, dangerous and time/money consuming to convert all you code base from C to C++ or the opposite. Why do that? Don’t! Use the language that your code base uses. That way you’ll use a well tested code, probably free of bugs and you’ll only need to use fragments and snippets that you already have. I’ve tried to convert projects from C to C++. Of course, it’s fun in the beginning and then when you proceed to more complex code, you may realize that you’re too bored to do this and you’ve already spend much time on it. This sucks, but tbh you need to try this at least once, for a small project. Therefore, go with your code base!

In which language you’re really good at? I mean, really really good! Feel like a pro. If you feel this for a specific language then go with it, at least for a professional project. Yes, of course learning new languages is cool and you should do that anyways, but when it comes to a project with deadlines don’t choose the language you’re fancy with, but your best weapon. Do I really need to elaborate on this? It should be crystal clear. You don’t want to come near a deadline and realise that you don’t know your tools well enough to overcome issues. Oh, no. That would be a nightmare. I think the experience on a language is not how to write the syntax and standard functional procedures, but all the little minor bits under the hood that you’ll only need once in a single project and if you don’t have the experience, it will hit you back. Hard. So, are you fluent in C++ and not in C, then go with C++.

Do you need to learn C++ for embedded?

Right now, C is enough to do everything on almost any embedded platform. So, do you really need to learn C++? Well, it wouldn’t hurt! Actually, you should do it, at some point. It’s not necessary, but it’s a plus for your skills. Also, you need to do it for the right reasons. For example, I’ve heard from other engineers that C++11 is nice because it has auto and lambas. So, what? That’s not a reason to learn C++11 just because of that. Besides there’s no difference for a good compiler like GCC if you use a normal function or a lambda. Also, some people don’t like the readability or obscure layer that an auto variable adds to the code while they reading a program; which is fair. Learn C++ to complement your skills not to go with the hype or substitute C on any embedded development. If you’re not using STL at all, then maybe just learn C++ for future use or for fun, otherwise you probably don’t need it.

As I see the embedded development world right now, it’s good to know both. You can develop high embedded Linux apps that use STL using C++ and use C for the kernel, u-boot and whatever other use you like. Also, for Linux user space given that you have enough horsepower use whatever language you like; but always try to be moderate. I find using Python or bash scripts to be an effective way to do difficult tasks in less time, especially in cases that the process speed is not trivial. But I don’t like to see resources like CPU cycles and RAM to be spent carelessly without optimisations and well thought decisions.

I’ll close this post with the following quote:

Use your tools smart, use whatever does the job right and always be moderate.

Hacking an RS232/485 to ETH board


It’s time for another stupid project. This time we’ll do some hacking and by that I mean that we’ll take an already existing device and alternate it to do something else. A couple of years ago, I’ve found in e-bay some nice and cheap RS232/485 to ethernet modules from usriot and I’ve bought them to play with them at some point as they had an LPC1114 controller, a DM9000 ethernet MAC and an UART interface configured as RS232 or RS485. You can do a lot of great stuff with this combo. Anyway, I decided to implement an ArtNet DMX512-A controller to controll DMX loads via Ethernet. To do that I needed a TCP/IP stack and also implement the basics of the ArtNet protocol. For the TCP/IP stack I’ve chosen uIP as it the most light-weight stack and it’s also easy to expand, as it doesn’t support all the things are needed (like udp broadcast and multicast). Also, ArtNet is an open and well defined protocol that can be found here (the link points always in the latest version which now is 4 but at the time I’ve written the code was in v.3).



All you actually need is a USR-TCP232-24 board. Although I’ve bought it a couple of years ago you can still find many if those especially in alibaba and quite cheap. It’s a nice board no matter what and it can be used for several other stupid projects (for example a KNXnet/ip device would be a good candidate). The company that manufactures these modules it seems that it changes their hardware quite often. At some point they’ve changed their USR-TCP232-24 (LPC1114+DM9000) with the USR-TCP232-410S and USR-TCP232-410-PCBA (TM4C129EKCPD w/ integrated MAC) and now they don’t officially sell only PCB board but they sell the device with the casing. I’ve also bought two of the USR-TCP232-410-PCBA to play with at some point as they have a more powerful Cortex-M4 processor. Anyway, this is the board.

PX24506 DMX512 Decoder Driver

You don’t really need this for the project if you already have a DMX512A compatible device. If you don’t then you can buy a cheap RGB LED strip from ebay and a DMX512A decoder driver and have fun. A nice and quite cheap driver is the PX24506. It has 3x 3A channels for RGB, DMX-in and DMX-out to connect drivers in series and you can set the DMX address with a binary switch. I really like this driver. Uber Chinese tech stuff here. This is the driver.

You can find this beauty sold in tons in ebay.

Also, there are other two alternatives which I haven’t test them yet. They’re much cheaper but you can’t program the DMX address easily. You may find them as DM-103 and DM-104 in ebay or if you search for generic DMX512 Decoder Boards. This is how they look like.

Their difference is that the DM-104 is rated at 144W. The processor on these is an STM8 so you may also write a custom firmware in there (just saying).

Making the stupid project.

To make this project you need the free version of the MDK-ARM Keil compiler, the source code from my bitbucket repo here and the flash magic tool to flash compiled HEX file. The size of this project is not that large, therefore the evaluation MDK-ARM license is just fine. So, you need to load the project, build it and then flash the board. I’ve used an older uVision version (v4) to build this project, so in case of the v.5 you need to install the legacy packages that support the LPC1114 controller and then import the project file.

So, first step is to install the evaluation version of MDK-ARM and Flash Magic. Then clone the source code repo from bitbucket, open the dmx_tcp.uvproj and compile the project. The output dmx_tcp.hex will be created in the ./Flash folder, therefore use Flash Magic to flash it on the device. To do that, remove the power and place the jumber in the UPD position. The LPC series have a serial bootloader, so the next time you apply power the bootloader will load and then you can use the Flash Magic to flash the board. After that place the jumber again in the GND position.

When the jumber is on the GND position then the device functions in DMX512A mode using the RS485 to send the data. When the jumber is places in the CFG position then the device enters the debug mode. You can see what the debug mode does in the source code and also you can see the supported RS232 commands in UART_Handler() function in main.c. In the debug/configuration mode you can set use an RS232 terminal or console to configure the device. I’ve also created a file (terminal_macros.tmf) which contains a few macros that you can use with Br@y’s terminal. Connect an RS232 cable to the device and enter HELP and <Enter> key to print the available commands. Anyway, use the macros and the Br@y’s terminal, trust me, it’s the easiest way.

In the normal mode (jumber in GND) the device is running in one of the two supported modes that you can set in the configuration mode. The mode 0 (DMX_TCP_MODE_ASCII) is a plain ASCII TCP socket that you can connect and handle the DMX by sending simple ASCII commands. The supported commands are in the connection_handle_ascii() function in connectio.c file.

Mode 1 (DMX_TCP_MODE_BINARY) is the default ArtNet protocol mode. When this mode is used then you need a software that can control DMX universes using ArtNet, such the ArtNetominator. I’ve also implemented the ArtNet configuration protocol, so you can configure the IP and the device name by using the ArtNetominator.


This is a nice stupid project if you want to hack an RS232/485 to Ethernet device to implement a 1-Universe DMX Artnet gateway. There’s absolutely no meaning to do that, though! I mean, you can buy a Chinese 4x DMX universe device that supports E1.31(sACN) for ~$80 which is dirt cheap. There’s absolutely no reason to really make this stupid project. In case you want, though grab the code and play around with it.

Have fun!

FPGA digital clock


It’s been quite a long time since the last stupid project. This why summer holidays should be prohibited. Holidays cunningly change your mindset so fast and so deep that is really hard to switch back to the slave robot mindset and start doing some work. Anyway, during this period I spent some quality time learning Verilog and play around with FPGAs and as always, the best way to learn something new is to do something stupid with it. So, I’ve bought a dirt cheap dev board from e-bay and start writing some code.

If you’re new in FPGA development then you’ll probably face the same difficulty to learn how verilog works. If you already have a hardware background it will be easier. There are so many great guides and tutorials out there in the internet that you don’t really need a book or something. With my short experience if you understand what ‘wire’, ‘reg’, ‘<=’ and ‘=’ do and mean and you understand that you can’t electrically drive the same input with two outputs at the same time, then you’re good to go. That’s the base to proceed building rockets and spaceships.


FPGA development board (EP4CE6E22C8N)

There are quite a few FPGA development boards and the two main FPGA manufacturers are Xilinx and Altera. You can buy cheap dev boards for both manufacturers in ebay for $40. In this project I’ve used a development board with an Altera Cyclone IV FPGA device (EP4CE6E22C8N), like this one:

This board has plenty of peripherals like vga output, PS/2, SPI flash, SDRAM, 4 digit 7-segment display, 4 tact buttons, A/D, IrDA, LEDs, buzzer and USB-to-UART. So you can build several stupid projects with this beauty.

USB Blaster (programmer)

Finally, you’re going to need a programmer. Again ebay has plenty of cheap USB programmers. Search for ‘Altera USB Blaster’ and you’ll find many around with $3-4, like this one:

Quartus Prime Lite Edition

To write your awesome software you’ll need the Quartus Prime Lite Edition IDE. This IDE will look a bit strange, if you’re coming from the embedded software world. FPGA engineers have a quite different taste on how an IDE should be look like, but after spending some time with it, then everything makes sense and it’s fun to work with.

Making the stupid project

Well, no much things to say here. I mean, you don’t have to make any connections, no soldering, nothing. Just plug the USB power cable on the board, connect the USB Blaster to on the 10-pin header and to you computer USB port, download and install the Quartus and then git clone the following the project and open it in the Quartus.

I’ve used Quartus in both Linux and Windows and I’ve programmed the board just fine. The result is a digital a clock. Actually, is a powerful FPGA on a board with couple awesome peripherals and all it does is flashing a few LEDS every second and display the time on the 7-segment. That’s the definition of a stupid project.

This is a short video of the clock:


FPGAs are fun. Especially, when it’s not your everyday job and it’s just a hobby. From the few things that I’ve seen is easy to make simple projects, is nice to know how they work and how you program them and it can be an agonising pain if it’s your job. There are so many things that can go wrong in there, that sometimes is miracle that a project works. Time constrains, parallelisation and several other things in the silicon that are sensitive in so many parameters. It’s hell. But now I have a $40 digital clock and I can shout to the world that I’m also an expert on FPGAs. FPGAs are addictive when you start with them, but when you scratch the surface and go a bit deeper, then you’re glad it’s not your everyday job.

Have fun!

Sine generator using stm32f407 internal DAC and PCM5102A


Welcome to another completely stupid project!

This time I’ll show you how to implement a sine generator using a dirt cheap stm32f407 board. It really doesn’t make any sense to just build a sine generator, especially that way and using an STM32 processor, but it’s fun and at some point later I’ll post a project with the making of a more useful DDS (direct digital synthesizer). As usual let’s see the components and their prices.


STM32F407ZET6 / STM32407VET6 development board

You can find cheap development boards for both stm32f407vet6 and stm32f407zet6 micro-processors on e-bay for $11 and $15. The vet6 is a bit cheaper compared to zet6. Their only difference is the package and therefore the number of available GPIOs. In more detail the vet6 has 100 pins (LQFP100 package) and 82 GPIOs and the zet6 has 144 pins (LQFP144 package) and 114 GPIOS and both have 512KB flash and 192KB ram. If you think that the additional 32 GPIOs worth paying $4 more then go for it, or if you’re like me and hate dilemmas buy both and be happy. Therefore, search e-bay for stm32f407vet6 or stm32f407zet6 and buy the cheapest ones. This is how stm32f407vet6 looks like


For the record there is a difference in the layout between vet6 and zet6 boards. The vet6 has the SD card on the top right side and the usb connector on the top left side and the zet6 has these components mirrored. Also, this post is written in 2017 so if you read this in 2030 then expect to find those boards in a museum.

USB to UART module

Most of the time when you develop on micro-controller you need a debug uart port. This project is no different, therefore you’ll need this.

This^. Ebay. Cheap. $1.2.


I’ve used J-Link to flash the board during development as the board has a jtag connector, but you may use an st-link or even a cheap usb programmer with an SWD interface. Use whatever you have that can flash this thing.

Passive components

You’re going to need a few passive components to implement a low-pass filter (LPF) for the internal stm32 DAC. Nothing fancy, just a resistor R=1300Ω and a capacitor C=5.9nF. More details about the LPF later. Also the cost for these is unknown, they are so cheap that probably cost around $0.something.


Finally if you want to play around with a ‘real’ DAC then buy one of these cheap pcm5102a boards on e-bay for $7.

This board has everything you need to drive the pcm5102 DAC using the stm32’s I2S interface and also you get a bonus stereo RCA connector that you can connect to your speakers.

Good to have

Well, you’re going to build a sine generator, so it’s good to have an oscilloscope. Also you may need a breadboard or a prototype breadboard to implement the LPF.

Making the stupid project

Using the internal DACs

Ok, now that you have everything you need let’s see how you build a sine generator. First, I’ll show how you can use the stm32’s internal DACs, so grab the board and clone the following git repo.

Depending on your OS (Windows or Linux) follow the instructions in the README. What you actually need is a bare metal arm compiler, cmake and then run the build script. After that you’ll find the binary file in build-stm32 folder. Flash this bin on the stm32 board and reset the board. If you feel adventurous spend some time to read the crap source code. Everything is done using double buffering DMAs and FIFO buffers for faster speed and because of that the sampling rate in dds_defs.h is set to 384KHz. Be aware that if you don’t use this specific board then your board might have a different xtal crystal, which means that you need to edit the HSE_VALUE in stm32f4xx.h

In the internal_dac.c file you’ll find the code for the two internal DACs and in dds.c the code for the sine generator. The sine generator code is quite interesting as it uses the phase accumulator concept, which is very clever and nice to know. The idea is that you have an accumulator that you increment with a phase that is analogous with the frequency of the desired sine output. Yeah, I know, bοring, but if you are interested in the concept have a look at those links (link, link, link, link).

Connect the USB to UART module to the USART1 the following way:
USB Rx -> STM32 PA10

Also, connect your oscilloscope to PA4 to probe the DAC1 channel and PA5 to probe the DAC2 channel.

Now, use a terminal (I’m using Br@y’s Terminal) and open the COM port using 115200, 8 data bit, no parity, 1 stop bit and no handshake. Also make sure that the received CR bytes are handled as CR+LF (CR=LF option in Brays terminal). You can change the sine frequency with the following command:

<channel> is 1: for DAC1, 2: for DAC2
<frequency> is the desired frequency in Hz

For example:
sets the DAC1 frequency to 5434.25 Hz
sets the DAC2 frequency to 18934.32 Hz

That way you can have two individual DAC channels with two different frequencies. Regarding the code implementation, there are several ways to implement this DAC functionality on the STM32. You can individually control each DAC using its data holding register or use the dual data holding register, use dual timers with dual DMAs or single timer dual DMAs or single timer with single DMA (in case of using the dual data holding register). I’ve choose to use single timer, dual DMA, single data holding register because it makes the output sine waveform simpler, but I would like to implement also the single timer, single DMA, dual data holding register because it may be even faster as fewer IRQs are triggered.

Now you can git clone the single timer, single DMA, dual data holding register repo from here if you want to do any performance tests.

Next is the output of a single DAC channel in various frequencies, without using an LPF filter in the output.

The displayed frequencies are 0.1, 100, 1000, 10000, 20000, 48000 and 192000 Hz. As expected the quantization noise is high as no anti-aliasing filter is used in the output. The following images are after using a simple analog first order LPF anti-aliasing filter.


The difference is that now there’s much lower quantization noise in higher frequencies (e.g. 48KHz) but also the output amplitude is getting smaller after aprox. 20KHz as the filter cut-off frequency is 20750.32Hz (R=1300Ω, C=5.9nF). There’s no point to display the 192KHz output as the amplitude is almost zero.

Using an external DAC (PCM5102)

It’s nice having fun with the internal DACs on the STM32 but… let’s use real DAC that can handle 16, 24 and 32 bit stereo channel audio. These days you can find good DACs really cheap. Texas (TI) has some nice chips that are sold as PCB boards on e-bay. For this test I’m using a PCM5102 board. I’ve also got some cirrus logic cs4344 DACs I would like to play with, but unfortunately I haven’t managed to find a board for them, which means I have to build mine at some point. The only cs4344 boards you can find on ebay are some USB DACs, but they don’t fit the purpose and also they are using the old out-of-life PCM2702.

Anyway, what’s the main difference between the internal DAC and this PCM5102? Well, the first one is internal! That means that there everything is happen in the STM32, no external components, no protocol interfaces, nothing. This means increased speed but less performance as the internal DAC is a basic 12-bit DAC. On the other hand now, by using an external DAC you get better performance as the chip is dedicated and engineered for the purpose, but this also mean that you need an interface to drive the DAC. In this case the I2S interface is used and it’s set to 16/48 (16-bit & 48KHz). Well, 16/48 it’s more than enough for my old rusted ears and it’s fine for testing. Maybe at some point I can try if STM32 can go up to 32/192 which is the highest that the STM32’s I2S can get (the PCM5102 supports up to 32/384).

The PCM5102 board has its own regulators; therefore, even though the VCC is rated at 3V3 (same as STM32F407) you’ll need an external power supply 5~9V, because if you apply 3V3 it won’t even start.

Download the source code from the following repo and read the file how to build the binaries:

Connect the board the following way:
PCM5102 BCK -> STM32 PC10
PCM5102 DATA -> STM32 PC12
PCM5102 LRCK -> STM32 PA15
PCM5102 GND -> STM32 GND
PCM5102 VCC -> PSU 5V
PCM5102 FMT jumber set to ‘LJ’ instead of ‘I2S’ (‘LJ’ stands for Left Justified)

The I2S on the STM32 has DMA, double buffering and FIFO enabled for better performance. In case you use a different board then you should refer to the stm32 reference manual to set the correct N and R values on the RCC_PLLI2SConfig() function in i2s_dma.c. Actually, I would advice you to do that always as the I2S rate is very sensitive to the XTAL frequency and therefore even small variations on the XTAL speed wil have an affect on the output I2S rate. For more info check the table 127 in 28.4.4 in the reference manual which are the ‘correct’ values, but I suggest that you use an oscilloscope to trim the N value as the on-board crystal sucks. Using the oscilloscope try to find the correct value that gives more precise frequencies for the LRCK (I2S clock). E.g. in my case to get 16/48 the table suggests to use PLLI2SN=192 (that’s N) and PLLI2SR=5 (that’s R), but by using these values the LRCK clock was quite off, so I had to set the PLLI2SN=200 to get the correct bitrate. The correct clock frequency/bitrate for a stereo 16/48 I2S audio is 2*16*Fs, where Fs=48KHz in this case, so bitrate=1536000Hz.

If you want to have a look at the code then you’ll find all the I2S code in the i2s.h and i2s_dma.c files and the phase accumulator algorithm for the sine in dds.h/dds.c. To change the output frequency for the L and R channel then use a terminal as before and use the following commands.

<channel> is L: for Left, R: for Right audio channel
<frequency> is the desired frequency in Hz

For example:
sets the Left channel frequency to 5434.25 Hz
sets the Right channel frequency to 18934.32 Hz

So, like before you have two individual channels to play with. Next are some oscilloscope captions on various frequencies (0.1, 100, 1000, 20000 and 22000Hz).

You’ll see that the output anti-aliasing filtering is much better now, compared the example with the DACs. In this case is not possible to get frequencies higher than 22KHz.


That was another meaningless stupid project. I mean really, you can’t do anything useful with these stuff, but you can experiment and use pieces to implement other more useful projects like a DDS that can output different waveforms or even a synthesizer. It’s also quite rare to find code for the STM32 even in the internet that uses DMAs with double buffering and FIFOs for both DAC and I2S, so keep the code for reference.

Have fun!

Benchmarks with gcc, musl and clang and how can they affect the embedded development cost


Ok, I know that’s a long and pompous title, but it’s difficult to summarize the whole meaning of this post.

As you’ll find out on this blog I won’t only write about stupid projects, but I’ll write also posts about other things that I find interesting mainly on the embedded world.

When dealing with embedded Linux there are several things that need to be considered, because “embedded” usually refers to a vast ARM ecosystem that extends from the tiny cortex-M0 processor (ARMv6-M) to a cortex-A73 (ARMv8-A). Of course, you won’t see Linux on a cortex-M0 cpu but you may see it on a Cortex-M4.

There are two reasons for this post. The first one is to write some thoughts about when is preferred to use bare-metal, RTOS or Linux on embedded products. The second reason was this video, which leads to the question that after we’ve decided that we need to use Linux, then what options and tools do we have; and most importantly what happens with the code and binaries size? Well, in embedded you should care about size, because that can limit your options and also can affect your product cost, development time and budget.

Right now embedded is a hot topic and there a many things going on with the compilers, their optimizations and the system libraries. So what’s the deal with the compilers?


On the embedded systems you can find many different compilers with fancy names, like ‘arm-none-eabi-‘, ‘arm-linux-gnueabi’, ‘gcc-arm-embedded’ and the list goes on. Assuming a specific architecture (e.g. ARM in this case), these compilers are all different but they all can be used to compile an application that doesn’t run on an OS. One of the main differences is that these compilers assume a different C library; therefore, ‘arm-none-eabi’ assumes no C library or newlib and the ‘arm-linux-gnueabi’ assumes the full blown glibc (or eglibc). So that means that usually if you cross-compile a bare metal source code for ARM (e.g. for an STM32 micro-controller) you should use ‘arm-none-eabi-‘ and when you cross-compile a Linux kernel or a Linux user space application you’ll use the ‘arm-linux-gnueabi’.


This is another question that usually comes on the table when dealing with a new project. Usually, is pretty much clear if you need an OS or not, but the line between if you need Linux or another embedded RTOS (eRTOS) some times is blur. If you go with an eRTOS then you go with arm-none-eabi for the whole project, but if you go with Linux then you’ll need arm-none-linux-gnueabi.

The difference between eRTOS and Linux is huge. It’s much more complex to develop on an eRTOS compare to Linux, because in the eRTOS most of the times you’ll need to implement your own subsystems and underlying tasks that usually the Linux already provides. Also the Linux separated the kernel from the user space applications, but on a eRTOS the separation is not always achievable and you need to write in several different places in the eRTOS. On the other hand Linux takes care of all the low level mumbo jumbo and you only have to write your app using the kernel API. So what are the criteria that you decide which to choose? Well, that depends for many variables, like your existing code base, the development time, the platform tools, cost e.t.c. but now more than ever you need to consider also the size.


There’s no doubt that the size when using an embedded RTOS (eRTOS) will be much smaller but also comes with higher complexity; and some times is also difficult to chose the right eRTOS that will have the least problems, bugs and issues. In some cases where the cost of a small external RAM and eMMC/SD is not that high and the development comes with less complexity and faster development time, then you need to consider Linux. And this is what this post is all about.

Is it possible to shrink a Linux OS to a low cost embedded platform?

If the cost to use plenty of RAM and a big fast storage is not an issue, then Linux it’s a no brainer, but when there’s a cost restriction and at the same time there’s a budget to use a small ram and a cheap storage, then you must do a research if the current tools we have these days can deliver small enough user space code size.

How to shrink size?

The time this post is written there are a few tools that targeting to achieve small binary sizes. First, gcc already has some optimization flags that can be used to optimize speed and/or size. Also gcc provides a Link Time Optimization (LTO) tool (-flto flag in gcc), which also can reduce the compiled size. What LTO does, is that it gathers more details about the code during the compile time and then provides the linker with these details so the linker can use them to optimize further.

Apart from gcc there’s also the clang compiler which is designed to fully replace gcc. Therefore, when you build a source with clang then you don’t use gcc at all. Although clang is highly compatible with gcc, it doesn’t offer full compatibility, which means that you may not be able to compile all the existing code base.

In addition with the compilers, we also have the system libraries which also play significant role in the resulted binary size. GLib is the low level system library used by gcc and it’s huge! You would never consider the system library size when developing desktop applications, but for small embedded systems GLib is a behemoth. There are various light-weight replacement GLib libraries for the arm-none-linux-eabi, like the musl lib. You can see a comparison of few libraries here. Note, that there are also replacement glib libraries for the arm-none-eabi, like the nano-lib (–specs=nano.specs compiler option)., which does a great difference in code size, too.

So, to sum up, we have different optimization flags, compilers and system libraries. Now, we can proceed with the benchmarks by using all these different options and see what happens.


I’ve prepared a benchmark repository in bitbucket that you can you use to do your own benchmarks here:

You can git clone the repo and then run the script followed with the c file and any extra flags you want, like this:

./ oggenc.c -lm

By default the binaries are created in the output/ folder and when the test is finished they are deleted; therefore, if you want to keep the binaries to do some tests then run the benchmark like this:

KEEP=true ./ oggenc.c -lm

I’ve included these 4 source code files: test.c, bzip2.c, gcc.c, oggenc.c. The test.c is a simple code file I’ve made and the rest files I’ve found them here. Every file has different size and source code and by comparing these we can obtain an overview of how well each benchmark performs in real applications.

These are the details for the compilers and libraries on my Linux Mint 18 ‘Sarah’ 64-bit.

GCC version : gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
GLib version : (Ubuntu GLIBC 2.23-0ubuntu7) 2.23
clang version : clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)
musl version : 1.1.16

And these are the results for each source file.

./ test.c
Testing file: test.c

gcc            :  8976
gcc -Os        :  8984
gcc -O3        :  8984
gcc -flto      :  8968
gcc -flto -Os  :  8920
gcc -flto -O3  :  8920
musl           :  7792
musl -Os       :  7792
musl -O3       :  7792
musl -flto     :  7784
musl -flto -Os :  7728
musl -flto -O3 :  7728
clang          :  7664
clang -Os      :  7800
clang -O3      :  7832
clang+musl     :  4792
clang+musl -Os :  4896
clang+musl -O3 :  4928

./ oggenc.c -lm
Testing file: oggenc.c -lm

gcc            :  2147072
gcc -Os        :  2028656
gcc -O3        :  2179880
gcc -flto      :  2140944
gcc -flto -Os  :  1974736
gcc -flto -O3  :  2067600
musl           :  2141096
musl -Os       :  2022552
musl -O3       :  2173776
musl -flto     :  2134952
musl -flto -Os :  1972976
musl -flto -O3 :  2069824
clang          :  2112544
clang -Os      :  2020600
clang -O3      :  2116544
clang+musl     :  2107952
clang+musl -Os :  2016264
clang+musl -O3 :  2110528

./ bzip2.c
Testing file: bzip2.c
gcc            :  138008
gcc -Os        :  79320
gcc -O3        :  115912
gcc -flto      :  130448
gcc -flto -Os  :  71456
gcc -flto -O3  :  107760
musl           :  136376
musl -Os       :  73512
musl -O3       :  114304
musl -flto     :  128840
musl -flto -Os :  69728
musl -flto -O3 :  106152
clang          :  129112
clang -Os      :  94568
clang -O3      :  113504
clang+musl     :  125840
clang+musl -Os :  90824
clang+musl -O3 :  109968

./ gcc.c
Testing file: gcc.c
gcc            :  6879424
gcc -Os        :  4263096
gcc -O3        :  6493624
gcc -flto      :  6772760
gcc -flto -Os  :  3886664
gcc -flto -O3  :  5889240
musl           :  failed
musl -Os       :  failed
musl -O3       :  failed
musl -flto     :  failed
musl -flto -Os :  failed
musl -flto -O3 :  failed
clang          :  7446328
clang -Os      :  4785560
clang -O3      :  6537192
clang+musl     :  failed
clang+musl -Os :  failed
clang+musl -O3 :  failed

Analyzing the results

Now that we have the results we need to analyze them and the best way to do that is to visualize the data in a way that’s easy to compare the results. For that reason I chose to plot the size for each file against the compiler and the lib it was used and the last plot is the sum of all the output file sizes, except the gcc.c which was failed to build with the musl lib. Click on each image to zoom in.

Lets take this step by step. If you see the source code of the test.c you’ll find out that it’s a very simple program that doesn’t do much. There the combination of clang+musl shines and the result code is half the size compared to gcc+glib and also it is much smaller compared to gcc+musl. This is a very good sign for clang. Also clang by itself managed to produce almost the same binary size with gcc+musl. When I’ve seen these results I was surprised and I thought that this groundbreaking news and actually it is, but… after running the benchmarks with the rest of the files my feelings were mixed.

oggenc.c benchmark shows that gcc+glib+LTO+Os and gcc+musl+LTO+O3 have the same performance and much smaller size compared to others benchmarks.

bzip2.c shows that gcc+glib+LTO+Os, gcc+musl+Os and gcc+musl+LTO+Os perform the same and better that the rest.

gcc.c failed on musl and the gcc+glib+flto+Os performed better than the others.

Finally, the most important result is by adding the sizes of all the resulted builds as this is closer to real-life application as your rootfs size will be the sum of all the programs. Of course, only 3 programs doesn’t really reflect to let’s say 100 or 1000 or 2000 programs that your rootfs may really have, but still it’s a good measure to get a rough view of what it might be. By seeing the sum of the sizes is obvious that gcc+glib+LTO+Os and gcc+musl+LTO+Os perform much better that the others and clang is far behind.

You can make your own conclusions with the result and even better try by your self.


From the things I’ve seen my opinion is that the LTO combined with the -Os does the most significant difference compared the other options. That means that the gcc compiler optimizations outperform clang and also it seems that musl doesn’t give much more compared to glib. Of course the last statement isn’t always true. Musl does make a huge difference on some programs (test.c) and it doesn’t make on others (oggenc and bzip), but also it failed to build gcc. I think musl looks promising though. On the other hand clang results were very mixed. On a simple application test.c it produces really small binary and outperforms all the other options but when the things get tough in more complex code, is worse that gcc.

So, let’s go back to the first question. Does it worth? Can these results help us to get a decision on the thin line between choosing an RTOS and a Linux OS for an embedded product? Well, don’t expect this answer from me. You are the one to decide what is right for you, but I’ll tell you what I would do.

Personally, I break the embedded projects needs in three categories like sequential execution, couple of parallel threads and real multi-thread.

  • If your project can be implemented with sequential execution procedures you don’t need RTOS or Linux, you go with bare metal.
  • If you project needs 1 or 2 threads, then that doesn’t mean that you need an RTOS. You can develop using finite states (FSM); but if for some reason you think that your FSM will be hard to maintain (e.g. a complex GUI with user inputs and peripherals) then go for an RTOS.
  • If you have a couple of threads and the cost must be kept low then go for RTOS.
  • If you have multiple threads and the budget allows you to add some RAM and a small storage go for Linux.
  • If your code base is on Linux, then of course, you go for Linux.

In any case is good to use optimizations. Nano-lib for bare-metal and compiler optimizations & alternative system libraries for the OS.

Now, if you decide to go with Linux and a limited budget then you’ll need all the performance gains the compiler and the alternative system libraries can offer. In that case you need to verify that all the core programs you need in your rootfs can be compiled with all the above optimizations and the result size is small enough to meet your specs on RAM and storage.

It’s encouraging to know that there’s an active development and effort to the correct path. Size matters and it can do the difference and introduce Linux in the low budget projects. Of course, Linux is not a panacea for every embedded project and it shouldn’t be, but it has it’s use and it can solve problems.

Also, many cheap linux-enabled SBCs are showing up these days. For example have a look at Omega2, which is a $5 SBC with Linux, 64MB memory, 16MB storage, USB, WiFi, I2C, SPI, i2S and 15 GPIOs. On the other side a cheap stm32f103c8t6 board (Cortex-M3) costs $2.5 and an stm32f407vet6 board costs $7. NXP ARM mcus are much more expensive as also Renesas, Cypress and others. Of course, the STM boards have more peripherals and still they have their use, but the future for the most mainstream products seems to be the ultra low cost Linux IOT ready boards and for that reason we need smaller size binaries and better performing compilers and tools.

For those that only develop only on small ARM mcus, don’t worry! There are many reasons that low embedded will be still on the market in the next years, no matter what happens. Especially on industries that have to do with health, human life, security, defense e.t.c. where everything needs to be absolutely deterministic. An RTOS or any other OS is non-deterministic and regardless the budget and the costs they can’t be used on these products as the regulations and approvals are very strict about this.