Joystick gestures w/ STM32

Intro

Time for another stupid project, which adds no value to humanity! This time I got my hands on one of those dirt-cheap analog joysticks on ebay that cost less than €1.5. I guess you can make a lot of projects with them by using them as normal joysticks, but for some reason I wanted to do something more pointless than that. And that was to make a joystick that outputs gestures via USB.

By gestures I mean that instead of sending the raw ADC values in real time, the device supports only the basic directions (up, down, left, right) plus button press, and then sends the gesture combinations over USB. If you're using mouse gestures in your web browser, then you know what I mean.

OK, let’s see the components and the result.

Components

STM32

I’m using an stm32f103c8t6 board. These modules cost less than €2 on ebay and you may have already seen me using them in other stupid projects, too.

Joystick

You can find those joysticks on ebay if you search for a joystick breakout for arduino. Although they’re cheap, the quality is really nice and the stick feeling is nice, too. This is how it looks:

As you can see from the image, there is a +5V pin, which of course you need to connect to your micro-controller’s Vcc (which for the stm32 is 3V3, so not only +5V works). The VRx pin is the x-axis variable resistor, the VRy pin is the y-axis variable resistor and SW is the button. The switch output is activated when you press the joystick down. The orientation of the x and y axes applies when you hold the joystick in your palm so that you can read the pin descriptions.

ST-Link

Finally, you need an ST-Link programmer to upload the firmware, like this one:

Or whatever programmer you like to use.

USB-uart module

You don’t really need this for the project, but if you like to debug or add some debugging messages of your own, then you’ll need it. You can find these on ebay for less than €1.50 and this is how it looks:

Making the stupid project

I’ve built the project on a breadboard. You can use a prototyping board if you want to make this a permanent build. I’ve added support for both USB and UART in the project. The USB, of course, is the easiest and preferred way to connect the device to your computer, but the UART port can be used for debugging. So, let’s start with a simple schematic of how everything is connected. This is a screenshot from KiCad.

Therefore, PA0 and PA1 are connected to VRx and VRy and they are set as ADC inputs. In the source code I’m using both the ADC1 and ADC2 channels at the same time. The ADC1 channel is also using DMA, which is not really necessary as the conversion rate doesn’t need to be that fast, but I’m re-using code that I’ve already written for other projects. The setup of the ADCs is in the hw_config.c file in the source code. The ADCs continuously convert the VRx and VRy inputs in the background as they are interrupt-based, but only every JOYS_UPDATE_TMR_MS milliseconds does the joys_update() function update the algorithm with the last valid values. The default update rate is 10ms, but you can trim it down to 1ms if you like. You can also have a look at the joys_update() function in joystick.c and trim JOYS_DEBOUNCE_CNTR and JOYS_RECOGNITION_TIME_MS to your needs. The first one controls the debounce sensitivity and the second one the timeout of the gesture; that is, the time in ms after the joystick is released back to the center position at which the recognition timer expires and the recorded gesture is sent.
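
To give you an idea of the algorithm, here is a rough, simplified sketch of what the recognition logic does. This is my paraphrase and not the actual joystick.c code; the JOYS_* names come from the repo, but the thresholds, values and helper names are made up:

#include <stdint.h>

#define JOYS_DEBOUNCE_CNTR        5     /* example: consecutive samples to accept a direction */
#define JOYS_RECOGNITION_TIME_MS  1000  /* example: idle time before the gesture is sent */
#define JOYS_UPDATE_TMR_MS        10

enum joys_dir { DIR_NONE, DIR_UP, DIR_DOWN, DIR_LEFT, DIR_RIGHT };

extern void send_gesture_usb(const char *gesture);  /* hypothetical USB send routine */

static enum joys_dir adc_to_dir(uint16_t x, uint16_t y)
{
    /* hypothetical thresholds for a 12-bit ADC centered around ~2048 */
    if (y > 3500) return DIR_UP;
    if (y < 500)  return DIR_DOWN;
    if (x < 500)  return DIR_LEFT;
    if (x > 3500) return DIR_RIGHT;
    return DIR_NONE;
}

/* called every JOYS_UPDATE_TMR_MS ms with the latest ADC values */
void joys_update(uint16_t x, uint16_t y)
{
    static enum joys_dir last = DIR_NONE;
    static uint16_t debounce = 0, idle_ms = 0;
    static char gesture[16];
    static uint8_t len = 0;

    enum joys_dir dir = adc_to_dir(x, y);

    if (dir != DIR_NONE) {
        idle_ms = 0;
        /* accept a direction only after it's stable for a few samples */
        debounce = (dir == last) ? debounce + 1 : 0;
        if (debounce == JOYS_DEBOUNCE_CNTR && len < sizeof(gesture) - 1)
            gesture[len++] = "NUDLR"[dir];  /* record the movement (U/D/L/R) */
    } else if (len) {
        /* joystick is back in the center: wait for the recognition timeout */
        idle_ms += JOYS_UPDATE_TMR_MS;
        if (idle_ms >= JOYS_RECOGNITION_TIME_MS) {
            gesture[len] = 0;
            send_gesture_usb(gesture);  /* ship the recorded gesture */
            len = 0;
        }
    }
    last = dir;
}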

The source code can be found here:
https://bitbucket.org/dimtass/stm32f103-usb-joystick

To build the code you need cmake and to flash it you need an ST-Link. Have a look at the README.md file in the repo for details. Also, you need to point the build to your ARM toolchain.

Because I’m using the -flto -O3 flags, you need to make sure that you use a GCC version newer than 4.9.

I’ve tested the code with this version:

arm-none-eabi-gcc (GNU Tools for Arm Embedded Processors 7-2017-q4-major) 7.2.1 20170904 (release) [ARM/embedded-7-branch revision 255204]

The code size is ~14.6KB.

Finally, this is an example video of how the joystick gestures perform.

Have fun!

GCC compiler size benchmarks

Intro

Compilers, compilers, compilers…

The black magic behind producing executable binaries for different kinds of processors. All programmers use them, but most don’t care about their internals and their differences. Anyway, this post is not about compiler internals, though, but about how different versions perform regarding the binary size they produce.

I’ve made another benchmark a few months ago here, but that was using different compilers (GCC and clang) and different C libraries. Now I’m using only GCC, but different versions.

Size doesn’t matter!

Well, don’t get me wrong here, but sometimes it does. A typical scenario is when you have a small microcontroller with a small flash size and your firmware is getting bigger and bigger. Another scenario is when you need to sacrifice some flash space for the DFU bootloader and then realize that 4-12K are gone without writing a single line of code for your actual app.

Therefore, size does matter.

Compiler Flags

Compilers come with different optimisation flags, and the -Os flag commands the compiler to optimize specifically for size.

OK, so the binary size matters only when you use -Os!

No, no, no. The binary size matters whatever optimisation flag you use. For example, your main need may be to optimise for performance, say when you need fast GPIO toggling, e.g. implementing a custom bit-banging bus to program and interface an FPGA (like Xilinx’s SelectMAP). In this case you may need the -O1/2/3 optimisation more than -Os, but the size still matters because you’re limited in flash space. So, two different compiler versions may show even a 1KB difference for the same optimization level, and that 1KB may someday be critical to one of your projects!
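
To give a feeling for the kind of code where you’d favour -O1/2/3 over -Os, here’s a hypothetical bit-banging sketch. The register addresses are the stm32f103’s GPIOA BSRR/BRR; the pin assignments are made up, so adapt everything to your own part:

#include <stdint.h>

#define GPIOA_BSRR (*(volatile uint32_t *)0x40010810) /* GPIOA bit set register   */
#define GPIOA_BRR  (*(volatile uint32_t *)0x40010814) /* GPIOA bit reset register */
#define PIN_CLK    (1u << 0)   /* hypothetical clock pin */
#define PIN_DATA   (1u << 1)   /* hypothetical data pin  */

/* clock one byte out MSB-first; with -O2/-O3 the compiler can unroll
 * this loop and keep everything in registers for maximum toggle rate */
static inline void spit_byte(uint8_t b)
{
    for (int i = 7; i >= 0; i--) {
        if (b & (1u << i))
            GPIOA_BSRR = PIN_DATA;  /* data high */
        else
            GPIOA_BRR = PIN_DATA;   /* data low  */
        GPIOA_BSRR = PIN_CLK;       /* rising edge clocks the bit out */
        GPIOA_BRR = PIN_CLK;
    }
}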

And don’t forget about -flto! This is an important flag if you need size optimisation; therefore, all the benchmarks are done both with and without this flag.

Benchmarking

I’ve benchmarked the following 9 different GCC compiler versions:

  • gcc-arm-none-eabi-4_8-2013q4
  • gcc-arm-none-eabi-4_9-2014q4
  • gcc-arm-none-eabi-5_3-2016q1
  • gcc-arm-none-eabi-5_4-2016q2
  • gcc-arm-none-eabi-5_4-2016q3
  • gcc-arm-none-eabi-6_2-2016q4
  • gcc-arm-none-eabi-6-2017-q1-update
  • gcc-arm-none-eabi-6-2017-q2-update
  • gcc-arm-none-eabi-7-2017-q4-major

It turned out that all the GCC6 compilers performed exactly the same; therefore, without reading the release notes, I assume that the changes have to do with fixes rather than optimisations.

The code I’ve used for the benchmarks is here:
https://bitbucket.org/dimtass/stm32f103-usb-periph-expander

This is my next stupid project and it’s not completed yet, but it still compiles and, without optimisations, creates a ~50KB binary. To use your toolchain, just change the toolchain path in the `TOOLCHAIN_DIR` variable in the `cmake/TOOLCHAIN_arm_none_eabi_cortex_m3.cmake` file and run ./build.bash on Linux or build.cmd on Windows.

Results

These are the results from compiling the code with different compilers and optimisation flags.

gcc-arm-none-eabi-4_8-2013q4

flag    size in bytes    size in bytes (-flto)
-O0     51908            failed
-O1     32656            failed
-O2     31612            failed
-O3     39360            failed
-Os     27704            failed

gcc-arm-none-eabi-4_9-2014q4

flag    size in bytes    size in bytes (-flto)
-O0     52216            56940
-O1     32692            23984
-O2     31496            22988
-O3     39672            31268
-Os     27563            19748

gcc-arm-none-eabi-5_3-2016q1

flag    size in bytes    size in bytes (-flto)
-O0     51696            55684
-O1     32656            24032
-O2     31124            23272
-O3     39732            30956
-Os     27260            19684

gcc-arm-none-eabi-5_4-2016q2

flag    size in bytes    size in bytes (-flto)
-O0     51736            55724
-O1     32672            24060
-O2     31144            23292
-O3     39744            30932
-Os     27292            19692

gcc-arm-none-eabi-5_4-2016q3

flag    size in bytes    size in bytes (-flto)
-O0     51920            55888
-O1     32684            24060
-O2     31144            23300
-O3     39740            30948
-Os     27292            19692

gcc-arm-none-eabi-6_2-2016q4, gcc-arm-none-eabi-6-2017-q1-update, gcc-arm-none-eabi-6-2017-q2-update

flag    size in bytes    size in bytes (-flto)
-O0     51632            55596
-O1     32712            24284
-O2     31056            22868
-O3     40140            30488
-Os     27128            19468

gcc-arm-none-eabi-7-2017-q4-major

flag    size in bytes    size in bytes (-flto)
-O0     51500            55420
-O1     32488            24016
-O2     30672            22080
-O3     40648            29544
-Os     26744            18920

Conclusion

From the results it’s pretty obvious that the -flto flag makes a huge difference in all versions, except GCC4.8, where the code failed to compile at all with this flag enabled.

Also, it seems that when no optimisations are applied (-O0), the -flto flag actually creates a larger binary instead of a smaller one. I have no explanation for that, but anyway it doesn’t really matter, because there’s no point in using -flto at all in such cases.

OK, so now let’s get to the point. Is there any difference between GCC versions? Yes, there is, but you need to look at it from different angles. For the -Os flag, it seems that GCC7-2017-q4-major produces a binary that is ~380 bytes smaller without -flto and ~550 bytes smaller with -flto than the second best GCC version (GCC6). That means GCC7 will save you from changing your part to another one with bigger flash only if your firmware exceeds the flash size by that much when built with GCC6. But what are the chances, right? We’re not talking about an 8051 here…

But wait… let’s see what happens with -O3, though. In this case, using the -flto flag, GCC7 creates a binary that is ~1KB smaller compared to the GCC6 version. That’s big enough and it may save you from changing to a larger part! Therefore, the size matters for other optimisation levels like -O3, too. This also means that if your code size is getting larger and you need the maximum performance optimisation, then the compiler version may be significant.

So, why not always use the latest GCC version?

That’s a good question. Well, if you’re writing your software from scratch now, then you probably should. But if you have an old project which compiles with an old GCC version, this doesn’t mean it will also compile with -Wall on the newer version. That’s because between those two versions there might be some new warnings and errors that don’t allow the build. Hence, you need to edit your code and correct all the warnings and errors. If the code is not that big, then the effort may not be that much; but if the code is large, it means you may need to spend a lot of time on it. It’s even worse if you’re porting code that is not yours.

Therefore, the compiler version does matter for the binary size at all the available optimisation levels, and depending on your code size and processor you might need to choose between those versions according to your needs.

Have fun!

Why I moved away from github

I bet in every language in the world there’s a phrase similar to this:

Opinions are like butt holes. Everyone has one, but nobody wants to hear about yours.

Anyway, that’s my blog, so that’s my butt-hole opinion.

I don’t really hate Microsoft. I mean, I may speak dirty about them sometimes; but that’s my temperament, I don’t have anything against them. I’ve been using Microsoft products since 1985. My first OS was MS-DOS 2.0, running on a Schneider PC with 640KB RAM and a 3.5″ 720K FDD. It was a beast! I was coding GW-Basic on my yellow-black CRT and I was enjoying my animations on the Hercules video card. Since then, I’ve seen and used almost every MS product. I enjoyed Windows 95, 98, NT, XP and Office 2003 a lot. Nowadays, I’m not using their products that much, except VS Code. Also, Windows 10 seems to be OK’ish, but there are many things in it that I don’t like. To me, it seems that when you need to do more advanced things, then it’s like an NT-relic with a 2018 makeup. So, generally I have a meh opinion of them, but that’s all.

So, why did I decide to leave GitHub? Well, firstly it was how it felt when I read the headline. When I read the news I was like “WTF? Why?” It left a bitter taste on the tip of my tongue. Usually, that’s enough for me to make logical decisions. But this time I said to myself, OK, let’s analyze this a bit.

Therefore, I asked myself:

Let’s say that at this moment you decide to create your new repository and you have 3-4 options, one of them being Microsoft. Would you choose it?

Well, the answer for me is, No. No, I wouldn’t choose a Microsoft service. I don’t want to. I don’t like to add another Microsoft product/service to my life. Thanks.

Then why remain on GitHub at all?

Also, there are other things that are important, like the kind of information they will gain access to. I don’t want Microsoft to trace what I do in my free time and which projects I wish to develop or contribute to. I’m most certain that they will do data-mining on every piece of data they have available. It’s even worse for Microsoft employees, because now their employer will have access to that information.

Anyway, there are so many different butt holes around this story that there’s not enough room for all of them. But for me personally, it was my first reaction when I heard the news that mostly affected my decision, and that was:

Oh, no! No more Microsoft in my life.

My repos are now located here:
https://bitbucket.org/dimtass

container_of()

Intro

A small post for one of the more beautiful and useful macros in C. I consider the container_of() macro to be the equivalent of Euler’s identity for the C language. Just as Euler’s identity is considered an example of mathematical beauty, container_of() is considered an example of C programming beauty. It has everything in there and it’s simpler than it looks. So, let’s see how it works.

Explanation

Actually there are many explanations out there of how container_of() works; some of them are good, some just waste internet space. I’ll give my own explanation, hoping that it won’t waste more internet space.

The container_of macro is defined in several places in the Linux kernel. This is the macro:

#define container_of(ptr, type, member) ({ \
    const typeof(((type *)0)->member) * __mptr = (ptr); \
    (type *)((char *)__mptr - offsetof(type, member)); })

That’s a mess, right? So, let’s start breaking things up. First of all, there are two other things in there that need explanation. These are typeof and offsetof.

The typeof is a compiler extension. It’s not a function and it’s not a macro. All it does is that during compile time it evaluates to the type of the variable it’s given. For example, consider this code:

int tmp_int = 20;

Then typeof(tmp_int) is int; therefore, everywhere you use typeof(tmp_int) the compiler will replace it with the int keyword. Therefore, if you write this:

typeof(tmp_int) tmp_int2 = tmp_int;

then, the compiler will replace the typeof(tmp_int) with int, so the above will be the same as writing this:

int tmp_int2 = tmp_int;

The offsetof is another beautiful macro that you will find in the Linux kernel, and it’s also defined in several places. This is the macro:

#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)

The purpose of this macro is to retrieve the offset of a member variable within a structure. Let’s make that simpler by considering the following image.

This is a struct that has a green and a blue member. We can write this as:

struct x {
     green a;
     blue  b;
};

Suppose that the tape measure is the RAM and each cm of the tape is a byte. If I ask what the offset of the blue member in the RAM is, then the answer is obvious: it’s 120. But what’s the offset of the blue member in the structure? In that case you need to calculate it by subtracting 118 from 120, because 120 is the offset of the blue member on the tape (RAM) and 118 is the offset of the structure on the tape (RAM). So a subtraction is needed to calculate the relative offset, which is 2.
Now, let’s do this.

What happened now? You see, it’s the same structure, but now I’ve slid the tape measure so that the offset of the struct starts from zero. Now if I ask what the offset of the blue member is, the answer is obvious: it’s 2.

Now that the struct offset is “normalized”, we don’t even care about the size of the green member or the size of the structure, because the absolute offset is now the same as the relative offset. This is exactly what &((TYPE *)0)->MEMBER does. This code dereferences the struct at the zero offset of the memory.

This is generally not a clever thing to do, but in this case the code is not executed or evaluated. It’s just a trick like the one I’ve shown above with the tape measure. The offsetof() macro will just return the offset of the member relative to zero. It’s just a number and you don’t access this memory. Therefore, with this trick the only thing you need to know is the type of the structure.

Also, note that the 0 dereference doesn’t declare a variable.
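
If you want to see the numbers for yourself, here’s a trivial demo using the standard offsetof from stddef.h (my own example, not from the kernel):

#include <stdio.h>
#include <stddef.h>

struct x {
    int  a;   /* the 'green' member */
    char b;   /* the 'blue' member  */
};

int main(void)
{
    /* prints the relative offset of each member inside struct x */
    printf("offsetof(a) = %zu\n", offsetof(struct x, a)); /* 0 */
    printf("offsetof(b) = %zu\n", offsetof(struct x, b)); /* 4 on most ABIs */
    return 0;
}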

Ok, so now let’s go back to the container_of() macro and have a look in this line

const typeof(((type *)0)->member) * __mptr = (ptr);

Here the ((type *)0)->member doesn’t declare or point to a variable and it’s not an instance. It’s a compiler trick to point to the member’s offset, as I’ve explained before. The compiler, by knowing the offset of the member in the structure and the structure type, also knows the type of the member at that offset. Therefore, using the tape measure example, the typeof() of the member at offset 2, when the struct is dereferenced to 0, is blue.

So the code for the example with the tape measure becomes:

const typeof(((type *)0)->member) * __mptr = (ptr);
const blue * __mptr = (ptr);

and

(type *)((char *)__mptr - offsetof(type, member));

becomes

(x *)((char *)__mptr - 2);

The above means that the address of the blue member minus the relative offset of blue is cast to a struct x pointer. If you subtract the relative offset of the blue member from the address of the blue member, you get the absolute address of the struct x.

So, let’s see the container_of() macro again.

#define container_of(ptr, type, member) ({ \
    const typeof(((type *)0)->member) * __mptr = (ptr); \
    (type *)((char *)__mptr - offsetof(type, member)); })

Think about the tape measure example and try to evaluate this:

container_of(120, x, blue)

This means that we want to get a pointer to the absolute address of struct x when we know that at position 120 we have a blue member. The container_of() macro will return the offset of the blue member (which is located at 120) minus the relative offset of blue in the x struct. That evaluates to 120-2=118, so we get the offset of the x struct by knowing the offset of the blue member.
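
And here’s a minimal, runnable version of the tape measure example (again my own demo; it needs GCC or clang, because typeof and the statement expression are compiler extensions):

#include <stdio.h>
#include <stddef.h>

#define container_of(ptr, type, member) ({ \
    const typeof(((type *)0)->member) * __mptr = (ptr); \
    (type *)((char *)__mptr - offsetof(type, member)); })

struct x {
    int   green;
    short blue;
};

int main(void)
{
    struct x s;
    short *p_blue = &s.blue;  /* all we know is the member's address */

    /* recover the address of the containing struct from the member */
    struct x *container = container_of(p_blue, struct x, blue);

    printf("&s = %p, container = %p\n", (void *)&s, (void *)container);
    return 0;
}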

Issues

Well, there are a few issues with the container_of() macro. These issues have to do with some versions of the gcc compiler. For example, let’s say that you have this structure:

struct person {
    int age;
    char* name;
};

If you try to do this:

struct person somebody;
somebody.name = (char*) malloc(25);
if (!somebody.name) {
    printf("Malloc failed!\n");
    return;
}
strcpy(somebody.name, "John Doe");
somebody.age = 38;
char* person_name = &somebody.name;
struct person * v = container_of(person_name, struct person, name);

Then, if you have a GCC compiler with version 5.4.0-6, you’ll get this error:

error: cannot convert ‘char*’ to ‘char* const*’ in initialization
     const typeof(((type *)0)->member) * __mptr = (ptr);

Instead, if you do this:

int * p_age = &somebody.age;
struct person * v = container_of(p_age, struct person, age);

Then the compiler will build the source code. Also, if you use a later compiler, both examples will build. Therefore, keep in mind that the type checking is mainly a compiler trick and needs the compiler to handle it right.

Conclusion

The container_of() and offsetof() macros are a beautiful piece of code. They are compact, simple and have everything in there; 3 lines of beauty. Of course, you can use container_of() without knowing how it works, but where’s the fun in that?

C++ vs C for embedded

Intro

Hi all! Well… I won’t write about a stupid project this time. I’ll just write my opinion on this endless debate between embedded engineers (usually between the old and the new ones). The reason for the debate usually is:

Which language is better for embedded?

And then the flames begin, ending up in a battle of endless references and assembly code outputs.

Conclusion

I’ll answer this for you straight away. Use whatever language you like and are really good at when you do your own personal projects; and when working as a professional, use the language of your company’s codebase. See? Simple! Why debate at all?

Details

Now, let’s cut a few things into pieces and get into more detail. First things first, we need definitions. Without definitions there’s no point in even having the chat. Therefore, the first definition is:

What is embedded?

Well, that’s a tough one. Embedded could be described much more easily 10-20+ years ago. You had an 8- or 16-bit microcontroller (if you were lucky) and you had to write either assembly or C to make it do something meaningful. Then microprocessors became more powerful and dirt cheap, and they could easily run Linux. Now, you can buy an Orange Pi Zero or an RPi Zero for less than $7.

So now embedded has a much wider meaning, because you can develop applications for an embedded board that use a much larger code base written in C++, have threads and all the goodies that the Linux kernel can provide. Therefore, it’s important to bring the level of the embedded project into the discussion.

It’s much easier to give an answer to the main question when talking about high level embedded. At a high level embedded design that runs Linux, to develop a user space app you can use C, C++ and a dozen more languages, you name them. In this case use whatever you like, but use your tools smart. Does it really matter if you use a bash script or Python or write your own code in C/C++ or whatever language to access SPI or I2C? Well, not really. If, for example, you just need to update the PLL configuration in an I2C clock chip, then it really doesn’t matter. I would use Python there: open a jedec file, parse it and then configure the chip in 20 lines of code. To do the same in C/C++ would take dozens of code lines. Is execution speed an important factor here? Nah. So, don’t be dogmatic about your tools and use them smart. Especially for high level embedded stuff, you need to check your specs, weigh your solutions and then pick the easiest solution that does the job and doesn’t affect other processes. Also, using the STL and libs like Boost may let you implement your app much faster than using C. So, why bother using C there?

The lower embedded answer is easy, too; but it can also be a bit complicated on some occasions. In the lower embedded domain I include every microprocessor that doesn’t have enough horsepower to run a full blown Linux kernel with enough storage for a rootfs. So, let’s say it’s every MCU from the Cortex-M7 and lower. You might find a few projects with M4 and M7 cores running some micro-Linux distros, which are slow, though; so I still classify these cores as lower embedded MCUs. Enough with definitions.

Therefore, in the lower embedded domain there are many things that need considering, like the supported tools, the existing code base, your fluency in a language and the vendor support.

Do your MCU tools really support C++? If you’re using GCC then the answer is yes, but that’s not always the case. There are still vendors that don’t have C++ compilers for their MCUs, or their C++ compilers are not ones to trust. In that case you need to go with C. Also, the vendor may supply libraries only for C; in that case, go with C.

Sometimes the vendor may support both C and C++ and have libraries for both languages, but one of them might be badly written, obscure or buggy. For example, the HAL framework for the STM32 belongs to that category; so in this case, if you’re limited to C++, choose another MCU or find alternative 3rd party C++ libs (if they exist), and if you’re not limited to C++, go with StdPeriph and C.

What is your existing code base? If you’re working for a company (or even if you have a personal code base, which is usual after a few years), then go with your code base. It’s almost impossible, dangerous and time/money consuming to convert your whole code base from C to C++ or the opposite. Why do that? Don’t! Use the language that your code base uses. That way you’ll use well tested code, probably free of bugs, and you’ll only need to reuse fragments and snippets that you already have. I’ve tried to convert projects from C to C++. Of course, it’s fun in the beginning, but when you proceed to more complex code you may realize that you’re too bored to continue and you’ve already spent too much time on it. This sucks, but tbh you need to try it at least once, on a small project. Therefore, go with your code base!

Which language are you really good at? I mean, really really good! Feeling like a pro. If you feel that way about a specific language, then go with it, at least for a professional project. Yes, of course learning new languages is cool and you should do that anyway, but when it comes to a project with deadlines, don’t choose the language you fancy, but your best weapon. Do I really need to elaborate on this? It should be crystal clear. You don’t want to come near a deadline and realise that you don’t know your tools well enough to overcome issues. Oh, no. That would be a nightmare. I think experience in a language is not knowing the syntax and the standard procedures, but all the little minor bits under the hood that you’ll only need once in a single project; if you don’t have the experience, they will hit you back. Hard. So, if you’re fluent in C++ and not in C, then go with C++.

Do you need to learn C++ for embedded?

Right now, C is enough to do everything on almost any embedded platform. So, do you really need to learn C++? Well, it wouldn’t hurt! Actually, you should do it at some point. It’s not necessary, but it’s a plus for your skills. Also, you need to do it for the right reasons. For example, I’ve heard from other engineers that C++11 is nice because it has auto and lambdas. So what? That’s not a reason to learn C++11. Besides, there’s no difference for a good compiler like GCC whether you use a normal function or a lambda. Also, some people don’t like the readability hit or the obscure layer that an auto variable adds to the code while they read a program; which is fair. Learn C++ to complement your skills, not to go with the hype or to substitute C in all embedded development. If you’re not using the STL at all, then maybe just learn C++ for future use or for fun; otherwise you probably don’t need it.

As I see the embedded development world right now, it’s good to know both. You can develop high level embedded Linux apps that use the STL in C++, and use C for the kernel, u-boot and whatever other use you like. Also, for Linux user space, given that you have enough horsepower, use whatever language you like; but always try to be moderate. I find using Python or bash scripts to be an effective way to do difficult tasks in less time, especially in cases where the processing speed is not critical. But I don’t like to see resources like CPU cycles and RAM being spent carelessly, without optimisations and well thought out decisions.

I’ll close this post with the following quote:

Use your tools smart, use whatever does the job right and always be moderate.

Hacking an RS232/485 to ETH board

Intro

It’s time for another stupid project. This time we’ll do some hacking, and by that I mean that we’ll take an existing device and alter it to do something else. A couple of years ago, I found on ebay some nice and cheap RS232/485 to Ethernet modules from usriot and I bought them to play with at some point, as they have an LPC1114 controller, a DM9000 Ethernet MAC and a UART interface configured as RS232 or RS485. You can do a lot of great stuff with this combo. Anyway, I decided to implement an ArtNet DMX512-A controller to control DMX loads via Ethernet. To do that I needed a TCP/IP stack and also to implement the basics of the ArtNet protocol. For the TCP/IP stack I’ve chosen uIP, as it’s the most lightweight stack and it’s also easy to extend, since it doesn’t support all the things that are needed (like UDP broadcast and multicast). Also, ArtNet is an open and well defined protocol that can be found here (the link always points to the latest version, which is now 4, but at the time I wrote the code it was at v.3).

Components

USR-TCP232-24

All you actually need is a USR-TCP232-24 board. Although I bought it a couple of years ago, you can still find many of those, especially on alibaba, and quite cheap. It’s a nice board no matter what, and it can be used for several other stupid projects (for example a KNXnet/ip device would be a good candidate). The company that manufactures these modules seems to change their hardware quite often. At some point they replaced the USR-TCP232-24 (LPC1114+DM9000) with the USR-TCP232-410S and USR-TCP232-410-PCBA (TM4C129EKCPD w/ integrated MAC), and now they don’t officially sell the bare PCB any more, but the device with its casing. I’ve also bought two of the USR-TCP232-410-PCBA to play with at some point, as they have a more powerful Cortex-M4 processor. Anyway, this is the board.

PX24506 DMX512 Decoder Driver

You don’t really need this for the project if you already have a DMX512A compatible device. If you don’t, then you can buy a cheap RGB LED strip from ebay and a DMX512A decoder driver and have fun. A nice and quite cheap driver is the PX24506. It has 3x 3A channels for RGB, DMX-in and DMX-out connectors to chain drivers in series, and you can set the DMX address with a binary switch. I really like this driver. Uber Chinese tech stuff here. This is the driver.

You can find this beauty sold in tons on ebay.

Also, there are two other alternatives which I haven’t tested yet. They’re much cheaper, but you can’t program the DMX address easily. You may find them as DM-103 and DM-104 on ebay, or if you search for generic DMX512 decoder boards. This is how they look:

Their difference is that the DM-104 is rated at 144W. The processor on these is an STM8 so you may also write a custom firmware in there (just saying).

Making the stupid project

To make this project you need the free version of the MDK-ARM Keil compiler, the source code from my bitbucket repo here and the Flash Magic tool to flash the compiled HEX file. The size of this project is not that large; therefore, the evaluation MDK-ARM license is just fine. So, you need to load the project, build it and then flash the board. I’ve used an older uVision version (v4) to build this project, so in case of v5 you need to install the legacy packages that support the LPC1114 controller and then import the project file.

So, the first step is to install the evaluation version of MDK-ARM and Flash Magic. Then clone the source code repo from bitbucket, open dmx_tcp.uvproj and compile the project. The output dmx_tcp.hex will be created in the ./Flash folder; use Flash Magic to flash it on the device. To do that, remove the power and place the jumper in the UPD position. The LPC series have a serial bootloader, so the next time you apply power the bootloader will load and you can use Flash Magic to flash the board. After that, place the jumper back in the GND position.

When the jumper is in the GND position the device functions in DMX512A mode, using RS485 to send the data. When the jumper is placed in the CFG position the device enters the debug mode. You can see what the debug mode does in the source code, and you can also see the supported RS232 commands in the UART_Handler() function in main.c. In the debug/configuration mode you can use an RS232 terminal or console to configure the device. I’ve also created a file (terminal_macros.tmf) which contains a few macros that you can use with Br@y’s terminal. Connect an RS232 cable to the device and enter HELP followed by the <Enter> key to print the available commands. Anyway, use the macros and Br@y’s terminal, trust me, it’s the easiest way.

In the normal mode (jumper in GND) the device runs in one of the two supported modes, which you can set in the configuration mode. Mode 0 (DMX_TCP_MODE_ASCII) is a plain ASCII TCP socket that you can connect to and handle the DMX by sending simple ASCII commands. The supported commands are in the connection_handle_ascii() function in the connectio.c file.

Mode 1 (DMX_TCP_MODE_BINARY) is the default ArtNet protocol mode. When this mode is used, you need software that can control DMX universes using ArtNet, such as the ArtNetominator. I’ve also implemented the ArtNet configuration protocol, so you can configure the IP and the device name using the ArtNetominator.
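
For reference, this is roughly how the ArtDmx packet that carries a DMX frame looks on the wire. This is a sketch based on the public Art-Net spec; the field names are mine, so check the spec and the repo sources for the actual definitions:

#include <stdint.h>

/* Rough sketch of an ArtDmx packet (Art-Net 3), packed as it appears
 * on the wire over UDP. Field names are mine, not from the repo. */
struct artnet_dmx {
    char     id[8];        /* "Art-Net" + null terminator */
    uint16_t opcode;       /* 0x5000 (OpDmx), little-endian */
    uint8_t  proto_hi;     /* protocol version, high byte */
    uint8_t  proto_lo;     /* protocol version, low byte (14) */
    uint8_t  sequence;     /* 0 = disabled, else 1-255 for re-ordering */
    uint8_t  physical;     /* physical input port, informational only */
    uint8_t  sub_uni;      /* low 8 bits of the universe address */
    uint8_t  net;          /* high 7 bits of the universe address */
    uint8_t  length_hi;    /* DMX data length, high byte */
    uint8_t  length_lo;    /* DMX data length, low byte (2-512, even) */
    uint8_t  data[512];    /* the DMX channel values */
} __attribute__((packed));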

Conclusion

This is a nice stupid project if you want to hack an RS232/485 to Ethernet device to implement a 1-universe ArtNet DMX gateway. There’s absolutely no point in doing that, though! I mean, you can buy a Chinese 4x DMX universe device that supports E1.31 (sACN) for ~$80, which is dirt cheap; so there’s no real reason to make this stupid project. In case you want to, though, grab the code and play around with it.

Have fun!

FPGA digital clock

Intro

It’s been quite a long time since the last stupid project. This is why summer holidays should be prohibited. Holidays cunningly change your mindset so fast and so deep that it’s really hard to switch back to the slave robot mindset and start doing some work. Anyway, during this period I spent some quality time learning Verilog and playing around with FPGAs, and as always, the best way to learn something new is to do something stupid with it. So, I’ve bought a dirt-cheap dev board from ebay and started writing some code.

If you’re new to FPGA development, then you’ll probably face the same difficulty in learning how Verilog works. If you already have a hardware background, it will be easier. There are so many great guides and tutorials out there on the internet that you don’t really need a book or anything. With my short experience, if you understand what ‘wire’, ‘reg’, ‘<=’ and ‘=’ do and mean, and you understand that you can’t electrically drive the same input with two outputs at the same time, then you’re good to go. That’s the base to proceed to building rockets and spaceships.

Components

FPGA development board (EP4CE6E22C8N)

There are quite a few FPGA development boards, and the two main FPGA manufacturers are Xilinx and Altera. You can buy cheap dev boards for both manufacturers on ebay for $40. In this project I’ve used a development board with an Altera Cyclone IV FPGA device (EP4CE6E22C8N), like this one:

This board has plenty of peripherals like VGA output, PS/2, SPI flash, SDRAM, a 4-digit 7-segment display, 4 tact buttons, an A/D, IrDA, LEDs, a buzzer and USB-to-UART. So you can build several stupid projects with this beauty.

USB Blaster (programmer)

Finally, you’re going to need a programmer. Again, ebay has plenty of cheap USB programmers. Search for ‘Altera USB Blaster’ and you’ll find many for around $3-4, like this one:

Quartus Prime Lite Edition

To write your awesome software you’ll need the Quartus Prime Lite Edition IDE. This IDE will look a bit strange if you’re coming from the embedded software world. FPGA engineers have quite a different taste in how an IDE should look, but after spending some time with it everything makes sense and it’s fun to work with.

Making the stupid project

Well, not much to say here. I mean, you don’t have to make any connections, no soldering, nothing. Just plug the USB power cable into the board, connect the USB Blaster to the 10-pin header and to your computer’s USB port, download and install Quartus, then git clone the following project and open it in Quartus.

https://bitbucket.org/dimtass/fpga_clock

I’ve used Quartus on both Linux and Windows and I’ve programmed the board just fine. The result is a digital clock. Actually, it’s a powerful FPGA on a board with a couple of awesome peripherals, and all it does is flash a few LEDs every second and display the time on the 7-segment display. That’s the definition of a stupid project.

This is a short video of the clock:

Conclusion

FPGAs are fun. Especially when it’s not your everyday job and it’s just a hobby. From the few things that I’ve seen, it’s easy to make simple projects, it’s nice to know how they work and how to program them, and it can be an agonising pain if it’s your job. There are so many things that can go wrong in there that sometimes it’s a miracle that a project works at all. Timing constraints, parallelisation and several other things in the silicon that are sensitive to so many parameters. It’s hell. But now I have a $40 digital clock and I can shout to the world that I’m also an expert on FPGAs. FPGAs are addictive when you start with them, but when you scratch the surface and go a bit deeper, then you’re glad it’s not your everyday job.

Have fun!

Sine generator using stm32f407 internal DAC and PCM5102A

Intro

Welcome to another completely stupid project!

This time I’ll show you how to implement a sine generator using a dirt-cheap stm32f407 board. It really doesn’t make any sense to just build a sine generator, especially that way and using an STM32 processor, but it’s fun, and at some point later I’ll post a project about making a more useful DDS (direct digital synthesizer). As usual, let’s see the components and their prices.

Components

STM32F407ZET6 / STM32407VET6 development board

You can find cheap development boards for both the stm32f407vet6 and stm32f407zet6 microcontrollers on ebay for $11 and $15. The vet6 is a bit cheaper compared to the zet6. Their only difference is the package and therefore the number of available GPIOs. In more detail, the vet6 has 100 pins (LQFP100 package) and 82 GPIOs, and the zet6 has 144 pins (LQFP144 package) and 114 GPIOs; both have 512KB flash and 192KB RAM. If you think the additional 32 GPIOs are worth paying $4 more for, then go for it; or if you’re like me and hate dilemmas, buy both and be happy. Therefore, search ebay for stm32f407vet6 or stm32f407zet6 and buy the cheapest ones. This is how the stm32f407vet6 looks:

 

For the record, there is a difference in the layout between the vet6 and zet6 boards. The vet6 has the SD card on the top right side and the USB connector on the top left side, and the zet6 has these components mirrored. Also, this post is written in 2017, so if you’re reading this in 2030, expect to find those boards in a museum.

USB to UART module

Most of the time when you develop on a microcontroller you need a debug UART port. This project is no different, therefore you’ll need one of these.

This^. Ebay. Cheap. $1.2.

J-Link

I’ve used a J-Link to flash the board during development, as the board has a JTAG connector, but you may use an ST-Link or even a cheap USB programmer with an SWD interface. Use whatever you have that can flash this thing.

Passive components

You’re going to need a few passive components to implement a low-pass filter (LPF) for the internal stm32 DAC. Nothing fancy, just a resistor R=1300Ω and a capacitor C=5.9nF. More details about the LPF later. The cost for these is negligible; they probably come to around $0.something.

PCM5102A

Finally, if you want to play around with a ‘real’ DAC, then buy one of these cheap pcm5102a boards on ebay for $7.

This board has everything you need to drive the pcm5102 DAC using the stm32’s I2S interface, and you also get a bonus stereo RCA connector that you can connect to your speakers.

Good to have

Well, you’re going to build a sine generator, so it’s good to have an oscilloscope. Also, you may need a breadboard or a prototyping board to implement the LPF.

Making the stupid project

Using the internal DACs

Ok, now that you have everything you need, let’s see how to build the sine generator. First, I’ll show how you can use the stm32’s internal DACs, so grab the board and clone the following git repo.

https://bitbucket.org/dimtass/stm32f407_dds_dac

Depending on your OS (Windows or Linux), follow the instructions in the README. What you actually need is a bare metal ARM compiler and cmake; then run the build script. After that you’ll find the binary file in the build-stm32 folder. Flash this bin on the stm32 board and reset the board. If you feel adventurous, spend some time reading the crap source code. Everything is done using double-buffered DMAs and FIFO buffers for faster speed, and because of that the sampling rate in dds_defs.h is set to 384KHz. Be aware that if you don’t use this specific board, your board might have a different crystal, which means that you need to edit the HSE_VALUE in stm32f4xx.h.

In the internal_dac.c file you’ll find the code for the two internal DACs, and in dds.c the code for the sine generator. The sine generator code is quite interesting as it uses the phase accumulator concept, which is very clever and nice to know. The idea is that you have an accumulator that you increment by a phase step that is proportional to the frequency of the desired sine output. Yeah, I know, boring; but if you are interested in the concept, have a look at those links (link, link, link, link).
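
In a nutshell, a phase accumulator looks something like this. It’s a simplified sketch of the concept, not the actual dds.c code, so treat the names and values as examples:

#include <stdint.h>
#include <math.h>

#define SAMPLE_RATE 384000.0   /* same as the default in dds_defs.h */
#define LUT_SIZE    256        /* example look-up table size */

static uint16_t sine_lut[LUT_SIZE];   /* one period of a sine, 12-bit range */
static uint32_t acc;    /* 32-bit phase accumulator; wraps around naturally */
static uint32_t step;   /* phase increment per sample */

void dds_init(void)
{
    /* fill the look-up table once: values in 0..4095 for a 12-bit DAC */
    for (int i = 0; i < LUT_SIZE; i++)
        sine_lut[i] = (uint16_t)(2047.5 * (1.0 + sin(2.0 * 3.14159265358979 * i / LUT_SIZE)));
}

/* the output frequency maps linearly to the phase step (resolution Fs/2^32) */
void dds_set_freq(double freq_hz)
{
    step = (uint32_t)((freq_hz * 4294967296.0) / SAMPLE_RATE);
}

/* called once per sample: the top 8 bits of the accumulator index the LUT */
uint16_t dds_next_sample(void)
{
    uint16_t sample = sine_lut[acc >> 24];
    acc += step;  /* overflow is intentional; it wraps the phase */
    return sample;
}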

Connect the USB to UART module to the USART1 the following way:
USB TX -> STM32 PA9
USB Rx -> STM32 PA10
USB GND -> STM32 GND

Also, connect your oscilloscope to PA4 to probe the DAC1 channel and PA5 to probe the DAC2 channel.

Now, use a terminal (I’m using Br@y’s Terminal) and open the COM port at 115200, 8 data bits, no parity, 1 stop bit and no handshake. Also make sure that received CR bytes are handled as CR+LF (the CR=LF option in Br@y’s terminal). You can change the sine frequency with the following command:

FREQ=<channel>,<frequency>
where,
<channel> is 1: for DAC1, 2: for DAC2
<frequency> is the desired frequency in Hz

For example:
FREQ=1,5434.25
sets the DAC1 frequency to 5434.25 Hz
FREQ=2,18934.32
sets the DAC2 frequency to 18934.32 Hz
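
The parsing of that command could look something like the following sketch; the actual handler lives in the repo’s UART code, and dds_set_frequency() here is a made-up name:

#include <stdio.h>

/* hypothetical function that updates the phase step of a DAC channel */
extern void dds_set_frequency(int channel, float freq_hz);

/* parse "FREQ=<channel>,<frequency>" received over the UART */
int handle_freq_cmd(const char *line)
{
    int channel;
    float freq;

    if (sscanf(line, "FREQ=%d,%f", &channel, &freq) != 2)
        return -1;                     /* malformed command */
    if (channel < 1 || channel > 2)
        return -1;                     /* only DAC1 and DAC2 exist */

    dds_set_frequency(channel, freq);  /* update the DDS phase step */
    return 0;
}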

That way you have two individual DAC channels with two different frequencies. Regarding the code implementation, there are several ways to implement this DAC functionality on the STM32. You can individually control each DAC using its data holding register or use the dual data holding register; use dual timers with dual DMAs, a single timer with dual DMAs, or a single timer with a single DMA (in the case of using the dual data holding register). I chose the single timer, dual DMA, single data holding register approach because it makes the output sine waveform simpler, but I would also like to implement the single timer, single DMA, dual data holding register one, as it may be even faster since fewer IRQs are triggered.

Edit:
Now you can git clone the single timer, single DMA, dual data holding register repo from here if you want to do any performance tests.

Next is the output of a single DAC channel in various frequencies, without using an LPF filter in the output.

The displayed frequencies are 0.1, 100, 1000, 10000, 20000, 48000 and 192000 Hz. As expected, the quantization noise is high, as no anti-aliasing filter is used at the output. The following images are after using a simple first order analog LPF as an anti-aliasing filter.

 

The difference is that now there’s much lower quantization noise at higher frequencies (e.g. 48KHz), but also the output amplitude gets smaller after approx. 20KHz, as the filter cut-off frequency is 20750.32Hz (R=1300Ω, C=5.9nF). There’s no point in displaying the 192KHz output, as the amplitude is almost zero.

Using an external DAC (PCM5102)

It’s nice having fun with the internal DACs of the STM32, but… let’s use a real DAC that can handle 16, 24 and 32-bit stereo audio. These days you can find good DACs really cheap. Texas Instruments (TI) has some nice chips that are sold as PCB boards on ebay. For this test I’m using a PCM5102 board. I’ve also got some Cirrus Logic CS4344 DACs I would like to play with, but unfortunately I haven’t managed to find a board for them, which means I’ll have to build my own at some point. The only CS4344 boards you can find on ebay are some USB DACs, but they don’t fit the purpose and they also use the old end-of-life PCM2702.

Anyway, what’s the main difference between the internal DAC and this PCM5102? Well, the first one is internal! That means everything happens in the STM32: no external components, no protocol interfaces, nothing. This means increased speed but lower performance, as the internal DAC is a basic 12-bit DAC. On the other hand, by using an external DAC you get better performance, as the chip is dedicated and engineered for the purpose, but this also means that you need an interface to drive the DAC. In this case the I2S interface is used and it’s set to 16/48 (16-bit @ 48KHz). Well, 16/48 is more than enough for my old rusted ears and it’s fine for testing. Maybe at some point I’ll try whether the STM32 can go up to 32/192, which is the highest the STM32’s I2S can get (the PCM5102 supports up to 32/384).

Notes:
The PCM5102 board has its own regulators; therefore, even though the VCC is rated at 3V3 (same as the STM32F407), you’ll need an external 5~9V power supply, because if you apply 3V3 it won’t even start.

Download the source code from the following repo and read the README.md file how to build the binaries:
https://bitbucket.org/dimtass/stm32f407_dds_i2s_dma

Connect the board the following way:
PCM5102 BCK -> STM32 PC10
PCM5102 DATA -> STM32 PC12
PCM5102 LRCK -> STM32 PA15
PCM5102 GND -> STM32 GND
PCM5102 GND -> PSU GND
PCM5102 VCC -> PSU 5V
PCM5102 FMT jumber set to ‘LJ’ instead of ‘I2S’ (‘LJ’ stands for Left Justified)

The I2S on the STM32 has DMA, double buffering and FIFO enabled for better performance. In case you use a different board, you should refer to the stm32 reference manual to set the correct N and R values in the RCC_PLLI2SConfig() function in i2s_dma.c. Actually, I would advise you to always do that, as the I2S rate is very sensitive to the XTAL frequency, and therefore even small variations in the XTAL speed will have an effect on the output I2S rate. For more info check table 127 in section 28.4.4 of the reference manual for the ‘correct’ values, but I suggest that you use an oscilloscope to trim the N value, as the on-board crystal sucks. Using the oscilloscope, try to find the value that gives the most precise frequency for the LRCK (I2S clock). E.g. in my case, to get 16/48 the table suggests using PLLI2SN=192 (that’s N) and PLLI2SR=5 (that’s R), but with these values the LRCK clock was quite off, so I had to set PLLI2SN=200 to get the correct bitrate. The correct clock frequency/bitrate for stereo 16/48 I2S audio is 2*16*Fs, where Fs=48KHz in this case, so bitrate=1536000Hz.
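
To make the clock math concrete, here’s a small sketch that reproduces the nominal reference manual numbers, assuming an 8MHz HSE, PLLM=8 and the I2S master clock output disabled. On a real board the crystal tolerance will shift these, so double-check the formulas against table 127 for your setup:

#include <stdio.h>

int main(void)
{
    /* Assumed clock setup: 8MHz HSE divided by PLLM=8 -> 1MHz PLL input */
    double pll_in = 8000000.0 / 8.0;
    double plli2s_n = 192.0, plli2s_r = 5.0;    /* the RM0090 suggested values */
    double i2s_clk = pll_in * plli2s_n / plli2s_r;

    /* For 16-bit frames with MCLK off: Fs = I2SxCLK / (32 * ((2*DIV)+ODD)) */
    double i2s_div = 12.0, odd = 1.0;
    double fs = i2s_clk / (32.0 * (2.0 * i2s_div + odd));

    printf("I2SxCLK = %.0f Hz\n", i2s_clk);     /* 38400000 */
    printf("Fs      = %.1f Hz\n", fs);          /* 48000.0 (the LRCK rate) */
    printf("bitrate = %.0f Hz\n", 2 * 16 * fs); /* 1536000, matching 2*16*Fs */
    return 0;
}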

If you want to have a look at the code, you’ll find all the I2S code in the i2s.h and i2s_dma.c files and the phase accumulator algorithm for the sine in dds.h/dds.c. To change the output frequency of the L and R channels, use a terminal as before with the following commands.

FREQ=<channel>,<frequency>
where,
<channel> is L: for Left, R: for Right audio channel
<frequency> is the desired frequency in Hz

For example:
FREQ=L,5434.25
sets the Left channel frequency to 5434.25 Hz
FREQ=R,18934.32
sets the Right channel frequency to 18934.32 Hz

So, like before you have two individual channels to play with. Next are some oscilloscope captions on various frequencies (0.1, 100, 1000, 20000 and 22000Hz).

You’ll see that the output anti-aliasing filtering is much better now, compared to the example with the internal DACs. In this case it’s not possible to get frequencies higher than 22KHz.

Conclusion

That was another meaningless stupid project. I mean really, you can’t do anything useful with this stuff, but you can experiment and use pieces of it to implement other, more useful projects, like a DDS that can output different waveforms or even a synthesizer. It’s also quite rare to find code for the STM32, even on the internet, that uses DMAs with double buffering and FIFOs for both the DAC and I2S, so keep the code for reference.

Have fun!

Benchmarks with gcc, musl and clang and how they can affect the embedded development cost

Intro

Ok, I know that’s a long and pompous title, but it’s difficult to summarize the whole meaning of this post.

As you’ll find out on this blog, I won’t only write about stupid projects; I’ll also write posts about other things that I find interesting, mainly in the embedded world.

When dealing with embedded Linux there are several things that need to be considered, because “embedded” usually refers to a vast ARM ecosystem that extends from the tiny cortex-M0 processor (ARMv6-M) to a cortex-A73 (ARMv8-A). Of course, you won’t see Linux on a cortex-M0 cpu but you may see it on a Cortex-M4.

There are two reasons for this post. The first one is to write down some thoughts about when it’s preferable to use bare metal, an RTOS or Linux on embedded products. The second reason was this video, which leads to the question: after we’ve decided that we need to use Linux, what options and tools do we have; and most importantly, what happens with the code and binary size? Well, in embedded you should care about size, because that can limit your options and also affect your product cost, development time and budget.

Right now embedded is a hot topic and there are many things going on with the compilers, their optimizations and the system libraries. So what’s the deal with the compilers?

Compilers

On embedded systems you can find many different compilers with fancy names, like ‘arm-none-eabi-’, ‘arm-linux-gnueabi’, ‘gcc-arm-embedded’ and the list goes on. Assuming a specific architecture (e.g. ARM in this case), these compilers are all different, and one of the main differences is that they assume a different C library: ‘arm-none-eabi’ assumes no C library or newlib, and ‘arm-linux-gnueabi’ assumes the full blown glibc (or eglibc). That means that usually when you cross-compile bare metal source code for ARM (e.g. for an STM32 microcontroller) you use ‘arm-none-eabi-’, and when you cross-compile a Linux kernel or a Linux user space application you use ‘arm-linux-gnueabi’.

OS or RTOS?

This is another question that usually comes to the table when dealing with a new project. Usually, it’s pretty clear whether you need an OS or not, but the line between needing Linux or another embedded RTOS (eRTOS) is sometimes blurry. If you go with an eRTOS, you go with arm-none-eabi for the whole project, but if you go with Linux, you’ll need arm-none-linux-gnueabi.

The difference between an eRTOS and Linux is huge. It’s much more complex to develop on an eRTOS compared to Linux, because on an eRTOS most of the time you’ll need to implement your own subsystems and underlying tasks that Linux usually already provides. Also, Linux separates the kernel from the user space applications, but on an eRTOS this separation is not always achievable and you need to write code in several different places of the eRTOS. On the other hand, Linux takes care of all the low level mumbo jumbo and you only have to write your app using the kernel API. So what are the criteria for deciding which to choose? Well, that depends on many variables, like your existing code base, the development time, the platform tools, cost etc., but now more than ever you also need to consider the size.

Size!

There’s no doubt that the size when using an embedded RTOS (eRTOS) will be much smaller, but it also comes with higher complexity; and sometimes it’s also difficult to choose the right eRTOS, the one that will have the fewest problems, bugs and issues. In cases where the cost of a small external RAM and an eMMC/SD is not that high and the development comes with less complexity and faster development time, you need to consider Linux. And this is what this post is all about.

Is it possible to shrink a Linux OS to a low cost embedded platform?

If the cost of using plenty of RAM and big, fast storage is not an issue, then Linux is a no-brainer; but when there’s a cost restriction and at the same time there’s budget for a small RAM and cheap storage, then you must research whether the current tools we have these days can deliver a small enough user space code size.

How to shrink size?

At the time this post is written there are a few tools targeting small binary sizes. First, gcc already has some optimization flags that can be used to optimize speed and/or size. gcc also provides a Link Time Optimization (LTO) tool (the -flto flag in gcc), which can also reduce the compiled size. What LTO does is gather more details about the code during compile time and then provide the linker with those details, so the linker can use them to optimize further.

Apart from gcc there’s also the clang compiler, which is designed to fully replace gcc. Therefore, when you build a source with clang you don’t use gcc at all. Although clang is highly compatible with gcc, it doesn’t offer full compatibility, which means that you may not be able to compile all of your existing code base.

In addition to the compilers, we also have the system libraries, which also play a significant role in the resulting binary size. glibc is the low level system library used with gcc and it’s huge! You would never consider the system library size when developing desktop applications, but for small embedded systems glibc is a behemoth. There are various lightweight glibc replacements for arm-none-linux-eabi, like the musl lib. You can see a comparison of a few libraries here. Note that there are also replacement libraries for arm-none-eabi, like nano-lib (the --specs=nano.specs compiler option), which makes a great difference in code size, too.

So, to sum up, we have different optimization flags, compilers and system libraries. Now, we can proceed with the benchmarks by using all these different options and see what happens.

Benchmarks

I’ve prepared a benchmark repository in bitbucket that you can use to do your own benchmarks here:

https://bitbucket.org/dimtass/gcc_musl_clang_benchmark

You can git clone the repo and then run the benchmark.sh script followed by the C file and any extra flags you want, like this:

./benchmark.sh oggenc.c -lm

By default the binaries are created in the output/ folder and deleted when the test is finished; therefore, if you want to keep the binaries to do some tests, run the benchmark like this:

KEEP=true ./benchmark.sh oggenc.c -lm

I’ve included these 4 source code files: test.c, bzip2.c, gcc.c, oggenc.c. The test.c is a simple code file I’ve made; the rest I’ve found here. Every file has a different size and source code, and by comparing these we can obtain an overview of how well each setup performs in real applications.

These are the details for the compilers and libraries on my Linux Mint 18 ‘Sarah’ 64-bit.

GCC version : gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
glibc version : (Ubuntu GLIBC 2.23-0ubuntu7) 2.23
clang version : clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)
musl version : 1.1.16

And these are the results for each source file.

./benchmark.sh test.c
Testing file: test.c

gcc            :  8976
gcc -Os        :  8984
gcc -O3        :  8984
gcc -flto      :  8968
gcc -flto -Os  :  8920
gcc -flto -O3  :  8920
musl           :  7792
musl -Os       :  7792
musl -O3       :  7792
musl -flto     :  7784
musl -flto -Os :  7728
musl -flto -O3 :  7728
clang          :  7664
clang -Os      :  7800
clang -O3      :  7832
clang+musl     :  4792
clang+musl -Os :  4896
clang+musl -O3 :  4928

./benchmark.sh oggenc.c -lm
Testing file: oggenc.c -lm

gcc            :  2147072
gcc -Os        :  2028656
gcc -O3        :  2179880
gcc -flto      :  2140944
gcc -flto -Os  :  1974736
gcc -flto -O3  :  2067600
musl           :  2141096
musl -Os       :  2022552
musl -O3       :  2173776
musl -flto     :  2134952
musl -flto -Os :  1972976
musl -flto -O3 :  2069824
clang          :  2112544
clang -Os      :  2020600
clang -O3      :  2116544
clang+musl     :  2107952
clang+musl -Os :  2016264
clang+musl -O3 :  2110528

./benchmark.sh bzip2.c
Testing file: bzip2.c
gcc            :  138008
gcc -Os        :  79320
gcc -O3        :  115912
gcc -flto      :  130448
gcc -flto -Os  :  71456
gcc -flto -O3  :  107760
musl           :  136376
musl -Os       :  73512
musl -O3       :  114304
musl -flto     :  128840
musl -flto -Os :  69728
musl -flto -O3 :  106152
clang          :  129112
clang -Os      :  94568
clang -O3      :  113504
clang+musl     :  125840
clang+musl -Os :  90824
clang+musl -O3 :  109968

./benchmark.sh gcc.c
Testing file: gcc.c
gcc            :  6879424
gcc -Os        :  4263096
gcc -O3        :  6493624
gcc -flto      :  6772760
gcc -flto -Os  :  3886664
gcc -flto -O3  :  5889240
musl           :  failed
musl -Os       :  failed
musl -O3       :  failed
musl -flto     :  failed
musl -flto -Os :  failed
musl -flto -O3 :  failed
clang          :  7446328
clang -Os      :  4785560
clang -O3      :  6537192
clang+musl     :  failed
clang+musl -Os :  failed
clang+musl -O3 :  failed

Analyzing the results

Now that we have the results we need to analyze them, and the best way to do that is to visualize the data in a way that makes comparison easy. For that reason I chose to plot the size of each file against the compiler and library used; the last plot is the sum of all the output file sizes, except gcc.c, which failed to build with the musl lib. Click on each image to zoom in.

Let's take this step by step. If you look at the source code of test.c you'll see that it's a very simple program that doesn't do much. Here the combination of clang+musl shines: the resulting code is half the size compared to gcc+glibc and also much smaller than gcc+musl. This is a very good sign for clang. Also, clang by itself managed to produce almost the same binary size as gcc+musl. When I saw these results I was surprised and thought this was groundbreaking news, and in a way it is, but... after running the benchmarks with the rest of the files my feelings were mixed.

The oggenc.c benchmark shows that gcc+glibc+LTO+Os and gcc+musl+LTO+Os end up at practically the same size (1974736 vs 1972976), much smaller than the rest of the combinations.

bzip2.c shows that gcc+glibc+LTO+Os, gcc+musl+Os and gcc+musl+LTO+Os perform about the same and better than the rest.

gcc.c failed to build with musl, and gcc+glibc+LTO+Os performed better than the others.

Finally, the most important result comes from adding up the sizes of all the builds, as this is closest to a real-life application: your rootfs size is the sum of all its programs. Of course, only 3 programs don't really reflect the 100, 1000 or 2000 programs that your rootfs may actually have, but it's still a good measure for a rough view. Looking at the summed sizes, it's obvious that gcc+glibc+LTO+Os and gcc+musl+LTO+Os perform much better than the others, and clang is far behind.

You can draw your own conclusions from the results, or even better, try it yourself.

Conclusion

From the things I've seen, my opinion is that LTO combined with -Os makes the most significant difference compared to the other options. That means the gcc compiler optimizations outperform clang's, and it also seems that musl doesn't give you much compared to glibc. Of course, that last statement isn't always true: musl does make a huge difference on some programs (test.c), makes little difference on others (oggenc and bzip2), and it also failed to build gcc. I think musl looks promising, though. Clang's results, on the other hand, were very mixed. On a simple application like test.c it produces a really small binary and outperforms all the other options, but when things get tough with more complex code, it's worse than gcc.

So, let’s go back to the first question. Does it worth? Can these results help us to get a decision on the thin line between choosing an RTOS and a Linux OS for an embedded product? Well, don’t expect this answer from me. You are the one to decide what is right for you, but I’ll tell you what I would do.

Personally, I break embedded projects' needs into three categories: sequential execution, a couple of parallel threads, and real multi-threading.

  • If your project can be implemented with sequential execution procedures you don’t need RTOS or Linux, you go with bare metal.
  • If your project needs 1 or 2 threads, that doesn't necessarily mean you need an RTOS. You can develop using finite state machines (FSMs); but if for some reason you think your FSM will be hard to maintain (e.g. a complex GUI with user inputs and peripherals), then go for an RTOS.
  • If you have a couple of threads and the cost must be kept low then go for RTOS.
  • If you have multiple threads and the budget allows you to add some RAM and a small storage go for Linux.
  • If your code base is on Linux, then of course, you go for Linux.

In any case, it's good to use the optimizations: newlib-nano for bare-metal, and compiler optimizations & alternative system libraries for the OS case.

Now, if you decide to go with Linux on a limited budget, then you'll need all the size gains the compiler and the alternative system libraries can offer. In that case you need to verify that all the core programs you need in your rootfs can be compiled with all the above optimizations and that the resulting size is small enough to meet your RAM and storage specs.

It’s encouraging to know that there’s an active development and effort to the correct path. Size matters and it can do the difference and introduce Linux in the low budget projects. Of course, Linux is not a panacea for every embedded project and it shouldn’t be, but it has it’s use and it can solve problems.

Also, many cheap Linux-enabled SBCs are showing up these days. For example, have a look at the Omega2, which is a $5 SBC with Linux, 64MB memory, 16MB storage, USB, WiFi, I2C, SPI, I2S and 15 GPIOs. On the other side, a cheap stm32f103c8t6 board (Cortex-M3) costs $2.5 and an stm32f407vet6 board costs $7. NXP ARM MCUs are much more expensive, as are Renesas, Cypress and others. Of course, the STM boards have more peripherals and they still have their uses, but the future for most mainstream products seems to be the ultra-low-cost Linux IoT-ready boards, and for that reason we need smaller binaries and better-performing compilers and tools.

For those who develop only on small ARM MCUs, don't worry! There are many reasons why low-level embedded will still be on the market in the coming years, no matter what happens, especially in industries that have to do with health, human life, security, defense, etc., where everything needs to be absolutely deterministic. An RTOS or any other OS is never fully deterministic, and regardless of the budget and the costs, they can't be used on these products, as the regulations and approvals are very strict about this.

WiFi digital control DC power supply with web interface and USB

Intro

Welcome to my next stupid project!

OK, this project is really stupid and I'm probably never going to use it for any of my next projects, but it was fun to build nevertheless. I have a bunch of these adjustable LM2596 DC-DC boards in one of my "magic" component cabinets; you can find them quite cheap on ebay (~$1.5).

As you can see from the image, it's composed of a few components like an inductor, capacitors, resistors, etc. You apply a DC input voltage and you get a stepped-down DC output on the other side. You can control the output voltage with a 10KΩ pot (the blue block device with the screw on top). That's great. But... this means that every time you need to change the output voltage, you have to turn the screw several times to get there. The good thing is that with a decent multimeter you can get quite precise output voltages, as the pot is analog. The bad thing is the manual labor every time you need to change the output voltage. But not anymore.

The idea was simply to replace the analog pot with a digital one and then find a way to control it remotely. So, why not do that over a UART port, or even better a USB port? Oh, wait... why not make it WiFi-controlled and give it a web interface, too? And this is how stupid projects are made. To do that, we'll need the following components.

Components

Digi-pot

There are several digital pots on ebay and they are cheap, but the trick here is to find a digi-pot that has enough wiper points (or steps). Why do you need many steps? The step count of a digital pot defines its resolution, so for a 10KΩ pot with 100 steps (like the X9C103P) each step is 10KΩ/100 = 100Ω. That's quite coarse when it's used in a voltage divider like the pot on the LM2596 board. On the other hand, with a digi-pot like the MCP41010, which has 256 steps, each step is only 10KΩ/256 ≈ 39Ω, so the resolution gets much higher. You can find these Microchip digi-pots on ebay for around $1.5 each.
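
To see why the step size matters, look at how the LM2596 sets its output. Per the datasheet of the adjustable version, the feedback divider gives:

Vout = Vref × (1 + R2/R1), with Vref ≈ 1.23V

so every step of the digi-pot (acting as R2) moves the output by a fixed amount proportional to the step size; the finer the steps, the finer the output voltage control. The actual R1 value on these cheap boards may vary, so the per-step voltage change depends on the specific module.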

The MCP41010 is controlled over an SPI interface, which means that you need a micro-controller. That's cool, because it means you can also use an ESP8266 WiFi module, and why not a USB interface too, if the controller comes with one.
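
For reference, writing a wiper value to the MCP41010 is a 16-bit SPI transfer: a command byte (0x11 = write data, pot 0, per the Microchip datasheet) followed by the 8-bit wiper position. Here's a minimal sketch with the STM32 Standard Peripheral Library, assuming SPI1 is already initialized and PA4 is the chip-select (both are hypothetical choices; see the project sources for the real setup):

#include "stm32f10x.h"

#define MCP41010_CMD_WRITE_POT0  0x11  /* command byte: write data to pot 0 */

void mcp41010_set_wiper(uint8_t value)
{
    GPIO_ResetBits(GPIOA, GPIO_Pin_4);  /* assert CS */
    while (SPI_I2S_GetFlagStatus(SPI1, SPI_I2S_FLAG_TXE) == RESET);
    SPI_I2S_SendData(SPI1, MCP41010_CMD_WRITE_POT0);
    while (SPI_I2S_GetFlagStatus(SPI1, SPI_I2S_FLAG_TXE) == RESET);
    SPI_I2S_SendData(SPI1, value);      /* wiper position, 0..255 */
    while (SPI_I2S_GetFlagStatus(SPI1, SPI_I2S_FLAG_BSY) == SET);
    GPIO_SetBits(GPIOA, GPIO_Pin_4);    /* the value latches on the CS rising edge */
}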

Micro-controller

I like ARM processors. My favorite boards are those stm32f103c8t6 boards that you can find on ebay for $2. They are ultra-cheap, they have almost any interface I need for my stupid projects and they also have a very nice API to program them. The stm32f103 has a USB port, more than the 2 UARTs that are needed, and an SPI interface to control the digi-pot.

This board is powered either from the USB connector or from the 5V or 3.3V on-board pins. Never use both at the same time!

ESP8266

The ESP8266 shows up again in this stupid project, just like in the first one. A copy-paste from the previous post follows: it's easy to modify the source code with the SDK and you can use any network-capable device to interact with these modules. I'll write a separate post in the future about the ESP8266 and how you can write your own code for it. You can find them cheap on ebay for around $2.50. There are a few types of this module with different flash sizes (512KB, 1MB), but the only thing you should care about for now is that it needs to support a 9600 baud rate and not only 115200, because I'm using a software serial library for the arduino that behaves much better at lower baud rates. Be aware that the ESP8266 is a 3V3-only device. This is the module:

Output relay

At the output I've used a relay to turn the LM2596's output on and off. You may not want to do that, but it's nice to have. There are many cheap variations of opto-coupled relay boards on ebay that cost $1-$2, like this one

I prefer relays with a high-level trigger, as micro-controllers usually keep their pins low (or floating) after a reset, which is safe because the relay doesn't get activated while the MCU is in reset. You can connect the positive voltage output of the reference power supply to the COM and NC terminals. Also, make sure that the relay is rated for the DC output you're going to use, so don't use a 30V relay to switch 40V. Finally, make sure that the relay can be activated with a 3V3 trigger.
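
Here's a sketch of how the relay pin could be set up with the SPL so that it's driven low (relay off) from the moment it becomes an output; the pin (PB0) is a hypothetical choice, so check the schematic for the real one:

#include "stm32f10x.h"

void relay_init(void)
{
    GPIO_InitTypeDef gpio;

    RCC_APB2PeriphClockCmd(RCC_APB2Periph_GPIOB, ENABLE);
    GPIO_ResetBits(GPIOB, GPIO_Pin_0);  /* pre-load low, so the relay stays off */
    gpio.GPIO_Pin = GPIO_Pin_0;
    gpio.GPIO_Mode = GPIO_Mode_Out_PP;
    gpio.GPIO_Speed = GPIO_Speed_2MHz;
    GPIO_Init(GPIOB, &gpio);
}

void relay_set(uint8_t on)
{
    if (on)
        GPIO_SetBits(GPIOB, GPIO_Pin_0);   /* high-level trigger: relay on */
    else
        GPIO_ResetBits(GPIOB, GPIO_Pin_0);
}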

Step-down (AMS1117-3.3)

You’ll also need a step-down DC power supply module to power the components, like the AMS1117-3.3. There are some cheap pre-soldered modules with the AMS1117 on the ebay, I’ve found 5pcs for less that $1, which means $0.20 for each. They look like this

They are very convenient if you are using a breadboard or a double-sided prototype PCB, as they only have 3 pins and all the needed components are on the module. The module also has an LED that indicates it's powered.

Prototype breadboard

If you want to assemble the circuit without designing a PCB, you can buy one of these cheap prototype boards on ebay for less than $1.


The image shows a pack of various PCB sizes; you'll just need one that fits everything you need to solder.

ST-Link

Finally, you need an ST-Link programmer to upload the firmware. Generally, I have the original ST-Link V2 and the Segger J-Link programmers, but to be honest, most of the time I'm using one of these cheap ST-Link clones, which I find much more convenient on my limited workspace. Of course, I suggest you buy the originals, but if you're also short on space then buy one of the clones from ebay.

Making the stupid project

I haven’t build a PCB for this project and it is only a proof of concept on my breadboard. These are the schematics of the circuit you need to build.

The stm32 is powered either from the USB port, when it's connected to the PC, or from the AMS1117-3.3. Therefore, you need to be careful with that: if you're going to use a USB adapter or connect the stm32 to your PC, then you need to remove the K1 jumper. If you're not going to use the USB interface, then place the jumper.

When you unsolder the blue pot from the LM2596 PSU module, you'll be left with three empty holes on its PCB. The LM2596-POT1 and LM2596-POT2 terminals connect to the PCB holes next to the OUT+ output. There are two main power inputs: VIN, which is connected to the AMS1117-3.3 and powers the circuit, and PSU V+, which comes from the external power supply you're going to use for the LM2596. Therefore, the [LM2596-POT1/2] and [PSU V+ in/out] terminals are connected to the LM2596 PSU. The USB_UART (P1) is optional; it's just the debug UART port.

You can download the source files from my bitbucket project

https://bitbucket.org/dimtass/stm32f103_wifi_usb_psu

Read the README.md file, as it has all the details you need, but I'll still explain some things here. You don't have to build the code to use it, but I suggest that you do, as you'll need to change a few parameters in the source files. The pre-built binaries of the latest build are located in the firmware folder, so you can use the ST-Link utility on Windows to flash the hex file or the st-flash utility on Linux to flash the bin file. To upload the firmware you just need to connect the USB cable to the stm32f103 board and run the flashing commands from the README.md file.
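
On Linux, flashing with st-flash typically looks like the following; the binary name here is a placeholder, so use the exact command and file name from the README.md:

st-flash write firmware/<binary>.bin 0x8000000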

If you need to build the code, you'll find the CMake files and scripts for both Windows and Linux. If you're on Windows, you need to install CMake, make and a gcc toolchain for the ARM Cortex-M3. Read this to see how you can do that.

After you’ve setup everything, all you need to do is run the build.cmd (on Windows) or build.sh (on Linux) to build the code. Cmake will create the binaries in the build-stm/src folder but also will create the .cproject and .project files in the build-stm folder. This means that you can import this project to your eclipse CDT IDE and edit the code in there.

One of the things you'll probably have to edit is the IP address definition (HTTP_IP_ADDRESS) in the src/http_server.h file. This is the address that your AP assigns to the ESP8266 module, which means that you need to configure your AP/router to always assign the same IP to the MAC address of the ESP8266. Then you can re-build the code and flash the new binary on the stm32. It's recommended to erase the stm32's flash before you upload the firmware, as the last 1K of the flash (address: 0x0800FC00) is used to store the configuration data, like the AP SSID, the password and the pre-defined POT values.
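
Assuming the definition is a plain string macro (the exact form may differ, so check src/http_server.h), the edit would look something like this, with an example address:

#define HTTP_IP_ADDRESS  "192.168.1.42"  /* the IP your router reserves for the ESP8266 */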

The first time the stm32 powers up after a firmware update, it will try to load the configuration data from the flash. If it doesn't find a valid configuration, it creates a default one. In the default configuration the stm32 is not able to connect to any AP and the pre-defined POT values are all set to 127 (which means 5KΩ). You can use the USB or UART interface to update the configuration data with the AP SSID and password. To do that, just connect the stm32 via a USB cable to your computer and open a terminal (I always prefer to use Bray's terminal). Whatever terminal you use, make sure that the LF char is handled as (CR & LF). Bray's terminal has that option by checking the [CR=LF] checkbox in the settings area and the [+CR] next to the [-> Send] button.
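
To illustrate the idea of the stored configuration: on the stm32 the flash is memory-mapped, so loading the settings is basically a read from that last 1K page. The struct layout and the magic value below are hypothetical sketches; the real ones live in the project sources.

#include <stdint.h>
#include <string.h>

#define CONF_FLASH_ADDR  ((const void *)0x0800FC00)  /* last 1K page of the 64K flash */

struct conf_data {            /* hypothetical layout */
    uint32_t magic;           /* marks a valid stored configuration */
    char     ssid[32];
    char     pass[32];
    uint8_t  pot_preset[5];   /* pre-defined digi-pot steps, default 127 */
};

static struct conf_data conf;

/* Returns 1 if a valid configuration was found, 0 if defaults should be used. */
int conf_load(void)
{
    memcpy(&conf, CONF_FLASH_ADDR, sizeof(conf));
    return (conf.magic == 0xBEEF1A7D);  /* hypothetical magic value */
}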

If you have already read the README.md file in the project folder, you'll know which commands to use. As an example, I'll suppose that the AP SSID is "Router" and the WPA2 password is "MyPassword"; of course, you need to change these to your own. To update the configuration, send the following commands from the terminal.

SSID=Router
PASS=MyPassword
RECONNECT

The first command stores the SSID, the second the password, and the third one initiates a reconnection to the router. If the last one doesn't work, just remove and re-apply power to the stm32. When the stm32 starts, the LED flashes every 250ms while the ESP is not connected, and once it gets connected it flashes every 500ms. When it's connected, you can finally open your browser and load the web interface by typing its IP address in the address bar. This is what you'll get

This is a very simple HTML page with minimal javascript automation, which means that the web interface will not automatically restore the PSU state, so you don't really know if it's turned on/off and which pre-defined output is used; keep that in mind. On some browsers you may need to do a dummy click on a button first, so you can click the OFF button 1-2 times. As you see, the web interface is quite simple. In the first row there are the 'Power & Trim' buttons that you use to turn the output relay on or off and to trim the output voltage by changing the digi-pot value. The +/- buttons change the digi-pot step by 1 on every click and the --/++ buttons change it by 5. You'll need these buttons to trim the output and save the result to one of the pre-defined values.

Let’s say that it’s the first time you open the web interface after a new firmware update and a full flash erase, so there aren’t any configuration data. Also suppose that you’ve chosen that the pre-defined values will be (2V5), (3V3), (5V), (6V) and (12V) in the src/http_server.c file. Now you need to do the calibration procedure as all the pre-defined values are set by default to step 127 for the digi-pot (5KΩ). First connect the input power [ΙΝ+/ΙΝ-] to the LM2596 (e.g. 12V) and a multimeter in the output to measure the voltage, then click 1-2 times the OFF button, click the pre-defined value you need to set and click the ON button. On your multimeter you’ll see the output value that corresponds to the digi-pot’s step 127 (5KΩ). Now you need to change the output. To do that keep pressing the (–) button (or ++) until you get as close to the pre-defined value. Then use the (-/+) buttons to trim further and get the wanted output. Don’t expect the output to be very precise as the 256 steps for a 10K digi-pot is only 39Ω per step. When you get the nearest value to the wanted one then press the (Save) button and this value will be stored. Then press the next pre-defined button (3V3) and repeat the procedure until you set all the values and this is how the calibration is done. Don’t expect to get 12V in the output with a 12V input as there’s about a 0.7V drop on the LM2596.

The digi-pot is a linear resistor and not logarithmic like the pot you've replaced on the LM2596. This means that if you use a 12V PSU as a reference, you'll get better resolution below 5V, but above that voltage the resolution drops significantly.

If for some reason you change the input voltage on the LM2596, you need to repeat the calibration procedure. It's also good to do it from time to time to be sure the device stays calibrated, and always check the output with a multimeter before you connect a circuit. If you need another output voltage, just connect your multimeter to the output, press the pre-defined voltage that's closest to the wanted output and use the (+/-) buttons to get there.

Finally, you can do all these things by using the USB connection and the commands that are explained in the README.md file. You can also use the USB interface to set the exact digi-pot step value like this.

POT=127

You can set any value from 0 to 255.
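
Since the MCP41010 is a 10KΩ pot with 256 positions, the step value maps roughly linearly to the wiper resistance:

POT=0    ->  ~0Ω
POT=127  ->  ~5KΩ
POT=255  ->  ~10KΩ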

This is a screenshot of how the web interface actually looks in use.

The HTML page is embedded in the src/http_server.c source file. Since it's quite cryptic with the C string formatting, you can right-click in your browser and view the page source, and then make any changes you like. Just don't add too many things in there, because the interface will take longer to load. You can also change the CSS styles to change the colors, and then re-build the code and upload the firmware.

Summary

Well, this is a completely stupid project and it's quite simple to build. All the needed components cost around $10. This is the list of the components and their approximate prices on ebay (you may find them even cheaper).

LM2596 PSU ~ $1.5
MCP41010 (10KΩ digi-pot) ~ $1.5
STM32F103C8T6 dev board ~ $2
ESP8266-05 module ~ $2.5
Optocouple relay board ~ $1.5
AMS1117-3.3 module ~ $0.20
Double-sided prototype breadboard ~ $1

This is an extremely bad photo of everything working on a breadboard.

Have fun!