A kernel-mode solution for a real-time Raspberry Pi problem

Saturday, 8 June 2013

I had an idea to extend my Raspberry Pi-based heating system with the ability to control Home Easy sockets around my house. Home Easy sockets look like this:
Home Easy remote control and socket
The socket (right) is remotely-controlled by the handset (left). Each socket can be linked to three or four handsets. When a button on the handset is pressed, the socket switches on (or off). It's a nice way to control lighting, though the socket can be used with almost anything that plugs in.

The remote control messages are sent by radio at 433.92MHz. This frequency is commonly used for short-range remote control applications (the key fob for my car uses the same frequency) and so there are low-cost plug-and-play modules tuned for this frequency, like these:
433MHz transmitter (top) and receiver (bottom) modules
My initial plan was to use a microcontroller as a bridge between the RPi and the 433MHz transmitter module. I bought an Arduino Nano, which has a USB interface and can be easily programmed from a PC. The Nano's Atmel AVR microcontroller is a small, complete computer in a single chip. I wrote software for the microcontroller to send and receive Home Easy codes.
The first step in writing such software is to capture the raw signals. You write a program to capture the on/off pulses from the 433MHz receiver module. Here is an example:
Trace for a remote control "on" signal from handset number 0x31337, page 1 button 1
Having captured a few traces of this sort, it becomes clear that (1) the same code is always sent by the same button on the same remote, and (2) the signal is divided into four basic parts:
  1. Preamble (the same for all remotes),
  2. Remote identifier ("unique" code for each remote),
  3. Command (e.g. on/off),
  4. Button identifier.
The entire code may be represented as a 32-bit number. The preamble is 4 bits, the remote identifier is 20 bits, the command is 4 bits, and the button identifier is 4 bits. A high bit (logic 1) is sent by:
  1. transmitting a signal for 220 microseconds,
  2. a space for 1330 microseconds,
  3. a signal for 220 microseconds,
  4. a space for 320 microseconds.
A low bit (logic 0) is sent by swapping steps (2) and (4), so the short space comes before the long space.

There are two subtleties. Firstly, the first bit of the preamble follows a "start" bit (220 microseconds high) and a "start" space (2700 microseconds). Secondly, the gap between the end of one code and the start of the next should be 10270 microseconds. It seems to be important to send each code more than once. (The receiver apparently waits for two or more matching codes before taking action.)
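The encoding described above can be sketched in C as follows. This is an illustration, not the code from my Arduino program: the names pack_code and build_schedule are hypothetical, and I am assuming that the four fields are packed in the order listed, most significant bit first, and that bits are transmitted in that same order. The inter-code gap is left as a separate constant for the caller to apply between repeats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Timings in microseconds, taken from the description above. */
enum {
    PULSE_US       = 220,    /* every "signal" period */
    LONG_SPACE_US  = 1330,
    SHORT_SPACE_US = 320,
    START_SPACE_US = 2700,   /* space after the start bit */
    GAP_US         = 10270,  /* idle time between repeated codes */
    CODE_BITS      = 32,
    /* start pulse + start space, then four entries per bit */
    SCHEDULE_LEN   = 2 + 4 * CODE_BITS
};

/* Pack the four fields into one 32-bit code.
 * Assumed layout: preamble[31:28] remote[27:8] command[7:4] button[3:0]. */
static uint32_t pack_code(uint32_t preamble, uint32_t remote,
                          uint32_t command, uint32_t button)
{
    return ((preamble & 0xFu) << 28) | ((remote & 0xFFFFFu) << 8) |
           ((command & 0xFu) << 4) | (button & 0xFu);
}

/* Fill out[] with alternating durations: even indices are "signal"
 * (transmitter on), odd indices are "space" (transmitter off).
 * The gap between codes is not included. Returns the entry count. */
static size_t build_schedule(uint32_t code, uint32_t out[SCHEDULE_LEN])
{
    size_t n = 0;
    int bit;

    out[n++] = PULSE_US;          /* start bit */
    out[n++] = START_SPACE_US;    /* start space */

    for (bit = CODE_BITS - 1; bit >= 0; bit--) {   /* assumed MSB first */
        int one = (int)((code >> bit) & 1u);

        out[n++] = PULSE_US;
        out[n++] = one ? LONG_SPACE_US : SHORT_SPACE_US;
        out[n++] = PULSE_US;
        out[n++] = one ? SHORT_SPACE_US : LONG_SPACE_US;
    }

    assert(n == SCHEDULE_LEN);
    return n;
}
```

A sender would walk the resulting array, switching the transmitter on for each even-indexed duration and off for each odd-indexed one, then stay idle for GAP_US before repeating the code.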

Generally, a microcontroller is a great platform for any embedded system application requiring precise timing, and the business of sending and receiving Home Easy codes certainly requires this. There is some tolerance for imprecision, but not much. The pulses must have timing similar to that shown above. Too long or too short, and the message will not get through.

In other words, it's real-time! And it's a great example of the (trivial) solution to many problems in real-time computing, namely "use a microcontroller".

My Arduino program (here) makes use of the "micros()" library function to sample a microsecond-resolution clock. It uses this clock to determine when to change the state of an output pin in order to send a specific 32-bit Home Easy code. Using this, I was able to (1) discover the remote identifiers for all the remotes in my house, and (2) control any of my Home Easy sockets from my PC.

But the next step did not go so well. I had intended to simply plug the Arduino straight into the RPi via its USB port. That way, there is no need to change the RPi's configuration beyond running a new program to communicate with the Arduino. All of the time-critical sending and receiving work would be done on the microcontroller.

But I hit a snag, because the Arduino Nano's USB chipset does not work perfectly with the RPi. The problem looks like virtually all Linux-related problems: it doesn't work, there's a weird error message, you search the web for the error message, and you find forums that are full of bad advice and "works for me". Some Arduinos work, others don't. There are solutions for related problems which are superficially similar. There are bizarre workarounds that make matters worse. Meanwhile, the real issue is some driver or firmware bug, and nobody with the required skills has got to the bottom of it yet.

Recognising this problem (I have a lot of Linux experience), I abandoned the plan to use the Arduino and instead connected the 433MHz transmitter module directly to the RPi's GPIO output. Electrically speaking, this is fine, because the 3.3V logic level from the RPi is enough to control the transmitter module. In programming terms, it is also fine, because programs on the RPi can directly control the GPIO pins.

But in real-time terms, it is not fine. How can we obtain microcontroller-like precise timing from the RPi?

ARM CPUs can be used as microcontrollers. When ARM decided to rebrand their CPUs under the "Cortex" name, they introduced three subtypes of CPU. Application (A) CPUs like the A9 were intended for user-interface tasks, like running Android on a smartphone. These are high-performance designs. Real-time (R) CPUs were intended for complex microcontroller tasks, like running wireless interface ("baseband") software on the same smartphone. These are high-predictability designs. And Microcontroller (M) CPUs were intended for simple microcontroller tasks, like charging a battery and sensing temperatures. These are space-saving, ultra-low-power designs.

The CPU in the RPi predates the Cortex rebrand, but it's plainly the Application variety. It has all of the frills available at the time. It's not intended to run real-time applications, or to act as a microcontroller. Can it operate predictably?

The answer is that it certainly can act predictably enough. Microsecond precision is possible. The thing that acts against it is not hardware at all, but rather Linux itself.

My initial attempt to move the Home Easy control software to the RPi involved a conventional "user-mode" Linux application, written in C. It used the "gettimeofday" system call to obtain a microsecond-level clock, and it ran with the maximum priority allowed for applications on Linux.
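The timing loop in such a user-mode program might look roughly like the sketch below. It is not the original application. The function names are hypothetical, the "pin" is a plain variable standing in for the real GPIO output, and requesting the SCHED_FIFO policy is only one way of asking for maximum priority (I am assuming it here; it normally requires root). The key idea is to busy-wait on absolute deadlines derived from gettimeofday, so that a late wake-up on one pulse does not push every later pulse back as well.

```c
#include <sched.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Current time in microseconds, read with gettimeofday(). */
static uint64_t now_us(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (uint64_t)tv.tv_sec * 1000000u + (uint64_t)tv.tv_usec;
}

/* Spin until the clock reaches the absolute time deadline_us. */
static void wait_until_us(uint64_t deadline_us)
{
    while (now_us() < deadline_us)
        ;   /* busy-wait: sleeping would hand the CPU to the scheduler */
}

/* Ask for the highest real-time priority (assumed mechanism: SCHED_FIFO).
 * Returns 0 on success, -1 on failure, e.g. when not running as root. */
static int request_max_priority(void)
{
    struct sched_param sp;

    sp.sched_priority = sched_get_priority_max(SCHED_FIFO);
    return sched_setscheduler(0, SCHED_FIFO, &sp);
}

/* Drive *pin through alternating on/off periods: even entries of
 * durations_us[] are "on" times, odd entries are "off" times.
 * Each deadline is computed from the previous deadline, not from the
 * moment the previous wait happened to finish. */
static void play_schedule(const uint32_t *durations_us, size_t n,
                          volatile int *pin)
{
    uint64_t deadline = now_us();
    size_t i;

    for (i = 0; i < n; i++) {
        *pin = (i % 2 == 0) ? 1 : 0;
        deadline += durations_us[i];
        wait_until_us(deadline);
    }
    *pin = 0;   /* leave the transmitter off */
}
```

Even with the priority request granted, nothing in this loop stops the kernel from taking an interrupt in the middle of wait_until_us, which is exactly the weakness described next.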

Most of the time, it met the deadlines for controlling the 433MHz transmitter, and Home Easy messages got through correctly. But this could not be guaranteed. For instance, receiving a network packet would disrupt the transmission, as in the following example:
Partially corrupt transmission (e.g. around 58 milliseconds)
If the system were heavily loaded, e.g. running another application, then there would be many deadline misses:
Corrupted transmission caused by execution of other programs on RPi
This was disastrous. The transmissions were garbage. In most cases they simply did not get through. In some cases they might even be misinterpreted by a receiver! This would never happen with a microcontroller.

Fortunately, there is a way to get close to the predictability required. But we cannot do it with a conventional Linux application, running in user mode. We must move the Home Easy software into the kernel, where it has the privileges needed for precise timing.

I rewrote the software as a "kernel module", an add-on driver that may be loaded into the kernel after system startup. It's very similar to the user-mode application. It uses the internal Linux kernel function "do_gettimeofday" to obtain a microsecond-accurate clock. It transmits codes that are written to the device "/dev/tx433". For example, to send the code shown above, I can enter the following shell command:
echo 03133790 > /dev/tx433
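The text written to the device is eight hexadecimal digits, and echo appends a newline. As an illustration of the parsing step a driver like this has to perform, here is a small sketch in ordinary C so that it can be tried outside the kernel. The function name parse_code is hypothetical and this is not necessarily how the module does it; inside the kernel, a helper such as kstrtou32 with base 16 would be the more usual choice.

```c
#include <ctype.h>
#include <stdint.h>

/* Parse text such as "03133790\n" into a 32-bit code.
 * Accepts 1 to 8 hex digits followed by an optional newline.
 * Returns 0 on success, -1 if the text is malformed. */
static int parse_code(const char *text, uint32_t *code)
{
    uint32_t value = 0;
    int digits = 0;

    while (isxdigit((unsigned char)*text)) {
        char c = *text++;
        int d = (c >= '0' && c <= '9')
                    ? c - '0'
                    : tolower((unsigned char)c) - 'a' + 10;

        if (++digits > 8)
            return -1;              /* would not fit in 32 bits */
        value = (value << 4) | (uint32_t)d;
    }

    if (*text == '\n')
        text++;                     /* newline added by echo */
    if (digits == 0 || *text != '\0')
        return -1;                  /* empty input or trailing junk */

    *code = value;
    return 0;
}
```

Rejecting anything that is not a clean hexadecimal number matters more in a driver than in an application, because a bad parse here would end up as a radio transmission.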

However, merely moving the application into the kernel does not prevent other parts of the kernel preempting the transmission process. A further step is needed. Here's the code:
unsigned long flags;
local_irq_save(flags);
transmit_code(code, 10);
local_irq_restore(flags);
This is the ultra-privileged operation that is only allowed within the kernel: interrupts are disabled. Because the RPi has a single CPU core, this means that no other code can run while the "transmit_code" function is executing, so it has completely exclusive use of the CPU. Interrupts are re-enabled when "transmit_code" is done. In effect, "transmit_code" becomes the highest-priority task on the RPi.

This is sufficient to get the microcontroller-like behaviour that is required. Transmissions work perfectly. The timing is accurate to the microsecond.

It is possible that features of the ARM CPU architecture would act against predictable operation if sub-microsecond timing were required, but there is no need for that here. The problem, in this case, is the non-real-time design of Linux. The workaround is to demand higher privileges.

The disadvantage is that the CPU is otherwise unavailable during transmission. During this time, network packets may be lost, and no other code can run. That would be a big deal in a more complex system handling other real-time tasks. That's the sort of situation where you would need a proper real-time operating system (RTOS) which would make timing guarantees that ordinary Linux cannot. Or, perhaps, a number of microcontrollers!
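To put a rough number on that cost, the timings given earlier can be added up. This is an estimate under two assumptions that I have not spelled out above: that the second argument to "transmit_code" is a repeat count, and that the full 10270 microsecond gap follows every repetition.

```latex
\begin{align*}
T_{\text{bit}}   &= 220 + 1330 + 220 + 320 = 2090\ \mu\text{s} \\
T_{\text{code}}  &= (220 + 2700) + 32 \times 2090 + 10270 = 80070\ \mu\text{s} \approx 80\ \text{ms} \\
T_{\text{total}} &= 10 \times T_{\text{code}} \approx 0.8\ \text{s}
\end{align*}
```

On those assumptions, each command keeps interrupts disabled for the better part of a second, which is tolerable for a heating controller but would not be for a system with other deadlines to meet.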

Postscript: Here is the module source code. I followed the guide here to compile a kernel for the RPi from source, as this is easier than trying to build modules for the kernel supplied by Raspbian. (Modules aren't interchangeable between different Linux kernels.) I added my module to the drivers/misc/ directory.