It's Dangerous to go Alone! Take MISRA-C
Frequently, we talk about how a little compassion and respect for the future can make a big difference for code quality. I purposely avoided many specifics to focus more on the motivations for taking greater care with our work. This week, I’d like to address something very specific and perhaps something that could form a foundation for writing better code.
On Episode 99 (You Can Say a Boat) with Andrei Chichak, we talked about MISRA-C. Now, that may sound like a disease or vitamin supplement, but it’s actually a comprehensive set of guidelines for using the C programming language. MISRA stands for the “Motor Industry Software Reliability Association”, a group founded by the British auto industry in the late 90’s for the purpose of improving robustness and reliability in vehicle software.
The particular industry doesn’t matter though; reliable and robust code should be a goal for everyone writing software. Thus guidelines like MISRA-C can provide a good starting point for your own personal (or organizational) coding standards. Unfortunately, the MISRA-C documents are not free and open, so I can’t directly quote rules here. However, I can talk about some similar coding standards for which your tools might already test with the right options turned on, and, which in most cases, overlap with MISRA.
Check for Uninitialized variables (-Wuninitilized in gcc)
How is an uninitialized variable handled in your particular language? Even if you know, it’s probably not wise to rely on implicit behavior as it is easy to forget how it works. In C there are different behaviors depending on where the variable declaration occurs. It’s trivial to make a mistake of habit when there are subtle differences like that. For instance, the following code contains a big bug:
// Return the pointer to the first instance of ch in str
char *find_char(char ch, char *str, uint16 length)
{
char *ptr;
for (index = 0; index < length; index++) {
if (ch == str[index]) {
ptr = &str[index];
break;
}
}
return ptr;
}
What happens if ch does not occur in str? Well, the return value is going to be undefined. Now, this example is contrived, and there are lots of ways to implement this without the intermediate variable. However, at a cursory glance, it may not be obvious there’s a problem here because it is easy to see that the code works for everything but the error case. Unit tests might catch this, if you remember to test the wrong path as well as the right ones.
Remember, the issue with C is complicated by the different behavior depending on where the variable is declared. For example, a minor change to this code will work just fine (the first time!):
// Return the pointer to the first instance of ch in str
char *find_char(char ch, char *str, uint16 length)
{
static char *ptr;
for (index = 0; index < length; index++) {
if (ch == str[index]) {
ptr = &str[index];
break;
}
}
return ptr;
}
In C, normal variables allocated on the stack (within functions) are uninitialized and will have whatever value the stack memory held at the time of declaration. Static variables (those declared outside functions (aka globals) or within functions with the static keyword) are implicitly initialized to zero if they don’t have another initialization value. This inconsistency is what leads to errors. You probably wouldn’t want to make ptr a static (but if you were writing a strtok like function, you can see how locals vs statics and globals might be confusing).
In C++, object member variables are also uninitialized, which might be even more unexpected to developers coming from other OOP languages.
When in doubt, explicitly set your variables to a value. In fact, skip the when in doubt part.
Your compiler can help you out too in case you forget, turn on the warning for uninitialized variables and it will keep you out of trouble.
Implicit type conversion/promotion (integer promotion, usual arithmetic conversions)
A common theme throughout all the MISRA rules is to avoid implicit behavior. Not only is it not portable, experience tells us that’s where many errors creep into systems.
There are lots of operations in C that result in some conversion between types if the compiler isn’t told what to do. Those conversions are governed by a complex set of rules that I challenge any developer to rattle off the top of their heads.
One of the worst and subtlest offenses in this class of bugs is conversion between signed and unsigned. For example:
int32_t a = -1; uint32_t b = 1; printf(“%d”\n”, a < b);
Don’t look it up, think about what happens. Stumped? If you answered 1 or true, think again. Still stumped? Well don’t feel bad, it’s really subtle. Notice that we are dealing with two different types, signed and unsigned integers. Perhaps your intuition tells you that C would be sensible and convert b to a signed integer to evaluate the comparison. What really happens is the C compiler converts a to an unsigned and since a’s real value in hex is 0xFFFFFFFF, that’s a really big number.
Your intuition has run aground on the rocks of C’s arcane implicit behavior. If you are sensing a trend here, it’s intentional; many of MISRA’s rules attempt to supersede implicit actions by demanding that you ask for what you really want.
Use standard types
This is a big one, and if you are like me and learned C back in the 80’s and early 90’s, you might habitually use int, short, char, etc. as your basic data types. You might also have some notion that each of those types are always the same size (int’s are 4 bytes, shorts 2, and chars are 1, right?).
In the embedded world, and especially if you are writing portable code, this can cause a lot of trouble. The interpretation of these basic types is often up to the compiler and architecture. Thus, chars can be both signed and unsigned. Integers can vary in size from 16 to 64-bits. Mix this confusion with the implicit type promotion rules and you can set yourself up for disaster.
Include <stdint.h> and explicitly ask for the storage size and signedness you desire. Use typecasts on the right-side elements (rvalues) of expressions to make sure that the compiler is promoting types in a way you want.
Use parentheses judiciously (and avoid operator precedence problems)
As a recovering math major, I like to think I have a good handle on how arithmetic operator precedence. You know, things like 4*5 + 10*5 and knowing that the multiplications will be performed first. C has a way of humbling all of us self-described smarty-pantses. Here’s an example that tripped me up recently:
int Z = X + Y << 2;
Now, I have this unfortunate habit of treating right and left shift operators (<<, >>) as multiplies and divides by factors of 2 in my mind. Thus, one would think the shift should be evaluated first, since it’s equivalent to a divide. In actuality, C puts a lower precedence on the shift operator than addition and subtraction. So X+Y is evaluated first, then shifted. Oops.
You can either try to remember things like this, or you can once again be explicit:
int Z = X+ (Y << 2);
Problem solved, and I think in a way that is far more readable. If you forget this one, you may be lucky enough to get a compiler warning (gcc did in fact warn me by default when I tried the first example, while IAR did not). Turn it on if you have it available, but also just put parentheses everywhere, they are free.
Don’t use Goto/continue
Just don’t.
Summary
A lot of these things come down to a few simple ideas:
- Be explicit, especially where implicit rules might be confusing
- Avoid language features that are ambiguous or unexpected.
- Avoid assumptions about the underlying platform.
- Turn on all the warnings you can and make warnings equal errors in your compiler to keep yourself honest.
If you would like to use the actual MISRA-C rules in your projects, there a few ways to go about it, some of which may already be available to you depending on your tools. IAR, for example has the ability to statically check your code against the entirety of the MISRA rule set:
You can also purchase the MISRA C documentation from MISRA at http://www.misra.org.uk. As of this writing, it is £15 for a pdf copy.
You will probably find a subset of the rules more applicable to your particular needs than the entire collection. Some are very constraining (particular when working close to the hardware), and as our guest Andrei pointed out, you can still be compliant with MISRA while not using every rule by noting deviations. Having any rules at all is a big step up for some of us, and is often the first step toward quality code.
Some useful resources for safe C programming:
Books
- C Traps and Pitfalls by Andrew Koenig
- Safer C by Les Hatton
Software
- PC-Lint from Gimpel Software. A static analysis tool that can check for MISRA compliance and more.