Embedded Wednesdays: A Crash Course in C - Part 1 - Integers

When we did a post-mortem on the first in-person session of Embedded Wednesdays, the embedded systems programming course that I lead at ENTS, the biggest issue was the need for more information about the C language.

Now that we have had an introduction to embedded systems programming, it’s time to introduce the C language.

#pragma section(introduction)

I covered the history of C in my programming language post, so we know why C is still around, how long it has been here, and that it has a lot of traction in embedded systems. In the next group of posts I will provide an introduction to C. This will be a crash course for some and a review for others.

#pragma section(data, integer)

C provides numerous built-in data types: integers, floating point, pointers, enums, arrays, structures, and unions. This week we will start with integers; later we will move on to the other data types.

Integers are numbers that don’t have a fractional part, just the whole number part. You use integers for counting, calculating, and, in C, defining the size of arrays and controlling loops. Integers are stored in variables, which are memory locations that are given a name so they can be used later. Think of them like the X in all of those algebra problems you had in high school: a name with a value.

As I have said previously, computers are really stupid: they have a restriction on the size of a number that you can hold in a variable. If the number you store is too big for the variable, the computer will store whatever it can and not tell you that it threw the rest away. Suppose we write a program with a variable that can only hold an integer in the range 0 to 99, and we stubbornly try to store the value 100. The program will actually store 0. This is the basis of the Y2K problem that you may have heard of. [1]

How big can an integer be? The C standard only sets a minimum range for each built-in integer type, so the people that write your compiler decide the actual sizes for you, and there is no consensus. You should figure out what range of values the variables can handle before you use them, but nobody ever does and things occasionally break. To remove some of the popular sources of bugs, new header files were added in the 1999 version of C. So let’s introduce our first best practice: don’t use the built-in data types like int and float.

The header file stdint.h that comes with your compiler defines a bunch of integer types that tell you how big they are and if they are signed or not. Our first bit of C code looks like this:

#include <stdint.h>
uint32_t unsigned32BitInteger;   /* unsigned, 32 bits */
int8_t signed8BitInteger;        /* signed, 8 bits */

This bit of code shows how to use the stdint header file, then requests storage for two integers. The first integer uses the uint32_t type and has the name “unsigned32BitInteger”. The second one uses the int8_t type and has the name “signed8BitInteger”.

The names must start with an alphabetic character or the underscore “_” symbol. They cannot start with a digit, and they cannot have special characters in them like dots and splats (‘*’). Upper and lower case characters can be used, but the case is important: ab, Ab, aB, and AB are all different.

There are some common naming schemes for variables. Some people like to use long names with underscores in them sort_of_like_this. Some people will use all lower case, andyougetstufflikethis. All uppercase LOOKSLIKEYELLING. Put in some underscores and_you_get_stuff_like_this and LOOKS_LIKE_YELLING becomes more readable. My favorite is something called camel case whereTheWordsStartWithCapitals. The variable names can be really long, so use the characters available to you and choose names that describe what the variable does.[2]

The general syntax for declaring a variable is the type followed by the name and a semicolon to end the declaration.

Under the hood, the type names in stdint.h get translated into various combinations of the built-in signed and unsigned, char, int, short, and long, but the combinations change with the processor and compiler and this has caused many problems over the years. Here is a table of the data types available in stdint.h that you can expect:

Name        Signedness   Size      Range
int8_t      signed       8 bits    -128 to 127
uint8_t     unsigned     8 bits    0 to 255
int16_t     signed       16 bits   -32,768 to 32,767
uint16_t    unsigned     16 bits   0 to 65,535
int32_t     signed       32 bits   -2,147,483,648 to 2,147,483,647
uint32_t    unsigned     32 bits   0 to 4,294,967,295
int64_t     signed       64 bits   -2^63 to 2^63 - 1
uint64_t    unsigned     64 bits   0 to 2^64 - 1

As you can see, the range of values that integers hold isn’t like our normal counting system. Computers don’t naturally work with 10s, 100s, and 1000s. They work in powers of 2: 2, 4, 8, 16, and so on. So an 8-bit value can take 2 to the power 8, or 256, different values, but since we start at 0 instead of 1, we get a range of 0 through 255.

The ranges for 64 bits were left as powers of 2 because these are huge numbers, +/-9 quintillion or so for signed values. That is big enough for almost anything you might want to do. But unless you have a 64-bit processor, don’t use a 64-bit variable unless you really need to.

On small processors, the 64-bit integer types may not be defined in stdint.h because there is no combination of standard keywords that will give a 64-bit integer. Check stdint.h first, or try this code:

#include <stdint.h>
int main(void) {
   uint64_t reallyBig;   /* only compiles if uint64_t is defined */
   return 0;
}

If this piece of code compiles, you have 64-bit integers available.

If you are working on an 8-bit processor (determined by the width of the registers used for calculations) like the Atmel ATmega328 on an Arduino, you should know that all calculations done with any data type larger than 8 bits will be simulated using 8-bit operations. 32-bit math will become a combination of a bunch of 8-bit operations. These processors are getting really zippy, but can get quite slow when doing 16-, 32-, and 64-bit operations.

This effect is not limited to 8-bit processors though; even a 32-bit ARM processor has to simulate 64-bit operations. On the other hand, calculations done on a data type smaller than the registers are typically done with full-sized instructions and then adjusted, so the speed penalty is minimal or nonexistent.

Now you know why there is a push towards wider and wider processors: they can work with larger values directly, without needing many instructions and complex math to simulate the calculation. They use far fewer instructions to get a calculation done.

If you are working with unsigned values, a natural choice for counting physical things, there can be a slight speed advantage, and you get twice the positive number range of a signed value of the same size.

#pragma section(data, integer, constants)

Constants are values that cannot be changed. You can use them to give a variable its starting value, called initialization.

Integer constants in C are assumed to be signed, so if you have:

#include <stdint.h>
int main(void) {
   uint8_t i;
   i = 42;   /* 42 is a signed constant, converted to unsigned for i */
   return 0;
}

The compiler will generate code to take the signed value +42 and convert it to unsigned, then assign it to i. A bunch of that code will get taken out by the compiler but you can avoid some subtle errors if you use an unsigned constant when doing unsigned math. You make a constant unsigned by placing the character “U” after the value.

This brings us to our second best practice - when using unsigned constants, use the ‘U’ suffix to indicate to the compiler that you are using an unsigned constant. This code becomes:

#include <stdint.h>
int main(void) {
   uint8_t i;
   i = 42U;   /* the U suffix makes the constant unsigned */
   return 0;
}

The suffixes should be capitalized. The compiler lets them be lowercase, but a problem shows up in the suffix for long (at least 32 bits) values, l or L. If you have the number 11 and you want it to be a really big representation of 11, you append an ‘el’ to it and get 11l, which looks a lot like one hundred and eleven. Let’s try 11L. Ah, much better.

You can combine variable declaration and initialization like this:

#include <stdint.h>
int main(void) {
   uint8_t i = 42U;   /* declare and initialize in one step */
   return 0;
}

Next week we’ll look at floats.

 

[1] The Y2K or Year 2000 problem was the result of the choice to store only the last two digits of the year in databases. Disk space was very expensive at one time, and storing only the 84 portion of 1984 took half of the space. Multiply that by a million records and you have saved two megabytes on storing just one date. The problem could be fixed in the future when the software got a rewrite and disk space became cheaper.

Unfortunately the choice to fix the problem was delayed until some programmers figured out that on January 1st, 2000, their computer systems would figure it had suddenly become January 1st, 1900, and testing showed that a lot of systems were going to fail. This included Windows 95, 98, and NT, a lot of the PC hardware that they ran on, embedded systems, their clock chips, as well as the large mainframe computers.

The problem was fixed adequately and the world didn’t implode, but the flurry of activity involved in getting everything compliant involved a lot of overtime, replacement of computers and operating systems, and buying disk drives. Once 2000 started, the computer customers didn’t need any more new computers, software, or contractors and the computer industry went into a great slump.

 

[2] My personal naming scheme, which you can adopt, modify, or reject is:

  • Variables - camel case with the first letter in lower case: engineTemperature.
  • Functions - camel case with the first letter in upper case: CalculateTemperature.
  • Defines and macros - upper case with underscores: QUAN_ADC_CHANNELS.
  • Custom type names - upper case with underscores with _T appended: ADC_CHANNELS_T.

If you need a generic loop counter, use i. If you need another, use j. After that, come up with a name that indicates what the counter does, like pillowCount for counting pillows.

Remember, there are only two things that will ever read your program: the person that has to fix it, and the compiler. The compiler doesn’t care what you call your variables. Write your program for the person that has to fix it in the future; it may be you. Be nice to your future self.


This post is part of a series. Please see the other posts here.


The C section of my library.  
