Definitely Tricky: Downloading Firmware
I mentioned the need for putting an interface on the board for programming access in my previous post in this series about the tricky parts of embedded systems. Those programming pins are the same pins we’d use to program the board in manufacturing. However, once the board leaves the factory, we are going to need another way to download new firmware. Firmware update is very tricky, with many pitfalls.
Have you heard that interview question where you have to get a goat, a cabbage, and a wolf across a river but your boat can only take you and one item at a time? And you have to go back and forth to make it work? Downloading firmware requires many of the same mental gymnastics that solving the goat problem does. The whole process is about moving memory from one place to another, trying to keep the system safe from being bricked or hacked.
Initially, if you are starting out with an Arduino-style system, you may think about using USB to update in the field. It is easy and everyone has USB. But then you find out how cost prohibitive that is. Not only is the hardware expensive, but verifying against your customers' many different operating systems is nontrivial as well. Plus, in the past few posts, I’ve been building up a BLE device so it makes sense to update over the air.
Nordic has an example of how to do a firmware update (DFU in their parlance). Even better, the way the Nordic sample code does the downloads is relatively safe and secure. Still, firmware download is something that I consider to be tricky, even if the vendor code handles it for us. Also, we still need a plan for the ATTiny, something that can work in the real world where connections are lost and batteries die.
Before we figure that out or even dig into Nordic’s DFU example code, let’s look at the standard options for downloading firmware. First, the processor can have an onboard bootloader. This runs before the runtime code and looks for certain conditions (a button pressed, some flash memory address with a particular value, and so on). If those conditions are met, the system goes into bootloader mode where it updates its code. If those conditions are not met, the system runs the normal runtime code. This is option a) in the figure below.
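To make the bootloader's decision concrete, here is a minimal sketch of that check, written in Python as a model rather than as real boot code. The flag value UPDATE_MAGIC, the function names, and the CRC-32 validity check are all my assumptions for illustration; a real bootloader would read a GPIO and a flash word directly.

```python
import zlib

# Hypothetical value the runtime writes to a known flash address to request an update.
UPDATE_MAGIC = 0xB007C0DE


def app_is_valid(image: bytes, expected_crc: int) -> bool:
    """Return True if the stored runtime image matches its recorded CRC-32."""
    return zlib.crc32(image) == expected_crc


def choose_boot_mode(button_pressed: bool, flag_word: int,
                     image: bytes, expected_crc: int) -> str:
    """Decide whether to stay in the bootloader or jump to the runtime code."""
    if button_pressed:
        return "bootloader"      # user is holding the update button
    if flag_word == UPDATE_MAGIC:
        return "bootloader"      # runtime asked for an update before resetting
    if not app_is_valid(image, expected_crc):
        return "bootloader"      # assumed extra check: never run a damaged image
    return "runtime"
```

The last check is what lets a device recover from an interrupted download: the runtime image is damaged, so the bootloader stays in charge and waits for a retry.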
This is how the ATTiny works when we program it from a computer. If we set it up right after boot, the data on the SPI port gets written to the ATTiny’s internal flash and can be run later. If the power is cut partway through the download, the processor may not be able to run the code. However, the processor will be able to run the bootloader so your user (or your application) can retry and recover.
If you were updating the bootloader itself when power is cut, then all hope is lost. You have created a brick! Hopefully you’ll never need to update your onboard bootloader in the field.
Another option, shown as b) in the figure, is to build your own loader. For this path, start with the idea that your runtime code includes the ability to load new code. The new image gets put into a scratch space, usually RAM, sometimes external memory. Then when all of the new code is on the processor (and decrypted and verified), the system updates itself.
In general, this build-your-own-loader is an ok plan as long as there is enough space for a whole second image. It is reasonably stable and unlikely to create bricks; if something happens so that the firmware update doesn’t complete or can’t be verified, the code doesn’t flash the new code to the runtime location. However, it means you have a lot of storage space sitting around doing nothing but waiting for a firmware update. Your program can’t fill all of the flash available. In fact, your program must be less than half the size of your flash. In a resource-constrained environment, wasting that much space may not be an option.
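The stage, verify, then commit flow can be sketched as a toy model. This is Python standing in for firmware, under my own assumptions: the Flash class, the function names, and the even split of flash are hypothetical, and the CRC-32 is only a placeholder for real decryption and verification.

```python
import zlib


class Flash:
    """Toy model of flash split evenly into a runtime region and a scratch region."""

    def __init__(self, total_size: int, runtime: bytes):
        self.region_size = total_size // 2   # the other half is reserved for scratch
        if len(runtime) > self.region_size:
            raise ValueError("runtime image must fit in half of flash")
        self.runtime = runtime
        self.scratch = b""


def stage_chunk(flash: Flash, chunk: bytes) -> None:
    """Append a received chunk to scratch; the running code is not touched."""
    if len(flash.scratch) + len(chunk) > flash.region_size:
        raise ValueError("new image does not fit in scratch")
    flash.scratch += chunk


def commit_if_valid(flash: Flash, expected_crc: int) -> bool:
    """Replace the runtime image only if the staged image checks out."""
    ok = zlib.crc32(flash.scratch) == expected_crc
    if ok:
        flash.runtime = flash.scratch
    flash.scratch = b""                      # clear scratch either way
    return ok
```

If the download is cut short or corrupted, commit_if_valid returns False and the old runtime image is still in place, which is why this approach rarely creates bricks.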
The third option, shown as c) in the figure, is updating from RAM: the runtime code knows how to load a small amount of code into RAM. The server or application in charge of providing the firmware sends a small loader program that knows how to program the rest of the system. This loader runs, receiving and programming the runtime code. This doesn’t require scratch space at all and since the loader is separate, the amount of flash needed for the runtime is smaller than in the previous two options. However, losing power or connection to the server can cause an unrecoverable failure, creating a brick if the loader has erased the old runtime but not yet programmed the new one.
Scratch space can mitigate that danger to some extent, allowing the loader to receive and verify the runtime before blowing away the old code and programming the new. Since the loader is separate, less scratch space is needed.
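A small simulation shows the difference between the two variants. It is a sketch under my own assumptions: the function names and the fail_at parameter are hypothetical, chunks stand in for packets from the server, and it does not model the final erase-and-program step of the scratch variant, which remains a shorter danger window.

```python
from typing import List, Optional


def update_direct(old: bytes, chunks: List[bytes],
                  fail_at: Optional[int] = None) -> bytes:
    """RAM loader with no scratch: erase the runtime, then program chunks as they arrive.

    Returns the runtime flash contents after the attempt.
    """
    flash = b""                      # old runtime is erased up front
    for i, chunk in enumerate(chunks):
        if fail_at == i:
            return flash             # partial image, and the RAM loader is gone too
        flash += chunk
    return flash


def update_with_scratch(old: bytes, chunks: List[bytes],
                        fail_at: Optional[int] = None) -> bytes:
    """RAM loader with scratch: receive everything first, then replace the runtime."""
    scratch = b""
    for i, chunk in enumerate(chunks):
        if fail_at == i:
            return old               # old runtime was never touched
        scratch += chunk
    return scratch                   # verification would happen here before programming
```

Calling both with the same fail_at value shows the point: the direct version leaves a partial image behind, while the scratch version still has the old runtime to boot.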
Enough theory. Let’s look at what we can do in our IoT Light’s bootloader. This is how the Nordic SDK used to do firmware update (now they use option a), but my memory map is from their older SDK. To start with, I need to figure out where I’m going to store all of the images before verifying and programming them.
The Nordic firmware downloader doesn’t know about the external ATTiny processor’s firmware. But we definitely want to be able to update that code in the field. So we tack it onto our normal runtime image with a little Python script and sneak it into the code. That means our scratchpad area for firmware update has to accommodate both the Nordic image and the ATTiny one.
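A packing script along these lines might look like the sketch below. The header layout, the MAGIC value, and the function names are hypothetical, invented here for illustration; the actual script and image format may be quite different.

```python
import struct
import zlib
from typing import Tuple

# Hypothetical 16-byte header: magic, Nordic length, ATTiny length, CRC-32 of both payloads.
HEADER = struct.Struct("<4sIII")
MAGIC = b"FWUP"


def pack_images(nordic: bytes, attiny: bytes) -> bytes:
    """Tack the ATTiny image onto the Nordic image behind a small header."""
    payload = nordic + attiny
    header = HEADER.pack(MAGIC, len(nordic), len(attiny), zlib.crc32(payload))
    return header + payload


def unpack_images(blob: bytes) -> Tuple[bytes, bytes]:
    """Split a combined image back into (nordic, attiny), checking lengths and CRC."""
    magic, n_len, a_len, crc = HEADER.unpack_from(blob)
    payload = blob[HEADER.size:]
    if magic != MAGIC or len(payload) != n_len + a_len or zlib.crc32(payload) != crc:
        raise ValueError("bad combined image")
    return payload[:n_len], payload[n_len:]


def fits_scratch(blob: bytes, scratch_size: int) -> bool:
    """Check that the combined image fits in the scratchpad area."""
    return len(blob) <= scratch_size
```

Running fits_scratch as part of the build is a cheap way to find out that the combined image has outgrown the scratchpad before a customer does.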
Note that when we made the initial proof-of-concept, we didn’t have to think about any of this. As we built up the systems from the available (and cheap) parts, how to update firmware wasn’t on the radar as a feature critical to getting to the minimum viable product. However, it is required if we want to be able to fix bugs in the field. Given the schedules these days, sadly, firmware update is often required if we want to be able to supply the promised features in the first place.
Firmware update is often treated as an afterthought, an expensive one. I recommend implementing it very early so it gets a lot of in-office testing. It is the sort of code that must be stable before it gets into your customer’s hands.
Before we finish with firmware update, I want to mention security. This seems like a natural place because your company probably wants you to make sure no one else can read the device’s code, even the binary. They (should) see firmware update as a weakness, one that can affect the bottom line. They will likely give you time to secure the bootloader and update process.
I hope you think about more than that. You have a responsibility to your users to keep their data private and to keep your device from harming other systems.
There is no universal answer here; sadly, there is no secret key to make security simple. Worse, good security is a moving target that is always getting harder. I have a device security checklist blog post for you to consider but no easy implementation advice beyond the basics of the Top 10 Secure Coding Practices from CERT.org. Overall, Nordic does an ok job of securing your image if you set up BLE security. However, BLE security is never very good.
Let’s consider the security around firmware update in general. In fact, let’s assume that hackers can inject their own code into your device and use it the way they want. How does that change what you do? If your security is good enough, then a hacker may only be able to hack the device (read your code or inject theirs) with physical access to the device. So assume that they’ve done that on one device: does it endanger your server, your customer data, and your other devices?
The other (more corporate) side of the coin is whether or not someone can gain access to your code so their illicit firmware updates match yours. Decapping a chip to look inside at the physical arrangement of memory is slightly expensive. But if someone really wants your executable, they can get it.
If a hacker physically has your device, they get to use all of your test points and debugging features. They can figure out what you are doing and make their own version of the hardware and the software. They can do these things if they care enough and have the resources to pull it off.
Determining how much a hacker is likely to care is part of creating a threat model. Creating a threat model early on helps guide design decisions. It lets you ask questions and get answers. Overall, you do the best you can with security but if you make an item with high profit margins, well, the possibilities are terrifying and far beyond my scope here.
One heartfelt bit of advice: don’t roll your own security. Use standard methods. Get all the help you can from experts. Get your methodology and implementation audited. Security is complicated, and you will make mistakes in ways you will never test for.
It isn’t just a matter of protecting your code during firmware updates, but if that’s how to start the conversation at your company, good, get it started. One final note: I think this is going to be the topic of an increasing number of legal actions (lawsuits and regulatory action), so not doing security well will be an expensive proposition.
This blog post is part of a series based on the “Embedded Systems: The Tricky Parts” talk I gave to the Silicon Valley IEEE Computer Society. A video of the talk is available on YouTube.