Introduction to DMA

I’m not going to tell you how to get DMA working on your system; I don’t know your processor. Instead, my goal is to give you some insights into how DMA works and why we use it to make our systems faster. By the way, DMA stands for direct memory access. That part will make sense soon.

Let's say you have a system that needs to transfer data from place to place. In fact, let's say you have this monitoring system, though the details aren't important. The data from the ADC goes out over SPI to an SD card and over USB to the computer. The UART is used for command handling.

Block diagram of data collection embedded system where the data flows from an ADC to a computer (and local SD card).

While there are lots of boxes, the processor (inside the box) here is doing very little. It gets busy when an ADC interrupt comes and data has to flow through the system.

The hardest thing the processor has to do is copy the data from place to place. But copying things from place to place is a silly thing for a processor to do. I mean, the processor copies data from one place to another. The CPU touches each and every byte. But it doesn’t have to.

If we wanted to use the processor for something useful (such as analyzing this pile of data), we could offload copying into the magical technology of DMA.

I have two metaphors for how DMA works. The first one is terrible but fun. The second is also terrible but also fun. Hopefully between the two, the reasoning and the concept make sense.

I really like these Inkarnate maps so let's look at a different system but in the style of an adventurer map. Please take a look at a similar system to the above, though this one has a display and more of an algorithm.

Adventurer’s map of a processor with included ADC and display port.

This map is a representation of a system with some peripherals and some memory (RAM). The CPU is in the center, connected to Memory Lake with Backenforth Falls. Our circular buffer is there, spinning buffers around and around like a water wheel.

See how everything talks to the CPU?

(And by the CPU, I mean the core of your processor, such as the Cortex-M4F part of the chip. This isn’t like when I talk about microcontrollers or microprocessors; those are the whole chip, which includes a CPU core.)

Anyway, everything talks to the CPU. The CPU doles out memory as it is needed, sending it over these busses, err, bridges. Any memory transfer or copy or math operation, it all goes through the CPU.

Imagine if these peripherals didn’t need to talk to the processor to get to Memory Lake. What if they made a tunnel through the mountains? Sure, the initial setup would be painful. But once a tunnel was done, the peripheral could put things in the lake and take them out without waiting for CPU cycles.

In this metaphor, those tunnels are DMA, direct memory access.

It is a pain to set up but it frees your processor to do things other than copy in and out of memory. Even better, it means your peripherals don’t need to wait on the CPU to have time to receive or transmit their data.

But how?!? How do you build those tunnels? (Err, set up DMA?) Well, you will need your processor’s documentation because how you set up DMA is different for each processor (and sometimes for each peripheral within a processor). However, there are some commonalities. But for that I need to switch metaphors.


Are you ready? It is a big switch. Maybe take a breath or two. 


Ok, say you have dry cleaning and you pick it up and take it home, put it in the closet. Whether you do it weekly or daily, it is just a stop on the way home. Easy enough. But say you get busy with work and life and wish someone else could do it for you.

However, you know how sometimes it is easier to do something yourself than to explain to someone else how to do it? How delegating something means explaining all the details?

So let’s say your boss notices you are busy and gives you a couple of assistants, each able to do one chore at a time. Each assistant is skilled in only a few specific types of chores.

You allocate one assistant to pick up your dry-cleaning. Of course, you have to instruct the assistant:

  • This is my dry cleaner address.

  • Here’s how many items to pick up and what kind.

  • Here is where they go in my closet.

  • Here’s what to do if my closet is full.

  • Here is what to do in case of error.

And that’s just the general outline. When the time comes, you have to tell the assistant:

  • Ok, here is my current dry cleaning ticket.

  • Go, go pick it up now.

  • Text me when you are done so that I know I have clothes available to wear.

Some of this you can tell the assistant ahead of time because it doesn’t change (like where the dry cleaner is, or what to do if the closet is full). Other things may change with each chore (like the number of items to pick up and the fact that something is ready now).
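To make the delegation list a little more concrete, here is a minimal sketch in C of how those instructions might split into "told once" and "told every time." Every name here (dma_channel_config_t, dma_request_t, dma_check_request) is made up for illustration; no real vendor's DMA registers look exactly like this, so treat it as a picture of the idea rather than anyone's actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types: the split between "set once" and "set per chore"
 * is the point, not the exact fields. */

/* Told to the assistant once, ahead of time. */
typedef struct {
    volatile const uint8_t *source;  /* dry cleaner's address (peripheral data register) */
    uint8_t *dest_base;              /* where things go in the closet (RAM buffer) */
    size_t   dest_capacity;          /* how big the closet is */
    void   (*on_done)(size_t count); /* "text me when you are done" */
    void   (*on_error)(int code);    /* what to do in case of error */
} dma_channel_config_t;

/* Told to the assistant for each chore. */
typedef struct {
    size_t count;        /* how many items are on today's ticket */
    size_t dest_offset;  /* where in the closet to start hanging them */
} dma_request_t;

enum { DMA_OK = 0, DMA_ERR_CLOSET_FULL = 1 };

/* "Here's what to do if my closet is full": refuse the chore up front
 * instead of letting the transfer run past the end of the buffer. */
int dma_check_request(const dma_channel_config_t *cfg, const dma_request_t *req)
{
    assert(cfg != NULL && req != NULL);
    if (req->dest_offset > cfg->dest_capacity ||
        req->count > cfg->dest_capacity - req->dest_offset) {
        return DMA_ERR_CLOSET_FULL;
    }
    return DMA_OK;
}
```

On real hardware the first struct tends to be written during initialization and the second just before each transfer starts, which is why the setup pain is mostly paid once.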

I don’t know how to build a tunnel but I do know how to delegate. Sometimes DMA is harder than doing it yourself. Other times, it is a luxury to have someone else doing your chores for you.


So that's all metaphor. Let's go back to reality. Here is a sequence diagram of an ADC getting data normally.

Again, the processor core touches every byte of data coming in.
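In code, the no-DMA version might look something like the sketch below. It is written to run on a desktop, so the SPI peripheral is faked: spi_transfer_byte() and the counting pattern it returns are stand-ins I invented, not a real driver call. The cpu_byte_touches counter exists only to make the cost visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulation only: a fake ADC that returns 0, 1, 2, ... on each read. */
uint8_t fake_adc_next = 0;

/* Hypothetical stand-in for a blocking SPI driver call: sends one byte
 * and returns the byte clocked in at the same time. */
uint8_t spi_transfer_byte(uint8_t tx)
{
    (void)tx;                /* the fake ADC ignores what we send */
    return fake_adc_next++;
}

/* Counts how many bytes the core personally moved. */
size_t cpu_byte_touches = 0;

/* What the ADC "data ready" handler does in a design without DMA:
 * the core waits on each byte, then copies it into RAM itself. */
void adc_read_without_dma(uint8_t *dest, size_t count)
{
    assert(dest != NULL);
    for (size_t i = 0; i < count; i++) {
        dest[i] = spi_transfer_byte(0xFF);
        cpu_byte_touches++;
    }
}
```

Reading a 16-byte sample this way costs 16 trips through the loop, and the core can do nothing else while it waits on each one.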

On the other hand, fortified with the idea of DMA as a dry cleaning micro-assistant, here is some pseudocode for setting up the DMA SPI ADC along with its sequence diagram.

In this example, I’m not incrementing the send buffer from processor to ADC, so the processor will always send 0xFFs to the ADC but receive good data in an incrementing buffer. Once the transfer is complete, I’ll immediately set up another DMA transfer so it is ready: when the ADC has data, its interrupt GPIO can fire off the DMA process that gets the data. Meanwhile, the processor can do whatever needs doing until a new batch of data is available.
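As a rough illustration of that arrangement, here is a desktop-runnable C sketch with the hardware simulated. None of these names come from a real part: dma_channel_t, adc_dma_arm(), and the sim_ function are all hypothetical, and the two alternating receive buffers are my assumption about where "another DMA transfer" should land. On a real chip, the loop inside sim_adc_data_ready_trigger() is done by the DMA controller, not by the core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_BYTES 8
#define NUM_BUFFERS  2   /* assumption: two buffers used alternately */

/* Hypothetical channel description; real controllers spread these
 * fields across several registers. */
typedef struct {
    const uint8_t *tx_src;  /* what to clock out to the ADC */
    int      tx_increment;  /* 0: resend the same byte every time */
    uint8_t *rx_dest;       /* where received bytes land */
    int      rx_increment;  /* 1: advance through the buffer */
    size_t   count;         /* bytes in this transfer */
    int      armed;         /* configured and waiting for a trigger */
} dma_channel_t;

const uint8_t dummy_tx = 0xFF;
uint8_t rx_buffers[NUM_BUFFERS][SAMPLE_BYTES];
size_t fill_index = 0;       /* buffer the next transfer writes into */
size_t buffers_ready = 0;    /* completed buffers waiting for the CPU */
dma_channel_t adc_dma;
uint8_t last_tx_seen = 0;    /* simulation only: last byte sent to the ADC */
uint8_t fake_adc_value = 0;  /* simulation only: fake ADC counts upward */

/* Per-transfer setup: point the channel at the next buffer and arm it. */
void adc_dma_arm(void)
{
    adc_dma.tx_src = &dummy_tx;
    adc_dma.tx_increment = 0;              /* always send 0xFF */
    adc_dma.rx_dest = rx_buffers[fill_index];
    adc_dma.rx_increment = 1;              /* fill the buffer in order */
    adc_dma.count = SAMPLE_BYTES;
    adc_dma.armed = 1;
}

/* Transfer-complete interrupt: record the new data, re-arm right away. */
void adc_dma_complete_isr(void)
{
    buffers_ready++;
    fill_index = (fill_index + 1) % NUM_BUFFERS;
    adc_dma_arm();
}

/* Simulation of the hardware path: the ADC's data-ready GPIO triggers
 * the armed channel, which moves the bytes and then raises the
 * transfer-complete interrupt. */
void sim_adc_data_ready_trigger(void)
{
    assert(adc_dma.armed);
    const uint8_t *tx = adc_dma.tx_src;
    uint8_t *rx = adc_dma.rx_dest;
    adc_dma.armed = 0;
    for (size_t i = 0; i < adc_dma.count; i++) {
        last_tx_seen = *tx;
        *rx = fake_adc_value++;
        if (adc_dma.tx_increment) tx++;
        if (adc_dma.rx_increment) rx++;
    }
    adc_dma_complete_isr();
}
```

The only work left for the core is the short completion handler. Note that this sketch never checks whether the main loop has finished with a buffer before it gets reused; real code would need to handle that case.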

Adding DMA adds at least one layer of configuration complexity. But it makes up for that by reducing the processor cycles spent moving memory around. You may not need DMA, or your processor may not support DMA to all peripherals. And sometimes it is harder to set up than it is worth, but when you do need DMA, it is a very nice luxury.


For more tactical advice, Andrei Chichak had a post about using DMA on an STM32 Cortex with a UART. His introduction to DMA was also excellent, though less fanciful.