414: Puff, the Magically Secure Dragon

Transcript from 414: Puff, the Magically Secure Dragon with Laura Abbott, Elecia White, and Christopher White.

EW (00:06):

Welcome to Embedded. I am Elecia White, alongside Christopher White. When people tell me about chip bugs in non-alpha silicon, I usually nod quietly and wonder what bug they have in their code that makes them think they found something so rare. Not today.

EW (00:23):

Today, we're going to talk about a bug in the silicon that can be used to hack a system, and I'm happy to talk to Laura Abbott today.

CW (00:29):

Hi, Laura. Welcome.

LA (00:31):

Hi, thanks for having me.

EW (00:32):

Could you tell us about yourself as if we met at the Hardwear.io Conference next week?

LA (00:40):

Sure. I'm a firmware engineer at Oxide Computer. I've been there since January 2020. For those who haven't heard of Oxide, Oxide is rethinking the server from the ground up.

LA (00:51):

Servers haven't really changed in a number of years and Oxide is helping to build something that's fundamentally a better product than what you can buy currently.

CW (01:00):

And when we talk about servers, my ancient brain goes to, "Oh, a Dell computer next to my desk that some people get some files from," but these are big things that go in racks, and for data centers, and stuff like that, right?

LA (01:13):

That's right. But that's actually a good comparison, because the servers that you can buy from Dell that go in racks actually look a lot like your Dell PC. And it turns out that's a pretty difficult thing to work with, especially for new hardware.

LA (01:29):

And big companies like Google and Facebook are designing their own hardware these days that's much nicer to be able to use. But if you're not one of these big companies, of course, you don't have the chance to be able to buy this, because you can't afford to be able to design your own hardware.

LA (01:43):

So that's sort of what Oxide is going for, being able to build really nice hardware to be able to deliver a great experience.

EW (01:50):

We want to do lightning round, where we ask you short questions, and if we're behaving ourselves, we won't ask "How," and, "Why," and all of that.

LA (01:58):

Okay.

CW (01:59):

Do you like to complete one project or start a dozen?

LA (02:02):

I honestly have a tendency to do both, depending on where I am and what type of thing I'm looking at. I definitely have a whole bunch of sort of electronics projects that are half-completed that I need to actually finish sometime, but I think other things I tend to do one at a time.

EW (02:19):

Hubris or humility?

LA (02:22):

Oh, I'm going to have to go with Hubris.

CW (02:26):

Favorite Cortex-M?

LA (02:28):

[Ooh], that's a tricky one. I'm going to have to go with the Cortex-M33, because I do have a soft spot for that TrustZone.

CW (02:40):

But no floating point.

LA (02:42):

[Inaudible] floating point.

EW (02:46):

Urgency or rigor?

LA (02:47):

Did you go through and put in an Oxide application? I'm going to have to go with rigor, I think.

EW (02:55):

That question is from Oxide's interview questions. What do you hear most often?

LA (03:02):

Oxide has a number of questions people answer on the application. And I think one of the questions Oxide likes to ask is, "Talk about two values in tension and how you resolve them."

LA (03:14):

And people do talk a lot about urgency versus rigor, because that's a fairly common thing for people to talk about in engineering in terms of trying to figure out, "Okay, how much work do I need to do to make this correct," versus "Can we get it done a little bit faster?"

CW (03:27):

Favorite fictional robot?

LA (03:30):

I love Wall-E. That was a great movie.

EW (03:34):

PyOCD or OpenOCD?

LA (03:38):

[Ooh], I'm definitely going to have to go with PyOCD. OpenOCD has a soft spot in my heart, but it's kind of a pain. And my colleague Cliff, if you happen to listen to this, I respect your choice to use OpenOCD.

CW (03:52):

What's PyOCD? I haven't heard of this.

EW (03:54):

I hadn't either.

LA (03:56):

PyOCD is a Python library to be able to do debugging support. It has support for the CMSIS-DAP standard to be able to connect to a lot of microcontrollers. And that's what we end up using at Oxide, or at least that's one of the toolchains we end up using.

CW (04:13):

Oh, we should check that out.

EW (04:14):

Yeah. Sounds cool. Do you have a tip everyone should know?

LA (04:18):

Always read the documentation and don't be afraid to try things.

EW (04:23):

Okay. So before we get into the silicon issue you found that you're going to be presenting at Hardwear.io soon, tell me about the chip in general. It was the NXP LPC55?

LA (04:40):

That's right. The NXP LPC55 is a Cortex-M33 that we evaluated. It has a number of nice features that Oxide was looking for in a chip for our Root of trust. It has things like a strong identity to be able to give a cryptographically secure, unique identity.

LA (05:04):

It has some hardware accelerators, but that's less important to us. It has Secure Boot to be able to do things. And we chose these features in particular because they let us be able to build up for what we're doing for the Root of trust.

EW (05:19):

So the Root of trust is that I have to be able to trust everything in the whole chain in order to trust that what I'm doing at the end is useful. And this is often used in firmware update. Is that what you're talking about?

LA (05:36):

That's part of it. So the idea behind the Root of trust is, sometimes when you say that term, ... that can mean a lot of different things to many people. When I say Root of trust, we're talking about answering the question about what software is running on the system.

LA (05:50):

So the idea is that we're going to say, "Okay, we're going to trust what's running on the Root of trust, and then that can be used to build up other parts of the system." So the idea is that we know exactly what's running on there.

LA (06:01):

And we'll be able to compare that against an expected set of hashes and then say, when it gets time to be able to do something, like system updates, "You'll be able to not just install the update, but also get another set of expected calculations about what should be running on the system."
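To make the measurement idea concrete, here is a minimal sketch in C of what "compare against an expected set of hashes" can look like. The sha256() helper, addresses, and sizes are hypothetical stand-ins, not Oxide's actual implementation:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: any SHA-256 implementation works here. */
extern void sha256(const uint8_t *data, size_t len, uint8_t digest[32]);

/* Hypothetical address and size for the measured image. */
#define IMAGE_BASE ((const uint8_t *)0x00010000u)
#define IMAGE_SIZE (64u * 1024u)

/* Expected measurement, provisioned (and signed) ahead of time. */
extern const uint8_t expected_digest[32];

bool measure_image(void)
{
    uint8_t digest[32];
    sha256(IMAGE_BASE, IMAGE_SIZE, digest);
    /* Only report/boot the image if the measurement matches. */
    return memcmp(digest, expected_digest, sizeof digest) == 0;
}
```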

EW (06:14):

So this is like if I jailbreak my iPhone, I've lost the Root of trust for Apple. And so their apps won't work on it.

LA (06:22):

That's a good example there.

EW (06:26):

And the LPC55 is an M33, which you said has TrustZone. What is that?

LA (06:33):

So TrustZone is another way to provide isolation from code. So for a lot of chips, you'll often have the privileged versus unprivileged mode. TrustZone for M provides another axis of secure versus unsecure.

LA (06:49):

So you could have secure privileged, secure non-privileged, non-secure non-privileged, non-secure privileged, and other things like that.

EW (06:57):

Every permutation.

LA (06:59):

Yes.

CW (07:01):

So I remember using on Cortex-M4, there was a memory protection unit, not an MMU, but a memory protection unit that would allow you to mark various pages as privileged versus non-privileged. Was this part of that?

LA (07:16):

It's a similar concept. So the MPU is still definitely there, and that provides isolation to be able to choose what things are accessible. TrustZone uses a controller called the SAU, the Security Attribution Unit, to specify which regions are secure or non-secure.

LA (07:35):

So in our system, for example, we end up having both of those pieces configured. So when you're running in the non-secure world, for example, the regions that are secure are specified. But you also have the MPU configured to specify what regions of memory you're allowed to touch.

CW (07:52):

Okay.
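A rough sketch of what configuring that can look like on a Cortex-M33, using the standard CMSIS SAU register names; the region addresses here are hypothetical, and a real project would also program the MPU for each security state:

```c
/* Sketch: mark one flash region as non-secure; everything else stays secure.
   Runs from secure code early at boot. */
#include "ARMCM33_TZ.h"   /* a CMSIS device header for a Cortex-M33 with TrustZone */

void sau_setup(void)
{
    SAU->RNR  = 0;                                     /* select SAU region 0 */
    SAU->RBAR = 0x00020000u & SAU_RBAR_BADDR_Msk;      /* non-secure base (hypothetical) */
    SAU->RLAR = (0x0007FFE0u & SAU_RLAR_LADDR_Msk)     /* non-secure limit (hypothetical) */
              | SAU_RLAR_ENABLE_Msk;                   /* region enabled; NSC bit left clear */
    SAU->CTRL = SAU_CTRL_ENABLE_Msk;                   /* addresses in no region stay secure */
    __DSB();
    __ISB();
    /* The MPU is programmed separately (secure and non-secure each have one)
       to say which regions privileged/unprivileged code may touch. */
}
```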

EW (07:53):

And if I didn't have the TrustZone, what would be different? ... Is it only for IoT systems?

LA (08:04):

If you didn't have TrustZone, you could still build a secure system, but it's helpful to think about TrustZone as just another layer of protection that provides another way to isolate things. So the goal is that if you're running code in a non-secure world, you should only be able to get into TrustZone through a very specific path.

LA (08:26):

So if for some reason you didn't have TrustZone, then you would be left with coming up with another way to be able to fully protect that. And if you have a properly working system, that always should be fine.

LA (08:36):

But the idea with, for example, something like TrustZone is that if you did end up having a bug in your code that say might expose secrets, then the other layer of security protection will make it even harder to get those secrets.

EW (08:46):

Okay. And this is part of the Arm chip, not part of the NXP chip.

LA (08:52):

That's correct. TrustZone itself is a part of a specification defined by Arm. And one of the things, when you start looking at microcontrollers and looking at everything, is that it turns out that implementing various things like TrustZone can be optional.

LA (09:08):

So depending on what version of the specification a chip vendor chose to implement, there may or may not be TrustZone.

EW (09:16):

But the LPC55 has one.

LA (09:19):

Correct.

EW (09:21):

... Okay. I understand you need to have something on there that is the base of security, and you mentioned an ID. And then I can, in manufacturing, use that ID to assign a particular cryptographic key and put that in.

EW (09:42):

And then I use that for communications, including firmware update. What am I missing about the specialness of TrustZone?

LA (09:50):

I think a good example there is if you really wanted to make sure no one could read out the secret used for doing your firmware update.

LA (10:03):

So for example, if you had a key that you really wanted to keep private, you could put that in TrustZone. A common design may be to have your bootloader be secure and then jump into non-secure, and there would be no way to read that key out once you're in non-secure mode.

LA (10:18):

Another example that we were looking at with our chip is being able to use TrustZone to provide even more hardware isolation.

LA (10:27):

So we would take certain hardware blocks that we were really not sure about and put them only in TrustZone, so that the non-secure world couldn't potentially access them.

EW (10:38):

So the non-secure world wouldn't be able to access the power off button.

LA (10:43):

That's a good example. Yeah.

EW (10:45):

Or the rewrite flash code.

LA (10:49):

Yes. And that's part of what we were looking at when we were evaluating: what exactly can we do to make sure that our product is secure? And being able to rewrite flash is definitely one of those dangerous scenarios.
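Going back to the firmware-update key: a minimal CMSE sketch of the pattern Laura describes, where the key lives in secure flash and the non-secure world gets only one narrow entry point, never the key itself. The helper name and key handling are hypothetical; this assumes compilation with -mcmse:

```c
#include <arm_cmse.h>   /* CMSE intrinsics */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Secret stays in secure flash; non-secure code can never read it. */
static const uint8_t update_key[32] = { 0 /* provisioned at manufacturing */ };

/* Hypothetical helper: verify a MAC over the update payload. */
extern bool hmac_verify(const uint8_t key[32], const uint8_t *msg, size_t len,
                        const uint8_t mac[32]);

/* The one sanctioned path from the non-secure world into this code. */
bool __attribute__((cmse_nonsecure_entry))
update_check_signature(const uint8_t *msg, size_t len, const uint8_t *mac)
{
    /* Refuse buffers that aren't fully non-secure, so callers can't use
       this entry point as an oracle to read secure memory. */
    if (cmse_check_address_range((void *)msg, len, CMSE_NONSECURE) == NULL ||
        cmse_check_address_range((void *)mac, 32, CMSE_NONSECURE) == NULL)
        return false;
    return hmac_verify(update_key, msg, len, mac);
}
```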

EW (11:02):

And there are fuses you can change in manufacturing so that people can't read out your code over a JTAG link (or CMSIS-DAP, this isn't really JTAG). And there are fuses you can blow so that you can't ever change what's on the board.

EW (11:24):

But those are small things and cover large pieces of the system. The TrustZone lets you change some parts, but not others, is that right?

LA (11:33):

Yes. So I think your example of fuses is a good one to sort of compare and contrast, because it's also worth noting that TrustZone is ultimately just another sort of hardware configuration. And especially for things like fuses, that tends to be a permanent, one-time thing.

LA (11:51):

So TrustZone, once you have your finalized settings, you may choose to blow your fuses to be able to make sure you can't actually make further changes to your flash, for example.

EW (12:05):

Okay. Do you understand TrustZone, Christopher?

CW (12:08):

I do understand it better. Yes.

EW (12:10):

Okay. Laura, you are giving a talk called "Unwanted features: Finding and exploiting an in-ROM buffer overflow on the LPC55S69."

LA (12:29):

That's correct.

EW (12:30):

So what's that about?

LA (12:33):

So this is a bug I stumbled on somewhat accidentally. So I mentioned that I'm a firmware engineer at Oxide Computer.

LA (12:41):

My job is not actually vulnerability hunting, but during the course of trying to work with the feature of the chip related to software update, I stumbled across a buffer overflow that could be used to break some security boundaries in the chip and really violate some pretty fundamental assumptions.

EW (13:00):

So stumbled across, is this, you had a horrible bug and then noticed it did something?

LA (13:06):

Not really. A little bit more -

CW (13:10):

The Fifth Amendment doesn't apply to this podcast.

LA (13:14):

So honestly, I ended up finding out this bug because I was a little bit lazy. And I didn't want to write a parser for the update format that NXP was going for, or at least I started to work on it and realized, "Huh, this format is kind of complicated. There's a lot of fields in this header."

LA (13:34):

So the update format that NXP uses is called SB2, and it starts out with an unencrypted header before actually getting to the keys and then commands to actually do things like erase the flash. And so ... this is transmitted sequentially.

LA (13:51):

So I started thinking and going, "There's a lot of fields in this header. How well does the ROM actually validate all parts of this header?" So I had a ROM dump laying around, and I started looking a little bit closer. And I happened to find one of these fields that wasn't being validated correctly and gave me a buffer overflow.
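As an illustration of the bug class (this is not NXP's actual ROM code, and the field names are made up): a header parser where one length-like field sizes a copy into a fixed buffer, and a single missing comparison is the whole vulnerability:

```c
#include <stdint.h>
#include <string.h>

struct sb2_like_header {        /* hypothetical layout */
    uint32_t magic;
    uint32_t image_len;
    uint32_t key_count;         /* attacker-controlled field */
    /* ... many more fields ... */
};

#define MAX_KEYS 4
static uint8_t key_table[MAX_KEYS * 32];   /* fixed global buffer */

int load_keys(const struct sb2_like_header *h, const uint8_t *payload)
{
    /* The critical check: without it, a large key_count walks the copy
       past key_table into whatever globals happen to follow it. */
    if (h->key_count > MAX_KEYS)
        return -1;
    memcpy(key_table, payload, h->key_count * 32);
    return 0;
}
```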

CW (14:13):

So just so I'm clear, what ROM is on the Cortex?

LA (14:18):

This is a ROM that's specifically designed by NXP ...

CW (14:24):

Okay.

LA (14:25):

Cortex itself doesn't actually mandate any of this. This is a design choice that NXP actually made. And when we were initially first choosing a chip, several of my colleagues pointed out that this might be a disadvantage just because they had had bad experiences with ROMs. And so far that wisdom has turned out to be very correct.

CW (14:47):

Interesting.

EW (14:48):

But this is flash that is on the chip.

CW (14:52):

It's flash, or is it - ?

EW (14:53):

It's probably flash. It may be masked ROM, but it's probably flash.

CW (14:58):

Okay. Okay.

EW (14:58):

And it's flash that we can't get to and that we can't modify, because it probably has had its fuse blown when it left manufacturing.

CW (15:07):

Right. Okay.

EW (15:08):

And I gave that memory map talk, and I had that area that was the unused address spaces in the ocean. And you can just -

CW (15:18):

Sure.

EW (15:18):

We don't know what all the registers are. They don't tell us what all the registers are in the manual. They tell us the registers they want us to use.

CW (15:24):

Yeah.

EW (15:25):

And so this ROM code, it's kind of like that. They tell us how to use it, but they don't tell us what it is. And I mean, ... you said you had coworkers who were distrustful of the ROM code. What made you look around for it, and how did you dump it?

LA (15:47):

My coworkers who are distrustful, my colleague Cliff in particular, pointed out that, especially for what we're trying to build with the Root of trust, we really want to know exactly what is running on it. And I think, as you gave a great example, ... the manufacturer may not tell us everything that's in the ROM.

LA (16:07):

And for building a Root of trust, this is kind of terrifying, because it's hard to know exactly, is there something in there that may break our assumptions about being able to do our chain of trust, to be able to do our measurements.

LA (16:19):

So I think initially we're a little bit worried about this, and it turns out in this case that NXP did not actually add any sort of read protection off the ROM. So dumping the ROM was a very simple matter of literally just reading it out with a debugger and saving it.

EW (16:36):

Okay. That was mistake number one.

LA (16:38):

I don't know. I kind of disagree there. I actually think that the ROM actually should be available, but I mean, really if I have a complaint, they should just be giving us the source code to the ROM.

EW (16:49):

Yes. Either it should be transparent, or you should make it totally opaque if that's the path you're going.

CW (16:56):

Yeah. But totally opaque, I was going to ask this later, but now I'm going to ask it right now. So I worked with an authentication chip, many years ago, when this stuff was still pretty much in its infancy. And it had a lot of problems.

CW (17:11):

But the question I kept getting from other technical people in the company when we chose this chip or any chip that did authentication, ... and back then the question was, "What's preventing somebody with a million dollars of equipment from acid etching down the chip and reading out the secret key from the flash, or wherever it's programmed?"

CW (17:31):

And I said, "Pretty much nothing except a million dollars," at that point. Now it's much less than a million dollars. ... So when Elecia talks about, "Well, they should block off the ROM," does that actually prevent anything if somebody wants to go look at it visually or are flashes harder to do that with these days?

LA (17:50):

I think it's still possible to be able to do that to some extent with the ROM. On the LPC55, I think we did a little bit of investigation with that. But I think to your question about being able to read out the secret key, one of the features that was appealing to us about the LPC55 was that it had a PUF, a physically unclonable function.

LA (18:10):

And I don't have the full background to be able to talk about exactly what a PUF does. But the general idea is that it's able to do encoding that's physically unique to the chip such that it can't be cloned, such that it's only tied to the actual chip itself, which is a great way to be able to get that strong, unique ID.

LA (18:32):

And you can do that to get further encoding so that even if you did happen to say, get a copy of some of the flash, if it's been encoded by the PUF, you can't actually decode it.

CW (18:41):

Interesting. Okay.

EW (18:42):

Puff the magically secure dragon?

LA (18:45):

Yes. I like that.

EW (18:49):

Okay. So you found a buffer overflow. And buffer overflows are the sort of things that lead to security issues, because if you have, say, a buffer overflow in a function, you may be able to overwrite the stack. And then you can change the code to run what you want instead of what it was supposed to run, which is bad.

EW (19:14):

I mean, that's why we get denial of service attacks and all kinds of things that lead us to say, "Always check your inputs for malicious actions." So what does the ROM overflow do?

LA (19:33):

You gave a great example of about what people usually think of when they think of a buffer overflow, which is doing a classic stack overflow. In this case, because of how the ROM was actually parsing the code, it was an overflow in the global space.

LA (19:48):

So the idea is that you would just continue writing, and it was something that's maybe somewhat closer to what I'd call a heap overflow, but that's incorrect, because it wasn't actually overflowing the heap. It was overflowing something equivalent to, say, the BSS section that's normally all zeros and that you set up later.

LA (20:08):

So I found an overflow there, and then it turned out right next to this global variable I was able to overflow was the heap address or heap allocator. And I was able to use that to be able to turn that into a way to get code execution. And I will be talking more about all the gory details about how I did that in my talk.
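An illustrative layout (not the actual ROM; linkers don't guarantee this ordering) of why a global overflow can be as powerful as a stack smash: overflowing the buffer clobbers adjacent allocator state, and a corrupted allocator pointer can be turned into writes anywhere:

```c
#include <stdint.h>

static uint8_t  parse_buf[256];   /* overflowed by unvalidated input */
static uint8_t *heap_next;        /* allocator state sitting after it */

static void *bump_alloc(uint32_t n)
{
    void *p = heap_next;
    heap_next += n;
    return p;   /* if heap_next was overwritten, this "allocates" wherever
                   the attacker pointed it, e.g. over a function pointer */
}
```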

EW (20:31):

Okay. And just to make it clear, the talk is in early June, 2nd week of June, -

LA (20:39):

That's correct.

EW (20:40):

- in Santa Clara, California, and it's physical. You're going to a real, actual in-person conference.

LA (20:49):

I am. I will admit I'm a little bit nervous, but also excited to be able to potentially do a meetup and see people in person.

EW (21:01):

And have you been to a Hardwear.io event before?

LA (21:05):

I have not. I heard about the conference from my colleague, Rick, who suggested after we found this bug that it might be an interesting place to submit a talk.

EW (21:18):

They tend to do security talks more than any other kind of embedded talk, but it's all very embedded.

LA (21:25):

Yeah. It's all definitely fairly low-level, a lot of things at the hardware layer. I'm excited to see some of the other talks that are going and really get to learn about more aspects about hardware security.

EW (21:38):

And usually their talks are put online afterwards, and I'm hoping yours will be too. Do you know?

LA (21:45):

I don't know right now, but again, I'm hoping the talk will be online. If not, I will definitely be releasing some more of my slides and details once everything is over.

EW (21:56):

Cool. Have you written your talk yet? You can tell me. I know how this goes.

LA (22:01):

I'm in the process of doing that. I am not one of those people who can write the talk and slides while they're on the plane. I need to be practiced. I definitely have the slides going, and I'm beginning to practice to make sure I have everything, especially for a talk like this.

LA (22:16):

I really want to make sure I'm explaining everything correctly. Because I realized, I did a test of the talk, and there are a few more diagrams I need to add to be able to explain things, like what exactly the PUF does.

EW (22:27):

Yeah. Okay. And how you found it, and the tools you used, and really, how can we use this? How can a malicious actor use this?

CW (22:41):

How can we use this to do things?

EW (22:44):

Actually, how a malicious actor could use this to do bad things is probably the most interesting of questions. What do you have for that?

LA (22:57):

That is a very interesting question. And I think part of it goes back to the system configuration of the chip. And when I say I found a buffer overflow in a software update, that sounds pretty bad. And people might initially say, "Wow, this thing is completely broken."

LA (23:12):

But if the chip is properly configured to prevent modifications to certain configuration areas, it's not completely broken. I mentioned the chip has Secure Boot. You can't change Secure Boot keys. You can't change other various configuration settings.

LA (23:29):

But what is available for you to be able to do is to perhaps write to unwritten flash pages that aren't covered by a Secure Boot image. If you have another image that's already been signed, you could do a rollback attack to be able to boot an older version that, say, might have a bug.

LA (23:46):

One of the most serious issues that we found with this is that typically the way the chip is set up, it has a feature called DICE that's designed to be able to compute an identity. And part of the way that DICE works, it relies on keeping a particular PUF-encoded secret restricted.

LA (24:09):

So the idea is that once it calculates the value for DICE using the existing image, it will restrict access and make a change to a register to prevent you from being able to access that same PUF-encoded value.

LA (24:22):

It turns out that at the point this buffer overflow happens, you can read that value out, which means it's possible to be able to write some code on there to be able to read out this PUF-encoded value you should really not be able to, and be able to say, "Clone an identity."
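The shape of the boot sequence she is describing, with entirely hypothetical register and helper names: derive the identity from the PUF-encoded secret, then set a sticky lock so nothing later in boot can read that secret again. The overflow fires before the lock is set, which is why the value is exposed:

```c
#include <stdint.h>

/* Hypothetical lock register; the real chip's name/address differ. */
#define PUF_DICE_LOCK (*(volatile uint32_t *)0x40040100u)

extern void puf_get_dice_secret(uint8_t out[32]);        /* hypothetical */
extern void dice_derive_identity(const uint8_t secret[32]); /* hypothetical */

void boot_compute_dice(void)
{
    uint8_t secret[32];
    puf_get_dice_secret(secret);       /* readable only this early in boot */
    dice_derive_identity(secret);
    PUF_DICE_LOCK = 1;                 /* restrict access until next reset */
}
```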

EW (24:39):

Yeah. Clone the identity. That's where being able to read the identity comes in.

LA (24:43):

Yeah.

EW (24:44):

And so you could make another system that pretended to be yours and then use that to probably practice other attacks.

LA (24:54):

Yeah. And especially for what we're trying to do with being able to have Root of trust measure other parts of the system, that's pretty bad if you could have another part of the system pretend to be the measurement and say, "Oh, here are some measurements. Yeah. They're definitely coming from me. Wink, wink."

EW (25:08):

Do I need physical access if I was a malicious actor?

LA (25:13):

That ends up depending on how exactly the chip itself is set up. So I'd say you do, but it's also important to think about what physical access actually means. Just because, say, if you have this chip deployed out there, and it's getting updates over the network, then maybe it might be able to do things over there.

LA (25:34):

But I mean, it requires stuff to be physically sent over a hardware interface perhaps, but ... depending on the other software you've written on your system, you can also invoke it that way. So it's not one versus the other. It depends on a whole bunch of other parts of your system.

EW (25:51):

Because it's part of the in-system programming which can happen over UART, SPI, I2C, CAN. And so you need to send things over one of those. Of course, one of those might be attached to a Wi-Fi chip or something like that.

CW (26:09):

You can in-system program this over CAN?

LA (26:10):

Yes.

EW (26:13):

I know. Isn't that awful?

CW (26:14):

I guess it makes sense.

EW (26:15):

Yeah. No, you'd have to.

LA (26:17):

Yeah. There's a version of this chip that does in fact have CAN support so I think, for those who aren't familiar with CAN, this is oftentimes used in automotive. So the idea is that, yeah, if they're trying to update something in your car, you could potentially be able to send things that way.

EW (26:35):

So you've mentioned that as long as things are set up correctly, this probably doesn't affect most people. How do I avoid letting people clone my device? How do I avoid having the PUF be puffed? Sorry, I can't get past PUF.

LA (26:59):

It's a great name, but yeah, so I should clarify that. When I said that it doesn't affect certain things, I meant this isn't as completely bad as it could be. The PUF-encoded value you can definitely always read out as long as you're trying to use this buggy code.

LA (27:17):

And I think the real concern ... is that if for some reason you didn't actually fully seal the CMPA programming area, it's possible to change a whole bunch of things there by rewriting the flash. But to prevent this from being used, we did a lot of evaluation when we found this issue at Oxide.

LA (27:37):

And I think we came to the conclusion that the best way to avoid this issue is just to not use this ROM update code at all. That's the safest path until you get a fixed chip.

EW (27:49):

Oh. And so you can write your own programming, your own flash programming, and have your own ROM update or your flash-based update like everybody else does.

LA (28:02):

Yeah, that's correct. And I mean, it's definitely considered a rite of passage, I think, to write software update on microcontrollers at some point.

CW (28:10):

I think that's a rite of passage that gets repeated over and over again sometimes.

LA (28:12):

Yeah.

EW (28:13):

Serial drivers, boot loaders, and -

CW (28:16):

They all work differently.

EW (28:18):

Yeah. So has NXP had a reaction?

LA (28:24):

Yes, they have. So I think we were generally pleased with how NXP responded to this. We sent them the proof of concept, and they definitely accepted it and were able to get a fix out. And I think we had previously had interactions with NXP's product response that were less than satisfactory.

LA (28:43):

And I think this time they definitely took it more seriously. And they certainly made the announcement. And I was actually pleased to find, I think it was last week, that the security vulnerability was actually publicly available on their knowledge base.

LA (28:59):

And this was something I don't think we had actually seen before. So I think it's really good to see hardware vendors like NXP making these things public. Because I think all these things should in fact be public so that if there's a chip vulnerability, you should know, and it should be freely available to everybody.

CW (29:13):

Certainly if you're advertising that your chips are secure and that you have all these features, then being open about when they fall down is probably a prerequisite of being trusted.

LA (29:26):

Yeah. And I mean, it's not that you expect chips to be completely bug-free -

CW (29:30):

Right, right.

LA (29:30):

- and there to never be an errata, but I mean, it's a matter of making sure everybody can actually be aware of it.

EW (29:36):

... Is it in the errata? Are they changing their ROM for new versions of this chip, or are they coughing up that programming code so that you could fix it, and compile it, and put it in part of your program instead of in their ROM?

LA (29:55):

I think they do plan to issue fixed chips. And actually, I was thankful I got an engineering sample to be able to test it and verify that it was fixed. So I think the hope is we'll be able to get some fixed chips.

LA (30:09):

But of course trying to get your hands on any kind of silicon these days is difficult. And even in the best of times, trying to get a new fix out is difficult.

EW (30:21):

You also submitted it to the NIST vulnerability database, which I had only seen for, I want to say big bugs. But I guess I only see it when I hear about something so catastrophically bad that then I go look at the NIST database. What made you do that? And how did you figure out that nobody else had found it before you?

LA (30:46):

Oh, so NIST and the CVE database is an interesting discussion. So CVE assignment is ultimately, I'd say, left up to both the reporter who is finding a bug and say the receiving end of a bug.

LA (31:07):

Sometimes companies may do the receiving end themselves, but ultimately Oxide decided to report the CVE to NIST to be able to have an easy way to track the vulnerability.

LA (31:20):

And that's really what I see this as being about, is being able to say, "Okay, we need to have a way to identify this and be able to point to specific ways that it goes there." And you're right that you oftentimes see CVEs as being highlighted for big issues, but anyone technically can request a CVE if they want for any kind of issue.

LA (31:39):

I think it's important to always read the details about what's on the NIST database to see what's actually there and what the issue actually means. And then to your question about, "How did you know if anyone else had actually found this issue already?" That's an interesting question and one I actually thought about a lot.

LA (31:54):

Honestly, I don't think we had a good answer, and I think there wasn't a good way to know until we actually tried to publish this and see if anybody had come out.

LA (32:03):

I think if someone had come out and said they had already found this issue, honestly I would've been really excited to see what exactly someone else was doing to be able to find this.

EW (32:11):

And you'd hope that NXP would tell you, "Thank you. We've already started fixing this."

CW (32:16):

"And here's your $10,000 reward."

EW (32:19):

Yeah.

LA (32:21):

Yeah. I mean, what we hoped when we reported to NXP, they'd immediately come back and say, "Oh yeah, we're getting ready to fix this. ... Sometimes these bugs aren't found, but we are looking forward to being able to get fixed chips and being able to deliver them."

EW (32:40):

Finding this bug, and writing it up in such a way that NXP can take action on it, and writing it up for the NIST vulnerability database, this all took time that you didn't have to spend.

LA (32:56):

That is correct. And I mean, I would also probably say at this point, I think Oxide definitely supports my work and finding issues like this. But at the same time, I think we're kind of tired of doing this work of finding these bugs. And we hope that this is the last one that we found.

EW (33:15):

Kind of hope that this is the last one you've found, but do you think it is?

LA (33:20):

I think it is. I mean, at least for now, I think, but who knows what exactly ... will end up happening.

EW (33:32):

I can see how finding this, and talking about it, and giving a presentation on it are all very good for your career and good for the community, but Oxide's building servers. So it is nice of them to give you the opportunity to talk more about it kind of on their time. I imagine some of the preparation is on your time.

LA (33:56):

Yeah. And I'm grateful to Oxide, but it definitely does take some time to be able to do this. And I think internally we did have some back and forth about what exactly we should do. A lot of people who work in security will have many opinions about how exactly the disclosure process should work.

LA (34:14):

And I think in some respects, the way we do disclosures is how we would like to be disclosed to. When someone inevitably finds an issue in Oxide's... a bug, I mean, we'd like to believe that if someone came to us with a proof of concept, we'd take it seriously, and be able to give an estimate about when things would be fixed.

LA (34:36):

But it definitely does take a lot of time to be able to do all that. But ultimately, I think it's good, and not just for the community, but also I think it's the right thing to do. But I mean, there's lots of debate out there about how you do disclosure.

CW (34:48):

Beyond the right thing to do, I think for a company with a product like Oxide's, which is a server which has certain security requirements, demonstrating competence that, "Oh, we at Oxide find these kinds of issues, and are really good at it."

CW (35:03):

"And that should give you more confidence in our ability to make solid hardware." I mean, that's not nothing.

LA (35:12):

Yeah.

EW (35:13):

Absolutely. Being able to say we found these vulnerabilities, therefore we're not passing them along to you -

CW (35:20):

Yeah. Yeah.

EW (35:20):

- is definitely confidence building for the server. When is the server coming out?

LA (35:27):

We're still definitely working on building it, iterating on hardware, and trying to be able to do things. Watch this space for when we're actually able to deliver it. But it's definitely coming. And people were out doing bring-up about two weeks ago.

LA (35:45):

And it was very exciting to see another iteration of hardware come out, and make a lot of lights blink, and fans turn on.

EW (35:53):

Your website says late 2022 is when you're going to start shipping racks. So I won't ask you beyond that, because I know very well that if you answer, it's probably bad.

LA (36:05):

The website needs to be updated as well.

EW (36:07):

Okay. Well, we'll just leave that as it is then. You said you were looking forward to some of the other talks. Do you have anything in mind?

LA (36:15):

Yeah. I'm excited to see things related to glitching and side channels. There's a talk about breaking SoC security by glitching data transfers. I'd love to learn more in the future about how to do physical glitching attacks. I bought a ChipWhisperer which, -

EW (36:32):

Yeah.

LA (36:32):

- lets people do glitch attacks. I haven't had a chance to actually sit down and play with it yet, but hopefully I'll be able to try that on the LPC55.

CW (36:42):

And find something else.

LA (36:44):

I don't want to find something else, but there's certain things I'm like, "I wonder if I glitch this here, if I could actually break it." And it's sort of tempting me just to be able to find something else.

EW (36:55):

Yeah. So tempting.

CW (36:57):

It's a different attitude toward development, because my attitude toward development is I want to get this code done and never see this again. And I certainly don't want to find out there's something wrong with the chip. ... I think your attitude is much better, to be deeply curious about the things you're using.

CW (37:18):

And I think it's cool that the company supports that because a lot of times, I think you might have mentioned that, you're right, that other companies might not be so supportive of that. And, "What are you doing finding a bug in this thing that may or may not apply to us? Just get this done."

LA (37:36):

Yeah. I'm really lucky to have a lot of support from everyone at Oxide. And I was also joking before to some coworkers, that in some respects, the fact that I had to do some reverse engineering to be able to figure this out actually made it more tempting to try and figure out what was going on.

LA (37:54):

If NXP had actually just put up all the C code for the ROM, I may have been less tempted to want to dig into that and just do a whole bunch of reading of the code to be able to find out what was there.

EW (38:05):

Because it would be transparent. And if it was transparent, someone else probably looked at it.

CW (38:13):

Well, and reading code's not as fun.

EW (38:15):

Reading code is not as fun.

LA (38:17):

Reading code is important, but that doesn't mean it's always fun.

EW (38:21):

It's more fun to pit yourself against the puzzle of what they've done.

LA (38:27):

Yes. And reverse engineering is definitely a fun puzzle to try and figure out, just because Ghidra is a great tool for being able to reverse engineer, but it doesn't always do everything perfectly. It's still up to you to figure out what exactly this code is doing and what it's calling.

EW (38:44):

Ghidra, that's the one that you put in Arm machine code, and then it makes assembly code, and then it makes C code. Is that right?

LA (38:54):

Yeah, that's correct. You give it some code, and it will disassemble it into assembly and then also attempt to put it back into something C-like.

EW (39:04):

How well does that work? I mean, how does it decide what to use for variable names?

LA (39:10):

It tends to assign them sequentially. So you're absolutely right that figuring out what the variables do is one of the first things you do when you're looking at a disassembly.

LA (39:19):

What you end up with essentially is, imagine if you took a C function, and took away all the nice names for everything, and everything's just variable one, variable two, variable three. So it does a lot of complicated algorithms behind the scenes to be able to generate this.

LA (39:33):

And then you're left trying to pick up patterns. But it's pretty easy to start guessing what things are, for example, like, "Oh, this looks like a loop."

LA (39:42):

And especially with something like when you're reverse engineering a ROM, I spend a lot of time comparing what the ROM was actually accessing to physical hardware blocks in the memory map.

LA (39:53):

So I was able to say, "Okay, this function is touching the GPIO block. This function is touching the clock configuration block," which gave me a good idea about what things were doing.
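For a flavor of what that looks like (the addresses and names here are invented for illustration, not the real LPC55 ROM): Ghidra's raw decompiler output on top, and the same function after matching the constant against the memory map and renaming:

```c
typedef unsigned int uint;            /* Ghidra's decompiler shorthand */

/* Before renaming, Ghidra might show something like: */
void FUN_1300a2c4(uint param_1, uint param_2)
{
    *(volatile uint *)(0x4008c000u + 0x2300u + param_1 * 4) = param_2;
}

/* 0x4008c000 falls inside a peripheral block in the reference manual's
   memory map (say, GPIO), so after renaming it becomes: */
#define GPIO_BASE       0x4008c000u   /* hypothetical */
#define GPIO_SET_OFFSET 0x2300u      /* hypothetical */

void gpio_set_pin(uint pin, uint value)
{
    *(volatile uint *)(GPIO_BASE + GPIO_SET_OFFSET + pin * 4) = value;
}
```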

CW (40:03):

How much code was this in the ROM?

LA (40:07):

There's a pretty big chunk of code there. I mean, it includes stuff to be able to do the ISP. There's a USB stack -

CW (40:17):

Oh, my God. Okay.

LA (40:18):

- which gives you an idea about how much you have to be able to support in ROM.

CW (40:22):

That's a lot.

EW (40:23):

Wait a minute. Ghidra is from the National Security Agency?

LA (40:28):

Yes.

EW (40:30):

Okay. I had no idea, but they have a Git repo, and it says NSA. That's interesting.

CW (40:40):

Too many secrets. Sorry.

LA (40:43):

Yeah. But that sometimes makes some people nervous. But I mean, I found it to be a great tool. I'm not actually an expert in reverse engineering. This was really one of the first serious projects I've ever taken up. But I found Ghidra to be a nice tool that's available.

EW (40:58):

Does it put the C library all together so that you can identify what the C functions are, like strcpy()?

LA (41:10):

I think it might, again, I'm learning a lot about Ghidra every time I use it. But it has some things built in to be able to identify things like common formats. But it did not have a way to automatically detect things like strcpy() and memcpy.

LA (41:26):

So that was actually one of the things we ended up having to spend some time doing. And staring at some of these functions, you realize, "Okay. This is actually just memcpy just written out in a bunch of assembly," because it's a well-optimized memcpy.
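Something like this decompiler-style output (illustrative, not the actual ROM listing) is what eventually reads as "this is just memcpy": a word-at-a-time copy loop with a byte tail:

```c
typedef unsigned int uint;

void FUN_1300b010(uint *dst, uint *src, uint len)
{
    /* Copy four bytes at a time while we can... */
    while (len >= 4) {
        *dst++ = *src++;
        len -= 4;
    }
    /* ...then finish the remaining bytes one at a time. */
    char *d = (char *)dst, *s = (char *)src;
    while (len--)
        *d++ = *s++;
}
```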

EW (41:38):

Yeah. Oh, optimized code. It is very hard to understand.

LA (41:42):

Yes.

EW (41:44):

Wow. Now I kind of want to play with it. What do structures look like in Ghidra?

LA (41:49):

There is a structure editor, so you can define your own structures to be able to say what things are. So the idea is that you could edit it field by field and be able to specify the layout of things.

CW (42:01):

Yeah. So I think as you go along, you kind of deduce what things are, and then rename them, and sort of make it more readable from the automatically generated stuff.

EW (42:08):

And hope you're right, kind of like a crossword puzzle where things start to line up.

CW (42:11):

That's what I was just thinking. Yeah.

LA (42:12):

The crossword puzzle is actually a great example, because I think sometimes what I ended up doing was saying, "Okay, this is definitely a structure. I can tell by what it's accessing. And it's also got some other nested structures."

LA (42:23):

So I sort of end up with structure one, structure two, structure three. And I could guess how big things were, and then have them all have fields with assorted names, and try to guess what these were.

LA (42:36):

And as I looked at the code, I would go back and say, "Okay, this looks like it's calling the function that's for initialization. This looks like a teardown function," to be able to change the names to better match.

CW (42:45):

I'm just now thinking of terrible interview questions, like just hand somebody this, and a pile of machine code, and say, "Okay, tell me what this does. You have a couple hours."

LA (42:55):

I don't know. I swear I feel like I've seen that as an interview question before about, "Tell me what this code does."

CW (43:01):

Yeah, I've seen that, but not from, yeah. I've been given that question. I got very angry. Oh, I've been given that question twice.

EW (43:10):

It was because the question was, "What does this code do?" And the code was plus plus, plus plus plus plus, plus plus.

CW (43:16):

No, no. There was minus. Minuses.

LA (43:17):

Oh, I hate that question.

EW (43:18):

Minus plus plus i. And the answer is, it gets somebody fired.

CW (43:22):

Well, there was stuff afterward, too. Because if it's just on one side, it's fine, but there was stuff, yeah.

EW (43:27):

The only answer is, it should get somebody fired. And if that's how you write your code, please let me know, so I can leave the interview right now.

CW (43:35):

The other question I got asked, "What does this do?" It was Duff's device, which is a loop unrolling thing.

EW (43:39):

Oh, my God. That's so hard to identify.

CW (43:41):

Yeah.

LA (43:42):

That's an annoying question. And yeah, I hate that interview question just because it involves actually knowing the answer beforehand.

CW (43:50):

Right.

LA (43:50):

And it's one of the things you can't actually solve in an interview. I mean, it's very cool to be able to learn about, but that's not a great interview question.

CW (43:56):

I agree completely.

EW (43:58):

You do a lot of interviewing for Oxide, don't you?

LA (44:01):

I do quite a bit of interviewing with Oxide. I've gotten a chance to meet a lot of candidates, and I help with application review too.

EW (44:09):

What do you look for?

LA (44:11):

I don't think there's one necessarily right answer about what Oxide is looking for. I think it is a combination of some level of experience, but also an interest in what we're building. I think when I say interest, some people think, "Oh yeah, so you need to be completely passionate."

LA (44:31):

But I think sometimes when people say passion, they assume that means it must be all-consuming, the only thing you ever do. But I think it is about, can you demonstrate that you're able to get the job done? And I think there's a lot of different ways you can show that you have relevant skills to be able to do what you want.

LA (44:51):

I mean, can you talk about, what have you built before? I always like to ask people about the past problems they've solved, because I think that shows a lot about the types of things they've solved and exactly what problems they actually overcame.

LA (45:05):

And I think Oxide has definitely tried a unique approach with its materials questions, giving people a chance to be able to show exactly what they want, just because I think the materials are an interesting way to be able to show off a different background, for example.

EW (45:20):

Can you tell us about the materials question?

LA (45:23):

Sure. So Oxide has everyone submit written materials. I'd say one thing that Oxide definitely values is being able to write well. I think Oxide asks for a work sample which is left open-ended. So it's a way for people to be able to talk about what they have done. If you've done open source work, that's a good thing to be able to show there.

LA (45:44):

And I mean, a lot of times what I'm looking for there is how exactly does that relate to what Oxide is doing? And I mean, what exactly are you showing me about why that would make you a good person to work with? Oxide also asks for an analysis question.

LA (45:58):

I honestly love reading the analysis questions just because I love seeing what kind of problems people have worked through in the past, and getting to work through the nitty-gritty details about these weird bugs, and seeing what sort of things people have done.

LA (46:12):

And then there are also some questions related to your happiest time, your unhappiest time. And some people may think these questions are a little bit cheesy.

LA (46:20):

But I think they also are a good way to get people to really reflect on what exactly they've learned, and maybe even in fact, things they wish they had done better or might have done differently today.

CW (46:34):

I find those kinds of questions much, much better than solve this problem or -

EW (46:39):

Duff's device.

CW (46:40):

Yeah. Or, "Here's a high pressure situation that you'll never, ever actually encounter. You have 30 minutes to do some code thing that if you had three hours wouldn't be a problem." I like engaging with the candidates and figuring out, okay, do they have a history of solving things?

CW (46:56):

Do they have a history of delivering and are they somebody I want to work with? And I like what you said about engagement instead of passion. Because I can be really engaged with work, but also not very passionate about it. So I think that's a distinction. Yeah.

EW (47:12):

I mean, that's why we're consultants, because -

CW (47:15):

I don't want to be.

EW (47:16):

We don't want to be passionate about companies anymore. We want to be passionate about our lives, and I'm happy to be engaged, but at the end of the day, I'm not going to be dreaming about your product.

CW (47:31):

Yeah. Reminds me of the time I was asked if I had a passion for iPod, and then I knew I was doomed.

LA (47:35):

You don't have a passion to listen to 4,000 songs in your pocket?

EW (47:41):

Well, no, that part was cool.

CW (47:42):

Yeah. It was, well, interviewing there.

LA (47:44):

Yeah.

CW (47:44):

They wanted to know if I was super into iPod, which I didn't have one.

EW (47:49):

Into music.

CW (47:51):

Yeah.

EW (47:51):

Does that count?

CW (47:53):

So what is your day-to-day work like aside from finding holes in the LPC55?

LA (48:03):

Yeah, so, I do a lot of firmware work in Rust. So I spend a lot of my time doing that. And I'm writing code that goes on the Root of trust and sometimes related to the service processor. I also help with code reviews, and I'm lucky to have a lot of fantastic colleagues as well.

LA (48:20):

So I'll talk to them if I have questions about what I'm doing or especially Rust. I didn't really know Rust before I joined Oxide, and I've definitely gotten much better at it. But I mean, there's certainly a lot to learn there to be able to pick up on everything and be able to do a lot of things correctly there.

EW (48:37):

I know that the Oxide folks like Rust, I mean, they named their company after the language. How do you like Rust? You can tell me. Be serious.

CW (48:47):

You have to whisper.

LA (48:49):

I do like Rust. I promise there's not a Rust crab sitting next to me here, pinching my leg, telling me to say this. Mostly I'd like to say that Rust is a powerful language. And I think also it makes it a lot easier for me to be able to write C-like code because of what the language offers.

LA (49:12):

The fact that I don't have to think about array index out of bounds errors or other things like that, and that it will give me an error in a way I can actually parse, is much nicer. Some time ago I remember I was working on making some change to the Hubris and Humility stuff.

LA (49:30):

And I ended up hitting a bug that I think probably would've taken me significantly longer to figure out if it had been done in C simply because it would've been some sort of a silent array index out of bounds error, as opposed to giving me a nice error message. And I think it's things like that that are really great to work with.
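For contrast, the C-side analogue of the bug class she is describing, which compiles cleanly and fails silently; Rust's checked indexing would instead panic with a message like "index out of bounds: the len is 4 but the index is 4". Illustrative sketch:

```c
#include <stdio.h>

int table[4];
int neighbor = 42;   /* may sit right after table in memory */

int main(void)
{
    for (int i = 0; i <= 4; i++)   /* off-by-one: valid indices are 0..3 */
        table[i] = 0;              /* C: silent out-of-bounds write
                                      (undefined behavior); Rust would
                                      panic at the equivalent index */
    printf("%d\n", neighbor);      /* may now print 0 instead of 42 */
    return 0;
}
```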

CW (49:47):

Has it been difficult? Is the language changing at a rate that's somewhat difficult to keep up with?

CW (49:53):

That was one of the issues I had with iOS development, with Swift, was every six months, like, "Oh, here's Swift 5.5 and look at these eight things you can do that are really complicated now, but are probably cool. And you should learn about them."

CW (50:04):

And it got really in the way of writing code sometimes, because I was like, "Well, I've got to keep up with the latest thing." As Rust is also a new language, has that been an issue?

LA (50:15):

I think the Rust community has tried to minimize that in terms of splitting things out and having a well-defined process for a stable toolchain and an unstable toolchain. So I think if you're working with the stable things, you should mostly be able to find things are roughly the same, and you'll be able to do things.

LA (50:36):

Now I think, especially for what Oxide is doing, we are definitely close to the leading edge of things. So I think there are certain features we're keeping an eye on that we're hoping to see go stable. But I think that the language has definitely come a long way, and it should be pretty stable to be able to do a lot of things.

EW (50:53):

Going back to the bug you found and are going to be talking about, this wasn't the only one, was it?

LA (51:00):

No. So actually last year I ended up finding another bug or a different kind of bug.

LA (51:08):

That was actually why I originally had the ROM dump around. While taking a look at the ROM dump, I discovered there was an undocumented hardware block that could be used to patch the ROM and be able to make changes to the ROM.

LA (51:22):

And something like this definitely does have its use cases, but it couldn't be locked out, so it's possible to reprogram it, which could be used to break isolation between the secure and non-secure world for TrustZone.

EW (51:37):

That sounds kind of important.

LA (51:39):

Yeah. And I mean, I think this was our first experience with NXP, and that was the one where I think we were less than satisfied. I think it took a little bit of convincing to have NXP believe that this was an issue.

LA (51:51):

And I think more than anything at Oxide, we really just wanted NXP to give us the documentation for what this thing was doing and make sure everybody knew it was available, just because there's good reasons to want to be able to patch your ROM.

LA (52:03):

I mean, ultimately your ROM is just code, and you're probably going to have a few bugs in your ROM. This is understandable, and you need to have a way to fix that up. But I think what's also important is to make sure that you can't reprogram that say, to be able to do other things you weren't expecting.

EW (52:18):

So let's go back to the RAM patcher. ROM patcher, sorry. Let's go back.

CW (52:25):

Yeah. RAM patcher's pretty easy.

EW (52:26):

Yeah. The ROM patcher would let you modify the ROM, including how you program for TrustZone, the PUF generator system, and the firmware update. And, okay. So why didn't you ditch NXP at that point? That seems really important.

LA (52:55):

Yeah, this is a question we get a lot. And again, we spent a lot of time trying to figure out exactly what we should do. And it sort of comes down to a couple of factors.

LA (53:07):

One was just that I mentioned we had some specific requirements for what we were looking for in a chip, and it turned out that there weren't a lot of chips out there that met our requirements. We still have the writeup from when we selected this chip back in spring 2020.

LA (53:24):

And even back then, there were some chips we had to rule out simply because we couldn't actually get our hands on silicon. All we could get were datasheets.

EW (53:32):

Probably still can only get datasheets.

LA (53:34):

Yeah. And so trying to find that, and then there's also the factor that we're pretty far into our product, we're getting boards and being able to do things like that.

LA (53:44):

So trying to find another chip, and being able to put that in, and then having to evaluate even more silicon takes more time. And we've all spent a lot of time evaluating the chip. So I think we know far too much about this chip by now.

LA (53:54):

So in some respects, we are reasonably confident we know exactly how this thing works, so we've decided to go with it. I do think this is a great lesson for everyone to think about.

LA (54:05):

And I'm definitely going to be talking about this in my talk, about making choices like this. I don't wish silicon bugs on anyone, but sometimes you end up having to make these hard choices.

EW (54:15):

Arm itself has a module for ROM patching.

LA (54:19):

They used to, actually. There used to be the Flash Patch and Breakpoint unit, which was for Armv7 and earlier, but it was explicitly removed in Armv8-M, I think because of TrustZone. Because they realized you could actually use this to be able to do bad things.

EW (54:37):

So have you used it to do bad things to show the vulnerability?

LA (54:45):

Yes. And when we found this issue, I think we shared it with NXP, and I don't think they were fully convinced. So I worked with my colleagues to be able to do a full proof of concept. And my colleague, Rick, I think, was the one who really helped to dig in and figure out how to turn this into something that was pretty impressive.

LA (55:01):

And I think we joked about figuring out how to do assembly code golf in terms of finding the smallest number of instructions we could do to be able to do something interesting to be able to reprogram the ROM.

LA (55:13):

And what Rick and the rest of us eventually came up with was something to be able to take what is essentially reference code out there and demonstrate that using the expected APIs, we could have the non-secure world read out stuff from the secure world, which was definitely not supposed to happen.

EW (55:32):

So how did you fix that?

LA (55:36):

That one was actually somewhat easier to fix, mostly because it is possible to actually lock out changes and access to the ROM patcher via another security mechanism on the NXP chip. So that access is in fact restricted to only certain levels, such that only certain levels are able to make modifications.

EW (56:01):

Is that something you have to do on boot each time? Or is it more like a fuse that you say, "Okay, never again can the ROM patcher patch this?"

LA (56:10):

No, it's not a fuse, unfortunately. You have to do it on each boot.

EW (56:15):

So if somebody could hijack the boot, they could read out your secrets?

LA (56:21):

Yeah. If you managed to hijack the boot and disable that check, you would probably run into some problems, assuming you could find something else to be able to do this.

LA (56:32):

I mean, this is a lot of times what security ends up looking like, is that, "Well, if you could do this, you could do this. You could do that." So it's all a matter of finding that one little inch and being able to come up with a mile.

EW (56:44):

Well, this has been really interesting, and I look forward to your talk hopefully being available online after the conference. Laura, is there anything you'd like to leave us with?

LA (56:57):

Stay curious everyone, and don't be afraid to break things. You never know what you might find.

EW (57:04):

Our guest has been Laura Abbott, an engineer at Oxide Computer working on Rust software for microcontrollers. She'll be speaking at Hardwear.io in early June 2022 in Santa Clara, California.

CW (57:17):

Thanks, Laura.

LA (57:19):

Thanks.

EW (57:21):

Thank you to Christopher for producing and cohosting. Thank you to Andrea at Hardwear.io for the introduction and Rick Altherr for his Patreon support and his lightning round questions. And of course, thank you for listening. You can always contact us at show@embedded.fm, or hit the contact link on embedded.fm.

EW (57:41):

And now a quote to leave you with, from Audrey Hepburn: "Nothing is impossible, the word itself says 'I'm possible'!"