342: That Girl's Brain
Transcript for Embedded 342: That Girl’s Brain with Jess Frazelle.
EW (00:06):
Welcome to Embedded. I am Elecia White, alongside Christopher White. Our guest this week is Jessie Frazelle. We're going to talk about computers, computers, maybe Rust and computers.
CW (00:22):
Jessie, thanks for putting up with the technical difficulties.
JF (00:26):
Thanks for having me. I'm excited to be here.
EW (00:29):
Could you tell us about yourself?
JF (00:32):
Yeah. So, I've been a computer nerd for a long time. I've worked on a lot of different projects, whether it's like the Go Programming language, Docker. Now most recently, we started a computer company.
EW (00:48):
That's Oxide.
JF (00:50):
Yes, Oxide Computer Company.
EW (00:52):
What do you make? I mean, you must make computers, because you introduced it as a computer company, but I think not laptops.
JF (00:59):
Yeah. I mean, I would love to one day do that. But right now, we're making rack scale servers, so not even individual servers. We're actually selling racks of servers themselves. So, you get a bunch of servers already packaged into a rack.
EW (01:17):
Okay, we're going to talk about why that's interesting in a few minutes, but first, I want to do lightning round. We'll ask you short questions. We want short answers. Are you ready?
JF (01:26):
Yes.
CW (01:27):
How much malt powder is required to change a milkshake into a malt?
JF (01:32):
Okay, so I mean, this is a contentious question, especially for me. You must have gotten this question from someone internal at Oxide, because I have an obsession with malt powder. I like double what most normal people want. So, I mean, that might be like eight scoops or something ridiculous.
EW (01:50):
Do you have a favorite vintage computer?
JF (01:54):
Oh, that's hard. I really like the old Macs, because they're fun to play with and they have some of my favorite games just from my own past history. But I also love the 486; it has a bunch of old software that my grandpa wrote. So, there's like a sentimental aspect there.
EW (02:15):
When you say old Macs, do you mean Apple IIe or iFruit or...
CW (02:22):
Wait a minute. Wait, wait, wait. Stop. Apple IIs were not Macs.
JF (02:24):
Yeah, this is like the Macintosh SE.
CW (02:26):
You can't say that.
EW (02:29):
I'm sorry, so ignorant.
CW (02:35):
Favorite processor?
JF (02:38):
So, currently, it's the AMDs, because they're just like the most powerful that's on the market today. I mean, ARM has like a lot of promise, but it's just not there yet from what I've seen. Yeah, it's hard. I also love RISC-V as far as instruction sets go, because it's open source. I think that there's a huge amount of potential there when it comes to what the future will look like in 20 years or something as to what people are running.
EW (03:09):
What's your least favorite programming language?
JF (03:18):
I really like all of them. I would say any language that is not compiled, because I mean I love to program by compile time errors, which sounds terrible, but I like to refactor a lot. The easiest way to catch mistakes is that way. So, yeah, any language. JavaScript now with TypeScript is great, because you can kind of get those type errors. But I don't think I have a least favorite one honestly.
EW (03:47):
It sounds like your least favorite might be Python.
JF (03:50):
Yeah, I mean, I do love Python on occasion though. I mean, it depends, but yeah, Python is hard to refactor well.
CW (03:56):
Would you rather be a superhero or a supervillain?
JF (04:03):
Yeah. So, I mean, I like 95% superhero, but then on the occasion when you need to really get something done, maybe that 5% of supervillain comes out. I don't know.
EW (04:15):
Do you have a favorite fictional robot?
JF (04:19):
Yeah, so I have a couple. I've loved Wall-E, he just touches my heart, and Baymax from Big Hero 6, because he's this caregiver and then they kind of turn him evil but then he goes back to being the nice guy again, which is cute.
EW (04:36):
Okay. On the website, it says you make hyperscaler infrastructure. How is that related to large racks of computers?
JF (04:47):
Yeah, so basically what hyperscalers like Facebook, Microsoft, Google, all of those huge companies did is they were pretty much sick of using any proprietary vendor's hardware for their computers, because mostly what you're doing is just stacking a bunch of boxes on top of each other. It kind of makes no sense when you're running it at this super large scale, because on those boards, you get a lot of things that you don't need in a data center. You don't need a mouse. You don't need VGA. You don't need a lot of things. So, they kind of redesigned everything, getting rid of all of the old desktop components of this hardware, and then they wrote all the software on top of it.
JF (05:32):
So, what you get is actually way more dense racks, so you can pack a lot of compute in there. You also own the entire stack. So, when it comes to bugs, these vendors that you're kind of slapping together to build this data center, they are pointing fingers at each other. Sure, you have a CPU vendor and vendors for storage and stuff like that, but you get rid of a lot of stuff on top, software vendors and stuff like that, because they wrote all of the stack. So, you get this really nice way to deploy. It speeds up productivity, and you get a lot more compute for the power that you're using. Yeah, it really just cleaned things up throughout.
JF (06:22):
So, we're basically doing that for everyone else, everyone who doesn't have a team to build out a whole hardware infrastructure for their business, because it's not what they do as a business. So, we allow all the rest of the Fortune 500 companies or whatever to have what these hyperscalers have internally without all the work.
EW (06:45):
Okay, I'm going to go back to the word 'hyperscaler'. I didn't realize that that meant just the big companies, the companies that are big enough to do this sort of thing.
JF (06:55):
Yeah, I mean, I think that it means more than that. It also means the companies who run at this massive scale, but even companies that run at massive scales, they don't necessarily have the skill set or the right people internally to do it.
EW (07:13):
Totally makes sense. I mean, this seems like something, if you're not a software company, that you should just buy. I mean, this is something that is kind of boring. It's almost a commodity.
CW (07:28):
You're a software company, not a hardware company.
JF (07:35):
Totally.
EW (07:35):
I mean, not a software company, not a hardware company too. I mean, you'd have to become a hardware company partially to-
CW (07:38):
Sure.
EW (07:39):
... do this, but it's the Facebook, Google, as Jessie mentioned. Okay, so the reason they do it is because it is more cost effective for them in the end.
JF (07:53):
Yeah.
EW (07:54):
How come big companies aren't doing this, Dell or Apple or any of the computer manufacturers?
JF (08:03):
So, they're trying. You can get a set of Dell boxes with VMware on top. They're kind of trying to make this thing that is all put together, but it's two different companies within Dell. So, you really aren't getting this really integrated software and hardware experience. Also, on top, you aren't really getting a modern interface, an API for deploying. There's a lot of just old thoughts there, but, yeah, it's very different than what you would see inside an actual hyperscaler. It's definitely different than what we're doing, because it is still the old hardware at the bottom. You still have all the desktop stuff that you don't need in a data center, which makes no sense.
JF (08:58):
So, we're definitely getting rid of all of that. We even went so far as to not even have a BMC on the servers. We have a more traditional service processor instead, so it has much less control now. I mean, the BMC should have never had so much privilege in the first place, but people just kept adding features there because it was where they needed to add features. But now, we've taken out a lot of that stuff that doesn't need to be there. Actually, the BMC just does what the BMC should do. Well, the service processor in our case, and that's boot the machine.
EW (09:36):
What does BMC stand for?
JF (09:39):
Baseboard Management Controller.
EW (09:41):
Okay.
CW (09:43):
So, what do these look like architecturally from the hardware standpoint? So, I worked on rack scale routers a long time ago, and it was one thing. It had a central control processor and it had a bunch of line cards and things, but it was one unit. When I think about data centers, I think of a rack full of individual blade servers. Is this a rack full of custom designed individual blade servers, or is it a rack full of one kind of purpose built thing?
JF (10:14):
Yeah, so it's a bunch of what we're calling sleds, because they're more like the Open Compute Project form factor. So, in a rack, you have the width of the rack. So, each sled is actually half the width of the rack. And then it's like one OU or two OU in height. These are the open units in the Open Compute Project. That's how they're defined. But yeah, so we have just a bunch of those sleds in there, and that's what makes up the rack.
EW (10:46):
When I worked at HP in their NetServer Division 9 million years ago, we had monitors to make sure that the server was doing what it should do. Does each rack have a monitor?
JF (11:06):
Yeah, so each rack has rack controllers, which internally run all the software for distributing compute and managing networking for the rack. They do all the kind of smart things in between. I mean, we have also a top of rack switch, which might be in the middle of the rack and might be a middle of rack switch. We have two of those for making sure if one goes down, we have another. But yeah, so these rack controllers, if you have multiple racks, all of them are going to talk together.
JF (11:39):
And then in this kind of one pane of glass, you have this user interface where you can see all the racks at a given point of time and then everything that's going on. You can drill into individual racks and then see like, "Oh, this rack, this disc needs to be replaced," or anything like the latency on this network cable is bad. You can drill down to any fine level of detail.
EW (12:10):
Do you still use SNMP?
JF (12:14):
That I do not know.
EW (12:17):
It's a very old protocol, although it is still in use, for describing and monitoring the health of units. I just wondered what the latest technology is... I mean, what is this view built upon?
JF (12:31):
So, this is going to be built on all of our software that we're writing, even from the firmware up. So, I think that largely, we won't use something like that, but I actually have no idea. I'd have to ask one of our engineers.
EW (12:47):
Fair enough. How did you decide that this was a good idea? How did you say this was a space that was worth doing?
JF (12:58):
So, I think that between the three of us, me, Bryan and Steve, we all had a different process for going about this. But mine was mostly talking to people who are already running on-premises. I talked to a bunch of different people, just cold reached out to people too, who I knew had to have some sort of existing on-premises hardware. And then I wanted to see what problems they had. Because I wanted to make sure we were actually solving a problem for people. It turns out like it's a huge problem for people running on-premises. There's problems all over the stack. It could be in the firmware where you have a huge outage and then you get two vendors basically pointing fingers at each other.
JF (13:39):
Your bug doesn't actually get fixed. Your business is actually on the line, because these two vendors are pointing their fingers at each other; or it's bugs higher up in the stack, because you have some software layer on top that was then talking to the hardware, and then those layers basically end up pointing fingers at each other. So, by putting this together as almost one unit, then you remove a lot of that, "We don't know what's happening between these two layers."
JF (14:11):
Also, by open sourcing a lot of it, you get visibility into the layers of the stack as well. A lot of the folks I talked to, they don't necessarily have on-premises the nice APIs that you would get in a cloud. It's not like EC2. It's not GCP, where you're deploying VMs. It's not the same kind of API-driven, very developer friendly experience. So, it seemed there was a lot that we could do there as well.
EW (14:45):
So now they can just point the finger at you.
JF (14:47):
Yeah, which is fine. I'm fine with having the finger pointed at me.
EW (14:53):
Open source, how much of what you're doing is open source?
JF (14:57):
Yeah, anything that we are writing software-wise is going to be open source. We're going to open source the hardware as well; we're definitely going to open all of it. The things that might not be open sourced are because of vendors: where we're wrapping, say, AMD's proprietary firmware for their silicon, we can't open source that. But we're working with a lot of vendors to get these down to the lowest, lowest bits, so that we open source as much as possible.
EW (15:30):
Do you have anything open sourced already, or is it all future?
JF (15:34):
Yeah, there's a few things on our GitHub, it's github.com/oxidecomputer. I think there's just a few things with regard to the API, but yeah, we're just open sourcing random stuff as we go.
EW (15:49):
You talked about Open Compute. That's where Microsoft, Google, Facebook got together and said, "We want our computers to look like this, so that we stop having to deal with all of you people. We just want them to be the same." Is that-
JF (16:01):
Yeah.
EW (16:02):
Okay. Why are you doing that? I mean, of course, if you do eventually want to sell into the bigger data centers, the hyperscalers, you would need to conform to that. But if you're not selling to them, why not do your own?
JF (16:21):
So yeah, no, it's super true and that's a really good point. What we actually are doing is a little bit different than what they're doing. So, there's the Open Compute Project, and then there's this other one called Open19. Open19 was... LinkedIn kind of took their design for hardware and put it into a different foundation basically, but there's good parts of both. So, Open19, you get this really nice cabling on the back of the rack for networking, and you can just basically pop it right into the network slot versus having individual cables that you have to unplug. So, we're going to take that, and then we're also going to take the power bus bar from the Open Compute Project. We're basically taking the best parts of both and putting them together, which is not what people in the Open Compute Project typically do.
EW (17:20):
I don't want to ask how much is a rack, because that seems odd. But if I wanted to make a CPU or if I wanted to make my own chip, I know that I shouldn't even consider it if I don't have $2 million.
CW (17:35):
Two?
EW (17:36):
Well, that's a small chip. What scale of business are you looking for?
JF (17:46):
Yeah, we're looking for pretty large-scale businesses. Sadly, this won't be the type of server that's just running in someone's garage. I mean, unless they have the money to do so. That would be really fun. I mean, I would love that. But yeah, we're really targeting Fortune 500 companies, large enterprises.
EW (18:03):
Cool.
CW (18:04):
Thinking of running it in your garage, so if I wanted one, what kind of power would I have to install in my house?
JF (18:11):
Yeah, so here's where things get interesting. Because a lot of enterprises and traditional companies, they either are running them in their own data centers or they're running them in colos, it's a little bit different than... Hyperscalers kind of can draw as much power as they want to. They get up to a huge amount of power on the draw. But for us, it was actually hard, because we need to conform to basically all these colos where 16 kilowatts or whatever is the maximum that you can get. Whereas a hyperscaler, you would go well above that. So, we have that as a constraint.
JF (18:56):
So, some of these racks might not be fully populated based on the power that people can draw from them, but then we can also start filling it in as they realize how much power they're actually drawing. So, let's say they start with half a rack, and then we'll give them more servers once they realize that they can handle the capacity for that.
CW (19:17):
I think I recall the TELCOs used to want DC power, high amperage, DC power. Is that still the case?
JF (19:26):
We haven't actually talked to that many TELCOs.
CW (19:28):
Okay, good.
JF (19:28):
So I'm not sure there.
CW (19:29):
Okay, fair enough.
EW (19:31):
I mean, that's been true for some data centers as well, that they want DC power, big DC power, lots of DC power. Which actually brings me to: you recently wrote an article about data centers and carbon footprints. Could you summarize?
JF (19:48):
Yeah. Well, actually, okay. So, it started as one of our hardware engineers. He wrote this really nice thing internally for how power works, and then I got kind of nerd sniped by it. I was like, "Whoa, this is really interesting." I didn't know anything honestly before writing this article. I super dug into it, because I was super interested in where basically we could help people on their carbon footprint. Because it seems like there's a lot that we could do to get the power usage better.
JF (20:24):
So, I looked into what all the hyperscalers basically are doing about their carbon footprints and how they're looking at that. And then I really dug into just how this works in general, what power usage efficiency is, and how each hyperscaler does power, because actually Microsoft has a different way of doing it than Google and Facebook, which is interesting.
EW (20:51):
In this week of ash raining over the San Francisco Bay Area and lightning storms predicted to start more fires, why does carbon footprint matter?
JF (21:02):
Yeah, so it super matters. Sadly, the ash is due to a natural disaster, the fires. But with data centers, that's us, we are the enemy. We're the ones that are causing it. So, if there's any way that we can make it more neutral, then it's a huge deal.
EW (21:23):
So, what are some of the things that are different between the different hyperscalers?
JF (21:29):
Yeah, so Microsoft didn't go the route of using a power bus bar, which is just this huge metal thing on the back of the rack that serves power. They actually do individual cables. When I first started looking into this, I was very naive, but I was like, "Whoa, I wonder by making this decision, Microsoft is actually causing them to use a lot more power where they don't need to." It actually turns out it's pretty negligible. There's a lot of nuance here when it comes to the right thing for the job, but as long as you have cables that are very, very, very high quality, which I assume they do, then you're getting the same power draw as with a power bus bar. I don't know exactly why they did it that way. There's a couple talks on it.
EW (22:22):
[crosstalk 00:22:22] They did it the first time that way and it worked.
JF (22:24):
Yeah, it's true. There's a couple talks on it, but you don't get the serviceability gains that you get from having the bus bar. Because when you have to service one of these things, you have to go unplug the cable. Whereas with the bus bar, you just pop the server in and out from it. There's no cable, it's super nice.
EW (22:46):
In the article, you said Google sets its data centers at 80 degrees Fahrenheit (26°C) instead of the usual 70°F (21°C). I mean, does it matter that much?
JF (23:02):
So, it's on the fence. I've been looking into this, and I would actually be curious honestly about Google's take on this, because it seems there's a few articles pointing out that the original studies, which say you should get to a max of 77 degrees before it starts damaging the hardware, are pretty old. So, a lot of people are like, "Google's doing this, maybe it's not that damaging." But I would actually be interested in Google's take on it since they've been the ones doing this for the longest amount of time. They would probably have the best take, but it does seem the industry is coming more towards maybe we should take another look at how CPUs get damaged in heat or something like that. There's an opportunity for improvement.
CW (23:53):
That's a huge part of the whole power draw too, because if folks haven't been in a data center... The ones I've been in, you have to put earplugs in before you go in, because the sound of the air conditioners and the fans in the racks is so loud that it's damaging to human hearing. There's just so much cooling going on. It was just a huge portion of the total power.
JF (24:16):
Totally.
EW (24:18):
The Power Usage Efficiency, PUE, does that measure the cooling? Does that measure the processing? What is that? How is it measured?
JF (24:30):
So, that's the total energy required to power the facility, and that includes lights, cooling, anything that's within the building and drawing power. And then that's divided by the energy used for the servers.
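To make the ratio Jessie describes concrete, here is a minimal sketch in Rust; it is not from her article, and the numbers are made up purely for illustration:

```rust
// PUE as described above: total facility energy divided by the energy that
// actually goes to the IT equipment. A value of 1.0 would mean zero overhead.
fn pue(total_facility_kwh: f64, it_equipment_kwh: f64) -> f64 {
    total_facility_kwh / it_equipment_kwh
}

fn main() {
    // Hypothetical month: 1,200 MWh for the whole building, 1,000 MWh for the servers.
    let ratio = pue(1_200_000.0, 1_000_000.0);
    println!("PUE = {:.2}", ratio); // prints "PUE = 1.20"
}
```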
EW (24:46):
Okay, so that's the overhead. It measures the overhead versus what the server is doing, but is the server doing computing or is it doing monitoring or is it just sitting there doing nothing and drawing power for no reason?
JF (25:02):
I mean, it depends actually. PUE is this actual point of contention, because when I started asking people for feedback on this article, it was funny because a lot of them were like, "Hey, you can't really take PUE seriously." Especially the numbers people give out. A lot of people who give out their PUE numbers, they might not even be taking into account the environment outside, especially if a data center is located in a hot and humid place. There's a bunch of stuff that isn't included. So, the workloads being run, definitely not included.
EW (25:38):
Okay, so PUE doesn't measure everything and may be a confusing measure?
JF (25:45):
Yeah, I mean, it can be done well, and it can be done poorly basically. People don't give a lot of information as to what exactly they put into their PUE numbers. But if they're transparent about it, then it's way better.
CW (25:57):
That's like all benchmarks.
JF (25:59):
Yeah.
EW (25:59):
This article was posted in ACM Queue. What is that?
JF (26:06):
Oh yeah, so Queue is this magazine put out by ACM. It's more oriented towards practitioners. So, IEEE has Spectrum which I absolutely love. It is super nerdy. They talk about robots and all this crazy stuff like drones, but Queue is more oriented towards what engineers actually do day-to-day in their jobs and problems they encounter. It's more oriented towards practitioners than academics. Yeah.
EW (26:38):
That's useful. Is it mostly computer stuff, or is it more mathematics?
JF (26:45):
There are a lot of mathematics articles. It depends on who writes them honestly. So, mine typically, I just dive into something random. There's another one that's more like an Ask Alice or whatever, but you're getting software help from one of the authors, which is hilarious because he's very blunt. It's a mix.
EW (27:10):
You also have a podcast.
JF (27:13):
Yes.
EW (27:14):
Tell me about it.
JF (27:16):
Our podcast is pretty cool. We started it before we started the company. So, we had a series of episodes that we could release. It was mostly people who we had talked to before starting the company that we had learned things from and that had interesting stories about things between the software and hardware interface. So, it's mostly stories. There is a romantic aspect to it. So, some episodes are more romantic than others. We like to talk about old computers, people's love of computing.
EW (27:51):
You wrapped up a season in February, but it's August. When is the next season coming?
JF (27:59):
Yeah. So, this is sad. We love recording in person. With the pandemic, it's been really hard to do that. We have a few episodes that are already recorded, then we had a bunch of people who we had lined up to be on it. And then things kind of got derailed. So, we're hoping to get started again soon, but we'll see. Yeah.
EW (28:22):
You claim it's the nerdiest podcast on the planet.
JF (28:26):
Yeah, I feel I've learned a lot from the podcast. It's funny, because I'll still listen to episodes. I even learned stuff from going back and doing the show notes. When recording the podcast, I found that I miss a lot of things, just because I'm trying to keep the conversation going or trying to catch it all. On the re-listens, I get a lot more out of it. So, I don't know. I think it's the thing that keeps on giving.
CW (28:55):
That seems very familiar.
EW (28:59):
It does. What made you start a podcast?
JF (29:01):
We thought it would be fun and a way to break up just working day to day, and a way to just get folks' opinions and their own experiences from working on hardware. I mean, I think what I love about it is that a lot of folks that we saw during our raise and stuff like that, a lot of people think that this lower level programming and lower level engineering and really hard tech, it's a dying breed. People don't really see it anymore, because there's so many SaaS companies. Actually, no, it exists. So, this is our way of being like, "Hello, there's people down here." Everyone has stuff down here, and it's a huge thing. We're trying to kind of unravel that at the same time.
EW (29:48):
You mostly do talk about computers as opposed to embedded devices. Is that right?
JF (29:57):
Yeah, there's not much embedded. We could have ventured into that area. It's definitely on topic. We didn't purposefully not include that. It just happened.
EW (30:15):
Oh, yeah. I mean, you haven't done that many.
JF (30:18):
Totally.
EW (30:19):
So, you still have a lot of things you can explore before you get to origami, glassblowing, all the other things.
CW (30:25):
To be fair, we started on some of those early, so...
EW (30:30):
That's true.
CW (30:32):
I don't like asking people for their favorite episodes, but if somebody wanted to check out the podcast, would you recommend, "Oh, start with this one"?
JF (30:41):
So, I guess one of my funniest stories would be we had Jonathan Blow on the podcast. He wrote The Witness, the video game, and I played that game. I was super stoked for him to come on it, because I played the game. But I didn't want to make it weird, because I can make it really weird really easily. So, I was like, "Okay, don't be this weird fan girl." So, I had all these questions for him about these videos that are embedded into the game, and I didn't ask any of them. And then basically he left and then we're going back over it. I was trying to explain to Bryan and Steve these videos that are embedded into the game, they drove me nuts for days. I was trying to figure out the meaning and all this stuff.
JF (31:25):
I mean, I'm a crazy person when it comes to video games. I finish them in two days and won't shower, won't eat, I just have to get it done. So, this was all the crazy that I was trying to hide from him. Bryan and Steve are just like, "Oh my god, I don't even know what to do with all this information that you're giving us." But so in the recap episode, I basically go over all of this. And then I was just hoping that Jonathan Blow would listen to it and be like, "What the hell was going on in that girl's brain during this episode?" But I don't think he did.
EW (31:54):
Do you have a lot of interaction with listeners?
JF (31:56):
What do you mean?
EW (32:00):
I mean, do they email you and tell you you're wrong like ours do?
JF (32:03):
So yeah, no, we've had a few of those and they're pretty funny honestly. I mostly forward them to Bryan, but yeah, no, we get a lot. It comes in over our catch all email address, which I mean, we get a lot of fun stuff in there.
EW (32:19):
It's nice to get the complimentary emails, although they don't stick as long in my head as some of the others.
JF (32:25):
Yeah, totally.
EW (32:28):
I want to go to some of our listener questions. One of which is from your co-worker, Rick Altherr.
CW (32:35):
That doesn't seem right.
EW (32:35):
No, it really doesn't, does it? Why open source firmware? Wait a minute. Let's actually go back to just firmware. When I say firmware, it's what's running on my processor. It's not really software, because it doesn't really interface with people. It's not hardware, because it's typey typey C, C++. But when you say, firmware, you mean something else, don't you?
JF (33:01):
No, I think that we have the same thoughts there. We also are going to have firmware running on a bunch of microcontrollers and stuff like that. I think it's basically the same definition.
EW (33:17):
Okay. I thought it was the BIOS part.
JF (33:22):
Oh, yeah, no, we definitely have that. I would consider the BIOS what runs on the CPU. No, is that not?
EW (33:29):
You would consider the BIOS firmware.
JF (33:31):
Yeah.
EW (33:33):
When I worked on servers a second time for PARC and they were calling it firmware, I was like, "What do you mean, this isn't firmware? This is a giant chip. You don't put firmware on this sort of thing." But yes, it was where I encountered BIOS firmware. I mean, that has long been one of those things that nobody wants to give you. It's very expensive to get BIOS code if you want to be able to modify it, but now you're going to open source it.
JF (34:08):
Yeah, I mean, mostly this comes from... When I was talking to a bunch of people about their pain with running on-premises, a ton comes from the firmware and it comes from the lack of visibility there. You don't know when things go wrong, why they're going wrong, which drives me nuts. You get the vendors pointing their fingers at each other. And then you also talk to members of this team, and then you get routed to a bajillion different teams. It just seems like no one knows how this thing actually works, which is nuts.
JF (34:42):
By making it open source, one, when we fix bugs, it's very visible where the bug was, where it came from. That just helps me personally sleep at night, because I actually know like, "Oh, that thing, now fixed. Look at those lines of code, the change," or whatever. And then you actually know what the bug actually was, which you can never get a straight answer from a vendor on. And then also when it comes to security, a bunch of the stuff running in the lower levels of the stack, all of this has way too many privileges. They have full power basically over the computer. It's also the code that we know the least about, which is super messed up.
JF (35:22):
So, you get a bunch of vulnerabilities. Rick found the vulnerability in the BMCs. I mean, there's just a bunch of vulnerabilities there. The Bloomberg kind of huge exposé, where they thought that the supply chain had been modified. That was interesting, but also, why even go through the supply chain when you could just walk through the firmware, because there's so much in there? Intel has huge web servers in there that no one knew about for a long time. It's just like all this stuff just needs to be opened up, so that we have more eyes on it. There can be more audits, and people can know what's actually happening in their computers.
CW (36:07):
It seems like people don't even know what's in there, not even how it works, but look, there's a little port you can connect to, and that's with Intel's firmware. Why is this here?
JF (36:19):
It's a mystery. Yeah.
EW (36:21):
Well, to be fair, and having seen some of this code, it's built on layers. I mean, it's layers of layers of layers, because they have to support a whole bunch of different interfaces. I mean, all the mice, all the keyboards, all the processors running this speed and that speed and monitoring for this exception and that exception. It started out as one code base 25 years ago, and that code is still running.
JF (36:54):
Yeah. That's also why we have this opportunity to clean it up. We don't need all those drivers for mice, we don't need all the drivers for keyboards. Our customers will never actually interact with the firmware. It makes sense honestly why the BMC even got so packed with a lot of code. It's because when that's all you expose to people, that's where you have to put features, but most features shouldn't have that level of privilege. So, we actually get a chance to clean that out.
EW (37:26):
So, the level of privilege, it seems like the firmware does need a high level, because it needs to wake everything up. It needs to be able to talk to the... Well, I don't know if it needs to talk to the internet. That's different, but it needs to talk to the memory and the CPU and the hard drive. What privileges are you taking away?
JF (37:48):
So, it's still going to be very, very privileged. What we're taking away is these feature sets where it does have to talk over the internet, or it does have to interface with a bajillion different keyboards and mice and all these vendors for various things. We don't need all of that. We only have certain vendors that we're working with and stuff like that. So, we don't have to have crazy interfaces for doing whatever we need. Actually, all we need to do is boot and interact with what's on the board.
EW (38:18):
Yeah, sometimes starting over is easier than maintaining.
CW (38:24):
Yeah, that's the basis for a million startups.
EW (38:28):
You do a lot of automation in your work at Oxide. What kind of automation do you do?
JF (38:37):
Yeah, so I love automating things. I mean, I do this at home too. I automate anything that can be automated. So, for the company, we're still super small. With every single person that we hired, I was like, "Okay, this can be automated." Adding people to G Suite can be automated. Adding people to a Zoom account can be automated. Adding people to Airtable, all the kind of internal tools that we use, the accounts all get set up automatically. I have a bunch of scripts that make short URLs.
JF (39:12):
Because having worked at Google, I love the go/thing, and you just go to this page. It's very nice, and it's an easier way to remember things. We have this RFD process, Requests for Discussion, internally. Now we have over 100 requests for discussion. So, it's hard to actually link out to these things, because they live out on different Git branches. So, I made short URLs for those.
JF (39:34):
So, you can just go to 100.rfd.oxidecomputer and you get routed to the right branch, stuff like that, where it makes everything a lot easier. And then I automate between a lot of these tools. So, Airtable, the GitHub or whatever, just weird statistics and monitoring and stuff like that. So, it's a lot of random stuff, but it was fun to build. I joke that the role of CIO or Chief Infrastructure Officer, it's a robot.
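As a rough illustration of the short-URL idea Jessie describes, here is a minimal sketch in Rust that maps an RFD number to a branch URL. The URL layout and the function name are assumptions for illustration only, not Oxide's actual tooling:

```rust
use std::collections::HashMap;

// Build a lookup table from RFD number to the URL of the branch that holds it.
// The URL scheme below is a hypothetical placeholder, not Oxide's real layout.
fn rfd_urls(numbers: &[u32]) -> HashMap<u32, String> {
    numbers
        .iter()
        .map(|n| (*n, format!("https://example-git-host/rfd/tree/rfd/{:04}", n)))
        .collect()
}

fn main() {
    let table = rfd_urls(&[1, 100]);
    // A real service would answer with an HTTP redirect; here we just print the
    // target that a short URL like 100.rfd.oxidecomputer might resolve to.
    println!("{}", table[&100]);
}
```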
EW (40:06):
Your title is Chief Product Officer. What does that mean?
JF (40:12):
So, I think it means a different thing, different places. But for me, it means talking to people who have problems with their current infrastructure or talking to people that don't have problems but just have interesting experiences in infrastructure that we can learn from. And then taking that back and kind of putting all the conversations together into how we want to build our product to make it better than everything else and make people's lives easier.
EW (40:41):
Okay, you write some of these tools in Rust. Is that right?
JF (40:47):
Yeah, I wrote a lot of them in Go at first, because it was just what I was fluent in. But as a company, we're really writing everything in Rust, because it's great for embedded and it's great at a lot of things. So, I was like, "Okay, I have to just knock this off. I got to just do it in Rust." Rust, it didn't start out as being great for REST APIs, which is what a lot of this is. It's just interactions [inaudible 00:41:18] with REST APIs, but it's actually super getting there. I ran into a lot of pain points there, but now it's a lot better with... Async/await is way better.
JF (41:29):
But yeah, it was a fun exercise to write it in Rust. We open sourced a lot of the libraries. They are on our GitHub page as well. They're all purpose built. So, it's not the entire API. It's just the parts of the API that I needed. Yeah, it was a fun exercise. I am now, I would say, maybe still a little bit more fluent in Go, because I find myself writing Rust like it's Go, but I'm getting there. We have some really good Rust folks on the team, and getting their feedback on code is great.
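For readers curious what the async/await improvement looks like in practice, here is a minimal sketch of a REST call in Rust using the widely used reqwest and tokio crates. The endpoint is just a public placeholder, and this is not one of Oxide's libraries:

```rust
// Assumed Cargo.toml dependencies (not shown in the transcript):
//   tokio = { version = "1", features = ["full"] }
//   reqwest = "0.11"
use std::error::Error;

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    // Fetch a placeholder endpoint and print the body. With async/await the
    // control flow reads top to bottom, much like the equivalent Go client code.
    let body = reqwest::get("https://httpbin.org/get").await?.text().await?;
    println!("{}", body);
    Ok(())
}
```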
EW (42:02):
Why would I choose Rust over Go? I mean, would you have if it wasn't that the rest of the team was Rust centric?
JF (42:12):
Yeah. So, when I changed the code for the bots from Go to Rust, I mean, even the Rust folks on our team were like, "Why'd you do that?" I mean, Go is actually better for what I was trying to do with concurrency of pushing things out to various APIs. I did it mostly as a learning experience. Where you actually want Rust is memory management, stuff like that. Actually, for Docker, I think that had Rust been a thing at the time and had it been at the place where it is today, it would have been a better language for the job, because we got into a lot of problems when it came to embedding C in Go. I mean, it's hard.
JF (42:54):
I mean, there's really load bearing parts of the Docker codebase, which is now in runc, where it's all C and very few people actually know what's happening. So, with Rust, you can avoid that, because you can get the level of granularity that you need.
CW (43:15):
I see a lot of people seem interested in Rust for deeply embedded stuff and even for micros, but it seems almost more suited to this kind of thing, like we were talking about with Docker. You shouldn't have been doing stuff in C, probably, for a utility at that level, right?
JF (43:33):
Yeah, no, it would have been great to have Rust then. I think Rust is perfect for firmware and what we're using it for.
EW (43:41):
Firmware being your firmware or firmware being microcontrollers?
JF (43:47):
Both. We're using it for microcontroller kernels, stuff like that. Yeah, we're using it for everything honestly. The control plane for deploying VMs is going to be in Rust.
EW (44:01):
Did you choose Rust, before the name of your company?
JF (44:06):
Yeah, Bryan wanted to use Rust for sure before that. So, the name is very much so a hat tip to Rust as well.
CW (44:15):
Could have been any oxide?
JF (44:17):
No, it has ties back into computers. It has ties back into a lot of things, so it's almost perfect.
EW (44:25):
Well, except Rust the language was named after Rust the fungus pathogen, not-
CW (44:31):
What?
EW (44:31):
... oxidation.
JF (44:33):
That's interesting. I didn't know that.
EW (44:36):
If you're going to bash a language as much as I do, you've got to learn as much as you can about it. That's not true. I don't bash Rust that much. I don't know when or why I would choose it over something I know very well. You chose it because Steve chose it and he cared about it enough to build your company around it, but I don't know.
CW (45:02):
Well, it depends on what you're trying to accomplish, right? If you're finding limitations in C, if you're finding that memory safety and things like that are becoming an issue. I mean, I don't know. By that logic, why did anybody switch from Forth? It's from the 70s.
JF (45:20):
Yeah, I didn't like-
EW (45:21):
Pascal to C.
JF (45:23):
Writing safe C, it's a very rare talent to find anymore. So, Rust allows you to kind of do that in a better way.
EW (45:35):
Let's see. I have questions from Phillip Johnston about "What lessons can all firmware developers learn from your investigations of proprietary and open source BMC firmware?"
JF (45:49):
Yeah, I mean, there's a lot there. OpenBMC, you have to give them a lot of props, because they were really the thing that started the whole kind of open source firmware ecosystem from what I can tell. They were really the first open source firmware out there. So, with a lot of open source projects, what it becomes is largely an interface for dealing with a lot of kind of sub-modules. It has to deal with a lot of proprietary things. So, OpenBMC became this basically communicator over the system D-Bus of various kind of sub-modules that interact with various vendors' things.
JF (46:36):
So, for ours, we don't necessarily need all of that, because we know exactly what vendors we're working with. We wanted the BMC to have a lot less features. We just wanted it to actually just do what a BMC, or a traditional service processor, which is what we're calling it, does, which is boot the machine and then interact with a few things on the board. So, we kind of took out a lot of the complexity, but OpenBMC is great when it comes to being able to work with a large variety of vendor components.
EW (47:18):
Phillip also asked, "What are the low hanging fruit for security and secure boot that most teams miss out on?"
JF (47:28):
Yeah, so I mean, a lot of the vendors have their own kind of secure boot built into their products. So, Intel has their own. AMD has their own. ARM has their own. What we're actually doing is very much in line with... Apple has T2, which is their hardware root of trust. Google has Titan. Amazon has their own thing as well. We're doing our own root of trust. It seems a lot of kind of the big companies, they don't wind up using a lot of those features from the vendors. I mean, they're mostly all proprietary. It's hard to get visibility into what it's actually doing.
JF (48:09):
So, by doing our own and open sourcing it, then we have a really firm level of attestation going on in the machine, which is really, really nice. So, I mean, I would love if maybe there were ways that either the vendors could open source their things so that people knew how they worked, or maybe just not waste time on stuff like that, but I could see why people want it. But it seems like if you're really serious about writing secure software, you likely aren't using the proprietary vendor features.
EW (48:48):
There's a lot of benefit from starting over. You're seeing a lot of that. Do you worry if you succeed and make it and you're still building these in 5 or 10 years, you're going to look back and say, "Why are we still supporting that?"
JF (49:09):
Yeah, I mean, these machines stay around forever. I mean, if we are successful, there is an analog, and Bryan's going to laugh because I'm actually using this as an analog. But the AS/400, those machines have been around forever. My mom was actually a recruiter for people to work on the AS/400. Just the fact that they're still around and you still need people to work on AS/400 goes to show how long these machines survive out in the wild. So, we're going to have ways to easily update our software on the machines, but this hardware, if we're successful, will be around for a really long time. We're going to have to support it for a really long time.
JF (49:52):
I think there are ways to make it easier on us and also ways to make it easier on potential customers. Because if they turn on, say, automatic updates and everything goes swimmingly, then they will always have the latest software. It won't be like an Android where a lot of the Android ecosystem is still on the oldest version of Android. Whereas with an iPhone, everyone basically runs the new software. We're trying to go towards that model, I would say, but yeah.
JF (50:25):
I mean, we're going to have to support it forever honestly. I mean, even Windows in their kernel, they have this custom code for SimCity. So, we're probably going to have stuff like that. I mean, not SimCity as an example, but the things that stay around forever. We're eventually going to become the thing that we hated in the first place, I'm sure.
CW (50:46):
But think of the size of the city you could have on your server.
JF (50:49):
Yeah.
EW (50:52):
Actually, are you planning ahead for that sort of thing? It's hard because most of the code ends up being corner cases, things that don't quite work together, timings that work on some series of chips and not others. How are you going to avoid that? Are you just accepting that that's a problem for after you're successful?
JF (51:15):
Yeah, I mean edge cases, I think there's no way to avoid that. But I do think getting people to actually update their software, having automatic updates, and trying to make that the default is a way to make sure, one, that people get the best experience that they can get, because we're fixing a lot of bugs at the same time. But also, then, we can get rid of things that don't need to be there.
EW (51:42):
Christopher, how do you feel about automated software updates right now?
JF (51:49):
Terrible.
CW (51:49):
Well, it wasn't automated, I allowed it to go.
EW (51:54):
If you sound different, it's because Logic updated-
CW (51:55):
That doesn't sound any different.
JF (51:58):
Yeah, I mean, that's the thing with automatic updates is you have to have this level of trust where people trust that bad things won't happen. That comes from not messing it up consistently, which is hard. It is very hard to get that right.
CW (52:13):
Yeah, it's really hard with things that are mission critical, right? There's probably a lot of reluctance within the customer base to that kind of thing still: "Wait, I want to decide when to do this and whether it's safe. I want to see that other companies have run this patch for six months before I apply it," that kind of thing.
JF (52:35):
Totally.
EW (52:37):
How do you battle against the IT managers who are reluctant?
JF (52:45):
I mean, it's hard honestly, because the hyperscalers can do it, because it's their internal team updating. We do want to give that functionality to people where it's like, "Oh, you can automatically update, jobs migrate to different servers, those servers get updated and then you do a slow rollout." I think, honestly, the ways to battle that are just transparency, being fully open about how this works, and having hopefully the first set of potential customers that we get be very open minded.
JF (53:18):
As long as we can nail it consistently, then we don't become the "Oh, this Windows Update just broke my whole thing and now we've been scarred for life." You want to avoid that. We tried to do that with Docker on the upgrades. I would say it was not great in the beginning, but then you eventually get to a place where it's better and better and better and better. I think it's just time.
EW (53:41):
Would you use Dockers on Windows now?
JF (53:44):
Me?
EW (53:45):
Yeah.
JF (53:46):
I actually have because [crosstalk 00:53:48].
EW (53:48):
I've gotten burned enough times that I just say no to Dockers, unless it's in Linux. So, I just wondered as somebody who probably has a lot more experience with it than I do.
JF (54:01):
It's all right on Windows. To be honest, it's not the same thing, because on Windows, you're getting a VM and you're not getting a container. What's nice about containers is you can actually share things. You can share the network. You can share various file paths. You can share the PID. You can share your process space with the container, which you can't really do on Windows. You can share files on Windows, which is fine. Network, I actually don't know if you can do that, but it's a different experience. You can't share the processes. I mean, they're actually VMs at the end of the day. They're slower. They've gotten it pretty fast.
JF (54:42):
Actually, it's pretty comparable now, but it is just a different thing. It is cool. It's cool that Windows IT people are now coming into this container space and getting super modern and updated. That's super awesome. They're having easier ways to deploy, but it's also Windows running in a container. So, Linux, you can automate a bunch of things very easily and get things up and running easily and just get a process started. But on Windows, that's a whole different experience.
EW (55:18):
Jess, it's been really good to talk to you. I need to go buy some turnips from Daisy Mae. Do you have any thoughts you'd like to leave us with?
JF (55:27):
I would say, we're hiring, if anybody is interested in joining, but also, we're going to hopefully have another season of the podcast out. I hope that people actually learn something from this. If I got anything wrong, please feel free to email our catch all email address and I will actually read it.
EW (55:52):
What are you hiring for?
JF (55:54):
We're hiring for hardware or software systems engineers, anything in that mix.
EW (56:00):
Cool.
CW (56:01):
Remote or local to...
JF (56:04):
Most of us are remote now. So, yeah, we're open.
EW (56:09):
Cool. Our guest has been Jessie Frazelle, Co-Founder and Chief Product Officer at Oxide Computer.
CW (56:16):
Thanks, Jessie.
JF (56:19):
Thank you. Thank you for having me.
EW (56:19):
Thank you too, Christopher, for producing and co-hosting. Thank you to Phillip Johnston for recommending Jessie. Thank you for listening. You can always contact us at show@embedded.fm or the contact link on embedded.fm. We are still doing transcripts. Soon they will be open to all of you. For now, they're only open for Patreon supporters.
EW (56:41):
Now a quote to leave you with from Vladimir Nabokov from Lolita, "And the rest is rust and stardust."
EW (56:52):
Embedded is an independently produced radio show that focuses on the many aspects of engineering. It is a production of Logical Elegance, an embedded software consulting company in California. If there are advertisements in the show, we did not put them there and do not receive money from them. At this time, our sponsors are Logical Elegance and listeners like you.