395: I Can No Longer Play Ping Pong

Transcript from 395: I Can No Longer Play Ping Pong with Tyler Hoffman, Elecia White, and Christopher White.

EW (00:00:07):

I'm Elecia White, and you are listening to "Why I Hate Tools." Your host is Christopher White.

CW (00:00:13):

Wait a minute.

EW (00:00:13):

Our guest is Tyler Hoffman of Memfault, who has returned to talk with him about tools.

CW (00:00:20):

Hey, Tyler, I feel like I've been hoodwinked here, but how are you doing? Welcome back...This may be the closest -

EW (00:00:26):

The shortest -

CW (00:00:26):

- guest return that we've had. So, welcome.

EW (00:00:30):

Well, I'm not sure he is going to stay on the line after that. We'll see.

TH (00:00:34):

Yeah. Yeah. Happy to be back. We ran far too long the last time, and didn't even cover a second topic.

CW (00:00:41):

And the second topic was tools, and I had some things just kind of vaguely that I wanted to talk about. And then you came up with a whole bunch of other ideas to talk about. But I guess the place to start for me is asking you, I've worked at a lot of places, probably too many.

CW (00:01:03):

And at many of those places, the tools were just kind of an afterthought. Engineers kind of came up with a collection of things to use as we developed the software. There was not necessarily a well-defined process for things. Some people used different stuff even.

CW (00:01:20):

And then at Fitbit, I remember when the Pebble acquisition came in, you and some other folks came in and really kind of took a more methodical approach to that. That's kind of what I wanted to talk about partly, is how to get a more professional tools organization within software and development.

TH (00:01:41):

I get to hear this from your perspective now, because we only got it from our perspective as the company being acquired -

CW (00:01:47):

Yes. Yes.

TH (00:01:48):

- and trying to move things in a direction that we thought was right. But it'd be fun to explore whether you thought it was the right direction.

CW (00:01:56):

Oh, absolutely. And it really was eye-opening. Like I said,...maybe it's a black mark on my career, but I hadn't really seen that kind of seriousness around tools before. Except maybe at Cisco, where it's a gigantic company...where we had a huge kind of developer infrastructure thing that had to be somewhat well-defined.

CW (00:02:21):

But even there, it was like people just kind of did random stuff. So one of the first things was just getting everybody to be on the same page with tools. Just going back to the Fitbit example, we all had IAR and that was kind of the commonality. Everything else was random.

CW (00:02:39):

And getting the right versions of things was very difficult, especially when there were more dependencies. So...did you guys come up with that at Pebble based on experience, or did it just seem like the right thing to do?

TH (00:02:56):

So here's the brief story of Pebble, and why I felt like we started off on good footing, which was process and tools. The original firmware, and hardware, and generally software engineers at Pebble, were young. They were from the University of Waterloo. They were all friends of the CEO generally.

TH (00:03:20):

And...even from their mouths, we were all pretty naive about how to develop a hardware product. It was like, "Can't be so hard. You write some firmware code. It mostly works." And yes, there are some bricked units here and there.

TH (00:03:38):

But I think we were taking generally a software approach. ...We've all written some form of a Java application in school. We've probably written an iOS or an Android app.

TH (00:03:47):

And when you're writing those pieces of software, you have a debugger, you have easy logs, you have already tools that capture crashes, and capture all of this information for you, and kind of tell you exactly where it went wrong. And so...even when I was joining, it was still towards this path...

TH (00:04:10):

Coding firmware shouldn't be impossible and shouldn't be as hard as most people make it out to be. And honestly, we didn't really talk to too many other firmware engineers outside of Pebble, I would say. We mostly had our heads down working hard almost all the time.

TH (00:04:27):

And so when we were like, "Oh, we now have units in the field. We need logs. Let's build a system that automatically does this for us. And we'll capture logs. And we'll do circular buffers on logs." Anyways, it was just we didn't know any better.

TH (00:04:42):

And we had kind of already had a software mentality, and we knew how important those processes and tools were. And so we just built them, not knowing that, I think, many engineers from ten years ago and before just never even thought to do it.

TH (00:04:58):

And so that's kind of still the status quo a lot of times today in hardware and firmware and organizations.
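As a rough illustration of the circular log buffer mentioned above: the idea is to keep only the most recent entries in a fixed amount of space, so the oldest lines are overwritten first. The sketch below is an assumption-laden stand-in, not Pebble's implementation. On a device this would normally be C over a fixed byte array; Python's `deque` is used here only to show the behavior, and the class name and messages are made up.

```python
from collections import deque


class CircularLog:
    """Keep only the most recent `capacity` log lines, like a firmware ring buffer."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry when it is full.
        self._lines = deque(maxlen=capacity)

    def write(self, line):
        self._lines.append(line)

    def dump(self):
        """Return the retained lines, oldest first."""
        return list(self._lines)


log = CircularLog(capacity=3)
for message in ["boot", "ble connected", "sync start", "sync done"]:
    log.write(message)

print(log.dump())  # ['ble connected', 'sync start', 'sync done']
```

The fixed capacity is the point: memory use stays bounded no matter how long the device runs, at the cost of losing the oldest history.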

EW (00:05:05):

I mean, I've built many circular logging systems.

TH (00:05:09):

For sure.

EW (00:05:11):

And I think that the compilers are important. And I think the tools are critical. But I want to spend all my time playing with the technology, or with the application, or getting things done. So I have a hard time with tools unless I need them right this second.

CW (00:05:37):

But didn't you get irritated when something went wrong with tools and got in the way of your development?

EW (00:05:44):

Yes. And then I would crankily storm out of the building and hope that somebody else would solve it.

CW (00:05:51):

See,...I just wanted to have that on -

EW (00:05:52):

That's not entirely true, but yes. I mean, I find working on tools to be an order of magnitude more frustrating, just because I'm working on tools, than working on hard technical problems.

TH (00:06:07):

We discussed this a little bit last time as well. It's like, "What is the hardest part of our jobs?" Or I think you asked me, "What is the hardest part of being a firmware engineer?" And...I would honestly say it's not working on the actual hardware product. It's supporting the hardware product before, during, and after it's shipped.

TH (00:06:26):

I still feel like that holds true. And yeah...I have a different approach on it...It sounds like you are very motivated by working on the cool tech, which is fantastic. Every company needs those people. I, myself, am motivated by getting other engineers to be better at their jobs.

TH (00:06:52):

And I love the compounding effect that if we have two engineers, and I can write a couple tools or a couple processes, and clean up a few things, that's okay. We sped it up by maybe 20 to 30%.

TH (00:07:03):

But if you have an organization of 20 people or 50 people and you speed everything up by 10%, you've made an order of magnitude more improvement in how quickly things can actually be done. And that's what I get excited about. That's the role that I carved myself out at Fitbit ultimately.

CW (00:07:23):

Yeah. And that experience, I mean, you say 10%, there were things that were sped up there by much more than 10%.

TH (00:07:32):

80%?

CW (00:07:32):

Yeah. Yeah.

TH (00:07:33):

Yeah. We can talk about all the things that we did which is always fun...Going back to your question, so you asked, "How did...the Pebble engineers come into Fitbit knowing that this was the right way to do it?"...We had just done things entirely differently.

TH (00:07:58):

And I'm going to say things from a different perspective here is, I feel like we had accomplished...just as much at Pebble...compared to Fitbit, in terms of the firmware with maybe 10 to 20% of the amount of people working on it.

TH (00:08:21):

And that's not to say that we shipped...the same amount of products...We weren't at the 10 to 50 million device scale.

TH (00:08:29):

But in terms of the robustness of the firmware, how often we got Bluetooth bug reports or battery bug reports, those two companies compared. We did pretty well with 10 or 15 people that we had working on the project.

CW (00:08:44):

To be fair, and I agree with you, that's a common thing that I experienced at startups when going from a large company to a smaller company that did the same thing, just finding, "Wow, there's a huge speed up here for some reason."

CW (00:08:59):

But to be fair, Fitbit for a long time had a couple of firmware engineers. I mean, it was really tiny until not that long before Pebble was acquired.

EW (00:09:10):

Right. Right.

CW (00:09:11):

It was -

EW (00:09:11):

It was fewer than five people -

CW (00:09:14):

Yeah.

EW (00:09:14):

- for a long time.

TH (00:09:14):

Got it.

CW (00:09:15):

And that's including contractors.

EW (00:09:17):

Yeah. That's including me.

CW (00:09:18):

Yeah, so it was quite small. So I think part of it was just a really small, small team trying desperately to get stuff out.

EW (00:09:26):

On multiple lines.

CW (00:09:27):

Yeah.

TH (00:09:28):

And that's it...I think that's when the tools and processes go completely -

CW (00:09:35):

Out the window?

TH (00:09:35):

- to the side -

CW (00:09:36):

Yeah.

TH (00:09:36):

- and never get done is, when you have a very small team, and you are just, quote unquote, trying to get things done. And...you're already underwater.

EW (00:09:45):

Focusing on the ship date.

CW (00:09:47):

Because there's nobody with time to do it, I mean, or it feels like there's nobody with time to do it.

EW (00:09:54):

Not when everybody's focused on the ship date, and you feel like it can't change, because Christmas doesn't change.

TH (00:09:59):

Correct. Oh my gosh. Yes. The number of times that I realize that I'm not, again, working for a hardware company, and realize that an Apple announcement or the holiday season doesn't have to stress me out is amazing.

CW (00:10:19):

That is one of the horrible things about working on consumer products, is getting forever a sense that December is more like the start of school than Christmas vacation.

TH (00:10:29):

Exactly.

EW (00:10:30):

When you say tools, okay, there's diff, and grep, and compiler -

TH (00:10:39):

Great question.

EW (00:10:40):

What tools are we talking about here?

TH (00:10:42):

This was a contentious point as well. I actually don't love the word tools. I think I just like the word non-firmware. It's a terrible word, but I think that's actually what I mean when I say tools. So just a brief history of why the word actually is a little wrong in my mind, I did do developer productivity at Fitbit. That was generally my role.

TH (00:11:10):

I had a few people working with me on that. And I think what a lot of people decided it was, was the tools team. And if it wasn't firmware, and it was written in Python or a web application, it was our responsibility to fix anything and everything and build anything and everything that was tools.

TH (00:11:33):

I got frustrated by that, because there are things that probably won't even help a firmware team at large, which is a random Python script or tool that somebody needs.

TH (00:11:46):

And then there are things that are like we were talking about before, like compounding benefits, like, "Let's entirely rewrite this process or entirely change the way that we're doing something." And that's what I had carved a team out to do with those types of things. But you asked, "What is a tool?" Yeah. Just non-firmware.

TH (00:12:07):

If it's not C code, if it's not running on a device, and it's not directly in line with what the customer will actually see, I feel like that is kind of what...we're trying to talk about today is, what are the things that support developers, processes, teams that is not directly going to be customer-facing?

EW (00:12:28):

So what are the top three not-tools?

TH (00:12:32):

I don't know. But I can tell you...the three quickest things that I found, at least...We were talking about Fitbit's acquisition of Pebble, what were the three things that we focused on first? We can try that, and then we can see what we uncover.

TH (00:12:47):

So the...first one,...it was the developer environments, I think, were just very hard to get right. I remember joining Pebble and I submitted a pull request on the morning of my second day at work, which I thought was really cool. I had the firmware working. I fixed a bug. I verified it in the UI on my little developer device.

TH (00:13:19):

That just worked pretty much out of the box on my Mac machine. And it was nice. It was easy. I loved it. The comparison is at Fitbit. I don't think I committed or submitted anything until...two or three months in.

TH (00:13:38):

And by then I was finally getting around to understanding how to get firmware on the device, get my machine working, get IAR working. Everyone had to get a license, which took a while. Yeah, there were just a lot of problems.

CW (00:13:53):

Yeah. I think just to give some context, I think I remember around that time most developers were using Macs. We were using IAR, so most developers were running virtual machines -

TH (00:14:05):

Yep.

CW (00:14:05):

- in Fusion...There was a standard image that everybody installed. So you did all your development in Fusion on IAR and Windows on a Mac. It was slow and people didn't like the editor. So some people had various hacks to use their favorite editor on their Mac and share the file. Anyway.

TH (00:14:23):

Yep.

CW (00:14:23):

Just context. It was turtles on top of turtles.

TH (00:14:28):

Yeah. And again, it's not bad....So,...I'll come back to this. So developer environments, the second thing we sped up or just knew was going to be an issue was the compiler and the compiler speed.

TH (00:14:51):

And it wasn't the compiler. I'm sure IAR is a fine compiler. I've seen fast builds. I've talked to a bunch of the engineers from early days at Fitbit where they were saying the IAR builds were very fast and -

CW (00:15:01):

That's because there was no code. Sorry.

TH (00:15:03):

Well, it's because there was no code. They were probably,...I don't actually know, maybe running on actual Windows machines.

EW (00:15:10):

Yes, we were.

TH (00:15:11):

But also IAR itself can be parallelized. You can run it in multiple threads, concurrent builds, but when it's a bunch of Python scripts, and external scripts, and XML parsing, and all of this kind of placed in there without order -

CW (00:15:32):

Yeah.

TH (00:15:32):

- the build will run slow. It'll be sequential. It'll be blocking on a lot of things. And that's kind of what it came to. I don't think IAR itself was the problem. It was the build system orchestration.

EW (00:15:45):

And those Python scripts did things like the assets for -

TH (00:15:50):

Exactly.

EW (00:15:51):

- images and -

TH (00:15:53):

Fonts.

EW (00:15:53):

- and fonts and things. So they had to be run, although they didn't always have to be run. And...I mean, they weren't things IAR should be running.

TH (00:16:03):

It needed to be part of the build process.

EW (00:16:05):

Exactly.

TH (00:16:05):

So it had to be at least orchestrated by IAR, but...yes, it didn't have to be part of that actual process. Well, the problem was IAR, they'd force you to use their editor. And the CLI version of it wasn't exposed, or easily enough in the Linux version was...guarded in a safe somewhere that they wouldn't let anyone touch.

TH (00:16:29):

And when we were traveling on a plane and trying to do work, -

CW (00:16:34):

With the license.

TH (00:16:34):

- you couldn't have the license, so you couldn't compile. So then you had to take the one offline license. I don't know it. Yeah. It wasn't great. And I don't think developers could work for a week, because they didn't have a license.

TH (00:16:44):

So the second thing we tried to do was fix the compiler. And the easiest way to fix the compiler for us at Pebble, we didn't like to use Windows...and we just knew that Windows couldn't be fixed in a way that it would be fast for firmware development, and so we switched to GCC.

TH (00:17:03):

And we slowly moved the entire build system to using GCC on Mac or Linux. We kind of had Windows working along for the factory builds. I think those needed to be run on Windows for some reason.

TH (00:17:23):

Chris Coleman, the CTO of Memfault, his greatest gift to Fitbit was, I don't even know if you guys know this, the way we got the GCC build to work and to be synced with the IAR build forever basically is, he had built an incredible amount of XML IAR build file parsing that converted it into a makefile.

TH (00:17:46):

So he parsed all of the project files for all the different builds, parsed the build commands themselves, three, basically, of the metadata things, and then converted those all into makefile commands. And that was the first part of the build step, was running these Python scripts.

TH (00:18:02):

And then it would compile everything in Make with a bunch of parallel threads. And then the build would go down from 18 minutes to 2 minutes and sequential builds were 30 seconds instead of 10 minutes.
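A minimal sketch of the project-file-to-makefile idea described above, under stated assumptions: the XML layout is a simplified, hypothetical stand-in (real IAR project files carry far more detail, including per-configuration compiler options), and the function names and the `firmware.elf` target are invented for illustration. It is not the converter that was built at Fitbit.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified project file used only for this sketch.
PROJECT_XML = """
<project>
  <group>
    <name>app</name>
    <file><name>src/main.c</name></file>
    <file><name>src/display.c</name></file>
  </group>
  <group>
    <name>docs</name>
    <file><name>README.txt</name></file>
  </group>
</project>
"""


def source_files(xml_text):
    """Collect every C source path listed in the project XML."""
    root = ET.fromstring(xml_text)
    names = (f.findtext("name", default="").strip() for f in root.iter("file"))
    return [n for n in names if n.endswith(".c")]


def to_makefile(sources, cc="arm-none-eabi-gcc"):
    """Emit a small makefile with one object per source file."""
    objects = [s[:-2] + ".o" for s in sources]
    lines = [
        f"CC := {cc}",
        f"OBJS := {' '.join(objects)}",
        "",
        "firmware.elf: $(OBJS)",
        "\t$(CC) $(LDFLAGS) -o $@ $^",
        "",
        "%.o: %.c",
        "\t$(CC) $(CFLAGS) -c $< -o $@",
    ]
    return "\n".join(lines) + "\n"


print(to_makefile(source_files(PROJECT_XML)))
```

Because each object file becomes its own make target, `make -j` can compile them in parallel, and only the files that changed are rebuilt on the next run.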

CW (00:18:14):

Yeah, I do remember him doing that, and that was, well, don't need to get into specifics of IAR.

TH (00:18:20):

Yeah.

CW (00:18:20):

But the way they manage projects is very painful if you have to merge things. Anyway.

EW (00:18:26):

But it isn't as bad as Eclipse.

CW (00:18:29):

I don't know. They're all bad.

TH (00:18:32):

They're all great. And they're all cumbersome. Yeah. I mean, it's great if you can click a play button. That's wonderful. It's just it tries to do too much, hide too much. And then you can't optimize anything basically at that point.

CW (00:18:45):

They're very good for single developers. They make a lot of stuff super easy.

TH (00:18:51):

Single developers and small products. Yeah.

CW (00:18:51):

And then once you start to scale them to bigger teams, those things get in the way.

EW (00:18:55):

I would argue you could get up to five developers, but yes, it's a small team thing.

CW (00:19:00):

What you're saying is one of the revelations I had, was when we switched to GCC, and it was like, "Okay, my build from totally clean is 2 minutes instead of 18 minutes. Go get a coffee and try to figure out something to do." It was amazing. It was like, "Well, this is ridiculous."

TH (00:19:19):

That was the biggest complaint. I think I had a couple people message me on the side, half-joking, being like, "I don't love this. I can no longer play ping pong when my builds are running. I don't have enough time anymore."

EW (00:19:35):

That's... Alright...Well, I'm glad -

CW (00:19:39):

"I'm too efficient. You have to stop this. You expect me to work in this time?"

TH (00:19:45):

Yeah. So developer environments, getting them set up, was the first painful thing. You asked me for my top three. So I'm going to go to the third one now. The second one was just speed of the build itself. We knew that was so important.

TH (00:19:59):

We had it at Pebble, every time the build would get slower than two minutes, it was kind of like an all-hands-on-deck, come fix the build and get it back down to a minute. And then the third thing is just process.

TH (00:20:16):

You had alluded to this earlier, Chris, where you were saying every developer has their own scripts, or everyone has their own process, or how did you get to use your favorite code editor outside of IAR? Everyone had their own way. It's both a good and a bad thing.

TH (00:20:34):

The one thing that I wanted to do, and this is what I spent most of my time on early on was getting people to do the same thing.

TH (00:20:42):

And if everyone has one single way to do things, like packaging assets, or loading assets on a device, if you have one way to take a screenshot of the device, or to run the build, or to debug the build, and if there's only one way to access all that thing, and then if some people know a better way to improve that system, they can apply the changes.

TH (00:21:06):

And then everyone will benefit from it. And so that's one of the things that I did very early on, was basically wrap all of our tools, you may call them, or processes, in something called Invoke, which is just a Python CLI wrapper. But to build, you don't have to run three scripts in various different places.

TH (00:21:26):

You just ran "invoke build", or if you wanted to debug, you would just run "invoke debug", and you didn't have to set up your environment in a certain way and get it all figured out.

TH (00:21:36):

And then when we radically switched things under the hood, when people didn't know, like, "Oh, we figured out a newer way to build, or a newer way to debug, or a better way to load assets," that happened all the time. We kept improving the speed at which assets were loaded onto the device.

TH (00:21:52):

People didn't have to change their ways. They just continued to run "invoke asset.load" and it just magically sped up by 2 or 3x here and there every now and then.
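For readers who have not used Invoke, a task file along these lines is roughly what the wrapper described above could look like. Invoke is a real Python library, but everything else here is an assumption: the file layout, the task names, the shell commands, and the `build/firmware.elf` path are illustrative and not Fitbit's actual setup.

```python
# tasks.py: a hypothetical task file; names and commands are illustrative.
try:
    from invoke import task  # third-party package: pip install invoke
except ImportError:
    def task(func):
        """No-op stand-in so this sketch still imports without Invoke."""
        return func

# The only place that knows how each step is really done. Swapping a command
# here changes the behavior for everyone without changing what they type.
COMMANDS = {
    "build": "make all",
    "debug": "arm-none-eabi-gdb build/firmware.elf",
}


@task
def build(c):
    """Run as `invoke build`."""
    c.run(COMMANDS["build"])


@task
def debug(c):
    """Run as `invoke debug`."""
    c.run(COMMANDS["debug"])
```

The design choice is the indirection: developers memorize `invoke build`, and whoever maintains the file is free to replace what happens underneath.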

CW (00:22:05):

A fourth thing I will add that kind of all those things depend on is getting things installed in a consistent way on every developer's machine, -

TH (00:22:13):

Yeah.

CW (00:22:13):

- which was another revelation. And I think we used Conda, right?

TH (00:22:16):

Yes.

CW (00:22:18):

To deploy all of this stuff in a way that every developer had the same install. And so you didn't, wow. I was about to say you didn't hand somebody a floppy disk, so I'm going to just leave. I'm just going to head out. I'll talk to you guys later.

TH (00:22:33):

Are you sure you ate lunch?

CW (00:22:34):

Yeah, I did. But...you didn't say, "Okay, go to this internal website, and grab the zip file, and run the installer, and I'll make sure you get this version," or, "Go to these eight places and get the latest version of this, this, this, this, this...with a big install document."

TH (00:22:52):

Correct.

CW (00:22:52):

It was, "Type this command. Okay. All your stuff's installed." Now, if you want to, like you said, do a build or all this other stuff, look at this Invoke command. It's a top-level thing that drives all of those without having you type make -j8 -

TH (00:23:05):

Yes.

CW (00:23:05):

- install.

TH (00:23:08):

Yes. That was my most frustrating thing. Oh my goodness. The amount of times that I was frustrated, the fact that it wasn't just by default -j8, everyone had an 8-core machine at the time. And then people were developing for two years in and never knew about -j8 or just the j parameter.

CW (00:23:25):

Well, that was the thing with IAR. Everybody had IAR except me, because I go and look for the options on everything. Anytime I look for a tool -

TH (00:23:32):

Yeah. Yeah.

CW (00:23:32):

- I'm like, "I want to see all the options." But nobody had IAR's thing turned up either, and it was always by default -

TH (00:23:38):

Track.

CW (00:23:38):

- set to one core. And so like -

TH (00:23:40):

Yeah.

CW (00:23:40):

- "Oh my builds take 10 minutes." "Well, have you tried this?" "Oh, I didn't know it was there."

TH (00:23:44):

It was all just random knowledge you had to find on a wiki page, and that was the point of invoke. We detected the number of cores in your machine, or we detected a lot of things about the environment. And you could of course turn these things off in an options file, but all of the defaults were sane and wonderful.
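The "sane defaults with an opt-out" idea can be sketched in a few lines. This is an illustration only: the `jobs` option key and the function names are assumptions, and the options are passed as a plain dictionary rather than read from whatever options file format was actually used.

```python
import os


def resolve_jobs(options=None):
    """Pick the parallel job count: an explicit option if present, else the CPU count."""
    options = options or {}
    detected = os.cpu_count() or 1  # cpu_count() can return None
    return int(options.get("jobs", detected))


def make_command(target="all", options=None):
    """Build the make invocation so nobody has to remember -j themselves."""
    return ["make", f"-j{resolve_jobs(options)}", target]


print(make_command())                     # e.g. ['make', '-j8', 'all'] on an 8-core machine
print(make_command(options={"jobs": 1}))  # ['make', '-j1', 'all']
```

The default path needs no knowledge from the developer, while the override keeps the escape hatch for machines or situations where full parallelism is not wanted.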

TH (00:24:05):

And that was the beauty of it. And yeah, to go back there, we did use Conda. Conda, for the listeners,...it's like Docker files. You could kind of equate it to that, or running things in Docker. But it's more...like running a Python virtual environment, but also has a lot of system packages.

TH (00:24:27):

So ImageMagick, and lconf, and all these other system-level packages, you could also install those through Conda. And so, yeah, you'd basically end up installing one or two things using brew, or apt, or on Windows. And then the rest of it was just automatically pushed onto your machine by just running "conda env update".

TH (00:24:51):

And that was great. And if it ever got messed up, which it did once a month, you could delete the entire folder, and then just run it again, and then walk away. And five minutes later, your machine's back up and building, which is great.

EW (00:25:02):

Would you still use Conda, or would you go for Docker now?

TH (00:25:05):

I would still use Conda. I think it's both. I think if everyone is on Linux, you can use Docker. Because file system performance on Linux through Docker is 100%. There's no file system degradation.

TH (00:25:24):

If it's a Mac shop, or if it's a Windows shop, the file system performance...between Docker and then Mac and Windows is not great, or at least it wasn't at the time.

TH (00:25:35):

And so you're trying to edit your files in Mac or Windows on that file system, because you want your IDE to be able to process all the metadata very quickly, but it would slow the builds down or vice versa. The builds would be fast, but the editor would be super slow.

TH (00:25:50):

And in those cases I would probably just use Conda again. Because...there are packages for GCC, GDB, everything you could need, and it's pretty simple otherwise.

EW (00:26:00):

So you normalized the build environment. Did you normalize the editing environment?

TH (00:26:10):

We continued... It's a great question. There were pushes to do that, but I think that is something that I don't believe in. I don't think everyone needs to use the same editing environment. There are people that will take Emacs, or Vim, or Eclipse to the grave with them. And I think that's actually fair.

TH (00:26:32):

And so what I spent a good amount of time on is for the people who wanted a graphical editor that missed...IAR, IAR had a bunch of views, selection viewing registers, and threads, and graphical debugging, and point and click, for the engineers who wanted that, I got Eclipse working.

TH (00:26:56):

So you could click the play button, and the debug button, and dropdown, and select what build you were running, and you could graphically debug. And so for the people who wanted an IDE, we did settle on Eclipse, and that was easily working. Otherwise everyone used their own editor.

CW (00:27:12):

And people worked hard to get VS Code working well, too. So there were -

EW (00:27:18):

It was early days for VS Code though.

CW (00:27:20):

Yeah, yeah.

TH (00:27:21):

Yeah. Early days for VS Code. There were the embedded plugins. There were two of them, I think, at the time. And they were getting there, but I don't think I actually worked on anything VS Code for Fitbit.

EW (00:27:35):

Would you do VS Code now, or would you do Eclipse?

TH (00:27:38):

VS Code, most likely.

EW (00:27:40):

I really like VS Code. I'm surprised. And I understand not trying to push a single editor on everyone. I think you're right there, because there are people who are so ingrained in what they have. As a consultant, I will use any editor they ask me to, except vi. Sorry, people, just can't.

EW (00:28:04):

I mean, I do some of my development in nano, because that's the easiest thing to do. And so my goal is basically to be able to walk up to whoever's computer I'm trying to talk to and be able to use it. And that's a different goal than what a company needs.

EW (00:28:22):

I'm trying to be flexible, because that's what I should be for my clients. But if you're working full-time for a company, trying to figure out how much leeway to give in IDEs, it's a tough problem. Because you do want to be able to walk up to somebody else's computer and be able to debug with them.

TH (00:28:42):

...In those cases, yes. I liked to get the processes to be the same, but...I mean, my terrible analogy, you don't tell an artist to switch from a crayon, to a pen, to a paintbrush, to charcoal. You just tell them, "Have your artwork done in this way by this time," and that sort of thing.

TH (00:29:08):

And whenever I was coding with somebody, I very rarely used their keyboard. I mostly was like, "You should do it this way. Click this button. Type these commands now." And then I kind of helped people get into the flow of how things worked.

TH (00:29:24):

I interviewed for a company, just really quickly, a fun fact, interviewed for a company as my first job, and they did pair programming. It was their thing. You pair programmed most of your development.

TH (00:29:37):

And yeah, they had a, "This is your environment. This is the keyboards that you use. Generally, you need to use vi or vim. You need to use these plugins. And we expect that it'll take you a month or two to get ramped up on all these tools, but you will use the same tools as everyone else." Thought that was kind of cool.

EW (00:29:55):

It's a different choice, but it's a very viable one. It's a viable choice that everybody use a different IDE, but the same build tools. It is not really viable to use different IDEs and different build tools. At that point, what you've got is just a disaster.

TH (00:30:11):

A mess.

CW (00:30:11):

Yes.

EW (00:30:11):

Yeah.

CW (00:30:13):

Okay. So I think we covered kind of what you were talking about in the notes for ground rules for developer productivity, just accidentally by talking about Fitbit and stuff. You have some notes about evolution of tools, and I found that pretty interesting.

CW (00:30:28):

Because I've thought about this stuff, but only in the sense that I've done each of these steps accidentally. But I think walking people through how you go from the need for a script, or tool, or something, to taking it to something that actually increases developer productivity is an interesting path.

TH (00:30:53):

So now you get into a little bit into the brains of Memfault. This came up in an exercise. There is something called a maturity model. And I didn't really know what they were. I had never heard of one.

TH (00:31:06):

But what it is is basically you give yourself, or your process, or your company a scale from one to ten, whereas one is early days, your maturity model, it's like, "We can barely do something right," whereas ten, it's fully automated. There's probably some ML and AI at this point in time. That's where maturity models are going.

TH (00:31:30):

And I think of tools and processes kind of the same way now. And so the early days of a tool is like, it doesn't exist. And I did give an example in the little bit of a script that I wrote for this, which was, you have firmware, and it crashes. It spits out a couple of those addresses, 0x8000, it's going to give you a line of code.

TH (00:31:57):

And you need to take those lines of code, use a symbol file, symbolicate it, and then you can kind of get a backtrace on what happened. And -

EW (00:32:06):

Sure. You just search through the memory map file, and then you find the address that's closest, but...I mean, you have to go to the next greater, not next less than. And then you type in that command, you search for that function, and then you find where it's instantiated.

EW (00:32:26):

And then you look around there to make sure there aren't any other functions just in case you don't have it quite right. And then you look in that function, and you try to find which line. But that's kind of hard, because now you have the list file up.

CW (00:32:36):

It's 4:45, and it's kind of time to go home.

TH (00:32:41):

Exactly. Yes. And you realize you don't even have the right symbol file.

EW (00:32:45):

Yes. Exactly.

TH (00:32:46):

And then you realize you're out of code space. So some developer turns on LTO, and then you realize that all of this is a mess anyways, and it doesn't make sense. Yeah. I mean, this is exactly how you do it. And we've all been there too.

TH (00:32:58):

So that's a one, your very early days. You may not even know how to do it, but somebody on a wiki page has written, "Get out the map file, and do binary search in this map file for maybe a function." And then you realize there's this tool called address 2 line, addr2line. You can use that.
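The manual map-file lookup described above amounts to a binary search over function start addresses. The sketch below assumes the (address, name) pairs have already been pulled out of the map file and sorted; the addresses and function names are invented. It picks the function with the largest start address that does not exceed the crash address, which is one common way to do the lookup.

```python
import bisect

# Hypothetical (start_address, function_name) pairs, as might be pulled from a
# linker map file, sorted by start address.
SYMBOLS = [
    (0x08000000, "Reset_Handler"),
    (0x08000120, "main"),
    (0x08000480, "ble_send_packet"),
    (0x08000900, "display_refresh"),
]
STARTS = [start for start, _ in SYMBOLS]


def containing_function(address):
    """Return the function whose start is the largest one not above `address`."""
    index = bisect.bisect_right(STARTS, address) - 1
    if index < 0:
        return None  # address is below the first known symbol
    return SYMBOLS[index][1]


print(containing_function(0x080004A2))  # ble_send_packet
```

This sketch ignores function sizes, so an address past the end of the last function would still be attributed to it, and it only gives a function name, not a source line.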

CW (00:33:18):

And what does that come with? I forget. Is that just part of the GNU tools?

TH (00:33:22):

Yeah. Binutils or GNU tools, you install at least the ARM embedded tool chain, and it comes with it.

CW (00:33:26):

Okay.

TH (00:33:28):

But you run the tool. You give it a symbol file. Hopefully it's the right one. And then you give it an address. It just gives you the file and the line number.

TH (00:33:36):

And so the aspiring engineer is like, "Oh, this is annoying. I can write a little script that wraps address 2 line. And maybe it can take in multiple addresses and give me multiple function and line numbers back at the same time."
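[A wrapper like the one described could look roughly like the sketch below. It assumes the tool is installed as `arm-none-eabi-addr2line`, which depends on the toolchain in use, and the function names are made up for this example. The `-e`, `-f`, and `-C` options are standard addr2line flags.]

```python
import subprocess

# Assumed tool name; adjust for the toolchain that is actually installed.
ADDR2LINE = "arm-none-eabi-addr2line"


def build_command(elf_path, addresses):
    """Build one addr2line command line for a whole batch of addresses."""
    # -e: symbol (ELF) file, -f: also print function names, -C: demangle C++.
    return [ADDR2LINE, "-e", elf_path, "-f", "-C", *addresses]


def parse_output(addresses, output):
    """Pair each address with the function and file:line that were printed."""
    lines = output.splitlines()
    # With -f, addr2line prints two lines per address: function, then file:line.
    return [
        (address, lines[2 * i], lines[2 * i + 1])
        for i, address in enumerate(addresses)
    ]


def symbolicate(elf_path, addresses):
    """Run addr2line once for all addresses and return the parsed frames."""
    result = subprocess.run(
        build_command(elf_path, addresses),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_output(addresses, result.stdout)
```

[Calling `symbolicate("firmware.elf", ["0x08000412", "0x08000120"])` would return one `(address, function, file:line)` tuple per address, provided the ELF file matches the firmware that crashed.]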

TH (00:33:51):

And...you're moving along, and now you share that with your co-workers. And now it's revision-controlled and everyone's kind of using the same tool. But it's still very annoying to use.

EW (00:34:03):

But only three people are using it. The other eight know about it, but they're just not really willing to try that. They're still good with the map files.

CW (00:34:11):

And there's one person who doesn't know how to use address 2 line at all.

EW (00:34:15):

Oh, yeah. No, that person is just -

CW (00:34:18):

Sorry.

EW (00:34:18):

- going through the object file on their own.

TH (00:34:21):

...Oh, man. I'm letting you guys say all these things. This is not me.

CW (00:34:26):

Well, I've been that one person, so I'm okay.

TH (00:34:29):

Fair enough. And this keeps getting better and keeps evolving. Maybe your script now automatically detects the symbol file that needs to be downloaded from a remote server or Artifactory, what we were using at Fitbit. And then eventually all the engineers actually start using it, and then it's going to become better.

TH (00:34:52):

And then what I found is evolving these tools further, which I don't think...many firmware engineers do...today, is build a web application. That was one of the bigger realizations from the CEO of Memfault, François. At Pebble, he spun up a server that hosted a bunch of these small Python applications that we would use.

TH (00:35:16):

And one of them was kind of exactly this. It would take a bunch of addresses, go download the symbol file, and spit them back out kind of in a UI.
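[As a rough idea of how small such a web tool can be, here is a sketch of a WSGI application using only the Python standard library. It is not the Pebble tool. The `addr` query parameter, the JSON response shape, and the `symbolicate` callback are all assumptions made for this example; the callback stands in for whatever downloads the symbol file and resolves an address.]

```python
import json
from urllib.parse import parse_qs


def make_app(symbolicate):
    """Create a WSGI app that resolves addresses with `symbolicate(int) -> str`."""

    def app(environ, start_response):
        query = parse_qs(environ.get("QUERY_STRING", ""))
        # Accept ?addr=0x8000,0x8010 as well as repeated addr parameters.
        addresses = [
            item
            for chunk in query.get("addr", [])
            for item in chunk.split(",")
            if item
        ]
        try:
            frames = [
                {"address": item, "location": symbolicate(int(item, 16))}
                for item in addresses
            ]
        except ValueError:
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [b"addresses must be hexadecimal\n"]
        body = json.dumps(frames).encode("utf-8")
        start_response(
            "200 OK",
            [
                ("Content-Type", "application/json"),
                ("Content-Length", str(len(body))),
            ],
        )
        return [body]

    return app


# To try it locally with the standard library's reference server:
#   from wsgiref.simple_server import make_server
#   make_server("", 8000, make_app(my_resolver)).serve_forever()
```

[The point of the design is that the resolver and the symbol files live on the server, so the person pasting in addresses needs nothing but a browser.]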

TH (00:35:26):

And so without even having an environment, without even having Python on your computer, you could log into the website, use your Pebble login, and you could debug your firmware that way.

TH (00:35:36):

And that allowed pretty much anyone, it became part of the process that if you're in QA, or if you're a web developer, or if you're the CTO that has no hand in firmware, but is technical, you can take the things that spit out in your email, these addresses, throw them into this tool.

TH (00:35:52):

And you can kind of know where it was. And you can kind of know if it's the same issue that you've been seeing all the time that everyone's talking about. Like, "Oh, the circular buffer crash. Same one." And then these things keep evolving, and evolving, and evolving.

TH (00:36:07):

And sure enough, the final piece of it is you actually never see these addresses anymore. They're automatically symbolicated in your CLI...It's actually cool, the ESP-IDF, I mean, this is the problem, because there are so many tools around it.

TH (00:36:24):

The ESP-IDF, if you use their terminal wrapper,...when the firmware boots, it spits out a build ID, or the firmware version, and then it also takes in a symbol file. And whenever it sees those addresses, it will automatically symbolicate the function and line numbers.

TH (00:36:42):

And you will actually never see those addresses ever. And it'll just automatically print out the file line number, which I think is super cool. And that's just tooling improving and improving slowly but surely.
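[The general idea of symbolicating a log stream on the fly can be sketched in a few lines of Python. This is not how the ESP-IDF monitor is implemented; it is an illustration that assumes 32-bit addresses printed as `0x` followed by eight hex digits, and a `resolve` callback that would normally be backed by a tool such as addr2line.]

```python
import re

# Matches 32-bit hex addresses such as 0x08000412 in a log line.
ADDRESS_PATTERN = re.compile(r"0x[0-9a-fA-F]{8}\b")


def symbolicate_line(line, resolve):
    """Replace each address in `line` with whatever `resolve` returns for it.

    `resolve` takes an integer address and returns a string such as
    "main.c:42", or None when the address is unknown.
    """

    def replace(match):
        location = resolve(int(match.group(0), 16))
        # Leave unknown addresses untouched so no information is lost.
        return location if location is not None else match.group(0)

    return ADDRESS_PATTERN.sub(replace, line)


# Stand-in resolver for illustration; the file name and line are invented.
KNOWN = {0x08000412: "circular_buffer.c:88"}
print(symbolicate_line("Fault at pc=0x08000412 lr=0x08000999", KNOWN.get))
```

[Here the known address is replaced with its file and line, while the unknown one is passed through unchanged.]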

EW (00:36:54):

Isn't some of this just doing what my IDE does when I run?

TH (00:37:00):

What do you mean?

EW (00:37:01):

I mean, if I have JTAG on and it crashes, it will give me a backtrace. And then I can just click on the stack trace or the backtrace, and it will go to wherever I'm supposed to be.

TH (00:37:13):

[Ooo], for sure. But what happens when your device is no longer connected to a debugger? And...most of the time spent is debugging bugs outside of the debugger, I would argue.

EW (00:37:28):

Oh yeah. I mean, if you're in the debugger, your life is so much better than when you're trying to debug bugs that -

CW (00:37:37):

Yeah.

EW (00:37:37):

You can't see them. You can't touch. You can't reproduce them.

CW (00:37:40):

From Fitbit experience though, it is extremely difficult to go for a run with your debugger attached.

EW (00:37:45):

Yes, exactly.

TH (00:37:46):

Yes.

CW (00:37:46):

The laptop is not compatible with jogging.

TH (00:37:53):

And those are the times when you need to act because that's how your customers are going to use their products too, right? Your customers are not going to use a product sitting on a desk, flat, not using it. They're going to be -

EW (00:38:03):

Shaking it whenever they want the step to go.

TH (00:38:05):

Yeah, they're going to be shaking it. I have to make a call out to my favorite blog post, "The Tower of Terror" Fitbit blog post, I think Shiva wrote that, -

CW (00:38:15):

Oh, yeah. Yeah. That's so great.

TH (00:38:17):

- where the Fitbit would crash in zero gravity falling in free fall on a theme park ride, which I thought again, great blog post. But those are the places you need to test your device out in the real field. And we're continuing to talk about tools.

TH (00:38:35):

But the only way you can debug products in the field are by using tools and processes that you've created. There are no out-of-the-box ways to debug a device without a debugger. Again, slight pitch for Memfault, that is the way you do it now.

TH (00:38:55):

But yeah, no embedded systems, no firmware. They don't have these things built in. So companies have to build them. And if you don't build them, you're just not going to be able to debug these products without a debugger.

EW (00:39:07):

And this is where Memfault comes in, because you've built them, and you're willing to help people use your tools to work on their products.

TH (00:39:16):

Exactly. The vision for us is you check out a new project using the Nordic SDK, or Zephyr, or the ESP-IDF. And your first thing that you do is install Memfault, which is getting closer to just a checkbox.

TH (00:39:34):

And then from here on out, all of your logs, all of your crashes, core dumps, metrics, everything is automatically collected and sent up and is on a dashboard waiting for you to find the issues rather than doing exactly what you said was really hard before, was like, "We just have to get stuff done. There's no time to build tools."

TH (00:39:56):

"There's no time to capture logs automatically. There's no time to capture metrics and build a whole system to do that. We just need to get the product working."

EW (00:40:06):

But addr2line, I didn't even know it existed until now.

CW (00:40:12):

What?

EW (00:40:13):

I didn't...I just do with my -

CW (00:40:14):

I'm doing that thing where I'm not supposed to be surprised. I'm a little surprised.

EW (00:40:18):

I mean, I know how to do it manually.

CW (00:40:19):

Yeah. Yeah.

EW (00:40:19):

So I've always done it manually.

CW (00:40:20):

Yeah.

EW (00:40:21):

And I...do develop on Linux some, but it's not my native environment. And the GNU tools, I've used them for a long time. I'm happy with makefile. But I started out with the expensive compilers, and they didn't -

CW (00:40:37):

They did all that stuff. Yeah.

EW (00:40:37):

- have that sort of thing. Well, they didn't do it.

CW (00:40:40):

Oh, right.

EW (00:40:40):

They just didn't have that tool.

CW (00:40:42):

Yeah.

EW (00:40:43):

objdump is still something that I look at and go, "That's really useful." But you're talking about more tools...How does it not become just a wall of tools that I don't know how to use and feel like I should, but I don't, and...that wall doesn't help me be productive because it's just too many?

TH (00:41:12):

It's a great question...So again, developer productivity engineer, I was mostly hired on as an engineer. That was my job. Most people thought I was coding most of the time. I feel like I spent at least 30 or 40% marketing.

TH (00:41:37):

I was doing internal marketing and training for what these tools we were building are. Why you should learn how to use makefiles. Why you should learn about this thing that we wrote, the wrapper, the Invoke wrapper. How you debug efficiently in GDB without using a GUI.

TH (00:42:00):

How you write GDB Python scripts to automate the things that you do every single day that don't need to take 30 minutes to do. It could just take a couple seconds. So that's what I did...Half of my day, I probably walked around the office asking people, "Tell me what you're working on. What are you frustrated with?"

TH (00:42:19):

And a lot of the times it was like, "Oh, did you know about this script? Did you know about this tool that we have? Do you know that you can do this automatically in your debugger," that sort of thing. And, "tell me all the blogs that you read that tell you all this stuff."

TH (00:42:36):

There's just not many. I wouldn't say there are many blogs explaining and dumbing everything down, that you should use these tools for these reasons. This is how you use them. But there are a ton of those for software in other worlds, I guess.

CW (00:42:52):

I think that's one of the things I was kind of alluding to at the beginning. And I think this is where you were kind of heading with your question, Elecia. It's not enough to have the tools and the processes...There's been a lot of places I've been at where there's just piles of stuff sitting around that could be useful.

EW (00:43:09):

Oh, many wiki pages I've written have never been read.

CW (00:43:13):

But it takes a commitment on the part of the company to have people whose job it is to be the educators and to make sure that these things are actually getting some uptake, and that people are understanding them, and then measure that there's been improvements.

CW (00:43:32):

And that has to be a dedicated person or people, at least an engineer whose 50% time is doing this stuff. And that's pretty rare from my perspective in the embedded world.

TH (00:43:46):

Yeah. And their title may or may not be firmware engineer.

CW (00:43:49):

Yeah.

TH (00:43:49):

That's the thing is, my title was a firmware engineer, but I didn't write much firmware. Very little that I wrote was actually shipped to customers. It was a lot of internal tooling, a lot of unit tests, and a lot of web applications.

EW (00:44:07):

How do you convince people to adopt it? Especially the cranky curmudgeons?

TH (00:44:12):

Well, they didn't exist.

EW (00:44:18):

I know the people at Fitbit. I knew them. I was leaving at the time, but there were some.

TH (00:44:28):

I probably traveled between all of those offices more than anyone, except for the directors and the managers, I would say, in terms of just a normal engineer. I spent time...next to everyone's desk, just watching them work and helping them out.

TH (00:44:47):

And...they always had frustrations. And then...I always tried to have answers, and that was just my role. That's exactly what I wanted to do at the time, and it did help. And so -

EW (00:45:02):

Yeah. After the second or third time you helped me answer a question, I would definitely start coming to you with questions.

TH (00:45:10):

Which wasn't always great either, because I wrote some pretty awesome wiki pages that I mostly pointed people to. I'm like, "Oh, you want to do this? Here's the wiki page that is so up to date it hurts."

TH (00:45:21):

Yeah...The curmudgeony people, at the end of the day, just want to get their job done too. And if you convince them that this is the best way to do it, they're going to use the new tool or the new way.

EW (00:45:42):

What about some of the things that they do that are more common in software world than from our world, like Jenkins?

TH (00:45:50):

Go on. Elaborate with the question.

EW (00:45:52):

Like unit tests and -

CW (00:45:54):

Continuous integrations.

EW (00:45:55):

- continuous integration, and automatic builds, and all of that?

TH (00:46:01):

That, oh man.

CW (00:46:04):

That's another show.

TH (00:46:06):

...I mean, that's a whole different type of tool. That's a whole different type of tool, right? I guess the tools we've been talking about kind of generally is, you do something as an individual. And you try to convince people to use it, and ultimately, or eventually, everyone does use it. And then it becomes part of the process.

TH (00:46:29):

CI, and probably not unit test, but CI specifically..., it becomes part of the process. Day seven, somebody builds it. It's immediately useful. And that is the way that you build to make sure the build doesn't fail.

TH (00:46:45):

And hopefully that's the way that you mark a release for deployment and build the final release that you actually release to customers. And so with that, I actually like CI the best, because you don't have to convince anyone to use it.

TH (00:46:58):

It just becomes the only way to do things.

CW (00:46:59):

You have no choice.

TH (00:47:02):

What's hard is convincing people to do things on their computer differently.

CW (00:47:06):

Yeah.

TH (00:47:06):

But if you just change the way that you actually release the firmware, and you disallow any firmware being built on the computer that you are running on, and ship, then it's a pretty easy sell. Unit tests are incredibly difficult to convince people that it's worth their time.

TH (00:47:25):

And the only way that I could convince or get anyone to use, it was literally, yeah, I was sitting next to their desk and pair programming. It's like, "Hey, it looks like you're struggling with this, or you've been working on this thing that I thought was kind of simple for two or three days. Let's take another approach."

TH (00:47:48):

"Let's wrap this function in a unit test and see if we can quickly iterate." And it was even worse when the build times were 10 to 15 minutes.

CW (00:47:58):

Yeah. Yeah.

TH (00:47:58):

Because their iteration cycle is 10 to 15 minutes on testing something. And the iteration cycle on a unit test is on the order of 500 milliseconds to 5 seconds depending on how you've set them up. And...people just want to get their job done.

TH (00:48:16):

And if you convince them that that is the quickest way they'll get that job done, they will generally favor that. Whether they will be able to do it without you next to them and sitting next to the desk, coaching them along, or whether they take the initiative to learn how everything works is a different story.

EW (00:48:31):

Do you have a preferred framework for unit testing?

TH (00:48:38):

We all love to talk about our favorites of things, and I don't have a favorite. I think what I generally turn to is CppUTest. There's a blog post on that as well on Interrupt, Memfault's Interrupt blog. It just doesn't matter.

TH (00:48:59):

At the end of the day, it is a C or C++ framework that creates binaries, which run a bunch of functions, and give you a status report at the end. I've seen people also successfully build robust and perfect software using one very large main function that runs on the device or on host.

TH (00:49:21):

And...every single assert will basically stop the program. And there's no status report. It's just like, "This assert failed." It's just the idea of doing it. And the other side of it is..., the bare minimum of a unit test framework in my opinion, is being able to build them separately.

TH (00:49:47):

And so, "I'm testing this module. This needs to be a separate binary. I'm building this other module. I want to switch in a couple modules, like a fake and a stub or a mock, depending on which piece I'm testing and that needs to be another binary." And they have to be debuggable.

TH (00:50:00):

I have to be able to throw this binary into a debugger, which is why I almost always advocate for running unit tests on the host machine, on your Mac, or on your Linux box, because getting it into a debugger is so much easier.

EW (00:50:15):

So much easier.

TH (00:50:16):

And then...if you're trying to just quickly iterate on something, to be able to build and run a single unit test. The suite, for better or worse, at previous companies you'd runtest.

TH (00:50:30):

And that would take two, three, four minutes to run the whole suite of tests, and sometimes five or six minutes, depending on how many tests were being committed or if one was really long. You just need to be able to run one very quickly.

TH (00:50:44):

And those are just bare minimums...If you don't have these bare minimums of a unit test framework or main file that you're trying to test, you should work on that. But I generally don't mind.

EW (00:50:57):

You had the levels of maturity for talking about the tools. And I think that's true for unit testing as well, where sometimes, depending on the environment, or the client, or what's going on, I will take whatever module I'm having trouble with or building, and I will write some wrapper around it...so I can develop just it.

EW (00:51:23):

And I might write something underneath it that simulates what the hardware would do. But usually that's in memory or with text files, like a text file that will read in...ADC values, so that I can pretend to do my algorithm. And then I look at the output, or maybe I graph the output really.

EW (00:51:48):

And then once that works, I'm done. And it's a test to develop. It's a development test, but it's not going to be useful in the long run. I mean, I might run it again later if I think I'm changing this module, but it doesn't feel like it's useful to look at all of those graphs every time I commit.

EW (00:52:07):

Is it a different kind of test? And I'm still doing it the unit test way, where I'm writing a little bit of code, and testing it, and then making sure it breaks.

TH (00:52:16):

Yeah...I don't think there's a better word for it, at least that I know of. I would still call it a unit test, but there's nothing stopping you from committing that into the repo and at least verifying that the results of the algorithm or the module don't change.

TH (00:52:37):

Instead of viewing a graph, actually viewing a graph to verify the results is really great, but you can also just take the data points, write that into an assertion, and just verify that the data points don't ever change either. Because inevitably everything around that module and maybe even in the module will be refactored at some point.

TH (00:52:58):

And somebody will love you for having committed this -

EW (00:53:03):

Yeah.

TH (00:53:03):

- test to verify that this works. And I actually saw a bunch of remnants of that in the various repos that I worked on at Fitbit...Somebody had written a main function that runs on the device that tests this one specific module.

TH (00:53:16):

And I find it two years later, and I was like, "Oh, there is no way in any moons that I will actually get this running again." And so it was always nice when somebody committed a unit test to the overarching system, and it ran on CI, and you just verified that it never broke.

EW (00:53:36):

And with my graphs example, usually the next step to committing it or to make it part of CI is maybe not, does the value not change at all, because sometimes, you have things, floating point numbers being what they are,...within an epsilon, and that was a step that I don't see people take.

EW (00:54:02):

They're like, "Okay, well, it doesn't matter because...this works once, and that was all I needed."
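[Turning a graph that was checked by eye into a committed regression test, with a tolerance for floating point, can be sketched as below. The moving-average function and the sample data are invented stand-ins for a real algorithm and a recorded file of ADC values.]

```python
import math


def moving_average(samples, window):
    """Example algorithm under test: a simple moving average."""
    return [
        sum(samples[i:i + window]) / window
        for i in range(len(samples) - window + 1)
    ]


# Recorded input, plus the output that was once verified on a graph.
ADC_INPUT = [0.1, 0.2, 0.3, 0.4, 0.5]
GOLDEN_OUTPUT = [0.2, 0.3, 0.4]


def test_matches_golden(epsilon=1e-9):
    actual = moving_average(ADC_INPUT, window=3)
    assert len(actual) == len(GOLDEN_OUTPUT)
    for got, expected in zip(actual, GOLDEN_OUTPUT):
        # Compare within a tolerance instead of demanding exact equality.
        assert math.isclose(got, expected, abs_tol=epsilon), (got, expected)


test_matches_golden()
```

[The choice of epsilon is a judgment call per algorithm: too tight and the test becomes flaky across compilers or platforms, too loose and it stops catching real regressions.]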

EW (00:54:09):

But making it work over and over again really does help just to make sure things don't get broken, so you don't have to do that, "Okay. It worked two months ago. And now, so I'm going to go up a month back into version control, see if it worked then -"

CW (00:54:24):

And somebody on Slack says, "Hey, have you tried git bisect?"

TH (00:54:29):

That was also a lot of training I did was, how to use git bisect to find where faults happened. Yeah. And I mean,...to touch on that, flaky tests are terrible. And it's the unit tests that fail one out of every ten times or one out of every hundred times in CI that are really frustrating.
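[git bisect can drive itself when given a script, through `git bisect run`. That command treats exit code 0 as a good commit, 125 as "skip this commit", and other codes from 1 to 127 as bad. A sketch of such a script in Python follows; the `make build` and `make test` commands are placeholders for whatever the project actually uses.]

```python
import subprocess

SKIP = 125  # Tells `git bisect run` that this commit cannot be tested.


def bisect_exit_code(build_ok, test_ok):
    """Map build and test results onto the exit codes git bisect expects."""
    if not build_ok:
        return SKIP  # A commit that does not build is neither good nor bad.
    return 0 if test_ok else 1  # 0 marks the commit good, 1 marks it bad.


def succeeded(command):
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(command).returncode == 0


def main():
    # Hypothetical build and test commands; substitute the project's own.
    build_ok = succeeded(["make", "build"])
    test_ok = build_ok and succeeded(["make", "test"])
    return bisect_exit_code(build_ok, test_ok)


# When saved as a standalone script, finish with:
#   import sys; sys.exit(main())
```

[With the script saved as, say, `bisect_check.py`, the session would be `git bisect start`, `git bisect bad`, `git bisect good <known-good-commit>`, and then `git bisect run python3 bisect_check.py`. A flaky test undermines this approach, since a single false failure sends the search down the wrong half of the history.]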

TH (00:54:50):

And it kind of is that. It's like a floating point number, or a race condition that you accidentally added, or some global state not being cleaned up between tests that will cause a lot of those issues. And so that's always an impediment to merging the unit tests that you just wrote but don't really want to commit to the repo.

EW (00:55:09):

Yeah. I'm working on a system now that has unit tests, which is great, except it's a machine learning system, and it fails at least one unit test for no particular reason pretty much every time.

TH (00:55:22):

Yeah.

CW (00:55:23):

I want to go back a little bit to the web apps, because I found that really interesting. And -

EW (00:55:29):

Not something I would've considered.

CW (00:55:30):

Yeah. I mean,...we've been doing ML stuff for a while now, so we're living in Python a lot and we use Jupyter Notebooks, which are web-based. That's a quick way of getting stuff running without having somebody need to install a bunch of Python dependencies and what have you, which is always a nightmare.

CW (00:55:47):

But it seems like a really powerful way to get things that developers shouldn't have to worry about installing and keeping up to date in a place where it's always right. And simple things run there. Is there a limit to that?...I know VS Code now works in the browser.

CW (00:56:08):

At some point it feels like everything should just move to the browser and then you don't have to install anything ever. Where is the appropriate place for those and where do you see that there's maybe limits where it shouldn't be?...

EW (00:56:20):

How do you tell which should be which?

CW (00:56:21):

Yeah. Yeah.

TH (00:56:22):

Which should be a web application versus which should be run locally?

CW (00:56:26):

Yeah.

TH (00:56:27):

Got it. I think if Google has their way, there's likely going to be no limits. Because...it's so cool. Chrome now has Web Bluetooth, and Web Serial, WebUSB, where basically you plug in your development device, and you just use the Chrome browser. And...in our case, we always send assets over the serial port, but you can send assets to it.

EW (00:56:56):

Wait, wait, wait.

TH (00:56:57):

You can -

EW (00:56:58):

Is this like HyperTerm or TeraTerm now lives in my browser?

CW (00:57:03):

No, the drivers live in the browser. There's a user level stack for all of that stuff. It doesn't go to the kernel. It's all just part of the browser. So the browser itself can connect to Bluetooth devices or to USB devices.

EW (00:57:14):

But if I want to connect it to a UART, am I looking at HyperTerm, or PuTTY, or something like that?

TH (00:57:21):

It gets exposed as a raw API in the Chrome browser.

EW (00:57:25):

Oh.

TH (00:57:25):

So the Chrome JavaScript API on my webpage, -

EW (00:57:29):

Oh.

TH (00:57:29):

- I can use JavaScript to basically say, "I'm faking the API," but webserial.querydevices. And then you see your device on that list, you say, "Connect to device," and then the device connects.

TH (00:57:41):

And then you say, "Send bytes or receive bytes," in JavaScript, in the browser itself...You're not coding the browser. You can literally go to Tyler Hoffman's webdebugger.com....I'll just send you a JavaScript file, which then it will connect on locally to those devices, connected to your computer over serial.

EW (00:58:05):

That would be magical for manufacturing.

TH (00:58:08):

Yes.

CW (00:58:08):

And for education.

EW (00:58:10):

Education.

CW (00:58:10):

Well, for deployment of tools and, yeah.

TH (00:58:14):

And yeah, so...the possibilities there are vast. We have an engineer at Fitbit, more of a manager or high-level engineer, who wrote a lot of these. He basically took what I had tried to do with Invoke, which was a lot of local scripts, and people ran them locally to orchestrate their Mac or their machine to do things.

TH (00:58:42):

He wrote a lot of this in the browser. And so if you wanted to send assets to a device, I had written "invoke .assets .load" or something like that. You would drag and drop the assets file to the browser, and then it would connect to the Fitbit device, and then send that over using the Chrome browser.

TH (00:58:59):

And you would query for logs from a device. And basically all of the serial CLI was wrapped so that all of the requests and responses were all kind of orchestrated with buttons. It was really cool. And it worked really well.

TH (00:59:16):

And the hidden benefit of the web applications that I didn't realize until I started just using them more and more at Pebble was yes, firmware engineers can probably figure out both how to use a web application and also how to use local tools.

TH (00:59:34):

But if you are on any other team in your company, you will not have the firmware developer environment that you need to run those local tools. And you will probably never have it...The salesperson will not go spend three months downloading IAR in a Windows VM or whatever to load the assets or debug the device.

TH (01:00:02):

But they will very easily open a web browser, plug their device in over USB. And if it can talk to this device without actually having any knowledge, that's amazing. Now your whole company can kind of do debugging or other things you need to do...on your device without an environment.

TH (01:00:20):

And...we built a lot of those tools that anyone could use. And that's kind of where we always focus on web applications. It was like, "Great. You wrote a local script that only you can use, or only we can use. This is only so useful. So go take a week and create a web application for it. And everyone can use it then."

TH (01:00:40):

That was kind of what we did at Pebble. It was like, "You proved that this is valuable. Go do it for everyone now."

EW (01:00:46):

There's an XKCD cartoon that seems slightly relevant, where there's a graph of how long it would take you to do something five times versus how long it would take you to write a script to do it.

CW (01:01:02):

Yes. Yes.

TH (01:01:07):

Yep.

EW (01:01:07):

How do you decide you shouldn't make web apps? You shouldn't have a script for everybody? You should, in fact, keep your little script to yourself?

TH (01:01:19):

I like to follow as a rough guideline the rule of three...If I've done it once, or it's one engineer that needs it, it's fine having it be the one-off terrible way to do it. If it happens again, it's fine. If that person teaches, trains, sends it up somewhere that somebody else can download, it's fine for two.

TH (01:01:47):

But as soon as three people, or it's the third time I've done something that's really annoying, I really strive to then clean it up and quote unquote, productize it. It now becomes a product.

TH (01:02:01):

And it now becomes something that we will support, and that we will value, and will hopefully never become stale unless we don't need it anymore. But also all these things take time too. The -

EW (01:02:19):

Supporting tools takes a lot of time, because there's always somebody who's changed something. And they still want it to work, but that wasn't what you meant for it to do.

TH (01:02:28):

Yeah.

EW (01:02:29):

And yet it's not a bad idea, but I had plans this weekend.

TH (01:02:34):

This is another hidden benefit of the web applications...You run your web application in a Docker image on a web server that very much probably never changes. You can spin up a web application from ten years ago, and it will probably still work pretty well.

TH (01:02:57):

And so if you build a tool that does address 2 line, and you deploy it on a web server, and you never take that web server down, it will work forever, no matter if...your local machine or your brew update goes bad, or macOS updates, and now you have a bunch of kernel errors or permission errors.

TH (01:03:19):

That web application will continue to work. And...that was one of the hidden benefits that I didn't realize for a while. It was like, "Oh, I made this four years ago. And it's still working, and I've had to put no effort into fixing it ever." But I've had to clean up the local version a bunch of times due to the Python 3 upgrades and everything.

CW (01:03:38):

Yeah. Yeah.

TH (01:03:40):

That was always really nice.

CW (01:03:41):

I just want to do everything in the browser now.

EW (01:03:44):

GDB and Python, I've never really played with them together. I mean, sure, I can imagine sometimes scripting things in GDB, because I get bored, but GDB has some scripting.

TH (01:04:00):

...Correct...I think you had a podcast about this a while ago. I think I listened to that one.

CW (01:04:09):

Don't expect us to remember past shows.

TH (01:04:12):

Yeah, fair enough. Anyways, I think hopefully a lot of people by now know that GDB has a Python interface, and...you can orchestrate mostly everything in GDB. And whenever you can't do something directly in Python, you can always kind of shell out to GDB's normal interface, and just run .execute, and then just orchestrate GDB itself.

TH (01:04:38):

My favorite Easter egg of GDB is everything that is not exposed or you feel like you should be able to do, there is likely a maintenance command for it. There is a whole wealth of commands that basically power a lot of this stuff on GDB, but it's all under the maintenance name space. Very cool place.

TH (01:05:00):

Anyways. But you can do all this through Python. And so...to give the grand vision, one of my favorite things that we had built both at Pebble and at Fitbit, an engineer did it there, is we kind of ran a doctor command in GDB using Python.

TH (01:05:19):

And what the doctor command would do is, for any device that is connected over the debugger, you run doctor, and it will kind of look at all of the global variables or a lot of them. It'll look at the state. It'll look at Bluetooth connection status. It'll look at Wi-Fi connection status, threads, memory, heap, block pools...

TH (01:05:43):

And then it'll just guess as to what could be wrong. So it'll be like, "Hey, it looks like your heap is almost out of space. It looks like there's a deadlock here. Or it looks like your stack overflow..., the watermark here is very high or very low." And it was just a one-stop shop for quickly diagnosing what could be the issue.

TH (01:06:05):

And if anyone found something that they could add to that command, they would then add it, that then anyone else could use and discover if they were just running the same doctor command as well. That was super cool. And that's the easiest and most applicable example to I think a lot of teams.
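[A very small version of such a doctor command, written against GDB's Python API, might look like the sketch below. It is not the Pebble or Fitbit implementation. The firmware globals `heap_free_bytes` and `heap_total_bytes` are hypothetical, as is the 10% threshold. The check itself is a plain function so that it can be exercised outside GDB.]

```python
try:
    import gdb  # Only available when this file is sourced inside GDB.
except ImportError:
    gdb = None


def check_heap(free_bytes, total_bytes, threshold=0.10):
    """Return a warning if less than `threshold` of the heap is free."""
    if free_bytes < total_bytes * threshold:
        return (
            "heap is almost out of space "
            f"({free_bytes} of {total_bytes} bytes free)"
        )
    return None


if gdb is not None:

    class Doctor(gdb.Command):
        """doctor: run quick health checks on the connected target."""

        def __init__(self):
            super().__init__("doctor", gdb.COMMAND_USER)

        def invoke(self, arg, from_tty):
            # Hypothetical firmware globals; substitute the project's own.
            free = int(gdb.parse_and_eval("heap_free_bytes"))
            total = int(gdb.parse_and_eval("heap_total_bytes"))
            findings = [check_heap(free, total)]
            for finding in filter(None, findings):
                print(f"doctor: {finding}")

    Doctor()  # Instantiating the class registers the command with GDB.
```

[After `source doctor.py` inside GDB, typing `doctor` would run the checks. New checks are added by writing another small function and appending its result to `findings`, which matches the "anyone can add to it" model described above.]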

EW (01:06:27):

Well, wait a minute. Are you in GDB and executing Python scripts, or are you in Python and executing GDB commands?

TH (01:06:37):

You can do both...Great question. I probably should have figured that out first. You load up GDB, you connect the device, I highly advocate that this was one of the more important commands that I would wrap in Invoke.

TH (01:06:55):

So you would basically run "invoke debug". That would grab the symbol file, spin up GDB, and load all of the common scripts that I, or other team members had written. And everyone would have access to all those.
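[The core of a wrapper like that is assembling one GDB command line that everyone shares. A sketch of that piece in Python is below. The GDB binary name, the `localhost:3333` debug server address, and the script paths are assumptions for the example; `-x` and `-ex` are standard GDB options for sourcing a file and running a command at startup.]

```python
def gdb_command(elf_path, scripts, gdb="arm-none-eabi-gdb",
                remote="localhost:3333"):
    """Build the argv for a GDB session with the team's shared scripts loaded."""
    argv = [gdb]
    for script in scripts:
        # -x sources a file at startup; files ending in .py run as Python.
        argv += ["-x", script]
    # -ex runs a single GDB command; here it attaches to the debug server.
    argv += ["-ex", f"target remote {remote}"]
    argv.append(elf_path)
    return argv


print(gdb_command("build/firmware.elf", ["tools/gdb/doctor.py"]))
```

[A task runner would then fetch the matching symbol file and pass this list to something like `subprocess.run`. Keeping the script list in the repo is what makes everyone's debugger start with the same commands available.]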

TH (01:07:12):

The opposite of that is everyone has their own GDB Python scripts that they've written that they don't share, or they don't put in the repo, or they're in the repo, but no one knows about them, so they don't source them. And so that's why I like to wrap the command.

TH (01:07:24):

So you're in GDB. And if it loads all of these scripts that are basically written in Python, but can be loaded into GDB, now you have access to them. And so you can type just the command name, and it's actually executing Python code in that case. But it has access to the memory space.

TH (01:07:46):

So you can print a variable. You can print the type of a variable. You can see all the fields of a struct. You can run backtrace, and kind of get all the threads, and iterate over everything.
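
[Editor's sketch: a small example of the GDB Python calls behind "see all the fields of a struct" and "get all the threads". The helper names are made up for illustration. The `gdb` module only exists inside a GDB process, so the import is guarded.]

```python
try:
    import gdb  # only importable when this file is sourced inside GDB
except ImportError:
    gdb = None


def describe_struct(value):
    """Return one 'name: type = value' line per field of a struct-like gdb.Value."""
    lines = []
    for field in value.type.fields():
        lines.append(f"{field.name}: {field.type} = {value[field.name]}")
    return lines


def thread_backtraces():
    """Return {thread number: backtrace text} for every thread GDB knows about.

    Must be called from inside GDB with a target or core file loaded.
    """
    traces = {}
    original = gdb.selected_thread()
    for thread in gdb.selected_inferior().threads():
        thread.switch()  # make this thread current so "backtrace" applies to it
        traces[thread.num] = gdb.execute("backtrace", to_string=True)
    if original is not None:
        original.switch()  # restore whichever thread the user had selected
    return traces
```

[Inside GDB, `describe_struct(gdb.parse_and_eval("some_global_struct"))` would list the fields of a global, where `some_global_struct` stands in for a symbol from your own firmware.]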

CW (01:07:57):

And this is something where, if you also had a piece of semi-custom hardware, or not even a custom hardware, something with peripheral registers and things that weren't exposed in your IDE, that you could decode them using a script like that, right?

TH (01:08:10):

Exactly.

CW (01:08:11):

Yeah.

TH (01:08:11):

Yes. There's a GDB Python package...SVD files. I don't exactly know what it is, but you can load SVD files into GDB using Python as well. And you can kind of get at all the peripheral registers that way as well.
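
[Editor's sketch: what decoding a peripheral register from a script can look like once the field layout is known. The register name, address, and field layout below are invented for illustration; in practice they would come from the chip's SVD file or datasheet, where each field is described by a bit offset and a bit width.]

```python
def decode_register(value, fields):
    """Split a raw register value into named fields.

    fields maps a field name to (bit_offset, bit_width).
    """
    decoded = {}
    for name, (offset, width) in fields.items():
        mask = (1 << width) - 1
        decoded[name] = (value >> offset) & mask
    return decoded


# Hypothetical status register layout, not taken from any real part.
UART_STATUS_FIELDS = {
    "TX_EMPTY": (0, 1),
    "RX_FULL": (1, 1),
    "ERROR_CODE": (4, 3),
}
```

[Inside GDB, the raw value could be read with something like `int(gdb.parse_and_eval("*(unsigned int *)0x40001000"))`, where the address is again only a placeholder, and then passed to `decode_register`.]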

TH (01:08:29):

...And the coolest part about the doctor command and the Python thing is...at Pebble and then at Fitbit, we both built a way to basically capture core dumps from devices that were remote.

TH (01:08:51):

And so your device would crash. It would capture a core dump, which is essentially a memory space on a device, and the registers that were running at that exact time. You can actually load these into GDB...with a decent amount of work.

CW (01:09:08):

Yes.

TH (01:09:08):

You can convert that into a file that then you can load into GDB. And just as you can run doctor...on GDB connected to a device over JTAG, you can run the same doctor command on a core dump. And by that point you can do so much.

TH (01:09:31):

How cool is that, that you can basically receive a report from a device halfway across the world? It sends you a core dump. You run doctor on it. And you immediately know that, "Oh, this weird behavior was exhibited because we were out of heap, and that's the issue. Let's go fix our memory leak."

EW (01:09:49):

Okay...But doctor is not just something I can run...You have to -

TH (01:09:54):

No. You have to make it.

EW (01:09:54):

- make it.

TH (01:09:54):

You have to make it. And there's nothing that GDB is going to give you out of the box that will fix all your problems, especially for embedded.

TH (01:10:05):

But there are, again...all over the place, and all over GitHub, and not super shared, but there are various GDB scripts that will print out all the threads in FreeRTOS, or all of the heap information in FreeRTOS, or in Zephyr, or what have you RTOS.

TH (01:10:25):

And you can find those on GitHub. And you can load them up and whatnot and build your own doctor command. But it's just an API that allows you to very quickly and easily access all of the peripheral state and memory space of a device that is connected to a debugger.

EW (01:10:41):

And I'll make sure and put a link in, you have a blog post about it. So I will -

TH (01:10:47):

We have a bunch of blog posts about GDB with Python, because it's so underutilized.

CW (01:10:53):

Yeah. It really is. And it's something that's in the back of my mind when I use GDB all the time. And then I just forget about it. Because...either other people have done the legwork to make cool scripts, or I just forget about it.

EW (01:11:06):

I mean, I'm still excited about Python having a debugger. So, it's confusing now that you're going to put -

CW (01:11:13):

Your Python in your debugger.

EW (01:11:15):

Yeah, exactly. My brain only holds so much each week.

CW (01:11:19):

Tyler...before we wrap up, because we are getting close to the end, there's one thing I do want you to teach me about. And it's in your list here, fuzzy search for terminal history. I don't know what that is.

EW (01:11:33):

Fuzzy.

TH (01:11:33):

Yeah.

CW (01:11:33):

But it sounds like something I need.

TH (01:11:36):

And so I mentioned before, I love to watch people. That sounded weird. I enjoy watching people work. I enjoy helping them and improving processes for the individual and the team at large.

TH (01:11:51):

And so I have watched so many people work that I realize deeply that just finding that command that they ran in the terminal at one point in time, trying to find that again, has proven to be very difficult for a lot of people, or they just don't know it exists. ...I at one point also didn't know that Control R searched through my terminal history.

EW (01:12:16):

What?

CW (01:12:16):

God.

TH (01:12:17):

In bash.

CW (01:12:18):

God.

EW (01:12:18):

What?

TH (01:12:19):

And...one of the other things is I had to watch people type the same command over and over again. And when that command is something like make -j8, pass this makefile, pass these build arguments and just this massive command-

EW (01:12:34):

The up arrow. Push the up arrow.

TH (01:12:36):

Well, the up arrow only gets you so far. I mean, I'm only going to push the up arrow -

EW (01:12:39):

History pipe grep.

TH (01:12:39):

...I mean, I never use the cat history grep. I don't do that. Anyways -

EW (01:12:47):

I have an alias to just be two letters.

CW (01:12:50):

I'm sorry. I'm not paying any attention anymore, because I just typed "Control R."

TH (01:12:53):

I'll give you a bronze star for that effort. The way that I suggest, and the way that I do it as well is, there is a tool called fzf, and there are probably many other tools as well that you can use. But it just takes your history file and runs a very quick fuzzy search on it. I think it's written in Go now. So it's actually insanely quick to run.

TH (01:13:16):

But if your history file is a hundred thousand commands, you can pretty much find that command you ran three years ago with your company during onboarding that you're like, "Oh, I wonder how I did that." It's incredibly quick to run that.

TH (01:13:30):

And yeah, you can just run grep on it too. But it allows you to, within just a couple seconds, find the command, click enter, it's now pasted...in your terminal, and you can run it again.
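
[Editor's sketch: a toy version of fuzzy history search, to show the idea of matching characters in order while ignoring case. This is not fzf's actual matching or ranking algorithm, and the function names are made up for this example.]

```python
def fuzzy_score(query, line):
    """Score a history line against a query; lower is better.

    The line matches if every character of the query appears in it in order,
    ignoring case. Returns None when the line does not match.
    """
    haystack = line.lower()
    position = -1
    first = None
    for ch in query.lower():
        position = haystack.find(ch, position + 1)
        if position == -1:
            return None
        if first is None:
            first = position
    if first is None:  # an empty query matches everything
        return 0
    return position - first + 1  # width of the span that contains the match


def search_history(query, history, limit=5):
    """Return up to `limit` matching lines, tightest match first, then most recent."""
    scored = []
    for index, line in enumerate(history):
        score = fuzzy_score(query, line)
        if score is not None:
            scored.append((score, -index, line))
    scored.sort()
    return [line for _, _, line in scored[:limit]]
```

[With a history containing `make -j8 -f Makefile.fw BOARD=v2 flash`, the query `mkflash` finds it even though those letters never appear next to each other, which an exact substring search cannot do.]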

CW (01:13:43):

I'm still stuck on Control R.

TH (01:13:45):

But it's my favorite thing. And...that is the first thing I tell people to install when I watch them, if they don't know about Control R, or they click the up arrow 20 times. I'm like, "No, no. Brew install fzf. Now click Control R."

CW (01:14:00):

No, I'm sophisticated. I use grep.

TH (01:14:01):

Yeah. That's number one.

EW (01:14:05):

Exactly. Okay. Well, I need to go install fzf, fzf.

TH (01:14:12):

fzf. Yes.

EW (01:14:13):

Frank Zulu Frank. And Christopher apparently is stuck in a Control R loop, which I don't even know what that means, but that's cool. Should I come over to your computer and look?

CW (01:14:24):

Just hit Control R, and then you start typing, and then it just gives you the command that matches what you're typing, and your history.

TH (01:14:31):

Yeah, but it's very crude search.

CW (01:14:32):

Yeah.

TH (01:14:32):

It's not case insensitive, and it has to be an exact match. And so it's -

EW (01:14:37):

Oh, I don't care. I type the same -

CW (01:14:38):

But it's a lot faster, because I have really long Python lines that I run a bunch of times. And that will be a lot faster for me to find stuff.

TH (01:14:46):

Well, if they're really long lines that you write a bunch of times, you should probably wrap them in something that makes it easy too.

EW (01:14:50):

Yes, yes.

CW (01:14:50):

Yes, I know.

EW (01:14:52):

We know.

CW (01:14:52):

Yeah, I know.

TH (01:14:52):

I mean, yeah.

EW (01:14:53):

But if I can't see all the parameters, then I'm not sure I'm doing the right thing.

CW (01:14:57):

But I have automated a lot of the command line options to...sensible defaults. So I don't have to type as much -

TH (01:15:02):

Nice.

EW (01:15:02):

Yeah, that's the important one.

TH (01:15:04):

As long as there's a constant improvement, then I'm okay with it.

EW (01:15:09):

That's actually a really good thing to end on...As long as there's a constant improvement, Tyler is good with it.

TH (01:15:17):

One of my core values internally is baby steps...yeah. Constant improvement.

EW (01:15:24):

Tyler, do you have any thoughts you'd like to leave us with?

TH (01:15:26):

I talked a lot here about improving tools, and building tools, and building web applications. And honestly it takes a lot of time, and energy, and effort. I have to pitch at this point that a lot of it is now built into Memfault and at least check it out.

TH (01:15:50):

And if not that, read our blog. A lot of this has been spit out in the Interrupt blog as well. But also reach out. I love talking about this stuff.

EW (01:16:00):

Our guest has been Tyler Hoffman, co-founder of Memfault. You can find the Memfault blog by typing Memfault Interrupt, or Interrupt Python GDB, or Interrupt unit test. And I'm sure you'll find it.

CW (01:16:17):

Thanks, Tyler. This was chock-full of information.

TH (01:16:22):

Of course.

CW (01:16:22):

I think I'm going to have to go listen to this. Well, I have to listen to it anyway, because I have to edit it. But I'm going to do it with a notepad.

TH (01:16:28):

What will our third podcast be? That's the question. What topic? But we'll figure that out.

CW (01:16:36):

Something non-tech related.

EW (01:16:41):

Thank you to Christopher for producing and co-hosting. Thank you to our Patreon listener Slack group for questions, in particular Phillip Johnston. And thank you for listening.

EW (01:16:52):

You can always contact us at show@embedded.fm, or hit the contact link on the top of the embedded.fm page, unless you're on a mobile device, in which case it's in the hamburger. I didn't even know it was called a hamburger, did you?

EW (01:17:04):

Now a quote to leave you with, from William Kamkwamba. "I didn't have a drill so I had to make my own. First I heated a long nail in the fire, then drove it through half a maize cob, creating a handle. I placed the nail back in the coals until it became red hot, then used it to bore holes into both sides of plastic blades."

EW (01:17:30):

That's from the book, "The Boy Who Harnessed the Wind: Creating Currents of Electricity and Hope."