477: One Thousand New Instructions
Transcript from 477: One Thousand New Instructions with Kwabena Agyeman, Christopher White, and Elecia White.
EW (00:00:01):
Before we get started, I want to let you know that I am giving a free talk for O'Reilly, on "An Introduction to Embedded Systems." If you or one of your colleagues or managers are interested, it is Thursday, May 23rd, 2024 at 9:00 AM Pacific. It will be recorded if you miss it, but if you go there live, you can ask questions. There will be a signup link in the show notes.
(00:00:30):
Welcome to Embedded. I am Elecia White, alongside Christopher White. Our guest this week is Kwabena Agyeman. We are going to talk about my plans to create a wasp identifying camera system.
CW (00:00:49):
Hi Kwabena. Thanks for coming back after being on the show already, and knowing what we are all about. <laugh>
KA (00:00:54):
Yeah, it has been seven long years, but I thought it was time. <laugh>
EW (00:00:59):
Could you tell us about yourself, as if we had never met?
CW (00:01:03):
<laugh>
KA (00:01:06):
Gotcha. Well, nice to meet you. I am Kwabena. I run a company called OpenMV. We do computer vision on microcontrollers.
(00:01:14):
Seven years ago, we were one of the first companies thinking about this kind of idea. Back then we were deploying computer vision algorithms on a Cortex-M4, and we had just upgraded to the Cortex-M7, which was the hottest thing coming out back in the 2017 timeframe.
(00:01:36):
Since then, we moved on to the STM32H7, which is a higher performance processor. And now the i.MX RT, which is even more performance. So I am excited about the future, and plan to talk to you all today about how these things are going to get even faster.
EW (00:01:58):
That seems unlikely, but okay.
CW (00:02:00):
What? It seems unlikely-
EW (00:02:01):
I heard Moore's law was over.
CW (00:02:03):
Well, not for microcontrollers <laugh>.
KA (00:02:05):
No. For microcontrollers, it is still happening, but the gains are not like 2%. It is like 400% each generation, actually.
CW (00:02:12):
Microcontrollers are still on 50 nanometer process, so we have a long way to go. <laugh>
KA (00:02:17):
Oh, not even that. The latest ones are coming out at 12 to 16 nanometers for MCUs, four-dollar chips.
CW (00:02:24):
Oh, okay. So it has moved down. Okay. Yeah.
KA (00:02:26):
Yeah, but that used to be top of the line processors, and now it is coming to you for nothing.
EW (00:02:33):
All right, so clearly we have a lot to talk about, including wasps.
KA (00:02:35):
Yes.
EW (00:02:35):
But first we have lightning round. Are you ready?
KA (00:02:40):
Okay. Yeah.
CW (00:02:41):
Hardware or software?
KA (00:02:43):
Both.
EW (00:02:45):
Python or C?
KA (00:02:45):
C.
CW (00:02:46):
Marketing or engineering?
KA (00:02:48):
Both.
EW (00:02:50):
Cameras or machine learning?
KA (00:02:52):
Cameras.
CW (00:02:54):
AI or ML?
KA (00:02:56):
ML.
EW (00:02:57):
Favorite vision algorithm?
KA (00:02:59):
AprilTags.
CW (00:03:01):
Okay, we are going to break the rules. What is AprilTags?
KA (00:03:04):
What is AprilTags? Have you ever seen those QR-code-like things that the Boston Dynamics robots look at, to figure out what is a fridge and what is a door and such?
EW (00:03:12):
Mm-hmm.
KA (00:03:14):
They put them all around. That is called an AprilTag. It is like a QR code, but easier to read. It also tells you your translation and rotation away from it. So if you see the code, you can tell, given where you are, how it is rotated in 3D space and translated in 3D space.
CW (00:03:33):
Ah, that is very cool. It is like little fiducial markers for the world.
KA (00:03:37):
Yeah, each one of them encodes just the number, but then you just know, "Okay, number zero means coffee machine. Number one means door frame, or something." And that is how they get robots to navigate around without actually fully understanding the environment.
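For readers who want to play with this, here is a minimal sketch of tag detection using the open source AprilRobotics C library, one common implementation (OpenMV has its own port with a MicroPython API). The buffer, width, and height arguments are assumed to come from your own camera driver.

```c
#include <stdint.h>
#include <stdio.h>
#include "apriltag.h"     // AprilRobotics C library
#include "tag36h11.h"

// Detect tag36h11 AprilTags in one grayscale frame and print ID + center.
void find_tags(uint8_t *gray, int width, int height) {
    apriltag_family_t *tf = tag36h11_create();
    apriltag_detector_t *td = apriltag_detector_create();
    apriltag_detector_add_family(td, tf);

    image_u8_t im = { .width = width, .height = height, .stride = width, .buf = gray };
    zarray_t *detections = apriltag_detector_detect(td, &im);

    for (int i = 0; i < zarray_size(detections); i++) {
        apriltag_detection_t *det;
        zarray_get(detections, i, &det);
        // det->id is the encoded number ("0 means coffee machine...").
        // det->c is the tag center and det->p the four corners, from which
        // translation and rotation relative to the camera can be estimated.
        printf("tag %d at (%.1f, %.1f)\n", det->id, det->c[0], det->c[1]);
    }

    apriltag_detections_destroy(detections);
    apriltag_detector_destroy(td);
    tag36h11_destroy(tf);
}
```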
CW (00:03:54):
Complete one project or start a dozen?
KA (00:03:56):
One.
EW (00:03:57):
Favorite fictional robot?
KA (00:03:59):
I like WALL-E. I love the movie.
CW (00:04:01):
You have a tip everyone should know?
KA (00:04:04):
Tip everyone should know. You should learn SIMD performance optimization. We are going to talk about that today. It is something that blew my mind. I think everyone should really think about it more often. You can double or triple the speed of the process you are working on very easily, if you just put a little work in.
EW (00:04:22):
Okay. What is SIMD?
KA (00:04:23):
Well, single instruction, multiple data.
EW (00:04:25):
Okay. That sounds interesting for machine learning things, but can I actually use it?
KA (00:04:31):
Yes! You can. The way to say it would be, back seven years ago when I was doing OpenMV, I thought I was a hotshot programmer. I wrote vision algorithms to run on these microcontrollers, and I wrote them straightforward. I just wrote them in a way that would be the textbook answer, and just assumed, "Okay, that is the performance that it runs at. That is the speed that it runs at. Good enough. Let me move on." And-
EW (00:05:01):
Wait. These algorithms? They are like FFT and convolution and- Like what?
KA (00:05:07):
Yeah. Stuff like that. The best example would be something called a "median filter."
EW (00:05:11):
Sure.
KA (00:05:13):
The median filter is basically: take a single pixel and then look at the pixels around it, the neighborhood. Let us say you look to the left, right, up, down, and the diagonals, so all eight directions. So there are eight pixels around it, for a three by three median filter. Then you just sort those, take the middle number, and replace the pixel there with that.
(00:05:32):
The median filter has a nice effect in that it blurs the image. It gets rid of sharp jaggy things, but it keeps lines. So strong lines in the image still remain. They do not get blurred. But then it blurs the areas that do not have strong lines. So it produces a really nice, beautiful effect. It is a nonlinear filter though, so the mathematics to make it happen are a little bit tricky. But yeah, that is the median filter.
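As a reference point, here is a minimal sketch of the straightforward "textbook" version being described, assuming an 8-bit grayscale image; border pixels are simply copied.

```c
#include <stdint.h>
#include <string.h>

// Naive 3x3 median filter on an 8-bit grayscale image, written the
// "textbook" way: gather the 3x3 neighborhood, sort it, take the middle
// value. Border pixels are left as-is for brevity.
static void median3x3_naive(const uint8_t *src, uint8_t *dst, int w, int h) {
    memcpy(dst, src, (size_t)w * h);           // keep the border unchanged
    for (int y = 1; y < h - 1; y++) {
        for (int x = 1; x < w - 1; x++) {
            uint8_t win[9];
            int n = 0;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++)
                    win[n++] = src[(y + dy) * w + (x + dx)];
            // insertion sort of the 9 window values
            for (int i = 1; i < 9; i++) {
                uint8_t v = win[i];
                int j = i - 1;
                while (j >= 0 && win[j] > v) { win[j + 1] = win[j]; j--; }
                win[j + 1] = v;
            }
            dst[y * w + x] = win[4];           // middle of 9 sorted values
        }
    }
}
```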
(00:06:00):
When I wrote this originally on the OpenMV Cam seven years ago, it ran at one frame a second. I was like, "Yeah, that is how fast these things are. Cannot do any better. I know what I am doing."
(00:06:13):
And then I hired a performance optimizer, Larry Bank. I do not know if he has been on the show-
EW (00:06:17):
Unh-uh <negative>.
KA (00:06:18):
But this guy blew my mind. I asked him, "Hey, can you make this better?" And he got a thousand percent performance increase on me. 1000%! So that is about 16X. That algorithm went from one frame a second to 16, and I was blown away.
(00:06:38):
When someone does that and is able to beat you that badly, you have to wake up and start thinking, "What am I leaving on the table?" He did just two things to make the algorithm go faster. One, I was doing boundary checks, to make sure I was not running off the edge of the image, for every single pixel.
EW (00:06:59):
Ooh. No. Yeah.
KA (00:07:00):
Yeah. So he made two loops. One that checks to see if you are near the edge, and then does boundary checks. And one that, if you are not near the edge, does not do boundary checks. Massive performance gain.
(00:07:11):
The second change he made was to do something called a "histogram." So when you are doing a sorted list of- Let me back up. How do I explain this?
EW (00:07:23):
The median requires a sorted list.
KA (00:07:24):
Yes.
EW (00:07:24):
Because you need to know which is in the middle, so you need to know which is higher and which is lower. And so you end up, even with a three by three, you still have to order all nine numbers.
KA (00:07:35):
Yeah. So instead of doing that, what we did is you can maintain this thing called a "histogram." So a bunch of bins. What you do is, when you are at the edge of the image, you initialize the histogram with all the pixels in that three by three. And then you walk the histogram really quick, to figure out where the middle pixel is.
(00:07:55):
So you just start at the beginning and walk it, and do this thing called the CDF, where you sum up the bins until the running total passes 50% of the window count. That tells you the middle pixel. That can be done pretty quickly. You still have to do this for every pixel, but it is a nice fast linear for-loop, so processors can execute that really quick.
(00:08:16):
But the big change he did was instead of initializing the histogram every pixel, you just drop a column and add a column.
EW (00:08:24):
Yeah.
KA (00:08:27):
What this does is it separates- Even if your kernel becomes 11 by 11 or something like that, you just drop a column and add a column. You are not doing the work of re-initializing the histogram every pixel. That is where the 16X performance came in. Just doing that little change, and going from O(N squared) to O(2N), is a massive difference in performance.
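Here is a minimal sketch of the two ideas just described, for a 3x3 window on an 8-bit grayscale image: keep a 256-bin histogram of the current window, slide it by dropping the leaving column and adding the entering one, and find the median with a CDF walk. Border handling is omitted, and this is not OpenMV's actual implementation.

```c
#include <stdint.h>

// Sliding-histogram 3x3 median. Instead of re-sorting 9 pixels at every
// position, maintain a 256-bin histogram of the current window, update it
// by column as the window slides, and walk it (the CDF walk) until the
// running count passes half of the 9 samples. Border pixels are not written.
static void median3x3_hist(const uint8_t *src, uint8_t *dst, int w, int h) {
    for (int y = 1; y < h - 1; y++) {
        uint16_t hist[256] = { 0 };
        // seed the histogram with the first full 3x3 window on this row
        for (int dy = -1; dy <= 1; dy++)
            for (int dx = -1; dx <= 1; dx++)
                hist[src[(y + dy) * w + (1 + dx)]]++;
        for (int x = 1; x < w - 1; x++) {
            // CDF walk: the median is the first bin whose cumulative
            // count reaches 5 of the 9 samples in the window.
            int acc = 0, m = 0;
            while (acc + hist[m] <= 4) { acc += hist[m]; m++; }
            dst[y * w + x] = (uint8_t)m;
            if (x + 1 < w - 1) {
                // slide right: drop the leaving column, add the entering one
                for (int dy = -1; dy <= 1; dy++) {
                    hist[src[(y + dy) * w + (x - 1)]]--;
                    hist[src[(y + dy) * w + (x + 2)]]++;
                }
            }
        }
    }
}
```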
(00:08:54):
On these MCUs it matters. On a desktop, you can code this stuff with the minimal effort approach I originally took, and it does not cost you anything. But on an MCU, putting that effort in to actually do this well, it really is a huge game changer.
(00:09:12):
Here is where the SIMD comes in. It turns out you can actually compute two columns, or up to four columns, of that histogram at the same time. Because on a Cortex-M4, there is an instruction that allows you to grab a single long, basically split it into four bytes, and add those four bytes to four byte-wide accumulators. So it will do four additions in parallel, and not have them overflow into each other.
EW (00:09:41):
Ooh. Okay.
KA (00:09:42):
Yeah. It has existed there since the Cortex-M4. So I swear it is in every single processor that has been shipped for a decade now. It has just been sitting there, and no one uses this stuff. But if you break out the manual, it is available.
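A sketch of what using that instruction looks like from C, assuming the CMSIS-Core intrinsic __UADD8 (available on Cortex-M4/M7 class parts with the DSP extension). The per-column threshold-counting example is illustrative, not OpenMV code.

```c
#include <stdint.h>
#include "cmsis_gcc.h"   // usually pulled in via your device's CMSIS core header; defines __UADD8

// Four independent byte-wide counters packed into one 32-bit word.
// __UADD8 adds the four byte lanes of two words in one instruction, with
// no carry between lanes, so the counts never bleed into each other --
// as long as each lane is guaranteed to stay below 256.
// Example: "bright pixel" counts for four adjacent columns, accumulated
// over up to 255 rows. Caller must ensure x0 + 3 < w.
static uint32_t column_counts(const uint8_t *img, int w, int h,
                              int x0, uint8_t thresh) {
    uint32_t counts = 0;                    // four packed byte counters
    for (int y = 0; y < h && y < 255; y++) {
        const uint8_t *p = img + y * w + x0;
        uint32_t ones = (uint32_t)(p[0] > thresh)
                      | (uint32_t)(p[1] > thresh) << 8
                      | (uint32_t)(p[2] > thresh) << 16
                      | (uint32_t)(p[3] > thresh) << 24;
        counts = __UADD8(counts, ones);     // four adds, one instruction
    }
    return counts;                          // byte lane i = count for column x0 + i
}
```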
CW (00:09:57):
I have known about SIMD instructions on various processors for a while. This sounds like a really standard one for Arm Cortex. Arm has another whole set of things called "Neon," which is I think a- Is this part of Neon? Or is Neon a bigger set of even more SIMD things?
KA (00:10:14):
No, Neon is a bigger set. That only runs on their application processors.
CW (00:10:17):
Oh. Okay. Right.
KA (00:10:19):
So on their MCUs, they have this very- Technically, these instructions are also available on their desktop CPUs. They are just not utilized heavily there. What Arm did is they wanted to make DSP a little bit more accessible. Well, faster.
(00:10:33):
So there is a lot of stuff in the Arm architecture that allows you to do something called "double pumping," where basically you can split the 32 bits into two 16-bit halves, and they have something called "SADD16." So you can add the two 16-bit halves at the same time, or subtract them.
(00:10:47):
There is an instruction that will take two 32-bit registers and split them into 16-bit halves. Then it multiplies the bottom 16 bits of one register by the bottom 16 bits of the other, and the top 16 bits by the top 16 bits. Then it adds those together, and also does another add from an accumulator. So you will get two multiplies and two adds in the same clock cycle.
EW (00:11:10):
Do I have to write in assembly to do this? Or are there C compilers that are smart? Or are there Cortex libraries that I should be using?
KA (00:11:18):
GCC just has intrinsics. So it is just like a function call, where you just pass it a 32-bit number, and then it gets compiled down to a single assembly instruction, basically. So you are able to write normal C code, and then when you get to the reactor core of a loop, the innermost loop of something, you can just sprinkle these in there. And get that massive performance.
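A sketch of those intrinsics in use, assuming the CMSIS-Core names __SMLAD and __SADD16; the dot-product helper here is illustrative, not OpenMV's code.

```c
#include <stdint.h>
#include "cmsis_gcc.h"   // CMSIS-Core intrinsics (__SMLAD, __SADD16) on Cortex-M4/M7

// Dot product of two int16 buffers, two elements per iteration.
// __SMLAD multiplies the low halves together, the high halves together,
// and adds both products plus the running accumulator -- two multiplies
// and two adds per instruction, as described above.
static int32_t dot_q15(const int16_t *a, const int16_t *b, int n) {
    int32_t acc = 0;
    int i = 0;
    for (; i + 2 <= n; i += 2) {
        uint32_t pa, pb;
        __builtin_memcpy(&pa, &a[i], 4);   // pack a[i], a[i+1] into one word
        __builtin_memcpy(&pb, &b[i], 4);   // pack b[i], b[i+1] into one word
        acc = (int32_t)__SMLAD(pa, pb, (uint32_t)acc);
    }
    for (; i < n; i++)                      // odd tail element
        acc += (int32_t)a[i] * b[i];
    return acc;
}

// Adding two pairs of packed 16-bit values at once (the "SADD16" case):
static uint32_t add_pairs(uint32_t packed_a, uint32_t packed_b) {
    return __SADD16(packed_a, packed_b);    // low+low and high+high, one cycle
}
```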
(00:11:38):
One of them that is really valuable is something called USAT ASR. So, did you know that whenever you need to do clamping, like the min and max comparisons for a value, Arm has an assembly instruction that will do that for you in one clock?
CW (00:11:51):
<sigh>
KA (00:11:51):
<laugh>
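A sketch of the clamping he is describing, assuming the CMSIS-Core __USAT intrinsic; the fixed-point helper names are made up for illustration.

```c
#include <stdint.h>
#include "cmsis_gcc.h"   // CMSIS-Core intrinsic __USAT on Cortex-M3/M4/M7

// Clamp a signed intermediate result into 0..255 in a single instruction,
// instead of a pair of min/max comparisons.
static inline uint8_t clamp_u8(int32_t v) {
    return (uint8_t)__USAT(v, 8);          // saturate to an unsigned 8-bit range
}

// The USAT instruction can also apply an arithmetic shift first
// ("USAT Rd, #8, Rn, ASR #shift"), which is handy for fixed point: shift
// out the fractional bits, then saturate. GCC typically folds the shift
// into the USAT instruction itself.
static inline uint8_t clamp_u8_q8(int32_t v_q8) {
    return (uint8_t)__USAT(v_q8 >> 8, 8);
}
```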
EW (00:11:55):
But you should not go out and use this, unless that is part of your loop. You should not do the min-max during your initialization of your processor, with this fancy instruction.
CW (00:12:05):
No, it is pointless.
EW (00:12:06):
You should limit these-
CW (00:12:08):
They are optimizations.
EW (00:12:09):
To optimizations of things that you need to run faster.
CW (00:12:12):
Yeah.
EW (00:12:12):
Not optimizations because they are fun.
CW (00:12:16):
But, the sigh was because I wished I had known about this, for several projects in the past.
KA (00:12:23):
Well, it is just like- Here is an example, something I wrote recently which I am really proud of. We now have a nearest neighbor, a bilinear, and a bicubic image scaler in our codebase.
(00:12:41):
I scoured the internet looking for someone who had written a free version of this that was actually performant, and none existed. So we are the only company that has actually bothered to create this for a microcontroller.
(00:12:50):
It actually does allow you to scale an image up or down to any resolution, using bicubic and bilinear image scaling. Bicubic basically can take a really pixelated image and then produce these nice smooth color regions. It really blends things well. It looks beautiful when you use it.
(00:13:11):
To do that though, it is like doing a crazy amount of math per pixel. So being able to use the SIMD instructions, when you are doing the blend operation for upscaling these things, it makes a huge amount of difference. If you do not do this, the code runs two to four X slower.
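To make the per-pixel blend concrete, here is a minimal fixed-point bilinear sample for one output pixel, assuming 8-bit grayscale and Q8 fractional coordinates. This is illustrative, not OpenMV's scaler.

```c
#include <stdint.h>

// Minimal fixed-point bilinear sample: blend the four neighbors of a
// fractional source coordinate, with the fraction held in Q8 (0..255).
// An upscaler runs this kind of blend arithmetic for every output pixel;
// a real implementation vectorizes it with SIMD.
static uint8_t bilinear_sample_q8(const uint8_t *src, int w, int h,
                                  int x_int, int y_int, uint32_t fx, uint32_t fy) {
    int x1 = (x_int + 1 < w) ? x_int + 1 : x_int;   // clamp at the right edge
    int y1 = (y_int + 1 < h) ? y_int + 1 : y_int;   // clamp at the bottom edge

    uint32_t p00 = src[y_int * w + x_int], p01 = src[y_int * w + x1];
    uint32_t p10 = src[y1   * w + x_int], p11 = src[y1   * w + x1];

    // horizontal blends, then a vertical blend, all in Q8
    uint32_t top = (p00 * (256 - fx) + p01 * fx) >> 8;
    uint32_t bot = (p10 * (256 - fx) + p11 * fx) >> 8;
    return (uint8_t)((top * (256 - fy) + bot * fy) >> 8);
}
```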
EW (00:13:28):
I have always known that graphics code is an area where there are a lot of optimizations that are non-obvious. I mean, there is the Stanford Graphics page about optimization hacks. Have you seen that?
KA (00:13:46):
Maybe. I have seen the bit- If you have ever had fun searching around stuff, there are always the bit magic things where- If you have ever seen, what is it, Doom, the original Doom. There was one thing which was the inverse square-
CW (00:14:01):
The magic number. Yeah.
EW (00:14:03):
And there is "Hacker's Delight" that is similar.
CW (00:14:05):
Yeah.
KA (00:14:05):
Yeah. "Hacker's Delight."
EW (00:14:11):
So are these algorithms optimizable because they are graphics?
CW (00:14:17):
No. The instruction he was talking about, with breaking up the 32-bit into 16 and doing that- I mean it sounds like matrix stuff, that would be easily applicable to matrix stuff or FIR filters, I would assume.
KA (00:14:31):
Yeah. No, I think it is meant for FIR filters, and also- It is not FFT so much. Maybe if you were doing fixed point. But definitely for audio, if you wanted to mix two audio channels together, for example, it would probably be good for that. You could set a gain on each channel, and then it would automatically mix them, as long as you were handling two 16-bit samples at the same time.
(00:14:53):
The cool thing is you can set all this up. Like your DMA system to receive audio could be producing audio chunks in a format that actually is applicable to the processor, then churning through it this way.
(00:15:04):
One of the tricks you have to know when you are trying to do this stuff is that the data does have to be set up to feed well into these instructions. You cannot actually utilize them if you have to reformat the data constantly, because then your speed gain will be lost on data movement, more or less.
EW (00:15:24):
You just said fixed point. Of course you are doing fixed point, are you not?
KA (00:15:28):
Yeah. It is all in fixed point.
EW (00:15:29):
Oh, okay.
KA (00:15:30):
Yeah. The reason why this is so cool, is that as I got into this- When I first was doing the microcontroller stuff, it was just, "Hey, we are just having fun here. Just trying out cool things. Is this going to go anywhere? Not really sure." Performance is not there. Speed is not there. Usability is not there.
(00:15:49):
Honestly, after I met Larry, and we started actually making things go faster, it started to dawn upon me that, "Well, hey, if you are getting a thousand percent speed up here-" This is before we added the SIMD part. Once you add that again, you can get even another 2X speed up on top of that. So 2000% speed up, basically. You are going from one frame a second, to now you are at 30, at 320 by 240, right? And that is on a microcontroller.
(00:16:19):
Now let us say the microcontroller's clock speed doubles. Instead of being at 400 MHz, you are at 800 MHz.
(00:16:30):
And then there is this new thing called "Arm Helium" that is coming out. Arm Helium offers an additional four to eight X speed up on all algorithms. This is what is in the new Cortex-M55 MCUs that are coming. Arm Helium is actually closer to Arm Neon.
(00:16:50):
So it is not a very limited DSP set. It is actually like a thousand new instructions that will allow you to do math on up to eight or 16 elements at a time. It also works in floating point. You can do doubles, floats, four floats at a time. It even does 16-bit floating point too.
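Here is a rough sketch of what Helium code looks like from C, assuming the ACLE M-profile Vector Extension intrinsics in <arm_mve.h> on a Cortex-M55 class toolchain; the row-add example is illustrative only.

```c
#include <stdint.h>
#include <arm_mve.h>   // ACLE Helium/MVE intrinsics, Cortex-M55 class cores

// Saturating add of two 8-bit image rows, sixteen pixels per operation.
// The tail that is not a multiple of 16 is handled with plain scalar code
// to keep the sketch simple (MVE predication could cover it instead).
static void add_rows_sat(const uint8_t *a, const uint8_t *b, uint8_t *out, int n) {
    int i = 0;
    for (; i + 16 <= n; i += 16) {
        uint8x16_t va = vld1q_u8(&a[i]);        // load 16 lanes
        uint8x16_t vb = vld1q_u8(&b[i]);
        vst1q_u8(&out[i], vqaddq_u8(va, vb));   // lane-wise saturating add
    }
    for (; i < n; i++) {                        // scalar tail
        uint16_t s = (uint16_t)a[i] + b[i];
        out[i] = (s > 255) ? 255 : (uint8_t)s;
    }
}
```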
CW (00:17:13):
I love these RISC CPUs with thousands of instructions.
KA (00:17:15):
<laugh>
EW (00:17:19):
The "R" is just there to make the- I mean if they were just called "the ISC," everybody would be like, what does that even mean? But "RISC" sounds cool.
CW (00:17:27):
The Helium stuff was what I was actually thinking of, when I said Neon, that I had read a little bit about. Because we are using M55s on a project.
KA (00:17:34):
Have you actually gotten into using some of the new stuff yet?
CW (00:17:37):
No. I was just reading through data sheets and saying, "Oh, that looks cool. I do not know what to do with it yet." But yeah, that is amazing. It reminds me- I have always felt like we are leaving performance on the table, because we tend to- These days things move so quickly, we tend to just wait for the next CPU, if we cannot do something fast enough.
(00:17:57):
Whereas, I am going to "back in the old days" <laugh>, people were doing amazing things because they only had a 6502 or an 8080 or something. "Well, there is no other computer. I need this to go as fast as it can. So I am going to hand optimize assembly." I am not suggesting people hand optimize assembly.
(00:18:15):
But looking for the optimizations that the vendors have already provided, and that people just do not know about, I think is something people miss.
KA (00:18:27):
Oh yeah. But it is actually really, really huge. It is much bigger than you think. Okay, so let me say it like this. We have one algorithm called "erode and dilate." It is able to hit 60 FPS now on our latest gen system, the OpenMV Cam RT1062. It runs at 60 FPS at VGA.
(00:18:48):
So when we get to a Helium based processor for our next generation OpenMV Cams, we are going to be able to 4X that number. Now you are talking 1280 by 960 at 60 FPS. That is like a 1.3 megapixel image. Two megapixels is 1080p. So if we go to 30 FPS, now we are talking, we are able to run 1080p image processing on a $4 microcontroller.
CW (00:19:17):
<laugh>That is really insane. Now you are running into, "Do I have enough RAM for this?" <laugh>
KA (00:19:23):
Well, that is the thing. New microcontrollers are also coming out, like the Alif Ensemble for example. The thing has ten megabytes of RAM on chip.
CW (00:19:30):
Does it now?
EW (00:19:31):
<laugh>
KA (00:19:32):
Ten megabytes of RAM! On chip.
CW (00:19:33):
I am sorry, I am using that right now. So I have to keep my mouth shut.
KA (00:19:40):
<laugh> But this is the thing though, we are going to be crossing this chasm where these MCUs are really going to be able to do things that previously you needed a Linux application processor and OpenCV to do.
(00:19:56):
Once you get to 1080p, that is good enough for most people. They do not really care if you can have more resolution. Even 1280. I mean the Nintendo Switch, it has sold quite well, and that is a 1280 by 720 system. You do not need to go to 4K or 8K, to have something that will be in the market.
CW (00:20:18):
Unless you talk to my other client.
KA (00:20:19):
<laugh>
EW (00:20:21):
Pushing all of Christopher's buttons. It is great!
CW (00:20:25):
What if we had four or six 4K cameras?
KA (00:20:28):
<laugh>
EW (00:20:28):
Ohh.
CW (00:20:28):
Anyway.
KA (00:20:32):
Yeah. Pushing the pixels.
EW (00:20:35):
But OpenMV, if I remember, works with MicroPython, which is really cool. I loved working with MicroPython, and yet it was not the fastest.
KA (00:20:49):
Well, the thing is that we do not write the algorithms in Python. Python is just a layer to assemble things together. So we actually have the algorithms. They are written in C. Then we use the SIMD intrinsics, and we try to write things so that they are fast in C.
(00:21:02):
What MicroPython is really just doing is being an orchestration layer to tie everything together. And it really, really helps the use case, because what we are trying to do is to pull in all the non-embedded engineers, to give embedded systems a try.
(00:21:18):
So if you say, "Hey, you need to learn how to use all these crazy tools. Get the JTAGer out. By the way, you got to buy that, and it is going to be a thousand dollars just to program it. Here is a crazy makefile build system and all this other stuff." It is really going to run into a lot of brick walls for most people.
(00:21:37):
But when you get someone who has worked on the desktop, they are used to Python scripts. And you say, "Here is the library. Here is the API. You can basically write normal Python language and just look at the API for what you are allowed to call." It is a much easier transition for folks. It is part of the key on why our product has been successful.
(00:21:54):
It has also been nice in that a lot of middleware Python libraries now can run on the system. What we recently- So we have a system called "ulab" that actually runs on board. Ulab gives you a NumPy like programming interface. So if you want to do NumPy like operations, you can actually do that.
EW (00:22:14):
Ooohhh.
KA (00:22:14):
Yeah. So it supports stuff like matrix multiplies. They are adding singular value decomposition right now. You have that on board, so you can actually write a lot of the standard matrix math you would have on a desktop, and just port it right over.
(00:22:31):
Additionally, we also have a sockets library and a system for Bluetooth. The sockets library allows you to write pretty much a desktop app, that would normally control low level sockets. You can port that onto the OpenMV Cam, and now you can connect to the internet, do Python urequest, do API calls, and so on and so forth. It makes it really powerful actually.
EW (00:22:55):
We had a listener, Tim, ask a question that seems really relevant at this point. "Is this intended for genuine production usage, or more for hobbyist prototype work? The OpenMV homepage and docs have a heavy emphasis on MicroPython. Are there plans to provide a C or C++ API?"
KA (00:23:17):
Yeah, so we are never really going to provide a C API. You can take our firmware and actually just write C code directly. What we found is, a lot of customers look at it, think, "Oh yeah, this is one thing," and then do not give it a try. Then we have a lot of customers who say, "This is amazing," and go and modify it in whatever ways and shapes they need.
(00:23:40):
So what we see a lot of times, is a customer will take the system, and if they do not like MicroPython, they will just literally rip that out of the codebase, since everything is makefile based, and they will just shove it into whatever they are actually using.
(00:23:50):
Since it is open source, you can actually do Frankenstein edits like that. As long as you follow some good practices, and do not completely edit our code and have an unmaintained fork, you can do a pretty decent job of staying in sync with upstream, while not having a totally broken system.
(00:24:11):
But no, we plan to keep it in MicroPython. The reason for that, as I mentioned, is we want to get the larger pool of developers who are not working in embedded systems to jump on board. But also we found that it is pretty usable in production for a lot of people with the Python interface.
(00:24:27):
I was just talking to a customer this week actually, who is putting these things in power plants and they are loving it actually. For them, they just needed to do some basic editing of- They were just using it as a sensor, that would connect to their infrastructure and then do some remote sensing. I do not want to mention exactly what they are doing, to not spill the beans.
CW (00:24:52):
They are monitoring the nuclear fuel rods, right?
KA (00:24:53):
Yeah. But they loved how we had a system that was flexible. One of the big things for them is they did not want a black box system, that could not quite do what they needed. They wanted something that was open, and available for them to tweak it in any ways they needed.
EW (00:25:10):
It is called OpenMV. So you would think that people would recognize that it is open source.
KA (00:25:14):
<laugh>
EW (00:25:16):
But it is actually open source and open hardware. All of these instructions and GCC intrinsics we are talking about, you can go to the code and look up.
KA (00:25:25):
Yeah, you can. We should also rename ourselves "ClosedAI," I guess. Maybe.
CW (00:25:34):
What is the- I know we probably talked about this in the past, but people probably ask. What open source license does OpenMV come under?
KA (00:25:45):
We are actually under the MIT license for most of our code.
CW (00:25:47):
Yay.
EW (00:25:47):
Woo-hoo.
CW (00:25:47):
We do have- Yeah.
EW (00:25:51):
Party time.
KA (00:25:52):
Honestly, trying to enforce this, it would be insane, right? It has led to some weird situations. So we are actually really popular in China. So much so that people actually take our IDE, which is- Our changes are MIT licensed, but the IDE base is GPL. So you do have to keep that to be open source.
(00:26:14):
But for that system, we actually see people who want to compete with us. They actually take our IDE, take the source code, they remove our logo, put their name and logo on it. Then they sell a product that has a similar use case to ours, and actually try to compete with us with our own tools. It is crazy. <laugh>
EW (00:26:40):
And by crazy you mean exceedingly frustrating?
KA (00:26:43):
Exceedingly frustrating. But hey, it is kind of flattery though, at the same time. They like our stuff so much, they are not going to build their own. They are going to just copy and paste what we have been doing.
EW (00:26:53):
So you said ClosedAI. Are you ClosedAI? Where was I headed with that?
CW (00:26:59):
I think that was just a joke.
KA (00:27:00):
That is a joke on OpenAI. <laugh>
EW (00:27:03):
Oh, I see. Okay. We have talked a little bit about some of the graphics things with the erode and dilate, but you have a whole bunch of machine learning stuff too.
KA (00:27:13):
Yeah, so we integrated TensorFlow Lite for microcontrollers a long time ago. And we have been working with Edge Impulse, and that has been great. They basically enabled people back when this was super hard. We did not have this back when I did an interview with you seven years ago.
(00:27:29):
They made it easy for folks to basically train a neural network model, and get it deployed on the system. At first, we started with image classification, but that has moved on to something called "FOMO." I think it is called "Faster Objects, More Objects," but obviously it is a play on YOLO.
CW (00:27:48):
Right. I remember that. Okay. Yeah. I knew it was connected to YOLO somehow, but I forgot how.
KA (00:27:53):
Yeah. It allows you to- It basically does image segmentation. It will take a 96 by 96 input image, and then basically it figures out the centroids of objects in that image, and will output a 16 by 16 pixel array marking where the centroid of an object is. And this allows you to do multi-object tracking on an MCU, and you can do 30 FPS.
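Here is a hypothetical sketch of post-processing that kind of centroid grid; the grid size, threshold, and function names are assumptions for illustration, not the Edge Impulse or OpenMV API.

```c
#include <stdint.h>
#include <stdio.h>

// Hypothetical post-processing of a FOMO-style output: a small grid of
// per-cell confidences where a "hot" cell marks an object centroid.
// Grid size and threshold here are illustrative assumptions.
#define GRID_W  16
#define GRID_H  16

static void report_centroids(const float scores[GRID_H][GRID_W],
                             int img_w, int img_h, float threshold) {
    const float cell_w = (float)img_w / GRID_W;
    const float cell_h = (float)img_h / GRID_H;
    for (int gy = 0; gy < GRID_H; gy++) {
        for (int gx = 0; gx < GRID_W; gx++) {
            if (scores[gy][gx] >= threshold) {
                // map the grid cell back to a pixel-space centroid
                float cx = (gx + 0.5f) * cell_w;
                float cy = (gy + 0.5f) * cell_h;
                printf("object near (%.0f, %.0f), score %.2f\n",
                       cx, cy, scores[gy][gx]);
            }
        }
    }
}
```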
EW (00:28:22):
Okay, where do the Jetson TXs fit in here? I thought those were super powerful and did all those super amazing things. But now he has got microcontrollers doing this.
CW (00:28:36):
Jetson would be able to do a much larger image than 96 by 96.
KA (00:28:40):
Oh yeah, absolutely.
EW (00:28:42):
Of course. But Jetson is also running Linux, and doing a bunch of other stuff that slows it down.
CW (00:28:47):
Right. But it is a big real GPU. I do not think it slows it down <laugh>.
KA (00:28:50):
No, no. Jetsons are awesome. I think here is the problem. So if you look at the latest Orin, right?
CW (00:28:58):
<laugh> I am looking at it right now.
EW (00:28:59):
<laugh>
KA (00:29:00):
Yeah, the high end one. So I was actually- My last thing I was doing. By the way, I only recently went full-time on OpenMV. I was side hustling this forever. When we started, this company was always a side hustle.
(00:29:13):
I recently, last year- The company I was working for, Embark Trucks, had gone public. I had joined them as employee 30 and ridden with them for five years. I was a ride or die employee there. They went public for $5 billion in 2021, as part of a SPAC process. The company then shut down in 2023, and was sold for $78 million. So I got to see-
EW (00:29:42):
So in the meantime, there were constant parties in the Caribbean?
KA (00:29:46):
Yeah, no. <laugh> For engineers, we were just working hard to make that stock go back up. But it was an interesting ride. I say that because I am full-time now on OpenMV. One of my last jobs at Embark before we shut down was trying to figure out how to get an NVIDIA Orin into our system. That thing is amazing. It can replace so much. But it is also a thousand plus dollars?
CW (00:30:11):
$2,000 on Amazon.
KA (00:30:13):
Yeah, so here is the thing.
CW (00:30:16):
Also has a 60 watt power supply, or something like that.
KA (00:30:19):
Yeah, you need that. I was doing serious engineering. I was actually building a $10,000 PCB, by the way. $10,000!
CW (00:30:26):
More than two layers? <laugh>
KA (00:30:29):
To put NVIDIA Orin on. Like 18 or something. It was crazy. We had lots of fun stuff. I am not going to mention more than that, but it was an amazing system. We were really pushing the limit. I was like, "This is an incredible system for what we are trying to do. Self-driving truck brain. Yes, absolutely."
(00:30:48):
But the challenge is when you have a system that costs that much, this means your final sale price for your robot or whatever you are building needs to be much higher. If you are laying out $2,000 for an NVIDIA Orin, then you are going to need to sell a $10,000 system at minimum, to make some cash back. It is really hard to make those margins make sense, if you are not selling it in a high price system.
CW (00:31:13):
Yeah. The thing is, there are lots of other costs that go with that. If you are building a system with something that powerful, power becomes a big issue. Especially if you are on batteries.
KA (00:31:22):
Mm-hmm.
CW (00:31:22):
And weight and size. Orins, they do come in modules that you can get smaller carriers for. But it is not the same as building a custom PCB with a Cortex-M55 on it or something, if you can get away with that.
KA (00:31:39):
Oh, yeah, no. I have actually heard from some of our suppliers that NVIDIA's position is that they are focused on where the money really is for them, and that is in the cloud. The rise of what Arm is doing with tinyML and all these other processors is really going to be the future.
(00:31:58):
There is an EE Times article with the previous CEO of Movidius, who now works at ST running all of their microcontrollers. His position is that there is a wave of tinyML coming. It is basically from microcontrollers becoming super, super powerful.
(00:32:15):
This is why I am going full-time on OpenMV, because I see this wave happening. Where, what does it mean when your MCU can now process 1080p video, and cost $4, and has instant on capabilities? It draws less power. It produces less heat. It does not need SDRAM, or eMMC. So the bill of materials is like $10 off from what you would pay for a Linux-based system. It is also less physical size, because you are now down to one chip versus three.
(00:32:49):
So you have got four wins. How do you compete against that? Again, it can also go into low power on demand and wake up instantly. This is that future where these things are becoming really, really powerful.
(00:33:05):
What they need is a software library. So that is what we are focused on, is really building out that algorithm base. So instead of you having to sit down and say, "How do I write efficient SIMD code that makes this algorithm go super fast?" it is already built, and you can just use an OpenMV Cam to do what you want to do.
EW (00:33:23):
Okay, talking about what I want to do.
CW (00:33:25):
<laugh>
KA (00:33:27):
Cool, let us go.
EW (00:33:28):
Okay. I have an application idea. It is a terrible idea, but I want to try it anyway. Mostly this is an exercise in how would I actually use OpenMV to accomplish my goals, and possibly to make my own product, and where do I make the decisions.
KA (00:33:45):
Okay.
EW (00:33:45):
Okay. We say I want to find and identify wasps.
CW (00:33:51):
<laugh>
KA (00:33:51):
Mm-hmm.
EW (00:33:51):
I have a big book of wasps. A listener, Brent, noted that his spouse wrote a big book of wasps after I said I liked bees, and it is very comprehensive. I have many, many pictures. My desktop wasp ID in TensorFlow works fine. Now what I want to do is I want to mount it on my roof, and I want it to identify all the wasps in the forest.
CW (00:34:17):
In one direction, or multi directions?
EW (00:34:19):
All the directions.
CW (00:34:21):
Okay. So you have a 360 degree wasp scanner.
EW (00:34:23):
Right.
CW (00:34:23):
All right. <laugh>
KA (00:34:23):
Okay. Question for you. How good of an image of a wasp do you have? Do you have nice high resolution images, where you can see the hair on a wasp?
CW (00:34:36):
<laugh>
EW (00:34:38):
Yeah, they have little back legs that have little serrations. At least some wasps. I mean there are so many wasps. But then they can use that to wipe off the fungus that tries to attack them and take over their brains.
KA (00:34:50):
<laugh> Oh yeah, I heard about that.
CW (00:34:51):
<laugh>
EW (00:34:51):
They definitely have high quality images. Not only the hairs, but the serration on the hairs.
KA (00:34:59):
Okay. So even before you get into OpenMV, I think this is the problem setup thing you have to ask yourself, which is "How do you actually take an image that is that high quality of a wasp, that is flying around in your backyard?" That is the first question.
(00:35:13):
Are we talking a DSLR that is on top of your roof, just pointing at wasps and then snapping really awesome pictures with a super great lens? Is that what we are looking at?
EW (00:35:24):
No, no. I want just to know- I do not want more pictures of wasps. I want wasp identification.
CW (00:35:35):
Right. But if you need a feature size that is very fine, to distinguish one wasp from another, that informs how high resolution your camera has to be. Or-
EW (00:35:43):
Ah, yes.
KA (00:35:44):
Yeah, that is the problem.
CW (00:35:45):
That is how close the wasp <laugh> has to be to the camera.
KA (00:35:47):
Yes. Yeah.
EW (00:35:47):
Right.
KA (00:35:47):
I can tell you how to find wasps.
CW (00:35:51):
How many pixels on wasp, do you need to identify a wasp?
KA (00:35:55):
<laugh> Yes, that is the best way to say it. Thank you.
EW (00:35:58):
Yeah, this is a good concept and Chris said it really well, but maybe we need to think about the visual here. If a wasp is far away, it may only take up four pixels and we will not be able to see very much about it, because it is far away.
(00:36:13):
Just because the camera resolution- If I had a higher resolution camera, it would take up more pixels. Or if the wasp came closer, then it would take up more pixels. What was the phrase you used?
CW (00:36:26):
Pixels on wasp. <laugh>
EW (00:36:28):
The pixels on the item of identification is really important.
KA (00:36:33):
A POW.
CW (00:36:33):
<laugh>
EW (00:36:37):
Pixels on wasp. Yeah, that is a big choice for me is, do I want higher resolution cameras, or am I willing to accept things to be closer?
KA (00:36:50):
Well, I think it is actually both, really, because a lot of times more pixels in an image do not actually do anything for you. Most cameras cannot optically resolve that many pixels. A lot of the extra pixels just become noise.
(00:37:03):
So it is really about the quality of the optics that you are dealing with. Can they actually produce an image that is focused and sharp for every pixel? Because you can shove an 8 megapixel or 12 or 43 megapixel camera in with a bad lens, and you will have no better image quality than if you had actually just improved the lens itself.
CW (00:37:25):
I swear this is me talking to myself in a meeting three weeks ago.
KA (00:37:27):
<laugh>
EW (00:37:31):
I do not want to buy $10,000 cameras.
KA (00:37:34):
So then you are going to want to have some zoom action. That is what needs to happen. I think if you want to identify wasps, you are going to need to do two things. You are going to need to have one camera that has a really nice quality lens that can do ranging, where it can zoom in on the wasp, and then it can track it and follow it.
EW (00:37:55):
So I have one that identifies flying objects from my background, and one camera that I say, "Go there. Take a good picture."
KA (00:38:02):
Yes. Mm-hmm.
EW (00:38:04):
Then I send that to my wasp identification, as opposed to my-
CW (00:38:07):
But now you need a gimbal.
KA (00:38:09):
Yep, you need a gimbal.
CW (00:38:11):
This is getting expensive, Elecia.
KA (00:38:12):
<laugh>
CW (00:38:15):
What about an array- A larger array of crappier cameras?
EW (00:38:21):
Yeah, like a wasp's eyeball. A compound eyeball, for cameras. OpenMV compound.
KA (00:38:27):
I think that would work too. You could do a bunch of zoomed in cameras. That would be a detection field, where if a wasp flew in front of them, you could see what is going on. For that, if you are not doing a gimbal, I would say it is probably out of the spec of our system right now. You would probably need an NVIDIA system for this, but even then, it is still going to be challenging.
(00:38:51):
Because at the end of the day, I think the gimbal system is the most likely to happen. But if you wanted to do something like you just had a bunch of cameras and you created a detection field, the challenge is each of them has a different zoom and area they can see.
(00:39:07):
So then you will need multiple cameras at different focal lengths. You have one that is wide angle, and one that is more zoom, and one that is more zoom still, et cetera, to see every position and such. So getting away from that, I think the gimbal is actually better, because a gimbal with a zoom lens would probably do the best job.
EW (00:39:26):
What if?
CW (00:39:27):
<laugh>
EW (00:39:27):
Okay, I like that, but I also do not want it to have a moving part. So gimbals are probably out. What if I did not have them have multiple zooms? What if I had a fixed zoom on all of them? But this allows me to look in lots of directions, and have them be slightly overlapping at their edges.
KA (00:39:48):
You could do that. It is really hard to set up. A better way would be, could you force the wasps to walk into something where they are going to be at a fixed focal distance? Could you do a little hive or something that the wasps have to fly through? Then they will all be in the same area and about the same size, and that really simplifies the problem at that point.
CW (00:40:10):
Yeah, I think a lot of ML problems, people have not thought about the social engineering aspect of it.
KA (00:40:15):
Yeah.
EW (00:40:16):
The social engineering of the wasps?
CW (00:40:17):
Yeah. Mm-hmm.
EW (00:40:17):
And then he is going to tell me that I just need to have good lighting, and to have them go one by one walking through some sort of little wasp fashion show.
CW (00:40:25):
<laugh>
KA (00:40:25):
<laugh>
EW (00:40:25):
I do not want to corral the wasps for one thing. They sometimes eat each other, or do weird mind control, or just lay babies in each other. So we do not want that. We want the wasps to be free flying. But it sounds like because I do not have enough pixels on the wasps, this will not happen unless I can...
CW (00:40:54):
Unless you have more processing, higher resolution, better optics.
EW (00:40:58):
But I really like the idea of having a whole little...
CW (00:41:01):
360.
EW (00:41:01):
360, eight cameras, and each one identifies a wasp.
CW (00:41:08):
The problem with the world is it is large.
EW (00:41:09):
Oh.
CW (00:41:11):
And then there are pixels, and they have to go on the world, which is large. And then, too many pixels.
KA (00:41:17):
Well, I think you probably could do it. You could do one of those 360 camera things. I have seen people with an NVIDIA Jetson do that, where they have two cameras that are mounted back to back, and they are doing a 360 view. But the challenge is the level of detail-
EW (00:41:32):
But the optics are not as good, and the resolution is not enough, and the pixels on your unidentified object are just too few.
KA (00:41:42):
Yeah. So it could tell you that a wasp was flying around, or a thing was flying around. But it could not actually tell you what version of that thing it was.
EW (00:41:50):
Okay. So two large cameras with fisheye lenses, in this instance, are better than eight small ones, because it just changes your field of view.
KA (00:42:02):
Yeah. You get a 360 field of view, but then your challenge is how close are you to the particular wasps? That is what is really going to matter? So maybe if they are within the distance of a foot or maybe three feet, you might be able to see them if you have enough resolution on the cameras, and then you could possibly do it.
(00:42:20):
But then, of course you want to put this on your roof though, and so if the wasps are not even going to get near it, that is the challenge. So you need to get that wasp fashion show thing again, and have some bait to get them to fly nearby.
CW (00:42:32):
That is like those bird feeders that identify birds for you. They are bird feeders!
EW (00:42:36):
Well, yes. All we need for wasps really is a tuna can.
CW (00:42:40):
Gross.
KA (00:42:41):
Will not a bird come in and eat that?
CW (00:42:43):
Or a cat.
KA (00:42:45):
Yes.
CW (00:42:45):
Or a raccoon, more likely.
EW (00:42:47):
This is going to be the greatest video set ever.
CW (00:42:49):
<laugh>
KA (00:42:49):
<laugh>
EW (00:42:49):
Okay. Let us say I go ahead, and I have some area where I can cover. It is not exactly the wasp fashion show, but they are a foot to three feet away from my camera. Let us say I do not want to shell out for the huge cameras. I have four OpenMVs and I have pointed them, and I just want to do the best I can.
KA (00:43:20):
Okay.
EW (00:43:22):
What algorithms am I looking at here?
KA (00:43:26):
There are a few different things you can do, if you want to work with this. As you mentioned, does the lighting have to be good? Yeah. So if you actually want to be able to take nice pictures in the dark or in the day, you are going to need to have some good lighting on that. And then there is the problem that the wasps are going to fly into the lights. So what do you do?
(00:43:43):
There is something you can do with thermal cameras to see them. That is a really easy way to pick out wasps during day or night. They are going to be visible in the background. There is also something to do with event cameras. So we have some customers right now that we have been talking to.
CW (00:44:00):
Yeah! I was reading about these. Please tell me. Yes.
KA (00:44:03):
Oh yeah. There is a company called Prophesee, for example. They are making an event camera. More or less, these things run at literally whatever FPS you want. If you want a thousand frames a second, they can do that. For every pixel, what they are doing is they check to see, did the charge go up, or did the charge go down? And then based on that, they produce an image.
(00:44:31):
They are like HDR in the sense that, even if they are staring into the sun, they can still detect if a pixel increased in charge or decreased in charge. So it does not really matter what is going on in the background or et cetera. They basically just give you a difference image of what moved around and such.
(00:44:48):
That actually creates these interesting convex hulls of things. You can really see blobs moving very, very easily because of that. It is not going to be useful for identifying what the wasp is per se, but it will tell you there is a wasp moving there. Then you can easily overlay that with the regular color image, and you can tell what is going on there.
(00:45:12):
Or you can do everything directly from the color image itself. It is just going to be harder when it gets night-time and you do not have lighting, because then you will need to somehow boost that image quality to see still.
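A hypothetical sketch of working with that kind of output follows: the event struct below is made up for illustration (it is not the Prophesee SDK), but it shows how per-pixel change events can be accumulated into a simple activity image for blob tracking.

```c
#include <stdint.h>

// Hypothetical event record: the sensor reports, per pixel, that the
// charge went up (+1) or down (-1) at some timestamp. Illustrative only,
// not the Prophesee SDK's actual types.
typedef struct {
    uint16_t x, y;
    int8_t   polarity;     // +1 brighter, -1 darker
    uint32_t t_us;         // microsecond timestamp
} event_t;

// Accumulate a batch of events into a simple activity image: pixels that
// changed a lot stand out as blobs, regardless of absolute brightness.
static void accumulate_events(const event_t *ev, int n_events,
                              uint8_t *activity, int w, int h) {
    for (int i = 0; i < n_events; i++) {
        if (ev[i].x < w && ev[i].y < h) {
            uint8_t *p = &activity[ev[i].y * w + ev[i].x];
            if (*p < 255) (*p)++;          // saturating count of changes
        }
    }
    // A blob detector (or FOMO-style model) can then run on `activity`
    // to find and track the moving objects described above.
}
```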
EW (00:45:25):
What was the name of the- I heard "event"?
CW (00:45:27):
Yeah, "event cameras," I think.
EW (00:45:28):
Okay.
KA (00:45:28):
Event camera. Yeah.
CW (00:45:29):
They basically do- It is like the architecture of the camera itself, does the things you would do in software to do motion vectors. I am making stuff up, but-
KA (00:45:39):
Frame differencing.
CW (00:45:39):
Yeah, frame differencing. And then figuring out motion directly. So it is just done in the camera, and it has such a high frame rate, that it can do that much better than say, doing that on a 60 frame per second camera in software.
KA (00:45:51):
Yeah. Well, the benefit is that the camera can sample each pixel at one microsecond. So you can actually go beyond a thousand FPS if you want.
CW (00:46:00):
Okay. Yeah. It was something crazy. Yeah.
KA (00:46:03):
So it is technically a million FPS, but you probably could not read out the data that quickly. It allows you to do really, really fast object tracking. That is the best way to say it.
(00:46:14):
So this will allow you to actually find the wasps in the image as they are flying around, and actually track them with such precision that you know exactly where they are. The trick is, though, that the color camera cannot keep up with that. So now you are back to having convex hulls of wasps flying around, but at least you could see them in the daytime and night-time.
(00:46:30):
Here is the interesting thing though. Assuming the wasps are all about the same size, then if you just wanted to identify whether you have a bigger wasp versus a smaller wasp, you could probably do that on board, because you would have this outline of them.
EW (00:46:50):
How would I be able to tell a close wasp from a far bird?
KA (00:46:55):
Well, they are all in this wasp thing, right, where they are corralled.
CW (00:47:00):
We are all back to corralling.
EW (00:47:01):
Oh, right. We are back to corralling. Okay. Sorry, I was thinking open sky.
KA (00:47:06):
Well, you could probably do open sky too. I am not sure how you tell-
CW (00:47:10):
You could train your model on shapes.
EW (00:47:12):
Well, yes, but with the event camera?
KA (00:47:15):
Yeah.
CW (00:47:15):
It gives you a shape.
EW (00:47:17):
It gives you a shape. But they are blobby shapes, are not they? Or are they pretty crisp?
CW (00:47:21):
I think it is the outline. Oh yeah, I do not know. It depends on the-
KA (00:47:23):
It is the outline. This is what Prophesee is actually trying to sell on, is that they believe that you do not actually need the full image, the full color. They say you can do everything from an outline, and it is not wrong.
(00:47:36):
I remember back at my day job at Embark, we actually did vehicle identification and stuff, all based on the LIDAR scans from objects. And LIDAR scans did not contain anything, but like a- If you hit the back of a truck, you would only have a crescent shape to see part of a vehicle.
(00:47:57):
You would not see the entire shape of it. So we actually had neural networks that ran on board that identified what was a truck, what was a car, what was a motorcycle, all based on just partial side scans of them.
EW (00:48:10):
So this would be really awesome for tracking paths, and for identifying things without having to worry about light. And it is an outline, so again, it is going to come down to some number of pixels on the wasp.
CW (00:48:24):
I think they are pretty low resolution right now. If I remember.
KA (00:48:29):
They are 320 by 240. That is for the cheaper thing that they are selling, but they also have some 1280 by 720 cameras. But do not ask about the price on that one, because you cannot afford it.
EW (00:48:44):
<laugh> And we mentioned frame differencing, which is something that I think would be really useful if I am dealing with things that are flying around, or moving quickly in ways that I do not expect.
KA (00:48:52):
Yeah, that is simply where you just have one image in RAM, and the next image comes in, and you just subtract the two, and boom.
(00:48:59):
By the way, on the Arm Cortex-M4 processors, there is an instruction that basically takes the four bytes of one word and the four bytes of another word, subtracts each byte from its counterpart, takes the absolute value, and then adds them all together, in one instruction.
(00:49:16):
So if you want to do frame differencing on the Cortex-M4, we can do that very, very fast on OpenMV Cam. So super easy to get the max FPS on a large resolution, thanks to features like that.
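A sketch of that instruction in use, assuming the CMSIS-Core __USADA8 intrinsic (sum of absolute differences of four byte lanes, with accumulate); the frame-SAD helper is illustrative, not OpenMV's code.

```c
#include <stdint.h>
#include "cmsis_gcc.h"   // CMSIS-Core intrinsic __USADA8 on Cortex-M4/M7

// Sum of absolute differences between two 8-bit frames, four pixels per
// instruction: __USADA8 subtracts the four byte lanes, takes the absolute
// values, sums them, and adds the running accumulator -- all in one go.
static uint32_t frame_sad(const uint8_t *prev, const uint8_t *curr, int n_pixels) {
    uint32_t acc = 0;
    int i = 0;
    for (; i + 4 <= n_pixels; i += 4) {
        uint32_t a, b;
        __builtin_memcpy(&a, &prev[i], 4);
        __builtin_memcpy(&b, &curr[i], 4);
        acc = __USADA8(a, b, acc);
    }
    for (; i < n_pixels; i++) {            // tail pixels
        int d = (int)prev[i] - (int)curr[i];
        acc += (uint32_t)(d < 0 ? -d : d);
    }
    return acc;                             // a large value means lots of motion
}
```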
(00:49:30):
But you still run into a challenge, that the camera itself is going to have a limit in its frame rate, and it will cause a lot of motion blur. So you are really going to want a global shutter imager at that point, but then you run into, now the lighting needs to be improved to go faster.
EW (00:49:46):
The constant upselling.
KA (00:49:49):
Yes. <laugh>
EW (00:49:52):
When I saw frame differencing on your website, and I was thinking about how to track things, I went straight to convolution, which is a far more expensive algorithmic process.
KA (00:50:06):
We actually had a customer for that.
EW (00:50:08):
It seems a lot more accurate than what you are talking about.
KA (00:50:11):
Yeah, no, it is. So frame differencing is just one way. I will say this. We had a customer for whom I actually put some work into doing SIMD optimizations for our morph algorithm, which lets you do custom convolutions on the OpenMV Cam. We are capable of doing about 200 frames a second at 160 by 120. Yes, we can do that.
(00:50:36):
We had to do this for this customer, because they wanted to track a single pixel of an object, with background noise. It turns out you can do something called a "masked filter," which basically is a convolution that suppresses all pixels that are not just a single bright pixel. And this allowed us to track an IR LED-
CW (00:51:04):
Ohh. Okay.
KA (00:51:04):
In the daytime. So imagine an IR LED in the daytime, the sun emits IR light.
CW (00:51:10):
Yes, yes.
EW (00:51:10):
Yes.
KA (00:51:13):
<laugh> Very, very hard. But we managed to do it, so we could see this object moving around. So that is something you can use. It is very specific though, I would say. It was a good use case for their algorithm, for their problem. I do not know if it will work that well for wasps, though, since wasps might be more than one pixel.
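As a guess at what such a "masked filter" kernel can look like, here is an illustrative center-minus-surround response that favors isolated bright pixels; the actual customer kernel is not public, so treat this purely as a sketch.

```c
#include <stdint.h>

// Illustrative center-minus-surround response: a rough stand-in for the
// "masked filter" idea above. An isolated bright pixel (one LED) scores
// high; broad bright regions (sunlit background) largely cancel out.
// Caller must keep (x, y) at least one pixel inside the image.
static int32_t point_response(const uint8_t *img, int w, int x, int y) {
    int32_t center = img[y * w + x];
    int32_t surround = 0;
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++)
            if (dx || dy)
                surround += img[(y + dy) * w + (x + dx)];
    // 8*center - surround: near zero over flat areas, large for a lone hot pixel
    return 8 * center - surround;
}
```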
EW (00:51:38):
Yeah, I would not think it would be a single pixel. That means I have no features, and it might as well be a speck of dust.
CW (00:51:47):
What if you put IR emitting LEDs on wasps?
KA (00:51:51):
Yes. <laugh>
CW (00:51:54):
Or, those AprilTags.
EW (00:51:56):
That would be very useful.
CW (00:51:58):
They could just carry around little billboards.
EW (00:51:59):
That would be so much easier than what I am talking about. We just AprilTag all the wasps, and each one will have its own little number and reference on where it is in the world.
KA (00:52:09):
I will say this. Someone did actually use the FOMO algorithm with a regular color camera, to count bees. So that is definitely possible. Their goal though was not to tell the difference between bees. They just wanted to know whether there were objects of a similar size flying by in the image.
(00:52:25):
I think Edge Impulse had a tutorial about this. They had the Raspberry Pi running with FOMO. It was totally capable of checking bee movements, and seeing and counting the number of bees entering and exiting a hive.
EW (00:52:40):
Excuse me while I Google "bee FOMO."
CW (00:52:48):
<laugh>
KA (00:52:48):
<laugh>
EW (00:52:48):
Okay. I had questions here about LSTM ML algorithms and trying to track my wasps. But I feel like I am on a totally wrong path here, with my wasp identification project.
KA (00:53:04):
I think it is possible. You just have to solve the physics problems first. This is unrelated to the compute. First you have got to get an image that is high enough quality of these really, really small things. That is the challenge, that the wasps are so small. If you were trying to track badgers running around in the fields, or a groundhog, that would be a lot easier.
EW (00:53:27):
I think this is true of a lot of machine learning problems, is that we get so excited about how computers can do so much, and how machine learning empowers things. And forget that, "Oh, physics. That thing. Who cares about physics, when we have machine learning?"
CW (00:53:50):
Well, it is a continuum, right? Because you can apply lots of compute to bad data, sometimes. Like he was talking about the IR. If you want to, you can make the computer think really hard, and try to clean up bad images, sometimes. Or you can spend more on getting good data, and do less work.
KA (00:54:13):
Yes.
EW (00:54:15):
But you also have the problem that sometimes what you want to do, is so not really suitable for the hardware you are working on.
KA (00:54:27):
That is the case. I think it is all about the problem setup first. This is something I see with a lot of our customers and people wanting to do computer vision: folks have an idea of what they want to do, but they have not taken the step of sitting down and thinking, "Okay, what does this look like exactly? What am I trying to answer?"
(00:54:46):
That is always really important for any one of these problems, and especially in vision. You have to go through that setup, the work of actually engineering what is reasonable and what you are trying to accomplish. It does involve that physics aspect.
(00:55:05):
We have seen a lot of demos that show off really, really strong ML happening. But even back at Embark, our computer systems cost as much as a house, so unlimited budget. And even then, you had the best engineers working on this stuff, running the biggest algorithms with the biggest GPUs, and it was still challenging.
(00:55:25):
That was unlimited power, unlimited budget, unlimited compute, unlimited image resolution. But you still had to actually make an ML algorithm perform, and do a good job at segmenting these images well, and locking onto what objects were being tracked.
CW (00:55:40):
Reliably. <laugh>
KA (00:55:43):
Yeah. A bounding box that jitters all over the place is not really good for your self-driving truck, right? It has got to be super locked, no jitter, really, really high quality.
(00:55:51):
So labeling, figuring out what is bad data versus good data, lighting situations. In the real use cases, even when you have enough power to do anything, you still have to work really, really hard on getting good data in.
EW (00:56:08):
As you are now full-time at OpenMV, how much of your time is spent trying to help people with applications, and convince them that what they want to do is not exactly suitable? Versus being able to say, "Oh yeah, I can help you with that."
KA (00:56:25):
It is about 50 50. We have a lot of folks who will ask us random questions. I do not want to waste their time, and I do not want my time to be wasted. So I try to make sure we steer them in the right direction. If they need a higher end system, they should go forward to that.
(00:56:38):
I am also driving, though, towards the image and future I want to create. So right now it is a lot of engineering work: developing, trying to build out the company and build out what we are trying to do. Truly, last year was a lot of pulling the company out of the ditch, to be honest.
(00:56:55):
While Embark was my sole focus, I went AWOL on OpenMV. From 2021 to 2022, I was not really at the helm of the ship, let us just say it like that. We were staying alive, but we were out of stock because of the chip shortage. I had not foreseen how bad that was going to be. I do not know if any of you all ever tried to buy STM32s.
EW (00:57:18):
<laugh>
CW (00:57:18):
Mm-hmm.
KA (00:57:18):
<laugh> That was some unobtainium for about three years. So that really hurt. But luckily, at the end of last year we managed to do two things. One, an order of about 5,000 STM32H7 chips finally arrived, after waiting for two and a half years. So we managed to get back in stock finally. We also pivoted and supported NXP's i.MX RT.
(00:57:49):
So this gives us two verticals. Now we are not dependent on just the STM32; now we have NXP also. This allowed us to produce the new OpenMV Cam RT1060. Because of learnings from our partnership with Arduino, we tried to include a lot of features that we saw customers really wanting on this system.
(00:58:11):
So built in Wi-Fi and Bluetooth. I am also proud of myself recently, because we are going through FCC certification, and CE and other certifications for the product. So far it looks like it is going to pass. So we will have a certified Wi-Fi and Bluetooth enabled product.
(00:58:31):
We also built in things like battery charging and low power. One of the biggest features on board is being able to drop down to 30 microamperes on demand, and then wake up on an I/O pin toggling. We had a lot of customers ask for such things, so that they can deploy this in low power environments.
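A rough MicroPython-style sketch of that duty-cycle pattern, assuming only the generic machine module; the 30 microampere figure and the wake-on-pin wiring are board specific, and configuring the wake source differs per port, so treat this as a shape rather than a recipe:

```python
# Rough MicroPython-style sketch of the low-power duty cycle described above.
# Assumes the generic `machine` module; the wake-on-pin configuration is
# port-specific, so that step is left as a comment rather than a real call.
import machine

def work_then_sleep():
    # ... capture and process a frame, send results, etc. ...

    # Arm an external I/O pin as the wake source here (port-specific API),
    # then drop into the lowest-power state. On most ports the board resets
    # and re-runs main.py when the wake event arrives.
    machine.deepsleep()

work_then_sleep()
```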
(00:58:48):
But we also added ethernet support now. So you can actually- We have a PoE shield we are selling on our website. This allows it to connect and get online that way. So this is a PoE powered microcontroller, if you want to make that.
(00:59:02):
And we do have an RTSP video streamer, so if you want to stream 1080p JPEGs to VLC or FFmpeg, we have got demo code that shows it doing that. This is what your Raspberry Pi was doing back in 2013, so we are at that level of performance now. Even further, actually: like a Raspberry Pi 2, not quite a 3, but about a 1 to 2 with our current system.
CW (00:59:26):
RTSP streaming from a Cortex.
KA (00:59:28):
Yeah, I know. Crazy, right? But yeah, no, totally legit. We were sending ethernet packets or Wi-Fi packets and streaming video.
CW (00:59:34):
<sigh>
KA (00:59:34):
Yeah. The future is coming, I am telling you.
CW (00:59:40):
Do I have to still use GStreamer though?
EW (00:59:42):
Yes. You still have to use GStreamer. He said FFmpeg, but he meant GStreamer.
CW (00:59:46):
When can I stop using GStreamer? That is what I-
EW (00:59:48):
<laugh>
KA (00:59:50):
At least on the device, you do not have to use it anymore.
CW (00:59:52):
Yeah, right. <laugh>
KA (00:59:52):
<laugh> It is actually funny because one of the first things I had to do at Embark, was I had to produce a driver interface camera for our trucks. Basically we wanted to know what the driver was doing in the vehicle. Right?
CW (01:00:06):
Oh, right, okay. Yeah.
KA (01:00:08):
Yeah. So we had to have a camera that just sat inside the cab, and looked at people and would record video of the driver. I was like, "Sure, this is easy." Went online, went to Amazon. There was one company that said, "Hey, 4K HDR webcam $100." I am like, "Cool." We buy it.
(01:00:24):
You have to go into their GUI, and figure out how to set it up to stream RTSP. It is a little annoying. There is a mandatory password, of course. Mandatory password, which means that your techs assembling these are going to have to go through this hour-long process to get these things set up.
(01:00:41):
Of course it has to be on its own. You have to have it on the public network first, and then use their tool, that uses some identification thing, before you can log into its GUI. Then in its GUI you can set it to a static IP, and then force it to stream. A lot of setup just to get these things working.
(01:00:59):
Then we deployed in the truck, and it turns out you hit a bump and the ethernet would just drop. The connection would just disconnect, go down and then come back. My boss was like, "Hey, Kwab. It is like gone for a second or two every now and then." I am like, "Uh, interesting. Huh. We could have an accident in that second or two. That is a pretty big liability for the company. We need a new webcam."
(01:01:26):
This ends up being a multi-month project of me trying out many different webcams. The same annoying GUI setup, trying to get them to stream video. Finally we settle on a $700 IP cam, not a hundred dollars.
CW (01:01:43):
From Axis, right? From Axis. They are always from Axis.
KA (01:01:45):
No, Opticom. Opticom makes webcams that can literally survive explosions.
CW (01:01:51):
<laugh>
KA (01:01:51):
We did not buy the one that costs that much. That was like a $2,000 one, but we ended up buying the $700 one, which is still very expensive compared to the hundred dollar cameras you can get. But it did not drop its connection. Rock solid.
(01:02:07):
I say that story to say that it is funny that I am able to replicate now, what I spent all that time on, on a microcontroller.
EW (01:02:18):
Could I buy an OpenMV for my truck, so that I can watch people? I am just wondering, should I get his product, now that he does not- Does OpenMV drop ethernet packets?
CW (01:02:31):
No. <laugh> He is going to say "No."
KA (01:02:34):
No.
EW (01:02:34):
<laugh>
KA (01:02:34):
No, it does not. <laugh> It does not. That is one of the nice things. So that is the focus there. No, it does not at all. Even if it did, hey, you can at least go in the firmware and figure out why. It is not a black box. That is the big thing.
(01:02:51):
With the previous system, it was like, "Huh. We are going to have to go to each truck, and physically remove these, and put a new thing in. Ugh. Wish I could just do a firmware update, or ask the manufacturer what is wrong." But that was not possible. No, but anyway.
(01:03:06):
We built a lot of features into this system, just to make it easier for customers to really build the things they want. That is what we are excited about. But then moving forward, there are these new Cortex-M55 processors coming out. That is the exciting thing.
(01:03:22):
I actually want to ask Chris about that. What do you think, Chris? You have been playing around with one?
CW (01:03:28):
I have been playing around with one at a very high level. I have not really explored the feature set, and I am using vendor tools and things. It seems fine. <laugh> It is very capable. They are very highly clocked for- I am used to Cortex-M3s and M4s. These are clocked I think at 400 or 600 megahertz, something like that.
(01:03:50):
It is a bit unusual to be using something that is not Linux, running that fast, and putting Zephyr, FreeRTOS, ThreadX or whatever on it. It feels like overkill <laugh> in some ways.
EW (01:04:04):
It is a very big hammer.
CW (01:04:05):
But the stuff we are doing with it, which I cannot talk about, needs a lot of processing. So then data throughput and things like that.
KA (01:04:15):
Yeah. Well, I am excited about the new ones that are coming along. 2025 is going to be an exciting year. Just imagine doubling the clock speed of what you just mentioned. Integrated hardware modules you would only see on application processors, like actual video encoding in hardware, that kind of stuff. Large resolution camera support, and then even more ML.
(01:04:43):
One of the coolest things is- We were talking about all this processor performance that is coming, but that is not even the important bit. The important bit is the Arm Ethos kind of processors. These offer hundreds of gigaFLOPS of compute now for these microcontrollers.
(01:05:00):
What that means is, if you wanted to run a neural network on board some of these chips, they will actually outperform the Raspberry Pi 5, and they will not draw any power either. Something like the Alif has a 200 gigaFLOP neural network accelerator. So if you ran the Raspberry Pi 5 at 100%, every single core pegged to the limit, you get a hundred gigaFLOPS of performance. And it would catch fire.
CW (01:05:28):
<laugh>
KA (01:05:30):
One of these MCUs will draw 20 milliamps, 20 to 30 milliamps of power, at 200 gigaFLOPS, thanks to onboard neural network accelerators.
EW (01:05:40):
Do you really think it is machine learning that is important here? Or do you think these other features, like outline detection and HOG algorithms and convolution- Do you think it is the features?
CW (01:05:55):
Well, that stuff, as far as I know, the neural engines are not applicable for.
EW (01:06:03):
Well, no. These would feed into that.
CW (01:06:04):
Yeah. I know.
EW (01:06:06):
How much time should we be doing straight ML on raw camera frames, versus-
CW (01:06:14):
Doing some-
EW (01:06:15):
Giving them some hints?
CW (01:06:16):
Actual image processing.
EW (01:06:18):
Algorithms and heuristics and all of that?
KA (01:06:22):
Yeah, it is actually both. Definitely you want to use ML as much as you can. Transformer networks are one of the new things that people are really excited about. Those require a little bit more RAM though. A lot of these network accelerators are really good at handling that. We needed more compute, and these offer literally a hundred X the compute of where things were previously. So now you can actually run these large networks.
(01:06:48):
But then you need more RAM, if you actually want to run the new transformer models, which are dynamically creating weights on the fly more or less.
(01:06:56):
But you still need to do pre-processing for these things. An example being, if you want to do audio, you do not want to feed the network the raw 16-bit PCM audio samples. You want to take an FFT first, and then feed it slices of the FFT that overlap each other.
(01:07:14):
That is where the processing performance of the Cortex-M55 comes in. Having that extra oomph there allows you to churn out these FFTs and generate those slices, so that you can feed the neural network processor something that is going to be directly usable for it. Being able to compose these two things together is what really brings the awesome performance and power that you are going to see.
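As a rough illustration of that pre-processing step, a minimal Python sketch of turning raw PCM into overlapping FFT slices; the frame and hop sizes are arbitrary examples, not values from the conversation:

```python
# Minimal sketch of the audio pre-processing described above: instead of
# feeding raw 16-bit PCM to the network, compute overlapping FFT slices
# (a spectrogram) and hand those to the accelerator.
import numpy as np

def spectrogram(pcm, frame_len=512, hop=256):
    """Return magnitude-spectrum slices from a 1-D int16 PCM buffer."""
    samples = pcm.astype(np.float32) / 32768.0      # normalize int16 to [-1, 1)
    window = np.hanning(frame_len)                  # taper frames to reduce leakage
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len] * window
        frames.append(np.abs(np.fft.rfft(frame)))   # one overlapping FFT slice
    return np.stack(frames)                         # shape: (n_slices, frame_len//2 + 1)

# Example: one second of silence at 16 kHz -> about 61 overlapping slices.
features = spectrogram(np.zeros(16000, dtype=np.int16))
```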
(01:07:34):
Similarly for video data, you need to actually scale down your image. It does not sound like a cool, awesome thing, honestly. But that has to happen before you feed it to the accelerator, since your resolution is going to need to be at some limit. Maybe 200 by 200 or 300 by 300 pixels. It is not going to be the full 1080p of what the camera sees.
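And a similarly minimal sketch of that downscaling step, using a plain block average; the 1080p input and 224 by 224 target are illustrative sizes only:

```python
# Minimal sketch of shrinking a full camera frame down to the small square
# input a neural network accelerator typically expects.
import numpy as np

def downscale(frame, out_h=224, out_w=224):
    """Shrink an HxWx3 uint8 frame by averaging whole-pixel blocks."""
    h, w, c = frame.shape
    bh, bw = h // out_h, w // out_w                 # block size per output pixel
    cropped = frame[:bh * out_h, :bw * out_w]       # drop remainder rows/columns
    blocks = cropped.reshape(out_h, bh, out_w, bw, c)
    return blocks.mean(axis=(1, 3)).astype(np.uint8)

small = downscale(np.zeros((1080, 1920, 3), dtype=np.uint8))   # -> (224, 224, 3)
```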
EW (01:07:58):
Transformer? I missed transformers. I have been out of machine learning for a couple of years, which means that I am five million years out of date.
CW (01:08:05):
That is what the LLM stuff- I think it is what the LLMs use.
EW (01:08:07):
It is less training and no recurrent units, even though it seems like it is sort of a recurrent neural architecture? So not as much feedback? No feedback?
KA (01:08:19):
It does have feedback internally, I guess though. I am not an expert on this stuff either. I will say it like this though. From what I have learned so far, more or less they look at the data inputs coming in, and then dynamically adjust their weights based on what they see. So the network is not static. It is dynamically adjusting what it is doing, based on what it is seeing.
(01:08:43):
It can remember that through a stream of tokens that are coming in. Whatever is being sent to it is tokenized, and that stream of tokens is used to dynamically update relationships between those tokens while it is running. So it does not necessarily have memory inside of it; the memory comes from the tokens that are going into it.
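For reference, a minimal Python sketch of the standard attention step behind that behavior: the trained projection matrices stay fixed, but the token-to-token weights are recomputed from the input every time, which is the "dynamic relationships" idea described above. Shapes and the single head are arbitrary simplifications:

```python
# Minimal single-head scaled dot-product attention: fixed trained weights,
# but an input-dependent mixing of tokens computed fresh for each input.
import numpy as np

def attention(x, Wq, Wk, Wv):
    """Attend over a (tokens, dim) input and return a (tokens, dim) output."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv                   # project each token
    scores = q @ k.T / np.sqrt(k.shape[-1])            # token-vs-token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over tokens
    return weights @ v                                 # input-dependent mix of values

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 32))                       # 8 tokens, 32-dim embeddings
Wq, Wk, Wv = (rng.standard_normal((32, 32)) for _ in range(3))
out = attention(x, Wq, Wk, Wv)                         # -> (8, 32)
```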
CW (01:09:07):
The "T" in ChatGPT is transformer.
EW (01:09:08):
Ah.
KA (01:09:08):
Mm-hmm.
CW (01:09:12):
They are also used for language translation and things like that.
KA (01:09:17):
The big thing is that they are just like a bulldozer. Pretty much every problem people have been trying to solve forever is now solved instantly by them.
EW (01:09:27):
Okay. So there were LSTMs, and then LLMs. And transformers are after that. And I need to read up some. Okay. Cool.
KA (01:09:34):
Me too. <laugh>
EW (01:09:38):
What do you wish you had told yourself, when you were on the show last in 2017? On episode...
CW (01:09:45):
219. Was it 219?
EW (01:09:47):
212.
CW (01:09:47):
Darn.
EW (01:09:47):
Which I believe at that point we were in SeaWorld. "You are in SeaWorld," I think is the title of that show.
KA (01:09:56):
What I wish I had told myself-
EW (01:09:58):
What do you wish you- Or what do you think we should have told you about starting a business? Or what would have been good information?
KA (01:10:07):
Hmm. I do not know. It has been like a really weird random walk, just doing OpenMV. I would say, definitely it was a good idea not to go full-time too early on it. There is definitely a window of opportunity, when you are trying to run a business. Now I see that opening with these new faster processors coming down the line, where we can really do some amazing things with the concept that OpenMV has.
(01:10:31):
If you tried to do this earlier, I think it would have been just pain and suffering to the max, especially when the chip shortage happened. So I am glad I did not go full throttle on it originally.
(01:10:45):
Honestly, I think the performance thing is the biggest. If I had known about this beforehand, maybe we would have created less random code, and really focused more on things that had a lot of value. I think at the beginning when we were doing OpenMV, we were just trying to write as much stuff as possible, and throwing things at the wall to see what stuck.
(01:11:03):
Now it is a little bit more focused on actually providing good features that people really, really want, and making those work really well. So putting more time into one thing, versus trying to spread it all over the place.
EW (01:11:20):
Kwabena, do you have any thoughts you would like to leave us with?
KA (01:11:25):
I want to ask you guys, it has been seven years of running Embedded.fm. What episode are we on now?
CW (01:11:31):
Four hundred...
EW (01:11:31):
70? 68?
CW (01:11:31):
<laugh> This will be 477.
EW (01:11:37):
This will be 477.
KA (01:11:39):
Awesome. So tell me about your experience with Embedded.fm. I want to know.
EW (01:11:46):
It has been good. I still meet interesting people. You have given us a lot to think about. You mentioned hearing Ralph Hempel on the show. He gave us a lot to think about. I like that. But we have both talked openly about burnout, and our disillusionment with some of the AI features that are happening.
CW (01:12:09):
Which is not necessarily disillusionment with ML writ large. I actually work on ML stuff and I enjoy it. But there are parts of "AI" that have been bothering me.
EW (01:12:17):
How it is being used. Yes. I do not know. I like doing the show, because I like talking to people. But we have gone to every other week, which has been really good. I suspect we will go to once a month, at some point in the next year or two.
CW (01:12:33):
Really? We have not even discussed that.
EW (01:12:33):
We have not really talked about that. What about you, Christopher?
CW (01:12:39):
Ah. You know-
EW (01:12:39):
You hate everything because your computer died this morning.
CW (01:12:44):
Well, computers have been bothering me since 20- Since 1984.
EW (01:12:48):
<laugh>
CW (01:12:48):
No, I do not know. This stuff objectively is very exciting to me. It is cool to see the capabilities of these microcontrollers, getting so much more power in a very short amount of time. It was not that long ago that an Atmel AVR was the microcontroller of the day, and some PICs, right? Now we are talking about close to gigahertz-
EW (01:13:17):
Megabytes of RAM. That is ridiculous.
CW (01:13:18):
A few bucks for something that is close to a gigahertz with mega- Yeah.
KA (01:13:20):
There is literally a gigahertz processor, if you want one. <laugh> They have one.
CW (01:13:24):
But I also feel like, maybe it is time for me to let other people do that stuff <laugh>, because I miss the small processors <laugh>. On the one hand, it is extremely exciting and it is cool where you can do- But I feel like, yeah, but 128K of RAM is fun to try to make something happen in.
EW (01:13:47):
I do not get to optimize nearly as much anymore.
CW (01:13:49):
Yeah.
EW (01:13:49):
It is a lot of trying to figure out what vendors are doing, and putting together their Lego blocks so they work, and then optimizing little pieces of it. But I never get to sit down and think, "Oh, here is a new algorithm. How can I make it go as fast as possible? And how can I learn the chip deeply enough to find the instructions?"
(01:14:06):
I remember- You talked about SIMD, and I remember working on a TI DSP, oh, probably 2001, maybe 2002. It had some neat caching systems, but it was all pipelined, and you had to do everything manually. So I wrote a program in C, and then optimized the assembly by modifying my C, so it would use the caches the way I wanted them to.
(01:14:35):
It meant really understanding what was happening with the processor and the RAM, and what the algorithm- Not what the client told me the algorithm was supposed to do, but what they actually wanted it to do. I liked that piece, and I miss the deep optimizations that I have not gotten to do lately. But that is partially client selection.
CW (01:14:59):
And yet, as much as I complain every week almost, about I do not want to write another linked list, or I do not want to write another SPI driver for something. On the other hand, I have been using Zephyr lately.
EW (01:15:09):
You do not have to do that.
CW (01:15:09):
And most of the coding has been editing config files <laugh>. It is like, "Oh, okay, well, I need a driver for this." Well, it has that. It is buried in a directory somewhere, and you just have to edit the right DTS file, and then suddenly- And you do not even have to write a program. You just pull up the shell and make sure that the thing works, by doing sensor get thing, and it automatically- Anyway.
(01:15:29):
It does feel like things are getting easier, which is good. But it is a little bit of a shock for people who have been working in this industry for a very long time, because it is a change.
EW (01:15:43):
I loved learning MicroPython, and developing C modules for it. It was amazing! But I did not-
CW (01:15:51):
It is different work.
EW (01:15:52):
It is very different work than needing to optimize things.
CW (01:15:57):
And it is probably good. Embedded should not be the way it has been.
EW (01:16:00):
Yes.
CW (01:16:02):
Everything is going in the right direction. But we existed in the wrong direction for so long, that we convinced ourselves it was fun.
EW (01:16:10):
<laugh>
KA (01:16:10):
<laugh>
EW (01:16:10):
<laugh> Yes, we convinced ourselves it was fun.
CW (01:16:14):
So now it is hard to change. Yeah.
KA (01:16:16):
Well, this is what I got started on. My first processor was the BASIC Stamp from Parallax.
CW (01:16:21):
Oh, yeah! Yeah. Okay. I had one of those Stamps.
KA (01:16:24):
Then I graduated to the Propeller chip.
CW (01:16:26):
Right.
KA (01:16:26):
I loved writing stuff-
EW (01:16:26):
Lots of cores.
KA (01:16:29):
Yeah, no. It was so cool.
CW (01:16:30):
Weird chip <laugh>.
KA (01:16:33):
Yeah. No, I actually did a whole bunch of drivers in pure assembly. I think the one I was most proud of is, okay, I did a pixel driver. It could drive 640 by 480 VGA with one core, running at 20 megahertz. What it would do is- So the Propeller chip had a character map in 32 kilobytes of ROM, that had characters...
CW (01:16:57):
Okay. Yeah.
KA (01:16:58):
The characters were like 16 by 32. Well, 32 by 16 or something. The way the frame buffer worked, it just encoded a single byte that told you what character to look up. That meant the frame buffer was way smaller than encoding every pixel as three bytes or so. Because you only had 32 kilobytes of RAM to play with, you did not have much memory at all. That was how the frame buffer worked: you would have a byte that told you what character to present.
(01:17:31):
Then the system, in assembly, had to feed a video output system that was like a shift register running at a certain frequency. So your loop had to hit a certain frequency. It was literally the difference between one assembly instruction too many and it broke, one fewer and it worked. So you were just staring at it: "How do I get rid of one assembly instruction, to make this function?"
(01:18:00):
Here is the thing I did though, that was so awesome. During the vertical sync time, I would download the pixel maps of where- So you have a mouse cursor, right? I want to do a mouse cursor overlay. But how do you do that when your frame buffer is just glyphs? You cannot put a mouse cursor there.
(01:18:19):
So in the vertical blanking time, I actually downloaded the four glyph areas where the mouse cursor was. Loaded them into RAM. Then blitted the mouse cursor onto them. Then re-uploaded them back to main memory, and put them somewhere where the system could seamlessly swap out the actual character glyphs for the ones that I had blitted a mouse cursor on.
CW (01:18:44):
<laugh>
KA (01:18:46):
So it looked like there was a mouse cursor being overlaid on the image, but the actual frame buffer did not include any mouse cursor in it. I would just look up the XY position of the mouse cursor and present that. It was the coolest thing. I was actually able to make a GUI with text boxes, where you could move a mouse cursor and click a button, and actually do this. And it was all in 32 kilobytes of RAM.
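As an illustration of the trick being described, a small Python sketch of the idea; the cell sizes, names, and the toy 8 by 8 cursor are made up for illustration, and the real driver was Propeller assembly driving video hardware, not Python:

```python
# Illustrative sketch: the frame buffer stores one glyph index per character
# cell, and during blanking the cells under the cursor get patched RAM copies
# of their ROM glyphs with the cursor pixels drawn in.
import copy

GLYPH_W, GLYPH_H = 16, 32
COLS, ROWS = 40, 15

char_rom = {i: [[0] * GLYPH_W for _ in range(GLYPH_H)] for i in range(256)}
framebuf = [[0x20 for _ in range(COLS)] for _ in range(ROWS)]   # glyph indices only
spare_glyphs = {}                                               # patched RAM copies

def overlay_cursor(mouse_x, mouse_y):
    """During vertical blanking: patch the (up to four) cells the cursor overlaps."""
    spare_glyphs.clear()
    for cell_y in (mouse_y // GLYPH_H, mouse_y // GLYPH_H + 1):
        for cell_x in (mouse_x // GLYPH_W, mouse_x // GLYPH_W + 1):
            if cell_y >= ROWS or cell_x >= COLS:
                continue
            # Copy the ROM glyph for this cell into RAM, then draw the cursor
            # pixels that land inside it (toy 8x8 cursor block).
            glyph = copy.deepcopy(char_rom[framebuf[cell_y][cell_x]])
            for dy in range(8):
                for dx in range(8):
                    px, py = mouse_x + dx, mouse_y + dy
                    if cell_x * GLYPH_W <= px < (cell_x + 1) * GLYPH_W and \
                       cell_y * GLYPH_H <= py < (cell_y + 1) * GLYPH_H:
                        glyph[py % GLYPH_H][px % GLYPH_W] = 1
            # The scan-out loop would read from spare_glyphs first, so the
            # frame buffer itself never contains a cursor.
            spare_glyphs[(cell_y, cell_x)] = glyph

overlay_cursor(100, 200)
```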
CW (01:19:09):
This is the kind of stuff I am talking about. <laugh>
KA (01:19:10):
Yes.
EW (01:19:13):
Thinking about it, I miss having to do the math and analytical puzzles, whereas now I spend a lot more time reading and digesting information, which is also good and interesting and useful. But I prefer the information about wasps.
CW (01:19:28):
<laugh>
EW (01:19:31):
So I guess, maybe I was never that interested in computers? No. I am! But I really do miss the sitting down and thinking about the analytics.
CW (01:19:41):
The challenge of there are limits.
EW (01:19:44):
The limits, and I have no limits now.
CW (01:19:45):
I think removing the limits makes it less interesting for us, in some ways.
KA (01:19:50):
Yeah. It is all about the puzzle. What really made me super happy with the SIMD optimization on OpenMV, when you unlock that 200- Okay, the most recent one I mentioned, with erode and dilate? We have had the same code sitting there forever. So this year I was just like, "I need to optimize stuff."
(01:20:07):
We actually scored a contract with Arm, where Arm has recognized that what we are doing is so interesting that we are actually getting paid to optimize some of our code, and to produce benchmarks to show, "Hey, you can do serious things with these MCUs. These are real algorithms that people actually use being optimized. Not just the CMSIS-DSP library, which is sometimes less performant than what you could write yourself."
EW (01:20:31):
Definitely.
CW (01:20:33):
Yeah. That is the thing. It is like-
EW (01:20:34):
But it is useful.
CW (01:20:35):
These things move so fast, and I said it already in the show. But if things move slower, people would squeeze out all the performance that they could out of these. Now you do not have to as much, because there is always going to be something faster.
(01:20:47):
I do not know. I do not even know what my point is. But what you are talking about is, Arm did not bother to squeeze out all the performance of their own things in CMSIS.
EW (01:20:55):
They did not have to.
KA (01:20:56):
Yeah.
CW (01:20:57):
Well they should! It is their thing! <laugh>
KA (01:21:00):
Well, I think it is just they did not have the market response. I think that is the new thing happening now though, is now that people are going to be able to see that, "Oh, hey, you could actually replace a more complex system with one of these things." Now there is actually some juice behind, "Oh, maybe we should actually try to see what we could do with these."
(01:21:17):
But back to the algorithms I was mentioning. I got a 150% speed up on it, using some of the Cortex SIMD. 150- Sorry, more than that. No, no. Yeah, 150%. That is two and a half times the performance. Two and a half times faster. You go from, "Ah, man, it is kind of slow," to "Wow!"
(01:21:36):
Being able to pull out two and a half times, and four X speed ups, on things. I mean, what do we get from processors nowadays? It is like a 7% speed up, and people are amazed by that. When you are at the 150% level, that is a whole different chip at that point. Anyway.
EW (01:21:58):
Well, we should go. It is time for us to eat. If we start talking again, we will just start talking again for quite a while.
CW (01:22:08):
We will have you back sooner next time.
EW (01:22:10):
Yes. I think that is the key.
KA (01:22:13):
Absolutely. I super enjoyed this. It is awesome talking to folks who have been in embedded systems for a long time, and who equally enjoy making the hard things happen and solving these puzzles.
EW (01:22:25):
Our guest has been Kwabena Agyeman. President, CEO, and co-founder of OpenMV.
CW (01:22:33):
Thanks, Kwabena.
KA (01:22:33):
Thank you.
EW (01:22:35):
Thank you to Christopher for producing and co-hosting. Thank you to our Patreon listener Slack group for your questions. Sorry to John and Tom whose questions I did not get to. And of course, thank you for listening. You can always contact us at show@embedded.fm or hit the contact link on embedded.fm. And now a quote to leave you with.
KA (01:22:57):
I have a nice quote that I actually like to read. So the founder of Luxonis, this is my friend Brandon. He passed away recently, but he had a quote from Theodore Roosevelt that I absolutely love. It is a little bit long, but I would like to say it.
(01:23:17):
"It is not the critic who counts, not the man who points out how the strong man stumbles, or the doer of deeds could have done them better. The credit belongs to the man who is actually in the arena, whose face is marred of dust and sweat and blood. Who strives valiantly, who errs and comes short again and again.
(01:23:31):
Because there is no effort without error and shortcoming. But who does actually strive to do the deeds. Who knows the great enthusiasms, the great devotions, who spends himself in a worthy cause. Who at the best knows in the end the triumph of high achievement. And who at the worst, if he fails, at least he fails while daring greatly. So that his place shall never be with those cold and timid souls, who neither know victory nor defeat."
(01:23:52):
And that is how it feels when you optimize C code.
CW (01:23:55):
<laugh>