What's up with Ty?
When we last saw my typing robot (Ty), I was adding current-sense resistors and studying Reinforcement Learning out of a book. The resistors were easy but the book-learning went badly. Given that it is a graduate-level textbook, I kept getting lost. I was about 100 pages in before I admitted that a) I was reading the text but not internalizing the material and b) this path would not work for my robot.
A Maze of Twisty Passages
Reinforcement learning (wiki) is where the robot does random stuff and you tell it when it is right or wrong, as though it were an intellectually-challenged cat. The system often needs many thousands of repetitions before it starts to get things right. I have a relatively fragile mechanical arm: thousands of repetitions ensure there will be breakage long before we get to the good parts.
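To make that concrete, here is a toy sketch of the learn-by-reward loop in Python. It is a made-up three-action example (nothing to do with Ty's arm or any real RL library), just to show how the right-or-wrong signal slowly shapes behavior and how many repetitions that takes even for something trivial.

    import random

    true_best = 2                  # the "right" action; the learner doesn't know this
    estimates = [0.0, 0.0, 0.0]    # learned value of each possible action
    counts = [0, 0, 0]

    for trial in range(5000):      # thousands of repetitions, even with only 3 choices
        if random.random() < 0.1:
            action = random.randrange(3)               # sometimes do random stuff
        else:
            action = estimates.index(max(estimates))   # usually do what seems best
        # the only feedback is a noisy right/wrong reward
        reward = 1.0 if (action == true_best and random.random() < 0.8) else 0.0
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    print("learned best action:", estimates.index(max(estimates)))

Now imagine each of those 5000 trials moving a physical arm.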
Ideally, I would return to Robot Operating System (ROS) and its simulation systems to do the training. But I still don't have the mechanical engineering skill to describe the linkages on the arm. I tried learning Fusion 360 from its tutorial, but the screen flashed so much that my brain demanded we depart for greener, less blinky pastures before I could absorb anything.
I also couldn't figure out, even with mechanical modeling, how to apply the Reinforcement Learning techniques to my goals. In general, sure: "show the robot what to do." But tactically, what am I going to code? No idea.
I picked up Ian Goodfellow's book Deep Learning because it seems to be the current canonical artificial intelligence text; maybe I needed a better foundation for machine learning, and there is a bigger audience for this material (here is a YouTube channel that goes through it). This was a much better starting place, but it was not at all easy going. And while going from the math to code was possible, I still couldn't see how to get from here to where I wanted to be with Ty.
Self-driving Cars?
Just as I was getting frustrated, Udacity called and asked if I'd be an industry expert for their new introduction to self-driving cars. They were adding a module explaining why the program shifts from Python to C++. I said yes, but could I see what the students would be doing? That is how I got enrolled in the first term of Udacity's self-driving car nanodegree. There will be a podcast about this later, but I'll just say that many of the machine learning problems I was struggling with were answered in very satisfactory ways (including how to use AWS without it costing a fortune!). I don't want to be a shill for the program, but the first term has been very good for me, pitched at just the right level between too easy and too hard.
In the class, I learned to understand some of the terms from the Nvidia Jetson’s introductory tutorial. I learned that there are canonical machine learning techniques and they have names (who knew?).
Even better, the techniques aren't monolithic. There is a neural network called AlexNet that can be trained on the ImageNet database to recognize 1000 categories of objects (hey, I did a demo about this but now I've read the paper!). Note that AlexNet is a way to structure the neural net (like a canonical algorithm for solving a class of problems). It is really easy to replicate.
However, to actually make AlexNet identify objects in an image, you have to train the neural net with real data, generating weights for the neural net to apply when it sees something new (want to know more?). This training process is computationally intensive (really, really computationally intensive; it starts with collecting hundreds of gigabytes of data and goes downhill from there). We end up with millions of weights after all this training.
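The scale is easy to see without doing any training yourself: the trained weights are published and you can just download and count them. A minimal sketch, assuming PyTorch and torchvision are installed (my choice for poking around, not anything blessed by the AlexNet paper):

    # Download the ImageNet-trained AlexNet weights (a couple hundred megabytes)
    # and count how many numbers we are actually talking about.
    import torchvision.models as models

    alexnet = models.alexnet(pretrained=True)
    total = sum(p.numel() for p in alexnet.parameters())
    print("AlexNet parameters:", total)   # roughly 61 million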
There are fairly standard techniques to take this well-trained, object-identifying AlexNet instantiation and change out the 1000 objects for a different 1000 objects by modifying part of the weights. I don't necessarily need a supercomputer (or a super AWS instance) calculating for weeks to do this. With something called transfer learning, I keep the structure of AlexNet, and most of the trained weights, changing out only the last few thousand weights. It keeps the trained algorithm's ability to find important curves and patterns but replaces the part where it puts them together to identify the object in an image.
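In code, the head swap is smaller than it sounds. A minimal sketch, again assuming PyTorch/torchvision; the class count is a hypothetical placeholder, not something I have actually trained:

    import torch
    import torch.nn as nn
    from torchvision import models

    model = models.alexnet(pretrained=True)

    # Freeze everything: keep the curves-and-patterns layers exactly as trained.
    for param in model.parameters():
        param.requires_grad = False

    # Snip off the head: the last layer maps 4096 features to the 1000 ImageNet
    # classes. Replace it with a fresh layer for my own classes.
    num_my_classes = 104   # hypothetical: one class per key on a keyboard
    model.classifier[6] = nn.Linear(4096, num_my_classes)

    # Only the new head's weights get updated during training.
    optimizer = torch.optim.SGD(model.classifier[6].parameters(), lr=0.001)
    # ...then train as usual, on a much smaller labeled data set.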
If your brain isn't starting to shriek in alarm at that, I didn't explain it right. I can take this huge algorithm, snip off its head, put on a new one, and call it my own creation. I don't need to develop a broad library of millions of images or fuss with multistage backpropagation calculations. My laptop can do this. It may have to work all night, but it can make a new object identification system without requiring access to all the other images originally used to train the algorithm.
Still no shrieking? Ok, how about this: someone on the internet makes this AlexNet thing that works well for finding cats for their meme generator. I want to find bats. I can use 100% of their algorithm and 95% of their trained weights to find bats. If you then take my bat-finder and retrain it to find houseboats, unless you slice the algorithm just the way I did, your houseboat-finder still has cats and bats lingering in it.
I think the part I find most distressing is that no one can regenerate any of this. It is all probabilistic and fuzzy. The trading of algorithm weights and partial retraining is terrifying. Neat and powerful and useful and omg, omg, you are applying this to medicine?!?!?!? And self-driving cars?!?
Object Identification vs. Detection
Hyperventilating aside, I still have some problems trying to apply this to Ty. I don't really want to identify objects, as identification usually means labeling one thing in the image.
On the other hand, detection is usually about finding multiple things in an image: detecting people or detecting facial features.
For Ty, I want to detect keys on a keyboard. Happily, there is this new-ish algorithm called YOLO (You Only Look Once) that does object detection. It can detect many objects in one image. And it runs on my Jetson TX2.
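I have not wired YOLO into Ty yet, but running the download by hand is straightforward, and shelling out to it from Python is how I would start. A sketch under assumptions: the darknet binary and the cfg/weights/image file names below are the ones the YOLO instructions use, so adjust them to whatever your download actually contains.

    # Run the darknet YOLO detector on one image and show what it reported.
    import subprocess

    result = subprocess.run(
        ["./darknet", "detect", "cfg/yolo.cfg", "yolo.weights", "data/dog.jpg"],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT, universal_newlines=True,
    )
    # Darknet prints one line per detected object ("dog: 82%", "bicycle: 85%", ...)
    # and writes an annotated copy of the image (predictions.png or similar).
    print(result.stdout)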
In the YOLO download, they give the weights for their 9000-object detection and identification algorithm. Theoretically, I can use transfer learning on this too. I can de-cap the weights I got from them (which presumably took weeks on a superfast computer to create), change only the topmost layers (on my laptop), and then detect keyboard keys and other things Ty might be interested in (like laser dots, the silly cat). I should note that this is theoretical; these tools are new to me. But they are amazing. And if I can make them work, well, it will be even more amazing.
YOLO is described by its code and by a journal paper. The Udacity class has been helpful in getting me to read more journal papers. I've found that it really helps if you understand parts of them instead of having to puzzle through the whole thing confusedly. This YOLO one is outside my ken and I'm not sure how to get from where I am to understanding this paper. On the other hand, this chasm seems smaller than many of the ones I started with on this project, so I'll keep trying.
(I’ve also started Udacity’s Self-Driving Cars Term 2: Sensor Fusion, Localization, and Control.)