For part 1, see here.
For part 2, see here
The machine is now capable of running un-attended for hours on end, which is a huge milestone. No more jamming or other nastiness that causes me to interrupt a run. Many little things contributed to this, I’ve looked at all the mechanics and figured out all the little things that went wrong one by one and have come up with solutions for them. Some of those were pretty weird, for instance very small Lego pieces went off the transport belt with such high velocities that they could end up pretty much everywhere, the solution for this was to moderate the duration of the puff relative to the size of the component and to make a skirt along the side of the transport belt to make sure that pieces don’t land on the return side of the belt which would cause them to get caught under the roller.
The machine is now twice as fast, this due to a pretty simple change, I’ve doubled the number of drop-off bins from 6 to 12, which reduces the number of passes through the machine. It now takes just 3 passes to get the Lego sorted in the categories below. This required extending the pneumatics with another 6 valves, another manifold and a bunch of wiring and tubing, the expanded controller now looks like this:
I’ve also ground off the old legs from the base (the treadmill) and welded on new ones to give a little bit more space to accomodate the new bins, but that’s pretty boring metal work.
The image database is now approximately 60K images of parts, this has had a positive effect on the accuracy of the recognition, fairly common parts now have very high recognition accuracy, less common parts reasonably high (> 95%), rare parts are still very poor but as more parts pass through the machine the training set gets larger and in time accuracy should improve further. Judging by the increase in accuracy from 16K images to 60K images there is something of a diminishing rate of return here and it will likely be that by the time we reach 98-99% accuracy there will be well over a million images in the training set.
I’ve reduce the image size a bit in the horizontal dimension, from 640 pixels wide to 320 wide. Now that I have more data to work with this seems to give better results, the difference isn’t huge but it is definitely reproducible. The RESNET50 network still seems to give the best compromise between accuracy and training speed. I’ve added some utilities to make it easier to merge new samples into the existing dataset, to detect (and correct) classification errors and to drive the valve solenoids in a more precise way to make sure that valves reliably open and close for precisely determined amounts of time. This also helps a lot in making sure that parts don’t shoot all over the room.
Overall the machine works well but I’m still not happy with the hopper and the first stage belt. It is running much slower than the speed at which the parts recognizer works (30 parts / second) and I really would like to come up with something better. I’ve looked at vibrating bowl feeders and even though they are interesting they are noisy and tend to be set up for one kind of part only. They’re also too slow. If anybody has a bright idea on how to reliably feed the parts then please let me know, it’s a hard problem for which I’m sure there is some kind of elegant and simple solution. I just haven’t found one yet :)
The project has had tremendous coverage from all kinds of interesting publications, IEEE Spectrum had an article, lots of internet based publications listed or linked it (for instance: Mental Floss, Endgadget, Mashable). If you have published or know about another article about the sorter please let me know and I’ll add it to the list.
Of course I have this totally backwards
Starting by buying Lego, sorting it and then thinking about how to best sell the end result is the exact opposite of how you should approach any kind of project, but truth be told I’m in this far more for the technical challenge than for the commercial part. That leaves me with a bit of a problem: The sorter is working so well now that I am actually sitting on piles of sorted Lego that would probably make someone happy. But I have absolutely no idea if the sort classes that I’m currently using are of interest.
So if you’re a fanatical Lego builder I’d very much like to hear from you how you would like to buy Lego parts in somewhat larger quantities, say from 500 Grams (roughly one pound) and up. Another thing I would like to know is where you’re located so I can figure out shipping and handling.
As you can see right now the sorting is mostly by functional groups, slopes with slopes, bricks with bricks and so on. But there are many more possibilities to sort lego, for instance by color. Please let me know in what quantity and what kind of groupings would be the most useful to you as a Lego builder.
My twitter is at twitter.com/jmattheij and my email address is email@example.com, here are some pictures of what the current product classes look like, but these can fairly easily be changed if there is demand for other mixes.
Space and Aircraft:
1 wide bricks:
1 wide plates, modified:
1 wide plates:
Hinges and couplers:
Minifigs and minifig accessories:
2 wide plates:
Plates 6 wide:
Plates 4 wide:
Bricks 2 wide:
Doors and windows:
Bricks 1 wide, modified:
One of my favorite writers of all time, Douglas Adams has a neat little plot device in that wholly remarkable book ‘The Hitch Hikers Guide to the Galaxy’, called the Babel Fish.
Let me quote the master himself to explain the concept of the Babel Fish to you if you’re not already aware of it:
“The Babel fish is small, yellow, leech-like, and probably the oddest thing in the Universe. It feeds on brainwave energy received not from its own carrier, but from those around it. It absorbs all unconscious mental frequencies from this brainwave energy to nourish itself with. It then excretes into the mind of its carrier a telepathic matrix formed by combining the conscious thought frequencies with nerve signals picked up from the speech centres of the brain which has supplied them. The practical upshot of all this is that if you stick a Babel fish in your ear you can instantly understand anything said to you in any form of language. The speech patterns you actually hear decode the brainwave matrix which has been fed into your mind by your Babel fish.”
“Now it is such a bizarrely improbable coincidence that something so mind-bogglingly useful could have evolved purely by chance that some thinkers have chosen to see it as a final and clinching proof of the non-existence of God.”
“The argument goes something like this: ‘I refuse to prove that I exist,’ says God, ‘for proof denies faith, and without faith, I am nothing.’ ‘But, says Man, the Babel fish is a dead giveaway, isn’t it? It could not have evolved by chance. It proves you exist, and, by your own arguments, you don’t. QED.’ ‘Oh dear,’ says God, ‘I hadn’t thought of that,’ and vanishes in a puff of logic. ‘Oh, that was easy,’ says Man, and for an encore goes on to prove that black is white and gets himself killed on the next zebra crossing.”
“Most leading theologians claim that this argument is a load of dingo’s kidneys, but that didn’t stop Oolon Colluphid making a small fortune when he used it as the theme of his best-selling book, Well That About Wraps It Up For God.”
“Meanwhile, the poor Babel fish, by effectively removing all barriers to communication between different races and cultures, has caused more and bloodier wars than anything else in the history of creation.”
So, now that you have the general idea of what the Babel Fish was all about, I want you to keep an eye on that last part of the entry in the guide, especially the ‘more and bloodier wars’ bit combined with the ‘removing barriers to communication’.
I’ve seen a question posed in more than one place and that sort of pattern tends to trigger my curiosity. The question has two components: Why is the world moving towards a more authoritarian kind of rule all of a sudden, and why is this happening now.
Me, I blame the Babel Fish. Let me explain. Since 1995 we’ve been working very hard at removing those barriers to communication. There used to be a degree of moderation and a lower bound to the cost of communication, especially across longer distances and to larger numbers of people. It’s one thing to have a thought in your head, quite another to communicate that thought at the long-distance or international rates of 1990 or so no matter how important you think it is and even worse if you want to tell more than one person. But that has changed - dramatically.
The cost of almost all forms of communication, written, voice, video, worldwide to an unbelievably large audience is now essentially zero. The language barrier is still there but automatic translation is getting better and better and it won’t be forever or we really can communicate with everybody, instantaneously. That kind of power - because it is a power, I don’t doubt that one bit - comes with great responsibility.
If what you say or write is heard only by people already in your environment, who know you and who can apply some contextual filters then the damage that you can do is somewhat limited.
But if you start handing out megaphones that can reach untold millions of people in a heartbeat, and combine that with the unfiltered, raw output and responses of another couple of million of people then something qualitatively changes. The cost drop from $0.50 / minute long distance, a photo copy of your manifesto or airtime on a radio station to $0 is far more than a quantitative change. It means that unfiltered ramblings and polarized messages from people that you’d normally have no contact with have immediate access to your brain, and in a quantity that even the most balanced person would find hard to resist. It’s an incessant barrage of updates from all over the globe (this blog is one such input, and you’re reading it, right?). So suddenly the word of some agitator or angry person carries roughly the same weight as a well researched article in a respected newspaper. Our brains do not have a ‘quality of source’ meta-data setting, they either remember the data or they don’t, and before you know it one grade of bullshit starts to re-inforce another and then your brain is polluted with garbage.
You might feel that you are able to process all this information with care but I highly doubt that is effective in the long run, just as there is no such thing as ‘bad advertising’, as long as a brand is seen or heard about it will take root, even if that root is started from a negative position we are still exposed and to some extent defenseless. Do this for a decade or two and the world will change and I firmly believe that is what we are witnessing, and that Douglas Adams totally nailed it when he wrote that removing barriers to communication could become the cause of conflict.
In the present that conflict takes the form of polarization, of splitting harmonious groups of people into camps, and it doesn’t really matter what causes the split. People that are split tend to be much easier to manipulate, to get them to do stuff against their own interest, get them to support causes that they would not support if they were capable of pausing for long enough to think things through, as used to be the norm.
So, to make it specific, this reduction in cost has made it possible to do a number of things:
it allows the manipulation of public opinion on a vast scale
it allows this from all over the globe to everywhere else
it makes it possible for single individuals to communicate broadcast wise with millions of recipients without any kind of filter
it allows the creation of echo chambers so vast that it seems as if the whole world is that chamber and has become representative of the truth
it levels the value of what used to be in print, which required the collusion of a large number of people against the word of an individual
it allows the people on both sides of an argument to duke it out directly
all this happens on a moments notice
If you look at the past, there are other examples of really bad cases of manipulation of public opinion. And those led to predictable and very bad consequences. Today we no longer need large amounts of capital to buy a printing press or a television satellite or radio transmitter, all it takes to wreak havoc worldwide and to put people up against each other is an internet connection.
In closing, I know Douglas Adams wrote fiction, but he also was a very smart cookie. Removing barriers is generally good, and should be welcomed. But we also should be aware that those barriers may have had positive sides and that as a species we are not very well positioned to deal with such immense changes in a very short time. We seem to need some time to react, time to grow some thicker skin lest we’re overly vulnerable and allow ourselves to be goaded into making big mistakes, such as accidentally empowering authoritarian regimes, which tend to be very capable when it comes to using communications systems for propaganda purposes.
Great power comes with great responsibility, the power to communicate with anybody instantaneously at zero cost is such a power.
Edit: HN User tarr11 linked this piece by DNA about the internet (some users report the link does not work but it works for me, strange).
Edit2: And HN User acabal points out that the fact that anonymity is so easy to come by is also an important factor.
If there is one thing that never ceases to amaze me it is that the hacker community tends to place itself outside and by their own perception above politics. This is evidenced in many ways including ‘safe spaces’ and moratoria on discussing anything political because it has no bearing on the more interesting bits of IT.
What bugs me about this is that anything you make or do has a political dimension, and that hackers, more than any other profession, create the tools and the means with which vast changes in the political landscape are effected. It’s as if arms dealers and manufacturers refuse to talk about war, the ultimate consequence of the tools they create in the environment where they will be used.
Both from an ethical viewpoint as well as from one related to personal responsibility this is simply wrong. The ability to influence with disproportional effect on the outcome of all kinds of political affairs compared to someone not active in IT, the ability to reach large numbers of people, the ability to pull on very long levers, far longer than you’d normally be able to achieve comes with some obligations.
Hackers, computer programmers and associated groups can not afford this Ostrich mentality, burying their head in the ground as to the consequences of their work as long as they can play with their shiny toys. Between ‘Wikileaks’ and ‘Cambridge Analytica’ it should be more than clear by now that computer programming as a trade has effects that are felt the world over, and that if you feel that you should be granted a safe, politics free space to discuss your trade then that probably should be limited to hobby programming only. As soon as you and your software hit the real world politics will rear its ugly head.
One of the best examples to me are the disconnect between Paul Graham’s (founder of Hacker News) tweet where he’s shocked there is a 16% chance of winning the presidency and the Hacker News Political Detox Week.
As if that was needed, HN has a tendency to try to squelch any political debate anyway.
Whether you’re working on some cool ad technology, a way for people to reach others with 140 character bursts of text, a way for people to connect to their class-mates, a way to make it easier for people to find information on the web or to collect all the news outlets of the world in one portal everything has a political dimension and sometimes that political dimension can overshadow all other aspects of the project. This translates into an obligation to engage the political angle of whatever it is that we collectively produce in order to minimize feelings of regret later on and to really help to make the world a better place, rather than just to pay lipservice to that concept.
You simply can not afford to stick your head in the ground and your fingers in your ears because you don’t like politics, if you’re not careful you may end up complaining about the end result of your own product. So if what just about every hacker is proud to claim is true (that they are ‘busy improving the world’) then you can’t afford to ignore politics any more than a manufacturer of weapons can afford to know nothing about armed conflict. Because whether you like it or not your work product will be used in ways you may not have thought about, and could even be used against you.
edit: predictably, this was posted to HN, equally predictably, it got flagged off the homepage by the Ostrich brigade because just talking about political responsibility is politics and we really can’t be exposed to that. The overvaluing of Silicon Valley Unicorns is still riding happily at #2.
It happens at least once in the lifetime of every programmer, project manager or teamleader. You get handed a steaming pile of manure, if you’re lucky only a few million lines worth, the original programmers have long ago left for sunnier places and the documentation - if there is any to begin with - is hopelessly out of sync with what is presently keeping the company afloat.
Your job: get us out of this mess.
After your first instinctive response (run for the hills) has passed you start on the project knowing full well that the eyes of the company senior leadership are on you. Failure is not an option. And yet, by the looks of what you’ve been given failure is very much in the cards. So what to do?
I’ve been (un)fortunate enough to be in this situation several times and me and a small band of friends have found that it is a lucrative business to be able to take these steaming piles of misery and to turn them into healthy maintainable projects. Here are some of the tricks that we employ:
Before you start to do anything at all make a backup of everything that might be relevant. This to make sure that no information is lost that might be of crucial importance somewhere down the line. All it takes is a silly question that you can’t answer to eat up a day or more once the change has been made. Especially configuration data is susceptible to this kind of problem, it is usually not versioned and you’re lucky if it is taken along in the periodic back-up scheme. So better safe than sorry, copy everything to a very safe place and never ever touch that unless it is in read-only mode.
Important pre-requisite, make sure you have a build process and that it actually produces what runs in production
I totally missed this step on the assumption that it is obvious and likely already in place but many HN commenters pointed this out and they are absolutely right: step one is to make sure that you know what is running in production right now and that means that you need to be able to build a version of the software that is - if your platform works that way - byte-for-byte identical with the current production build. If you can’t find a way to achieve this then likely you will be in for some unpleasant surprises once you commit something to production. Make sure you test this to the best of your ability to make sure that you have all the pieces in place and then, after you’ve gained sufficient confidence that it will work move it to production. Be prepared to switch back immediately to whatever was running before and make sure that you log everything and anything that might come in handy during the - inevitable - post mortem.
Freeze the DB
If at all possible freeze the database schema until you are done with the first level of improvements, by the time you have a solid understanding of the codebase and the legacy code has been fully left behind you are ready to modify the database schema. Change it any earlier than that and you may have a real problem on your hand, now you’ve lost the ability to run an old and a new codebase side-by-side with the database as the steady foundation to build on. Keeping the DB totally unchanged allows you to compare the effect your new business logic code has compared to the old business logic code, if it all works as advertised there should be no differences.
Write your tests
Before you make any changes at all write as many end-to-end and integration tests as you can. Make sure these tests produce the right output and test any and all assumptions that you can come up with about how you think the old stuff works (be prepared for surprises here). These tests will have two important functions: they will help to clear up any misconceptions at a very early stage and they will function as guardrails once you start writing new code to replace old code.
Automate all your testing, if you’re already experienced with CI then use it and make sure your tests run fast enough to run the full set of tests after every commit.
Instrumentation and logging
If the old platform is still available for development add instrumentation. Do this in a completely new database table, add a simple counter for every event that you can think of and add a single function to increment these counters based on the name of the event. That way you can implement a time-stamped event log with a few extra lines of code and you’ll get a good idea of how many events of one kind lead to events of another kind. One example: User opens app, User closes app. If two events should result in some back-end calls those two counters should over the long term remain at a constant difference, the difference is the number of apps currently open. If you see many more app opens than app closes you know there has to be a way in which apps end (for instance a crash). For each and every event you’ll find there is some kind of relationship to other events, usually you will strive for constant relationships unless there is an obvious error somewhere in the system. You’ll aim to reduce those counters that indicate errors and you’ll aim to maximize counters further down in the chain to the level indicated by the counters at the beginning. (For instance: customers attempting to pay should result in an equal number of actual payments received).
This very simple trick turns every backend application into a bookkeeping system of sorts and just like with a real bookkeeping system the numbers have to match, as long as they don’t you have a problem somewhere.
This system will over time become invaluable in establishing the health of the system and will be a great companion next to the source code control system revision log where you can determine the point in time that a bug was introduced and what the effect was on the various counters.
I usually keep these counters at a 5 minute resolution (so 12 buckets for an hour), but if you have an application that generates fewer or more events then you might decide to change the interval at which new buckets are created. All counters share the same database table and so each counter is simply a column in that table.
Change only one thing at the time
Do not fall into the trap of improving both the maintainability of the code or the platform it runs on at the same time as adding new features or fixing bugs. This will cause you huge headaches because you now have to ask yourself every step of the way what the desired outcome is of an action and will invalidate some of the tests you made earlier.
If you’ve decided to migrate the application to another platform then do this first but keep everything else exactly the same. If you want you can add more documentation or tests, but no more than that, all business logic and interdependencies should remain as before.
The next thing to tackle is to change the architecture of the application (if desired). At this point in time you are free to change the higher level structure of the code, usually by reducing the number of horizontal links between modules, and thus reducing the scope of the code active during any one interaction with the end-user. If the old code was monolithic in nature now would be a good time to make it more modular, break up large functions into smaller ones but leave names of variables and data-structures as they were.
HN user mannykannot points - rightfully - out that this is not always an option, if you’re particularly unlucky then you may have to dig in deep in order to be able to make any architecture changes. I agree with that and I should have included it here so hence this little update. What I would further like to add is if you do both do high level changes and low level changes at least try to limit them to one file or worst case one subsystem so that you limit the scope of your changes as much as possible. Otherwise you might have a very hard time debugging the change you just made.
Low level refactoring
By now you should have a very good understanding of what each module does and you are ready for the real work: refactoring the code to improve maintainability and to make the code ready for new functionality. This will likely be the part of the project that consumes the most time, document as you go, do not make changes to a module until you have thoroughly documented it and feel you understand it. Feel free to rename variables and functions as well as datastructures to improve clarity and consistency, add tests (also unit tests, if the situation warrants them).
Now you’re ready to take on actual end-user visible changes, the first order of battle will be the long list of bugs that have accumulated over the years in the ticket queue. As usual, first confirm the problem still exists, write a test to that effect and then fix the bug, your CI and the end-to-end tests written should keep you safe from any mistakes you make due to a lack of understanding or some peripheral issue.
If required after all this is done and you are on a solid and maintainable codebase again you have the option to change the database schema or to replace the database with a different make/model altogether if that is what you had planned to do. All the work you’ve done up to this point will help to assist you in making that change in a responsible manner without any surprises, you can completely test the new DB with the new code and all the tests in place to make sure your migration goes off without a hitch.
Execute on the roadmap
Congratulations, you are out of the woods and are now ready to implement new functionality.
Do not ever even attempt a big-bang rewrite
A big-bang rewrite is the kind of project that is pretty much guaranteed to fail. For one, you are in uncharted territory to begin with so how would you even know what to build, for another, you are pushing all the problems to the very last day, the day just before you go ‘live’ with your new system. And that’s when you’ll fail, miserably. Business logic assumptions will turn out to be faulty, suddenly you’ll gain insight into why that old system did certain things the way it did and in general you’ll end up realizing that the guys that put the old system together weren’t maybe idiots after all. If you really do want to wreck the company (and your own reputation to boot) by all means, do a big-bang rewrite, but if you’re smart about it this is not even on the table as an option.
So, the alternative, work incrementally
To untangle one of these hairballs the quickest path to safety is to take any element of the code that you do understand (it could be a peripheral bit, but it might also be some core module) and try to incrementally improve it still within the old context. If the old build tools are no longer available you will have to use some tricks (see below) but at least try to leave as much of what is known to work alive while you start with your changes. That way as the codebase improves so does your understanding of what it actually does. A typical commit should be at most a couple of lines.
Every change along the way gets released into production, even if the changes are not end-user visible it is important to make the smallest possible steps because as long as you lack understanding of the system there is a fair chance that only the production environment will tell you there is a problem. If that problem arises right after you make a small change you will gain several advantages:
- it will probably be trivial to figure out what went wrong
- you will be in an excellent position to improve the process
- and you should immediately update the documentation to show the new insights gained
Use proxies to your advantage
If you are doing web development praise the gods and insert a proxy between the end-users and the old system. Now you have per-url control over which requests go to the old system and which you will re-route to the new system allowing much easier and more granular control over what is run and who gets to see it. If your proxy is clever enough you could probably use it to send a percentage of the traffic to the new system for an individual URL until you are satisfied that things work the way they should. If your integration tests also connect to this interface it is even better.
Yes, but all this will take too much time!
Well, that depends on how you look at it. It’s true there is a bit of re-work involved in following these steps. But it does work, and any kind of optimization of this process makes the assumption that you know more about the system than you probably do. I’ve got a reputation to maintain and I really do not like negative surprises during work like this. With some luck the company is already on the skids, or maybe there is a real danger of messing things up for the customers. In a situation like that I prefer total control and an iron clad process over saving a couple of days or weeks if that imperils a good outcome. If you’re more into cowboy stuff - and your bosses agree - then maybe it would be acceptable to take more risk, but most companies would rather take the slightly slower but much more sure road to victory.
For part 1, see here.
Overview of the software components
All the software written for this project is in Python. I’m not an expert python programmer, far from it but the huge number of available libraries and the fact that I can make some sense of it all without having spent a lifetime in Python made this a fairly obvious choice. There is a python distribution called Anaconda which takes the sting out of maintaining a working python setup. Python really sucks at this, it is quite hard to resolve all the interdependencies and version issues, using ‘pip’ and the various ways in which you can set up a virtual environment is a complete nightmare once things get over a certain complexity level. Anaconda makes that all managable and it gets top marks from me for that.
The Lego sorter software consists of several main components, there is the frame grabber which takes images from the camera:
Scanner / Stitcher
Then, after the grabber has done it’s work it sends the image to the stitcher which does two things: the first thing it does is determine how much the belt with the parts on it has moved since the previoue frame (that’s the function of that wavy line in the videos in part 1, that wavy line helps to keep track of the belt position even when there are no parts on the belt), and then it will update an in-memory image with the newly scanned bit of what’s under the camera. Whenever there is a vertical break between parts the stitched part gets cut and the newly scanned part gets sent on.
All this is done using OpenCV,
After the scanner/stitcher has done its job a part image looks like this:
Stitching takes care of the situation where a part is longer than what fits under the camera in one go.
This is where things get interesting. So, I’ve built this part several times now, to considerably annoyance.
The first time I was just using OpenCV primitives, especially contour matching and circle detection. Between those two it was possible to do a reasonably accurate recognition of parts as long as there were not too many different kinds of parts. This, together with some simple meta data (l, w, h of the part) can tell the difference between all the basic lego bricks, but not much more than that.
So, back to the drawing board: enter Bayes. Bayes classifiers are fairly well understood, you basically engineer a bunch of features, build detectors for those, create a test-set to verify that your detector works as advertised and you try to crank up the discriminating power of those features as much as you can. This you then run over as large a set of test images as you can to determine the ‘priors’ that will form the basis for the relative weighing of each feature as it is detected to be ‘true’ (feature is present) or ‘false’ (feature is not present). I used this to make a classifier based on the following features:
- cross (two lines meeting somewhere in the middle)
- circle (the part contains a circle larger than a stud)
- edge_studs (studs visible edge-on)
- full (the part occupies a large fraction of its outer perimeter)
- holes (there are holes in the part)
- holethrough (there are holes all the way through the part)
- plate (the part is roughly a plate high)
- rect (the part is rectangular)
- slope (the part has a sloped portion)
- skinny (the part occupies a small fraction of its outer perimeter)
- square (the part is roughly square)
- studs (the part has studs visible)
- trans (the part is transparent)
- volume (the volume of the part in cubic mm)
- wedge (the part has a wedge shape)
And possibly others… This took quite a while. It may seem trivial to build a ‘studs detector’ but that’s not so simple. You have to keep in mind that the studs could be in any orientation, that there are many bits that look like studs but really aren’t and that the part could be upside-down or facing away from the camera. Similar problems with just about every feature so you end up tweaking a lot to get to acceptable performance for individual features. But once you have all that working you get a reasonable classifier for a much larger spectrum of parts.
Even so, this is far from perfect: it is slow, with every category you add you’re going to be doing more work in order to figure out which category a part is. The ‘best match’ can come from a library of parts which itself is growing so there is a nice geometrical element to the amount of computer time spent. Accuracy was quite impressive but in the end I abandoned this approach because of the speed (it could not keep up with the machine) and changed to the next promising candidate, an elimination based system.
The elimination system used the same criteria as the ones listed before. Sorting the properties in decreasing order of effectiveness allowed a very rapid elimination of non-candidates, and so the remainder could be processed quite efficiently. This was the first time the software was able to keep up with the machine running full-speed.
There are a couple of problems with this approach: once something is eliminated, it won’t be back, even if it was the right part after all. The fact that it is a rather ‘binary’ approach really limits the accuracy, so you’d need a huge set of data to make this work, and that would probably reduce the overall effectiveness quite a bit.
It also ends up quite frequently eliminating all the candidates, which doesn’t help at all. So, accuracy wasn’t fantastic and fixing the accuracy would likely undo most of the speed gains.
Tree based classification
This was an interesting idea. I made a little tree along the lines of the Animal Guessing Game. Every time you add a new item to the tree it will figure out which of the features are different and it will then split the node at which the last common ancestor was found to accomodate the new part. This had some significant advantages over the elimination method: the first is that you can have a part in multiple spots in the tree which really helps accuracy. The second is that it is lightning fast compared to all the previous methods.
But it still has a significant drawback: you need to manually create all the features first and that gets really tedious, assuming you can even find ‘clear’ enough features that you can write a straight up feature detector using nothing but OpenCV primitives. And that can get challenging fast, especially because python is a rather slow language and if your problem can’t be expressed in numpy or OpenCV library calls you’ll be looking at a huge speed penalty.
Finally! So, after roughly 6 months of coding up features, writing tests and scanning parts I’d had enough. I realized that there is absolutely no way that I’ll be able to write a working classifier for the complete spectrum of parts that Lego offers and that was a real let-down.
So, I decided to bite the bullet and get into machine learning in a more serious manner. For weeks I read papers, studied all kinds of interesting bits and pieces regarding Neural Networks.
I had already played with when they first became popular in the 1980’s after reading a very interesting book on a related subject. I used some of the ideas in the book to rescue a project that was due in a couple of days where someone had managed to drop a coin into the only prototype of a Novix based Forth computer that was supposed to be used for a demonstration of automatic license plate recognition. So, I hacked together a bit of C code with some DSP32 code to go with it and made the demo work, and promptly forgot about the whole thing.
A lot has happened in the land of Neural Networks since then, the most amazing thing is that the field almost died and now it is going through this incredible rennaisance powering all kinds of real world solutions. We owe all that to a guy called Geoffrey Hinton who just simply did not give up and turned the world of image classification upside down by winning a competition in a most unusual manner.
After that it seemed as if a dam had been broken, and one academic record after another was beaten with huge strides forward in accuracy for tasks that historically had been very hard for computers (vision, speech recognition, natural language processing).
So, lots of studying later I had settled on using TensorFlow, a huge library of very high quality produced by the Google Brain Team, where some of the smartest people in this field in the world are collaborating. Google has made the library open-source and it is now the foundation of lots of machine learning projects. There is a steep learning curve though, and for quite a while I found myself stuck on figuring out how to proceed best.
And then several things happened in a very short time: about two months ago HN user greenpizza13 pointed me at Keras, rather than going the long way around and using TensorFlow directly (and Anaconda actually does save you from having to build TensorFlow). And this in turn led me to Jeremy Howard and Rachel Thomas’ excellent starter course on machine learning.
Within hours (yes, you read that well) I had surpassed all of the results that I had managed to painfully scrounge together feature-by-feature over the preceding months, and within several days I had the sorter working in real time for the first time with more than a few classes of parts. To really appreciate this a bit more: approximately 2000 lines of feature detection code, another 2000 or so of tests and glue was replaced by less than 200 lines of (quite readable) Keras code, both training and inference.
The speed difference and ease of coding was absolutely incredible compared to the hand-coded features. While not quite as fast as the tree mechanism accuracy was much higher and the ability to generalize the approach to many more classes without writing code for more features made for a much more predictable path.
The hard challenge to deal with next was to get a training set large enough to make working with 1000+ classes possible. At first this seemed like an insurmountable problem. I could not figure out how to make enough images and to label them by hand in acceptable time, even the most optimistic calculations had me working for 6 months or longer full-time in order to make a data set that would allow the machine to work with many classes of parts rather than just a couple.
In the end the solution was staring me in the face for at least a week before I finally clued in: it doesn’t matter. All that matters is that the machine labels its own images most of the time and then all I need to do is correct its mistakes. As it gets better there will be fewer mistakes. This very rapidly expanded the number of training images. The first day I managed to hand-label about 500 parts. The next day the machine added 2000 more, with about half of those labeled wrong. The resulting 2500 parts where the basis for the next round of training 3 days later, which resulted in 4000 more parts, 90% of which were labeled right! So I only had to correct some 400 parts, rinse, repeat… So, by the end of two weeks there was a dataset of 20K images, all labeled correctly.
This is far from enough, some classes are severely under-represented so I need to increase the number of images for those, perhaps I’ll just run a single batch consisting of nothing but those parts through the machine. No need for corrections, they’ll all be labeled identically.
I’ve had lots of help in the last week since I wrote the original post, but I’d like to call out two people by name because they’ve been instrumental in improving the software and increasing my knowledge, the first is Jeremy Howard, who has gone over and beyond the call of duty to fill in the gaps in my knowledge, without his course I would have never gotten off the ground in the first place, and second Francois Chollet, the maker of Keras who has been extremely helpful in providing a custom version of his Xception model to help speed up training.
Right now training speed is the bottle-neck, and even though my Nvidia GPU is fast it is not nearly as fast as I would like it to be. It takes a few days to generate a new net from scratch but I simply don’t think it is responsible to splurge on a 4 GPU machine in order to make this project go faster. Patience is not exactly my virtue but it looks as though I’ll have to get more of it. At some point all the software and data will be made open source, but I still have a long way to go before it is ready for that.
Once the software is able to reliably classify the bulk of the parts I’ll be pusing through the huge mountain of bricks, and after that I’ll start selling off the result, both sorted parts as well as old sets.
Finally, to close off this post, an image of the very first proof-of-concept, entirely made out of Lego: