Jacques Mattheij

Technology, Coding and Business

The Web in 2050

If you’re reading this page it means that you are accessing a ‘darknet’ web page. Darknets used to refer to places where illicit drugs and pornography were traded; these days the term refers to lonely servers without any inbound links, languishing away in dusty server rooms that people have all but forgotten about. Refusing to submit to either one of the two remaining overlords, these servers sit trafficless and mostly idle (load average: 0.00) except when the daily automated backup time rolls around, waiting for a renaissance heralded by the arrival of a packet on port 80 or 443 of the WWW as it once was known: a place where websites freely linked to each other. Following a link felt a bit like biting into a chocolate bon-bon, you never quite knew whether you were going to like it or be disgusted by it, but it would never cease to surprise you.

In 1990, when the web was first started up, there was exactly one website, http://info.cern.ch . There was nothing to link to, and nobody linking in; the page was extremely simple, without fancy graphics or eye candy, just pure information. Much like this lonely server here today. In January 1991, so pretty soon after, the first webservers outside of CERN were switched on, making it possible for the first time to traverse the ‘web’ from one institution to another.

It was about as glamorous as you’d expect any non-event to be, but the consequences would be enormous. To reference another darknet site, the long since defunct W3 standards body, whose website is miraculously still available: in March of 1993 web traffic accounted for 0.1% of all traffic, by September it was 1%, with a grand total of 623 websites available at the end of the year. Anybody with a feeling for numbers knows what’s coming next: by the end of 1994 there were 10,000 websites and a year later the number was 25,000. We skip a few years to 2014, when there were a bit under a billion websites live.

So, here we are in March 2050, and as of yesterday, when Amazon gave up the fight for the open web and decided to join Google, there are only two websites left. What went wrong?

Two companies deserve extra attention when it comes to murdering the web: Google and Facebook, as you all know the last two giants standing after an epic battle that lasted for decades for control of the most important resource all of us consume daily: information. Whether you’re a Googler or a Facebookie, we can all agree, even if our corporate sponsor might not, that it seems as if there is less choice these days. If you’re under 30 you won’t remember a time before Google or Facebook were dominant. But you will remember some of the giants of old: Microsoft, The New York Times, The Washington Post, CNN and so on, the list is endless. If you’re over 50 you might just remember the birth of Google, with their famous motto ‘Don’t be Evil’. But as we all know the road to hell is paved with good intentions, and not much later (in 2004) Facebook came along with their promise to ‘connect the world’. Never mind that it was already well connected by that time, but it sure sounded good.

Goofy kid billionaires and benevolent corporations, we were in very good hands.

But somewhere between 2010 and 2020 the tone started to change. The two giants collided with each other more and more frequently and forcefully for control of the web, and in a way the endgame could already be seen as early as 2017. Instead of merely pointing to information, Facebook and Google (and many others, but they are now corpses on the battlefield) sought to make their giant audiences available only with themselves as the front door. Many tricks, some clean and some dirty, were deployed in order to force users to consume their content via one of the portals, with the original websites being reduced to the role of mere information providers.

This was quite interesting in and of itself because we’d already been there before. In the ’80s and ’90s there was a system called Viditel in Europe (called ‘Minitel’ in France) which worked quite well. The main architecture was based around telecommunications providers (much like Google and Facebook are today, after their acquisition war of the ‘roaring 20s’ left them in possession of all of the world’s telcos, having run into the ground the ones that wouldn’t budge through ‘free’ competition subsidized from other revenue streams). These telcos would enter into contracts with information providers, which in turn would give the telcos a cut of the fee on every page. In a way today’s situation is exactly identical with one twist: the information providers provide the information for free in the hope of getting a cut of the advertising money Google or Facebook make from repackaging and in some cases reselling the content. The funny thing - to me, but it is bittersweet - is that when we finally had an open standard and everybody could be a publisher on the WWW we were so happy, it was as if we had managed to break free from a stranglehold: no longer wondering whether or not the telco would shut down our pages for writing something disagreeable, no more disputes about income due without risking being terminated (and after being terminated by both Google and Facebook, where will you go today, a darknet site?).

Alas, it all appears to have been for nought. In the ‘quiet 30s’ the real consolidation happened; trench wars were being fought, with users being given a hard choice: join Google and your Facebook presence will be terminated, and vice versa. The gloves were definitely off, it was ‘us’ versus ‘them’. Political parties clued in to this and made their bets, roughly half ended up in Google’s bin and the other half with Facebook. Zuckerberg running for president as a Republican candidate in the United States more or less forced the Democratic party to align with Google, setting the stage for a further division of the web. Some proud independents tried to maintain their own presence on the web but soon faded into irrelevance. Families split up over the ‘Facebook or Google’ question, independent newspapers (at first joining the AMP bandwagon, not realising this was the trojan horse that led to their eventual demise) ceased to exist. The Washington Post ending up with Team Google probably was predictable and may have been a big factor in Facebook going after Amazon with a passion. One by one, what used to be independent webservers converted into information providers for fewer and fewer silos or risked becoming completely irrelevant. Regulators were powerless to do anything about it because each and every change was incremental, ostensibly for the greater good and, after all, entered into entirely of free will.

The ‘silent 40s’ were different. No longer was there any doubt about how this would all end; if the battle for the WWW had been in full swing a decade earlier, this was the mopping up stage, the fight for scraps with one last big prize left. An aging Richard Stallman throwing in the towel and switching off stallman.org rather than declaring allegiance to either giant made for a really sad day. Amazon fought to the bitter end, trying to stay in the undivided middle ground between Google and Facebook. But dwindling turnover and Facebook’s launch of a direct competitor (‘PrimeFace’) forced their hand yesterday. And now, with Google sending free Googler t-shirts to all of the former members of team Amazon, it comes to a close.

Well, maybe. There is still this website and info.cern.ch is also still up. So maybe we can reboot the web in some form after all; two sites can make a network, even if they don’t have any users. Or are there still users out there? Is anybody still reading this?

Sorting 2 Tons of Lego, Many Questions, Results

For part 1, see here. For part 2, see here.

Reliability

The machine is now capable of running unattended for hours on end, which is a huge milestone. No more jamming or other nastiness that causes me to interrupt a run. Many little things contributed to this: I’ve gone over all the mechanics, figured out the little things that went wrong one by one and come up with solutions for them. Some of those were pretty weird. For instance, very small Lego pieces went off the transport belt with such high velocities that they could end up pretty much everywhere; the solution was to moderate the duration of the puff relative to the size of the component, and to make a skirt along the side of the transport belt to make sure that pieces don’t land on the return side of the belt, which would cause them to get caught under the roller.
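To give a feel for what ‘moderating the puff’ can mean in practice, here is a minimal sketch; the function, the thresholds and the way part size is estimated are all hypothetical, not the machine’s actual values.

```python
# Hypothetical sketch: scale the air-puff duration with the estimated part size
# so that tiny parts are not blasted clear off the belt. All numbers are made up.

def puff_duration_ms(part_area_px: int,
                     min_ms: float = 8.0,
                     max_ms: float = 40.0,
                     full_size_px: int = 50_000) -> float:
    """Return a valve-open time that grows with the apparent size of the part."""
    fraction = min(part_area_px / full_size_px, 1.0)
    return min_ms + fraction * (max_ms - min_ms)

# Example: a very small part gets a short puff, a large plate a longer one.
print(puff_duration_ms(1_000))   # ~8.6 ms
print(puff_duration_ms(50_000))  # 40.0 ms
```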

Speed

The machine is now twice as fast, thanks to a pretty simple change: I’ve doubled the number of drop-off bins from 6 to 12, which reduces the number of passes through the machine. It now takes just 3 passes to get the Lego sorted into the categories below. This required extending the pneumatics with another 6 valves, another manifold and a bunch of wiring and tubing; the expanded controller now looks like this:

I’ve also ground off the old legs from the base (the treadmill) and welded on new ones to give a little bit more space to accommodate the new bins, but that’s pretty boring metal work.

Accuracy

The image database is now approximately 60K images of parts, which has had a positive effect on the accuracy of the recognition. Fairly common parts now have very high recognition accuracy, less common parts reasonably high (> 95%), and rare parts are still very poor, but as more parts pass through the machine the training set gets larger and in time accuracy should improve further. Judging by the increase in accuracy from 16K images to 60K images there is something of a diminishing rate of return here, and it will likely be that by the time we reach 98-99% accuracy there will be well over a million images in the training set.

Software

I’ve reduced the image size a bit in the horizontal dimension, from 640 pixels wide to 320. Now that I have more data to work with this seems to give better results; the difference isn’t huge but it is definitely reproducible. The ResNet50 network still seems to give the best compromise between accuracy and training speed. I’ve added some utilities to make it easier to merge new samples into the existing dataset, to detect (and correct) classification errors and to drive the valve solenoids in a more precise way, so that valves reliably open and close for precisely determined amounts of time. This also helps a lot in making sure that parts don’t shoot all over the room.
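For readers curious what this kind of classifier looks like in code, here is a minimal sketch of a ResNet50-based setup, assuming TensorFlow/Keras; the image height, the number of part classes and the classification head are invented for illustration and are not the actual network used by the sorter.

```python
# Sketch of a ResNet50-based part classifier, assuming TensorFlow/Keras.
# Image height (240) and number of classes (40) are invented for illustration.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

NUM_CLASSES = 40             # hypothetical number of part categories
INPUT_SHAPE = (240, 320, 3)  # 320 pixels wide, as in the post; height assumed

base = ResNet50(weights="imagenet", include_top=False, input_shape=INPUT_SHAPE)

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```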

Mechanics

Overall the machine works well, but I’m still not happy with the hopper and the first stage belt. It runs much slower than the speed at which the parts recognizer works (30 parts / second) and I really would like to come up with something better. I’ve looked at vibrating bowl feeders; even though they are interesting, they are noisy and tend to be set up for only one kind of part. They’re also too slow. If anybody has a bright idea on how to reliably feed the parts then please let me know, it’s a hard problem for which I’m sure there is some kind of elegant and simple solution. I just haven’t found one yet :)

Media

The project has had tremendous coverage from all kinds of interesting publications: IEEE Spectrum had an article, and lots of internet based publications listed or linked it (for instance: Mental Floss, Engadget, Mashable). If you have published or know about another article about the sorter please let me know and I’ll add it to the list.

Of course I have this totally backwards

Starting by buying Lego, sorting it and then thinking about how to best sell the end result is the exact opposite of how you should approach any kind of project, but truth be told I’m in this far more for the technical challenge than for the commercial part. That leaves me with a bit of a problem: The sorter is working so well now that I am actually sitting on piles of sorted Lego that would probably make someone happy. But I have absolutely no idea if the sort classes that I’m currently using are of interest.

So if you’re a fanatical Lego builder I’d very much like to hear from you how you would like to buy Lego parts in somewhat larger quantities, say from 500 grams (roughly one pound) and up. Another thing I would like to know is where you’re located, so I can figure out shipping and handling.

As you can see, right now the sorting is mostly by functional groups: slopes with slopes, bricks with bricks and so on. But there are many more possibilities to sort Lego, for instance by color. Please let me know in what quantity and what kind of groupings would be the most useful to you as a Lego builder.

My Twitter is at twitter.com/jmattheij and my email address is jacques@mattheij.com. Here are some pictures of what the current product classes look like, but these can fairly easily be changed if there is demand for other mixes.

Technic:

Fences:

Space and Aircraft:

Slopes:

Wedge plates:

Vehicle parts:

Wheels:

1 wide bricks:

1 wide plates, modified:

1 wide plates:

Hinges and couplers:

Minifigs and minifig accessories:

2 wide plates:

Tiles:

Round:

Decorated:

Arches:

Plates 6 wide:

Plates 4 wide:

Baseplates:

Bricks 2 wide:

Doors and windows:

Construction equipment:

Brackets:

Cupboards:

Bricks 1 wide, modified:

Macaroni pieces:

Corner pieces:

Turntables:

Flags:

Vegetation:

Wedges:

Helicopter blades:

Stepped pieces:

I Blame The Babel Fish

One of my favorite writers of all time, Douglas Adams, has a neat little plot device in that wholly remarkable book ‘The Hitch Hiker’s Guide to the Galaxy’, called the Babel Fish.

Let me quote the master himself to explain the concept of the Babel Fish to you if you’re not already aware of it:

“The Babel fish is small, yellow, leech-like, and probably the oddest thing in the Universe. It feeds on brainwave energy received not from its own carrier, but from those around it. It absorbs all unconscious mental frequencies from this brainwave energy to nourish itself with. It then excretes into the mind of its carrier a telepathic matrix formed by combining the conscious thought frequencies with nerve signals picked up from the speech centres of the brain which has supplied them. The practical upshot of all this is that if you stick a Babel fish in your ear you can instantly understand anything said to you in any form of language. The speech patterns you actually hear decode the brainwave matrix which has been fed into your mind by your Babel fish.”

“Now it is such a bizarrely improbable coincidence that something so mind-bogglingly useful could have evolved purely by chance that some thinkers have chosen to see it as a final and clinching proof of the non-existence of God.”

“The argument goes something like this: ‘I refuse to prove that I exist,’ says God, ‘for proof denies faith, and without faith, I am nothing.’ ‘But, says Man, the Babel fish is a dead giveaway, isn’t it? It could not have evolved by chance. It proves you exist, and, by your own arguments, you don’t. QED.’ ‘Oh dear,’ says God, ‘I hadn’t thought of that,’ and vanishes in a puff of logic. ‘Oh, that was easy,’ says Man, and for an encore goes on to prove that black is white and gets himself killed on the next zebra crossing.”

“Most leading theologians claim that this argument is a load of dingo’s kidneys, but that didn’t stop Oolon Colluphid making a small fortune when he used it as the theme of his best-selling book, Well That About Wraps It Up For God.”

“Meanwhile, the poor Babel fish, by effectively removing all barriers to communication between different races and cultures, has caused more and bloodier wars than anything else in the history of creation.”

So, now that you have the general idea of what the Babel Fish was all about, I want you to keep an eye on that last part of the entry in the guide, especially the ‘more and bloodier wars’ bit in combination with the ‘removing all barriers to communication’ part.

I’ve seen a question posed in more than one place, and that sort of pattern tends to trigger my curiosity. The question has two components: why is the world moving towards a more authoritarian kind of rule all of a sudden, and why is this happening now?

Me, I blame the Babel Fish. Let me explain. Since 1995 we’ve been working very hard at removing those barriers to communication. There used to be a degree of moderation and a lower bound to the cost of communication, especially across longer distances and to larger numbers of people. It’s one thing to have a thought in your head, quite another to communicate that thought at the long-distance or international rates of 1990 or so, no matter how important you think it is, and even worse if you want to tell more than one person. But that has changed - dramatically.

The cost of almost all forms of communication - written, voice, video - worldwide, to an unbelievably large audience, is now essentially zero. The language barrier is still there, but automatic translation is getting better and better and it won’t be long before we really can communicate with everybody, instantaneously. That kind of power - because it is a power, I don’t doubt that one bit - comes with great responsibility.

If what you say or write is heard only by people already in your environment, who know you and who can apply some contextual filters then the damage that you can do is somewhat limited.

But if you start handing out megaphones that can reach untold millions of people in a heartbeat, and combine that with the unfiltered, raw output and responses of another couple of million people, then something qualitatively changes. The cost drop from $0.50 / minute long distance, a photocopy of your manifesto or airtime on a radio station to $0 is far more than a quantitative change. It means that unfiltered ramblings and polarized messages from people that you’d normally have no contact with have immediate access to your brain, and in a quantity that even the most balanced person would find hard to resist. It’s an incessant barrage of updates from all over the globe (this blog is one such input, and you’re reading it, right?). So suddenly the word of some agitator or angry person carries roughly the same weight as a well researched article in a respected newspaper. Our brains do not have a ‘quality of source’ meta-data setting, they either remember the data or they don’t, and before you know it one grade of bullshit starts to reinforce another and then your brain is polluted with garbage.

You might feel that you are able to process all this information with care, but I highly doubt that is effective in the long run. Just as there is no such thing as ‘bad advertising’ - as long as a brand is seen or heard about it will take root, even if that root starts from a negative position - we are still exposed and to some extent defenseless. Do this for a decade or two and the world will change; I firmly believe that is what we are witnessing, and that Douglas Adams totally nailed it when he wrote that removing barriers to communication could become the cause of conflict.

In the present that conflict takes the form of polarization, of splitting harmonious groups of people into camps, and it doesn’t really matter what causes the split. People that are split tend to be much easier to manipulate, to get them to do stuff against their own interest, get them to support causes that they would not support if they were capable of pausing for long enough to think things through, as used to be the norm.

So, to make it specific, this reduction in cost has made it possible to do a number of things:

  • it allows the manipulation of public opinion on a vast scale

  • it allows this from all over the globe to everywhere else

  • it makes it possible for single individuals to communicate, broadcast-wise, with millions of recipients without any kind of filter

  • it allows the creation of echo chambers so vast that it seems as if the whole world is that chamber and has become representative of the truth

  • it levels the value of what used to be in print, which required the collusion of a large number of people, against the word of a single individual

  • it allows the people on both sides of an argument to duke it out directly

  • all this happens at a moment’s notice

If you look at the past, there are other examples of really bad manipulation of public opinion, and those led to predictable and very bad consequences. Today we no longer need large amounts of capital to buy a printing press, a television satellite or a radio transmitter; all it takes to wreak havoc worldwide and to pit people against each other is an internet connection.

In closing, I know Douglas Adams wrote fiction, but he also was a very smart cookie. Removing barriers is generally good, and should be welcomed. But we should also be aware that those barriers may have had positive sides, and that as a species we are not very well positioned to deal with such immense changes in a very short time. We seem to need some time to react, time to grow a thicker skin, lest we remain overly vulnerable and allow ourselves to be goaded into making big mistakes, such as accidentally empowering authoritarian regimes, which tend to be very capable when it comes to using communications systems for propaganda purposes.

Great power comes with great responsibility, and the power to communicate with anybody, instantaneously and at zero cost, is such a power.

Edit: HN User tarr11 linked this piece by DNA about the internet (some users report the link does not work but it works for me, strange).

Edit2: And HN User acabal points out that the fact that anonymity is so easy to come by is also an important factor.

No politics please, we're hackers, too busy to improve the world

If there is one thing that never ceases to amaze me it is that the hacker community tends to place itself outside of, and by its own perception above, politics. This is evidenced in many ways, including ‘safe spaces’ and moratoria on discussing anything political because it supposedly has no bearing on the more interesting bits of IT.

What bugs me about this is that anything you make or do has a political dimension, and that hackers, more than any other profession, create the tools and the means with which vast changes in the political landscape are effected. It’s as if arms dealers and manufacturers refused to talk about war, the ultimate consequence of the tools they create being used in the environment they were made for.

Both from an ethical viewpoint and from one of personal responsibility, this is simply wrong. The ability to influence the outcome of all kinds of political affairs with disproportionate effect compared to someone not active in IT, the ability to reach large numbers of people, the ability to pull on very long levers, far longer than you’d normally be able to reach: all of this comes with obligations.

Hackers, computer programmers and associated groups cannot afford this Ostrich mentality, burying their heads in the sand as to the consequences of their work as long as they can play with their shiny toys. Between ‘Wikileaks’ and ‘Cambridge Analytica’ it should be more than clear by now that computer programming as a trade has effects that are felt the world over, and that if you feel you should be granted a safe, politics-free space to discuss your trade then that should probably be limited to hobby programming only. As soon as you and your software hit the real world, politics will rear its ugly head.

One of the best examples, to me, is the disconnect between Paul Graham’s (the founder of Hacker News) tweet in which he’s shocked that there is a 16% chance of winning the presidency, and the Hacker News Political Detox Week.

As if that were needed; HN has a tendency to try to squelch any political debate anyway.

Whether you’re working on some cool ad technology, a way for people to reach others with 140 character bursts of text, a way for people to connect to their classmates, a way to make it easier for people to find information on the web or a way to collect all the news outlets of the world in one portal, everything has a political dimension, and sometimes that political dimension can overshadow all other aspects of the project. This translates into an obligation to engage the political angle of whatever it is that we collectively produce, in order to minimize feelings of regret later on and to really help make the world a better place, rather than just pay lip service to that concept.

You simply cannot afford to stick your head in the sand and your fingers in your ears because you don’t like politics; if you’re not careful you may end up complaining about the end result of your own product. So if what just about every hacker is proud to claim is true (that they are ‘busy improving the world’) then you can’t afford to ignore politics any more than a manufacturer of weapons can afford to know nothing about armed conflict. Because whether you like it or not, your work product will be used in ways you may not have thought about, and could even be used against you.

edit: predictably, this was posted to HN; equally predictably, it got flagged off the homepage by the Ostrich brigade, because just talking about political responsibility is politics and we really can’t be exposed to that. The overvaluing of Silicon Valley Unicorns is still riding happily at #2.

How to Improve a Legacy Codebase

It happens at least once in the lifetime of every programmer, project manager or team leader: you get handed a steaming pile of manure, if you’re lucky only a few million lines’ worth; the original programmers have long since left for sunnier places and the documentation - if there is any to begin with - is hopelessly out of sync with what is presently keeping the company afloat.

Your job: get us out of this mess.

After your first instinctive response (run for the hills) has passed, you start on the project knowing full well that the eyes of the company’s senior leadership are on you. Failure is not an option. And yet, by the looks of what you’ve been given, failure is very much in the cards. So what to do?

I’ve been (un)fortunate enough to be in this situation several times, and a small band of friends and I have found that it is a lucrative business to be able to take these steaming piles of misery and turn them into healthy, maintainable projects. Here are some of the tricks that we employ:

Backup

Before you start to do anything at all, make a backup of everything that might be relevant. This is to make sure that no information is lost that might be of crucial importance somewhere down the line. All it takes is a silly question that you can’t answer to eat up a day or more once a change has been made. Configuration data especially is susceptible to this kind of problem: it is usually not versioned and you’re lucky if it is taken along in the periodic backup scheme. So better safe than sorry, copy everything to a very safe place and never ever touch that copy unless it is in read-only mode.
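In spirit this step is nothing more than the following sketch; the paths are placeholders, and your ‘everything that might be relevant’ will of course cover more than a single directory tree.

```python
# Minimal sketch: copy everything relevant to a safe location and make it read-only.
# Paths are placeholders, adjust to your situation.
import os
import shutil
import stat

SOURCE = "/srv/legacy-app"               # hypothetical: code, configs, crontabs, etc.
ARCHIVE = "/backups/legacy-app-snapshot"  # hypothetical destination

shutil.copytree(SOURCE, ARCHIVE)

# Strip write permission from every file so the archive can never be modified.
for root, dirs, files in os.walk(ARCHIVE):
    for name in files:
        path = os.path.join(root, name)
        mode = os.stat(path).st_mode
        os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
```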

Important prerequisite: make sure you have a build process and that it actually produces what runs in production

I totally missed this step on the assumption that it is obvious and likely already in place, but many HN commenters pointed it out and they are absolutely right: step one is to make sure that you know what is running in production right now, and that means that you need to be able to build a version of the software that is - if your platform works that way - byte-for-byte identical with the current production build. If you can’t find a way to achieve this then you are likely in for some unpleasant surprises once you commit something to production. Test this to the best of your ability to make sure that you have all the pieces in place and then, after you’ve gained sufficient confidence that it will work, move it to production. Be prepared to switch back immediately to whatever was running before, and make sure that you log everything and anything that might come in handy during the - inevitable - post mortem.
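One low-tech way to check that you are building what actually runs is to compare checksums; a minimal sketch, assuming a single build artifact on a platform where byte-for-byte comparison is meaningful (the paths are placeholders):

```python
# Sketch: verify that what your build process produces matches what runs in production.
# Paths are placeholders; on platforms with embedded timestamps a byte-for-byte
# comparison may need a normalisation step first.
import hashlib

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

production = sha256_of("/opt/app/current/app.bin")   # hypothetical deployed artifact
rebuilt    = sha256_of("./build/app.bin")            # hypothetical local build output

if production == rebuilt:
    print("build is reproducible, safe to proceed")
else:
    print("MISMATCH: you are not building what runs in production")
```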

Freeze the DB

If at all possible, freeze the database schema until you are done with the first level of improvements; by the time you have a solid understanding of the codebase and the legacy code has been fully left behind, you are ready to modify the database schema. Change it any earlier than that and you may have a real problem on your hands, because you’ve lost the ability to run an old and a new codebase side-by-side with the database as the steady foundation to build on. Keeping the DB totally unchanged allows you to compare the effect of your new business logic code against the old business logic code; if it all works as advertised there should be no differences.
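A sketch of what that comparison can look like: run the old and the new implementation of a piece of business logic against the same frozen data and diff the results. The table, the function names and the use of SQLite are all assumptions for the sake of illustration.

```python
# Sketch: with the schema frozen, the old and the new business logic can be run
# side by side against the same rows and their outputs compared.
# `old_invoice_total` and `new_invoice_total` are hypothetical implementations.
import sqlite3

def compare_implementations(db_path: str, old_impl, new_impl) -> list:
    conn = sqlite3.connect(db_path)
    mismatches = []
    for (order_id,) in conn.execute("SELECT id FROM orders"):  # hypothetical table
        old_result = old_impl(conn, order_id)
        new_result = new_impl(conn, order_id)
        if old_result != new_result:
            mismatches.append((order_id, old_result, new_result))
    conn.close()
    return mismatches

# If it all works as advertised, this list stays empty:
# print(compare_implementations("frozen.db", old_invoice_total, new_invoice_total))
```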

Write your tests

Before you make any changes at all write as many end-to-end and integration tests as you can. Make sure these tests produce the right output and test any and all assumptions that you can come up with about how you think the old stuff works (be prepared for surprises here). These tests will have two important functions: they will help to clear up any misconceptions at a very early stage and they will function as guardrails once you start writing new code to replace old code.

Automate all your testing. If you’re already experienced with CI then use it, and make sure your tests run fast enough to run the full set of tests after every commit.
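To make this concrete, here is a sketch of the kind of end-to-end test meant here, assuming an HTTP application, pytest and the requests library; the URL, the order id and the expected fields are invented.

```python
# Sketch of an end-to-end test pinned to the *current* behaviour of the old system,
# assuming pytest and the `requests` library. URL and fields are hypothetical.
import requests

BASE_URL = "http://staging.example.com"   # a copy of the legacy system

def test_order_lookup_returns_known_customer():
    # Encode an assumption about how the old system behaves today.
    resp = requests.get(f"{BASE_URL}/orders/1042")
    assert resp.status_code == 200
    body = resp.json()
    assert body["customer_id"] == 7
    assert body["status"] in {"open", "shipped", "cancelled"}
```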

Instrumentation and logging

If the old platform is still available for development, add instrumentation. Do this in a completely new database table: add a simple counter for every event that you can think of, and add a single function to increment these counters based on the name of the event. That way you can implement a time-stamped event log with a few extra lines of code, and you’ll get a good idea of how many events of one kind lead to events of another kind. One example: User opens app, User closes app. If both events result in back-end calls, those two counters should over the long term remain at a constant difference; the difference is the number of apps currently open. If you see many more app opens than app closes you know there has to be another way in which apps end (for instance a crash). For each and every event you’ll find there is some kind of relationship to other events. Usually you will strive for constant relationships unless there is an obvious error somewhere in the system. You’ll aim to reduce those counters that indicate errors, and you’ll aim to maximize counters further down the chain to the level indicated by the counters at the beginning. (For instance: customers attempting to pay should result in an equal number of actual payments received.)

This very simple trick turns every backend application into a bookkeeping system of sorts and just like with a real bookkeeping system the numbers have to match, as long as they don’t you have a problem somewhere.

This system will over time become invaluable in establishing the health of the system, and it will be a great companion to the revision log of your source code control system, where you can determine the point in time that a bug was introduced and what the effect was on the various counters.

I usually keep these counters at a 5 minute resolution (so 12 buckets per hour), but if you have an application that generates fewer or more events you might decide to change the interval at which new buckets are created. All counters share the same database table, and each counter is simply a column in that table.
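A minimal sketch of this counter scheme, assuming SQLite; the event names are just examples and a real implementation would live inside the application’s database layer.

```python
# Minimal sketch of the counter scheme described above, assuming SQLite.
# Each counter is a column; each row is a 5-minute bucket. Event names are examples.
import sqlite3
import time

BUCKET_SECONDS = 300  # 5-minute resolution, 12 buckets per hour
EVENTS = ["app_open", "app_close", "payment_attempt", "payment_received"]

conn = sqlite3.connect("instrumentation.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS counters (bucket INTEGER PRIMARY KEY, "
    + ", ".join(f"{name} INTEGER DEFAULT 0" for name in EVENTS) + ")"
)

def count(event: str) -> None:
    """The single increment function: bump one named counter in the current bucket."""
    if event not in EVENTS:
        raise ValueError(f"unknown event: {event}")
    bucket = int(time.time()) // BUCKET_SECONDS
    conn.execute("INSERT OR IGNORE INTO counters (bucket) VALUES (?)", (bucket,))
    conn.execute(f"UPDATE counters SET {event} = {event} + 1 WHERE bucket = ?", (bucket,))
    conn.commit()

# Sprinkle calls like these through the code paths you want to watch:
count("app_open")
count("payment_attempt")
```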

Change only one thing at a time

Do not fall into the trap of improving the maintainability of the code or the platform it runs on at the same time as adding new features or fixing bugs. This will cause you huge headaches because you now have to ask yourself every step of the way what the desired outcome of an action is, and it will invalidate some of the tests you wrote earlier.

Platform changes

If you’ve decided to migrate the application to another platform then do this first, but keep everything else exactly the same. If you want you can add more documentation or tests, but no more than that: all business logic and interdependencies should remain as before.

Architecture changes

The next thing to tackle is to change the architecture of the application (if desired). At this point in time you are free to change the higher level structure of the code, usually by reducing the number of horizontal links between modules, and thus reducing the scope of the code active during any one interaction with the end-user. If the old code was monolithic in nature, now would be a good time to make it more modular and break up large functions into smaller ones, but leave the names of variables and data structures as they were.

HN user mannykannot points out - rightfully - that this is not always an option; if you’re particularly unlucky you may have to dig in deep in order to be able to make any architecture changes. I agree with that and I should have included it here, hence this little update. What I would further like to add is that if you do both high level and low level changes, at least try to limit them to one file or, worst case, one subsystem, so that you limit the scope of your changes as much as possible. Otherwise you might have a very hard time debugging the change you just made.

Low level refactoring

By now you should have a very good understanding of what each module does and you are ready for the real work: refactoring the code to improve maintainability and to make the code ready for new functionality. This will likely be the part of the project that consumes the most time. Document as you go, and do not make changes to a module until you have thoroughly documented it and feel you understand it. Feel free to rename variables, functions and data structures to improve clarity and consistency, and add tests (also unit tests, if the situation warrants them).

Fix bugs

Now you’re ready to take on actual end-user visible changes. The first order of business will be the long list of bugs that have accumulated over the years in the ticket queue. As usual, first confirm the problem still exists, write a test to that effect and then fix the bug; your CI and the end-to-end tests you wrote should keep you safe from any mistakes you make due to a lack of understanding or some peripheral issue.

Database Upgrade

If required, after all this is done and you are on a solid and maintainable codebase again, you have the option to change the database schema or to replace the database with a different make/model altogether, if that is what you had planned to do. All the work you’ve done up to this point will help you make that change in a responsible manner without any surprises; you can completely test the new DB with the new code, with all the tests in place to make sure your migration goes off without a hitch.

Execute on the roadmap

Congratulations, you are out of the woods and are now ready to implement new functionality.

Do not ever even attempt a big-bang rewrite

A big-bang rewrite is the kind of project that is pretty much guaranteed to fail. For one, you are in uncharted territory to begin with, so how would you even know what to build? For another, you are pushing all the problems to the very last day, the day just before you go ‘live’ with your new system. And that’s when you’ll fail, miserably. Business logic assumptions will turn out to be faulty, suddenly you’ll gain insight into why that old system did certain things the way it did, and in general you’ll end up realizing that the guys who put the old system together maybe weren’t idiots after all. If you really do want to wreck the company (and your own reputation to boot) by all means, do a big-bang rewrite, but if you’re smart about it this is not even on the table as an option.

So, the alternative, work incrementally

To untangle one of these hairballs the quickest path to safety is to take any element of the code that you do understand (it could be a peripheral bit, but it might also be some core module) and try to incrementally improve it still within the old context. If the old build tools are no longer available you will have to use some tricks (see below) but at least try to leave as much of what is known to work alive while you start with your changes. That way as the codebase improves so does your understanding of what it actually does. A typical commit should be at most a couple of lines.

Release!

Every change along the way gets released into production. Even if the changes are not end-user visible, it is important to make the smallest possible steps, because as long as you lack understanding of the system there is a fair chance that only the production environment will tell you there is a problem. If that problem arises right after you make a small change you will gain several advantages:

  • it will probably be trivial to figure out what went wrong
  • you will be in an excellent position to improve the process
  • and you should immediately update the documentation to show the new insights gained

Use proxies to your advantage

If you are doing web development, praise the gods and insert a proxy between the end-users and the old system. Now you have per-URL control over which requests go to the old system and which you will re-route to the new system, allowing much easier and more granular control over what is run and who gets to see it. If your proxy is clever enough you can probably use it to send a percentage of the traffic to the new system for an individual URL until you are satisfied that things work the way they should. It’s even better if your integration tests also connect through this interface.
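To illustrate, here is a sketch of just the routing decision such a proxy could make (the actual proxying is left out); the path prefixes and the rollout percentage are invented, and hashing on a user identifier keeps each user consistently on one side during a gradual rollout.

```python
# Sketch of a per-URL routing decision for a migration proxy.
# Path prefixes and rollout percentage are hypothetical.
import hashlib

NEW_SYSTEM_PREFIXES = {"/api/v2/", "/account/"}   # fully migrated URLs
GRADUAL_ROLLOUT = {"/search/": 10}                # percentage of traffic to new system

def route(path: str, user_id: str) -> str:
    """Return 'new' or 'old' for a given request path and user."""
    if any(path.startswith(p) for p in NEW_SYSTEM_PREFIXES):
        return "new"
    for prefix, percent in GRADUAL_ROLLOUT.items():
        if path.startswith(prefix):
            # Hash the user id so the same user consistently sees the same system.
            bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
            return "new" if bucket < percent else "old"
    return "old"

print(route("/api/v2/orders", "alice"))  # -> new
print(route("/search/q=lego", "bob"))    # -> new or old, sticky per user
```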

Yes, but all this will take too much time!

Well, that depends on how you look at it. It’s true there is a bit of re-work involved in following these steps. But it does work, and any kind of optimization of this process assumes that you know more about the system than you probably do. I’ve got a reputation to maintain and I really do not like negative surprises during work like this. Quite possibly the company is already on the skids, or maybe there is a real danger of messing things up for the customers. In a situation like that I prefer total control and an ironclad process over saving a couple of days or weeks, if that would imperil a good outcome. If you’re more into cowboy stuff - and your bosses agree - then maybe it would be acceptable to take more risk, but most companies would rather take the slightly slower but much more sure road to victory.