Jacques Mattheij

Technology, Coding and Business

The Fastest Blog In The World

I positively hate bloat in all its forms. Take this BBC News Article, it’s 2300 bytes but it loads 1.2 million bytes of data. That’s more than a megabyte for what probably should not be more than several tens of kilobytes. (edit: this used the google homepage as an example before which was a poor choice because the google homepage does a lot under the hood that is not visible to the user, though personally I actually liked the really simple old page.)

Bloat to me exemplifies the wastefulness of our nature, consuming more than we should of the resources that are available to us. A typical blog post on most blogging platforms will (even if the blog post itself is just a few kilobytes of text) load an easy megabyte. The words themselves are usually less than a few kilobytes even for the largest posts. Imagine an envelope for a letter that weighed a couple of pounds for a 1 gram letter!

So, when the time came to finally attack the issue of slow re-generation of these pages when I was using ‘octopress’ I decided to not only upgrade the blogging engine (to ‘hugo’, a lightning fast static site generator that is very easy to install), but also to strip the blog of anything and everything that did not matter without impacting functionality. The blog had to look exactly like it did before, work exactly like it worked before and it had to work on both regular browsers and mobile platforms.

Mobile matters a lot these days and I think that when a large chunk of your readers sits on metered bandwidth you can do them an easy favor by making sure that they don’t download more than they have to, it saves them both money and time.

This took a bit of doing, but I’m pretty happy with the end result, the ratio of data pushed to the user for a single page is 20:1 for old versus new, the ratio of wrapper:content is now 5:1, before it was a whopping 100:1! This particular article is about 5000 bytes in its original un-rendered form, the server has transferred about 13000 bytes to your computer fully ‘wrapped’ in HTML, with CSS and so on. That’s about 3:1, which isn’t all that bad. (You can verify this yourself using firefox by pressing shift-ctrl-Q and then reloading the page, that’s a pretty useful tool in determining what gets sent to load a page.)
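To make the ratio concrete, here is a minimal sketch of how one could estimate how much of a page is wrapper and how much is content. It is not the measurement used for the numbers above; the `TextExtractor` class, the `page_to_content_ratio` function and the sample page are made up for illustration, and "content" is assumed to mean the visible text outside of script and style blocks.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def page_to_content_ratio(html):
    """Bytes in the whole page divided by bytes of visible text."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    # Collapse runs of whitespace so indentation does not count as content.
    text = " ".join("".join(extractor.parts).split())
    if not text:
        raise ValueError("page has no visible text")
    return len(html.encode("utf-8")) / len(text.encode("utf-8"))


# Hypothetical sample page, only here to exercise the function.
sample = "<html><head><style>p{margin:0}</style></head><body><p>Hello world</p></body></html>"
print(page_to_content_ratio(sample))

# The figures quoted above: 13000 bytes transferred for 5000 bytes of article.
print(13000 / 5000)  # prints 2.6
```

This only looks at the HTML document itself, so for a page that pulls in extra resources you would have to add those transfers to the numerator.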

The steps I took to get rid of the bloat are:

  • inlined the few images that are still left

  • inlined the stylesheet (there is a cache penalty here so you have to trim it down as much as possible but the page starts rendering immediately which is a huge gain at the cost of a little bit of extra data transferred, all the measurement tools I’ve used seem to agree on this)

  • got rid of most of the CSS rules that weren’t used

  • got rid of almost all javascript (jquery, various plug-ins, analytics)

  • got rid of external fonts (the slightly nicer look is not worth the extra download and delay)

  • replaced twitter plug ins for ‘latest tweets’ and ‘twitter button’ with static content

  • reduced the number of resources loaded from the server to render the page to 1 (the page itself)
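The two inlining steps can be sketched roughly as follows. This is an illustration under assumptions, not the actual build step used for this blog: the function names, the regular expression and the `/css/site.css` path are invented, and the pattern only recognises a plain `<link rel="stylesheet" href="...">` tag written in exactly that attribute order.

```python
import base64
import re

# Assumption: stylesheet links look exactly like <link rel="stylesheet" href="...">.
LINK_RE = re.compile(r'<link\s+rel="stylesheet"\s+href="([^"]+)"\s*/?>')


def inline_stylesheets(html, stylesheets):
    """Replace each stylesheet <link> with a <style> block.

    `stylesheets` maps an href to the CSS text for that file.
    Links whose href is not in the mapping are left untouched.
    """
    def replace(match):
        css = stylesheets.get(match.group(1))
        if css is None:
            return match.group(0)
        return "<style>" + css.strip() + "</style>"

    return LINK_RE.sub(replace, html)


def image_data_uri(image_bytes, mime_type):
    """Encode image bytes as a data: URI that can be used as an <img> src."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return "data:" + mime_type + ";base64," + encoded


# Hypothetical page and stylesheet, only here to show the effect.
page = '<head><link rel="stylesheet" href="/css/site.css"></head><body>Hi</body>'
inlined = inline_stylesheets(page, {"/css/site.css": "body { margin: 0 }\n"})
print(inlined)
```

Base64 makes an inlined image roughly a third larger than the original file, which is one more reason to keep only the few images that really matter.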

The end result is pretty lean-and-mean. And all of that change barely affected the look or functionality of the site for a user, the difference is really minimal. So on all pages that do not contain images (and that’s most of them) the page is one single request. That’s it. No css, no javascripts, no fonts, no images loaded from the server. The pages load < 20 kilobytes from the server on average (compressed), they load in under 150 milliseconds from start to finish and they render in less than 200 milliseconds. Clicking around on the pages in this blog should be instantaneous and should never result in having to wait for the next page to load, it should look to your eye as if you already had the page in your local cache. The embedding of the stylesheet in particular was a good move, it dramatically reduced the time required to render the page because it doesn’t require the loading of an extra resource before the rendering engine can fire up. The overhead from sending the CSS data multiple times when multiple pages are loaded is definitely not fantastic but by pruning down the CSS that overhead by itself was reduced by a factor of 4 or so.

I’m sure I can do better still, for instance the CSS block is still quite large (too many rules, not minified yet) but it can be quite hard to figure out which rules can be lost and which are essential (nice idea for a browser plug in, take the css loaded by the page and remove all unused rules). Even so, the difference between what was there before (600+K, 18 requests) and what is there now (< 20K, 1 request) is so large that any further improvements are unlikely to move the needle. Optimizing a thing like this is likely a bad investment in time but it is hard to stop doing a thing like this if you’re enjoying it and I really liked the feeling of seeing the numbers improve and the wait time go down. This is a nice example of ‘premature optimization’ but I do hope that the users of the blog like the end result.
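The 'remove all unused rules' idea can be sketched in a few lines, with heavy caveats. The version below is a deliberately naive illustration, not an existing tool: all names are invented, it only judges bare tag, `.class` and `#id` selectors, it keeps anything more complicated to be on the safe side, it does not understand `@media` blocks, and it cannot see classes that javascript adds at runtime.

```python
import re
from html.parser import HTMLParser


class UsageCollector(HTMLParser):
    """Collect the tag names, classes and ids that occur in a page."""

    def __init__(self):
        super().__init__()
        self.tags, self.classes, self.ids = set(), set(), set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())
            elif name == "id" and value:
                self.ids.add(value)


RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")   # selector list { declarations }
SIMPLE_RE = re.compile(r"^[.#]?[\w-]+$")        # p, .sidebar, #main


def selector_used(selector, usage):
    """True if the selector might match; complex selectors are always kept."""
    if not SIMPLE_RE.match(selector):
        return True
    if selector.startswith("."):
        return selector[1:] in usage.classes
    if selector.startswith("#"):
        return selector[1:] in usage.ids
    return selector.lower() in usage.tags


def prune_css(css, html):
    """Return the CSS with rules dropped whose simple selectors match nothing."""
    usage = UsageCollector()
    usage.feed(html)
    kept = []
    for match in RULE_RE.finditer(css):
        selectors = [s.strip() for s in match.group(1).split(",")]
        live = [s for s in selectors if selector_used(s, usage)]
        if live:
            kept.append(", ".join(live) + " {" + match.group(2).strip() + "}")
    return "\n".join(kept)


# Hypothetical stylesheet and page.
css = "p { color: red } .sidebar { float: left } #main, .gone { margin: 0 } a:hover { color: blue }"
html = '<div id="main"><p class="x">hi</p></div>'
print(prune_css(css, html))
```

To be useful for a whole site the usage would have to be collected over every generated page rather than a single one, otherwise rules that are only needed on some pages get thrown away.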

If you know of a blog or have one that loads faster than this one or uses tricks I’m not aware of I’d like to hear about it!

Divide And Conquer, the most powerful concept in programming

Divide and Conquer is a name given to a group of algorithms that take a problem and then solve it recursively (recursion is a programming concept where a program uses a simpler version of itself, a bit like those Matryoshka dolls). For instance, there is a very elegant algorithm named Quicksort that sorts lists of elements like this.
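For readers who have not seen it, a minimal Quicksort sketch looks like this. It is the simple list-building variant rather than the classic in-place version, and picking the middle element as the pivot is just one of several common choices.

```python
def quicksort(items):
    """Sort a list by splitting it around a pivot and sorting the parts."""
    if len(items) <= 1:
        # A list of zero or one elements is already sorted: the trivial case.
        return list(items)
    pivot = items[len(items) // 2]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    # Divide: two smaller sorting problems. Conquer: glue the answers together.
    return quicksort(smaller) + equal + quicksort(larger)


print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # prints [1, 1, 2, 3, 4, 5, 6, 9]
```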

But that’s definitely not the only interpretation of those words. Historically they came from ‘divide and rule’, the concept of getting the locals to fight with each other which allowed a common enemy to rule them. United we stand and divided we fall…

The reason why I think that divide and conquer is the most powerful strategy in programming is that it can be applied on a higher level: while writing complex programs. If you are faced with a task that is too difficult to solve in one go you can apply the divide and conquer method stepwise until the problem you have to solve is a trivial one. Then you back up one level and re-apply until all the sub-problems are solved and this in turn solves the ‘too complicated to solve’ problem that you had in the first place.

Even more powerful: this applies to all problem solving, even outside programming. Let me give you an example: Need to remodel a house? Too complex to oversee in one go and for sure if you just start without a plan you’ll get stuck and/or depressed and you’ll burn down your house. No need for that! Break it down and it all gets a whole lot easier:

  • infrastructure
    • electricity
    • gas lines
    • water lines
    • heating
  • rooms
    • kitchen
      • do ceiling
      • prepare walls
      • tile floor
      • place cabinets
      • place countertops
    • bathroom
      • do ceiling
        • place wooden lattice
        • screw giprock
        • mudding and sanding
        • paint
      • prepare walls
        • tile walls
        • tile floor
        • place shower base
        • place shower doors
    • bedroom1
    • bedroom2
    • livingroom
    • attic
    • basement
I’ve detailed a few entries further to show that you don’t need to stop at any particular level, you can subdivide as far as you want. This is an extremely handy trick, it allows an individual to solve almost any problem given enough time. It applies equally to repairing cars, troubleshooting electronics, learning and so on. There isn’t a task that I can think of where ‘divide and conquer’ does not make that task easier and it always surprises me when people either don’t know about it or do not know how to put it into practice. And if you do the dividing job before you start the actual work your divisions automatically become the plan, which usually makes it a lot easier to spot what could go wrong and/or what the proper order is to do things in. For instance: in the example above it is now painfully obvious why you should do the floor before you place the cabinets in the kitchen, which can save you costly time and money on re-work.
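The breakdown above is a tree, and turning the tree into a plan is itself a small recursive job. The sketch below stores part of the list as nested dictionaries (an arbitrary choice of representation) and walks it depth-first to print the concrete tasks in order; the `leaf_tasks` name is made up for this example.

```python
# Part of the remodelling breakdown from above; an empty dict marks a task
# that has not been divided any further.
plan = {
    "infrastructure": {
        "electricity": {}, "gas lines": {}, "water lines": {}, "heating": {},
    },
    "rooms": {
        "kitchen": {
            "do ceiling": {}, "prepare walls": {}, "tile floor": {},
            "place cabinets": {}, "place countertops": {},
        },
        "bathroom": {
            "do ceiling": {
                "place wooden lattice": {}, "screw giprock": {},
                "mudding and sanding": {}, "paint": {},
            },
        },
        "bedroom1": {},
        "bedroom2": {},
    },
}


def leaf_tasks(tree, path=()):
    """Yield every undivided task with its full path, in listed order."""
    for name, children in tree.items():
        here = path + (name,)
        if children:
            # Still too big: descend one level and repeat.
            yield from leaf_tasks(children, here)
        else:
            yield " > ".join(here)


for task in leaf_tasks(plan):
    print(task)
```

Because dictionaries keep their insertion order, the order in which you wrote the plan down is the order in which the tasks come out, so 'tile floor' shows up before 'place cabinets'.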

Obviously there are some classes of problem where the strategy will fail. If you wish to lift Mount Everest by your lonesome then you’re not going to get very far even if you manage to somehow break it up into ‘two roughly equal halves’. But that’s because the task itself is beyond the reach of any single person or even humanity. But for most problems we are faced with on a day-to-day basis the rule holds that you can usually break them up into simpler or smaller problems.

So the next time when faced with something that you can’t do, either in computer programming or simply in the real world: Divide and conquer! Break the problem into two roughly equal pieces if at all possible, then try to solve those and if they are still too big repeat the process until at least some portion of the larger problem becomes tractable. Even slicing one small portion of the problem makes the remainder easier to solve.

The 'No True Programmer' Fallacy

Computer programmers are a weird bunch. They go around telling the rest of the world who can and can’t be a programmer. There are even studies on the subject to prove that there are ‘two kinds of people, those that can program and those that can never learn it no matter how much effort they put into it’.

But that’s total nonsense. It’s like saying that there are only two kinds of people when it comes to swimming, those that can learn how to swim and those that can’t. It aggravates me because I spend quite a bit of time passing on my skills to those that wish to learn how to program. Sure, it’s not equally easy for everybody and it’s not going to turn someone into a virtuoso programmer overnight any more than that you can learn how to play the piano overnight. And when it comes to the piano there are plenty of people that are able to play piano even though they’re not the next Arthur Rubinstein or Ivan Ilic.

Of course there are exceptions, and I’m sure there are a couple of people in the world that could not learn how to program if their life depended on it, just like there are people that really can’t learn how to swim. But that really is a very tiny minority, the majority of the people out there can learn how to program to some degree. For one, this all revolves around what it actually means to program a computer. After all if we’re going to decide to be inclusive or exclusive we should really concentrate on what it is that we’re trying to achieve here, for me it’s simple: if you know how to tell a computer how to do something to create a result that it would not come up with by itself then you’re a programmer in my book. Computers are brain amplifiers, and knowing how to use them gives you a leg up on those that don’t which is why it pays off for those that have the skill to pretend that it is something special that you can’t simply go out and learn.

If you say someone isn’t a real programmer then you’re falling right into the No True Scotsman fallacy, where say making a spreadsheet is ‘not true programming’ because ‘No True Programmer’ would use a spreadsheet to solve a problem. But in my eyes being able to use a spreadsheet is already one step up the ladder and it does make you a programmer. Now you can choose whether to get better at it or not but you’re already off the ground and flying. Plenty of businesses use spreadsheets for business critical stuff, and it’s not rare at all to hear from the users of these that it’s the only way they can get through their day, they’d much rather have ‘a real programmer’ do the work for them but the IT department is too busy to attend to their needs. And so they fixed it, all by themselves.

The IT world is rife with this ‘no true programmer’ nonsense, the ‘real’ programmers are the ones that have mastered ‘x’ (insert name of arcane and difficult to use programming language here), the rest is still stuck on ‘y’ (insert accessible and easy to use language here). The latter of course aren’t real programmers, if they were then they would get it.

Programming is not a binary skill, not something that you either have or you have no hope of achieving it. Being able to program ‘just a little bit’ is better than nothing at all, just like being able to swim ‘just a little bit’ might be enough to save your life one day. Making a spreadsheet or a mathematica notebook is very closely related to the branch of computer programming we call ‘functional programming’, the fact that there is some magic under the hood is no different from someone writing a program in say ‘python’ or ‘perl’ or some other interpreted language where a ‘real’ programmer uses some other program to let better programmers translate their wishes and desires into a stream of instructions the computer is better able to comprehend. And let’s not kid ourselves, very very few people are programmers at the level of a Linus Torvalds, Peter Norvig or Fabrice Bellard. Imagine them telling you that you’re hopeless and you’ll never really get it so you might as well give up now.

And it’s not as if programmers themselves aren’t really aware of the fact that ‘programming’ is not a binary ability, there are plenty of contests for programmers to show off their art and by whatever objective measures they use they seem to underscore that not everybody is equally talented. So why cut off those at the bottom entirely? That’s just an attempt at creating an ivory tower, a way to say ‘hey, I’m better than you because I can do ‘x’ and you can’t’. Imagine athletes getting together and saying: there’s nothing I can do for these wanna-be runners, they either can run or they can’t and if they can’t they can never learn it. Imagine piano teachers telling the majority of their customers ‘you’re hopeless, you’ll never get it’.

When it comes to programming the biggest differentiator in whether or not someone will be able to learn it at a level where they can hold down a job seems to be more rooted in passion than in anything else. When teaching someone how to program if there is an interest and a will then more likely than not they’ll get to some degree of proficiency, very few people get stuck at rock bottom and are unable to grasp any of the concepts. At some level even writing instructions for others is a way to learn how to program and usually the fact that a person says ‘x can’t learn how to program’ says more about the teacher than about ‘x’.

So, programmers: get off your high horse. Our skill is not very special and almost everybody can learn how to do it to some degree. I’m willing to take a bet on this: anybody that wants to program but for some reason has been told they can’t is welcome to start a correspondence course with me for $0 and I’ll do my very best to help you along the way to the point where the coin drops and gravity takes over. After all it’s mostly a problem of insight rather than one of inborn native ability and if you can write a shopping list, write instructions on how to make coffee in your house when you’re not there yourself or if you can make a (simple!) spreadsheet then you already know a little bit about how to program. The most important factors in whether or not someone will end up being a bad, good or great programmer are experience, passion and dedication. Combined those can overcome quite a bit of lack in the talent department.

For the record: I took three ‘cracks’ at learning how to program, and only the third time did that coin drop, this was in total over a three year period in bloody ‘BASIC’ and all that time I really wanted to learn how to program but I just simply didn’t get it. I definitely wasn’t a ‘real programmer’ by the definition of some and for sure they would have given up on teaching me if it had been a face-to-face affair. They might even have been able to make me give up. But then one day it clicked, I recall vividly how suddenly I understood what a variable was and then an array. And from there it went quickly but the initial learning for some reason was extremely hard for me. We’re quick to forget how hard our initial mastery of a subject is and of course it is nice to belong to some elite group. But talking people down and telling them what they can’t do is really bad form, instead we should be lifting those that wish to acquire the skill up.

Your Head As A Battleground, Dueling Memes

There’s a war going on and your head has been designated the battleground. For every battle there are simple objectives: switch your loyalty from one brand to another, make you vote for a certain group or person, join a religion, a certain school or pursue a career etc.

Many such battles are waged with simultaneous campaigns. The weapons used include images, videos, text, print, music, television programming, product placement and many other strategies. Psychologists are recruited in order to tune these media elements to push your emotional buttons harder and longer. The idea behind this war is that you have a limited amount of attention that you can freely spend, a kind of mental equivalent to ‘expendable income’. And every minute of your attention and brain cycles that you spend on something other than furthering the cause of the purveyors of the current campaign is - in their eyes - an opportunity lost to make money or to recruit you into their ranks.

This results in our lives being utterly dominated by all kinds of advertising and messaging, the more direct and targeted the more effective because of the increased ability to elicit an emotional response which will likely result in swaying our judgment one way or another. Armies of creative people, art directors, computer programmers, focus groups and psychologists are the officers setting the stage for the battle going on in your head. And then there are the footsoldiers, friends and family as well as colleagues and acquaintances, basically anybody you come into contact with who has been successfully conscripted and is now furthering the cause. And it’s not just overt advertising either: plenty of it is advertising masquerading as something else. People posting on some forum while in the pay of some entity, guiding the discussion in a direction that benefits their employers, so called shills. And as soon as they’ve successfully derailed the discussion in their preferred direction they’ll fade into the background letting the regular users do the rest of their work for them. Fake product reviews on e-commerce sites. So called ‘advertorials’ where the content looks as though it is a normal article but actually it is a message that someone paid to get out there. Or maybe it is a video that gets you to forward it to all your friends and then a few days later you find out that it really was the lead-in to some kind of campaign.

Imagine political opponents that are trying to get you to vote for them, or two brands of toothpaste, sugared beverage, cigarettes or bottled water that want you to switch your allegiance to their side. A constant saturation bombardment is called for, after all if you’re not seeing or thinking about a message that involves the one party you might be seeing a message by the other and whoever gets to tell their story most effectively and most frequently tends to win these battles. It’s called Mind Share, and it’s the yardstick by which the effect of the war is measured.

The battle for mind share is an old one. For instance, if you look at religion from the point of view of dueling memes then you’ll notice that quite a few religions have a whole array of tricks of the trade to make sure that they - and not some other religion - win the battle for your brain. Youthful indoctrination is one (‘get them while they’re young’), an element of exclusivity (‘the one true god’), a viral component (‘spreading the word’) and so on. The same goes for propaganda during actual wars, demonizing the enemy and other tactics that you can trace to ancient Rome and even further back. Modern advertisers have taken all these lessons to heart and have added a whole raft of their own using our increased technological abilities and our further insights into how the mind works.

With the advent of the internet and its most popular applications (email, www) the battle has moved into higher gear. An ever larger amount of these applications is dedicated to getting you to become affiliated with some party or other or to spend your money in a certain way. The arms race is absolutely incredible, both in ferocity as well as in the amount of money spent. 500+ $US Billion spent on advertising alone. There were some forms of advertising that I could deal with: display advertising in print, a small and to the point textual ad on a webpage, that sort of thing. But the volume and intensity have been cranked up to 11 and it is getting harder and harder to see the wood for the trees.

Here is a great video about this subject (thanks HN user elisee): This Video Will Make You Angry

I’ve decided to ‘opt-out’ of this war entirely. I refuse to let my head be used by brands, politicians and others because there isn’t enough time in the day for all the stuff I’d like to do to begin with. So any time that I devote to consuming advertising is going to come out of my private enjoyment budget. That’s a pity because I do understand that advertising pays for quite a few services and content online that would otherwise not be available for free. But from now on, if I want a product or a service (or even a religion) then I’ll go out and research, trying very hard to spot both second hand advertising (friends, family and colleagues repeating advertising messages just as infected disease carriers spread pathogens) and advertising masquerading as regular communications between people, such as paid reviews, product placement in movies and other media as well as people that you ‘accidentally’ overhear in elevators. It’s getting harder by the day to really shield yourself from purposeful messaging in order to obtain mind-share. But I’m optimistic that by cutting out the major offenders my life will be more quiet and that my decisions will - hopefully - be more objective.

And by making this page I’ve sent yet another ‘meme’ into the world but I hope it acts more as a vaccine than another pathogen.

After all, if all the parties in a war are pushing you to ‘join their side’ then they are trying very hard to make you forget that there is another side: your side, non-participation in the war, the ability to opt-out and to refuse to become an unpaid foot-soldier for any one party.

Evercookies in the wild, Kia, Mazda, German & Polish Newspapers, Piracy Honeypots and more

Of all the privacy violating tracking methods on the web there are two that are particularly bad, the first one is called ‘evercookies’ for being particularly hard to get rid of, the other is called Browser Fingerprinting and is impossible to detect.

Evercookies are to regular browser cookies just as superglue is to cellotape. Evercookies work by storing cookies on your computer using a large number of different techniques and upon refresh re-creating all of the cookies if you have tried to delete them. An evercookie sent to your browser has a very high likelihood of remaining on your system until you re-install it from scratch. On a hunch, while doing the 1,000,000 homepage crawl I decided to see if anybody actually uses evercookies for their tracking rather than to just appreciate it as a neat proof-of-concept. The trigger for the evercookie detector is particularly naive, it simply checks for the presence of the word ‘evercookie’ in the URL since that’s the name of the script as originally distributed. I expected to see some activity, particularly in the lower regions of the toplist where less well known websites unconcerned with their public image and brand name damage when found out might engage in this practice.
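A detector along those lines takes only a few lines. The sketch below is an illustration of the idea and not the code from the crawler: it assumes the resource URLs of interest are the `src` attributes of script tags, matches the word case-insensitively (also an assumption), and uses made-up names and `example.com` URLs throughout.

```python
from html.parser import HTMLParser


class ScriptSources(HTMLParser):
    """Collect the src attribute of every <script> tag on a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_evercookie_urls(html):
    """Return the script URLs that contain the word 'evercookie'."""
    parser = ScriptSources()
    parser.feed(html)
    return [url for url in parser.sources if "evercookie" in url.lower()]


# Hypothetical homepage, only here to exercise the detector.
homepage = (
    '<html><head>'
    '<script src="https://example.com/js/evercookie.js"></script>'
    '<script src="/js/app.js"></script>'
    '</head><body></body></html>'
)
print(find_evercookie_urls(homepage))
```

A check like this misses any site that renamed the script, and it says nothing about scripts that are pulled in later by other scripts unless those are crawled as well.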

Indeed, I found a bunch of those. But that’s not all that I found, I also found them on a couple of sites with high visibility and big brand names!

There are two main groups of evercookie using sites: those that have the evercookies on their main domain and those that have them indirectly by choosing the wrong partners to trust with their users’ browsers. Such rogue inclusions by service providers were exactly the kind of abuse I had in mind when I started the 1,000,000 homepage project in the first place, and evercookies are a particularly good example of the kind of nastiness website users can be exposed to when external resources are included in an otherwise innocent looking webpage from a party that they trust.
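Telling the two groups apart comes down to comparing the host that serves the script with the domain of the page. The following is a rough sketch of that comparison, not the analysis code itself: the `classify` function is invented for this example, it treats any subdomain of the page's domain as 'direct', and it knows nothing about sister domains owned by the same company.

```python
from urllib.parse import urlparse


def strip_www(host):
    """Drop a leading 'www.' so www.example.com and example.com compare equal."""
    return host[4:] if host.startswith("www.") else host


def classify(page_domain, script_url):
    """Return 'direct' if the script is served from the page's own domain."""
    # Relative URLs have no hostname and are served by the page's own site.
    host = urlparse(script_url).hostname or page_domain
    host = strip_www(host.lower())
    page = strip_www(page_domain.lower())
    if host == page or host.endswith("." + page):
        return "direct"
    return "indirect"


# Hypothetical examples.
print(classify("example.com", "https://static1.example.com/ec.js"))
print(classify("example.com", "https://tracker.example.net/ec.js"))
```

The first call is classified as direct and the second as indirect, which is the same split as in the two tables further down.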

It is very well possible that the companies mentioned here that include the evercookies only indirectly are simply not aware of the fact that they have opened up their visitors to this danger.

Besides the expected marketeers, porn sites, scammers and advertisers (who ever expected those to have something in common?) there are also some more surprising entries.

For example: car manufacturers KIA and Mazda have evercookies on their Russian corporate websites, included indirectly from one of their service providers (a company called ‘exebid.ru’ for KIA and ‘facetz.net’ for Mazda). So even though the two car manufacturers do not engage in the practice themselves, such big and well-known brands have absolutely no business being seen near such technology, let alone using it, even if indirectly, on their websites. When you include javascript components or iframes on your website from third parties you are responsible to your end-user for whatever those third parties end up serving to your customers.

Particularly disturbing is that I found hard evidence of indirect evercookies on the sites of many Polish newspapers including gazetawroclawska.pl, dzienniklodzki.pl (indirectly, through a site called ‘Gratka.pl’) and finally served up directly from the websites of German Newspapers allgemeine-zeitung.de and www.echo-online.de.

What reason newspapers have to attempt to track their visitors for ever is beyond me and especially in the case of the German newspapers this is quite possibly illegal. Contrary to EU law no warning or agreement was asked before these practically un-deletable cookies were placed on my computer, they - unlike the car brands and the Polish newspapers - don’t have any fig leaf to hide behind because the evercookies were served directly from their main domains.

One would expect newspapers to be at the forefront of the protection of the privacy of their users, not working as hard as possible to erode that privacy.

Finally, I think I’ve found evidence that moneyplatform.biz and all the associated domains are a filesharing honeypot. I can’t think of any other reason for having evercookies on a filesharing site unless the goal is to build up a case against uploaders/downloaders of pirated content.

The topmost offending domains that use evercookies are listed here (these are the ones that I detected, there could very well be more, for instance because the script could be renamed):

Domains using evercookies directly:

Domain | Evercookie Url | Kind | Comments
paidviewpoint.com | paidviewpoint.com... | Marketeers | You really have to love the 'about' page on this one, pretend they are really hot on privacy.
k2s.cc | k2s.cc/ext/ev... | filesharing | I guess they want you to share just a little bit more about you
fboom.me | static2.fboom.me... | filesharing | safe, secure and 4ever
keep2share.cc | keep2share.cc... | filesharing |
profittask.com | profittask.com... | Russian scam mturk clone |
echo-online.de | www.echo-online.de... | German newspaper |
deccoria.pl | deccoria.pl/j... | Polish online pinboard |
allgemeine-zeitung.de | www.allgemeine-zeitung.de... | German newspaper |
nbamania.com | nbamania.com/... | Chinese sports site |
moneyplatform.biz | static1.moneyplatform.biz... | Parent of keep2s.cc, k2s.cc, keep2share.cc and fileboom.me | 404's, but on other sites still there. Filesharing sites with evercookies, seems like they're either a filesharing honeypot or an accident waiting to happen
uduba.com | uduba.com/lib... | Russian meme site |
grabo.bg | grabo.bg/ever... | Bulgarian groupon clone |
existenz.se | existenz.se/e... | Swedish link sharing page |
pornme.pm | www.pornme.pm... | German porn site | Porn sites uncareful with users privacy, what could possibly go wrong
ptrack1.com | www.ptrack1.com... | Survey fill-out farm |
keep2s.cc | keep2s.cc/ext... | another keep2share url |
person.com | person.com/ec... | adult chat site |

Domains using evercookies indirectly:

Domain | Evercookie Url | Kind
mazda.ru | front.facetz.net… | car manufacturer
urbangroup.ru | front.exebid.ru… | Russian real estate site
expressilustrowany.pl | statystyki.gratka.pl… | Polish Newspaper
gazetawroclawska.pl | statystyki.gratka.pl… | Polish Newspaper
dzienniklodzki.pl | statystyki.gratka.pl… | Polish Newspaper
kia.ru | front.exebid.ru… | car manufacturer
trial-sport.ru | front.facetz.net… | Russian sports site

The Polish media sites seem to be linked to ‘gratka.pl’ tags included on other websites, which in turn pull in the evercookie javascript, and this in turn may have something to do with the fact that the evercookie has been created by Polish hacker Samy Kamkar, who did the world a pretty good service by pointing out this dangerous possibility.

The code for the crawler and analysis is up at github.com/jacquesmattheij/remoteresources.