
Monday, 13 December 2010


Stephen Hill


The rhetoric of this post betrays some of the differences between traditional broadcast infrastructure and the new era of web-based infrastructure.

Web services are built on hardware and network infrastructure, but operate entirely through software. Even highly standardized software like the Apache web server has hundreds of setup and operational variables, each of which increases entropy and decreases reliability. When you add custom software to create any kind of practical web service, the variables (and therefore the possible bugs) multiply exponentially.

I'm not a CTO or even close, but as a small web music service provider, we have been forced to grapple with the Inescapable Truths of Online Reliability, which go something like this:

1. Reliability is inversely proportional to complexity in a hardware/software system.

1A. The larger the user base and/or the more functionally sophisticated the site, the more complex the hardware/software system must be... and therefore the less reliable it is.
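Point 1/1A can be made concrete with a back-of-the-envelope calculation (my illustration, not from the post): if a service depends on n components in series, each independently working with probability r, the whole system works with probability r^n, which drops off fast as complexity grows.

```python
# Sketch of point 1/1A: reliability of a serial system falls
# exponentially with the number of components it depends on.
# The component counts and 99% figure are illustrative assumptions.

def serial_reliability(r: float, n: int) -> float:
    """Probability that all n independent serial components work."""
    return r ** n

# 10 components, each 99% reliable: ~90% overall
print(round(serial_reliability(0.99, 10), 3))   # 0.904
# 100 components, each 99% reliable: ~37% overall
print(round(serial_reliability(0.99, 100), 3))  # 0.366
```

Ten-fold more moving parts at the same per-component quality takes you from "mostly up" to "mostly down," which is the intuition behind 1A.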

2. Reliability can be bought at a premium by adding servers, load balancing, "hot spares" and redundant functionality. However:

2A. Each increase in real or virtual (cloud) hardware and software makes the overall system more challenging to manage. More servers also present a larger attack surface and increase security issues unless the right preemptive steps are taken to defend them.
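The redundancy trade in point 2 also reduces to simple probability (again my sketch, with illustrative numbers): n independent replicas, each available with probability a, give availability 1 - (1 - a)^n. The math side looks great; 2A is the warning that each extra replica also adds management and security cost the formula ignores.

```python
# Sketch of point 2: redundant replicas multiply away downtime,
# assuming (unrealistically) that failures are independent.

def parallel_availability(a: float, n: int) -> float:
    """Probability that at least one of n independent replicas is up."""
    return 1 - (1 - a) ** n

print(round(parallel_availability(0.99, 1), 6))  # one server:  0.99
print(round(parallel_availability(0.99, 2), 6))  # hot spare:   0.9999
```

The independence assumption is exactly what correlated failures (shared power, shared load balancer, shared bad deploy) break in practice, which is why the premium in point 2 is real.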

3. You can buy "five nines" of uptime (= 5 min/year of downtime) for a big premium, but it can never be 100% guaranteed. Each .9 increase in reliability will be roughly 5 to 10x more costly. Besides, all a guaranteed Service Level Agreement really gets you is a better attitude from the vendor and a credit when things inevitably fuck up.
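The nines arithmetic in point 3 is easy to check (my calculation, using a 365-day year): each extra nine cuts the allowed downtime by a factor of ten, while the cost per nine rises 5-10x.

```python
# Sketch of point 3: annual downtime allowed at each availability tier.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 (non-leap year)

def downtime_minutes(availability: float) -> float:
    """Minutes per year the service may be down at a given availability."""
    return (1 - availability) * MINUTES_PER_YEAR

for a in (0.99, 0.999, 0.9999, 0.99999):
    print(f"{a:.3%} -> {downtime_minutes(a):,.1f} min/year")
# "Five nines" works out to about 5.3 min/year, matching the post's figure.
```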

4. The growth curve of the most highly visible and successful Internet sites (like Twitter) makes the problem of scaling infrastructure under load 100x more difficult.

5. It makes more sense to plan for minimizing recovery time after outages than for preventing them completely.
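Point 5 has a standard justification (my addition, with illustrative numbers): steady-state availability is MTBF / (MTBF + MTTR), so shrinking mean time to recovery buys availability even when you can't make failures rarer.

```python
# Sketch of point 5: same failure rate, faster recovery, better uptime.

def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# A failure every ~30 days (720 h); recovery in 4 h vs. 15 minutes.
print(round(availability(720, 4.0), 5))   # 0.99448
print(round(availability(720, 0.25), 5))  # 0.99965
```

Cutting recovery from hours to minutes gains most of a nine without touching the failure rate, which is why recovery drills beat chasing perfect prevention.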

6. Except for the goal of 100% reliability, broadcast infrastructure management practices are largely irrelevant. The valid comparison would be to an entire broadcast network, not a single station. Broadcast infrastructure remains at a relatively fixed size once operational, regardless of the number of listeners/viewers, and can be optimized over time. Digital network infrastructure has to "scale" and change constantly to support millions of users, and is much more difficult to optimize and manage.

Considering the above, it is quite a remarkable achievement that some sites, like Google, Amazon, Flickr, Yahoo and Facebook, are as day-to-day reliable as they are. Twitter is a particularly troubled example of a site that has had difficulty keeping up with its growth.

BOTTOM LINE ON WEB RELIABILITY: Easy to say -- very, very hard to do.
