Last night I missed a great opportunity. I could have asked Google co-founder Larry Page what kind of backup system Google, Inc. uses, but I forgot, so now I’m going to have to find out by other means.
You may not know this (I didn’t), but in order to provide its indexing services, Google has to download and store a copy of essentially the entire World Wide Web. Most of us, as individuals or business owners, only need to worry about backing up our own websites. Google has to back up everyone else’s sites, too. (Which is, of course, why pages can still exist in the Google cache after they’ve been taken down from the Web, but Yours Obliviously never thought of that before.)
What was I doing at the Googleplex last night, anyway? Attending a University of Michigan Alumni Networking Night, actually. It turns out that Larry Page was at U-M during the same years I was, but since he was an undergrad in Engineering and I was a grad student in Classical Studies, our paths never crossed. While Larry was building programmable plotters out of Lego™ and attempting to convince the administration that they should replace the diesel buses between North and Central Campus with a monorail, I was working as a teaching assistant, translating and producing Greek and Roman plays, and attempting to write a doctoral dissertation. (I never did finish it, but then, Larry never finished his PhD at Stanford, either. On the other hand, I don’t have a billion-dollar IPO to show for my non-doctorate.)
Even without asking the crucial question, however, I can say several things about Google in relation to backups. The first is that without Google Alerts, I would long ago have run out of things to write in this newsletter and post on the Backup Blog—at least, without spending all of my time doing research instead of working. (Google, of course, has changed the nature of research.) I signed up for a daily alert on the subject of “backup,” and every day I get a collection of news headlines. About a third of them relate to sports, a third to traffic jams (a type of backup I did indeed encounter on my way down to Mountain View), and the other third to computer backups, with occasional references to something else. It was through Google that I heard about the Bank of America scandal, for instance, and many other topics that have appeared here over the last several months.
Then there’s Google’s new joint venture with Michigan: scanning 7 million books from the U-M libraries, the better to make them searchable. Naturally Larry didn’t get anywhere remotely near revealing any proprietary technology that might be involved in that project, but he did assure us that it was a non-destructive process and the books themselves would remain intact. Given that many of those books are rare or out of print (or soon might be, given the limited print runs of most academic books), the project acts as a massive library backup system. (See Google Print for sample results.)
Which brings me back to my original question: how in heck do they back all that up? I promise to let you know as soon as I find out.