Disclosure: this post contains Amazon affiliate links. I bet you’re smart enough to figure out which ones they are.
As we were driving back from Palm Desert to Pasadena earlier this month, the Ur-Guru pointed out a problem with the current craze for cloud computing. No, not the problem I’ve talked about before—slow broadband or no broadband at all making it impractical if not impossible to store or transfer large files online—though that problem is a long way from being solved. It’s the reason the “prosumer” photographers with their 10-megapixel cameras are all lining up to buy the DroboPro. The Ur-Guru will be going home with about 70 GB of photos from this trip alone. (The RAW files are sitting on my Maxtor Shared Storage II drive until he tells me it’s safe to delete them; he’ll have more than a few backup routines to run when he’s back in his world-famous home office.) You try uploading that over your home cable or DSL line. I dare you.
But in spite of the bandwidth bottleneck in getting our data into the cloud, more and more of us are either storing our documents and photos online or creating them there in the first place. It’s not just the online backup services, but the photo sharing sites (the Ur-Guru likes Flickr, whereas my mother prefers Picasa), the social networks, the blogs, and the file sharing and collaboration tools like Google Docs, Box.net, and Huddle Workspaces. And then there are all those Kindle books hanging out on Amazon’s Whispernet servers waiting to be downloaded—and maybe sucked back up again when Amazon finds out it’s made a mistake. (That last episode points out another problem with online storage: if your only copy of the data lives on someone else’s server, is it really yours?)
But there’s a downside to all this online storage, as connected as it makes us all, and as useful as it can be to get our critical data off-site in case of theft, fire, or natural disaster. The data you back up online, the photos you post to Flickr, the PDF of your resume that you just dropped into Box.net so it would show up on your LinkedIn profile, even the website you can’t conduct your business without: all of those live on servers in data centers. This is the good news and the bad news. Your website and mail server have to be up and running 24/7, of course. That’s just as true if you run them out of your home office as if you hire a web hosting company.
But what about the rest of it? Sure, some of it is there because we want anyone to be able to see it at any time, whether we’re around to show it off or not. That means it has to be on a computer that’s using power. But what about the data that really is just backup? Or even reference or archival material? My colleague Eve Abbott says that 80% of paper that gets filed is never referred to again. (And after a year has passed, it goes up to 90%.) I’m pretty sure that statistic holds true for digital documents, too. It’s easier to search through our electronic detritus, but if you go back and look at all the articles and white papers you’ve saved over the years, I bet you’ll be amazed at how many of them there are—and how few you can remember anything about.
That means that not only don’t we need immediate access to a lot of what’s going onto those 24/7 RAID-whatever servers in the high-security data centers with the enormously high power bills, we probably don’t need it at all. Most backup software is configured for lazy people: it backs up everything, automatically, so you don’t have to think about or do anything. Anyone who has experienced data loss is likely to err on the side of caution anyway, so unless there are severe limitations on that online storage space and harsh financial penalties for overrunning the quota, very few people are going to sit down and triage the contents of their hard drives before sending their data up into the cloud—never mind afterward.
The Ur-Guru was all set to go look up statistics for the cost of operating a data center so he could figure out the math of exactly how much it really costs (and how much power it uses) to store everything online. Me, I started avoiding math as soon as I left high school; taking calculus once was enough. Besides, I don’t think we need the actual figures in order to reach some sobering conclusions.
Data stored on tape, CD, DVD, or a disconnected hard drive requires no power at all. The storage media aren’t particularly green—and never have been—but they aren’t using up electricity. When I shut down my computer, my USB drives power off. The Rebit that I use to back up my netbook spends most of its time unplugged on the computer stand, because the netbook itself spends most of its time there, except when I’m traveling or out at a business meeting. There’s no need to have it running all the time. Likewise the four Western Digital drives that the Ur-Guru brings with him on his visits wait patiently, unpowered, for him to return with a full SD card and offload new photos.
Because these external drives are not spinning all the time, we can expect them to live much longer than the drives in the RAID boxes in data centers. Those are designed to be easy to replace without interrupting the machine’s function or causing any data loss, and it’s a good thing, too: constant use wears them out. They have to be replaced by the boxload. That means more toxic electronic components to dispose of than if you were backing up at home.
Is there a way to get the benefits of online backup and storage without increasing the need for power at a time when we’re all trying to save energy? Possibly. In most cases, it’s not really necessary for backup to be continuous or for restoration to be instantaneous. In theory, then, data could be stored on removable drives and handled in much the same way the automated tape libraries in large corporations handle tapes. Retrieving that data would take a short while, since the appropriate disk would have to be found and inserted into a drive bay, but restoration would still be faster than with tape. But to utilize this kind of storage, service providers would have to invest in entirely different facilities and infrastructure. FalconStor could see a real boom in its sales of Virtual Tape Libraries if the world of online storage decides to go green.
I don’t think you’ll see everyone and his brother rushing into that, however. Setting up a VTL requires serious capital outlay, much more than renting space in a data center. That cost would inevitably get passed on to the user. It would take some serious dedication to reducing your carbon footprint to choose an expensive service that doesn’t use power just to store your data over a low-cost—or even free—service that does. Especially when you could be equally “green” by just making your backups at home and storing the drives in your safe-deposit box.
The Ur-Guru thinks the real solution probably involves a storage medium that hasn’t been invented yet, but says that all conversions to alternative power sources and more efficient systems require an outlay of capital. If we take that as given and are willing to invest in some R & D, then it would probably be possible to reduce the world’s collective power bill for data substantially even before we invent that new storage medium:
My idea for a VTL was not about using disks as tapes and having robotic arms load them, but rather using entire arrays of disks and having a robotic arm move entire sections in and out while some Drobo-like software makes duplicate copies when needed, caches data for users who are using restore more often, or use some statistic saying which user might be restoring in the next x amount of time, etc. It’d be quite a cool thing to work out and make work but with logic applied you could still design it.
What I want to take out of the equation is “physical storage units allocated to x users” and turn the data into a dynamic movable chunk so the data for one or more users can “move around” in the storage units that make up an entire grid, offloading any disks that contain data that is hardly ever accessed or updated and basically powering them off.
Maybe instead of robotic loader arms some kind of electronic solution might also work where entire arrays can be powered down or run on low power, only to be powered up when needed (to move “offloaded” data to “cache”).
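For the programmers in the audience, here is a minimal Python sketch of the offloading idea described above. It is only an illustration under simple assumptions: the names (Chunk, Disk, pack, plan_grid), the "days since last access" measure, and the cutoff value are all hypothetical, and a real system would also need the duplication, caching, and restore prediction the Ur-Guru mentions. The sketch sorts data into recently used and rarely used chunks, packs each group onto its own set of disks, and marks the rarely used disks as powered off.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A movable piece of one user's data (hypothetical model)."""
    owner: str
    size_gb: int
    days_since_access: int


@dataclass
class Disk:
    """One storage unit in the grid; 'powered' is the state we plan for it."""
    capacity_gb: int
    powered: bool = True
    chunks: list = field(default_factory=list)

    def free_gb(self):
        return self.capacity_gb - sum(c.size_gb for c in self.chunks)


def pack(chunks, capacity_gb, powered):
    """Place chunks on disks, largest first, using the first disk with room."""
    disks = []
    for chunk in sorted(chunks, key=lambda c: c.size_gb, reverse=True):
        if chunk.size_gb > capacity_gb:
            raise ValueError("chunk is larger than a single disk")
        target = next((d for d in disks if d.free_gb() >= chunk.size_gb), None)
        if target is None:
            # No existing disk has room, so bring another one into the plan.
            target = Disk(capacity_gb, powered=powered)
            disks.append(target)
        target.chunks.append(chunk)
    return disks


def plan_grid(chunks, capacity_gb, cold_after_days):
    """Keep recently used data on powered disks; offload the rest to idle ones."""
    hot = [c for c in chunks if c.days_since_access < cold_after_days]
    cold = [c for c in chunks if c.days_since_access >= cold_after_days]
    return pack(hot, capacity_gb, True) + pack(cold, capacity_gb, False)


# Made-up example data: two users with recent activity, two without.
example_chunks = [
    Chunk("user_a", 70, 2),
    Chunk("user_b", 40, 400),
    Chunk("user_c", 50, 500),
    Chunk("user_d", 20, 1),
]
grid = plan_grid(example_chunks, capacity_gb=100, cold_after_days=90)
for disk in grid:
    state = "on" if disk.powered else "off"
    print(state, [c.owner for c in disk.chunks])
```

Rerunning plan_grid as access patterns change is what would let data "move around" the grid. The hard parts, such as deciding when a move is worth its own energy cost, are deliberately left out of this sketch.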
That sounds like just the sort of mathematically complicated problem that the Ur-Guru would enjoy and that I went into the humanities to get away from. But it’s an intriguing idea. Any enterprising entrepreneurs out there interested in making it work?