Friday, March 30, 2007

Rotating Backup Media and Other Tricks: FileSlinger™ Backup Reminder 03-30-07

Alas, I am in a tearing hurry this morning. I hate being in a hurry. I’m good with deadlines, but I’m not at all good at hurrying. Things tend to take longer when I hurry.

Despite the fact that I don’t have time to write a regular column, I do have several handy bits of backup-related news to share, so I’m going to direct you to some other sources of useful information.

The last time I talked about rotating backup media was in 2005, as part of the general discussion of tape backups. The most complex backup rotation schedule (at least of those I was surveying) is called the Tower of Hanoi.

A company called BackupCheckup has now created a web-based Tower of Hanoi Backup Scheduling Assistant. A friendly member of their staff pointed it out to me in an e-mail recently. Here’s what the website says about the service:
The Tower of Hanoi method of tape rotation provides for archiving an effective number of backups. It also gives the maximum ability to go back over time with a limited number of tape sets. The disadvantage is that it is one of the more complex rotation methods to keep track of.

Backup Checkup’s Tower of Hanoi scheduling assistant will keep track of which tapes to use next and will send you a friendly email reminder on the date the backup is to be run. This greatly simplifies the task of tracking which tapes to use. You can also come back here to determine which set of tapes to use when you need to restore data.
It’s a free service, so if you use removable media like tape, REV drives, or even flash drives and keep getting confused about which tape goes with what day’s backup, you might want to check it out.
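
For the curious, here is a minimal sketch of how a Tower of Hanoi rotation can be computed. It is not BackupCheckup's code, and the names `hanoi_set` and `schedule` are my own inventions for illustration. It assumes the usual description of the scheme: media sets are labeled A, B, C and so on, set A is used every second session, set B every fourth, set C every eighth, and the last set picks up whatever sessions are left over.

```python
def hanoi_set(session: int, num_sets: int) -> str:
    """Return the media set label (A, B, C, ...) for a 1-based backup session.

    Hypothetical helper: the set index is the number of times the session
    number divides evenly by 2, capped at the last available set.
    """
    if session < 1 or num_sets < 1:
        raise ValueError("session and num_sets must both be at least 1")
    # (session & -session) isolates the lowest set bit of the session number;
    # its bit position equals the count of trailing zero bits.
    level = (session & -session).bit_length() - 1
    level = min(level, num_sets - 1)
    return chr(ord("A") + level)


def schedule(sessions: int, num_sets: int) -> str:
    """Return the rotation for the first `sessions` backups as one string."""
    return "".join(hanoi_set(s, num_sets) for s in range(1, sessions + 1))


print(schedule(8, 3))   # ABACABAC
print(schedule(16, 4))  # ABACABADABACABAD
```

Each extra set doubles how far back the oldest backup reaches, which is where both the scheme's appeal and its bookkeeping headache come from.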

To any backup vendors who might be reading this: I’m happy to investigate and mention your product if you approach me politely and it’s appropriate for my readership (small offices and home-based businesses). I’d much prefer a direct approach to a quasi-spam posting in the comments, though I will let those comments stand if they refer to legitimate backup products.

For those of you who use external drives or online backup services and don’t need a rotation schedule, here are a couple of other things that have come down the pike recently.

Scott Hanselman accidentally deleted all his father’s e-mail and set up a family backup policy to prevent a recurrence. His account, including hand drawings of the new setup, is here.

Mac OS X Hints has published a script for making fast backups to an ISO image (a disc image file that you can burn to a CD). The author of the script, “Eldino,” gives an engaging account of why he created it and what it does, starting out with
I bring my iBook everywhere, and I store on its drive 80% of my important data, since I use it for studying, working, scripting, doing projects for university, listening to music, downloading stuff, browsing, printing nice articles found on the Web to PDFs, etc. Scared by the fact that hard drives fail when you least expect it—and trust me, they still fail if you (like me) love your hardware more than your girlfriend.
If you’re a Mac user and want to be able to back up your drive to an image, check out the article.

So that’s it for today. I hope you find something useful in at least one of those articles. Now it’s time for me to go earn a living.


Friday, March 23, 2007

Where Do Those Statistics Come From? FileSlinger™ Backup Reminder 03-23-07

Last week I got an e-mail asking about sources for statistics on hard drive crashes:
I've seen all of the hard drive crash statistics around the internet, several of which you reference on your site, however I've yet to find the source for any of the most startling. The one I'm most interested in, and hoping you can help me find the source of, is this:

"1 in 5 computers suffers a fatal hard drive crash in its lifetime."

Do you know the source of this? Any similar statistics you do have sources for?
I couldn’t remember the source of that offhand, but I do try to cite sources when I know them, and the question is one worth asking.

Before you can accept the validity of statistics, it’s important to know where they come from. For instance, a couple of weeks ago I talked about the way manufacturers claim their drives have a “mean time to failure” of a million hours. Given that hard drives were only invented in 1956, no drive has actually lasted a million hours, so where do numbers like that come from?

From math of the sort I deliberately avoided, actually, and I won’t embarrass myself by trying to duplicate the equations here. You can read more about calculating Mean Time Between Failures (for things that can be repaired) and Mean Time To Failure (for things that can’t).

Bit-Tech.Net provides an explanation in English:
The MTBF is attained from running a large batch of drives, sometimes hundreds sometimes thousands, and measuring how often a drive fails. Given a batch of 2,000 drives, if one fails on average every 25 days, the drive would be given an MTBF of 1.2 million hours. Unfortunately, it isn’t quite as simple as that.

It’s usually the case that the tests are accelerated by altering the conditions, for example, increasing the temperature. The end result is highly dependent on these acceleration factors being accurate. The number of drives tested is also not standardised, so manufacturers are free to increase or decrease the amount of drives to attain the ideal MTBF rating.
Wikipedia adds that
Many manufacturers seem to exaggerate the MTBF of their products (e.g. Hard Drives), in order to either sell more product or sell for a higher price. A common way that this is done is to define the MTBF as counting only those failures that occur before the expected "wear-out" time of the device. Continuing with the example of hard drives, these devices have a definite wear-out mechanism as their spindle bearings wear down, perhaps limiting the life of the drive to five or ten years (say fifty to a hundred thousand hours). But the stated MTBF is often many hundreds of thousands of hours and only considers those other failures that occur before the expected wear-out of the spindle bearings.
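
The arithmetic in the Bit-Tech example is easy to check. This is only a sketch of that one calculation, not of how any manufacturer actually derives its ratings, and the function name `batch_mtbf_hours` is made up for illustration. It treats MTBF as the total drive-hours the batch accumulates for each failure observed.

```python
HOURS_PER_DAY = 24


def batch_mtbf_hours(num_drives: int, days_between_failures: int) -> int:
    """Drive-hours accumulated by the whole batch per observed failure."""
    return num_drives * days_between_failures * HOURS_PER_DAY


# Bit-Tech's example: 2,000 drives, one failure every 25 days on average.
print(batch_mtbf_hours(2000, 25))  # 1200000
```

Note that no single drive runs for anything like 1.2 million hours in this test. The figure comes entirely from multiplying a short test period by a large number of drives.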
But back to the question of where the scary statistics I sometimes quote come from. It’s usually storage manufacturers, backup service providers, or data recovery companies who commission studies about things like drive failure rates and the impact of data loss. The nature of the research may vary from something as informal as the poll on the home page of the FileSlinger™ Backup Blog (which has all of 68 respondents) to something as extensive as the recent Google study of more than 100,000 disks.

Here are some of the places I’ve found statistics about data loss and drive failures:
And here, finally, is a history of the hard drive from 1956-2006.

But I can’t seem to find a specific source for the “1 in 5” quote, though it’s repeated all over the internet. I’m pretty sure I got it from the Data Deposit Box site, but they don’t mention where they got it, even though they cite sources for some of their other statistics.

I will say that the figure sounds right to me, based on my encounters with computers over the years. I’m typing this on my seventh computer, and none has yet suffered a drive failure, but as two are still alive, that means they still might. The Ur-Guru runs through several hard disks every year. I’ve encountered system failures as often as drive failures among clients, but without even having to concentrate I can think of three clients and two colleagues whose hard drives were beyond repair.

Of course, the nature of research is such that anything one study can “prove,” another study can “disprove.” You just need the right population sample.

If you know the source of that startling statistic, let me know and I’ll publish it on the Backup Blog!


Friday, March 16, 2007

Backing Up Applications: FileSlinger™ Backup Reminder 03-16-07

It’s always nice to get questions from readers. For one thing, it makes writing this newsletter go much faster.

Loyal Reader and Mac user Mike Van Horn wrote in response to last week’s article on restoring data with an example and a question:
We use Silver Keeper as our backup program. It does incremental backups and makes straightforward copies of the files. We use a couple of pocket-sized Hitachi TravelStar hard drives (40 gig?) to back up three Macs weekly, then take these home.

I've never had to restore data due to a disk death, but I do transfer data from my backup drive to my home computer, so I know how easy it is. I can move any set up file icons from one to the other, and place them where I want. Since it's USB, it's not the fastest, but I can move the entire backup across in 5 minutes.

I wish you'd say a bit more about backing up, then restoring, applications. Currently, I don't back up my applications. When I've had to re-install them, I use the CD, which has the appropriate codes. But the original CD has to be one place or the other, and if it were lost along with the computer on which it is installed (fire? theft?), how could one restore it?
The reason I don’t talk much about backing up software is that software, unlike data, is replaceable. But that doesn’t mean it’s never a good idea to back up your installation CDs, particularly for your most critical and/or expensive software, not to mention your operating system. (Do I have more than one copy of my $350 mind-mapping software? Heck, yes. And there are software licenses that go for thousands of dollars per “seat.”)

It's a good idea to make copies of the installation CDs (and the serial numbers) of the programs that matter to you. You can duplicate the CDs or just copy them onto an external drive. I'd recommend a copy for work, a copy for home, and a copy wherever you store off-site backups. The same is true for programs you downloaded, at least if you had to pay for them. (With many free programs, like Firefox or AVG Free Anti-Virus, you’re better off downloading the newest version.)

If you have lots of software and need to install the same programs onto multiple machines, you might set up an entire shared drive just to host your applications. (The Ur-Guru calls this his “deploy share.”) But don’t back them up on your website or an online backup service. Your web host is likely to suspect that you’re in the software pirating business. (Remember, you're supposed to buy a separate copy or license of every program for each computer you plan to install it on.)

These days, the only way to get around reinstalling software after a system crash or drive failure is to make a complete drive image. Drive imaging software takes a “snapshot” of your whole system and stores it somewhere else. When it works, it means that you can skip the long, tedious process of reformatting your drive, installing the operating system, installing the software, copying the data onto the computer, and fussing around for weeks to get all your little system preferences re-set and custom fonts and icons installed.

In most cases, drive images have to be restored to the same computer they were made from, or an identical computer, in order to work properly. If you have an image of one drive and have replaced the dead computer with a different one, you’re better off reinstalling the software and then retrieving the data, fonts, and so forth from the image through an “explore” feature.

The Ur-Guru uses drive images all the time, for both real and virtual systems, and has become a fan of True Image Workstation 9.1, but I haven’t had a chance to check it out myself. If you’re interested in more details, maybe I can persuade him to write something.

Meanwhile, go forth and back up!


Friday, March 09, 2007

Restoring Your Data: FileSlinger™ Backup Reminder 03-09-07

Last week a Loyal Reader asked if I could write something about restoring data from backups. This is about as important a topic as you could find, since backups won’t do you any good if you can’t restore from them.

“Restore,” in this case, means “getting your data back.” It’s one of the more logical bits of jargon in the computer industry, since the technological meaning is pretty close to the everyday meaning. Restoring data is a bit like restoring a painting: the idea is to put it back like it was and fix any damage. Ideally, however, restoring your data should be faster and less painstaking than taking the smoke and wax off the ceiling of the Sistine Chapel.

If your backups are straightforward copies of files, the restore process is as simple as dragging files from your backup media onto your hard drive. If you need the backup copy because you deleted the original, the drag-and-drop should proceed without interruption. If the original file is damaged but still in its folder, you’ll get a dialog box asking whether you’re sure you want to replace the existing file with the one you’re copying. Just tell it “yes” and you should be all set.

It’s often possible to restore individual files from drive images by using an Explorer-like feature included in the backup program. I can start up Ghost Explorer, open a Norton Ghost backup image (this takes a while, as the drive images are both large and spanned across multiple files), then navigate through the file structure to the file I want, right-click, and restore it to its original location or a different one.

Mozy, the online backup service I use, integrates with Windows and shows up in Windows Explorer and “My Computer” as “Mozy Remote Backup.” I can right-click on any file there and restore it to the destination of my choice. This does not remove it from the Mozy backup. (In fact, the only way to remove anything from a Mozy backup is to redefine your backup set, as I discovered when reshuffling my folder structure left me with lots of duplicate files in the backup, and no space left in my quota.)

If it’s the whole drive you need to restore, you normally insert the boot CD or floppy, start the program you used to make the backup, and select an image to restore. This can take some time, depending on the size of the drive image and the type of backup media. If you have to restore from multiple CDs or DVDs, it’s likely to take a lot longer than if you’re restoring over a USB 2, FireWire, or network connection.

In working with Norton Ghost, I’ve found that restoring data is much faster than backing it up. I’m sure the Ur-Guru could tell you why, if you could actually get through to him. (The waves of rubberneckers drooling over his home office page have crashed the web server several times in the past weeks, earning the wrath of the hosting company.)

Because all backup programs are different, I can’t be much more specific than that on how to restore your data. I do want to emphasize the need to test your backups by restoring files, even if you don’t want to take the plunge and restore a drive image right after you’ve made it, to be sure it worked. (This is a bit scary.)
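
If your backups are plain file copies, one way to test a restore is to restore a file to a scratch folder and confirm that it matches the original byte for byte. Here is a minimal sketch of that check. The function names `sha256_of` and `restore_matches` and the example paths are hypothetical, and it assumes the original file is still intact for comparison.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 digest of a file, read in chunks to spare memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original_path: str, restored_path: str) -> bool:
    """True if the restored copy has the same contents as the original."""
    return sha256_of(original_path) == sha256_of(restored_path)


# Hypothetical usage: compare a live file with the copy restored from backup.
# print(restore_matches("Documents/invoice.doc", "RestoreTest/invoice.doc"))
```

A matching digest tells you only that this one file came back intact, so it is worth spot-checking several files of different types and ages.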

If you can’t restore your data from your backups, you’re faced with the much more expensive proposition of trying to recover it. And anyone who’s had to do that will tell you that it’s no fun at all.


Friday, March 02, 2007

Shocking Truths About Drive Failures: FileSlinger™ Backup Reminder 03-02-07

That’s not really a hard drive inside your computer. It’s a time bomb. And if it’s more than three years old, it’s ticking pretty loudly. And if it’s giving you scan errors, its chances of survival have just plunged.

Drive failure is something most computer users experience sooner or later. All hard drives fail eventually. In some cases, however, another component fails first, leaving you with a dead computer but a living drive. If you’re really lucky, you may find yourself replacing a computer not because it’s broken, but because it’s hopelessly obsolete.

I’ve been very lucky in the drive department, myself. Not that it’s any fun having, say, memory meltdowns (my 1993 Mac PowerBook 145B) or power-supply failures (my 2002 Enpower laptop), but those don’t necessarily destroy your data when they go. The damage to your bank account for repairs—or even replacement—is unlikely to approach the cost of data recovery.

Hard drive manufacturers like to claim that their drives have a mean time to failure (MTTF) of 1 million hours or more. That’s just over 114 years, which is completely preposterous.

Where do drive manufacturers get numbers like this? A drive that still works after ten years is possible, if not common. A drive that still works after a hundred years…would be useless anyway, at least if file sizes keep on increasing the way they have over the past few decades.

These MTTF claims lead you to wonder whether drive manufacturers have the same definition of “failure” that their customers do. The answer is “No.” Manufacturers define “failure” pretty rigorously. Consumers and businesses define failure as “We can’t use it for the purpose we bought it for.” Even if the fault isn’t in the drive, if the drive doesn’t play nicely with the rest of your system, it’s no good to you and you have to replace it with one that does.

One reason that manufacturer datasheets and user experience are so different is that the MTTF on the datasheet is based on artificial “accelerated aging” tests. Two recent studies, one from Google and one from Carnegie Mellon, demonstrate pretty conclusively that these tests do a lousy job of simulating the real effects of age on hard drives.

They also provide a few other dramatic surprises, at least to those familiar with the received wisdom about why drives fail. Everyone knows drives fail when they get too hot, right? Well…wrong. Google found “a clear trend showing that lower temperatures are associated with higher failure rates.” And they’re not talking about freezing temperatures, either.

Heavy usage just seems intuitively right as a reason for drive failure. After all, the more miles you drive your car, the more likely it is to break down. But according to Google’s survey of more than 100,000 disks, “After the first year, the AFR [annual failure rate] of high utilization drives is at most moderately higher than that of low utilization drives.”

Failure in the first year is known as “infant mortality.” Hard drives apparently have a pretty high infant mortality rate. Using a drive heavily seems to be the best way to find out whether it will die young—though that’s learning the hard way.

Drive failure rates were thought to follow a “bathtub curve”: fairly high in the first few months to a year (infant mortality), followed by a low but steady rate (the flat bottom of the bathtub) followed by a steep increase after a few years. Google’s results show a roughly 3% AFR at three months, dropping to 2% at one year—and jumping up to 8% at two years. Their graph doesn’t look much like a bathtub. And the Carnegie Mellon researchers “observed a continuous increase in replacement rates, starting as early as in the second year of operation.”

The Carnegie Mellon study also covers about 100,000 disks, most of these higher-end than the ones used by Google. (Google, it seems, buys cheap consumer disks, perhaps because the cost of replacement is lower.) These are the disks with the claimed MTTF of 1 million hours and corresponding alleged annual failure rate of less than 1%. The study found that the annual replacement rate of these various kinds of disks ranged from a low of about 0.5% to a high of 13.5%, with a weighted average of 3%.
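
Here is roughly how a 1-million-hour MTTF turns into both "114 years" and "less than 1% a year." This is a back-of-the-envelope sketch using the simple approximation that the annual failure rate is the hours in a year divided by the MTTF. The function names are my own, and the drive makers' actual method may well differ.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def mttf_in_years(mttf_hours: float) -> float:
    """Express a mean time to failure in years of continuous running."""
    return mttf_hours / HOURS_PER_YEAR


def nominal_afr(mttf_hours: float) -> float:
    """Approximate annual failure rate implied by an MTTF, as a fraction."""
    return HOURS_PER_YEAR / mttf_hours


claimed_mttf = 1_000_000
print(round(mttf_in_years(claimed_mttf), 1))         # 114.2
print(f"{nominal_afr(claimed_mttf):.2%}")            # 0.88%

# The study's weighted average replacement rate of 3%, against the datasheet.
print(round(0.03 / nominal_afr(claimed_mttf), 1))    # 3.4
```

In other words, the 3% average the researchers observed is between three and four times what the datasheet figure implies.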

And on top of that, drive failure in RAID systems seems to be contagious. I’ve forgotten most of what math I ever knew, but despite its somewhat dense language, the Carnegie Mellon study is clear enough: “the probability of seeing two drives in the cluster fail within one hour is four times larger under the real data, compared to the exponential distribution. The probability of seeing two drives in the cluster fail within the same 10 hours is two times larger under the real data, compared to the exponential distribution.” In this case, all you really need to know about “exponential distribution” is that it means “the way we expected it to be.”
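
For anyone who does want a peek at the math: under an exponential distribution, the chance that the next failure arrives within a given window depends only on the average gap between failures. The sketch below uses that textbook formula. The 1,000-hour average gap is a number I made up purely for illustration and does not come from the Carnegie Mellon study.

```python
import math


def prob_failure_within(hours: float, mean_gap_hours: float) -> float:
    """Chance the next failure comes within `hours`, assuming the gaps
    between failures follow an exponential distribution."""
    return 1.0 - math.exp(-hours / mean_gap_hours)


# Hypothetical cluster that averages one drive failure every 1,000 hours.
mean_gap = 1_000
expected_1h = prob_failure_within(1, mean_gap)
expected_10h = prob_failure_within(10, mean_gap)

# The study's finding, applied to these made-up baselines:
# four times the expected chance at 1 hour, twice at 10 hours.
print(f"1 hour:   expected {expected_1h:.4%}, observed about {4 * expected_1h:.4%}")
print(f"10 hours: expected {expected_10h:.4%}, observed about {2 * expected_10h:.4%}")
```

The practical point is that a second failure soon after the first is more likely than the tidy model predicts, which is exactly when a RAID array is at its most vulnerable.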

The problem with statistics, like all generalizations, is that they’re no good at all when it comes to predicting individual behavior. The Google study shows that even a single scan error makes your drive 39 times more likely to fail within 60 days—but that’s not a guarantee. It could last longer, just as your car could continue to work just fine after 100,000 miles. Frankly, I’d put more faith in the car.

On the other hand, even the 8% failure rate Google found at 2 years is still pretty good odds. Certainly not as good as manufacturers’ data sheets would lead you to expect, but the hard drive on my PowerBook 145B still worked when I recycled it some ten years after buying it. The drive on my 1997 laptop functioned just fine in 2002, even though it was starting to feel a little cramped. None of the seven computers I’ve owned has suffered from drive failure.

But as far as I can tell from my own anecdotal experience, I’ve been unusually lucky. And it’s important to remember that hardware failure only causes a small fraction of the world’s data loss. There are system errors, software problems, natural disasters, theft—and the big one, human error. So even if failure rates several times higher than expected don’t scare you, there’s still no excuse for not backing up your data.
