Friday, March 23, 2007

Where Do Those Statistics Come From? FileSlinger™ Backup Reminder 03-23-07

Last week I got an e-mail asking about sources for statistics on hard drive crashes:
I've seen all of the hard drive crash statistics around the internet, several of which you reference on your site, however I've yet to find the source for any of the most startling. The one I'm most interested in, and hoping you can help me find the source of, is this:

"1 in 5 computers suffers a fatal hard drive crash in its lifetime."

Do you know the source of this? Any similar statistics you do have sources for?

I couldn’t remember the source of that offhand, but I do try to cite sources when I know them, and the question is one worth asking.

Before you can accept the validity of statistics, it’s important to know where they come from. For instance, a couple of weeks ago I talked about the way manufacturers claim their drives have a “mean time to failure” of a million hours. Given that hard drives were only invented in 1956, no drive has actually lasted a million hours, so where do numbers like that come from?

From math of the sort I deliberately avoided, actually, and I won’t embarrass myself by trying to duplicate the equations here. You can read more about calculating Mean Time Between Failures (for things that can be repaired) and Mean Time To Failure (for things that can’t).

Bit-Tech.Net provides an explanation in English:
The MTBF is attained from running a large batch of drives, sometimes hundreds sometimes thousands, and measuring how often a drive fails. Given a batch of 2,000 drives, if one fails on average every 25 days, the drive would be given an MTBF of 1.2 million hours. Unfortunately, it isn’t quite as simple as that.

It’s usually the case that the tests are accelerated by altering the conditions, for example, increasing the temperature. The end result is highly dependent on these acceleration factors being accurate. The number of drives tested is also not standardised, so manufacturers are free to increase or decrease the amount of drives to attain the ideal MTBF rating.

Wikipedia adds that
Many manufacturers seem to exaggerate the MTBF of their products (e.g. Hard Drives), in order to either sell more product or sell for a higher price. A common way that this is done is to define the MTBF as counting only those failures that occur before the expected "wear-out" time of the device. Continuing with the example of hard drives, these devices have a definite wear-out mechanism as their spindle bearings wear down, perhaps limiting the life of the drive to five or ten years (say fifty to a hundred thousand hours). But the stated MTBF is often many hundreds of thousands of hours and only considers those other failures that occur before the expected wear-out of the spindle bearings.
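The arithmetic in the Bit-Tech example quoted above is easy to check. What follows is a minimal Python sketch, assuming the simplest reading of the quote: MTBF is the total drive-hours the batch accumulates for each observed failure. The function name is made up for illustration, and the sketch ignores the acceleration factors that real manufacturer tests depend on.

```python
def batch_mtbf_hours(drives, days_between_failures):
    """Estimate MTBF from a batch test: drive-hours accumulated per failure.

    Assumes all drives run around the clock and that failures arrive at a
    steady average interval (a simplification, not a manufacturer's method).
    """
    hours_between_failures = days_between_failures * 24
    return drives * hours_between_failures


# The quoted example: 2,000 drives, one failure every 25 days on average.
print(batch_mtbf_hours(2000, 25))  # 1200000
```

Note how sensitive the result is to batch size: under the same assumptions, doubling the number of drives on test doubles the MTBF if failures keep arriving at the same interval.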
But back to the question of where the scary statistics I sometimes quote come from. It’s usually storage manufacturers, backup service providers, or data recovery companies who commission studies about things like drive failure rates and the impact of data loss. The nature of the research may vary from something as informal as the poll on the home page of the FileSlinger™ Backup Blog (which has all of 68 respondents) to something as extensive as the recent Google study of more than 100,000 disks.

Here are some of the places I’ve found statistics about data loss and drive failures:
And here, finally, is a history of the hard drive from 1956-2006.

But I can’t seem to find a specific source for the “1 in 5” quote, though it’s repeated all over the internet. I’m pretty sure I got it from the Data Deposit Box site, but they don’t mention where they got it, even though they cite sources for some of their other statistics.

I will say that the figure sounds right to me, based on my encounters with computers over the years. I’m typing this on my seventh computer, and none has yet suffered a drive failure, but as two are still alive, that means they still might. The Ur-Guru runs through several hard disks every year. I’ve encountered system failures as often as drive failures among clients, but without even having to concentrate I can think of three clients and two colleagues whose hard drives were beyond repair.

Of course, the nature of research is such that anything one study can “prove,” another study can “disprove.” You just need the right population sample.

If you know the source of that startling statistic, let me know and I’ll publish it on the Backup Blog!


Friday, March 02, 2007

Shocking Truths About Drive Failures: FileSlinger™ Backup Reminder 03-02-07

That’s not really a hard drive inside your computer. It’s a time bomb. And if it’s more than three years old, it’s ticking pretty loudly. And if it’s giving you scan errors, its chances of survival have just plunged.

Drive failure is something most computer users experience sooner or later. All hard drives fail eventually. In some cases, however, another component fails first, leaving you with a dead computer but a living drive. If you’re really lucky, you may find yourself replacing a computer not because it’s broken, but because it’s hopelessly obsolete.

I’ve been very lucky in the drive department, myself. Not that it’s any fun having, say, memory meltdowns (my 1993 Mac PowerBook 145B) or power-supply failures (my 2002 Enpower laptop), but those don’t necessarily destroy your data when they go. The damage to your bank account for repairs—or even replacement—is unlikely to approach the cost of data recovery.

Hard drive manufacturers like to claim that their drives have a mean time to failure (MTTF) of 1 million hours or more. That’s just over 114 years, which is completely preposterous.

Where do drive manufacturers get numbers like this? A drive that still works after ten years is possible, if not common. A drive that still works after a hundred years…would be useless anyway, at least if file sizes keep on increasing the way they have over the past few decades.

These MTTF claims lead you to wonder whether drive manufacturers have the same definition of “failure” that their customers do. The answer is “No.” Manufacturers define “failure” pretty rigorously. Consumers and businesses define failure as “We can’t use it for the purpose we bought it for.” Even if the fault isn’t in the drive, if the drive doesn’t play nicely with the rest of your system, it’s no good to you and you have to replace it with one that does.

One reason that manufacturer datasheets and user experience are so different is that the MTTF on the datasheet is based on artificial “accelerated aging” tests. Two recent studies, one from Google and one from Carnegie Mellon, demonstrate pretty conclusively that these tests do a lousy job of simulating the real effects of age on hard drives.

They also provide a few other dramatic surprises, at least to those familiar with the received wisdom about why drives fail. Everyone knows drives fail when they get too hot, right? Well…wrong. Google found “a clear trend showing that lower temperatures are associated with higher failure rates.” And they’re not talking about freezing temperatures, either.

Heavy usage just seems intuitively right as a reason for drive failure. After all, the more miles you drive your car, the more likely it is to break down. But according to Google’s survey of more than 100,000 disks, “After the first year, the AFR [annual failure rate] of high utilization drives is at most moderately higher than that of low utilization drives.”

Failure in the first year is known as “infant mortality.” Hard drives apparently have a pretty high infant mortality rate. Using a drive heavily seems to be the best way to find out whether it will die young—though that’s learning the hard way.

Drive failure rates were thought to follow a “bathtub curve”: fairly high in the first few months to a year (infant mortality), followed by a low but steady rate (the flat bottom of the bathtub) followed by a steep increase after a few years. Google’s results show a roughly 3% AFR at three months, dropping to 2% at one year—and jumping up to 8% at two years. Their graph doesn’t look much like a bathtub. And the Carnegie Mellon researchers “observed a continuous increase in replacement rates, starting as early as in the second year of operation.”

The Carnegie Mellon study also covers about 100,000 disks, most of these higher-end than the ones used by Google. (Google, it seems, buys cheap consumer disks, perhaps because the cost of replacement is lower.) These are the disks with the claimed MTTF of 1 million hours and corresponding alleged annual failure rate of less than 1%. The study found that the annual replacement rate of these various kinds of disks ranged from a low of about 0.5% to a high of 13.5%, with a weighted average of 3%.
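The link between a datasheet MTTF and the "less than 1%" annual failure rate can be sketched in a few lines of Python. This assumes drives fail at a constant rate (the exponential model that datasheets implicitly use) and run 24 hours a day; the function names are illustrative only.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored


def mttf_in_years(mttf_hours):
    """Convert a datasheet MTTF from hours to years of continuous running."""
    return mttf_hours / HOURS_PER_YEAR


def nominal_afr(mttf_hours):
    """Fraction of drives expected to fail within one year, assuming a
    constant failure rate (exponentially distributed lifetimes)."""
    return 1 - math.exp(-HOURS_PER_YEAR / mttf_hours)


print(round(mttf_in_years(1_000_000), 1))      # 114.2
print(round(nominal_afr(1_000_000) * 100, 2))  # 0.87
```

On those assumptions, a 1-million-hour MTTF implies an annual failure rate of roughly 0.87%, so the 3% weighted average replacement rate the study reports is about three and a half times the datasheet figure.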

And on top of that, drive failure in RAID systems seems to be contagious. I’ve forgotten most of what math I ever knew, but despite its somewhat dense language, the Carnegie Mellon study is clear enough: “the probability of seeing two drives in the cluster fail within one hour is four times larger under the real data, compared to the exponential distribution. The probability of seeing two drives in the cluster fail within the same 10 hours is two times larger under the real data, compared to the exponential distribution.” In this case, all you really need to know about “exponential distribution” is that it means “the way we expected it to be.”
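For a feel of what the exponential baseline predicts, here is a rough Python sketch. It assumes every drive fails independently at a constant rate set by a datasheet MTTF. The cluster size of 1,000 drives is a made-up number for illustration, not one taken from the study, and the study's own baseline will differ from this one.

```python
import math


def p_another_failure_within(hours, remaining_drives, mttf_hours):
    """Chance that at least one of the remaining drives fails within `hours`,
    if drive lifetimes are independent and exponentially distributed."""
    cluster_rate = remaining_drives / mttf_hours  # expected failures per hour
    return 1 - math.exp(-cluster_rate * hours)


# Hypothetical cluster: 1,000 surviving drives, 1-million-hour MTTF each.
for window in (1, 10):
    p = p_another_failure_within(window, 1000, 1_000_000)
    print(f"within {window:>2} h: {p:.4%}")
```

The point of the quoted finding is that real drives beat these odds in the wrong direction: a second failure soon after the first turned up four times as often as the model's one-hour figure and twice as often as its ten-hour figure.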

The problem with statistics, like all generalizations, is that they’re no good at all when it comes to predicting individual behavior. The Google study shows that even a single scan error makes your drive 39 times more likely to fail within 60 days—but that’s not a guarantee. It could last longer, just as your car could continue to work just fine after 100,000 miles. Frankly, I’d put more faith in the car.

On the other hand, even the 8% failure rate Google found at 2 years is still pretty good odds. Certainly not as good as manufacturers’ data sheets would lead you to expect, but the hard drive on my PowerBook 145B still worked when I recycled it some ten years after buying it. The drive on my 1997 laptop functioned just fine in 2002, even though it was starting to feel a little cramped. None of the seven computers I’ve owned has suffered from drive failure.

But as far as I can tell from my own anecdotal experience, I’ve been unusually lucky. And it’s important to remember that hardware failure only causes a small fraction of the world’s data loss. There are system errors, software problems, natural disasters, theft—and the big one, human error. So even if failure rates several times higher than expected don’t scare you, there’s still no excuse for not backing up your data.


Friday, August 11, 2006

FileSlinger™ Backup Reminder 08-11-06: Survey Says...

There’s a poll on the home page of the FileSlinger™ Backup Blog (top of the right column) which asks “When was the last time you backed up your computer?” The possible answers are:
  1. Within the last week
  2. Within the last month
  3. Within the last year
  4. I’m supposed to back up my computer?
Of the 38 people who voted, 23, or 60.5%, say they’ve backed up within the last week. (They must be paying attention to the reminders.) Five, or 13.2%, answered “within the last month,” four (10.5%) within the last year, and six (15.8%) responded with “I’m supposed to back up my computer?”
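The poll percentages, and a sense of how shaky they are, can be worked out with a short Python sketch. The margin of error here uses the textbook normal approximation at 95% confidence, which assumes a random sample. A self-selected web poll is not one, so treat the margin as a best case.

```python
import math

votes = {
    "Within the last week": 23,
    "Within the last month": 5,
    "Within the last year": 4,
    "I'm supposed to back up my computer?": 6,
}
total = sum(votes.values())  # 38 voters


def share_and_margin(count, total, z=1.96):
    """Return (percentage, 95% margin of error in percentage points),
    using the normal approximation for a proportion."""
    p = count / total
    margin = z * math.sqrt(p * (1 - p) / total)
    return round(p * 100, 1), round(margin * 100, 1)


for answer, count in votes.items():
    pct, moe = share_and_margin(count, total)
    print(f"{answer}: {pct}% (+/- {moe} points)")
```

With only 38 voters, even the best-case margin on the largest answer is more than 15 points either way.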

Because the sample is so small, no statistician in the world would claim any kind of validity for those numbers. But a company called The Diffusion Group surveyed more than a thousand US households that have broadband internet and either already have or plan to get home networks. It found that only 11.3% backed up every week, and the largest group of respondents (37.8%) backed up only a couple of times per year. Another 16.4% never back up their data, and 6.6% don’t know how often (or whether) they back up.

Those are not encouraging figures, though they parallel other studies on the subject of backup. Most people don’t back up nearly often enough. The only good reason not to back up more than a couple of times a year would be that your data doesn’t change more than a couple of times a year, and it’s difficult to imagine a computer user who could claim that.

In fact, according to The Diffusion Group, the average home generated 322 GB of data in 2005, and they expect that number to grow to 1933 GB by 2010. This boggles my mind, since I don’t have 300 GB on all my hard drives and computers combined, but then, I’m not recording digital video and I don’t even take that many photographs. It’s actually not that hard to imagine creating that much data in a year, particularly for doting parents capturing everything their offspring do on the HandyCam.
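The growth rate implied by those two figures can be sketched in Python. This assumes smooth compound growth between the 2005 figure and the 2010 forecast, which is a simplification of whatever model The Diffusion Group actually used.

```python
def annual_growth_rate(start, end, years):
    """Compound annual growth rate implied by a start value and an end value."""
    return (end / start) ** (1 / years) - 1


# 322 GB in 2005, forecast to reach 1,933 GB by 2010.
rate = annual_growth_rate(322, 1933, 2010 - 2005)
print(f"{rate:.0%} per year")
```

That works out to about 43% growth per year: a roughly sixfold increase over five years.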

Of those households that do back up, 88% use optical media (CDs or DVDs), 33% use external hard drives, 6% use an online backup service, and 17% back up over the network, either to a server or a Network Attached Storage device. (Yes, I know that adds up to more than 100%. Clearly some of the respondents make more than one kind of backup.)

This explosion in data generation leads TDG to conclude that “consumers will need a secure, non-PC-based platform on which to store the vast amounts of personal digital data created by these devices—a single storage platform that is networked and can share resources with both fixed and mobile PC and CE devices.” I have to say I find the new network-capable XHDs very attractive, and my brother is building a DIY NAS device to back up his photos of my niece and nephew.

Most existing options require some technical expertise on the part of the user, though not necessarily more than is needed to set up the home network in the first place. TDG sensibly emphasizes “storage equipment that is easy to use and fits into the normal cycle of daily PC and CE usage.” The most common reasons I’ve heard people give for not backing up are “It’s too complicated,” “It takes too much time,” and “I just forget,” so “easy” and “automated” are both important if the numbers of those who back up are going to improve.

Source: The Diffusion Group
Press Release
Analysis and Forecast
(The actual report costs $1495, so I haven’t read it.)


Friday, June 16, 2006

FileSlinger™ Backup Reminder 06-16-06: You Can Lead a Horse to Water...

As I write this, the Ur-Guru is running TrueImage on his laptop. The new version looks pretty good; I’m going to install it and test it next week.

Meanwhile, I want to report some shocking statistics about backups. Maybe they shouldn’t shock me, because they’re not so different from last year’s statistics, but my jaw still drops when I read these things.

Hard-drive manufacturer Maxtor (the creators of Backup Awareness Month) sponsored a survey by Harris Interactive. The results: 46% of the respondents don’t make backups, even though 43% of them have lost data to viruses, system crashes, and drive failures, and 55% consider their data worth at least $1000.

I haven’t seen the survey, so I don’t know whether the people who don’t back up are also the people who value their data highly and/or the people who have lost data before, but the E-commerce Times article I read certainly implies it.

It doesn’t surprise me that much that average computer users don’t back up. If everyone backed up regularly, I wouldn’t have started writing these reminders. What shocks me is that people who have lost data still don’t back up.

Iron Mountain, which recently purchased LiveVault, conducted a survey of laptop users. An astonishing 64% of laptop owners whose machines have been stolen still don’t make daily backups. If you’re carrying all your data around with you, daily is probably better than weekly, because the machine could get dropped or spilled on as easily as stolen.

And the effect of data loss on these laptop users? Eighteen percent of them didn’t get back up and running for weeks or months. Another 64% were out of action for several days. That’s money down the drain for anyone in business, particularly if your laptop is your main machine.

Storage vendors are finding disk-based backup systems, virtual tape libraries, and continuous data protection a hard sell when it comes to small and medium businesses. In addition to the usual resistance to change, there’s the cost of replacing existing tape systems, and/or the monthly fees for CDP and other online backup systems.

Yet expense can’t be the only issue for consumers and home-office users. There are more online backup providers every day. They offer “set it and forget it” solutions for $10/month—or less. External hard drives cost very little per gigabyte, and many come with backup software to automate the process. Somehow, stubbornly, people persist in not backing up.

Naturally, storage vendors and data protection companies have a vested interest in getting more people to back up, so they’re working hard to raise awareness and educate people about what constitutes a real backup. (An extra copy of a file on the same disk as the original is not a real backup.) They’re also trying to make backup simpler to use, so that even technophobes have no excuse. Yet there doesn’t seem to be much impact so far: Maxtor’s 2004 backup survey statistics were almost exactly the same as those for 2005.

I try to educate people about their options for backup, but all I can really do is remind people who are already making backups that it’s time for another one. If losing data isn’t enough to convince a person to back up, nothing I can say is going to make a difference.

I would like to know, though: if you don’t back up, what’s your excuse? What excuses have you heard your colleagues, clients, friends, or family make for not backing up their computers? I’m willing to bet there’s a counter for all of them, but it’s just barely possible that somewhere in the universe someone actually has a valid excuse for not backing up.


Friday, July 08, 2005

FileSlinger™ Backup Reminder 7-8-05: Why Back Up?

I asked one of my readers what I should write about today, and she suggested a reminder about why it is we need to make backups. As it happens, I collected a number of statistics for my presentation to the National Youth Leadership Forum on Technology last week, so I have them handy—and they are indeed sobering.

Every 15 seconds, another hard drive crashes.

And when I say "crash," here, I don’t mean those cases where your machine mysteriously freezes up or you get a Blue Screen of Death but your computer is fine when you reboot it. I mean "crash" in the literal sense, where one of the moving parts of your drive has crashed into another moving part and is chewing its way through the disk and sending your data to permanent oblivion. All those virtual bits and bytes still depend on a physical medium to hold them.

You can usually tell when your computer has had this kind of crash, because the drive starts making strange noises. (You can hear some of them on the Data Recovery Net website.) This is akin to the horrible noises that your car starts making if something goes seriously wrong with the engine, though you may not actually see smoke coming out of your PC tower.

1 in 5 computers suffers a fatal hard drive crash during its lifetime.

Given that the lifetime of an ordinary computer is five years at the outside, that means this will almost certainly happen to one of your own computers over the course of your lifetime.

The other four computers will die of something else, like the power supply problem that put paid to my previous laptop.
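How the odds stack up over several machines can be sketched in Python. This treats each computer as an independent 1-in-5 gamble, which is an assumption made for illustration. Real failures depend on age, model, and how the machine is used.

```python
def p_at_least_one_crash(computers, p_crash=0.2):
    """Chance that at least one of several computers suffers a fatal drive
    crash, treating each machine as an independent 1-in-5 gamble."""
    return 1 - (1 - p_crash) ** computers


for n in (1, 5, 10):
    print(f"{n:>2} computers: {p_at_least_one_crash(n):.0%}")
```

Ten machines is roughly fifty years of computing at five years apiece, and on these assumptions the chance of at least one fatal crash comes out near 89%.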

The overall average failure rate of disk and tape drives is 100%—all drives eventually fail.

Not only that, hard drives have a much higher failure rate than they used to. And even if it’s not the drive that goes, you’re going to need to be able to move your data from one computer to another when your current machine stops working or becomes so obsolete that you might as well not have a computer.

60% of companies that lose their data close down within 6 months of the disaster.

These days it’s all but impossible to run a business without a computer. Even the local laundry keeps its client records and invoices on a computer, and there are very few hairdressers, bodyworkers, housecleaners, or home care workers who don’t rely on the internet to find and communicate with clients. Even those few who don’t own computers rely on people who do, like accountants and designers.

Recreating data from scratch is estimated to cost between $2000 and $8000 per MB.

Simple drive recovery can cost upwards of $7,500 and success is not guaranteed.

Those two figures go a long way toward explaining why so many companies that lose their data shut down. Replacing hardware costs money, but not as much as replacing or retrieving data. If you’re a one-person, one-computer business, the drive recovery attempt may run you in the hundreds rather than the thousands (and some companies won’t charge if they don’t get your data back), but it’s still going to cost you a lot more than an external hard drive will. And the time cost for you or your employees to re-create and re-enter your data makes the costs for data recovery (which start at around $200/hr) look trivial.

40% of Small and Medium-Sized Businesses don't back up their data at all.

Given the figures I’ve quoted above, this is roughly akin to playing Russian Roulette with only one empty chamber. Somewhere along the line, you’re going to suffer a severe hardware failure. The only thing that prevents hardware failure from turning into data loss is backups—regular, frequent backups that you check for validity and label clearly.

Remember: your business data is your responsibility. Back up early. Back up often.


Friday, July 01, 2005

FileSlinger™ Backup Reminder 7-1-05: Backup Insights from Teens

For ten days every year, three thousand teenagers from around the country descend on downtown San Jose for the National Youth Leadership Forum on Technology, where they get to visit high-tech companies and universities, hear keynote speeches from industry leaders, and “work to develop solutions to real-world dilemmas.” (I asked whether any of the solutions they came up with were ever applied; apparently not.) Not to mention taking advantage of the free (and of course totally unsecured) wireless in the halls of the San Jose Fairmont and eating (shades of school cafeterias) chicken a la king in the Regency Ballroom. These are very bright kids, some of them already developing software or working as consultants while finishing high school.

Yesterday I went to talk to them about backups. It’s a little intimidating to speak at a technology forum to a group of people who are guaranteed to know more about computers than I do, but I realized, as I was preparing my presentation, that backups and data loss are not really a technology problem.

Oh, yes, the statistics and surveys claim that between 44% and 78% of data loss is due to hardware failure, and anyone who has used computers over any period of time knows that hardware fails constantly. Hardware failure is more likely than human error (like hitting the delete key at the wrong time) to trash your files, and much more likely than actual disasters such as floods, fires, and PG&E. But if you have current, valid backups, you won’t lose data when your hard drive fails. You just have to replace your equipment.

According to Lasso Logic, 80% of small businesses don’t have adequate backups. They may be defining “adequate” by comparison with their Continuous Data Protection service, but it wouldn’t surprise me that much if 80% of home-based businesses didn’t have any backups. As for home users and students, the percentage without backups may well be higher.

In preparation for yesterday’s presentation, I asked parents (not, I confess, a statistically valid sampling) about the backup habits of high school students. Their responses were almost all along the lines of “Our teenagers never back up” or “She backs up her homework to the school network, but that’s it.”

I also posted a question to the NYLF online community asking what they would lose if their computers committed suicide or their school burned down. Tech Forum participants and alumni had an interesting variety of answers, ranging from making a point of never storing data on their hard drive at all to using their iPods for backups to having RAID arrays, domain controllers, and tape drives. One of them pointed out in his response that “this group of Tech attendees probably backs up more than any typical student. It just isn’t as easy as it should be, and more than that, people (students and adults) don’t think about it until it’s too late.”

Curiously, none of the 16 people who posted answers to my question on the community bulletin board was in the group of 36 I faced across the high-ceilinged conference room in the San Jose Fairmont yesterday afternoon. When I asked them why they thought people didn’t make backups, their answers were consistent with the replies of their peers.

“It takes time.”

“External drives cost money.”

“General ignorance.”

After going through some examples of different approaches to raising awareness about the need for backups (and selling backup solutions), I had them break up into small groups to brainstorm answers to the following questions:

1. What backup solution do you think is best for teens/students? How would you market it to them?

2. What would a backup product have to do or have to get people your age to use it?

3. If there was a law or school policy requiring backups in order to use computers, how would you enforce and implement it?

4. What kind of backup solution would work for your grandparents, and how would you sell it to them?

5. What do you think is the best way to keep off-site backups secure?

6. How can we raise awareness about backups? Can you think of techniques that would succeed where Maxtor’s Backup Awareness Month failed?

For both teens and grandparents (and many of them did have grandparents with computers), they agreed that a backup solution would have to be inexpensive, simple, user-friendly, and preferably automatic. Here are some of their suggestions:

  • Make your backup device small and colorful—easy to carry, easy to personalize.
  • Integrate backups with other computer activities, for instance, backing up the computer every time you download music.
  • Design the backup software so it translates your data into music while copying it to the backup drive.
  • Use pop stars in ad campaigns.
  • Use subliminal messaging to implant the urge to back up. (Illegal, but tempting.)
  • Stage midnight raids or random inspections to see whether people are backing up.
  • Have your corporate or school network detect whether people are backing up and impose fines or take away user privileges for people who don’t have backups.
  • Wipe all data off the system every week: if you don’t back up, you lose your work. People would get the idea after the first time.
  • When marketing backups to grandparents, focus on saving family memories and compare backups to health insurance for your computer.
The thing that struck me as I thought these ideas over later was the emphasis on making backups fun. “Fun” and “backing up” are not terms that usually appear in the same sentence. One of the respondents to my bulletin-board posting described her process of backing up to both a network drive and a USB key as “tedious and more like a chore than anything.”

Drives that look like sushi or rubber duckies or tiki statues are one aspect of this, but that only gets to the problem of a device being fun to own. Making a backup program fun to use, now, that’s a challenge worth taking up. Can we translate backing up into a video game? How would you go about converting data streams to music? Would we have a different genre for each type of file? Is there a backup equivalent of ring tones for cell phones?

Given a little time, I bet these kids could figure it out.
