Friday, February 23, 2007

NASty Surprises: FileSlinger™ Backup Reminder 02-23-07

Those of you who have been reading for a while may remember that Seagate gave me a Maxtor Shared Storage II drive a few months back, and very excited I was about it, too. It’s still working perfectly. In fact, I haven’t turned it off once since I installed it. It does a good job making scheduled backups of my computer and my housemate’s, and I’ve had good luck using it to store Ghost backups when I make them (which I confess is less often than it probably should be, right now).

Not everyone has had such a good experience with the Shared Storage II, however. About ten days ago I got a message from someone who’d found the FileSlinger™ Backup Blog while searching for a solution to the hair-raising problems he was having. I’ll be quoting some of the ensuing discussion in a minute, but I’ll say now that the upshot of it is that Network Attached Storage is not right for all kinds of backups.

I cut some parts of the messages for brevity and edited here and there for typos, then ran the article past both the unhappy drive owner and the Ur-Guru to make sure I hadn’t misrepresented anything.

On 2/10/07, PT wrote:
I (optimistically, it turns out) purchased two Maxtor Shared Storage II (1TB NAS) units, but they seem I-N-C-R-E-D-I-B-L-Y S---L---O---W, to the point of being impractical (well—actually, virtually unusable).

There's a long story behind it; I've had one of them replaced THREE TIMES (yes, that's THREE [3] times!!) because of drive failure [within a period of weeks]. I am extremely disappointed with the product (and its virtual lack of documentation, or useful software). At any rate, the point of telling you all this (which—believe me!—is the very SHORT version) is to let you know that I have, in fact, been working diligently to try to figure out how to make these work.

The problem is that it is taking WEEKS (of 24-hour-a-day file copying) to get 1TB of data copied to the drive, or (perhaps even worse yet—don't know how to benchmark speed, and neither does {evidently} the senior tech) from one of these MSSII drives to the other. This is seemingly orders of magnitude slower than USB2.

I'm running this from a dual-processor, fairly recent (Gateway 836GM) machine. Neither the processor nor the network usage is anywhere near strained (typically ~5%, and 0.5% [gigabit-speed network, Netgear switch], respectively).

Copying more than one file at once sometimes (not always) makes throughput even worse. The Seagate tech says something about the processor on the MSSII being slow—but jeez, this is craziness.

Somehow, this just doesn't seem right.
It didn’t seem right to me, either, so I wrote back to say so. Then, since I had no relevant experience, I passed the question on to the Ur-Guru. He, too, was puzzled, but he had more insight than I did:
That's not how those things are supposed to work. I'd say try the drive on a completely different system if you have a chance to do so, in order to isolate whether this is a continuous problem with the drive or related to the system. Another option is to hook the device directly to the machine and cut out the Netgear switch to see if perhaps that is causing trouble.

Config of the NIC and/or switch could also perhaps be an issue, like, maybe the switch and NIC are set to use Jumbo frames and perhaps the NAS doesn't like that (don't know, I'd have to look it up, I have no clue about the MSSII in that area).

Based on the description I'm seriously inclined to think it is definitely network related, not MSSII related.
PT promised to check out these suggestions and get back to us. When I saw Friday looming this week, I thought I’d ask him how he’d made out. Here’s what he said:
Initially, I reset my network switch and everything went very fast for a short while. I think this was just coincidence, though. Later, I spent an HOUR on the phone with NetGear to see if we could isolate a network problem. Bottom line is: for single-file-at-a-time huge files (we tried >300MB), it's pretty dang fast. But for realistic use—in my case, tens of thousands of small [ca. 2k] files—the MSSII is just dead-dog slow. Smaller drives could be slow with less consequence; but huge drives have GOT to be fast!
When he read this, a light bulb went on for the Ur-Guru. (Which is good, because it didn’t illuminate anything for me.)
Ohhhhhh....

Yes, that would explain a lot. Tons of tiny files are murder on any network load and disk performance. But even if you widen the network to 10Gbit, it'd still perform quite badly.

Lots of my source code trees are riddled with tons of small files and the sync/copy over the network got so slow as the amount increased that I wrote a script that would first zip all the trees and then copy the zip to the other side (zip in this case since it is faster in packing and the size of the resulting file is less of an issue). I don't remember any statistics I ran but... 400,000 small 4K-50K files were zipped into an approx. 500MB file and that goes over the network in a few seconds. Copying those 400,000 small files, however, takes about 22 minutes. Huge difference there. And we're talking very fast RAID systems in my case, not a drive with a slow processor like the MSSII.

The tons of small files explains the problem he's seeing quite well. The Ethernet packet wrapping overhead for that is more than the file data combined so it really starts to slow things down, and unless you have some serious heavy hitter RAID on the other end of the network, it'll be slow as hell. And even with heavy hitter RAID on the other end, copying 1GB in 100,000 small files is easily 100 times slower than copying a 1GB file as a single file (very rough number here, non-scientific). It's also murder on disks in terms of seek time and fragmentation and... just about everything. :-)
So there you have it: network backups are not good for huge numbers of small files. I passed the bad news on to PT, who said:
I guess my expectations were just too high—and the disappointment (other than the multiple returns of defective drives!) was primarily due to my lack of experience with large amounts of data being transmitted over networks. (I've had several PCs networked for years now, but I guess I've only transferred large amounts of data via USB2/Firewire-type drives—and, until recently, I only had a 10/100 system anyway, so the network would max out before the drives did for transfers of large files.)

The idea of zipping the files pre- (and post-) transfer makes a lot of sense. Not sure that I'll be able to be as fancy as writing a routine to do it for me, but I'll definitely use the (manual) technique to circumvent some of the problems I've had.
I pointed out that even without a custom routine it might be possible to get the zipping automated, as there are various backup programs that will zip the files before copying them to their destination.
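For anyone who would rather script the zip-first approach than do it by hand, here is a minimal sketch in Python. It is not the Ur-Guru's script, which isn't reproduced here; the function name zip_then_copy, the default archive name, and the example paths are all made up for illustration, and it assumes the NAS share is already mounted as an ordinary folder or drive letter.

```python
import shutil
import tempfile
from pathlib import Path


def zip_then_copy(source_dir, dest_dir, archive_name="backup"):
    """Pack source_dir into one zip file locally, then copy that single
    file to dest_dir (for example, a mounted NAS share)."""
    source = Path(source_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Build the archive on the local disk first, so the many small reads
    # happen locally rather than over the network.
    with tempfile.TemporaryDirectory() as staging:
        # make_archive appends ".zip" and returns the archive's full path.
        archive = shutil.make_archive(
            str(Path(staging) / archive_name), "zip", root_dir=str(source)
        )
        # One large sequential copy instead of thousands of tiny ones.
        return Path(shutil.copy2(archive, dest))


if __name__ == "__main__":
    # Hypothetical paths; substitute your own folder and share.
    print(zip_then_copy("C:/Projects", "Z:/Backups", "projects"))
```

The trade-off is that you need enough free local disk space to hold the archive temporarily, and restoring a single file means opening the zip rather than browsing the share.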

As for the multiple defective drives…I wonder whether they burned out trying to handle that many files? It seems like resoundingly bad luck to have. Maybe I got the only reliable drive in the batch. I have to admit I’d hesitate to buy one of these after hearing that story, even though I’ve had no problems whatsoever with my own Shared Storage II.

In any case, it’s clear that size matters when it comes to network backups. The Ur-Guru thought everyone knew that, but I didn’t, and neither did PT. And I’m betting we’re not the only ones.
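To put the Ur-Guru's figures in perspective, the arithmetic can be worked out in a few lines of Python. The only inputs are the numbers he quoted (400,000 files, about 22 minutes); the function name per_file_cost_ms is made up for illustration, and the result is an average, not a measurement of any particular file.

```python
def per_file_cost_ms(file_count, total_minutes):
    """Average wall-clock time spent per file, in milliseconds."""
    return total_minutes * 60 * 1000 / file_count


# Figures quoted above: 400,000 small files copied in about 22 minutes.
cost = per_file_cost_ms(400_000, 22)
print(f"{cost:.1f} ms per file, about {1000 / cost:.0f} files per second")
# prints: 3.3 ms per file, about 303 files per second
```

A few milliseconds sounds like nothing, but it is paid once per file no matter how small the file is, which is why the same data packed into one archive crosses the network in seconds.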


Friday, February 16, 2007

Backup School—Coming to a City Near You

Storage Decisions is sponsoring a free "Backup School" seminar with Curtis Preston (of whom I've never heard, but that doesn't mean anything, as most backup specialists are not household names, even in geeky households like mine). The tour starts on the West Coast in March (San Diego, San Francisco, Seattle), then moves to the South in April (Atlanta, Charlotte), the Midwest in August (Chicago, Columbus, Toronto), and the East Coast in October (Washington DC, Philadelphia, Boston, New York). Okay, yes, I know, Toronto is not in the Midwest, or at least not as the US defines it. But it's a much nicer place to have an event than Detroit is.

Topics include backup configuration blunders, a vendor showcase, and encryption. The event is from 8 AM until 4 PM. It's sponsored by EMC, Symantec, Syncsort, CommVault, Copan, Daymark, and Atempo, which should give you an idea of what kind of solutions they'll be discussing. (That is, mostly not the home office kind.)

Hat tip to Anil Gupta for pointing this out on his blog.


There Are No Guarantees: FileSlinger™ Backup Reminder 02-16-07

This week’s InfoWorld GripeLine addresses the lack of warranties from online backup providers. A careful consumer went out to compare the license agreements from several services, including Mozy, which I’ve been using for a while, and Iron Mountain, which now owns LiveVault of the John Cleese Backup Trauma Institute videos.

The results of this investigation prompted a long letter to GripeLine and a column by Ed Foster. So what do these license agreements say?
All five [services] disclaimed the warranty of merchantability, fitness for purpose, and all damages beyond price paid. All disavowed that the product had to actually function at all except Iron Mountain, which in its warranty promises to at least try to fix bugs, and only Iron Mountain doesn't say the product is sold ‘as is.’ Mozy, Iron Mountain, and SOS disavow damages even for negligence on their part. SOS and Xdrive disclaim responsibility for making sure their product is virus-free, and SOS also refuses to be responsible for any duty of good faith or reasonable care.
Naturally, the InfoWorld reader was a bit hot and bothered about this. While most companies of any kind disclaim as many kinds of responsibility and liability as their lawyers can think of, somehow it hurts more when the company’s job is protecting your data. Many people think of backups as a kind of insurance, there to save you in case of emergency.

Of course, anyone who’s ever had to file an insurance claim knows just how much turns out not to be covered and how long it can take to get the money—which will not necessarily put things back the way they were. My auto insurance company paid up very promptly after the accident last fall, and even gave me more than I thought the car was worth—but nowhere in my policy does it promise that I’ll be able to find the same car with the same mileage and a clean title for that money. And auto insurance costs a lot more than most online backup services.

Never mind the fact that there are lawyers whose job it is to investigate claims and tell insurance companies whether they really have to pay.

What’s more, if you look at offline backups, the warranties aren’t much better. If you buy an external hard drive, the manufacturer’s liability is limited to replacing the drive if the hardware fails within the warranty period—anywhere from six months to three years. After that, you can’t even get the data-free new drive you’d have gotten if it had crashed sooner. Data recovery is your own problem, and something that you have to undertake at your own expense.

If you back up onto CD and the CD rots, the best you’re going to be able to get is a blank CD to replace it. If you scratch the CD yourself, forget it. And if the CD gets damaged because your optical drive malfunctions, then you either get a new drive and a blank CD, if your machine is still under warranty, or you have to take the machine to a shop and pay for any repairs yourself.

As for backup software, here’s what the license agreement for Symantec Ghost says:
Symantec warrants that the media on which the Software is distributed will be free from defects for a period of thirty (30) days from the date of delivery of the Software to You. Your sole remedy in the event of a breach of this warranty will be that Symantec will, at its option, replace any defective media returned to Symantec within the warranty period or refund the money You paid for the Software. Symantec does not warrant that the Software will meet Your requirements or that operation of the Software will be uninterrupted or that the Software will be error-free.

TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW AND REGARDLESS OF WHETHER ANY REMEDY SET FORTH HEREIN FAILS OF ITS ESSENTIAL PURPOSE, IN NO EVENT WILL SYMANTEC BE LIABLE TO YOU FOR ANY SPECIAL, CONSEQUENTIAL, INDIRECT, OR SIMILAR DAMAGES, INCLUDING ANY LOST PROFITS OR LOST DATA ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE EVEN IF SYMANTEC HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

IN NO CASE SHALL SYMANTEC’S LIABILITY EXCEED THE PURCHASE PRICE FOR THE SOFTWARE. The disclaimers and limitations set forth above will apply regardless of whether or not You accept the Software.
That sort of policy certainly isn’t going to encourage online backup providers to offer customers anything better. Only if one company starts providing a guarantee and steals everyone else’s business by doing so will anything change. And in order to steal anyone else’s business, they’ll have to advertise their guarantee prominently, because almost no one bothers to read license agreements. Why should we, when all they tell us is that the other party refuses to be liable for any problems?

The moral of the story is that your data is your responsibility. And if that data is important to you for either business or personal reasons, you’d better make multiple copies of it and store them in different locations. It’s not very likely that all of your backup systems will fail at once (as long as the data you’re backing up isn’t corrupted in some way). Do your best to minimize your risks, but recognize that in backups, as in life, there really are no guarantees.


Friday, February 09, 2007

Downloads, DRM, & Backups: FileSlinger™ Backup Reminder 02-09-07

I haven’t written about backing up downloads before, because it’s not something I do much of, myself. (Well, okay—I do keep copies of program install files.) After all, if I downloaded a file once, I can download it again, as long as I know where to get it. And anyway, if the file is, say, a PDF of a white paper about backups (I’ve lost count of the number of those I’ve downloaded over the years), it ends up in my IT info folder and gets backed up with the rest of my data. Programs I’ve downloaded and not yet installed, like podcasts I’ve downloaded and not yet listened to, get backed up when I make a drive image, or as part of the automated backups to my shared storage drive.

But why worry about losing something that you download from the Internet? Unlike files you create yourself, you can always download them again—can’t you?

You can if you’re like me and what you download is freely available. Backing up things that you’ve paid for the privilege of downloading is something else again. Welcome to the wonderful world of Digital Rights Management (“DRM” for short).

Even I am not living under a rock to the extent that I don’t know that the music industry frowns on bootleg copies, and Hollywood doesn’t appreciate them, either. In fact, most people who make money by selling their intellectual property aren’t too keen on having someone else take their stuff and give it away. And when people take our stuff and sell it, we get really burned up.

Book publishers have the security of knowing that it usually costs more to photocopy a book than to buy one, but copying electronic files is normally fast, easy, and cheap. This results in attempts to copy-proof digital media. These attempts frequently backfire, creating serious PR problems for the media company. (Anyone remember the Sony rootkit scandal?)

Worse yet, neither proprietary file formats nor any other kind of DRM will deter a determined professional cracker for long. Illegal copies of Windows Vista have been available for weeks. At best, DRM might keep out the amateurs. At worst, it punishes the very people who have done the right thing and paid for the files.

My only personal encounter with the problem came shortly after I bought an iriver IFP-795 to record presentations with. I decided to test it out as a listening device, too, so I transferred a couple of podcasts onto it with the iriver Music Manager. Later I wanted to move them onto the SanDisk Sansa 230 I normally use for listening, but found I couldn’t. The iriver Music Manager refuses to let you copy files from the player onto the computer. It doesn’t matter whether there’s any DRM on the actual files. The software assumes that any audio format besides its own .REC files is copyrighted, copy-protected material, and therefore it can only be transferred in one direction: from the computer to the player.

Even though this was a minor inconvenience (I just downloaded the MP3 again and put it on the Sansa, which sensibly acts just like a memory stick and lets you move things onto and off it without complaining), I was seriously annoyed.

This is the kind of thing that has led to the creation of third-party programs like the iBack iPod backup tool, which is designed to let you copy files from your iPod to any computer, and GetData’s Recover My iPod, which rescues iPod owners from the loss of their iTunes collection when they reinstall their computers and from assorted firmware problems with the iPod itself.

iTunes itself will let you back up to a CD or DVD. Just go to File | Back Up to Disc. You have the option of backing up the whole library and your playlists, or just your iTunes store purchases. The initial dialog box contains the helpful tip “To restore from a backup disc, open iTunes and insert the disc.” That sounded simple enough.

I decided to try it out, and discovered that I had more in my iTunes library than I’d realized. To expedite this experiment, I cleaned out the files I didn’t need so that everything would fit on one CD. It still took a surprisingly long time to burn the CD, but the result is a perfectly normal data CD with the different MP3 files stored in folders according to artist name. Copying individual files back onto the computer from the CD was no problem.

The author of the Hacking Netflix blog points out that most download services, including Wal-Mart, Amazon, and iTunes, will not let you download your movies again if the files are lost or damaged. Since you have proof that you did pay for the files, this seems pretty outrageous at first blush. On the other hand, I don’t suppose the local movie store would give me another copy of a DVD just because I showed them a receipt and said I’d lost or broken the first one, either. And I’d have trouble proving that I hadn’t made a copy of it.

As long as your download service lets you make backup copies, your downloads are as safe as the rest of your data—assuming, of course, that you go ahead and make those backups. Avoid any service that doesn’t provide you with an option to back up the files, and any files that you can’t copy back and forth to your computer.


More Blogger Backup Tips

The Google Operating System blog posted some tips about backing up Blogspot blogs which use the new (no longer beta) Blogger. While they're not of much use to me for this blog, which I publish by FTP, they might help some readers out there.


Friday, February 02, 2007

The CrashPlan Social Backup Club: FileSlinger™ Backup Reminder 02-02-07

Rafe Needleman of Webware came up with a great headline for his post about CrashPlan: “Back Up Your Mom.” For me, at least, this phrase conjures images out of science fiction novels where people keep clones of themselves in case of injury. Now and then I’m sure a spare mom would be useful for all of us, but of course that’s not what he meant.

CrashPlan (www.crashplan.com) is a new service for Mac and PC (with the Linux version soon to come) that’s designed to let people back up each other’s computers. They also provide more traditional online backups, but they emphasize the value in knowing where your backup files are.

With CrashPlan, I could choose to back up my computer to the Ur-Guru’s zillion-terabyte storage and rest easy in the knowledge that my data would be very secure indeed, at least from hackers—but unless I did the initial backup while visiting his world famous home office, it would be just as slow a process as any other online backup. Slow enough, if I wanted to back up my whole machine, that it would be faster to fly to Holland to do it on site.

But if you had a great deal of storage you weren’t using and wanted to act as the backup hub for your friends and family, CrashPlan would let you do it. And they can do the same for you, provided the bandwidth and storage space are there. If you have a small office network, each machine’s critical data can be backed up on every other machine, so that nothing is lost if one of them goes down. (This is less efficient than just using a file server, but perhaps also less vulnerable, and certainly less expensive.)

Naturally, if you’re choosing your own locations for stored data, you’re going to keep your backups with people you can trust, but that doesn’t mean you can’t or shouldn’t encrypt them. Creating a password is one of the first things CrashPlan asks you to do after you install it.

Once you create your account, CrashPlan starts analyzing your files and presents you with three backup options: CrashPlan Central, “My Friend,” and “Another Computer.” You can’t just add friends automatically: you have to invite them. The invitation template reminds me a bit of LinkedIn and other social networks:
I am using a great program called CrashPlan that automatically backs up my files to a friend’s computer. Let’s backup to each other!
  • It’s easy to use—only takes a few minutes to get started and won’t slow our computers down
  • Our files are encrypted and with a trusted friend, protecting us against theft, viruses and other disasters
  • If our computers crash or are stolen, we can recover all our files quickly (minutes instead of days)
  • No annoying monthly fees—it’s FREE to try out for 30 days and only $19 if we decide to use it.
Learn more about how CrashPlan works or download it.
If you have any questions, give me a call.
As with social networks, if you want anyone to cooperate with you, you’d better write your own personalized message and trash the ad copy. And, frankly, whether you can recover your files in minutes would seem to depend a lot on the size of the files and where your friend is. It also assumes that only files are lost and that you still have either your Internet connection or a way to get your computer to your friend’s house and vice versa.

Given that my Documents and Settings folder (CrashPlan’s default selection for backup, which happens not to include any of my client files) currently contains 16 GB (that would be all those podcasts I haven’t listened to yet), “days” might actually be an accurate description of how long it would take to retrieve all those files via the Internet. And that’s with what passes for a high-speed connection in this country.

You can change the file selection to make sure nothing important gets left out and to exclude things (like podcasts) that you could get again if you deleted them accidentally. A little editing gets me down to about 5 GB, a more reasonable amount to think about backing up online.
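For a rough sense of what those sizes mean in waiting time, here is a back-of-the-envelope sketch in Python. The two link speeds are hypothetical stand-ins for a typical home broadband connection, not measurements of mine or anyone else's; the function name transfer_hours is also made up, and the calculation ignores protocol overhead, so real transfers would take longer.

```python
def transfer_hours(size_gb, link_mbit_per_s):
    """Hours needed to move size_gb gigabytes over a link of the given speed.

    Uses decimal units (1 GB = 8000 megabits) and ignores protocol
    overhead, so treat the result as a lower bound.
    """
    megabits = size_gb * 8000
    return megabits / link_mbit_per_s / 3600


# Hypothetical link speeds, chosen only for illustration.
links = [("1.5 Mbit/s download", 1.5), ("0.384 Mbit/s upload", 0.384)]

for label, mbit in links:
    for size_gb in (16, 5):
        hours = transfer_hours(size_gb, mbit)
        print(f"{size_gb} GB at {label}: {hours:.1f} hours")
```

Under those assumed speeds, 16 GB works out to roughly a day coming down and close to four days going up, while 5 GB drops to several hours down and a little over a day up.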

That’s an issue I’d have with any online service, whether my files are going to some corporate data center somewhere or to a friend’s PC. But let’s return to the idea of using CrashPlan to “give the gift of backups” to less-technical friends or family members.

The theory is a good one: CrashPlan is easy for non-technical people to install and it works automatically. You know your friend’s files are getting backed up, because you can see them on your computer. But even when my mother had a computer (which she never used and gave to my great-uncle), I’d’ve had to get on a plane to Ohio with my laptop in order to make the initial backup, and she would then have had to do her incremental backups via a dial-up line. (And she was on CompuServe, but that’s a whole ’nother story.)

I do actually like the idea of a backup social club. It’s good to know where your data is. Nevertheless, it’s clear that CrashPlan works best for groups of people in geographic proximity to each other. (Online social networks like LinkedIn also work best when your connections are people you know in the real world, though you don’t have to live in the same city to make that network effective.) If your mom lives in easy driving distance, go for it. If not, consider getting her an external hard drive with automatic backup software.
