The Ur-Guru has created his own automated backup routine to handle his frighteningly large array of computers (including several Virtual Machines). I asked him to describe his weekend backup process, and he said, “They might either jump out the window, fall asleep, or get confused after the first paragraph, I’m afraid. :-)”
Well, you might, at that. I find it fairly mind-boggling, myself. Visual aids would help, except that a map of the Ur-Guru’s network would cover an entire wall. I will, however, put a link to an image of a recent (but not current, as he’s been reconfiguring again) network map, and a photo of the monitor setup, on the blog.
For reference, Arena is the Ur-Guru’s main workstation. This is a man who almost never throws a computer away (his Amigas finally went to the old-age home a month or so ago), so he re-uses the workstations which are no longer up to speed for his software development work. Among those machines are Manticore, Nightshade, Meteor, Vortex, Whisper, and Moonlight. In addition he runs three servers: the “twins” Nova and Nebula, and his older server, Nexus.
So here it is, straight from the horse’s mouth. I’ve made a few minor grammatical and spelling corrections; English is not the Ur-Guru’s first language. (I think that was C++.)
It’s not all that complex since most of it is entirely automated. First Nova and Nebula, the two servers, start imaging their boot disks (each using the other as the storage location for the resulting image). They then create archive backups of all important data (which end up as incremental ZIP files). Then they synchronize the data that needs to be identical on both and run a quality check to see whether all of the above went correctly. If at any point it didn’t, they restore the previous data and send an e-mail to the administrator with the failure details. Once all the backups and images are created, they are automatically copied over to a RAID5 array on Manticore.
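For readers who want to see the archive-and-check step in concrete terms, here is a minimal Python sketch. It is not the Ur-Guru’s actual routine: the function name, the `label` parameter, and the choice to build the archive under a temporary name (so a failed check simply leaves the previous archives untouched, rather than restoring them afterwards) are all assumptions made for illustration.

```python
import zipfile
from pathlib import Path


def incremental_backup(source: Path, dest_dir: Path, since: float, label: str) -> list[str]:
    """Archive files under `source` modified after `since` (epoch seconds).

    Hypothetical helper: writes `<label>.zip` into `dest_dir` and returns the
    archived names. The archive is built under a temporary name and only moved
    into place once it passes an integrity check.
    """
    target = dest_dir / f"{label}.zip"
    tmp = dest_dir / f"{label}.zip.tmp"
    added = []
    try:
        with zipfile.ZipFile(tmp, "w", zipfile.ZIP_DEFLATED) as zf:
            for path in sorted(source.rglob("*")):
                if path.is_file() and path.stat().st_mtime > since:
                    name = path.relative_to(source).as_posix()
                    zf.write(path, name)
                    added.append(name)
        # Quality check: testzip() returns the first corrupt member, or None.
        with zipfile.ZipFile(tmp) as zf:
            bad = zf.testzip()
        if bad is not None:
            raise RuntimeError(f"corrupt archive member: {bad}")
    except Exception:
        tmp.unlink(missing_ok=True)  # discard the failed attempt
        raise  # the caller would e-mail the administrator here
    tmp.replace(target)
    return added
```

A scheduler would call this once per data set with the time of the previous successful run as `since`, and treat any exception as the cue to send the failure e-mail.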
During this time Arena, Manticore, and Meteor image their boot disks, storing them all on Manticore while Manticore stores its own image on Arena. After this the RAID arrays of Manticore and Arena perform a full synchronization of data, including the newly created backup files.
This process will wait until Nitrous and Thunder (the laptops) are done synchronizing their data with the freshly synchronized data on Manticore, while also taking partially synchronized data from Nova or Nebula (it figures out which of the two has the least to do and uses that one, to make sure no system is unduly stressed). If needed, this will also kick off a second synchronization from Manticore to Arena and Meteor in case any files from the laptops are newer than the recently synchronized files.
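The “which of the two has the least to do” decision can be sketched in a couple of lines of Python. How the real routine measures workload is not stated, so the pending-item counts below are purely an assumption, as is the function name.

```python
def pick_least_busy(pending: dict[str, int]) -> str:
    """Return the server with the fewest pending items.

    `pending` maps a server name to a hypothetical count of queued work.
    Ties go to the alphabetically first name so the choice is repeatable.
    """
    if not pending:
        raise ValueError("no servers to choose from")
    return min(sorted(pending), key=lambda name: pending[name])


# Made-up numbers for illustration only.
print(pick_least_busy({"Nova": 120, "Nebula": 45}))  # prints "Nebula"
```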
Once all these processes are complete, the backups of any system are available in two or more locations at any point in time. Manticore then sends a WOL (Wake-on-LAN, i.e. a wakeup call) to Vortex, which powers on and runs a final check on all the backup processes that have completed. If any errors or failures are detected it notifies the administrator (me) via e-mail and loud annoying noises (just in case I fell asleep watching this boring but perfectly automated process).
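For the curious, a Wake-on-LAN “magic packet” is simple enough to build by hand: six bytes of 0xFF followed by the target machine’s MAC address repeated sixteen times, usually sent as a UDP broadcast. The sketch below shows the general technique in Python, not the Ur-Guru’s own tooling; the MAC address, the broadcast address, and the port are placeholders.

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet for a MAC like '00:11:22:33:44:55'."""
    hw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(hw) != 6:
        raise ValueError("a MAC address must be exactly 6 bytes")
    return b"\xff" * 6 + hw * 16


def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast (port 9 is a common choice)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))


# Placeholder MAC address, not a real machine on anyone's network.
packet = magic_packet("00:11:22:33:44:55")
print(len(packet))  # prints 102
```

The sleeping machine’s network card has to have Wake-on-LAN enabled (typically in the BIOS and driver settings) for the packet to have any effect.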
If any files have changed on Whisper, Nova and Nebula will know about it. Vortex uses the information held by Nova and Nebula to decide whether to wake up Whisper so that it can copy its data over to Manticore. The data from Whisper is kept separate from all other data.
Finally, Vortex will wake up Nexus and signal it to start a cleanup and redundancy process which involves removal of old archives and backups on itself and other systems in the network (in this process Nexus acts as a tertiary fail-safe for data that may be in error from a previous backup, allowing Nexus to directly restore any data to any system). When Nexus is done it will signal a power-off shutdown to Vortex after which it will shut itself down as well.
None of the Virtual Machines running on any of the systems are backed up. Their data lives on network locations on Nova and Nebula, and that data is already part of the process above, so all I need to do for those is shut them down once in a while and copy their disk images to a backup location. The same goes for the PDA: a backup is done manually whenever I feel like it or remember to do so.
The SQL server databases on Nova and Nebula keep each other in sync, so no additional backups are required. However, once a week a process runs that creates a full backup of the databases, which is later picked up during the weekend backup process as plain archives. These archives aren’t rotated but are kept in storage until I manually delete the older ones (after I am absolutely certain no rollbacks of 2-3 months are needed).
The cost of all this: 2.7TB (2700GB) worth of backups and storage, running over two separate gigabit network segments and completing the entire process in under 2 hours.
Who, me, paranoid?!
Now, aren’t you glad you only have your own machines to back up? Though I must say I envy him both the automation and the amount of storage space he has!
And it underscores the point that if you’re serious about computers, you need to be serious about backups.