I was watching a mailing-list discussion of backup software, and how often you should back up, and based on some decades’ experience, found some of the thinking sloppy. Here are my life lessons on keeping your data safe while assuming that The Worst Will Happen. Some of it is Macintosh-specific, but there may be useful take-aways even from those parts, even for non-Mac-hacks.
Catastrophe · It’s been said that the eventual failure rate for hard-disk drives is 100%. This is a little misleading; I haven’t had an actual disk crash on any of my computers in at least a decade. But it can happen and it does happen and you should think about it.
And crashes aren’t the only way to lose data. The single greatest threat to my data is me. During the course of a day, I touch lots of data, and each touch is a chance to screw up, and sometimes I do screw up, and sometimes data is lost. I suspect this may be true of you, too.
I can remember, back when I was a sysadmin, seeing the occasional newsgroup post, the kind of thing that makes you feel sick to read: “One of our users is suicidal because he had three years’ worth of data for his Ph.D. dissertation on an un-backed-up drive and it’s gone south. Can anyone think of anything?” Usually, no-one could, and it didn’t matter whether the problem was hardware failure or user error.
If I catastrophically lost everything that’s on my laptop, it would blast a major hole in the integrity of my life. I’d rather not even think about it.
How I Do It · I back up once a month or so, occasionally missing a month, and I use “tar”, a decades-old Unix command line utility. But I sleep easy at night. I’m quite confident that my data is really very safe; it’s been years since I lost anything I cared about.
The Rules · If you follow these, you almost certainly won’t lose data in any damaging way.
Don’t use proprietary file formats.
Don’t erase anything.
Store everything twice.
Do occasional ad-hoc and regular full backups.
Note that these are not rules for professionally-managed data centers, which have incredibly complicated backup requirements and often live in a regulated environment. If you’re a pro, the only way to be safe is to hire an expert and throw some real money at the problem.
This advice is aimed at the users of personal computers, whose needs are simpler and less variable.
No Proprietary Formats · This is actually a two-part rule. First, don’t store the actual live data you care about in proprietary file formats. Second, don’t use them for your backups either.
The live-data part is easy to understand. Don’t store your Last Will and Testament in a .doc file, or your priceless collection of Delta Blues recordings in Apple-only or Windows-only formats, or your wedding photos as a Corel Paint Shop Pro file.
It’s easy to be safe. For important documents, keep PDF or HTML versions around. For pictures, use JPG or PNG, and for audio use MP3 for compressed data and WAV or equivalent for high-quality stuff.
And yes, this is another reason why things like the Open Document Format are very, very important; increasingly so every year as our personal lives become embodied in bits.
The proprietary-backup-format issue is a little subtler; let me try to bring it alive:
Horror Story · Back sometime in the Nineties, I bought a new computer and for some reason that I now forget, was running Windows NT 4 on it. Since I’m serious about data protection, I bought it with a little cartridge tape drive, DAT format I think, and saved everything once a week or so, using the procedure in the Windows user guide.
Sure enough, a few months in, I accidentally nuked something important. Feeling smug, I slapped in the most recent tape and invoked the magic data-restore procedure. The computer silently ground away on the tape-drive for about twenty minutes and then said “Invalid save-set; exiting.” I spent some unsatisfactory time with Microsoft phone support, who asked a lot of questions and then said (more or less) “Invalid save-set; exiting.” I have rarely been angrier.
I managed to dig up a Unix box with the right drive that would actually let me read the raw data, but that didn’t help; it was ungodly mashed-up broken-as-designed spaghetti data with binary numbers splashed all over it; the real miracle is that it could ever have worked for anyone.
So when I hear the Macintosh experts talking about all those commercial backup utilities with their cute names, I tend to think they’re mostly nuts. Among other things, suppose I want to restore the data on a computer that’s not a Macintosh? It’s my data, after all.
So: Do not, whatever you do, feed your valuable data to a program that is going to save it in a file format that can only be read by that program, or by that kind of computer. Because when the program can’t or the computer can’t, you’re out of options.
Don’t Erase Anything · I’ve noticed that this basic idea is really hard for people to warm up to, but trust me, it’s a good one. Here are a couple of war stories.
Back in 1991 or so, my then-employer Open Text sold a search-and-display system to Ringier, a Swiss newspaper/magazine publisher. At the time of the deal, they were generating about one megabyte of data per day; so a year’s worth was a substantial chunk of data given the disk sizes back then. We were having one of our planning meetings and their IT manager said “And now we have to design the schedule for deleting the old data.” I had an epiphany right there in the room and said “Why would you ever do that?” He looked at me like I was crazy, but we walked through the economics; the cost of keeping a year’s data was, in the big picture, negligible.
Skip forward 14 years: Last summer, when I got Mom her new computer, and we were transferring everything over to it, she said “Now I’ll have to go through everything and decide what to throw out.” All of a sudden there were echoes from that smoky room in Zürich in 1991: “Why would you want to do that?”
Here’s the deal: every time you erase something, you might be making a mistake. This is the most common way data gets lost. So, don’t do that. If things are getting in your way, make a folder called “Dusty old files” or whatever, and put them in there. Then forget about them. One day you’ll want something from four years ago and you’ll be happy. Even if you don’t, you’re much less likely to accidentally lose those wonderful baby-shower pictures.
The cost of disk space these days is just so ridiculously low that most people never manage to fill up their laptops. So why on earth would you invest your precious time in a dangerous activity (deleting things) in order to conserve a resource that’s essentially free?
Store Everything Twice · Hard disks fail, but not very often. Optical disks (CD/DVD-ROMs) fail too, but also not very often. If you have your data on two disks, not plugged into the same computer, both of them would have to fail at the same time to lose your data, which happens not-very-often-squared; you’re more likely to get hit by a falling safe or win millions in the lottery.
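To put a rough number on “not-very-often-squared”, here is a small sketch. It assumes the two disks fail independently of each other, and the 3% figure in the usage note is an invented illustration, not a measured failure rate; `both_fail` is a made-up helper name.

```shell
# both_fail P
# Chance that two disks both fail, if each one fails with probability P
# and the failures are independent (an assumption; a house fire or a
# power surge can take out both at once).
both_fail() {
    awk -v p="$1" 'BEGIN { printf "%.4f\n", p * p }'
}
```

With an assumed 3% chance for each disk, `both_fail 0.03` prints 0.0009, a bit less than one chance in a thousand.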
Let me make this very personal. On my laptop, the data that I really care about is: email, pictures, music, blog entries, source code, and conference presentations.
My email is on my laptop but also on a server somewhere in the bowels of Sun.
Pictures (which I keep as JPGs in ordinary folders, one folder per month) get copied over to the family fileserver, a big little-used Windows box in the basement, every few days.
Music is all copied off of disks I own, except for the rarities I snagged from Napster back in the day, which I have a copy of on the fileserver.
My blog entries are all here, and also up on tbray.org as of the instant they’re published.
Program code gets synced up regularly with some source-code repository or another; and the one thing that will motivate me to do frequent backups is when I’m coding but unable, for a few days, to check the code in.
That leaves conference speeches, and I only write one or two of those a month, and they’re not all new material; I might lose one or two in a between-backups crash. But when I finish writing one and head off for the conference, I always dump the presentation onto one of those little USB disks and carry it along in my pocket; which has saved me more than once.
Keeping multiple copies is the single most powerful tool I know of to protect your data.
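For Unix geeks, one way to get that second copy is a plain recursive copy onto another disk, followed by a comparison to prove the copy is really there. This is only a sketch under assumptions: `store_twice` is a hypothetical helper name, and the paths in the usage note are invented. It copies and never deletes.

```shell
# store_twice SRC DEST_DIR
# Copy the folder SRC into DEST_DIR (a folder on some other disk),
# then compare the two trees. Nothing is deleted on either side.
store_twice() {
    src=${1%/}          # drop a trailing slash so cp copies the folder itself
    dest_dir=$2
    mkdir -p "$dest_dir" || return 1
    cp -Rp "$src" "$dest_dir"/ || return 1
    # diff -r exits non-zero if a file differs or exists on only one side
    diff -r "$src" "$dest_dir/$(basename "$src")"
}
```

Something like `store_twice ~/Pictures /Volumes/SecondDisk/copies` would be the shape of it. Note that running it again copies newer versions over the older copies, so it protects against a dead disk, not against a mistake you notice a month later.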
Do Ad-Hoc Backups · Suppose you’ve just finished some project: editing the pictures from a trip, or writing a conference presentation, or crunching some numbers. While you’re still feeling that warm glow of achievement, slap a CD-ROM or DVD-ROM in the computer and save it. Then get one of those Sharpie markers and write right there on the disk what’s on it. Then put the disk on a shelf somewhere and chances are you’ll never touch it again.
It won’t take long, and you’ll sleep better, and it might just save your butt.
Do Full Backups · There’s a very old, very bad idea that comes to us from computing’s dim dark past, and it’s called “The incremental backup”. Time was, you had 500-meg disks, but you backed up on tapes (remember those spinning tape reels that computers used to have in old movies?) which were only 110 meg if I recall correctly. So you had this elaborate software that you ran every day that would go through the disk and just save onto the tape the stuff that had changed that day. Then, once a week or so, you’d back up the whole disk onto a box-full of tapes. In those days, computer departments had to have lots of space for big racks full of tape reels that held your backups. Imagine how complicated and elaborate it was figuring out how to restore just the right version from an incremental-backup set. It was worse than you imagined, a frequent source of error and further data destruction.
If you’re already storing everything twice, as I advised above, it’s still a good idea to do a massive backup of everything now and again, if only for convenience. If you’re updating anything daily and you care about it, you should bloody well find a way to store it twice (see examples above), and there’s usually a better way than an incremental backup.
So I recommend, when you do a backup, you include everything you own, without exception; don’t try to be smart. This minimizes the possibility that you’ve left out something that you’re not smart enough to think of now.
There’s another recommendation that falls out of this: try to keep everything you care about under one folder, so it’s easy to back up. Mac OS X makes this really easy, by default storing everything you create somewhere under your home folder. There is lots of Windows software that still squirrels important stuff away in a place that’s not your home directory; just another penalty for using a second-rate operating system.
How I Back Up · I’ve already described ad-hoc backups; I do lots of those.
Which brings us to regular full backups, and, I’m afraid, a kind of unsatisfactory ending to this essay. I had started writing a detailed blow-by-blow narrative of exactly how I do this, but started to get this nightmare scenario in my head where some non-Unix-geek tries to apply it and puts in back-quotes for quotes or something like that and silently destroys her data.
I will reproduce the two exact command lines I used on my most recent backup and restore, with commentary for Unix geeks, but DON’T EVEN THINK OF TRYING TO USE THIS unless you really know what you’re doing, because YOU WILL PROBABLY LOSE YOUR PRECIOUS DATA.
I’ll also appeal to the LazyWeb: if anyone cares to recommend tools or utilities that achieve the desired effect and are safe for use by real non-geek people, let me know and I’ll point to them here.
Having said that, here are two specific pieces of advice:
Do Video Separately · Video files are so bloody huge that there’s no point trying to pretend they’re like ordinary data. When I finish a piece of video I save it onto a DVD all of its own, and leave it out of my regular backups. I keep video in a separate folder, not my home folder, so that this will work.
Buy An External Disk Drive · You can get these in any computer store, they’re called “FireWire” or “USB 2” drives. They’re real easy to use, you just plug ’em in and there they are. They come as small as 40G or so, the cost-per-byte goes down as they get bigger, so you might as well buy a big one.
In my case, I’m using a 250G disk and my whole life, compressed, is about 26G, so this one will last most but not all of a year before it’s full. Then I’ll go buy another. I already have two sitting at the back of a shelf somewhere. The costs are acceptable. Most people’s requirements will be much less extreme.
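The arithmetic behind “most but not all of a year” can be sketched as follows. It assumes one full backup a month and assumes every compressed backup stays at about the same size, which won’t be exactly true as data grows; `backups_that_fit` is a made-up helper name.

```shell
# backups_that_fit DISK_GB BACKUP_GB
# Whole number of full backups that fit on the disk, assuming every
# backup is the same size (integer division rounds down).
backups_that_fit() {
    echo $(( $1 / $2 ))
}
```

`backups_that_fit 250 26` prints 9: nine monthly backups, so the disk fills up a few months short of a year.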
For Unixians: Doing It With ‘tar’ · This is old, simple, reliable technology and has a lot of advantages, but...
THE FOLLOWING IS FOR UNIX GEEKS ONLY. DO NOT TRY TO DO THIS UNLESS YOU UNDERSTAND WHAT’S GOING ON OR YOU WILL LOSE YOUR DATA.
I’ve labeled the current FireWire drive “2005-05” to remind me when I bought it. I created my most recent save-set like so:
    cd
    cd ..
    tar czf /Volumes/2005-05/archive/2006-01-28.tgz twbray
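The same warning applies to what follows. As a sketch only, the save-set step could be wrapped so that the archive names itself after today’s date, refuses to clobber an existing archive, and gets read back once to check that it is a valid save-set. `save_set` is a hypothetical helper, not the procedure used above.

```shell
# save_set FOLDER ARCHIVE_DIR
# Write FOLDER into ARCHIVE_DIR/YYYY-MM-DD.tgz and print the archive's path.
save_set() {
    folder=${1%/}
    archive_dir=$2
    archive="$archive_dir/$(date +%Y-%m-%d).tgz"
    # Never overwrite an existing save-set
    if [ -e "$archive" ]; then
        echo "refusing to overwrite $archive" >&2
        return 1
    fi
    # -C switches to the parent folder first, so paths inside the
    # archive start with the folder's own name (e.g. twbray/...)
    tar czf "$archive" -C "$(dirname "$folder")" "$(basename "$folder")" || return 1
    # Read the whole archive back; fails if it is truncated or corrupt
    tar tzf "$archive" > /dev/null || return 1
    echo "$archive"
}
```

With the drive plugged in, the call would look like `save_set /Users/twbray /Volumes/2005-05/archive`. The read-back catches a bad archive tonight rather than on the day you need it.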
Recently, I stupidly overwrote a file named “Sigrid.sxw” with something completely different. Sigh. Here’s how I recovered, after plugging in the FireWire drive:
    cd
    cd ..
    tar xzvf /Volumes/2005-05/archive/2005-12-03.tgz twbray/sun/Sun2005/08/Sigrid.sxw
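A more cautious variant, sketched under assumptions: first list the archive to find the exact path of the lost file, then extract it into a scratch folder instead of straight over the live data, so a typo can’t make things worse. `find_in_archive` and `restore_to_scratch` are hypothetical helper names.

```shell
# find_in_archive ARCHIVE TEXT
# Print every path in the archive that contains TEXT (matched literally).
find_in_archive() {
    tar tzf "$1" | grep -F -- "$2"
}

# restore_to_scratch ARCHIVE PATH SCRATCH_DIR
# Extract one path from the archive into SCRATCH_DIR, leaving live data alone.
restore_to_scratch() {
    mkdir -p "$3" || return 1
    tar xzf "$1" -C "$3" "$2"
}
```

For the file above, `find_in_archive /Volumes/2005-05/archive/2005-12-03.tgz Sigrid.sxw` would show the full path to pass to `restore_to_scratch`; once the recovered copy looks right, move it into place by hand.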
The tar archives are readable on Windows (via cygwin) and Linux; I’ve checked.
Some may object that tar will bypass resource forks and other HFS voodoo. Well, fuck that; if I have any data that relies on resource forks to work, then I’m a sharecropper on Steve’s plantation, and serve me right if I lose it.
It takes a long time. Start the process, go to bed, and sleep soundly in the knowledge that your data is safe.