We’ve all sustained painful data losses at some point or another. After a number of years, you start to realize that you’re tired of getting sucker punched, and that you have the skill and knowledge to solve the problem once and for all. I’m going to outline here a relatively simple strategy to keep a set of data alive for the rest of your life.
This article is meant to give broad strokes on the strategy. I’m going to gloss over things that may seem non-trivial, only because they are all well documented elsewhere. Nothing a little googling on your part won’t solve, and it will help keep the information here somewhat platform independent. I will get into more specific technical details about my implementations in later posts.
Step 1 – Buy 3 identical drives
Hard drives. Regular cheap hard drives. The key here is upgradability. Moore’s Law will help us later.
Buy 3 of the largest, cheapest drives you can afford. They really don’t have to be identical, just the same size. Usually when you find a good deal on a large drive it’s best to just jump on it and buy three. When I’m ready to buy, I check the Fry’s ad every week at the LA Times “newspaper ads” site (EDIT: Apparently the LA Times site doesn’t have the Fry’s ad anymore, try the OC Register). Prices on stuff like this go up and down, so take a 2-3 week average so you’re sure you’re getting the current best price.
Unless you have special needs, technology doesn’t matter here, just go for the largest and least expensive drives that will work with your hardware.
Step 2 – Set up your main computer
You’re going to put 2 of the drives you just bought into your main computer, nightsync’ed. Don’t use RAID. I know many of you will be tempted to, but it’s a waste of time and effort. I ran a RAID 1 system for a long time until I realized how dumb it was. You have to assimilate and understand the fact that user error accounts for a large percentage of data loss cases. It doesn’t matter how careful you are. RAID 1 cannot protect you there, because a mistaken delete or overwrite is mirrored to the second disk instantly. Save yourself from yourself: don’t use RAID for backups.
I’ll save the specifics of my nightsync config for another post, just try to find the best (simplest) synchronizing/backup app for your platform and you should be fine. Make sure it creates a perfect mirror of your data, and make sure it happens automatically every night.
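To make "a perfect mirror" concrete, here is a minimal Python sketch of a one-way sync. It is not the nightsync config mentioned above, just an illustration under a few assumptions: the function name `mirror` is made up for this example, both drives are mounted as ordinary directories, and symlinks and permissions get no special handling.

```python
import filecmp
import shutil
from pathlib import Path


def mirror(src: Path, dst: Path) -> None:
    """Make dst an exact one-way copy of src (hypothetical helper)."""
    dst.mkdir(parents=True, exist_ok=True)
    src_names = {p.name for p in src.iterdir()}

    # Remove anything on the backup side that no longer exists in the source.
    for item in dst.iterdir():
        if item.name not in src_names:
            if item.is_dir() and not item.is_symlink():
                shutil.rmtree(item)
            else:
                item.unlink()

    # Copy new or changed files, and recurse into directories.
    for item in src.iterdir():
        target = dst / item.name
        if item.is_dir():
            # A file on the backup side may have become a directory in the source.
            if target.exists() and not target.is_dir():
                target.unlink()
            mirror(item, target)
        else:
            # A directory on the backup side may have become a file in the source.
            if target.is_dir():
                shutil.rmtree(target)
            # Compare contents byte for byte; copy only when they differ.
            if not target.exists() or not filecmp.cmp(item, target, shallow=False):
                shutil.copy2(item, target)
```

You would call something like `mirror(Path("/mnt/main"), Path("/mnt/nightsync"))` (both paths are placeholders) from whatever scheduler your platform provides. Because the copy only runs once a night, a file you delete by mistake is still on the second drive until the next run, which is exactly the protection RAID 1 can't give you.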
Now that you have the two drives in your main machine working and synchronizing properly, you can already let a wave of satisfaction wash over you. You are now safe from probably 99% of data loss situations. Think about it. The only things that can threaten your files now are physical disasters (like a fire, or large amounts of beer), and theft.
Step 3 – Set up the offsite machine
This is where it gets a bit tricky. You need a machine somewhere else you can connect to. Drop that third drive into an old PC you have lying around (old cheap Pentium IIs are perfect for this), set up a weekly backup to it over SSH, and put it at a friend’s house, or your parents’ house. Everyone’s situation will be different here, but the important thing is to have your data not only on 2+ disks, but also in two different locations. Since this is an additional level of redundancy, you can do the syncs less often, but they must happen automatically.
There are several small wrinkles to work out with this approach, but none of them are really prohibitive, and you’ll have fun flexing some technical muscle. You have to punch a hole in your friend’s router for the SSH transfers. You also have to work some kind of dynamic DNS magic if they’re on a dynamic IP. Depending on your level of trust with your backup partner, you may also want to look into some encrypted filesystem options.
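One way the weekly transfer could look, sketched in Python around `rsync` over SSH. The choice of `rsync` is my assumption for this example, since the article doesn't name a tool, and the function names, hostname, user, port and paths are all placeholders.

```python
import subprocess


def build_offsite_sync(src_dir: str, user: str, host: str,
                       dest_dir: str, ssh_port: int = 22) -> list[str]:
    """Build an rsync-over-SSH command line (hypothetical helper)."""
    return [
        "rsync",
        "-az",                       # archive mode, compress data in transit
        "--delete",                  # drop files offsite that were deleted locally
        "-e", f"ssh -p {ssh_port}",  # the port you forwarded on the friend's router
        src_dir.rstrip("/") + "/",   # trailing slash: send the contents, not the folder
        f"{user}@{host}:{dest_dir}",
    ]


def run_offsite_sync(cmd: list[str]) -> None:
    """Run the transfer and raise if rsync exits with an error."""
    subprocess.run(cmd, check=True)
```

For example, `build_offsite_sync("/mnt/main", "backup", "friend.example.org", "/mnt/offsite", ssh_port=2222)` returns the argument list you would hand to `run_offsite_sync` from a weekly scheduled job. The hostname is where the dynamic DNS name would go, and an unattended run also assumes key-based SSH login so nobody has to type a password.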
You can take your time with this step. Get it set up and watch it all working locally for a few weeks before you drop the second machine at the offsite location.
Step 4 – The infinite loop, your data becomes alive
Fast forward 2 years, you’ve run out of space on your main drive. Guess what? Your friend Gordon Moore has your back and hard disks are twice as large and have become abundantly cheap. You are probably even thinking of buying a new machine.
Remember that old machine you set up at your friend’s house? Run a final sync to it and then take its drive out and put it in your safety deposit box, or your safe at home, or wherever. Keep it safe. It marks an era of your life. You may never ever plug it in again, but you’ll feel good knowing you can if you have to.
Now for the cool part. You only have to buy 2 drives this time. Make sure they are exactly twice as large as your previous drives. These will replace the drives in your main machine, as the main drive and nightsync drive. What do you do with the old drives? Well, we have to offer an apology to our old friend, much maligned earlier in the article: RAID.
We’re going to use arguably the most ridiculous RAID configuration, JBOD, or Just a Bunch Of Disks, to arrange the two old drives into 1 logical drive in our now drive-less offsite backup machine. This will let us back up twice the amount of data as before, because the disks’ capacities are simply concatenated together to form a drive twice as large. We don’t need any kind of speed or efficiency here either, so software RAID is perfectly fine.
Optionally if you are buying a new machine at this point you may just want to replace your offsite machine with the old main machine to avoid transplanting the old disks.
Copy all your old data to your new drives before you start working again and the cycle is complete. Repeat this step every time you run out of drive space and you will never again have to worry about losing data. EVER.
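The bookkeeping behind the cycle is easy to check. Here is a small Python sketch, assuming each upgrade exactly doubles the drive size as described above; the function name `upgrade_cycle` and the 500GB starting size are made up for the example.

```python
def upgrade_cycle(initial_size_gb: int, generations: int) -> list[dict]:
    """Track drive capacities (in GB) across upgrade cycles (hypothetical helper)."""
    size = initial_size_gb
    # Generation 0: three same-sized drives, one of them in the offsite machine.
    history = [{
        "generation": 0,
        "main": size,
        "nightsync": size,
        "offsite": size,
        "archived": None,
        "drives_bought": 3,
    }]
    for gen in range(1, generations + 1):
        old = size
        size = old * 2  # the new drives are exactly twice as large
        history.append({
            "generation": gen,
            "main": size,
            "nightsync": size,
            # The two old main-machine drives, concatenated as JBOD.
            "offsite": old + old,
            # Whatever was in the offsite machine goes into the safe.
            "archived": history[-1]["offsite"],
            "drives_bought": 2,
        })
    return history


if __name__ == "__main__":
    for row in upgrade_cycle(500, 3):
        print(row)
```

In every generation the concatenated offsite volume comes out the same size as the new main drive, which is why you only ever need to buy two drives after the first round.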
Additional insights
It used to bother me to think that I had to buy two drives instead of just one every time I wanted to upgrade. At first it seemed like such a waste of money. But when you think about how much more security that extra $100 or $200 buys you, it kinda boggles the mind. Recently Google released an analysis of drives that said that the average failure rate of drives over 2 years was about 8%. That means that if you keep a single drive in your machine for 2 years, you have roughly a 1 in 12 chance of losing all your data from drive failure. That’s scary stuff to me. You drop a second drive in there and run a nightly sync and that risk all but goes away, because you would have to lose both drives before you got around to replacing the first. Fall on a bad drive? Replace it under warranty (or not) and you’ve dodged a point-blank bullet. I can’t think of a better or easier way to protect yourself from the worst thing that can happen to your computer.
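For the curious, here is the arithmetic as a short Python sketch. It takes the 8% two-year figure quoted above at face value and assumes the two drives fail independently of each other, which is an assumption of this example and won't hold for, say, a bad manufacturing batch. The function name `loss_odds` is made up.

```python
def loss_odds(p_single: float) -> dict:
    """Odds of drive failure over one period, given a per-drive failure probability."""
    # Chance that BOTH drives fail at some point in the period, assuming independence.
    # This overstates the real risk, since a dead drive normally gets replaced
    # long before its partner fails.
    p_both = p_single ** 2
    return {
        "single": p_single,
        "mirrored_upper_bound": p_both,
        "single_one_in": 1 / p_single,
        "mirrored_one_in": 1 / p_both,
    }


if __name__ == "__main__":
    odds = loss_odds(0.08)
    print(f"one drive:  about 1 in {odds['single_one_in']:.1f}")
    print(f"two drives: no worse than about 1 in {odds['mirrored_one_in']:.0f}")
```

With 8% per drive, a single drive works out to about 1 in 12.5, and even the pessimistic two-drive figure is around 1 in 156.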
Laptop users are a bit out of luck on that front, because you can’t physically put a second drive in to run your nightly syncs. The best thing to do if you use a laptop as your main machine is to build a cheap file server at home to act as the “main machine” from this article, and nightsync your laptop to it over SSH. That applies to multi-machine setups too. Have them all back up to a local file server which then runs the sync to the offsite machine. You will need bigger disks, but the result will be near optimal.
Online backup services like Amazon S3 are also an option for the offsite component of the strategy, but I prefer the method outlined here because it scales transparently as the data set gets larger. If you crunch the numbers based on a meager 200GB data set with weekly syncs, the online services just become too expensive. Also it feels good to have final control over all your data. Big companies make mistakes just like the rest of us.
There is another, less obvious benefit to keeping your data forever. You’ll start to notice subtle changes in your directory structures. You’ll start arranging data more elegantly, refined from years of experience with it, combined with the knowledge that it will never be erased unless you want it to be. You’ll start to eliminate clutter naturally, instead of starting over every time you get a new machine. Your project folders will take on a new meaning in your mind, no longer becoming forgotten cruft from dead ideas, but evidence of work done, points of reference for your future. You’ll think of it like carving your work into stone instead of writing in pencil on post-it notes.
Lastly, whatever you do, remember to keep things simple for yourself and have fun learning along the way.
14 Comments
This was a really nicely written and helpful article. Thanks
This is just what I’ve been looking for. A nice simple method for backing up my data. Thanks!
You cite Moore’s law but you fail to take into account that some people accumulate data faster than drive capacity increases. Right now anyone with over 2TB of data will not be able to use your method. In six months I will have doubled my data, but 4TB drives will still not be available :(
neotoy:
You are absolutely correct, if your dataset grows faster than commodity drive sizes, you’re in trouble.
If you work with raw video footage every day you’re going to have to think about your strategy a bit more, but the same concepts will still apply.
We’ve hit a rough patch right now because digital video is still relatively new, and drive sizes haven’t caught up yet. As soon as video stabilizes into its “ultimate” archival format (like music in flac format for example, think uncompressed 1080p…), your data collection will only grow linearly, whereas drive sizes will continue to grow exponentially. This will take some time, but it will happen.
In the meantime it’s important to identify what you really want to keep forever. Personally I choose not to keep torrented shows or movies. They will always be out there to download, there’s no point hoarding them. If you stick with backing up documents, photos and music you’ll be ok.
Don’t let an irrational need to back *everything* up keep you from backing up the things you hold most dear.
This is still not failsafe. What if all 3 drives have the same flaw and fail at the same time? I would prefer to do an online backup instead of messing with having a box permanently at a friend’s or relative’s house. What happens if both houses have pipes break, or someone accidentally spills something on both computers? I know these scenarios are very, very unlikely, but they’re not impossible. Now the likelihood of both your drives failing AND the online backup having a problem is small enough that I wouldn’t worry about it.
Thanks for the nice clear explanation! I’d like to add something else, another very similar method that suits smaller scale computing. I operate from a laptop and my critical stuff is tiny compared to the amounts of data described in your excellent article.
I have a 60gb macbook which contains everything I hold dear. I mirror it (?correct term) once or twice a week to an identical sized external firewire drive. I’ve then got a bootable perfect copy of my laptop drive that’s never more than a few days out of date.
I continuously back up my important files in the background to both an offsite destination (in the sky for all I know, but I’ve checked it’s safe!) and to a Windows computer in my house which is pretty much always turned on. CrashPlan does this very well. I’ve never tried Mozy. It’s encrypted before it leaves my computer, which is important for me – I’m quite security conscious and use FileVault on my MacBook. My important unique data (documents/unique music/photos) only amounts to about 20GB. Most of the music can always be got again. Much of my unique data doesn’t change from day to day so internet bandwidth isn’t an issue for the offsite uploading.
Combine these two methods and it’s pretty close to bullet proof and quick to recover as well. My laptop drive died only two weeks after initiating this very simple routine. All I had to do was hook up the bootable firewire drive, download the files which were new or changed in the four days since I made the image and I was up and running again in a time measured in minutes rather than hours. I had to then get a new drive for the macbook and transfer the image onto it to get mobile again.
I haven’t really thought how this would scale up. I don’t think it would be as good for people who have huge changes in data from day to day.
Thank you for sharing!
More!
hi
People should read this.
A simple way to backup, and how I do it.
Every month or so I will boot Ghost or TrueImage from a cd and duplicate the system drive. And then, I’ll just filecopy using windows explorer, all the digital photos and mp3’s, and journal, and other important data onto the backup drive too.
I use USB hard drives for this purpose. And somewhat less often, I’ll back those up too. It really doesn’t take much time at all. And for added safety, you can take one of the drives offsite.
It is important to practice restoring your data, especially the system drive. Always verify your drive images and backups.
*** Really, if you’re a home user, all you need is 2 external USB drives, and Ghost or Acronis TrueImage. For less than $150 you will not have major headaches. Do it once a month, or whenever you make huge changes to your system, or when you get a ton of new invaluable data, like vacation pictures or something, and have some free time. No complicated RAID stuff, or scheduling, or swapping drives or worrying about online services.
Our solution is not necessarily for Home use but if you need 6TB or more then just check out our site. It is the coolest thing you will see in awhile.
hello,
this is kind of late commenting, i am sure, as your article is now nearly 3 years old.
but i have been thinking about this matter recently (i haven’t researched it that much yet, though) and i find an issue with your strategy:
– ultimately, you keep the last copy of the data (if i read correctly) on one drive, stored away, safely.
This is what bothers me. That drive will still be prone to failure. Even if you store it away inside a nuclear protected bunker, eventually, 5, 10, 20 years from now, the circuitry may be damaged, the motor may have rusted, some part of the ultra sensitive magnetic surface may have oxidized. Magnetic information fades away.
It doesn’t seem to be a permanent way to keep family photos and things of the sort. In 20 years, even a cheap, bad quality print, if stored away with minimum care, can be viewed. And more so with printed paper works.
I don’t really think it is possible to store away electronic information. it probably needs to be always “live” and being transferred to newer, and newer drives, as they come along.
nice text, though :)
@nuno
That old drive you’re putting in storage, you can technically just destroy it. Keeping it acts as a 4th (aka “crazy”) level of redundancy.
You have copied all its data to your new machine. That data is now getting replicated on the 3 newest drives according to the guide.
Thanks for your comment though, I’m happy people are still reading this 3 years later!