Sunday, February 19, 2012

Data backup with rsync

After years of backing up personal data on an external disk and keeping the disk in a cupboard, I finally got around to implementing something a bit safer.

Off site copy

Having your backup disk at home protects you against data loss in case of disk failure. But should your house burn down, your backup disk would not help much.
If burglars pay you a visit and take your computer, they may just as well take all your other electronic equipment, including your backup copy.

The solution is of course to keep your backup copy (or a copy of your backup copy) outside your home.

In my home backup implementation, I use two identical external disks. One disk stays at home for a month and takes periodic backups (weekly or daily), while the other one is locked away at work.
Once a month, I swap the home disk with the work disk.

If the disk that I currently have at home gets destroyed or stolen with my computer, I still have the data as of the latest disk swap, so I lose at most a month of changes.

Periodic backup

The main flaw of my previous backup (copy and forget) is that it was manual. Several months could go by between two copies to my external disk. This new backup system is going to be automatic.

I must also find a good way to enforce the monthly switch with the offsite disk, which I may easily forget. The best I can do so far is to automate a monthly notification.
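A minimal sketch of such a monthly notification, assuming cron is available and that local mail delivery works on the machine (cron mails whatever a job prints to the owner of the crontab). The script name and its location are hypothetical.

```shell
#!/bin/sh
# Hypothetical reminder script, e.g. saved as $HOME/bin/swap-reminder.sh.

# Print a one-line reminder that names the current month.
swap_reminder() {
    echo "Backup reminder for $(date '+%B %Y'): swap the home disk with the one at work."
}

# Only print when called with --run, so the file can also be sourced.
if [ "${1:-}" = "--run" ]; then
    swap_reminder
fi

# Example crontab entry (added with `crontab -e`), 09:00 on the 1st of each month:
# 0 9 1 * * /bin/sh "$HOME/bin/swap-reminder.sh" --run
```

This only reminds; it cannot check that the swap actually happened.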

Disk format

The data saved on my backup disks must be as readable as possible. If my hackintosh crashes, I do not want to need another Mac to recover the data. exFAT is the first format I could find that both Windows and Mac OS X can read and write. Unfortunately, Snow Leopard did not let me format a disk in exFAT (presumably for licensing reasons, since Microsoft owns exFAT), so I had to do that from my Windows 7 machine.

Smart copying

Rather than copying all data from the internal disk to the external one every time, it would be better to copy only new or changed files. Luckily, that is exactly what rsync does.

From rsync's man page:
Rsync is a fast and extraordinarily versatile file copying tool. [...] It offers a large number of options that control every aspect of its behavior and permit very flexible specification of the set of files to be copied. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination.

rsync is available on Mac OS X, and probably on all modern Linux distributions.

So far, this command is what I use:

$ rsync -av --exclude '*.app' ~/ /Volumes/Bluebird/
  • -a is for archive, which preserves dates, permissions, owners, groups, symlinks, and the like. It also recurses into the source folder.
  • -v is for verbose. I like to see what is happening; I will probably remove it once I feel confident that the script is doing what I want.
  • --exclude excludes file patterns from the backup. I have some applications that stubbornly put themselves in my home directory. I have also excluded .Trash, and will probably exclude Library/. A list of patterns to exclude may be provided in an external text file with --exclude-from.
  • ~/ is the source folder being backed up, and /Volumes/Bluebird/ is where the backup files are being copied to. Note the trailing slashes in the paths! A trailing slash on the source tells rsync to copy the contents of the folder rather than the folder itself.
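The --exclude bullet above mentions keeping the patterns in an external text file. A possible sketch, using rsync's --exclude-from option, which reads one pattern per line. The file location and the function names are made up for illustration, and the leading slashes on two of the patterns are an addition: they anchor the pattern to the top of the source folder, so that only ~/Library and ~/.Trash are skipped, not every folder with that name.

```shell
#!/bin/sh
# Hypothetical location for the pattern list; any readable path works.
EXCLUDES="${EXCLUDES:-$HOME/.backup-excludes}"

# Write the pattern file given as first argument, one pattern per line.
write_excludes() {
    cat > "$1" <<'EOF'
*.app
/.Trash/
/Library/
EOF
}

# Same copy as the command above, but with the patterns read from a file.
# -n (--dry-run) only lists what would be copied, without writing anything.
backup_dry_run() {
    rsync -avn --exclude-from="$1" "$HOME/" /Volumes/Bluebird/
}
```

Calling write_excludes "$EXCLUDES" once and then backup_dry_run "$EXCLUDES" shows which files would be transferred; dropping the n from -avn performs the real copy.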

Future improvements

I do the copying with rsync and the swapping manually for now. Planned improvements are:
  • automate the local copying with cron
  • save the result in a log file
  • send a notification if something went wrong
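The three points above could be sketched roughly as follows. This is one possible arrangement, not a finished script: the log location, the script path in the crontab line, and the notify helper are assumptions, and the "notification" here is simply a message on stderr, which cron mails to the job's owner when local mail is set up.

```shell
#!/bin/sh
# Sketch of an unattended backup run. Paths below are assumptions.
SRC="${SRC:-$HOME/}"
DEST="${DEST:-/Volumes/Bluebird/}"
LOG="${LOG:-$HOME/backup.log}"

# Hypothetical notifier: record the problem in the log and on stderr.
notify() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') BACKUP PROBLEM: $1" | tee -a "$LOG" >&2
}

run_backup() {
    # Refuse to run when the external disk is not mounted, so that rsync
    # never tries to write to the mount point path on the internal disk.
    if [ ! -d "$DEST" ]; then
        notify "destination $DEST is not mounted"
        return 1
    fi
    echo "$(date '+%Y-%m-%d %H:%M:%S') backup started" >> "$LOG"
    status=0
    # No -v here: only errors and warnings end up in the log.
    rsync -a --exclude '*.app' "$SRC" "$DEST" >> "$LOG" 2>&1 || status=$?
    if [ "$status" -ne 0 ]; then
        notify "rsync exited with status $status"
        return "$status"
    fi
    echo "$(date '+%Y-%m-%d %H:%M:%S') backup finished" >> "$LOG"
}

# Run only when called with --run, so the functions can be sourced for testing.
if [ "${1:-}" = "--run" ]; then
    run_backup
fi

# Example crontab entry (added with `crontab -e`), every day at 02:00:
# 0 2 * * * /bin/sh "$HOME/bin/backup.sh" --run
```

The mount check matters for an automated job: on the days when the disk is unplugged, the job should complain rather than copy to the wrong place.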

To avoid swapping manually, I could put my offsite copy on an external server. This kind of service is already available from big companies, but I am not sure how much it would cost for several hundred gigabytes of raw format pictures and videos. It certainly wouldn't give me the satisfaction of having done the setup myself.

Another alternative is to connect my offsite disk to a friend's server. This would require a reliable friend (or encryption, or both) whose server is on most of the time and who has a decent connection. It seems like a viable option, which I may look into in the future.
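For the record, rsync can copy to a remote machine over SSH, so the friend's-server variant might look roughly like the sketch below. The user name, host name and remote path are placeholders, and this has not been tried against a real server. Note that SSH only encrypts the transfer; the files would still sit unencrypted on the friend's disk.

```shell
#!/bin/sh
# All three values are placeholders for illustration.
REMOTE_USER="${REMOTE_USER:-me}"
REMOTE_HOST="${REMOTE_HOST:-friend.example.org}"
REMOTE_DIR="${REMOTE_DIR:-/mnt/offsite/}"

remote_backup() {
    # -z compresses data in transit, which helps on a slow line.
    # --partial keeps partially transferred files so that an interrupted
    # upload of a large video does not start again from zero.
    # -e ssh runs the transfer through an SSH connection.
    rsync -az --partial --exclude '*.app' -e ssh \
        "$HOME/" "$REMOTE_USER@$REMOTE_HOST:$REMOTE_DIR"
}
```

Only the first run would need to push everything; later runs send just what changed since the previous one.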

I'll keep writing when I find time to implement these improvements.


  1. or Time Machine. Now that you're a mac guy...

    1. A good suggestion, but I want the data to be accessible from Windows and Linux. Time Machine requires a format that only Mac OS X supports. I may add the format information to this article.
