Creating Filesystem Backups with 'rsync'


The heart of the synchro script is the rsync command. What synchro does is automatically pass the right arguments to rsync for any of my servers, so that I don't have to build an rsync command file for each server.

First, some terms. A partition is a slice of a hard drive and is referred to by a device name. In Linux, the partition names for the first IDE drive are usually /dev/hda1, /dev/hda2, and so on. For a SCSI drive, the names are /dev/sda1, /dev/sda2, etc. A filesystem is a formatted partition. The mount command attaches a filesystem somewhere in the directory hierarchy, and the filesystem is then referred to by its "mount point." For example, the filesystem located in partition /dev/hda7 could be mounted at /home and referred to as the /home filesystem.

I refer to a filesystem or partition containing original data as the source and the place to copy it as the destination.

synchro is written in Perl; any recent version (5.x or better) should work. It calls some system commands, including mount and, optionally, fsck. You will also need the rsync command, which is often not installed by default. If you use a popular Linux distribution, it is on your CD-ROM. You can also obtain it from the primary FTP site.

The beauty of using rsync is that it only copies the files that have changed. If a given filesystem does not change much over a day then it can be thousands of times faster than using a copy or tar command.

synchro knows about different filesystems; I have tested it with ext2fs, the usual Linux filesystem, and with reiserfs, the Reiser journaling filesystem. I had to make one small change to teach the fsck command to run the right check for reiserfs. I created a two-line script in /sbin/fsck.reiserfs that contains

#!/bin/sh
echo "Yes" | reiserfsck "$@"

Now when anyone uses the command fsck -t reiserfs, the fsck command knows how to check a Reiser filesystem.


As distributed, synchro assumes that both your hard drives will be partitioned the same way. I put one drive on /dev/hda and the other on the second controller at /dev/hdc. So, for example:

   Source filesystem   Partition     Backed up in 

       /               /dev/hda1     /dev/hdc1
       /home           /dev/hdc7     /dev/hda7

This system makes it easy for me to find things when I need to recover a file. If a file is removed from /home, I can use mount to see that the /home filesystem lives in /dev/hdc7 and then say mount /dev/hda7 /mnt/synchro to temporarily make the backup copy available. Normally all backup filesystems are left unmounted.

I put the code that determines the destination into a subroutine called get_dest. If you have different requirements (such as different drives than "a" and "c"), you can change the code in lines [70-94] to customize it.
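The mapping itself is small. As a rough shell sketch of the a-to-c swap shown in the table above (this is not the Perl code from synchro; the function body and the error message are illustrative assumptions):

```shell
#!/bin/sh
# Hypothetical shell rendering of the get_dest idea: swap the "a" drive
# for the "c" drive and keep the partition number unchanged.
get_dest() {
    case "$1" in
        /dev/hda*) echo "/dev/hdc${1#/dev/hda}" ;;
        /dev/hdc*) echo "/dev/hda${1#/dev/hdc}" ;;
        *) echo "get_dest: no backup drive known for $1" >&2; return 1 ;;
    esac
}

get_dest /dev/hda1   # prints /dev/hdc1
get_dest /dev/hdc7   # prints /dev/hda7
```

Supporting a different pair of drives would mean changing only the two patterns in the case statement.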

You can either explicitly pass the list of filesystems in on the command line, or you can put them in a list in lines [45-52]. By default, I look for /boot, /, /var, and /home. The command line overrides the built-in list.

synchro uses a built-in list called "extras" mostly to exclude things that should not be copied, such as the /dev directory. The rsync command does not handle the /dev directory gracefully! If you tell it to copy /dev/hda1, for example, it tries to copy the entire unformatted partition instead of just replicating the device file. When a filesystem name matches an "extras" entry, the right-hand part (after the => symbol) is added to the rsync command.

The default extras in lines [55-58] work well for all my systems.
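To make the lookup concrete, here is a small shell sketch of how an "extras" entry could end up on the rsync command line. The table contents, the function names, and the -ax --delete options are assumptions for illustration; the actual entries in lines [55-58] are not reproduced here. The sketch only prints the command it would build.

```shell
#!/bin/sh
# Hypothetical "extras" table: maps a filesystem name to extra rsync
# options. The single entry below is an assumed example.
get_extras() {
    case "$1" in
        /) echo "--exclude=/dev/" ;;
        *) echo "" ;;
    esac
}

# Print (do not run) an rsync command for SOURCE and DESTINATION.
# The -ax --delete options are assumptions, not taken from synchro.
build_rsync() {
    src=$1
    dest=$2
    extras=$(get_extras "$src")
    echo "rsync -ax --delete${extras:+ $extras} ${src%/}/ $dest/"
}

build_rsync / /mnt/synchro       # includes the --exclude option
build_rsync /home /mnt/synchro   # no extras for /home
```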

I use /mnt/synchro as a temporary mount point. The script creates this directory if it does not exist. Change line [68] if you want to use a different location.

Initial setup

If you run synchro with -h for help, you will get this output:

This script synchronizes the partitions on two hard drives.

Usage: synchro [options] [filesystem...]

  -d  dryrun  - show commands that would be run 
      without performing any actions
  -f  fsck  - perform fsck commands on destination 
  -h  show this message and exit
  -n  pass -n option to rsync so that it will report 
      without copying files
  -v  pass -v option to rsync so it will report 
      while copying

When I install synchro on a new system, I first run it with -d to see what commands it will execute. If they look okay, then I run it once manually to copy everything. Then I run it again with -v. This time, it will report which files, if any, have changed.

Because synchro will never back up the /dev files, I use a tar command pipeline during setup to copy the /dev files. Usually this is a one-time thing because /dev files don't normally change unless you change your hardware. Here is the command:

mount /dev/hdc1 /mnt/synchro
tar cvf - /dev | (cd /mnt/synchro && tar xpf -)

After I am satisfied that it's working correctly, I put an entry into /etc/crontab to run it once a day. I use the -f option, so that the destination filesystems are checked every time it's run. I made this a command-line option so that you aren't forced to run the check if you don't want to.
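An entry along the following lines would do it. The install path /usr/local/sbin/synchro and the 3:30 a.m. start time are assumptions; the layout is the system-wide /etc/crontab format, which has a user field before the command.

```
# minute hour day-of-month month day-of-week user command
30 3 * * * root /usr/local/sbin/synchro -f
```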

If I am about to perform major changes, such as removing an account, sometimes I will make a copy of /home using the command-line mode, such as

synchro -v /home

The -v is passed on to rsync so that it will list out the files that are changed.

Here is an outline of what synchro does. Line numbers are in brackets.

  1. Read command-line options [29] and filesystems [39-40], if any. If no filesystems are given use the default internal list [45-52].
  2. Create a mount point if one does not exist. [98-100]
  3. Run the mount command to build lists of filesystem types and partition names. [105-113]
  4. Loop over the list of filesystems [121-156]. For each filesystem,
    • Get any extra options from the "extras" list. [124]
    • Determine the destination name using info from step 3. [128-130]
    • Check the destination filesystem with the fsck command. [132-139]
    • Mount the destination filesystem. [141-144]
    • Perform the rsync to synchronize content. [146-150]
    • Unmount the destination filesystem. [152-155]
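The outline can be sketched in shell as follows. This is not the Perl script: the function names, the rsync options, and the a-to-c drive swap are assumptions, and the sketch only prints the commands rather than running them. It always prints the fsck step, which synchro performs only when -f is given.

```shell
#!/bin/sh
# Sketch of the outline above; prints commands instead of executing them.
MNT=/mnt/synchro

# Print "DEVICE TYPE" for mount point $1, reading `mount` output on stdin
# (lines such as "/dev/hda1 on / type ext2 (rw)").
lookup_fs() {
    awk -v mp="$1" '$2 == "on" && $3 == mp && $4 == "type" { print $1, $5; exit }'
}

# plan_backup MOUNT_OUTPUT FILESYSTEM...
plan_backup() {
    mounts=$1; shift
    for fs in "$@"; do
        info=$(printf '%s\n' "$mounts" | lookup_fs "$fs")
        if [ -z "$info" ]; then
            echo "# $fs is not mounted, skipping"
            continue
        fi
        dev=${info% *}       # text before the space: the partition
        fstype=${info#* }    # text after the space: the filesystem type
        case "$dev" in
            /dev/hda*) dest=/dev/hdc${dev#/dev/hda} ;;
            /dev/hdc*) dest=/dev/hda${dev#/dev/hdc} ;;
            *) echo "# no backup drive known for $dev, skipping"; continue ;;
        esac
        echo "fsck -t $fstype $dest"
        echo "mount -t $fstype $dest $MNT"
        echo "rsync -ax --delete ${fs%/}/ $MNT/"
        echo "umount $MNT"
    done
}

# Canned mount output so the sketch runs anywhere; pass "$(mount)" for real.
sample='/dev/hda1 on / type ext2 (rw)
/dev/hdc7 on /home type reiserfs (rw)'
plan_backup "$sample" /home
```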

That's it. Also of note in the script is the syscmd() subroutine in lines [158-176]. All system commands are routed through here to make it easy to run the script in "dryrun" mode. If -d is given as a command-line argument, the command will be printed in syscmd, but not executed.
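The same idea is easy to reproduce in other scripts. Here is a minimal shell analogue, assuming a DRYRUN variable stands in for the -d flag; the real syscmd() is Perl and may differ in detail.

```shell
#!/bin/sh
# Hypothetical shell analogue of syscmd(): every command goes through one
# function, so a single switch decides whether anything really runs.
DRYRUN=1   # 1 = print only, 0 = execute

syscmd() {
    if [ "$DRYRUN" -eq 1 ]; then
        echo "$*"      # dry run: show the command and do nothing else
        return 0
    fi
    "$@"               # normal mode: run the command as given
}

syscmd umount /mnt/synchro   # with DRYRUN=1 this only prints the command
```

Because nothing in the script calls mount, rsync, or fsck directly, the dry run cannot touch a disk by accident.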

I will readily admit I'd love to use hardware-supported RAID-1 in addition to using this daily rsync scheme, but my tiny IT budget just does not allow it. I've used various incarnations of this script for a number of years now. I hope you find it useful, too.

Brian Wilson wrote most of this article while sitting in the Marin headlands overlooking the Golden Gate Bridge. He claims that bicycles and laptops and corporate downsizing definitely have their advantages.
