ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)


FreeBSD Basics

Understanding Archivers

by Dru Lavigne
05/02/2002

In the next few articles, I'd like to take a look at backups and archiving utilities.

If you're anything like I was when I started using Unix, you may be intimidated by the words tar, cpio, and dump. A quick peek at their respective man pages certainly did not alleviate my fears.

So I quickly convinced myself that I really didn't need to learn how those utilities worked. After all, I didn't even own a tape drive on my home FreeBSD system. Yes, I knew that backups were really, really important, but surely I could just copy the files I needed as I needed them.

I've since learned that copying files is actually the hard way to do a backup and is not particularly conducive to me backing up everything I should on a regular basis. In today's article, I'd like to introduce the concept of archiving, which archiving utilities are available, and some of the differences between the archiving utilities. In the next few articles, I'll continue by demonstrating the usage of each of these archiving utilities.

I'm currently logged in as the user "dru". I'll cd into my home directory and take a look at its contents:


cd
ls -a
.             .xinitrc            perlscripts
..            articles            this
.cshrc        ip.c                tricks
.history      jpegs               unix
.mailrc       lynx_bookmarks.html
.ssh2         pdfs

Let's say I want to back up the contents of my perlscripts directory to another directory I'll call backup. If I try:

cp perlscripts backup
cp: perlscripts/ is a directory (not copied).

I'll see that the copy operation fails, since perlscripts is a directory. However, if I remember to use the -r (recursive) switch:

cp -r perlscripts backup

the copy command will be successful. A new directory named backup will be created for me and the contents of perlscripts will be copied to it.

That seemed easy enough, but it may not be the best way to do a backup. For starters, let's do a long listing of the original directory and the new backup directory:

ls -la perlscripts
total 6
drwxr-xr-x   2 dru  wheel   512 Oct  4 07:29 .
drwxr-xr-x  22 dru  wheel  4096 Mar 24 07:07 ..
-rwxr-xr-x   1 dru  wheel   801 Feb 16 12:32 time.pl

ls -la backup
total 3
drwxr-xr-x  3 dru  wheel  512 Mar 24 08:49 .
drwxr-xr-x  8 dru  wheel  512 Mar 24 08:49 ..
-rwxr-xr-x  1 dru  wheel  801 Mar 24 08:49 time.pl

You'll note that the last modified time for the file time.pl has been changed to the time I made the recursive copy, rather than the last time this file was actually modified, which was back in February.

This may or may not be a big deal to you if you are only interested in backing up some files in your own home directory. However, this could certainly cause confusion if this was the backup solution for larger portions of your FreeBSD system.

There are other considerations when using cp -r to back up files. What if I wanted to back up files for several users? I would probably do the backup as the superuser. Let's see what happens if I repeat that copy, but this time as the superuser:

rm -r backup
su
Password:
cp -r perlscripts backup

ls -la backup
total 3
drwxr-xr-x  3 root  wheel  512 Mar 24 09:20 ./
drwxr-xr-x  8 dru   wheel  512 Mar 24 09:20 ../
-rwxr-xr-x  1 root  wheel  801 Mar 24 09:20 time.pl

You'll note that both the backup directory and the time.pl file are owned by the user who did the copy, in this case root. This situation could have been avoided if I had remembered to include the p switch (cp -rp), which preserves each file's original owner, group, permissions, and modification time.

Just imagine the nightmare if I had backed up each user's home directory as the superuser using cp -r; I would have to readjust the ownership and possibly the permissions of any file that needed to be restored, plus the original file modification times would still be unknown.
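
To see what the p switch changes, here is a minimal sketch that can be run safely in a scratch directory. The directory names, the file contents, and the touch command that fakes an old modification time are my own illustrations, not part of the session above. I've used -R, the POSIX spelling of the recursive switch. Keep in mind that the owner can only be preserved when the copy is run by the superuser.

```shell
#!/bin/sh
# Work in a scratch area so nothing in a real home directory is touched.
tmp=$(mktemp -d)
mkdir "$tmp/perlscripts"
echo 'print scalar localtime, "\n";' > "$tmp/perlscripts/time.pl"
# Give the file an old modification time (16 Feb 2002, 12:32).
touch -t 200202161232 "$tmp/perlscripts/time.pl"

cp -R  "$tmp/perlscripts" "$tmp/plain"      # modification time becomes "now"
cp -Rp "$tmp/perlscripts" "$tmp/preserved"  # modification time is carried over

# find -newer prints a file only if it is newer than the reference file.
plain_newer=$(find "$tmp/plain/time.pl" -newer "$tmp/perlscripts/time.pl")
preserved_newer=$(find "$tmp/preserved/time.pl" -newer "$tmp/perlscripts/time.pl")

[ -n "$plain_newer" ]     && echo "cp -R:  modification time changed"
[ -z "$preserved_newer" ] && echo "cp -Rp: modification time kept"
rm -rf "$tmp"
```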

If that's still not a big deal to you, consider how I would back up my entire home directory using cp -r. I do NOT want to do it this way, even though it seems logical enough:

cd
cp -r . backup

If I do try this, my hard drive will churn for an eerily long period of time before giving me an error message that includes several screens' worth of the word backup and something about the name being too long. This is because the cp command goes into a loop if the destination lies inside the source directory you are backing up. It will copy backup to backup/backup to backup/backup/backup and so on until the pathname becomes too long or the disk runs out of space.
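
One way to protect yourself is to check where the destination lives before copying. The following is only a sketch under my own assumptions: the function names dest_inside_src and safe_copy are hypothetical, and the check does not resolve a destination that is itself a symbolic link.

```shell
#!/bin/sh
# dest_inside_src SRC DEST: succeed when DEST is SRC itself or lies beneath it.
dest_inside_src() {
    src=$(cd "$1" && pwd -P) || return 2
    # DEST may not exist yet, so resolve its parent directory instead.
    parent=$(cd "$(dirname "$2")" && pwd -P) || return 2
    [ "$src" = / ] && return 0          # everything lives beneath /
    case "${parent%/}/$(basename "$2")/" in
        "$src"/*) return 0 ;;
        *)        return 1 ;;
    esac
}

# safe_copy SRC DEST: refuse the copy that would recurse into itself.
safe_copy() {
    rc=0
    dest_inside_src "$1" "$2" || rc=$?
    case $rc in
        0) echo "refusing: $2 is inside $1" >&2; return 1 ;;
        1) cp -Rp "$1" "$2" ;;
        *) echo "cannot resolve $1 or $2" >&2; return 2 ;;
    esac
}

# Try it in a scratch area.
tmp=$(mktemp -d)
mkdir "$tmp/home" "$tmp/elsewhere"
echo data > "$tmp/home/file"

nested_rc=0
(cd "$tmp/home" && safe_copy . backup) 2>/dev/null || nested_rc=$?
outside_rc=0
safe_copy "$tmp/home" "$tmp/elsewhere/backup" || outside_rc=$?

copied=$(cat "$tmp/elsewhere/backup/file")
echo "nested copy status: $nested_rc, outside copy status: $outside_rc"
rm -rf "$tmp"
```

The copy into a subdirectory of the source is refused, while the copy to a destination outside the source goes ahead.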

So how would I back up my entire home directory? This is where things start to involve a bit more work and I start to get the gnawing suspicion that there has to be an easier way to accomplish this. This will work:

mkdir backup
cp -r .cshrc .history .mailrc .ssh2 .xinitrc articles ip.c jpegs \
   lynx_bookmarks.html pdfs perlscripts this tricks unix backup/

but will quickly become time-consuming and inconvenient as the number of files in my home directory continues to grow. I could get a bit fancier by coming up with wildcard expressions that represent all of the files and directories in my home directory, but I would still be doing things the hard way.

This is where the concept of archiving and utilities that were designed to do archiving come into play. So what exactly is an archive? It is a file containing a collection of other files in a structure that preserves the contents, permissions, timestamp, owner, group, and pathnames of the original files so they can be reconstructed at a later time. In other words, archiving utilities can copy all of the files and subdirectories within a directory and then recreate that original directory structure without losing any permissions or modification times along the way.
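
As a small preview, here is a hedged sketch of what "preserves" means in practice. It uses tar, which I'll cover properly in a later article, and only the common c, t, x, v, p, f, and C switches. The scratch directory layout and file contents are my own illustration.

```shell
#!/bin/sh
tmp=$(mktemp -d)
mkdir -p "$tmp/home/perlscripts" "$tmp/restore"
echo 'print "hello\n";' > "$tmp/home/perlscripts/time.pl"
chmod 755 "$tmp/home/perlscripts/time.pl"
touch -t 200202161232 "$tmp/home/perlscripts/time.pl"

# Create (-c) an archive file (-f); -C changes into the parent directory
# first, so the pathnames stored in the archive are relative.
tar -cf "$tmp/perlscripts.tar" -C "$tmp/home" perlscripts

# List (-t) verbosely (-v): permissions, owner/group, size, date, and
# pathname are all read back from the archive itself.
tar -tvf "$tmp/perlscripts.tar"

# Extract (-x) into a different directory, keeping permissions (-p).
tar -xpf "$tmp/perlscripts.tar" -C "$tmp/restore"

restored="$tmp/restore/perlscripts/time.pl"
exists=no; [ -f "$restored" ] && exists=yes
executable=no; [ -x "$restored" ] && executable=yes
# Empty output from find -newer means the restored copy is not newer.
newer=$(find "$restored" -newer "$tmp/home/perlscripts/time.pl")
[ -z "$newer" ] && echo "modification time preserved"
echo "restored: $exists, still executable: $executable"
rm -rf "$tmp"
```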

This is actually even more interesting once you realize that there are devices that don't even know what a filesystem is or how to read a filesystem hierarchy. We are used to thinking of our files living in a filesystem hierarchy. For example, my time.pl file is a file that lives in the perlscripts directory which is a subdirectory of my home directory (dru) which is a subdirectory of the home directory which is a subdirectory of the /usr filesystem, or:

/usr/home/dru/perlscripts/time.pl

Any device that can contain a filesystem and therefore understand a filesystem hierarchy is known as a block device. The hard drive that contains your FreeBSD operating system is an example of a block device.

However, there are devices that do not understand what a filesystem hierarchy is. Consider how a tape device works. When you write data to a tape, your characters are simply passed to the tape one after the other, or sequentially. There is no filesystem, or any concept that the file time.pl belongs within the perlscripts directory. Such devices are known as character devices and are often called "raw."

Archiving utilities can back up to either a block or character device. The archive file itself contains all of the information required to recreate the original file hierarchy; that information is saved along with your data. This means you can back up your data to a character device such as a tape drive, and then later restore your data to a block device such as your hard drive.
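
If you don't have a tape drive, a pipe makes a reasonable stand-in for one: bytes go in one end and come out the other in order, with no filesystem in between. In this sketch (my own illustration, using tar ahead of its proper introduction) the archive never exists as a file on disk, yet the directory structure is rebuilt on the far side.

```shell
#!/bin/sh
tmp=$(mktemp -d)
mkdir -p "$tmp/src/perlscripts" "$tmp/dst"
echo 'print "hello\n";' > "$tmp/src/perlscripts/time.pl"

# "-f -" means standard output when creating and standard input when
# extracting, so the archive only ever exists as a stream of bytes.
(cd "$tmp/src" && tar -cf - perlscripts) | (cd "$tmp/dst" && tar -xpf -)

# The hierarchy has been recreated from the stream alone.
restored=$(cd "$tmp/dst" && find . -type f)
echo "restored: $restored"
rm -rf "$tmp"
```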

There are several archiving utilities that come with your FreeBSD system. I will be covering tar, cpio, pax, dd, and dump/restore. Let's see what the whatis command has to say about each of these utilities:

whatis tar cpio pax dd dump
tar(1)              - tape archiver; manipulate tar archive files
cpio(1)             - copy files to and from archives
pax(1)              - read and write file archives and copy 
                      directory hierarchies
dd(1)               - convert and copy a file
dump(8), rdump(8)   - filesystem backup

Note that tar, cpio, and pax are considered to be archivers. We'll see that tar is easiest to use when you want to back up entire directory structures. In contrast, the cpio utility is the easiest command to use when you want to pick and choose which files to back up. And the pax command is a combination of both these commands with a bit of added functionality thrown in.

The dd utility is interesting -- it can actually convert files as it backs them up. We'll see that this can be invaluable, say, when backing up files from a PC to a SPARC. Finally, the dump command is designed to back up an entire filesystem, not just a directory structure.
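
To give a feel for what "convert" means, here is a tiny sketch using two of dd's conv= operands. Which conversions matter for a PC-to-SPARC backup is my assumption at this point; byte swapping (swab) is simply the one most often associated with moving data between machines with different byte order.

```shell
#!/bin/sh
# conv=swab swaps each pair of bytes as the data is copied.
swapped=$(printf 'abcd' | dd conv=swab 2>/dev/null)
echo "$swapped"

# conv=ucase maps lowercase letters to uppercase during the copy.
upper=$(printf 'time.pl\n' | dd conv=ucase 2>/dev/null)
echo "$upper"
```

The first command prints badc and the second prints TIME.PL; dd's own statistics go to standard error, which is discarded here.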

I want to discuss a few more items, though, before we start using each of these commands. Most of these commands assume that you will be backing up to a SCSI tape drive but will let you change this default with a switch. Even if you don't have a tape drive, it is useful to understand the naming syntax your FreeBSD system uses for tape devices.

Like other Unix systems, FreeBSD stores information regarding devices in the /dev directory. Let's do a long listing of the first few files in this directory:

ls -la /dev | head
total 62
drwxr-xr-x  3 root wheel       14336 Mar 17 19:31 .
drwxr-xr-x 18 root wheel         512 Jan 31 19:17 ..
-r-xr-xr-x  1 root wheel       43405 Sep 18 2001 MAKEDEV
-r-xr-xr-x  1 root wheel        2064 Sep 18 2001 MAKEDEV.local
crw-r-----  2 root operator 117,   0 Sep 22 2001 acd0a
crw-r-----  2 root operator 117,   2 Sep 22 2001 acd0c
crw-r-----  2 root operator 117,   8 Sep 22 2001 acd1a
crw-r-----  2 root operator 117,  10 Sep 22 2001 acd1c
crw-r-----  2 root operator 116, 0x00010002 Sep 22  2001 ad0

Notice the difference in the fifth field of that long listing. The first few files indicate their size in bytes -- for example, the file MAKEDEV is 43405 bytes in size. However, the last five files have a "117," or "116," instead. Note that these files are also character devices; you can tell this as their file mode is c (just before their permissions). Directories have a file mode of d and regular files have a file mode of -.

The device files in the /dev directory are really just pointers to a driver contained in the kernel for the device that each device file represents. This means that these files are really empty; they are just pointers. The value in what is normally the size field of ls -l represents a "major_number, minor_number" pair. For example, the device file acd1c has a major number of 117 and a minor number of 10. The major number indicates which driver should be used; the minor number gives any additional information about the device to the driver.
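
You can ask the shell the same question that the first column of ls -l answers. This is a minimal sketch; the function name kind_of is hypothetical, and I've picked /dev/null as the example because it should exist as a character device on just about any Unix system.

```shell
#!/bin/sh
# kind_of PATH: report the file type, i.e. the first character of ls -l.
kind_of() {
    if   [ -c "$1" ]; then echo "character device"
    elif [ -b "$1" ]; then echo "block device"
    elif [ -d "$1" ]; then echo "directory"
    elif [ -f "$1" ]; then echo "regular file"
    else                   echo "other"
    fi
}

for path in /dev/null /dev /etc/passwd; do
    printf '%-12s %s\n' "$path" "$(kind_of "$path")"
done

# For a device file, the field where a size would normally appear holds
# the major and minor numbers instead (the exact format varies by system).
ls -l /dev/null
```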


The MAKEDEV file in this directory is really a shell script used to make the device files. If you want to find out what a device file refers to, read the comments at the beginning of this file. For example, to see which devices refer to tape devices, I'll search this file for the word tape:

more /dev/MAKEDEV
/tape

And I'll find that the following tape drives are supported on my FreeBSD system:

sa        SCSI tape driver  (formerly called st)
wt        QIC-02 or QIC-36 3M cartridge tape

There is also a third type that is supported:

wst        ATAPI tape drive on IDE bus

Each of these has an associated man page which you can read if you have one of these tape devices.

If I look for these devices in the /dev directory, I'll note that they usually come with some additional letters:

ls /dev | grep wt
nrwt0
nrwt0b
nrwt0c
nrwt0d
rwt0
rwt0b
rwt0c
rwt0d

Most tape devices (but not all) will include the letter "r", indicating that they are "raw" or character devices. By default, a tape device rewinds after you back up to it, meaning your backup will be overwritten if you do another backup to that tape. To prevent this default behavior, use the device that includes the letter "n" for no rewind.

Occasionally, a device will also include an "e," meaning that it will eject the tape once the backup is complete.
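
The naming convention can be summed up in a few lines of shell. This is only a sketch: the function name describe_tape is hypothetical, and I am assuming the letters appear in the order [n or e][r]driver-and-unit, as in nrwt0 above. The name ersa0 is an example I made up to show the "e" case.

```shell
#!/bin/sh
# describe_tape NAME: decode the prefix letters of a tape device name.
# Assumes the layout [n|e][r]<driver><unit>, for example nrwt0 or rwt0.
describe_tape() {
    name=$1
    rewind="rewinds after use"
    case $name in
        n*) rewind="does not rewind";  name=${name#n} ;;
        e*) rewind="ejects after use"; name=${name#e} ;;
    esac
    case $name in
        r*) access="raw (character) device"; name=${name#r} ;;
        *)  access="no r prefix" ;;
    esac
    echo "$1: unit $name, $access, $rewind"
}

describe_tape rwt0
describe_tape nrwt0
describe_tape ersa0
```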

The last thing I want to mention in today's article is the difference between absolute and relative pathnames. Since an archiving utility will save the pathname of a file and use that pathname information when recreating the file, it is important to know the difference between the two types of pathnames.

If a pathname begins with a / it means it is an absolute pathname. This is usually considered to be a bad thing in a backup as you will only be able to restore that file to the original directory it came from, meaning you will lose any changes you've made to that file since you backed it up. Even if you are in a different directory when you restore that file, it will still restore that file to its original location.

If a pathname begins with ./ or no / at all it means it is a relative pathname. This is usually considered to be a good thing in a backup as the file can be restored anywhere. You simply cd to the directory you want to restore the file to, and the archiver will add the current directory to the pathname as it restores the file.
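
Here is a short sketch of the relative case, again borrowing tar before its full introduction. The scratch directories are my own stand-ins for a home directory and a restore location.

```shell
#!/bin/sh
tmp=$(mktemp -d)
mkdir -p "$tmp/home/dru/perlscripts" "$tmp/restore"
echo 'print "hello\n";' > "$tmp/home/dru/perlscripts/time.pl"

# Archive from inside the home directory so the stored names are relative.
(cd "$tmp/home/dru" && tar -cf "$tmp/rel.tar" ./perlscripts)

# The table of contents shows names beginning with ./ rather than /.
names=$(tar -tf "$tmp/rel.tar")
echo "$names"

# Restore somewhere else: cd to the target directory, then extract.
(cd "$tmp/restore" && tar -xf "$tmp/rel.tar")
restored=$(ls "$tmp/restore/perlscripts")
echo "restored into $tmp/restore: $restored"
rm -rf "$tmp"
```

The original under home/dru is left alone, and the restored copy lands beneath whichever directory I was in when I extracted.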

In next week's article, we'll continue this series by demonstrating how to use the tar utility.

Dru Lavigne is a network and systems administrator, IT instructor, author and international speaker. She has over a decade of experience administering and teaching Netware, Microsoft, Cisco, Checkpoint, SCO, Solaris, Linux, and BSD systems. A prolific author, she pens the popular FreeBSD Basics column for O'Reilly and is author of BSD Hacks and The Best of FreeBSD Basics.




Copyright © 2009 O'Reilly Media, Inc.