Understanding Archivers
by Dru Lavigne 05/02/2002
In the next few articles, I'd like to take a look at backups and archiving utilities.
If you're like I was when I started using Unix, you'll find the words tar, cpio, and dump intimidating. I certainly did, and a quick peek at their respective man pages did not alleviate my fears.
So I quickly convinced myself that I really didn't need to learn how those utilities worked. After all, I didn't even own a tape drive on my home FreeBSD system. Yes, I knew that backups were really, really important, but surely I could just copy the files I needed as I needed them.
I've since learned that copying files is actually the hard way to do a backup, and it isn't particularly conducive to backing up everything I should on a regular basis. In today's article, I'd like to introduce the concept of archiving, the archiving utilities that are available, and some of the differences between them. In the next few articles, I'll continue by demonstrating the usage of each of these archiving utilities.
I'm currently logged in as the user "dru". I'll cd into my home
directory and take a look at its contents:
cd
ls
. .xinitrc perlscripts
.. articles this
.cshrc ip.c tricks
.history jpegs unix
.mailrc lynx_bookmarks.html
.ssh2 pdfs
Let's say I want to back up the contents of my perlscripts directory to another directory I'll call backup. If I try:
cp perlscripts backup
cp: perlscripts/ is a directory (not copied).
I'll see that the copy operation fails, since perlscripts is a directory.
However, if I remember to use the -r (recursive) switch:
cp -r perlscripts backup
the copy command will be successful. A new directory named backup will be created for me and the contents of perlscripts will be copied to it.
That seemed easy enough, but it may not be the best way to do a backup. For starters, let's do a long listing of the original directory and the new backup directory:
ls -l perlscripts
total 6
drwxr-xr-x 2 dru wheel 512 Oct 4 07:29 .
drwxr-xr-x 22 dru wheel 4096 Mar 24 07:07 ..
-rwxr-xr-x 1 dru wheel 801 Feb 16 12:32 time.pl
ls -l backup
total 3
drwxr-xr-x 3 dru wheel 512 Mar 24 08:49 .
drwxr-xr-x 8 dru wheel 512 Mar 24 08:49 ..
-rwxr-xr-x 1 dru wheel 801 Mar 24 08:49 time.pl
You'll note that the last modified time for the file time.pl has been
changed to the time I made the recursive copy, rather than the last
time this file was actually modified, which was back in February.
This may or may not be a big deal to you if you are only interested in backing up some files in your own home directory. However, it could certainly cause confusion if this were the backup solution for larger portions of your FreeBSD system.
There are other considerations when using cp -r to back up files.
What if I wanted to back up files for several users? I would probably
do the backup as the superuser. Let's see what happens if I repeat that
copy, but this time as the superuser:
rm -r backup
su
Password:
cp -r perlscripts backup
ls -l backup
total 3
drwxr-xr-x 3 root wheel 512 Mar 24 09:20 ./
drwxr-xr-x 8 dru wheel 512 Mar 24 09:20 ../
-rwxr-xr-x 1 root wheel 801 Mar 24 09:20 time.pl
You'll note that both the backup directory and the time.pl file are now owned by the user who did the copy, in this case root, and that time.pl once again carries the time of the copy. This could have
been avoided if I had remembered to include the -p switch, which preserves the
original ownership, permissions, and modification times.
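Here is a small sketch that makes the difference visible. It assumes a POSIX sh with mktemp(1) available, builds a throwaway directory, backdates a stand-in for time.pl with touch -t, and copies it once with plain cp -r and once with cp -Rp (FreeBSD's cp(1) documents -R as the recursive switch). The directory names src, copy_r, and copy_p are invented for the demo.

```sh
#!/bin/sh
# Scratch area so the demo never touches real files.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1

# Stand-in for perlscripts/time.pl, last modified Feb 16 2002 12:32.
mkdir src
printf '%s\n' 'print "hello";' > src/time.pl
touch -t 200202161232 src/time.pl

cp -r src copy_r     # plain recursive copy: the copy's mtime becomes "now"
cp -Rp src copy_p    # -p keeps mtime and mode (and owner/group when run as root)

ls -l src/time.pl copy_r/time.pl copy_p/time.pl
```

Only the copy_r version shows today's date; copy_p keeps the February timestamp.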
Just imagine the nightmare if I had backed up each user's
home directory as the superuser using cp -r; I would have to readjust the
ownership and possibly the permissions of any file that needed to be restored, plus the original file modification times would still be unknown.
If that's still not a big deal to you, consider how I would back up my
entire home directory using cp -r. I do NOT want to do it this way, even
though it seems logical enough:
cd
cp -r . backup
If I do try this, my hard drive will churn for an eerily long period
of time before giving me an error message that includes several screens'
worth of the word backup and something about the name being too long. This
is because the cp command will go into an endless loop if your
destination happens to be the same directory as, or a subdirectory of, the
source you are backing up. It will copy backup to backup/backup to
backup/backup/backup and so on until the pathname becomes too long (or the disk fills up).
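One way to sidestep that loop is to check, before copying, whether the destination resolves to a path inside the source. The sketch below does that with a hypothetical shell function, safe_copy, which is not part of FreeBSD; it resolves both paths with pwd -P and refuses the copy when the destination falls under the source. The scratch directory and the names home and backup are made up for the demo.

```sh
#!/bin/sh
# Hypothetical helper (not a standard utility): refuse to copy a directory
# into itself or into one of its own subdirectories, the situation that
# sends "cp -r . backup" into its endless loop.
safe_copy() {
    src=$(cd "$1" && pwd -P) || return 1
    parent=$(cd "$(dirname "$2")" && pwd -P) || return 1
    dest="$parent/$(basename "$2")"
    case "$dest/" in
        "$src"/*)
            echo "safe_copy: $2 is inside $1; refusing" >&2
            return 1 ;;
    esac
    cp -Rp "$1" "$2"
}

demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
mkdir -p home/perlscripts
: > home/perlscripts/time.pl

safe_copy home home/backup || echo "refused, as expected"
safe_copy home backup && echo "copied home to backup"
```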
So how would I back up my entire home directory? This is where things start to involve a bit more work, and I start to get the gnawing suspicion that there has to be an easier way to accomplish this. This will work:
mkdir backup
cp -r .cshrc .history .mailrc .ssh2 .xinitrc articles file ip.c jpegs \
lynx_bookmarks.html pdfs perlscripts tricks unix backup/
but will quickly become time-consuming and inconvenient as the number of files in my home directory continues to grow. I could get a bit fancier by coming up with wildcard expressions that represent all of the files and directories in my home directory, but I would still be doing things the hard way.
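For the curious, the "fancier" wildcard approach might look something like the following sketch. It assumes a POSIX sh: * matches the visible names, while .[!.]* and ..?* together match every dotfile except . and .., and the loop skips backup itself. The sample files are invented, and this is still plain copying, just with less typing.

```sh
#!/bin/sh
# Demo in a scratch directory that mimics a small home directory.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
mkdir perlscripts .ssh2 backup
: > .cshrc
: > ip.c
: > perlscripts/time.pl

# * = visible names; .[!.]* and ..?* = dotfiles, excluding . and ..
for f in * .[!.]* ..?*; do
    [ -e "$f" ] || continue           # an unmatched glob stays literal; skip it
    [ "$f" = backup ] && continue     # never copy backup into itself
    cp -Rp "$f" backup/
done
ls -A backup
```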
This is where the concept of archiving and utilities that were designed to do archiving come into play. So what exactly is an archive? It is a file containing a collection of other files in a structure that preserves the contents, permissions, timestamp, owner, group, and pathnames of the original files so they can be reconstructed at a later time. In other words, archiving utilities can copy all of the files and subdirectories within a directory and then recreate that original directory structure without losing any permissions or modification times along the way.
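To make that definition concrete, here is a small sketch using tar, one of the archivers listed below, with only its basic -c (create), -x (extract), and -f (archive file) flags. It packs a backdated file into an archive with the made-up name home.tar, unpacks it into a different directory, and compares timestamps; both BSD tar and GNU tar restore the stored modification time on extraction by default.

```sh
#!/bin/sh
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1

mkdir -p src/perlscripts
: > src/perlscripts/time.pl
touch -t 200202161232 src/perlscripts/time.pl   # backdate to Feb 16 2002

# Pack the directory tree into a single archive file.
(cd src && tar -cf ../home.tar perlscripts)

# Recreate the hierarchy somewhere else entirely.
mkdir restore
(cd restore && tar -xf ../home.tar)

ls -l src/perlscripts/time.pl restore/perlscripts/time.pl
```

Unlike the cp -r copy earlier, the restored time.pl keeps its February timestamp.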
This is actually even more interesting once you realize that there are
devices that don't even know what a filesystem is or how to read a
filesystem hierarchy. We are used to thinking of our files living in a
filesystem hierarchy. For example, my time.pl file lives in the
perlscripts directory, which is a subdirectory of my home directory (dru),
which is in turn a subdirectory of the directory named home on the /usr
filesystem, or:
/usr/home/dru/perlscripts/time.pl
Any device that can contain a filesystem and therefore understand a filesystem hierarchy is known as a block device. The hard drive that contains your FreeBSD operating system is an example of a block device.
However, there are devices that do not understand what a filesystem
hierarchy is. Consider how a tape device works. When you write data to
a tape, your characters are simply passed to the tape one after the other,
or sequentially. There is no filesystem, or any concept that the file time.pl
belongs within the perlscripts directory. Such devices are known as
character devices and are often called "raw."
Archiving utilities can back up to either a block or a character device. The archive file itself contains all of the information required to recreate the original file hierarchy; that information is saved along with your data. This means you can back up your data to a character device such as a tape drive and later restore it to a block device such as your hard drive.
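Because an archive is just a sequence of bytes, it does not need a filesystem to live in. In this sketch, a pipe stands in for a sequential device such as a tape: tar -cf - writes the archive to standard output, and a second tar reads it from standard input and rebuilds the directory tree on disk. The directory names are invented for the demo.

```sh
#!/bin/sh
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
mkdir -p src/perlscripts restore
printf 'hello\n' > src/perlscripts/time.pl

# The archive exists only as a byte stream flowing through the pipe,
# much as it would be written one byte after another to a tape.
(cd src && tar -cf - perlscripts) | (cd restore && tar -xf -)

find restore -print
```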
There are several archiving utilities that come with your FreeBSD system.
I will be covering tar, cpio, pax, dd, and dump/restore. Let's
see what the whatis command has to say about each of these utilities:
whatis tar cpio pax dd dump
tar(1) - tape archiver; manipulate tar archive files
cpio(1) - copy files to and from archives
pax(1) - read and write file archives and copy directory hierarchies
dd(1) - convert and copy a file
dump(8), rdump(8) - filesystem backup
Note that tar, cpio, and pax are considered to be archivers. We'll
see that tar is easiest to use when you want to back up entire directory
structures. In contrast, cpio is the easiest command to use
when you want to pick and choose which files to back up. And pax combines both of these commands, with a bit of added
functionality thrown in.
The dd utility is interesting: it can actually convert files as it
backs them up. We'll see that this can be invaluable, say, when backing up
files from a PC to a SPARC, two architectures that store multibyte values in opposite byte order. Finally, the dump command is designed to
back up an entire filesystem, not just a directory structure.
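As a tiny illustration of that kind of conversion, dd's conv=swab option swaps every pair of input bytes, which is one way to fix up 16-bit data moving between a little-endian PC and a big-endian SPARC. The input string is arbitrary; dd writes its transfer statistics to standard error, so they are discarded here.

```sh
#!/bin/sh
# Swap each pair of bytes: "AB" "CD" becomes "BA" "DC".
swapped=$(printf 'ABCD' | dd conv=swab 2>/dev/null)
echo "$swapped"    # prints BADC
```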
I want to discuss a few more items, though, before we start using each of these commands. Most of these commands assume that you will be backing up to a SCSI tape drive but will let you change this default with a switch. Even if you don't have a tape drive, it is useful to understand the naming syntax your FreeBSD system uses for tape devices.
Like other Unix systems, FreeBSD stores information regarding devices in
the /dev directory. Let's do a long listing of the first few files in
this directory:
ls -l /dev | head
total 62
drwxr-xr-x 3 root wheel 14336 Mar 17 19:31 .
drwxr-xr-x 18 root wheel 512 Jan 31 19:17 ..
-r-xr-xr-x 1 root wheel 43405 Sep 18 2001 MAKEDEV
-r-xr-xr-x 1 root wheel 2064 Sep 18 2001 MAKEDEV.local
crw-r----- 2 root operator 117, 0 Sep 22 2001 acd0a
crw-r----- 2 root operator 117, 2 Sep 22 2001 acd0c
crw-r----- 2 root operator 117, 8 Sep 22 2001 acd1a
crw-r----- 2 root operator 117, 10 Sep 22 2001 acd1c
crw-r----- 2 root operator 116, 0x00010002 Sep 22 2001 ad0
Notice the difference in the fifth field of that long listing. The first
few files show their size in bytes; for example, the file MAKEDEV is
43405 bytes in size. However, the last five files show "117," or "116," instead. These files are character devices; you can
tell because their file type is c (the character just before their permissions).
Directories have a file type of d, and regular files have a file type of -.
The device files in the /dev directory are really just pointers to a
driver contained in the kernel for the device that each device file represents.
This means that these files are really empty; they are just pointers. The
value in what is normally the size field of ls -l represents a
"major_number, minor_number" pair. For example, the device file acd1c has a major number of 117 and a minor number of 10. The major number indicates which driver
should be used; the minor number gives the driver any additional information about
the device.
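The sketch below pulls those fields apart for lines taken from the listing shown above. devnums is a made-up shell function; it expects ls -l lines in the format shown here (newer FreeBSD releases, which use devfs, may print device numbers differently), keeps only character (c) and block (b) devices, and prints the major and minor numbers.

```sh
#!/bin/sh
# Hypothetical helper: report type, major, and minor for device entries
# in ls -l output of the form "crw-r----- 2 root operator 117, 10 ...".
devnums() {
    awk '{
        type = substr($1, 1, 1)        # first mode character: c, b, d, or -
        if (type == "c" || type == "b") {
            major = $5
            sub(/,$/, "", major)       # drop the comma after the major number
            print $NF ": type=" type " major=" major " minor=" $6
        }
    }'
}

devnums <<'EOF'
-r-xr-xr-x 1 root wheel 43405 Sep 18 2001 MAKEDEV
crw-r----- 2 root operator 117, 10 Sep 22 2001 acd1c
crw-r----- 2 root operator 116, 0x00010002 Sep 22 2001 ad0
EOF
```

The regular file MAKEDEV is skipped, while acd1c reports major 117 and minor 10.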
The MAKEDEV file in this directory is really a shell script used to make the device files. If you want to find out what a device file
refers to, read the comments at the beginning of this file. For example,
to see which device files refer to tape devices, I'll open the file with more and then type /tape to search for the
word tape:
more /dev/MAKEDEV
/tape
And I'll find that the following tape drives are supported on my FreeBSD system:
sa SCSI tape driver (formerly called st)
wt QIC-02 or QIC-36 3M cartridge tape
There is also a third type that is supported:
wst ATAPI tape drive on IDE bus
Each of these has an associated man page which you can read if you have one of these tape devices.
If I look for these devices in the /dev directory, I'll note that they
usually come with some additional letters:
ls /dev | grep wt
nrwt0
nrwt0b
nrwt0c
nrwt0d
rwt0
rwt0b
rwt0c
rwt0d
Most tape devices (but not all) will include the letter "r," indicating that they are a "raw" or character device. By default, a tape device rewinds after you back up to it, which means your backup will be overwritten if you do another backup to that tape. To prevent this default behavior, use the device name that includes the letter "n," for "no rewind."
Occasionally, a device will also include an "e," meaning that it will eject the tape once the backup is complete.
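Those naming rules are mechanical enough to decode with a few lines of shell. describe_tape is a hypothetical helper, not a system command; it only knows the n, e, and r prefixes described above, so a driver whose own name happened to start with one of those letters would confuse it.

```sh
#!/bin/sh
# Hypothetical helper: decode the n (no rewind), e (eject), and r (raw)
# prefixes on a tape device name such as nrwt0.
describe_tape() {
    name=$1
    desc=""
    case $name in
        n*) desc="no-rewind "; name=${name#n} ;;
        e*) desc="eject "; name=${name#e} ;;
    esac
    case $name in
        r*) desc="${desc}raw "; name=${name#r} ;;
    esac
    echo "${desc}device ${name}"
}

describe_tape nrwt0    # no-rewind raw device wt0
describe_tape rwt0c    # raw device wt0c
```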
The last thing I want to mention in today's article is the difference between absolute and relative pathnames. Since an archiving utility will save the pathname of a file and use that pathname information when recreating the file, it is important to know the difference between the two types of pathnames.
If a pathname begins with a /, it is an absolute pathname. This is usually considered to be a bad thing in a backup, as you will only be
able to restore that file to the original directory it came from. Restoring it will overwrite the file that is currently there, meaning you
will lose any changes you've made to that file since you backed it up.
Even if you are in a different directory when you restore that file, it
will still be restored to its original location.
If a pathname begins with ./ or with no / at all, it is a relative
pathname. This is usually considered to be a good thing in a backup, as the
file can be restored anywhere. You simply cd to the directory you want
to restore the file to, and the archiver will recreate the file's pathname
beneath the current directory as it restores the file.
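This sketch shows the relative case with tar. Archiving ./perlscripts from inside a stand-in home directory stores relative names (tar -tf lists them), so the same archive can be unpacked under two different directories. The directory and archive names are invented. As an aside, current GNU tar and FreeBSD's bsdtar strip a leading / from member names by default unless given -P, which guards against the absolute-pathname problem described above.

```sh
#!/bin/sh
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
mkdir -p home/perlscripts first second
: > home/perlscripts/time.pl

# Archive from inside "home" so stored names start with ./, not /.
(cd home && tar -cf ../rel.tar ./perlscripts)
tar -tf rel.tar

# The same archive restores relative to whichever directory we cd into.
(cd first && tar -xf ../rel.tar)
(cd second && tar -xf ../rel.tar)
```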
In next week's article, we'll continue this series by demonstrating how to
use the tar utility.
Dru Lavigne is a network and systems administrator, IT instructor, author and international speaker. She has over a decade of experience administering and teaching Netware, Microsoft, Cisco, Checkpoint, SCO, Solaris, Linux, and BSD systems. A prolific author, she pens the popular FreeBSD Basics column for O'Reilly and is author of BSD Hacks and The Best of FreeBSD Basics.

