User-Mode Linux (UML) is a Linux virtual machine running on Linux that allows you to boot Linux on a "software" machine. These virtual machines can be easily created and destroyed, and allow you to do virtually anything that can be done with a physical system. Because of this, UML has turned out to have a wide variety of uses. In this article, I will talk about an application that has not received anywhere near the attention I think it deserves.
UML virtual machines are nearly identical to physical machines in their behavior, except that they are far more convenient to configure and boot. This makes them ideal for system administrator training and practice. In particular, they are very well-suited for creating admin disasters in order to practice recovering from them. I will be describing the creation of and recovery from three disasters, plus the creation (but not recovery) of a fourth.
To get started, you will need to download UML and install it. Go to
http://user-mode-linux.sourceforge.net/dl-sf.html and grab and
install either the UML RPM or deb, whichever is appropriate for your
system. These will install UML itself, plus a number of utilities.
You will also need a filesystem image to boot UML on. These are
available from the same page. I will be using the Debian root
filesystem in the examples below. If you are too short of bandwidth
to download that one, get the
tomsrtbt filesystem instead.
To help you get used to using UML, I'll start off with a special introductory disaster which I'll make no attempt to recover from. Even if you are an experienced UML user, you'll probably want to follow along because we're going to do something that you've always wanted to do anyway.
We're going to do a
rm -rf / just to see what happens.
So, start UML as follows:
% linux ubd0=cow,root_fs
This tells UML to boot from the
root_fs file with the file
cow as a copy-on-write (COW) layer above it. The file name
cow is arbitrary and the file is created automatically, so you can change the name as long as you are consistent about it. You'll see the utility of this a bit later. After you uncompress it, your root filesystem is likely named something other than root_fs.
You can either rename it to
root_fs to follow the instructions below
verbatim or replace
root_fs everywhere with the actual name.
As it boots, take note of a line in the console output that looks like this:
mconsole initialized on /tmp/uml/d4oIw6/mconsole
Now, when it comes up and gives you a login prompt, log in as root (password "root"), and do the following:
usermode:~# cd /
usermode:/# rm -rf /
Let it crank for a while until things break horribly. With the Debian filesystem from the UML site, I ultimately get this:
rm: cannot remove directory '//dev/pty': Directory not empty
rm: WARNING: Circular directory structure.
This almost certainly means that you have a corrupted file system.
NOTIFY YOUR SYSTEM MANAGER.
The following two directories have the same inode number:
//dev
//dev/pts
If you're the morbid type, you might poke around to see what, if
anything, you can still do. You'll need the
bash built-ins because
your favorite utilities are likely to be gone.
When you've had enough of this trashed system, you'll need to shut it
down cleanly. Since
halt won't work, the best way is to use the
uml_mconsole utility to halt it. On the host, run uml_mconsole,
giving it the directory name that you took careful note of when it was
booting, and tell it to halt UML:
% uml_mconsole d4oIw6
(d4oIw6) halt
OK
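If you would rather not copy the directory name by hand, it can be pulled out of the boot message with plain shell parameter expansion. This is only a sketch: mconsole_id is a hypothetical helper name, not one of the UML utilities, and it assumes the message has exactly the form shown earlier.

```shell
# mconsole_id is a hypothetical helper, not part of the UML utilities.
# It extracts the instance directory name from the boot-time message.
mconsole_id() {
    # $1: a line such as "mconsole initialized on /tmp/uml/d4oIw6/mconsole"
    socket=${1##* }              # keep the last space-separated word (the socket path)
    dir=${socket%/mconsole}      # drop the trailing "/mconsole"
    printf '%s\n' "${dir##*/}"   # keep the last path component
}

mconsole_id "mconsole initialized on /tmp/uml/d4oIw6/mconsole"
```

The result is the name to hand to uml_mconsole. If you saved the console output to a file, you could feed the matching line from that file to the helper instead of typing it.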
Now, you get to see why we used the COW file. The damage to the
filesystem is contained entirely within the COW file. The underlying
root_fs file is completely untouched. To see this, you can throw
out the COW file:
% rm cow
and boot UML just as you did before.
% linux ubd0=cow,root_fs
You'll see that it boots fine, and that the filesystem is intact. We'll be using this technique to create disasters without irreversibly damaging the real filesystem.
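Since every disaster below starts from a fresh COW file, the reset can be wrapped in a small host-side helper. The following is a minimal sketch, assuming the file names used above; reset_cow is a hypothetical name, and it only prints the boot command rather than running it, so you can inspect it first.

```shell
# reset_cow is a hypothetical helper: it discards a COW file so that the next
# boot starts from the pristine backing file, then prints the boot command.
reset_cow() {
    cow_file=$1
    backing_file=$2
    rm -f "$cow_file"    # throw away every change recorded in the COW layer
    printf 'linux ubd0=%s,%s\n' "$cow_file" "$backing_file"
}

# Demonstration in a scratch directory, with empty stand-in files.
scratch=$(mktemp -d)
touch "$scratch/cow" "$scratch/root_fs"
reset_cow "$scratch/cow" "$scratch/root_fs"
ls "$scratch"            # only root_fs is left
rm -rf "$scratch"
```

The backing file is never touched by the helper, which mirrors the guarantee the COW layer gives you inside UML.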
Now, we'll create a relatively simple disaster and recover from it.
% rm cow
% linux ubd0=cow,root_fs
Now, remove the password file and try to halt the machine:
usermode:~# rm /etc/passwd
usermode:~# halt
You don't exist. Go away.
halt doesn't work any more, so we'll shut it down from the mconsole:
% uml_mconsole zJwanV
(zJwanV) sysrq u
OK
(zJwanV) halt
OK
sysrq u flushes the filesystems to disk and remounts them
read-only. This will save us an
fsck on the next boot. Boot it again,
this time specifying only the
cow file on the command line:
% linux ubd0=cow
Now, we see how well Linux works without a password file:
Debian GNU/Linux 2.2 usermode ttys/0
usermode login: root
Password:
Login incorrect
It boots fine, but it's (surprise!) impossible to log in. So, let's
shut this down from the
mconsole again and fix it:
% uml_mconsole b9cpus
(b9cpus) sysrq u
OK
(b9cpus) halt
OK
We'll boot up only to single-user, and recreate enough of the password file so that root can log in:
% linux ubd0=cow single
Distributions differ on their interpretation of
single. If you
don't get a shell with
single, then try
emergency instead. On my
Debian filesystem, both give me a shell.
/etc/passwd: No such file or directory
Give root password for maintenance
(or type Control-D for normal startup):
Anything here, including hitting Return, seems to work.
sh-2.03# cat > /etc/passwd
sh: /etc/passwd: Read-only file system
Here's the first problem. We need to remount the root filesystem read-write before doing anything else:
sh-2.03# mount / -o remount
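To confirm that the remount took effect before writing anything, you can look at the options recorded for the root filesystem. The sketch below assumes a mounts table in the /proc/mounts format and that /proc is mounted in your maintenance shell, which may not be the case on every system; mount_mode is a hypothetical helper name.

```shell
# mount_mode is a hypothetical helper. It reads a mounts table in the
# /proc/mounts format (device, mount point, type, options, ...) and prints
# "rw" or "ro" for the given mount point. It fails if the mount point is absent.
mount_mode() {
    mounts_file=$1
    mount_point=$2
    awk -v mp="$mount_point" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro" || opts[i] == "rw") mode = opts[i]
        }
        END { if (mode != "") print mode; else exit 1 }
    ' "$mounts_file"
}

# Demonstration with a one-line sample table standing in for /proc/mounts.
sample=$(mktemp)
printf '%s\n' '/dev/ubd/0 / ext2 ro 0 0' > "$sample"
mount_mode "$sample" /
rm -f "$sample"
```

On a live system the call would be mount_mode /proc/mounts /. When a mount point is listed more than once, the last entry wins.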
OK, back to our regularly scheduled disaster. I use
cat here, but if
you prefer vi, go ahead and use that.
sh-2.03# cat > /etc/passwd
root::0:0:root:/root:/bin/bash
^D
So far, so good. Let's do a sanity check to make sure the utilities think the password file is good:
sh-2.03# whoami
root
That's fine, so let's continue the boot by exiting the single-user shell, either with exit or with Control-D.
And now let's see if root can log in:
Debian GNU/Linux 2.2 usermode ttys/0
usermode login: root
Last login: Tue Nov 13 18:28:32 2001 on ttys/0
Linux usermode 2.4.13-1um #2 Fri Oct 26 15:42:47 EDT 2001 i686 unknown
usermode:~#
Yes, root can log in again. If this had happened on a physical
machine, your next job would be to chase down the most recent backup
tape and restore
/etc/passwd from it.
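A hand-typed password file is easy to get subtly wrong, so a structural check can complement the whoami test used above. This is a rough sketch rather than a full validator: check_passwd is a hypothetical helper that only verifies that each line has the seven colon-separated fields of the passwd format and that a root entry exists.

```shell
# check_passwd is a hypothetical helper: it reports every line of a
# passwd-format file that does not have exactly seven colon-separated fields,
# and fails if it finds any such line or if there is no entry for root.
check_passwd() {
    awk -F: '
        NF != 7      { printf "line %d: expected 7 fields, found %d\n", NR, NF; bad = 1 }
        $1 == "root" { have_root = 1 }
        END {
            if (!have_root) { print "no entry for root"; bad = 1 }
            exit bad + 0
        }
    ' "$1"
}

# Demonstration with the single line typed in during the recovery.
sample=$(mktemp)
printf '%s\n' 'root::0:0:root:/root:/bin/bash' > "$sample"
check_passwd "$sample" && echo "password file looks sane"
rm -f "$sample"
```

Inside the repaired UML you would point it at /etc/passwd. It says nothing about whether the listed shells or home directories exist.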
This time, we're going to get rid of
bash, which can't be fixed by
booting into single-user mode.
While writing this article, I discovered a bug in the UML block driver which causes COW files not to work properly when they aren't mounted as the root filesystem. So, we are going to dispense with them for the time being.
Copy root_fs to no_bash, boot it up, log in, and get rid of bash:
% cp root_fs no_bash
% linux ubd0=no_bash
usermode:~# rm /bin/bash
If the halt hangs, halt UML with the mconsole.
Let's boot it up again and see how it does without a shell:
% linux ubd0=no_bash
It boots very quickly and it's impossible to log in:
INIT: cannot execute "/etc/init.d/rcS"
INIT: Entering runlevel: 2
INIT: cannot execute "/etc/init.d/rc"
Debian GNU/Linux 2.2 (none) ttys/0
(none) login: root
Unable to determine your tty name.
So, we need to shut it down with the
mconsole and figure out how to fix it.
We're going to simulate booting from a rescue disk. We're going to do
this by using root_fs as the rescue disk, assigning that to be disk 0, and
moving the damaged filesystem to disk 1:
% linux ubd0=root_fs ubd1=no_bash
So, log in, mount the damaged filesystem on
/mnt and make sure that
bash is missing:
usermode:~# mount /dev/ubd/1 /mnt
usermode:~# ls /mnt/bin/bash
ls: /mnt/bin/bash: No such file or directory
OK, this is now easy to fix. We can just copy the shell from the rescue disk:
usermode:~# cp -p /bin/bash /mnt/bin/bash
usermode:~# ls -l /bin/bash /mnt/bin/bash
-rwxr-xr-x 1 root root 461400 Feb 20 2000 /bin/bash
-rwxr-xr-x 1 root root 461400 Feb 20 2000 /mnt/bin/bash
Now, you can halt UML and boot it on
no_bash to confirm that it again boots OK.
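The copy-from-rescue step generalizes to any single missing file. The sketch below rehearses it in scratch directories standing in for the rescue root and the mounted damaged root; restore_from_rescue is a hypothetical helper, and the stand-in file is obviously not a real shell binary.

```shell
# restore_from_rescue is a hypothetical helper. It copies one file from the
# rescue root into the same location under a mounted, damaged root, preserving
# mode and timestamps as cp -p does, then confirms that the two copies match.
restore_from_rescue() {
    rescue_root=$1     # "/" on a real rescue boot
    damaged_root=$2    # "/mnt" in the walkthrough above
    path=$3            # path relative to the root, e.g. bin/bash
    cp -p "$rescue_root/$path" "$damaged_root/$path" &&
        cmp -s "$rescue_root/$path" "$damaged_root/$path"
}

# Demonstration with scratch directories in place of real filesystems.
scratch=$(mktemp -d)
mkdir -p "$scratch/rescue/bin" "$scratch/damaged/bin"
printf 'stand-in for a shell binary\n' > "$scratch/rescue/bin/bash"
restore_from_rescue "$scratch/rescue" "$scratch/damaged" bin/bash && echo "restored bin/bash"
rm -rf "$scratch"
```

This only works when the rescue system carries a compatible copy of the file, which is why using the same filesystem image as the rescue disk is convenient here.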
For our finale, we are going to make a backup of the filesystem and destroy enough of it that fixing it requires restoring the backup. The backup device will be an empty file that's large enough to hold our filesystem:
% dd if=/dev/zero of=backup seek=600 bs=$((1024*1024)) count=1
My filesystem is just over 500MB, so I created a 600MB backup file to
allow for any overhead of the backup format. Replace the 600
with whatever size is appropriate for you. Now copy root_fs to
trashed and boot it up with
backup as disk 1.
% cp root_fs trashed
% linux ubd0=trashed ubd1=backup
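The dd invocation used to create the backup file relies on seeking past the end of an empty file, so on filesystems that support sparse files most of its apparent size is not actually allocated. Here is a small sketch of the same technique with a much smaller size; make_backup_file is a hypothetical wrapper, and note that the result is one block larger than the seek count.

```shell
# make_backup_file is a hypothetical helper wrapping the dd technique above:
# seek past size_mb one-megabyte blocks, then write a single block of zeros,
# which leaves a file of (size_mb + 1) megabytes.
make_backup_file() {
    file=$1
    size_mb=$2
    dd if=/dev/zero of="$file" bs=1048576 seek="$size_mb" count=1 2>/dev/null
}

# Demonstration with a small file in a scratch directory.
scratch=$(mktemp -d)
make_backup_file "$scratch/backup" 4
wc -c < "$scratch/backup"     # apparent size in bytes
du -k "$scratch/backup"       # space actually allocated, in kilobytes
rm -rf "$scratch"
```

Comparing the two numbers shows how much of the file is a hole. The space is only consumed as the backup is written into it.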
Log in, and make the backup on
/dev/ubd/1. I'm using
tar here. If
you favor a different backup tool, feel free to use it. Notice that
we're not creating a filesystem on this device. It's being used as a
raw data device in exactly the same way as a tape.
If it fails with an I/O error, the backup file you created was too
small. You can extend it by simply running
dd on the file with a
seek argument and retrying the backup.
usermode:~# tar clf /dev/ubd/1 /
tar: Removing leading '/' from member names
tar: Removing leading '/' from link names
When it's done, we will make "trashed" live up to its name:
usermode:~# rm -rf /bin /lib /usr/lib
Remove anything you like. Feel free to corrupt things, too. When
you're done having fun, shut it down, using the
mconsole, if necessary.
Now, it's time to fix it back up. Boot UML with
root_fs as the rescue disk,
backup as disk 1 again, and
trashed as disk 2:
% linux ubd0=root_fs ubd1=backup ubd2=trashed
Now, log in, mount the damaged filesystem on /mnt,
cd to it, and restore the backup:
usermode:~# mount /dev/ubd/2 /mnt
usermode:~# cd /mnt
usermode:/mnt# tar xpf /dev/ubd/1
tar: : Cannot mkdir: No such file or directory
tar: Error exit delayed from previous errors
It succeeded, despite the error:
usermode:/mnt# ls bin
arch dd fgrep ls pidof run-parts touch ...
Now, you can check that it is fixed by halting UML and booting it on "trashed" again and seeing that it's fine.
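The whole backup, trash, and restore cycle can be rehearsed on the host in a scratch directory before trying it inside UML. In this sketch a regular file stands in for the raw backup device, and backup_tree and restore_tree are hypothetical helper names; the archive is written relative to the tree's root, so it unpacks cleanly wherever the damaged filesystem is mounted.

```shell
# backup_tree and restore_tree are hypothetical helpers. The archive argument
# may be a regular file or a raw device, and must be an absolute path because
# each helper changes directory before calling tar.
backup_tree() {     # $1: root of the tree, $2: archive file or device
    (cd "$1" && tar cf "$2" .)
}
restore_tree() {    # $1: root of the damaged tree, $2: archive file or device
    (cd "$1" && tar xpf "$2")
}

# Rehearsal: build a tiny tree, back it up, trash part of it, restore it.
scratch=$(mktemp -d)
mkdir -p "$scratch/fs/bin" "$scratch/fs/etc"
printf 'stand-in shell\n' > "$scratch/fs/bin/bash"
printf 'root::0:0:root:/root:/bin/bash\n' > "$scratch/fs/etc/passwd"
backup_tree "$scratch/fs" "$scratch/backup"
rm -rf "$scratch/fs/bin"
restore_tree "$scratch/fs" "$scratch/backup"
ls "$scratch/fs/bin"          # bash is back
rm -rf "$scratch"
```

Files created after the backup was taken are left alone by the restore, so a restored system can still contain leftovers from the disaster.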
Hopefully this article has convinced you that UML can be a valuable system administration tool. I've demonstrated the creation and recovery of a variety of different types of sysadmin catastrophes.
Obviously, this is only a tiny sample of the possible disasters that can happen. You can ensure that you are prepared for them by making them happen and figuring out how to fix them. It is possible to make them happen on a physical machine, but it should be apparent that simulating them with UML is far more convenient, and almost completely authentic. The devices may have different names, but the procedures are exactly the same as on a physical machine.
With the publication of this article, I am inaugurating the Sysadmin Disaster of the Month on the UML web site at http://user-mode-linux.sourceforge.net/sdotm.html. I will present a disaster and take submissions of solutions. I will arbitrarily choose a winner each month based on criteria such as originality, subtlety, brevity, and parsimony. I will also take submissions of proposed disasters. If you have a disaster that you'd like featured, submit it, along with a proposed solution, if you have one.
Copyright © 2009 O'Reilly Media, Inc.