Linux DevCenter    
 Published on Linux DevCenter (http://www.linuxdevcenter.com/)


Managing Disk Space with LVM

by Bryce Harrington and Kees Cook
04/27/2006

The Linux Logical Volume Manager (LVM) is a mechanism for virtualizing disks. It can create "virtual" disk partitions out of one or more physical hard drives, allowing you to grow, shrink, or move those partitions from drive to drive as your needs change. It also allows you to create larger partitions than you could achieve with a single drive.

Traditional uses of LVM have included databases and company file servers, but even home users may want large partitions for music or video collections, or for storing online backups. LVM and RAID 1 can also be convenient ways to gain redundancy without sacrificing flexibility.

This article looks first at a basic file server, then explains some variations on that theme, including adding redundancy with RAID 1 and some things to consider when using LVM for desktop machines.

LVM Installation

An operational LVM system includes both a kernel component (the device mapper) and userspace utilities. To turn on the kernel component, set up the kernel options as follows:

Device Drivers --> Multi-device support (RAID and LVM)

    [*] Multiple devices driver support (RAID and LVM)
    < >   RAID support
    <*>   Device mapper support
    < >     Crypt target support (NEW)

You can usually install the LVM user tools through your Linux distro's packaging system. In Gentoo, the LVM user tools are part of the lvm2 package. Note that you may see tools for LVM-1 as well (perhaps named lvm-user). It doesn't hurt to have both installed, but make sure you have the LVM-2 tools.
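
Before going further, it can help to confirm that the userspace tools are actually on your PATH. The following is a minimal sketch, not part of any LVM package: the helper name missing_tools is made up for illustration, and the list of commands is simply the LVM-2 commands used later in this article.

```shell
#!/bin/sh
# missing_tools (hypothetical helper): print each named command that
# cannot be found on the PATH. Prints nothing if all are present.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# The LVM-2 commands used in this article.
missing_tools pvcreate vgcreate lvcreate lvextend pvmove
```

If this prints any names, install your distro's LVM-2 package before continuing.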

LVM Basics

To use LVM, you must understand several elements. First are the regular physical hard drives attached to the computer. The disk space on these devices is chopped up into partitions. Finally, a filesystem is written directly to a partition. By comparison, in LVM, Volume Groups (VGs) are split up into logical volumes (LVs), where the filesystems ultimately reside (Figure 1).

Each VG is made up of a pool of Physical Volumes (PVs). You can extend (or reduce) the size of a Volume Group by adding or removing as many PVs as you wish, provided there are enough PVs remaining to store the contents of all the allocated LVs. As long as there is available space in the VG, you can also grow and shrink the size of your LVs at will (although most filesystems don't like to shrink).

Figure 1. An example LVM layout

Example: A Basic File Server

A simple, practical example of LVM use is a traditional file server, which provides centralized backup, storage space for media files, and shared file space for several family members' computers. Flexibility is a key requirement; who knows what storage challenges next year's technology will bring?

For example, suppose your requirements are:

400G  - Large media file storage
 50G  - Online backups of two laptops and three desktops (10G each)
 10G  - Shared files

Ultimately, these requirements may increase a great deal over the next year or two, but exactly how much and which partition will grow the most are still unknown.

Disk Hardware

Traditionally, a file server uses SCSI disks, but today SATA disks offer an attractive combination of speed and low cost. At the time of this writing, 250 GB SATA drives are commonly available for around $100; for a terabyte, the cost is around $400.

SATA drives are not named like ATA drives (hda, hdb), but like SCSI (sda, sdb). Once the system has booted with SATA support, it has four physical devices to work with:

/dev/sda  251.0 GB
/dev/sdb  251.0 GB
/dev/sdc  251.0 GB
/dev/sdd  251.0 GB

Next, partition these for use with LVM. You can do this with fdisk by specifying the "Linux LVM" partition type 8e. The finished product looks like this:

# fdisk -l /dev/sdd

Disk /dev/sdd: 251.0 GB, 251000193024 bytes
255 heads, 63 sectors/track, 30515 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device            Start   End      Blocks      Id  System
/dev/sdd1         1       30515    245111706   8e  Linux LVM

Notice the partition type is 8e, or "Linux LVM."
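
With four drives to check, a small filter can save some squinting. This sketch assumes the column layout shown in the listing above (newer fdisk versions print different columns, so adjust the field numbers if yours differs). The name partition_id is a hypothetical helper, not an fdisk feature.

```shell
#!/bin/sh
# partition_id (hypothetical helper): read `fdisk -l` output on stdin and
# print the Id column for one partition. Assumes the old-style layout
# shown above: Device [Boot] Start End Blocks Id System.
partition_id() {
    awk -v dev="$1" '$1 == dev {
        # A bootable partition has an extra "*" column after the device.
        if ($2 == "*") print $6; else print $5
    }'
}
```

For example, fdisk -l /dev/sdd | partition_id /dev/sdd1 should print 8e for a partition that is ready for LVM.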

Creating a Virtual Volume

Initialize each of the disks using the pvcreate command:

# pvcreate /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

This sets up the LVM partition on each of these drives as a physical volume, allowing creation of volume groups. To examine available PVs, use the pvdisplay command. This system will use a single volume group named datavg:

# vgcreate datavg /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

Use vgdisplay to see the newly created datavg VG with the four drives stitched together. Now create the logical volumes within them:

# lvcreate --name medialv  --size 400G datavg
# lvcreate --name backuplv --size  50G datavg
# lvcreate --name sharelv  --size  10G datavg

Without LVM, you might allocate all available disk space to the partitions you're creating, but with LVM, it is worthwhile to be conservative, allocating only half the available space to the current requirements. As a general rule, it's easier to grow a filesystem than to shrink it, so it's a good strategy to allocate exactly what you need today, and leave the remaining space unallocated until your needs become clearer. This method also gives you the option of creating new volumes when new needs arise (such as a separate encrypted file share for sensitive data). To examine these volumes, use the lvdisplay command.
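
To see how conservative this allocation is, the arithmetic can be sketched in the shell. The block count comes from the fdisk listing earlier; the function name vg_free_gib is made up here, and the result ignores LVM metadata overhead and extent rounding, so treat it as approximate. LVM's G suffix means binary gigabytes, which is why four "251.0 GB" drives come out under 1000G.

```shell
#!/bin/sh
# vg_free_gib (hypothetical helper): rough free space left in a volume group.
#   $1 = size of one PV in 1 KiB blocks, $2 = number of PVs,
#   remaining arguments = LV sizes in GiB.
vg_free_gib() {
    blocks=$1
    count=$2
    shift 2
    total=$(( blocks * count / 1048576 ))   # 1 GiB = 1048576 KiB
    for lv in "$@"; do
        total=$(( total - lv ))
    done
    echo "$total"
}

# Four partitions of 245111706 blocks; LVs of 400G, 50G, and 10G.
vg_free_gib 245111706 4 400 50 10
```

This prints 475: of roughly 935G in the volume group, 460G is allocated and about half remains free for later growth.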

Now you have several nicely named logical volumes at your disposal:

/dev/datavg/backuplv     (also /dev/mapper/datavg-backuplv)
/dev/datavg/medialv      (also /dev/mapper/datavg-medialv)
/dev/datavg/sharelv      (also /dev/mapper/datavg-sharelv)

Selecting Filesystems

Now that the devices are created, the next step is to put filesystems on them. However, there are many types of filesystems. How do you choose?

For typical desktop filesystems, you're probably familiar with ext2 and ext3. ext2 was the standard, reliable workhorse for Linux systems in years past. ext3 is an upgrade for ext2 that provides journaling, a mechanism to speed up filesystem checks after a crash. ext3's balance of performance, robustness, and recovery speed makes it a fine choice for general purpose use. Because ext2 and ext3 have been the defaults for such a long time, ext3 is also a good choice if you want great reliability. For storing backups, reliability is much more important than speed. The major downside to ext2/ext3 is that to grow (or shrink) the filesystem, you must first unmount it.

However, other filesystems provide advantages in certain situations, such as large file sizes, large quantities of files, or on-the-fly filesystem growth. Because LVM is typically used where you have extreme numbers of files, extremely large files, or a need to resize your filesystems, the following filesystems are well worth considering.

For large numbers of small files, ReiserFS is an excellent choice. For raw, uncached file I/O, it ranks at the top of most benchmarks, and can be as much as an order of magnitude faster than ext3. Historically, however, it has not proven as robust as ext3. It's been tested enough lately that this may no longer be a significant issue, but keep it in mind.

If you are designing a file server that will contain large files, such as video files recorded by MythTV, then delete speed could be a priority. With ext3 or ReiserFS, your deletes may take several seconds to complete as the filesystem works to mark all of the freed data blocks. If your system is recording or processing video at the same time, this delay could cause dropped frames or other glitches. JFS and XFS are better choices in this situation, although XFS has the edge due to greater reliability and better general performance.

With all these considerations in mind, format the partitions as follows:

# mkfs.ext3 /dev/datavg/backuplv
# mkfs.xfs /dev/datavg/medialv
# mkfs.reiserfs /dev/datavg/sharelv

Mounting

Finally, to mount the file systems, first add the following lines to /etc/fstab:

/dev/datavg/backuplv   /var/backup     ext3       rw,noatime    0 0
/dev/datavg/medialv    /var/media      xfs        rw,noatime    0 0
/dev/datavg/sharelv    /var/share      reiserfs   rw,noatime    0 0

and then establish and activate the mount points:

# mkdir /var/media /var/backup /var/share
# mount /var/media
# mount /var/backup
# mount /var/share

Now your basic file server is ready for service.

Adding Reliability With RAID

So far, this LVM example has been reasonably straightforward. However, it has one major flaw: if any of your drives fail, all of your data is at risk! Half a terabyte is not an insignificant amount to back up, so this is an extremely serious weakness in the design.

To compensate for this risk, build redundancy into the design using RAID 1. RAID, which stands for Redundant Array of Independent Disks, is a low-level technology for combining disks together in various ways, called RAID levels. The RAID 1 design mirrors data across two (or more) disks. In addition to doubling the reliability, RAID 1 adds performance benefits for reads because both drives have the same data, and read operations can be split between them.

Unfortunately, these benefits do not come without a critical cost: the storage size is cut in half. The good news is that half a terabyte is still enough for the present space requirements, and LVM gives the flexibility to add more or larger disks later.

With four drives, RAID 5 is another option. It restores some of the disk space but adds even more complexity. Also, it performs well with reads but poorly with writes. Because hard drives are reasonably cheap, RAID 5's benefits aren't worth the trouble for this example.

Although it would have made more sense to start with RAID, we waited until now to introduce it so we could demonstrate how to migrate from raw disks to RAID disks without needing to unmount any of the filesystems.

In the end, this design will combine the four drives into two RAID 1 pairs: /dev/sda + /dev/sdd and /dev/sdb + /dev/sdc. The reason for this particular arrangement is that sda and sdd are the primary and secondary drives on separate controllers; this way, if a controller were to die, you could still access the two drives on the alternate controller. When the primary/secondary pairs are used, the relative access speeds are balanced so neither RAID array is slower than the other. There may also be a performance benefit to having accesses evenly distributed across both controllers.

First, pull two of the SATA drives (sdb and sdd) out of the datavg VG:

# modprobe dm-mirror 
# pvmove /dev/sdb1 /dev/sda1 
# pvmove /dev/sdd1 /dev/sdc1 
# vgreduce datavg /dev/sdb1 /dev/sdd1 
# pvremove /dev/sdb1 /dev/sdd1

Then, change the partition type on these two drives to fd (Linux raid autodetect):

Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1       30515    245111706  fd  Linux raid autodetect

Now, build the RAID 1 mirrors, telling md that the "other half" of each mirror is missing (because it's not ready to be added to the RAID yet):

# mdadm --create /dev/md0 -a -l 1 -n 2 /dev/sdd1 missing
# mdadm --create /dev/md1 -a -l 1 -n 2 /dev/sdb1 missing

Add these broken mirrors to the LVM:

# pvcreate /dev/md0 /dev/md1
# vgextend datavg /dev/md0 /dev/md1

Next, migrate off of the raw disks onto the broken mirrors:

# pvmove /dev/sda1 /dev/md0 
# pvmove /dev/sdc1 /dev/md1 
# vgreduce datavg /dev/sda1 /dev/sdc1 
# pvremove /dev/sda1 /dev/sdc1

Finally, change the partition types of the raw disks to fd, and get the broken mirrors on their feet with full mirroring:

# fdisk /dev/sda
# fdisk /dev/sdc
# mdadm --manage /dev/md0 --add /dev/sda1
# mdadm --manage /dev/md1 --add /dev/sdc1

That's quite a few steps, but this full RAID 1 setup protects the LVM system without having to reinstall, copy or remount filesystems, or reboot.
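
Re-mirroring a 250 GB drive takes a while, and an array is not redundant until the resync finishes. The kernel reports array state in /proc/mdstat, where a member map such as [U_] marks a mirror with a missing or still-syncing member and [UU] marks a complete one. Here is a minimal sketch for listing arrays that are not yet fully mirrored; degraded_arrays is a hypothetical helper name, not an mdadm command.

```shell
#!/bin/sh
# degraded_arrays (hypothetical helper): print the name of every md array
# whose member map contains "_" (a missing or rebuilding member).
# Reads /proc/mdstat by default, or the file named in $1.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/          { name = $1 }
        /\[[U_]*_[U_]*\]/      { print name }
    ' "${1:-/proc/mdstat}"
}
```

Run degraded_arrays after adding the second drive to each mirror. Once it prints nothing, both mirrors are complete.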

Network Access of Files

A file server isn't much use if you can't get files off of it. There are many ways to serve files, but the most common and powerful is Network File System (NFS). NFS allows other *nix machines to mount the file shares for direct use. It's also pretty easy to set up on Linux.

First, make sure the file server has NFS enabled in the kernel (2.6.15 in this example):

File systems
 Network File Systems

 <*> NFS file system support
  [*]   Provide NFSv3 client support
  <*> NFS server support
  [*]   Provide NFSv3 server support

Rebuild and reinstall the kernel and then reboot the file server. If you'd like to avoid rebooting, build NFS as a module and then load it with modprobe nfsd.

Next, start the NFS service. Your Linux distro will have an init script to do this. For instance, on Gentoo, you'll see:

# /etc/init.d/nfs start
 * Starting portmap ...      [ ok ]
 * Mounting RPC pipefs ...   [ ok ]
 * Starting NFS statd ...    [ ok ]
 * Starting NFS daemon ...   [ ok ]
 * Starting NFS mountd ...   [ ok ]

You can double-check that NFS is running by querying portmapper with the command rpcinfo -p | grep nfs (the column header is included here for reference):

program  vers proto port  service
100003    2   udp   2049  nfs
100003    3   udp   2049  nfs
100003    2   tcp   2049  nfs
100003    3   tcp   2049  nfs
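
For a scripted check, the same listing can be filtered by transport. This is only a sketch: nfs_versions is a hypothetical helper, and it assumes the five-column rpcinfo -p layout shown above.

```shell
#!/bin/sh
# nfs_versions (hypothetical helper): read `rpcinfo -p` output on stdin and
# print the NFS protocol versions registered for the given transport.
nfs_versions() {
    awk -v proto="$1" '$5 == "nfs" && $3 == proto { print $2 }'
}
```

For example, rpcinfo -p | nfs_versions tcp lists the NFS versions the server offers over TCP, one per line.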

Next, you must specify which directories the NFS service should export. Add the following to /etc/exports:

/var/backup    192.168.0.0/24(rw,sync)
/var/media     192.168.0.0/24(rw,sync)
/var/share     192.168.0.0/24(rw,sync)

This lists the directories to share, the machines (or networks) to permit to mount the files, and a set of options to control how the sharing works. The options include rw to allow read-write mounts and sync to force synchronous behavior. sync prevents data corruption if the server reboots in the middle of a file write, but sacrifices the performance advantages that async would provide.

Next, export these file shares from the NFS service:

# exportfs -av
exporting 192.168.0.0/24:/var/backup
exporting 192.168.0.0/24:/var/media
exporting 192.168.0.0/24:/var/share

Now, mount these file shares on each machine that will use them. Assuming the file server is named fileserv, add the following lines to the client machines' /etc/fstab files:

# Device               mountpoint    fs-type   options    dump  fsckorder
fileserv:/var/backup   /var/backup   nfs       defaults   0     0
fileserv:/var/media    /var/media    nfs       defaults   0     0
fileserv:/var/share    /var/share    nfs       defaults   0     0

Finally, create the mountpoints and mount the new shares:

# mkdir /var/backup /var/media /var/share
# mount /var/backup
# mount /var/media
# mount /var/share

Now all the machines on your network have access to large, reliable, and expandable disk space!

Backup Strategies

As you rely more heavily on this new LVM-enabled disk space, you may have concerns about backing it up. Using RAID protects against basic disk failures, but gives you no protection in the case of fire, theft, or accidental deletion of important files.

Traditionally, tape drives are used for backups of this class. This option is still viable and has several advantages, but it can be an expensive and slow solution for a system of this size. Fortunately, there are other options using today's technology.

rsync is a powerful utility for copying files from one system to another, and it works well across the Internet. You could set up a backup system at a friend's house in a different city and arrange to periodically send backups there. This is easy to do with a cron job (the crontab entry must be on a single line, because cron does not support backslash continuation):

04 4 * * 4  rsync --delete -a /var/backup/ fileserv.myfriend.org:/backup/myself/backup > /var/log/crontab.backup.log 2>&1
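
Because --delete removes files on the remote side that no longer exist locally, it is worth rehearsing the transfer before handing it to cron. rsync's -n (--dry-run) option reports what would change without touching anything. The sketch below only prints the command line so you can inspect it first; backup_command is a hypothetical helper, and the paths are the ones from the crontab entry above.

```shell
#!/bin/sh
SRC=/var/backup/
DEST=fileserv.myfriend.org:/backup/myself/backup

# backup_command (hypothetical helper): print the rsync command line.
# Pass "dry" as the first argument to get a rehearsal (-n) variant.
backup_command() {
    if [ "${1:-}" = "dry" ]; then
        echo "rsync -a --delete -n -v $SRC $DEST"
    else
        echo "rsync -a --delete $SRC $DEST"
    fi
}

backup_command dry
```

Run the printed rehearsal command by hand, check the list of files it would delete, and only then install the crontab entry.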

Another approach is to attach a pair of external RAID 1 hard drives to your file server using Firewire, USB, or eSATA. Add one drive to /dev/md0 and the other to /dev/md1. Once the mirroring is complete, remove the drives and store them in a safe place offsite. Re-mirror weekly or monthly, depending on your needs.

Growth and Reallocation

Suppose that over the next year, the storage system fills up and needs to be expanded. Initially, you can begin allocating the unallocated space. For instance, to increase the amount of space available for shared files from 10GB to 15GB, run a command such as:

# lvextend -L15G /dev/datavg/sharelv
# resize_reiserfs /dev/datavg/sharelv
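
The resize tool depends on the filesystem, so the second command differs for the other two volumes. The following sketch maps each filesystem used in this article to its usual grow command. grow_command is a hypothetical helper that only prints the command rather than running it. Note that xfs_growfs takes the mount point of a mounted filesystem rather than the device, and, as mentioned earlier, ext3 should be unmounted first.

```shell
#!/bin/sh
# grow_command (hypothetical helper): print the command that grows a
# filesystem to fill its (already extended) logical volume.
#   $1 = filesystem type, $2 = LV device, $3 = mount point
grow_command() {
    case "$1" in
        reiserfs)  echo "resize_reiserfs $2" ;;
        ext2|ext3) echo "resize2fs $2" ;;
        xfs)       echo "xfs_growfs $3" ;;
        *)         echo "unsupported filesystem: $1" >&2; return 1 ;;
    esac
}

grow_command xfs /dev/datavg/medialv /var/media
```

In every case, extend the logical volume with lvextend first; the filesystem tool then expands into the new space.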

But over time, all the unallocated disk space will be used. One solution is to replace the four 250G drives with larger 800G ones.

In the case where you use RAID 1, migration is straightforward. Use mdadm to mark one drive of each of the RAID 1 mirrors as failed, and then remove them:

# mdadm --manage /dev/md0 --fail /dev/sda1
# mdadm --manage /dev/md0 --remove /dev/sda1
# mdadm --manage /dev/md1 --fail /dev/sdc1
# mdadm --manage /dev/md1 --remove /dev/sdc1

Pull out the sda and sdc hard drives and replace them with two of the new 800G drives. Split each 800G drive into a 250G partition and a 550G partition using fdisk, and add the partitions back to md0 and md1:

# fdisk /dev/sda
# fdisk /dev/sdc
# mdadm --manage /dev/md0 --add /dev/sda1
# mdadm --manage /dev/md1 --add /dev/sdc1

Repeat the above process with sdd and sdb to move them to the other two new drives, then create a third and fourth RAID device, md2 and md3, using the new space:

# mdadm --create /dev/md2 -a -l 1 -n 2 /dev/sda2 /dev/sdd2
# mdadm --create /dev/md3 -a -l 1 -n 2 /dev/sdb2 /dev/sdc2

Finally, add these to LVM:

# pvcreate /dev/md2 /dev/md3
# vgextend datavg /dev/md2 /dev/md3

The file server now has 1.6TB of fully redundant storage.
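
That figure follows from the layout: each RAID 1 array contributes the size of one member, not two. As a quick sketch of the arithmetic (mirrored_total is a made-up helper name, and sizes are the nominal partition sizes in GB):

```shell
#!/bin/sh
# mirrored_total (hypothetical helper): usable capacity of a set of RAID 1
# arrays, given the size of one member of each array in GB.
mirrored_total() {
    sum=0
    for member in "$@"; do
        sum=$(( sum + member ))
    done
    echo "$sum"
}

# md0 and md1 use the 250G partitions; md2 and md3 use the 550G ones.
mirrored_total 250 250 550 550
```

This prints 1600, which is half of the 3200G of raw disk in the four new drives.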

LVM and Desktops

So far, we've talked only about LVM and RAID for secondary disk space via a standalone file server, but what if you want to use LVM to manage the space on a regular desktop system? It can work, but there are some considerations to take into account.

First, the installation and upgrade procedures for some Linux distributions don't handle RAID or LVM, which may present complications. Many of today's distros do support both, and even provide tools to assist in creating and managing them, so check this first.

Second, having the root filesystem on LVM can complicate recovery of damaged file systems. Because boot loaders don't support LVM yet, you must also have a non-LVM /boot partition (though it can be on a RAID 1 device).

Third, you need some spare unallocated disk space for the new LVM partition. If you don't have this, use parted to shrink your existing root partition, as described in the LVM HOWTO.

For this example, assume you have your swap space and /boot partitions already set up outside of LVM on their own partitions. You can focus on moving your root filesystem onto a new LVM partition in the partition /dev/hda4. Check that the partition type of hda4 is Linux LVM (type 8e).

Initialize LVM and create a new physical volume:

# vgscan
# pvcreate /dev/hda4
# vgcreate rootvg /dev/hda4

Now create a 5G logical volume, formatted into an xfs file system:

# lvcreate --name rootlv --size 5G rootvg
# mkfs.xfs /dev/rootvg/rootlv

Copy the files from the existing root file system to the new LVM one:

# mkdir /mnt/new_root
# mount /dev/rootvg/rootlv /mnt/new_root
# cp -ax /. /mnt/new_root/

Next, modify the copy of /etc/fstab on the new volume (/mnt/new_root/etc/fstab) to mount / from /dev/rootvg/rootlv instead of /dev/hda3, with xfs as the filesystem type.
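
That edit can be sketched as an awk filter. It also switches the type field to xfs, since that is how the new volume was formatted. retarget_root is a hypothetical helper; it assumes the old root is listed as /dev/hda3 mounted on /, and it uses the rootlv device name created above.

```shell
#!/bin/sh
# retarget_root (hypothetical helper): filter an fstab on stdin, pointing
# the root entry at the new logical volume and marking it as XFS.
# Rewritten lines come out single-space separated; other lines are untouched.
retarget_root() {
    awk '$1 == "/dev/hda3" && $2 == "/" {
        $1 = "/dev/rootvg/rootlv"
        $3 = "xfs"
    }
    { print }'
}
```

One cautious way to use it is retarget_root < /mnt/new_root/etc/fstab > /tmp/fstab.new, then review the result and copy it over /mnt/new_root/etc/fstab. Working on the copy under /mnt/new_root leaves the original root filesystem bootable.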

The trickiest part is to rebuild your initrd to include LVM support. This tends to be distro-specific, but look for mkinitrd or yaird. Your initrd image must have the LVM modules loaded or the root filesystem will not be available. To be safe, leave your original initrd image alone and make a new one named, for example, /boot/initrd-lvm.img.

Finally, update your bootloader. Add a new section for your new root filesystem, duplicating your original boot stanza. In the new copy, change the root from /dev/hda3 to /dev/rootvg/rootlv, and change your initrd to the newly built one. If you use lilo, be sure to run lilo once you've made the changes. For example, with grub, if you have:

title=Linux
  root (hd0,0)
  kernel /vmlinuz root=/dev/hda3 ro single
  initrd /initrd.img

add a new section such as:

title=LinuxLVM
  root (hd0,0)
  kernel /vmlinuz root=/dev/rootvg/rootlv ro single
  initrd /initrd-lvm.img

Conclusion

LVM is only one of many enterprise technologies in the Linux kernel that have become available for regular users. LVM provides a great deal of flexibility with disk space, and combined with RAID 1, NFS, and a good backup strategy, you can build a bulletproof, easily managed way to store, share, and preserve any quantity of files.

Bryce Harrington is a Senior Performance Engineer at the Open Source Development Labs in Beaverton, Oregon.

Kees Cook is the senior network administrator at OSDL.



Copyright © 2009 O'Reilly Media, Inc.