ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)


Big Scary Daemons

CVSup Infrastructure

08/16/2001

I administer a few dozen FreeBSD boxes. On some I have senior administrative duties; on others I'm just called in as needed. A few I can best describe as "being stuck with." Every so often, I go on an upgrade spree and rebuild most of them.

CVSup is very convenient. It also uses a huge amount of CPU time and generates a lot of disk activity. While you probably don't care about your disk activity or CPU usage, the mirror maintainers care about theirs. In fact, there are a whole bunch of things users do that give mirror maintainers headaches. You aren't one of those users, of course. You would never dream of annoying anyone, let alone the kind folks donating thousands of dollars' worth of T1s and high-end servers so you don't have to pay for a commercial operating system that doesn't work the way you want it to.

For those who don't know, CVSup was designed by John Polstra. He's also one of the poor buggers doomed to ride herd over FreeBSD's CVSup mirror operations. When users decide to upgrade their systems via cron every 5 minutes, he's the guy who figures out what to do. If a committer goofs and damages the main FreeBSD source repository, he's one of the guys who gets to break out vi and perform triage. According to Polstra, there's a lot users can do to make the mirror maintainers' lives, and his, easier.

First, he says "Make an effort to balance the load among the mirror sites. Too many people simply (and lazily) use cvsup.freebsd.org." I remember when there were only three CVSup mirrors. Today, there are 83.

Right now, there are 17 mirrors in the United States. Surely one of them is closer to you than poor overloaded cvsup.freebsd.org?

It's difficult to say which mirror is actually closest to you, but you can use ping for a quick-and-dirty check. Generally speaking, lower-numbered mirrors are more heavily loaded. Higher-numbered mirrors have fewer users and probably more spare capacity. In my case, cvsup16 is less than 50 milliseconds away and responds quite snappily.
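If you'd rather compare mirrors with numbers than eyeball ping output, here is a rough Python sketch of the same quick-and-dirty check. It swaps ping for a timed TCP connection to port 5999, the port cvsupd normally listens on, because that works without root and also shows whether the CVSup server itself is answering. The host names you feed it are up to you, and a single probe is only a hint about distance, not a measure of a mirror's load.

```python
import socket
import time
from typing import Callable, Iterable, Optional

CVSUP_PORT = 5999  # cvsupd's usual TCP port


def tcp_connect_ms(host: str, port: int = CVSUP_PORT, timeout: float = 3.0) -> Optional[float]:
    """Return milliseconds taken to open (and close) a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass  # we only care how long the handshake took
    except OSError:  # covers DNS failures, refusals, and timeouts
        return None
    return (time.monotonic() - start) * 1000.0


def rank_mirrors(hosts: Iterable[str],
                 probe: Callable[[str], Optional[float]] = tcp_connect_ms) -> list[tuple[str, float]]:
    """Probe each host once, in turn; drop unreachable hosts and sort fastest first."""
    timings = []
    for host in hosts:
        ms = probe(host)
        if ms is not None:
            timings.append((ms, host))
    timings.sort()
    return [(host, ms) for ms, host in timings]
```

For example, `rank_mirrors(["cvsup.freebsd.org", "cvsup16.freebsd.org"])` probes the two hosts one after the other and lists the reachable ones, quickest first. Add whichever other mirrors you're considering to the list.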

Most of the problems users cause come from cron. How many people really need to upgrade a system automatically? Do you honestly need the latest source code every night? Maybe you do. I sure don't, and FreeBSD is responsible for a considerable portion of my income.

If you're running CVSup out of cron, do it at a random time. "Don't run it at xx:00, for instance," Polstra says. Load on the mirrors is quite high on the hour. Polstra suggests glancing at your watch, noting where the second hand points at that moment, and using that number of minutes past the hour.
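Polstra's wristwatch trick amounts to picking a random minute. A minimal Python sketch that builds such a crontab line, assuming a once-a-day run at 3 a.m. (an arbitrary choice) and a hypothetical supfile path; `-g` turns off cvsup's GUI and `-L 2` makes it verbose enough that failures show up in cron's mail:

```python
import random
from typing import Optional

# Hypothetical command line: adjust the cvsup path and supfile for your own system.
CVSUP_COMMAND = "/usr/local/bin/cvsup -g -L 2 /etc/cvsup/stable-supfile"


def cron_entry(command: str = CVSUP_COMMAND, hour: int = 3,
               rng: Optional[random.Random] = None) -> str:
    """Return a once-a-day crontab line at a random minute, never xx:00."""
    if rng is None:
        rng = random.Random()
    minute = rng.randint(1, 59)  # randint is inclusive, so 1..59 skips the top of the hour
    return f"{minute} {hour} * * * {command}"
```

Print `cron_entry()` once and paste the result into root's crontab with `crontab -e`. Generate the minute once per machine rather than on every run, so each box keeps its own fixed, off-the-hour slot.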

Think about how often you need to update. Are you really going to build FreeBSD from source every hour? If not, why upgrade your source code every hour? Developers need rapid access to changes, of course, but many users don't need to update their source nearly as often as developers do.

"Don't ever set up a cron job to update more often than hourly," says Polstra. "Many mirror site maintainers will block you if they catch you doing that." It takes a good hour to build world on a fast machine. If that hour-old code was a burger, it'd be so fresh that the cow wouldn't know it was gone yet. Why do you need it fresher?
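If you manage crontabs on several machines, a quick check can catch an entry that breaks this rule. The Python sketch below expands the minute field of a crontab entry and flags any schedule that fires more than once an hour. It understands only the common syntax (`*`, single minutes, ranges, steps, and comma lists), not every extension a particular cron may accept.

```python
def minutes_fired(field: str) -> set[int]:
    """Expand a cron minute field into the set of minutes (0-59) it fires on.

    Handles "*", "N", "A-B", "*/S", "A-B/S", and comma-separated lists of
    those. Anything else raises ValueError.
    """
    minutes: set[int] = set()
    for part in field.split(","):
        base, slash, step_text = part.partition("/")
        step = int(step_text) if slash else 1
        if step < 1:
            raise ValueError(f"bad step in {part!r}")
        if base == "*":
            low, high = 0, 59
        elif "-" in base:
            low_text, high_text = base.split("-", 1)
            low, high = int(low_text), int(high_text)
        elif slash:
            raise ValueError(f"a step needs '*' or a range: {part!r}")
        else:
            low = high = int(base)
        if not 0 <= low <= high <= 59:
            raise ValueError(f"minute out of range in {part!r}")
        minutes.update(range(low, high + 1, step))
    return minutes


def more_often_than_hourly(minute_field: str) -> bool:
    """True if the entry would run more than once in any hour it is scheduled."""
    return len(minutes_fired(minute_field)) > 1
```

A minute field of `*/5` or `0,30` trips the check; a single value such as `37` passes.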

By default, the output of cron jobs is mailed to root. Read those messages. Polstra reads the cvsupd logs on the mirrors, after all. "Looking at the server logs on the mirror sites, I see many cases where certain users' updates have been failing consistently for weeks. Obviously those users aren't paying attention."


When you no longer need the automatic updates, get rid of the cron job. Many people leave cron jobs around forever -- they don't seem to hurt anything. But every mindless automaton pointlessly running CVSup is quite possibly preventing someone else from downloading code they actually need.

Some users update different systems simultaneously. "If you are updating multiple machines, do them one at a time," says Polstra. There's nothing like several connections coming from one block of IPs -- or worse, several connections from behind one NAT IP -- to make it obvious that one person is doing a lot of upgrading.
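Scripting the updates makes it easy to do them in sequence. A minimal Python sketch, assuming each machine is reachable over ssh and already has cvsup and a supfile installed (the supfile path is hypothetical):

```python
import subprocess
from typing import Callable, Iterable

# Hypothetical supfile path on each machine; adjust for your own setup.
REMOTE_COMMAND = "cvsup -g -L 2 /etc/cvsup/stable-supfile"


def ssh_cvsup(host: str) -> int:
    """Run cvsup on one host over ssh and return the exit status ssh reports."""
    return subprocess.run(["ssh", host, REMOTE_COMMAND]).returncode


def update_one_at_a_time(hosts: Iterable[str],
                         run: Callable[[str], int] = ssh_cvsup) -> list[tuple[str, int]]:
    """Update each host in turn; the next update starts only after the previous one exits."""
    results = []
    for host in hosts:
        status = run(host)  # blocks until this host's CVSup session has finished
        results.append((host, status))
    return results
```

A nonzero status in the returned list marks a host whose ssh session or cvsup run reported an error, so you can go read its output.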

If dealing with these sorts of things sounds like fun to you, and if you have bandwidth and hardware to spare, you might consider becoming an official mirror site. This would give you access to the master CVSup server and a legitimate reason to upgrade your source code every hour.

Finally, John says, "If you have more than a couple machines to update, set up a local mirror, as described in the fine article by Michael Lucas."

There's no such article, I hear you say? There will be in a couple of weeks. Building a local mirror is easy and straightforward. For a small mirror used by only a few people, you can even put it on a laptop and carry it around with you. It turns out that running a mirror doesn't take that much in the way of hardware or time.

First off, here's how the FreeBSD CVSup mirror system works.

There's a master CVS repository on freefall.freebsd.org. This is the absolutely authoritative source of FreeBSD code. Users cannot, under any circumstances, update their systems from freefall. It's also a hard-working machine. As the authoritative repository, it must check whether files have been changed by some program other than cvsupd. It does this with stat(2), which, while not particularly expensive one call at a time, devours disk resources when it has to check every single file that has ever been in FreeBSD. Every time someone updates from freefall, disk I/O climbs.
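To get a feel for that cost, the Python sketch below does what an authoritative server has to do on every update: it walks a tree and calls stat on every file. Try it twice against a large tree such as /usr/src. The first pass, with a cold cache, is typically much slower, because every stat has to go to the disk. The function name is illustrative, and timings vary a lot from machine to machine.

```python
import os
import time


def stat_every_file(root: str) -> tuple[int, float]:
    """stat every non-directory entry under root; return (count, elapsed seconds)."""
    count = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # One stat system call per file; don't follow symlinks, so broken links don't raise.
            os.stat(os.path.join(dirpath, name), follow_symlinks=False)
            count += 1
    return count, time.perf_counter() - start
```

Unlike freefall, a mirror whose files are touched only by cvsupd never has to do this scan.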

The CVSup server on freefall has only one client, cvs-master.freebsd.org, whose purpose is to serve as the main source for official mirrors. "The master mirror is the most efficient by far," says Polstra. "It doesn't have to do much disk I/O [those stat() calls], and it doesn't have to do too much thinking." The files there are touched only by cvsupd, and cvsupd knows perfectly well which files it has changed. As a result, cvs-master can support many official mirrors. Because it's a key part of the FreeBSD infrastructure, however, access to cvs-master is tightly restricted to mirror sites only.

These public mirrors are the ones we lowly users can access. They pull their updates from cvs-master once an hour, and cvs-master itself updates every 6 minutes, so code becomes available on a user mirror no more than 66 minutes after it appears on freefall. That's not bad for a worldwide data distribution system.
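The 66-minute figure is the sum of the polling intervals along the chain. In the worst case, a change lands just after each hop has pulled, so it waits a full period at every hop. A tiny Python sketch of that bound, assuming (as the 66-minute figure implies) that the public mirrors pull from cvs-master hourly, and ignoring transfer time:

```python
from typing import Iterable


def worst_case_staleness(poll_intervals_min: Iterable[float]) -> float:
    """Upper bound, in minutes, on how far the last hop in a chain of periodic pulls can lag the source.

    Each hop copies from its upstream on a fixed period. In the worst case a
    change arrives just after a hop has pulled, so it waits a full period at
    every hop. Transfer time is ignored.
    """
    return sum(poll_intervals_min)
```

`worst_case_staleness([6, 60])` gives the 66 minutes above. A hypothetical local mirror that also pulls hourly adds another hop: `worst_case_staleness([6, 60, 60])` is 126 minutes.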

If you want to be a part of all this, it's not that hard.

To run an official mirror, first check your hardware. Polstra recommends at least a 400MHz Pentium II, 256MB of RAM, and a good, fast hard disk. Such a machine would support 8 to 10 simultaneous clients most of the time, except shortly after a release. (A release touches every file in the repository, so each client's next update has far more work to do and takes much longer.)

One particular mirror server used to have a single disk, 128MB of RAM, and a 400MHz Pentium II. It could handle up to eight clients simultaneously, but interactive performance was horrendous when the system was under load. That same system has since been upgraded to 512MB of RAM and two Ultra2 SCSI disks, one for the system files and one for the repository. Its limit has been raised to 20 simultaneous clients, and interactive performance is always excellent.

Adding RAM is the best way to increase a CVSup mirror's performance. Updates finish more quickly, so users stay connected for a shorter time, the system load drops, and the machine can handle more users. This is roughly the opposite of the "death spiral" an overloaded machine can fall into, where slow updates keep clients connected longer, which slows every update further.
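This reasoning is essentially Little's law from queueing theory. The average number of clients connected at once equals the rate at which clients arrive multiplied by how long each one stays. A minimal Python sketch, with made-up arrival and session figures rather than measurements from any real mirror:

```python
def average_concurrent_clients(clients_per_hour: float, minutes_per_update: float) -> float:
    """Little's law, L = lambda * W: the mean number of clients connected at once."""
    return clients_per_hour * minutes_per_update / 60.0
```

At a hypothetical 60 updates an hour, cutting the average session from 10 minutes to 5 halves the average load from 10 simultaneous clients to 5. The same formula describes the death spiral: anything that stretches sessions raises the number of clients connected at once.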

Second, check your bandwidth. One mirror, chosen at random, averaged about 1.1 megabits per second outbound and a quarter megabit per second inbound over a randomly chosen day. You can't quite serve that over an ISDN line, but a T1 has plenty of capacity to spare.
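If you treat those figures as average rates, the arithmetic is easy to check. A small Python sketch using the nominal line rates of basic-rate ISDN (128 kbit/s) and a T1 (1.544 Mbit/s). The sample figures come from the one mirror above and say nothing about peak load.

```python
ISDN_BRI_KBPS = 128.0   # basic-rate ISDN: two 64 kbit/s B channels
T1_KBPS = 1544.0        # T1 line rate, 1.544 Mbit/s
SECONDS_PER_DAY = 86_400


def utilization(avg_kbps: float, link_kbps: float) -> float:
    """Fraction of a link's capacity used by an average rate (above 1.0 means it won't fit)."""
    return avg_kbps / link_kbps


def gigabytes_per_day(avg_kbps: float) -> float:
    """Convert an average rate in kbit/s into decimal gigabytes moved per day."""
    return avg_kbps * 1000 * SECONDS_PER_DAY / 8 / 1e9
```

An average of 1.1 Mbit/s outbound is more than eight times what ISDN can carry, but only about 71 percent of a T1. A T1 is full duplex, so the quarter megabit inbound travels in the other direction and doesn't compete with the outbound traffic. Over a day, 1.1 Mbit/s works out to roughly 12 GB sent.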

If you have the hardware and the bandwidth to donate, contact Polstra at jdp@freebsd.org. Include your location, as he tries to balance mirrors by geography. If your area is short on mirrors and you fit his other requirements, he might take you up on it. John chooses the mirror sites carefully, balancing the needs against the locations offered and the capacity. "I would be unlikely to allow 30 mirror sites in, say, Iceland," he says.

Next time, we'll look at the software involved in setting up a mirror. Hopefully, we can force John to soon change the naming scheme on the mirrors to something like cvsup14.yourstatehere.freebsd.org. And even if we can't do that, you can set up a local mirror to meet the needs of your organization.

Michael W. Lucas



Copyright © 2009 O'Reilly Media, Inc.