Building the PostgreSQL BuildFarm
by Andrew Dunstan
PostgreSQL is a high-quality, full-featured open source database that runs on a huge number of platforms. The number of possible combinations of machine architecture, operating system, system libraries, compilers, and configuration sets is truly staggering. That number has just increased significantly with the recent release of version 8.0, which supports Windows natively for the first time. One of the problems that the PostgreSQL project faces, as do many other similar projects, is knowing whether some change has broken things on some platform. We don't have the resources to run every possible combination, nor even a tiny proportion of them. On several occasions, breakage has become apparent only some time after a change went in. We created PostgreSQL BuildFarm to address that difficulty.
What's a Build Farm?
Because there was no way to create a testing laboratory with all the requisite hardware and software for the job, I decided instead to take a leaf out of the book of the Samba project, which has similar needs, and leverage the resources of some of our users and developers. The Samba Build Farm has operated for quite a while, and with good success. Our build farm has two parts: a central server that collects and processes build results, and member machines spread out across the world that run builds and report the results.
I started out building the client with two files, a script file and a configuration file. The script is written in Perl. There are several reasons for that. One is that I write a lot of Perl and I'm quite comfortable doing so. Another is that PostgreSQL uses Perl to build when pulling the source from its CVS repository, so I knew (or thought I knew) that any platform that could build PostgreSQL could run the build farm client. Because the script is Perl, it seemed easiest to make the config file Perl too. Essentially it constructs a Perl hash that contains all the configuration data--like where to get the source code and how, the credentials for the particular farm member, and so on.
The process runs something like this:
- Check out the source code.
- See what files have changed since the last build run.
- If nothing has changed, stop.
- Make a temporary copy of the source.
- Configure the build, using a temporary installation location and TCP port.
- Make the distribution.
- Run the PostgreSQL regression suite.
- Make the optional contrib modules.
- Install the distribution.
- Initialize a database location and start the server.
- Run the regression suite again, this time using the installed server.
- Install the contrib modules.
- Run the regression suites for the contrib modules.
- Report success.
If any step fails, further processing stops and the program reports the failure along with all the logs relevant to the failed step.
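In outline, that stop-on-first-failure staging logic looks something like this. (This is a sketch in Python; the actual client is written in Perl, and the stage names, commands, and the run_stages helper here are illustrative, not the real client code.)

```python
import subprocess

# Illustrative stages; the real client runs many more (regression suites,
# contrib modules, install checks, and so on).
STAGES = [
    ("configure", ["./configure", "--prefix=/tmp/pgbf-install"]),
    ("make", ["make"]),
    ("check", ["make", "check"]),
    ("install", ["make", "install"]),
]

def run_stages(stages):
    """Run each stage in order; stop and report on the first failure."""
    for name, cmd in stages:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode != 0:
            # Report the failed stage together with all its captured logs.
            return {"status": proc.returncode, "stage": name,
                    "log": proc.stdout + proc.stderr}
    return {"status": 0, "stage": "OK", "log": ""}
```

The important property is simply that each stage's exit code gates the next, and that the logs for the failing stage travel with the failure report.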
The script has many options, most of which are useful only in debugging your setup. In normal use, it usually requires only the CVS branch name to test as a command-line parameter.
What's It Take?
The bandwidth requirements are quite modest. I have found that the best way by far is to run my own CVS mirror of the PostgreSQL sources, using the fine CVSup package. This took me an hour or two to set up, but it made checkouts lightning-fast. On the mirror machine, I sync with the PostgreSQL server every 15 minutes via a cron job. I have not had the slightest trouble with it since I set this up, and I have four client machines on my home network that all use the mirror--in one case locally and in the other three cases by way of a pserver I set up on the mirror machine.
The other useful efficiency gain I found was to use the Samba project's wonderful ccache package. Some of my machines are old and slow. In one case I found that using ccache reduced the time for a build farm script run from around an hour to about 15 minutes. This application is really a poster child for ccache.
While you don't need lots of bandwidth or processing power to run the build farm, you do need lots of disk space. With four branches supported, one of my members uses 433MB with no builds running. Each concurrent build can take another several hundred megabytes, although this is a requirement only during a run. Given the current price of disk storage, requiring lots of disk space for the build farm seems reasonable. If disk space is at a premium in a given system, you can reduce the requirements by not using ccache and not running more than one build at a time. Even then, usage can climb transiently to around 500MB for a single branch.
How Does It Work with Users?
Along the way to creating the build farm client, I encountered a few pitfalls.
For example, having two instances running for the same branch using the same
build root directory is not a good idea--the design doesn't really allow for
that. The solution is to use a classic Unix-style lock file. If the
script cannot obtain an exclusive lock on the file (in non-blocking
mode), it exits immediately. Another pitfall was that CVS operations can end up hanging
if the CVS server has problems. The script therefore has optional code to force
a time-out on CVS operations.
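Both safeguards are standard Unix idioms. Here they are sketched in Python (the real client uses the Perl equivalents; the lock-file path and function names are illustrative):

```python
import fcntl
import signal
import sys

def acquire_build_lock(path="/tmp/pgbf-branch.lock"):
    """Take an exclusive, non-blocking lock; exit if another run holds it."""
    fh = open(path, "w")
    try:
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        sys.exit(0)  # another instance is already building this branch
    return fh  # keep the handle open for the life of the run

class CvsTimeout(Exception):
    pass

def with_timeout(seconds, func):
    """Force a time-out on an operation that may hang (e.g. a CVS checkout)."""
    def handler(signum, frame):
        raise CvsTimeout()
    old = signal.signal(signal.SIGALRM, handler)
    signal.alarm(seconds)
    try:
        return func()
    finally:
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old)
```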
Once I had a client that would build and report its results (at that stage, it reported to the screen rather than to a server), I started to create the server. Obviously it needed a data store, and naturally it uses PostgreSQL. Then I had to decide on a transport mechanism. I decided on a simple HTTP/CGI transaction for several reasons. First, even when machines are behind firewalls, they most often have the ability to make outbound HTTP calls on port 80, even if everything else is blocked. Second, using CGI rather than a more structured system like SOAP meant that I needed fewer libraries on the client side. I (somewhat naively) thought that all modern Perl installations would have the LWP library and its friends. I constructed a very simple datastore consisting of two tables and a simple CGI script to accept results from a member and store them in the database. The CGI script is Perl, using the very standard CGI module to parse the query and the DBI and DBD::Pg modules to talk to the database.
One early design decision was that not just anyone could join the build farm. People would have to apply and receive approval. There are several reasons for this. It is important that members be reachable in case of errors. If your platform is having problems, we want you to be active in fixing it, as you might have the only such platform we know of, or indeed in existence. Preferably, you will already be a member of the PostgreSQL community and be known to me or someone else in that community, but I have accepted a few members I don't know. Lastly, we want coverage and diversity rather than mere numbers. There is no virtue in our having 100 identical FooLinux members--all it means is that we have more information to wade through. So you have to register, and later you will receive by email or other means a name and a secret. The name is an alias to identify you (all are animal names), and the secret is a signing key for any results you send to the server. Each request is signed with an SHA1 digest of the combination of the request and the secret. The server performs the same calculation and discards the results if the two don't match.
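The signing scheme can be sketched as follows. (This is a Python illustration of the idea, assuming the "combination" is a simple concatenation of request and secret; the member name, secret, and request fields shown are hypothetical, not the production protocol.)

```python
import hashlib

def sign(request_body: bytes, secret: bytes) -> str:
    """Client side: SHA-1 digest over the request combined with the secret."""
    return hashlib.sha1(request_body + secret).hexdigest()

def server_accepts(request_body: bytes, claimed_sig: str, secret: bytes) -> bool:
    """Server side: repeat the calculation and discard results on mismatch."""
    return sign(request_body, secret) == claimed_sig

secret = b"s3kr1t"                         # issued to the member on approval
body = b"stage=make&status=2&branch=HEAD"  # hypothetical upload payload
sig = sign(body, secret)
assert server_accepts(body, sig, secret)
assert not server_accepts(body + b"tampered", sig, secret)
```

The secret never travels over the wire; only the digest does, so a forged or tampered report fails the server's recalculation.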
After a little experimentation, the database looked like this:
       Table "public.buildsystems"
      Column      | Type | Modifiers
------------------+------+-----------
 name             | text | not null
 secret           | text | not null
 operating_system | text | not null
 os_version       | text | not null
 compiler         | text | not null
 compiler_version | text | not null
 architecture     | text | not null
 status           | text | not null
 sys_owner        | text | not null
 owner_email      | text | not null
Indexes:
    "buildsystems_pkey" primary key, btree (name)

                    Table "public.build_status"
        Column         |            Type             | Modifiers
-----------------------+-----------------------------+-----------
 sysname               | text                        | not null
 snapshot              | timestamp without time zone | not null
 status                | integer                     |
 stage                 | text                        |
 log                   | text                        |
 conf_sum              | text                        |
 branch                | text                        |
 changed_this_run      | text                        |
 changed_since_success | text                        |
Indexes:
    "build_status_pkey" primary key, btree (sysname, snapshot)
    "bs_branch_snapshot_idx" btree (branch, snapshot)
    "bs_sysname_branch_idx" btree (sysname, branch)
Foreign-key constraints:
    "bs_fk" FOREIGN KEY (sysname) REFERENCES buildsystems(name) ON UPDATE CASCADE ON DELETE CASCADE
The text type is an (almost) arbitrarily large character type
in PostgreSQL and several other databases. I chose this type for most of the
fields so as not to restrict the data unduly while learning exactly what sorts
of data we would need. It is still the early days of the build farm, and my
intention is to standardize some of the fields in due course. The
status field in build_status has proved fairly
useless--it is the exit code returned by the stage that was last run. At some
stage, we'll drop it. The
snapshot field is the time at which the
program took a CVS snapshot of the branch in question. The
conf_sum and especially
log fields can grow quite
large. They contain the log for the stage that failed, if any, and the
configuration settings and detected machine information for the build farm
member.
Once I had the database in place and a script to add status reports to the
database via a web transaction, I modified the client code to call the web
transaction instead of reporting to the screen. Then I needed two more pieces:
registration and display of status. Registration required one small static HTML
page with a form and a tiny CGI script. Each person's application is put in the
database with a status of
pending, and they cannot upload results
until they receive a name and secret. Then the status changes to
approved. I currently do this manually by logging in to the
database and calling a stored procedure. Lastly, there are four pages that display the results. Three are summaries, and one displays all the information
about a particular build. They are all written in Perl and all use Andy
Wardley's superb Template
Toolkit package for handling the presentation. Michael Glaesemann then very
kindly designed a nice skin for the site for me--and that brings us to the
state the build farm is in today.
How Do You Use It?
The client code is available at pgFoundry, the PostgreSQL
community development site, as is the CVS repository, so anyone
can download it and play around. You don't need to be an approved member;
there's a special
nosend mode that does everything except actually
upload results, and a couple of levels of verbosity that let you see what's
going on.
The client code, by design, runs unattended from cron. Here are the relevant cron entries from one of my machines:
25 * * * * cd /home/andrew/pgbf && ./run_build.pl
55 1 * * * cd /home/andrew/pgbf && ./run_build.pl REL8_0_STABLE
55 3 * * * cd /home/andrew/pgbf && ./run_build.pl REL7_4_STABLE
55 5 * * * cd /home/andrew/pgbf && ./run_build.pl REL7_3_STABLE
55 7 * * * cd /home/andrew/pgbf && ./run_build.pl REL7_2_STABLE
This means I try to build the
HEAD branch (the main development
branch) from CVS every hour if there have been changes. I check stable
branches once a day.
One of the biggest and most frustrating challenges has been to make the
client work on Windows. The only approved build environment for the new Windows
port is the package sets from the Mingw
project. These do some very cool things but have posed a couple of problems
for the build farm. The biggest problem is that they involve a virtual environment,
in which you can refer to Unix-like paths such as /usr/local that do
not in fact exist. That is fine while you stay within the environment, but step
outside it and try to use them, and you'll have trouble. The Mingw DTK does have
a version of Perl, so you can run it from inside Mingw. Unfortunately, that
Perl does not come with the packages required to do the web transaction. It
can do all the rest of the build farm process just fine, but not that part. My
attempts to add those modules onto the Mingw Perl failed miserably, as did my
attempts to convince a non-Mingw Perl to run the pre-web transaction part of
the build farm. The Solomonic solution has been to split the script in two. The
Mingw Perl runs the build, collects the results, and calls the Perl Data::Dumper module to
serialize the data, which it prints to a file. It then calls another script
that reads that file, calls Perl's
eval on the contents,
and then performs the web transaction. The second script runs standard Windows
Perl and doesn't need to know anything of the Mingw paths. This solution is
now in the latest release of the client code. The code for Unix is identical;
you just use the same Perl interpreter for both pieces.
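The same two-stage hand-off can be sketched in Python, standing in for the Perl pair of Data::Dumper and eval (file path and function names here are illustrative):

```python
import ast

def dump_results(results: dict, path: str) -> None:
    """Stage one: serialize the collected results to a file as literal source.
    (The real client uses Data::Dumper for this.)"""
    with open(path, "w") as fh:
        fh.write(repr(results))

def load_results(path: str) -> dict:
    """Stage two: read the file back and evaluate its contents, after which
    the second script performs the web transaction.
    (The real client uses Perl's eval; literal_eval is the safe analogue.)"""
    with open(path) as fh:
        return ast.literal_eval(fh.read())

results = {"stage": "make", "status": 2, "log": "gcc: error ..."}
dump_results(results, "/tmp/pgbf-results.dat")
assert load_results("/tmp/pgbf-results.dat") == results
```

Because the second stage only ever sees an ordinary file of serialized data, it needs no knowledge of the environment the build ran in, which is exactly what makes the split work across the Mingw boundary.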
There is plenty more work to do. Things under consideration include:
- Sorting, searching, and filtering status information
- A web-based approval process
- Autosubscribing approved members to a mailing list
- Statistical reporting
- Email alerts of state changes
- More sensitive triggering of builds--building only when really needed
- The ability to test patches not yet committed to main code
- An extra build farm-only test suite
- Stack traces from core dumps
- RSS and other forms of output
- Polishing up web code and committing it to CVS
We also need to broaden the coverage of the build farm, now that it has proved to be viable.
The PostgreSQL BuildFarm is a classic case of distributed computing. We have a task that needs to occur over and over on a large number of different platforms. Rather than try to assemble them all, we use instances of such platforms in situ, and their owners devote a few spare computing cycles to the task. It demonstrates the power of the open source community. It also provides a practical way for many users to contribute back to the project, even if they are not developers.
The PostgreSQL BuildFarm server runs on equipment and bandwidth provided by the fine people at CommandPrompt, for which they have my thanks and those of the PostgreSQL community.
Andrew Dunstan works for a small consulting and software company in the Triangle area of North Carolina, and contributes to PostgreSQL as an enthusiastic hobbyist as well as a sometime professional user.