Making Packager-Friendly Software
by Julio M. Merino Vidal
A package maintainer, or packager, is a person who creates packages for software projects. He eventually finds common problems in these projects, resulting in a complex packaging process and a final package that is a nightmare to maintain. These little flaws exist because in most cases the original developers are not packagers, so they are not aware of them. In other words, if you do not know something is wrong, you cannot fix it.
This article describes some of these common problems and possible solutions. Consequently, it is of most value to software developers who make their creations publicly available. Keep in mind that any published project will eventually catch a packager's attention; the easier it is to create the package, the sooner someone can package it.
This document can also help package maintainers by showing them problems they may not be aware of. Remember that one task of a good packager is to send bug reports--with appropriate fixes, if possible--to the original (upstream) developers about any problems that are found. That way, future versions of the program will be easier to maintain. Note that by doing this, packagers help not only themselves, but also all other packagers who handle the same piece of software on other operating systems or platforms.
In case you're wondering whether I know what I'm talking about, let me present myself. I have worked for The NetBSD Packages Collection (pkgsrc) since November 2002. During that time, I have done more than 1,600 package updates and created around 200 packages, most of which are related to GNOME; I am the main maintainer of its packages. While doing this, I have repeatedly encountered and fixed the problems described in this article, so I would like to see them solved at their root, by the original software developers. I hope this gives you a bit of confidence.
When presenting solutions for the problems described, I have focused on the most popular build infrastructure in the free software world: GNU Autoconf, GNU Automake, and GNU Libtool. However, the ideas outlined here apply to any build infrastructure you can think of.
I would like to thank Ben Collver, Thomas Klausner, and Todd Vierling, all of them pkgsrc developers, for their suggestions; and, in general, all other developers of this system for continuously improving its quality.
It's a good idea to be familiar with the following basic terms, which will be used in this article:
Distribution file (distfile, for short)--A file that contains the pristine sources of a program, as published by the original authors. They usually come in the form of a tarball, such as
Packaging system--The infrastructure used to build and/or install packages in a system in their preferred form. This includes the utilities used to generate binary packages (see below) and to handle them on a running system.
Source package--The set of files used to build a binary package from a distribution file. This concept is very clear in, for example, NetBSD's pkgsrc, FreeBSD's ports, or Gentoo's Portage, because it refers to a single directory in the centralized tree holding all packages.
However, this term also applies to other packaging systems that always use binary packages. For example, when talking about Debian packages, it refers to the debian subdirectory included in some distribution files. When talking about RPMs, this alludes to the Source RPM files (SRPMs).
Binary package--A file that provides a program in a ready-to-install manner, usually including prebuilt binaries and possibly providing some scripts to finish its configuration. This is the most common form of packages in Linux distributions, as .rpm files are exactly this.
Package (n.)--Used to refer to either a binary package or a source package, interchangeably.
Package (v.)--To create a source package from scratch, based on a published distribution file.
Broken package--A package that, due to an unexpected reason, fails to work properly. This can be either because its build fails, because it does not install some expected files, because it cannot be fetched, and so on.
Packager--The person who creates a package.
The Distribution File
The first problems in packaging come from the way that project maintainers create or handle the distfiles. These issues are uncommon, but once you start maintaining an affected package, you are likely to suffer its problems forever (unless you persuade the author to fix them). Here's how you can avert trouble:
Avoid modifying published distfiles. Once you have made a distfile available, never modify it. Even if it includes a stupid bug, don't touch it; instead, publish a new version.
Rationale: Many packaging systems store cryptographic digests of the distfiles they use in the source packages. This helps verify that no third party has modified the package since its creation. If you change a distfile, you will break the package because the digest test will fail. The maintainer has to check why the test fails, to ensure that there are no malicious changes--not an easy task.
Avoid moving published distfiles. Once you have published a distfile and distributed its URL, don't remove it from the server or move it around. If you must do so, please notify all known package maintainers of the change.
Rationale: Many source packages download distfiles from their original sites; if the file is moved or removed, the fetch process will fail and the package will be broken. This isn't difficult to fix, but it opens a time window during which people cannot download the package.
Always use versioned distfiles. The distfile's name must always include a version string identifying it, whether a version number or a timestamp. If you want a static name that refers to the latest version, use a symbolic link on your server pointing to the full name.
Rationale: This is very similar to the modification of published distfiles described above. If you replace a distfile with one containing a new version, you implicitly break the cryptographic digests stored in source packages.
Do not include prebuilt files in your distfile. Be sure that your distfile does not contain prebuilt files that are OS- or architecture-specific. For example, it is erroneous to include a prebuilt object file, but correct to include a Lex-generated C source file.
Rationale: When building on operating systems and/or architectures different from yours, those files will not be built again because the rebuild rules will not fire. They will cause strange errors later, as their format will be incorrect.
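The guidelines above can be sketched as a small release script. This is only an illustration under stated assumptions: the project name myprog, the version 1.2.0, and the temporary directory are hypothetical, and sha256sum is assumed to be the GNU coreutils tool (other systems may provide sha256 or shasum -a 256 instead). The script builds a versioned distfile, adds a static "latest" symbolic link, records and verifies a digest much as a packaging system might, and lists any prebuilt binaries that slipped into the tarball.

```shell
#!/bin/sh
# Sketch of a release step that follows the distfile guidelines.
# All names below (myprog, 1.2.0, the temporary directory) are hypothetical.

name=myprog
version=1.2.0
distfile="$name-$version.tar.gz"

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in source tree; a real project would package its own sources.
mkdir -p "$name-$version"
echo "Example program" > "$name-$version/README"

# 1. Versioned distfile: the version string is part of the file name.
tar -czf "$distfile" "$name-$version"

# 2. Static name for the latest release: a symbolic link to the full name,
#    so the versioned file itself is never renamed or replaced.
ln -sf "$distfile" "$name-latest.tar.gz"

# 3. Record a digest, as a source package would store it
#    (assumes GNU coreutils sha256sum).
sha256sum "$distfile" > "$distfile.sha256"

# Verify the distfile against the recorded digest.
if sha256sum -c "$distfile.sha256" >/dev/null 2>&1; then
    digest_status=ok
else
    digest_status=mismatch
fi

# 4. List prebuilt, platform-specific files that should not be shipped.
#    The extension list is illustrative, not exhaustive.
prebuilt=$(tar -tzf "$distfile" | grep -E '\.(o|a|so|exe|dll)$' || true)

echo "distfile: $distfile"
echo "digest check: $digest_status"
echo "prebuilt files: ${prebuilt:-none}"
```

If the distfile were later modified in place, the sha256sum -c step would report a mismatch, which is the breakage a packager sees when a published file changes.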
Several build tools force developers to include documentation files in their distfiles. For example, GNU Automake checks for the existence of README, NEWS, COPYING, and other files, although it does not check the contents. Unfortunately, many developers create those files to shut up errors but forget to fill them in. Although it's hard to believe, I have found several distfiles without any kind of information, many of which are GNOME core libraries.
Why are these files important? They provide very valuable information to the packager. At the very least, he needs:
Description of the program: Two or three paragraphs are enough. Ideally, this goes at the very beginning of the README file.
Rationale: Source packages usually provide a file with the description of the package. If the packager has to write it without any reference, he may write something inaccurate or forget to say something important.
License: Make clear the license terms under which you have distributed your work. This often manifests itself as a COPYING file in the top-level directory of the source tree, containing a summary of the license that affects all the files in it.
Rationale: It's important to know which restrictions apply to your work when creating a package. A common example is the Sun Java Virtual Machines: we can create a package for them for personal use, but cannot redistribute it later. Plus the source package cannot download them automatically, so the packager has to tell the user how to do it manually.
Changes between versions: You should provide a list of major changes between all the versions you have published. Ideally, this goes in the NEWS file as an enumeration. Note that ChangeLogs are conceptually different, as they detail every change in every source file. Those are useful too, but not as much as a digest of changes between versions.
Rationale: When updating a source package to the latest version, the packager must know which changes happened. Guessing them is very difficult and inaccurate, which will result in updates lacking information (something other packagers dislike). Also keep in mind that this information is very valuable when tracking down bugs in a software project.
If you are using GNU Automake, you can tweak it to bomb out when doing a make dist if the NEWS file is not up to date. Do this by adding the check-news flag to the call to AM_INIT_AUTOMAKE in your configure.ac file.
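As a hedged sketch, the shell snippet below writes out a minimal configure.ac with this option enabled. The project name, version, and temporary directory are hypothetical, and a real configure.ac will contain many more checks; only the AM_INIT_AUTOMAKE line matters here. According to the Automake manual, check-news makes make dist fail unless the current version number appears in the first few lines of the NEWS file.

```shell
#!/bin/sh
# Write a hypothetical, minimal configure.ac that enables Automake's
# check-news option. Only the AM_INIT_AUTOMAKE line is the point here.

workdir=$(mktemp -d)

cat > "$workdir/configure.ac" <<'EOF'
AC_INIT([myprog], [1.2.0])
AM_INIT_AUTOMAKE([check-news])
AC_CONFIG_FILES([Makefile])
AC_OUTPUT
EOF

echo "wrote $workdir/configure.ac"
```

In an existing project, it should be enough to add check-news to the option list already passed to AM_INIT_AUTOMAKE.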
Note that keeping all this information in a web page is not as useful as including it in the package. Web pages are by nature volatile, so they may become unavailable after some time, especially if the project is abandoned or moved from the original server.
Additionally, please be careful when writing these files. Lots of projects include incomplete notes that are full of typos and incorrect spacing, which suggests that the author does not care about them. These files are usually the first thing the occasional user of your program will examine; if they look sloppy, he will have a bad impression of your project, even if it is coded perfectly.