 Published on ONLamp.com (http://www.onlamp.com/)


The New Breed of Version Control Systems

by Shlomi Fish
01/29/2004

A version control system enables developers to keep historical versions of the files under development and to retrieve past versions. It stores version information for every file (and the entire project structure) in a collection normally called a repository.

Inside the repository, several parallel lines of development, normally called branches, may exist. This can be useful to keep a maintenance branch for a stable, released version while still working on the bleeding-edge version. Another option is to open a dedicated branch to work on an experimental feature.

Version control systems also let the user give labels to a snapshot of a branch (often referred to as tags), to ease later extraction. This is useful to signify individual releases or the most recent usable development version.

Using a version control system is an absolute must for a developer of a project above a few hundred lines of code, and even more so for projects involving the collaboration of several developers. Using a good version control system is certainly better than the ad-hoc methods some developers use to maintain various revisions of their code.

Traditionally, the de-facto open source version control system was CVS, but lately many others have emerged that aim to be better in some or every way. This article provides an overview of several alternatives.

Common Features

Version control systems come in all shapes and sizes, but there are common guidelines for their design. Some systems support atomic commits, which means that a commit either changes the state of the entire repository all at once or does not change it at all. Without atomic commits, each file or unit changes separately, so an interrupted or partially failed commit can leave the repository in an inconsistent state.
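The difference can be sketched in a few lines of Python. This is a toy in-memory model, not the implementation of any system discussed here; the Repository class and the reject_empty check are hypothetical names made up for the illustration.

```python
class Repository:
    """Toy in-memory repository: a mapping of path -> content."""

    def __init__(self):
        self.files = {}

    def commit_per_file(self, changes, validate):
        # Non-atomic: each file is written as soon as it passes validation,
        # so a failure part-way through leaves the earlier writes in place.
        for path, content in changes.items():
            validate(path, content)
            self.files[path] = content

    def commit_atomic(self, changes, validate):
        # Atomic: build the new state on the side and swap it in only
        # if every file passes validation.
        staged = dict(self.files)
        for path, content in changes.items():
            validate(path, content)
            staged[path] = content
        self.files = staged


def reject_empty(path, content):
    # Hypothetical check that makes the second file of the commit fail.
    if not content:
        raise ValueError(f"{path} is empty")


changes = {"a.c": "int a;", "b.c": ""}

non_atomic = Repository()
try:
    non_atomic.commit_per_file(changes, reject_empty)
except ValueError:
    pass

atomic = Repository()
try:
    atomic.commit_atomic(changes, reject_empty)
except ValueError:
    pass

print(non_atomic.files)  # a.c was written even though the commit failed
print(atomic.files)      # nothing was written
```

In the non-atomic case the repository ends up holding half of a commit that nobody intended to exist on its own, which is exactly the state an atomic commit rules out.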

Most common VCSs allow merging of changes between branches. This means that changes committed to one branch can be applied to the trunk or to another branch as well, with one automatic (or at least semi-automatic) operation.
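One common way to automate such a merge is a three-way comparison against the common ancestor of the two branches. The following is a minimal sketch at whole-file granularity, assuming each snapshot is a simple path-to-content mapping; real systems merge within files as well, and none of the systems below is claimed to work exactly like this. The function name three_way_merge is an assumption of the sketch.

```python
def three_way_merge(base, ours, theirs):
    """Merge two snapshots (path -> content) against their common ancestor.

    Returns (merged, conflicts). A path missing from a snapshot is treated
    as deleted in that snapshot.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o                # both sides agree
        elif o == b:
            result = t                # only their side changed
        elif t == b:
            result = o                # only our side changed
        else:
            conflicts.append(path)    # both sides changed differently
            result = o                # keep ours until resolved by hand
        if result is not None:
            merged[path] = result
    return merged, conflicts


base = {"main.c": "v1", "util.c": "v1", "README": "v1"}
trunk = {"main.c": "v2", "util.c": "v1", "README": "trunk edit"}
branch = {"main.c": "v1", "util.c": "v2", "README": "branch edit"}

merged, conflicts = three_way_merge(base, trunk, branch)
print(merged)
print(conflicts)
```

Here main.c and util.c merge automatically because only one side touched each of them, while README is reported as a conflict. That is the "semi-automatic" part: the tool does what it safely can and leaves the rest to the developer.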

A distributed version control system allows the cloning of a remote repository, producing an exact copy. It also allows changes to propagate from one repository to another. In non-distributed VCSs, a developer needs repository access in order to commit changes to the repository. That leaves developers without repository access as second-class citizens. With a distributed VCS, this is a non-issue, as each developer can clone the master repository and work on it, later propagating his changes to the master repository.
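The clone-and-propagate workflow can be sketched as follows. This is a toy model in which a repository is just an ordered list of changesets; the Repo class and its clone, commit, and pull methods are hypothetical and do not correspond to the commands of any particular system.

```python
class Repo:
    """Toy repository: an ordered list of (changeset id, description)."""

    def __init__(self, changesets=None):
        self.changesets = list(changesets or [])

    def clone(self):
        # An exact, independent copy of the history so far.
        return Repo(self.changesets)

    def commit(self, changeset_id, description):
        self.changesets.append((changeset_id, description))

    def pull(self, other):
        # Propagate every changeset we do not already have.
        known = {cid for cid, _ in self.changesets}
        for cid, description in other.changesets:
            if cid not in known:
                self.changesets.append((cid, description))
                known.add(cid)


master = Repo()
master.commit("c1", "initial import")

# A developer without write access to master works in a private clone.
mine = master.clone()
mine.commit("c2", "fix typo in README")

# Later, whoever maintains master pulls the change in.
master.pull(mine)
print(master.changesets)
```

The developer commits locally as often as needed, with full history, and the only step that touches the master repository is the final pull.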

Another common factor is whether the repository allows versioned file and directory renames (and possibly copies as well). If a file changes location, will the repository preserve its history? Can changes applied to the organization of the older files be applied to the new organization?
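One way a system can keep history across a rename is to track each file by a stable identity instead of by its path. The sketch below assumes that approach; the RenameAwareStore class is hypothetical, and the systems described later each have their own mechanism.

```python
import itertools


class RenameAwareStore:
    """Toy store that records history per file identity, not per path."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.path_to_id = {}
        self.history = {}  # file id -> list of (path, content)

    def add(self, path, content):
        file_id = next(self._ids)
        self.path_to_id[path] = file_id
        self.history[file_id] = [(path, content)]

    def modify(self, path, content):
        self.history[self.path_to_id[path]].append((path, content))

    def rename(self, old_path, new_path):
        # The identity moves to the new path; the history stays attached.
        file_id = self.path_to_id.pop(old_path)
        self.path_to_id[new_path] = file_id
        last_content = self.history[file_id][-1][1]
        self.history[file_id].append((new_path, last_content))

    def log(self, path):
        return self.history[self.path_to_id[path]]


store = RenameAwareStore()
store.add("src/util.c", "v1")
store.modify("src/util.c", "v2")
store.rename("src/util.c", "lib/helpers.c")
print(store.log("lib/helpers.c"))
```

Asking for the log of the new path still returns the revisions made under the old one. A system that keys history on the path alone has nothing to connect the two.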

Of these features, CVS itself supports only merging.

CVS

CVS, the Concurrent Versions System, is a mature and relatively reliable version control system. Many large open source projects, including KDE, GNOME, and Mozilla, use CVS. Most open source hubs, such as SourceForge, offer it as a service, which has led many other projects to adopt it as well.

Despite its popularity, CVS has its limitations. For example, it does not support file and directory renaming. Furthermore, binary files are not handled very well. CVS is not distributed and the commits are not atomic. As there are already better alternatives that aim to be a superset of its functionality, you are probably better off starting a new project by using something else.

On the plus side, CVS is extensively documented in its own online book and in many online tutorials. There are also many graphical clients and add-ons available for it.

Subversion

Subversion aims to be a better replacement for CVS. It retains most of the conventions of working with CVS, including a large part of the command set, so CVS users will quickly feel at home. Aside from that, Subversion offers many useful improvements over CVS: copies and renames of files and directories, truly atomic commits, efficient handling of binary files, and the ability to be networked over HTTP (and HTTPS). Subversion also has a native Win32 client and server.

Subversion has recently entered its beta period after being in alpha for a long time. As such, it may still have some minor quirks, and its performance in some areas is lacking. Nevertheless, it is very usable for beta-stage software, and was so even during a large part of its alpha stage.

The HTTP (or HTTPS)-based Subversion service is difficult to deploy in comparison to other systems, as it requires setting up an Apache 2 service with its own specialized module. There is also an "svnserve" server that is less capable but easier to set up (and faster) and uses a custom protocol. Moreover, Subversion's support for merging is limited and resembles that of CVS (i.e., merges to branches where files were moved will not be performed correctly). It is also relatively resource intensive, especially with large operations.

Subversion is extensively documented in the free online book, Version Control with Subversion. The rudimentary online help system supplied by the Subversion client can also prove useful for reference. Subversion has many add-ons, but they are still less mature than their CVS counterparts.

Arch

GNU Arch is a VCS originally created by Tom Lord for his own version control needs, as well as those of other free software projects. Arch was initially prototyped as a collection of shell scripts, but its main client now is tla, which is written in C and should be portable to any UNIX. It has not been ported to Win32; while it is possible to do so, it is not a priority for the project.

Arch is a distributed version control system. It does not require a special service in order to set up a network-accessible repository; any remote file service (such as FTP, SFTP, or WebDAV) is a suitable Arch service. This makes setting up a service incredibly easy.

Arch supports versioned renames of files and directories, as well as intelligent merging that can detect if a file has been renamed and applies the changes cleanly. Arch aims to be superior to CVS, but there are still some individual features missing. Arch is a post-1.0 system and, as such, is declared mature and stable for any use.

Arch is documented with a very basic online help system and a tutorial.

OpenCM

OpenCM is a version control system created for the EROS project. OpenCM does not aim to be as feature-rich as CVS is, but it does have a few advantages. OpenCM has versioned renames of files and directories, atomic commits, automatic propagation of changes from branch to trunk, and some support for cryptographic authentication.

OpenCM uses its own custom protocol for communicating between the client and the server. It is not distributed. Since OpenCM is not very feature-rich, it is possible that other systems will better suit your needs. However, you may prefer using OpenCM if one or more of its features is attractive to you.

OpenCM runs on any UNIX and on Windows under the Cygwin emulation layer. It features a CVS-like command set and is well documented.

Aegis

Aegis is a source configuration management (SCM) system created by Peter Miller. It is not networked, and all operations are done via UNIX file-system operations. As such, it also uses the UNIX permissions system to determine who has permission to perform what operation. Despite the fact that Aegis is not networked, it is still distributed in the sense that repositories can be cloned and changes can be propagated from one repository to the other. Allowing network access requires using a file system such as NFS.

Being an SCM system, Aegis tries to assure the correctness of the code that is checked in.

Its command set reflects this philosophy and is quite tedious if you desire only a plain version control system.

Aegis is documented in several troff documents that are then rendered into PostScript. As such, it is sometimes hard to browse the documentation to find exactly what you want. Still, the documentation is of high quality.

Monotone

The Monotone Version Control System was created by Graydon Hoare, and exhibits a different philosophy than all of the above systems. It is distributed, with changesets propagated to a certain depot that can be a CGI script, an NNTP (Usenet news) receiver, or SMTP (email). From there, each developer pulls the desired changes into his own copy of the repository.

This may have the unfortunate effect of causing the history or current state of the individual repositories to fall out of sync with each other, as individual repositories do not receive the appropriate changes, or receive inappropriate ones.

Monotone relies heavily on strong cryptography. It identifies files, directories, and revisions by SHA1 checksums. RSA certificates govern repository permissions.
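The idea of naming content by its checksum can be illustrated with Python's standard hashlib module. This is only a sketch of the general technique; it makes no claim about how Monotone actually serializes files or revisions before hashing them, and the content_id helper is a hypothetical name.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return the SHA1 checksum of the data as 40 hexadecimal digits."""
    return hashlib.sha1(data).hexdigest()


original = b"int main(void) { return 0; }\n"
same_bytes_elsewhere = b"int main(void) { return 0; }\n"
edited = b"int main(void) { return 1; }\n"

# Identical content always gets the same identifier, wherever it is stored.
print(content_id(original) == content_id(same_bytes_elsewhere))

# Any change to the content yields a different identifier.
print(content_id(original) == content_id(edited))
```

Because the identifier is derived from the content alone, two repositories can tell whether they hold the same version of a file by comparing checksums, without consulting a central authority.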

Monotone supports renames and copies of files and directories. It has a command set that aims to be as CVS-compatible as possible, with some necessary deviations due to its different philosophy. It should be portable to Win32, but has not been explicitly ported yet.

Monotone is still under development, and may still have some behavioral glitches. The Monotone developers expect to resolve these problems as work continues.

All in all, Monotone holds a lot of promise, and is well worth examining.

BitKeeper

BitKeeper is not an open source version control system, but is listed here for completeness because some open source projects use it. BitKeeper is very reliable and feature-rich, supporting distributed repositories; serving over HTTP; file and directory copies and renames; patch management; tracking changes from branch to trunk; and many other features.

BitKeeper comes under two licenses. The commercial license costs a few thousand dollars per seat (lease or buy). The gratis license is available for development of open source software, but has some restrictions, among them a non-compete clause and a requirement to upgrade the system as new versions come out, even if they have a different license. Furthermore, the source code is not publicly available, and binaries exist only for the most common systems, including Win32.

A handful of projects use BitKeeper, including some of the Linux kernel developers and the core MySQL developers. It has been the subject of much controversy on the Linux Kernel Mailing List. Due to its license, BitKeeper is not suitable for open source development, as it will alienate more "idealistic" developers and impose various problems on the users who choose to use it. If you are working on a non-public project and can afford to pay for BitKeeper, it is naturally an option.

Conclusion

You probably should not use CVS, as there are several better alternatives, unless you cannot get hosting for something else. (Note that GNU Savannah provides hosting for Arch, and there is documentation for using it with SourceForge.) You should also not use the free version of BitKeeper because of its restrictions.

Other systems are nicer than CVS and provide a better working experience. When I work in CVS, I always take a long time to think where to place a file or how to name it, because I know I cannot rename it later without breaking its history. This is no problem in other version control systems that support moving or renaming. One project in which I was involved decided to rename their directories and split the entire project history.

And you certainly have a lot of choice.

More Information

An item-by-item comparison of these systems can be found at the Better SCM Site. Rick Moen has a list of Version Control and SCMs for Linux on his web site. Finally, the DMOZ Configuration Management Tools directory provides many other useful links.

More information about version control systems and configuration management tools can also be found in the comp.software.config-mgmt FAQs page.

Shlomi Fish is a software professional who has been experimenting with programming since 1987 and with various UNIX technologies since 1996. He graduated from the Technion with a B.Sc. in Electrical Engineering, and has been heavily involved as a Linux and open source user, developer, and advocate.

His most successful project so far was Freecell Solver, but he also headed several other projects, and contributed to other projects such as Perl 5, Subversion, and the GIMP.


Copyright © 2009 O'Reilly Media, Inc.