

Preserving Backward Compatibility

Planning ahead

The best thing you can do to maintain protocol and data format compatibility is to plan ahead: design your protocols and data formats so that you can add things to them in the future without disturbing prior versions of the code. This means two things. Old code must be able to ignore new elements in your files or data streams when it doesn't understand them, and new code must be able to cope with the absence of those elements.

Use of XML

The canonical place where Subversion uses this technique is within the XML data formats used in the working copy libraries (for example, the .svn/entries files) and the DAV-based network protocol used by libsvn_ra_dav and mod_dav_svn. I'm not the biggest fan of XML, but it does make it pretty simple to create formats and protocols that can be extended later with a minimum of problems.

Specifically, the use of XML in libsvn_ra_dav has simplified the process of adding parameters to functions in the repository access API. For example, when I added the --limit parameter to the svn log command, I had to transmit that parameter to the server so it could pass it on to the libsvn_repos-level log functions. Because the functions in question simply send a report to the server in XML form, all that was required was to add a new element containing the parameter. New servers look for the new element, and if it isn't there, they assume it wasn't sent, preserving compatibility with old clients. Old servers ignore the new element because they don't understand it. The client code recognizes that case by noticing when it has received more than the requested number of log entries and ignores the rest, so the new parameter works even with a server that does not understand it.
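As a rough illustration of both halves of that exchange, here is a minimal Python sketch. It is not Subversion's code: the element names (log-report, path, limit) and both function names are made up for the example. The server-side parser treats a missing limit as "not sent" and skips elements it does not know; the client-side helper trims surplus entries in case an old server ignored the limit.

```python
import xml.etree.ElementTree as ET


def parse_log_report(xml_text):
    """Server side: read a log request (hypothetical element names).

    A missing <limit> means the client did not send one; elements the
    server does not recognize are skipped rather than rejected.
    """
    root = ET.fromstring(xml_text)
    request = {"paths": [], "limit": None}
    for child in root:
        if child.tag == "path":
            request["paths"].append(child.text)
        elif child.tag == "limit":
            request["limit"] = int(child.text)
        # Any other element comes from a newer client; ignore it.
    return request


def apply_limit(entries, limit):
    """Client side: an old server may ignore the limit, so drop the surplus."""
    entries = list(entries)
    if limit is None:
        return entries
    return entries[:limit]
```

A request from an old client (no limit element) and one from a newer client (with a limit plus some element the server has never heard of) both parse without error, which is the property the format needs.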

Custom protocol design

Of course, you don't need to use XML in order to ensure forward and backward compatibility in your protocols. Subversion's libsvn_ra_svn and svnserve have a custom protocol that uses many of the same tricks you might use in an XML-based protocol. The svn:// protocol sends data across the network encoded in tuples: lists whose entries are known to appear in a certain order. The functions for reading tuples off the wire ignore extra entries in the tuple, so you can add new parameters and old servers will ignore them, just as they do in the DAV-based format.
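The tuple-reading idea can be sketched in a few lines of Python. This is an illustration of the technique only, not the svn:// wire format or the libsvn_ra_svn API; the function name and the field names in the usage note are hypothetical.

```python
def read_tuple(items, required, optional=()):
    """Unpack a received tuple by position.

    Missing optional entries come back as None. Entries beyond the
    fields this reader knows about are silently ignored, which is what
    lets a newer peer append parameters without breaking older code.
    """
    if len(items) < len(required):
        raise ValueError("tuple too short: expected at least %d items"
                         % len(required))
    names = list(required) + list(optional)
    values = dict.fromkeys(names)
    # zip stops at the shorter sequence, so surplus items are dropped.
    for name, item in zip(names, items):
        values[name] = item
    return values
```

A reader that knows only path and start can accept a tuple that also carries a trailing limit, and a reader that knows about limit as an optional field can accept a tuple from a peer that never sends it.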

Additionally, the svn:// protocol includes in its initial handshake a minimum and maximum protocol version and a list of capabilities supported by the server and client. Thus, both the client and server have a chance to adjust to the exact version of the protocol being spoken at the other end of the connection. This allowed the addition of pipelining to the protocol shortly before the release of Subversion 1.0 while ensuring that old clients continued to work. See the subversion/libsvn_ra_svn/protocol file for more details on how the svn:// protocol works.
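One way such a handshake can be resolved is sketched below in Python. The logic is an assumption made for illustration (pick the highest version both sides support, and use only the capabilities both sides announce); the function name is hypothetical, and the protocol file mentioned above remains the authority on what svn:// actually does.

```python
def negotiate(local_min, local_max, local_caps, peer_min, peer_max, peer_caps):
    """Return (version, shared_capabilities), or None if the ranges don't overlap.

    Assumed rule: speak the highest protocol version inside both
    [min, max] ranges, and enable only capabilities both peers list.
    """
    low = max(local_min, peer_min)
    high = min(local_max, peer_max)
    if low > high:
        return None  # No common version; the connection must be refused.
    return high, set(local_caps) & set(peer_caps)
```

With this rule, a new client that announces a pipelining capability and talks to a peer that does not announce it simply ends up with an empty shared set and falls back to the older behavior.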

Upgrade paths

For forward-compatible but not backward-compatible changes, what's most important is to provide a smooth update path. There are two main ways of doing this, both of which Subversion has used at various times.

Ease in the change

Long before Subversion hit 1.0, the developers made the decision to change the format used when storing timestamps in the working copy code. The change occurred slowly, over the course of a few releases. First came support for reading the new format, so the code that parses timestamps would try the new format, and if that failed it would try the old format before finally returning an error if that failed. Then, after that code had been out in the wild for a while, libsvn_wc changed so that it wrote out timestamps in the new format. The pre-1.0 policy for upgrades was to ensure compatibility only within a single version. Because the support for reading the new format went in a version before the introduction of support for writing the new format, the project retained that compatibility. Support for the old date format exists to this day in Subversion's timestamp-parsing code, but nothing has written out dates in that format in quite some time.
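The two-stage rollout can be sketched in Python as follows. Both format strings are invented for the example and are not the formats Subversion used; the point is the ordering. The reader learns the new format first, and only a later release flips the writer.

```python
from datetime import datetime

# Hypothetical formats, chosen only for illustration.
NEW_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"
OLD_FORMAT = "%Y/%m/%d %H:%M:%S"


def parse_timestamp(text):
    """Stage 1: try the new format, fall back to the old, then give up."""
    for fmt in (NEW_FORMAT, OLD_FORMAT):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognized timestamp: %r" % text)


def format_timestamp(moment, write_new_format):
    """Stage 2: a later release switches the writer to the new format."""
    return moment.strftime(NEW_FORMAT if write_new_format else OLD_FORMAT)
```

A release that ships parse_timestamp with write_new_format still off can read anything written by the release after it, which is what keeps a one-version downgrade safe.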

What's important to keep in mind here is that the slow introduction of the change let users revert from the new version (which produced the new format timestamps) to the previous version (which knew how to read them) on the off chance that they encountered some sort of problem with their upgrade.

Detect the incompatibility and compensate

The addition of UUIDs to Subversion repositories is another example of how to change an on-disk format in a backward-compatible way. Originally Subversion repositories did not have any unique identifier; features like svn switch --relocate were dangerous because you couldn't ensure that both URLs referred to the same repository. To solve the problem, each repository now has a universally unique ID stored in a new database table (because at the time, the only filesystem back end that existed was the Berkeley DB one). To ensure that new code worked with repositories created prior to the addition of this feature, the lack of this table simply caused the function that returns the repository's UUID to create the table itself, seamlessly upgrading the repository without the user ever being aware of it.
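The same detect-and-compensate pattern can be sketched in Python. This example uses an in-memory SQLite database as a stand-in for the Berkeley DB back end, and the table name, column name, and function name are all assumptions made for the illustration.

```python
import sqlite3
import uuid


def get_repository_uuid(conn):
    """Return the repository UUID, upgrading an old repository on the fly.

    If the table is missing (a repository created before UUIDs existed),
    create it and store a freshly generated UUID before returning it.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS uuids (uuid TEXT NOT NULL)")
    row = conn.execute("SELECT uuid FROM uuids").fetchone()
    if row is None:
        new_id = str(uuid.uuid4())
        conn.execute("INSERT INTO uuids (uuid) VALUES (?)", (new_id,))
        conn.commit()
        return new_id
    return row[0]
```

The first call against an old repository performs the upgrade; every later call returns the same stored value, and the caller never needs to know which case it hit.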

The important point here is that if you can possibly make the upgrade automatic from the user's point of view, you should do so. Avoiding manual steps can only be a good thing.

Dependency problems

One place where it's easy to forget about compatibility problems is in your project's dependencies. Any libraries you link against or external programs you use will each have their own compatibility issues, just as you will. It's important to be aware of those issues when deciding to make use of a third-party product. In Subversion we've had at least three separate dependencies that cause compatibility problems. Some are internal to Subversion, and some poke through to users from time to time.

Dependencies that show through your API

The most important kind of dependency you need to worry about is one that shows up in your public API. This can happen when you use data types defined in the library as arguments to your library's functions, such as with the Apache Portable Runtime (APR) in Subversion. Any non-backward-compatible change that occurs in the library you depend on will instantly affect you as soon as your users try to upgrade to a new version of the dependency.

When Subversion first hit 1.0, the only released version of APR was from the 0.9.x series of releases. Because Subversion uses APR in almost every part of its public interface, this means that to maintain ABI compatibility, all releases of Subversion within its 1.x branch only officially support the use of APR 0.9.x releases. While Subversion does happen to work with APR 1.0.x, official builds use 0.9.x.

The primary reason Subversion can't use APR 1.0.x is that the size of the data type apr_off_t has changed from off_t (often 32 bits long on a 32-bit system) to long (often 64 bits long on a 32-bit system). This support was necessary for interoperating with programs (Perl, for example) that redefine the size of an off_t via the _FILE_OFFSET_BITS define. Because apr_off_t shows up in the public Subversion API, this change makes versions of Subversion compiled with APR 1.0.x instantly incompatible with versions compiled with APR 0.9.x. Additionally, APR uses a set of compatibility rules that allow it to drop and change parts of its public API between major versions, so any of those kinds of changes will cause similar types of problems as the apr_off_t changes.
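To see why a change in the width of one type ripples through an ABI, consider this small Python sketch using ctypes. The two structures are hypothetical and are not taken from APR or Subversion. They only show that widening an offset field from 32 to 64 bits moves every field that follows it, so code compiled against one layout misreads data laid out by the other.

```python
import ctypes


class EntryOld(ctypes.Structure):
    """Hypothetical struct as seen by code built with a 32-bit offset type."""
    _fields_ = [("offset", ctypes.c_int32),
                ("flags", ctypes.c_int32)]


class EntryNew(ctypes.Structure):
    """The same struct as seen by code built with a 64-bit offset type."""
    _fields_ = [("offset", ctypes.c_int64),
                ("flags", ctypes.c_int32)]


def describe(struct_type):
    """Return (total size in bytes, byte offset of the flags field)."""
    return ctypes.sizeof(struct_type), struct_type.flags.offset
```

In the old layout the flags field starts at byte 4; in the new layout it starts at byte 8. A caller and a library that disagree about which layout is in use will read flags from the wrong place, with no compiler error to warn them.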

The important lesson to learn from this is that as soon as you let into your program a data type defined by your dependency, its compatibility issues instantly become your compatibility issues.

Dependencies that are hidden by your API

An interesting counterexample in Subversion's case is the Neon library, which Subversion uses as its HTTP/WebDAV client library. Neon differs from APR in two ways. First, Neon doesn't make it into Subversion's public interface, so changes to Neon's data types have a harder time making themselves seen to clients of Subversion itself. Second, Neon's interface is far less stable than APR's is. Even APR 0.9.x, despite its pre-1.0 version number, provides a high level of stability in its API. Neon has never professed to do so, with nontrivial changes in its API being reasonably common.

This means that in order to support multiple versions of Neon, Subversion needs to jump through a few hoops. That has happened at least once, with nontrivial amounts of shim code being introduced to libsvn_ra_dav in order to account for changes in the Neon API as a result. This allowed Subversion to function with either the old Neon API or the new one for a reasonable amount of time while users upgraded.
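The general shape of such a shim can be sketched in Python. Nothing here reflects Neon's real interface: the two session classes are stand-ins for "the old API" and "the new API" of some library, and all method names are invented. The shim gives the rest of the code a single call signature and hides the version difference in one place.

```python
class OldSession:
    """Stand-in for the older library API: positional arguments."""

    def request(self, method, path):
        return "old:%s %s" % (method, path)


class NewSession:
    """Stand-in for the newer library API: a single request mapping."""

    def dispatch(self, request):
        return "new:%s %s" % (request["method"], request["path"])


def send_request(session, method, path):
    """Shim: one signature for callers, whichever library version is present."""
    if hasattr(session, "dispatch"):
        return session.dispatch({"method": method, "path": path})
    return session.request(method, path)
```

Callers use send_request everywhere, so when support for the old API is finally dropped, only the shim needs to change.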

While backflips like the shim code in libsvn_ra_dav ease the burden on its users, they don't solve all the problems. If a program uses Neon directly in its own code as well as through the Subversion APIs, it's possible for Neon upgrades required by Subversion to break backward compatibility. The best way to handle this kind of change is not yet clear.

Again, it is important to note that this is a valuable lesson. Once you use a library, its compatibility issues are your compatibility issues.

Dependencies that show through your on-disk formats

The last type of compatibility problem that a third-party library can introduce arises when the library is responsible for the on-disk format of some of your data, as in the case of Berkeley DB as used by Subversion. Upgrading to a new version of the library can result in unexpected problems if the disk formats are incompatible. This has resulted in significant issues, mainly because Berkeley DB upgrades often require manual intervention, ranging from a simple recovery to a full dump/load cycle. To make matters worse, vendors and distributors often package Berkeley DB in such a way that upgrades may occur without the user's conscious action.

There's not much more to say about this kind of compatibility problem, other than that the only real solution is education. Users need to understand the issues upgrades can bring, and ideally the error messages that result need to say what has gone wrong. Unfortunately, users often feel terror at the sudden inability to access their data, so panic may outweigh education in some cases.

Making Compatibility Decisions

Now that you've learned about the types of compatibility, seen some tricks you can use to help maintain them, and heard about some specific examples of how such problems can occur, it's time to think about your specific application and how these issues apply to you.

First, consider your user base. If you have only a dozen highly technical users, jumping through hoops to maintain backward compatibility may be more trouble than it's worth. On the other hand, if you have hundreds or thousands of nontechnical users who cannot deal with manual upgrade steps, you need to spend a lot of time worrying about those kinds of issues. Otherwise, the first time you break compatibility you'll easily burn through all the goodwill you built up with your users by providing them with a useful program. It's remarkable how easily people forget the good things from a program as soon as they encounter the first real problem.

Next, consider your project. If you don't actually provide a library your users embed in their own application, worrying about API and ABI stability is pointless. Similarly, if you don't store data on disk or send it over the network, the issues associated with those activities are moot. It's rare that a program has no compatibility issues at all, but it's also rare for one to encounter all the issues described in this article.

In Conclusion

Consider again the example of the Subversion project. Subversion's compatibility policy appears in the "Release numbering, compatibility, and deprecation" section of the HACKING file in the top level of its source tree. You can upgrade and downgrade within a single minor release cycle without issue. You can upgrade to new versions in the same major release cycle without issue. When the major version number changes, all bets are off. These rules apply to API/ABI issues, data format issues, and network protocol issues.
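Read literally, those rules can be written down as a small Python predicate. This is one interpretation of the paragraph above, not code or wording from the HACKING file; the function name and the (major, minor, patch) tuple representation are assumptions.

```python
def is_guaranteed_compatible(current, target):
    """Check a move between (major, minor, patch) versions against the policy.

    Within one minor line, upgrades and downgrades are both covered.
    Within one major line, only upgrades are covered.
    Across major versions, nothing is promised.
    """
    cur_major, cur_minor, _ = current
    tgt_major, tgt_minor, _ = target
    if cur_major != tgt_major:
        return False  # "All bets are off."
    if cur_minor == tgt_minor:
        return True   # Upgrade or downgrade within a minor release cycle.
    return tgt_minor > cur_minor  # Upgrades only within a major cycle.
```

Under this reading, moving from 1.1.3 back to 1.1.1 is covered, moving from 1.1.0 up to 1.2.0 is covered, and moving from 1.2.0 back to 1.1.4 is not.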

Has the project followed this policy? The answer--as is often the case with software engineering--is a qualified yes. In one instance, Subversion added a function in a nonminor release as part of a security fix, and that change broke the ability to go back and forth within that specific minor version. The nature of the security problem meant sacrificing compatibility in this particular case.

Other than that, though, the policy has been a success. Users have upgraded to new versions of Subversion without fear. Various versions of the official client and server, and even third-party clients that implement the same protocols, have also enjoyed continued compatibility. The users seem happy with the compatibility promises, and the developers are not overly hampered by them. It isn't always easy, but in my opinion it's been worth it.

All projects need to consider compatibility. The issues are rarely as simple as you might like, and they require serious thought for each project, as no two are the same. Finally, be aware that worrying too much about compatibility can cripple you, so it's important not to place too high a price on it. Only you can determine how high is too high. I hope this article has given you a starting place for making that determination.

Garrett Rooney is a software developer at FactSet Research Systems, where he works on real-time market data.
