Developer Interview: Christoph Reichenbach and Lars Skovlund on FreeSCI
O'Reilly Network: What are the unique programming challenges in reverse engineering a game engine that you've experienced?
Skovlund: The fact that we don't have any exact specifications makes it difficult to determine how closely you need to follow the original code or even the purpose of a particular piece of code. Often you have to see an engine feature being used by a game before you can start making assumptions about it. You often have to make strict assumptions about a particular feature, which can be relaxed later, but only after your game engine is finished enough to run a game that uses it.
The little things that change between versions are another problem. I have a collection of 15 game interpreters just for the supported range of games. Often the changes between each version are just bug fixes, but there are sometimes small differences in the way things behave which don't become clear until someone notices a graphical glitch (or whatever) while playing a game.
Reichenbach: While one of the core motivations of such a project is, of course, to solve a problem created by the lack of free availability of the original system's source code, it is easy to forget about portability and just focus on getting the reimplementation up and running on your personal platform of choice. The first versions of FreeSCI were specifically tailored toward Unix-like platforms running on 32-bit machines; only later were the library and run-time environment required by the interpreter changed to allow reasonably easy implementations on other platforms. However, as recent porting efforts have shown, much remains to be done in this area.
Another problem is estimating the number of features needed. Unlike reverse-engineered programs based on decompiled versions of the original code, you do not, in general, know exactly where you're going; designing ahead is a mixture of guessing the most likely features required in a certain piece of code and cleaning up parts whose functionality is believed to be well understood.
As an example, consider upcalls in SCI. The SCI kernel functions serve as the SCI function library, providing file I/O, graphics primitives, access to the sound server, and so on. Some of them invoke bytecode functions for certain functionality, similar to what kernels like Mach do in some situations (with "bytecode" corresponding to "user space" here). Initially, we didn't know this, so our execution stack did not support plugging in calls to C code in between calls to bytecode; fortunately, this turned out to be relatively straightforward to implement.
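To make the idea of an upcall concrete, here is a minimal sketch in C of an interpreter loop that a native "kernel" function can re-enter in order to run a bytecode routine. It is a toy, not FreeSCI's actual design: the opcodes and every name in it (vm_t, run_bytecode, kernel_apply, demo_upcall) are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy VM: the opcodes and names below are illustrative only. */
enum { OP_PUSH, OP_ADD, OP_KERNEL, OP_RET };

typedef struct vm vm_t;
typedef int (*kernel_fn)(vm_t *vm, int arg);

struct vm {
    int stack[64];
    size_t sp;            /* number of values currently on the stack */
    int depth;            /* current nesting of run_bytecode() calls */
    int max_depth;        /* deepest nesting seen so far */
    kernel_fn kernel[1];  /* table of native "kernel" functions */
    const int *callback;  /* bytecode routine that kernel[0] calls back into */
};

static void push(vm_t *vm, int v) { assert(vm->sp < 64); vm->stack[vm->sp++] = v; }
static int pop(vm_t *vm) { assert(vm->sp > 0); return vm->stack[--vm->sp]; }

/* Interpreter loop. It is re-entrant, so native code may call it again. */
int run_bytecode(vm_t *vm, const int *code)
{
    int result = 0;
    if (++vm->depth > vm->max_depth)
        vm->max_depth = vm->depth;
    for (;;) {
        int op = *code++;
        if (op == OP_PUSH) {
            push(vm, *code++);
        } else if (op == OP_ADD) {
            int b = pop(vm);
            int a = pop(vm);
            push(vm, a + b);
        } else if (op == OP_KERNEL) {
            int index = *code++;
            int arg = pop(vm);
            push(vm, vm->kernel[index](vm, arg)); /* bytecode calls into C */
        } else { /* OP_RET */
            result = pop(vm);
            break;
        }
    }
    vm->depth--;
    return result;
}

/* Native kernel function that performs an upcall into bytecode. */
int kernel_apply(vm_t *vm, int arg)
{
    push(vm, arg);
    return run_bytecode(vm, vm->callback); /* C calls back into bytecode */
}

/* Runs a program that calls kernel[0], which in turn runs the callback. */
int demo_upcall(vm_t *vm)
{
    static const int callback[] = { OP_PUSH, 1, OP_ADD, OP_RET };
    static const int program[]  = { OP_PUSH, 41, OP_KERNEL, 0, OP_RET };
    vm->sp = 0;
    vm->depth = 0;
    vm->max_depth = 0;
    vm->kernel[0] = kernel_apply;
    vm->callback = callback;
    return run_bytecode(vm, program);
}
```

Calling demo_upcall() on a vm_t returns 42, and max_depth ends up at 2, which shows that a bytecode frame, a C frame, and a second bytecode frame were active at the same time. A real engine would more likely keep explicit call frames than recurse on the host stack; the recursion here just keeps the sketch short.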
O'Reilly Network: Any advice--legal or technical--for those who are looking into reverse engineering a game engine?
Skovlund: Two words: stay legal! It's very important when you reverse engineer a game that the original author has no valid reason to complain. This usually means that reverse engineering and implementation should be done by different groups of people. I hardly wrote a line of code in the first years for this exact reason.
Another thing to watch out for is patents. Sierra was granted a few patents on key technologies used in later SCI games. So far we haven't needed to deal with them, but we are going to have to do that for SCI01 and later games. One of those can be worked around, while the others might pose a problem because they describe very general concepts.
Reichenbach: We have one part of the team investigating Sierra's interpreter and documenting the findings, and another part reimplementing it from that documentation. This way we avoid legal issues.
While it could be argued that it's a lot more work this way, it's also much more entertaining. We have done a lot of things very differently from the way Sierra implemented them (such as the graphics subsystem), which not only serves to considerably weaken a potential case Sierra's lawyers might try to make, but it also allowed us to add new features and checks (which may be of interest for future SCI game developers).
On the technical front, first, my recommendation would be not to use weakly typed and notoriously unportable languages like C and C++ for reimplementing the engine. It is far too easy to write dangerous, slow, and unportable code in these languages. Other languages that might come to mind would be popular scripting languages like Perl or Python. However, the amount of type-checking offered by these is far too small to make them useful for any large programs. More expressive, well-defined languages like Standard ML or Eiffel would, in general, be a much better choice.
If you still decide to use C or C++, keep the following in mind:
Do not assume that sizeof(int) == sizeof(void*). The two differ on 64-bit platforms, and assuming otherwise will break support for the Alpha architecture.
Some architectures cannot do immediate 16- or 32-bit reads from "odd" addresses. This is a particular problem when dealing with old bytecode. Try to read it in single-byte fashion by default. If it helps, you can make platform-specific optimizations later.
Remember that not everybody is little endian.
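The three portability points above can be illustrated with a short C sketch. It assumes the game data stores multi-byte values in little-endian order, which is an assumption made for this example, and the helper names (read_u16_le, read_u32_le, pointer_to_handle, handle_to_pointer) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit little-endian value one byte at a time.
 * This works at any address and on hosts of either endianness. */
uint16_t read_u16_le(const uint8_t *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Same idea for 32 bits; the casts keep the shifts in unsigned arithmetic. */
uint32_t read_u32_le(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* If a pointer must be stored in an integer, use uintptr_t rather than
 * int: int is too narrow on platforms where pointers are 64 bits wide. */
uintptr_t pointer_to_handle(const void *p) { return (uintptr_t)p; }
const void *handle_to_pointer(uintptr_t h) { return (const void *)h; }
```

Because the readers never dereference anything wider than a byte, they make no alignment demands and give the same result on big-endian and little-endian hosts. A platform-specific fast path can be added behind the same function later if profiling shows it matters.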
Try not to buy into one particular graphics or sound library. FreeSCI started out designed for the libggi graphics library, which, at that time, appeared to be one of the libraries most likely to become generally accepted and ported to a vast number of architectures. Today it's pretty much dead.
By abstracting our graphics API, we have significantly simplified porting to new architectures (and different visuals on the same architectures). Thus, we don't depend on, say, SDL supporting a certain platform in order to run on it. Of course, this creates some problems with API-specific optimizations, but, at least for graphics drivers, these tend to be sufficiently similar to allow them to be taken advantage of generically.
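As a rough sketch of what such an abstraction can look like in C, the following defines a small driver interface as a struct of function pointers, plus one in-memory backend. This is not FreeSCI's actual graphics API: the interface, the gfx_driver_t name, the memory backend, and the 320x200 buffer size are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical driver interface; every name here is illustrative. */
typedef struct gfx_driver {
    const char *name;
    int  (*init)(struct gfx_driver *drv);
    void (*put_pixel)(struct gfx_driver *drv, int x, int y, unsigned char color);
    unsigned char (*get_pixel)(struct gfx_driver *drv, int x, int y);
    void *state;  /* backend-private data */
} gfx_driver_t;

/* One backend: an in-memory framebuffer with 8-bit pixels. */
enum { MEM_WIDTH = 320, MEM_HEIGHT = 200 };
static unsigned char mem_pixels[MEM_WIDTH * MEM_HEIGHT];

static int mem_init(gfx_driver_t *drv)
{
    memset(mem_pixels, 0, sizeof mem_pixels);
    drv->state = mem_pixels;
    return 0;
}

/* Writes outside the buffer are silently clipped. */
static void mem_put_pixel(gfx_driver_t *drv, int x, int y, unsigned char color)
{
    unsigned char *pixels = drv->state;
    if (x >= 0 && x < MEM_WIDTH && y >= 0 && y < MEM_HEIGHT)
        pixels[y * MEM_WIDTH + x] = color;
}

static unsigned char mem_get_pixel(gfx_driver_t *drv, int x, int y)
{
    const unsigned char *pixels = drv->state;
    assert(x >= 0 && x < MEM_WIDTH && y >= 0 && y < MEM_HEIGHT);
    return pixels[y * MEM_WIDTH + x];
}

gfx_driver_t gfx_driver_memory = {
    "memory", mem_init, mem_put_pixel, mem_get_pixel, NULL
};

/* Engine-side code talks only to the interface, never to a backend. */
void draw_hline(gfx_driver_t *drv, int x0, int x1, int y, unsigned char color)
{
    int x;
    for (x = x0; x <= x1; x++)
        drv->put_pixel(drv, x, y, color);
}
```

Porting to a new graphics library then means writing another gfx_driver_t instance; draw_hline and the rest of the engine stay unchanged. The cost, as noted above, is that backend-specific fast paths have to be exposed through the interface in some generic form.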
Depending on what you're trying to reimplement, it's possible that the computers you're targeting are more powerful than the platforms the code was originally targeted at. Thus, you can usually do more checking on whether what the game is trying to do is consistent with your perception of what it should be allowed to do (this, of course, is particularly relevant to flexible interpreters). Unless you know exactly the semantics of the issues you're dealing with, you can't hope to provide an accurate or, even, a better rendition of the original engine. Building fences and watching the script code run against them (by triggering run-time warnings, errors, or even failing some static analysis, if you're bold enough to implement that) is usually the only way to figure these out.
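One way to picture such a fence is a checked memory accessor that the interpreter uses for every script read. The sketch below is a hypothetical illustration in C, not code from FreeSCI: script_heap_t and heap_read_u16 are made-up names, and treating odd addresses as merely suspicious (a warning) while treating out-of-range reads as forbidden (an error) is an assumption chosen to show fences of two different strengths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    const uint8_t *data;
    size_t size;
    unsigned warnings;  /* soft fences hit: suspicious but allowed */
    unsigned errors;    /* hard fences hit: access refused */
} script_heap_t;

/* Checked 16-bit little-endian read.
 * Returns 0 on success, -1 if the access was refused. */
int heap_read_u16(script_heap_t *heap, size_t addr, uint16_t *out)
{
    assert(heap != NULL && out != NULL);
    if (heap->size < 2 || addr > heap->size - 2) {
        heap->errors++;
        fprintf(stderr, "error: read at %lu is outside the %lu-byte heap\n",
                (unsigned long)addr, (unsigned long)heap->size);
        return -1;
    }
    if (addr & 1) {
        /* Assumed policy: odd addresses are unexpected, so log them
         * and carry on, to learn whether scripts really do this. */
        heap->warnings++;
        fprintf(stderr, "warning: unaligned read at %lu\n",
                (unsigned long)addr);
    }
    *out = (uint16_t)(heap->data[addr] | (heap->data[addr + 1] << 8));
    return 0;
}
```

Running real games against accessors like this and watching which fences trip is what turns a guess about the semantics into evidence: a warning that fires all the time means the assumption was wrong and the fence should be relaxed, while one that never fires can be tightened into an error.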
Unless the game you're trying to reimplement is fully documented already, the single biggest mistake you can make is not to record what you find out when examining the original code. Unless you happen to get your reimplementation right the very first time (which, of course, doesn't happen in practice), you'll have to examine the original code again when you try to fix your bugs.
For interpreters, in particular, documenting also helps people develop other, orthogonal tools. Also note that, when doing a clean-room reimplementation, documentation arises almost naturally as an artifact of the communication between the decoding and the reimplementation teams.
Howard Wen is a freelance writer who has contributed frequently to O'Reilly Network and written for Salon.com, Playboy.com, and Wired, among others.
Copyright © 2009 O'Reilly Media, Inc.