ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)


Collaborative Document Editing with svk

by Chia-liang Kao
09/09/2004

Say you have a document that needs to be presented in two languages and you are the translator. While the translation is in progress, someone revises the original master document. This means you now might be working with an outdated paragraph or one no longer present in the master version.

This article tries to map this problem to parallel development, which version control systems solve with the branch and merge model. You will also see how svk helps you maintain translated documents easily.

The Problem

Translating a document takes time and is seldom finished in a single session. Unfortunately, an original document sometimes undergoes revisions before the translator is finished. A translator needs to track the original document closely, in order to avoid working on outdated paragraphs or even those that have been removed completely. Even after the translation is complete, if many parts of the original document undergo revision, the translator will have to track numerous changes and can easily get lost if he does not finish all of his adjustments within a single work session.

Do It By Hand

Suppose you want to translate a document into Chinese. You make a copy of the original version, start working on it, and get through the first ten paragraphs. Then you discover that the author made some changes in the original document. Some of them are in the parts that you have translated and some are not. You examine the changes: in the chunks you haven't yet translated, you update by copying and replacing; for those parts you've translated already, you have to read and understand the changes in the original document and adjust your translation accordingly.

For smaller documents that don't change frequently and take only a few work sessions to finish, this is not a big deal. For larger documents where the translation takes longer, odds are greater that someone will update the original document. Repeating the process is boring and wastes the translator's time. Of course, you should use a tool whenever something is repetitive and boring.
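To make the manual step concrete, here is a minimal Python sketch (not part of svk) of the first chore: finding which paragraphs of the original changed between two versions. It assumes paragraphs are separated by blank lines; the function names and the sample text are invented for illustration.

```python
import difflib


def split_paragraphs(text):
    """Split text into paragraphs, assuming blank lines separate them."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def changed_paragraphs(old_text, new_text):
    """Return (tag, old_paragraphs, new_paragraphs) for each changed region."""
    old = split_paragraphs(old_text)
    new = split_paragraphs(new_text)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical sample data: the author revised only the middle paragraph.
old_master = "Title\n\nFirst paragraph.\n\nSecond paragraph."
new_master = "Title\n\nFirst paragraph, revised.\n\nSecond paragraph."

for tag, before, after in changed_paragraphs(old_master, new_master):
    print(tag, before, after)
```

This only tells you where the original changed. Deciding what to do with each change, copy it over or rework a translation, is still left to you, which is the part a version control system can help with.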

Version Control Systems

Version control systems are essential productivity tools for software development. They're often useful for configuration file management too. What do they have to do with translations?

Most version control systems support parallel development on branches, in which two or more teams can work on things separately, merging them together later. Modern version control systems also support merge tracking across branches. You can simply give the instruction to merge everything from A to B, without explicitly specifying which changes to merge. That will merge all the changes that were made to A since the last time you performed a merge into B.
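The idea behind merge tracking can be sketched in a few lines of Python. This is a toy model under simplifying assumptions, not how svk stores its merge information: revisions are plain integers, and the `Branch` class and its `last_merged` attribute are hypothetical names standing in for whatever record the system keeps.

```python
def pending_revisions(source_revisions, last_merged):
    """Return the source revisions that have not been merged yet, oldest first."""
    return [rev for rev in sorted(source_revisions) if rev > last_merged]


class Branch:
    """Toy branch that remembers the last source revision merged into it."""

    def __init__(self, name, branched_at):
        self.name = name
        # Until the first merge, the branch point acts as the merge base.
        self.last_merged = branched_at

    def merge_from(self, source_revisions):
        """Pick up everything new on the source and remember where we stopped."""
        todo = pending_revisions(source_revisions, self.last_merged)
        if todo:
            self.last_merged = todo[-1]
        return todo


# Hypothetical history: B was branched from A at revision 3.
b = Branch("B", branched_at=3)
print(b.merge_from([3, 5, 8]))     # first merge: everything after the branch point
print(b.merge_from([3, 5, 8, 9]))  # later merge: only what arrived since then
```

Because the branch records where the previous merge stopped, the person merging never has to name revisions explicitly.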

After merging, if you haven't touched the corresponding parts that have changed in A, the system will update your version. Otherwise, if there are conflicts, it will prompt you to resolve them.

In the context of translation, merging allows B (the translation) to catch up with the latest modifications from A (the original). Conflicts occur when someone has updated the original text of paragraphs you have already translated, which tells you exactly where the translation needs updating.
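The decision a merge makes for each paragraph can be sketched as a small three-way comparison. This is a simplified illustration, not svk's algorithm: it assumes the three versions have the same number of paragraphs in the same order, whereas a real merge tool first aligns the texts with a diff. The `Conflict` class and all sample strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Conflict:
    yours: str   # the paragraph on your branch (the translation)
    base: str    # the original text the translation was based on
    theirs: str  # the revised original text


def merge_paragraphs(base, theirs, yours):
    """Three-way merge of aligned paragraph lists (simplifying assumption)."""
    if not (len(base) == len(theirs) == len(yours)):
        raise ValueError("this sketch assumes one-to-one aligned paragraphs")
    merged = []
    for b, t, y in zip(base, theirs, yours):
        if t == b:
            merged.append(y)  # original unchanged: keep your paragraph
        elif y == b:
            merged.append(t)  # not translated yet: take the revised original
        elif y == t:
            merged.append(y)  # both sides made the same change
        else:
            merged.append(Conflict(yours=y, base=b, theirs=t))
    return merged


base = ["Title", "Old paragraph one.", "Old paragraph two."]
theirs = ["Title", "New paragraph one.", "New paragraph two."]
yours = ["[zh] Title", "[zh] paragraph one", "Old paragraph two."]

merged = merge_paragraphs(base, theirs, yours)
for item in merged:
    print(item)
```

In this example the translated title is kept, the untranslated third paragraph is silently updated, and only the translated paragraph whose source changed is flagged for the translator.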

svk (http://svk.elixus.org/) is a new version control system that is easy to use for maintaining translated versions of documents. After all, it's pointless to use software that brings you more overhead than the time you can save.

The rest of this article introduces basic svk usage that will be sufficient for the translation task introduced here. For other features and further information, consult the svk web site.

Installing svk

svk requires Subversion and its Perl bindings, which many Linux and BSD distributions provide as prebuilt packages. After installing them, you can install svk like other CPAN modules:

% perl Makefile.PL
% make all test
# make install

Next, you need to configure the depot, the storage location of files and modifications. The default depot is //. Inside, the depot looks like a normal filesystem, so you will also want to organize things into directories:

% svk depotmap --init
% svk mkdir -m 'this is for articles' //articles

To add items to the depot and modify them, you must first check out //articles to your ordinary filesystem:

% svk checkout //articles
% cd articles

Suppose article-en.txt is the file to translate, whether written by you or downloaded from somewhere else. In both cases, putting the article into the depot for version control allows the translated version to more easily track its base. To add files to the depot:

% svk add article-en.txt
% svk commit -m 'initial version' article-en.txt

When the original author changes the article, overwrite article-en.txt with the new version and run svk commit again.

In the example, we will assume the sample text below is now in article-en.txt:

Manageable Document Translation with svk

Translating articles is really hard work, and you deserve a better
tool.  The tool should be able to tell you which part of the original
text has been updated so you can adjust the translation accordingly.

If the file you are to translate is already under version control somewhere else with Subversion, cvs, or Perforce, svk can mirror it and update to the latest version for you automatically. Consult the svk home page for more information.

There are other basic svk commands you might find useful, such as status and log.

Working on the Translation Branch

Create a branch of article-en.txt by performing a copy:

% svk copy article-en.txt article-zh.txt
A + article-zh.txt

This will create the file article-zh.txt. svk knows it is related to the current version of article-en.txt. You can now work on the translation in the new file. Commit changes often to snapshot your work in progress:

% svk commit -m 'translate first paragraph' article-zh.txt

Larger projects such as books will involve many files instead of just one. In that case, organize them in a directory and copy the whole directory for branching.

Merging Changes

Suppose that after some hard work, you have finished the translation. Someone has revised the original text to fix some wording:

Translating documents is really hard work and you deserve a better
tool.  The tool should be able to tell you which part of the original
text is updated, so you could adjust the translation accordingly.

The first line, the title, didn't change.

To bring this change to the translated version, use svk's smart merge:

% svk smerge -C //articles/article-en.txt article-zh.txt
Auto-merging (3, 5) //articles/article-en.txt to //articles/article-zh.txt
 (base //articles/article-en.txt:3).
C   article-en.txt
New merge ticket:
 8712955e-75da-0310-a94d-be098d1d806e:/articles/article-en.txt:5
1 conflict found.

The smart merge command finds the last merge point automatically, so you don't have to worry about those funny revision numbers above. Because this is the first merge, the merge base is the point at which you copied (or branched) the original file. Run the smerge command each time you update and commit article-en.txt, as catching up with the original text frequently is easier than doing a huge update some months later.

The -C option tells smerge to check what the merge would do without changing anything. Here it has found a conflict. Run smerge again without -C to actually modify article-zh.txt. The file will then look like this:

Figure 1. Conflicts after a merge.

The first line, which you have translated and whose original text has not changed, remains there. The second part, as you can see, is now a conflict marker with three sections: your translation, the original text that the translation was based on, and the new version of the original text.

In a document longer than just four lines, svk will also update every changed part that you haven't translated yet to match the latest version of article-en.txt. This saves a lot of time tracking and copying them.

Now you can proceed to update the translations in the file. This should be relatively easy, since you can read the changes made in the original text side by side with your translation while you modify it.

Before you can commit the file again, you have to mark the conflicts that svk just found as resolved:

% svk resolved article-zh.txt
/home/clkao/articles/article-zh.txt marked as resolved.

Why? Consider that you are using svk the usual way--managing source code. The funny conflict markers won't mean much to any compiler, and committing them will result in an inconsistent state where your software won't build. This is not the case for document translation! If you can't finish updating all those conflicts for now, it's rather convenient to commit the work in progress, leaving the conflicts there for resolution later. In the event that you've set up a Subversion server, other people in the same group translating the same document may be able to finish the merge easily. This makes collaborative translation possible.
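If you do commit work in progress, it helps to know how many conflicts are still waiting in the file. The Python sketch below counts them. The marker strings are an assumption: it presumes each conflict block opens with a line starting with `>>>> YOUR VERSION` and closes with a line starting with `<<<<`, so compare them with the markers in your own merged file and pass different values if they differ. The function name and sample text are invented for illustration.

```python
def count_open_conflicts(text, start=">>>> YOUR VERSION", end="<<<<"):
    """Count conflict blocks in text; the marker prefixes are assumptions."""
    total = 0
    inside = False
    for line in text.splitlines():
        if not inside and line.startswith(start):
            inside = True   # a new conflict block begins
            total += 1
        elif inside and line.startswith(end):
            inside = False  # the block is closed
    return total


# Hypothetical merged file with one unresolved conflict.
sample = "\n".join([
    "[zh] Title",
    ">>>> YOUR VERSION article-zh.txt",
    "[zh] translated paragraph",
    "==== ORIGINAL VERSION article-zh.txt",
    "Old paragraph.",
    "==== THEIR VERSION article-zh.txt",
    "New paragraph.",
    "<<<<",
])

print(count_open_conflicts(sample))
```

A group of translators could run a check like this on the committed file to see at a glance how much merging remains.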

Using Merge Tools

A few tools allow interactive merges like the one we did earlier manually. svk supports Emacs (EDiff), meld, and FileMerge.App on Mac OS X. You can easily call them as external merge tools for conflict resolution. We will use Emacs as the example here.

Find the svk-merge-emacs tool under the utils directory in the svk distribution. Also remember to put svk-ediff.el in your Emacs Lisp directory.

The SVKMERGE environment variable controls the external merge tool:

% export SVKMERGE=svk-merge-emacs
% svk smerge //articles/article-en.txt article-zh.txt
Auto-merging (3, 5) //articles/article-en.txt to //articles/article-zh.txt
 (base //articles/article-en.txt:3).
Started 5427, Try 'kill -USR1 5425' to terminate if things mess up

The svk smerge command now waits for the external merge tool. An Emacs window will pop up in ediff-merge-with-ancestor mode.

Press $$ to ignore obviously merged chunks, as you likely care only about conflicts. Use the n and p keys to navigate between chunks that need merging. Press + to expand a chunk to combined mode, which consists of the base text:

Figure 2. Merging with Emacs.

Comparing the ancestor and variant B parts by eye is sometimes tiring. In that case, mark the ancestor text and press =b to compare it with the revised master text in buffer B:

Figure 3. Viewing detailed changes.

This should give you a clearer idea of what has changed. Once you have decided how to reflect the change in your translation, press a to load the translated text back into the merge buffer, then update it.

If the paragraph has had a complete rewrite and is far different from the old one, press b to load the new version and translate it from scratch.

When you are done with all the chunks, press q to finish the merge session. svk smerge will finish the rest of the merge. You can examine the change you made to article-zh.txt with svk diff:

Figure 4. The updated translation.

Run svk commit as usual when you are satisfied with it.

Limitations of svk Translation

svk translation works best if there is markup between sections or paragraphs, as that makes it easier for external merge tools to find corresponding paragraphs.

If the original article moves paragraphs around, you will have to track them manually. In the conflict markers, the paragraph's old location will show an empty new version, while its new location will show the moved text with an empty base and an empty translation.
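The reason moves need manual tracking is that a line-based or paragraph-based diff sees a move as a deletion in one place plus an insertion in another. The following Python sketch, which is independent of svk, shows this with the standard difflib module and then pairs up the two halves. The function name and sample paragraphs are hypothetical.

```python
import difflib


def find_moved_paragraphs(old, new):
    """Return paragraphs that a diff reports as both removed and added."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    deleted, inserted = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deleted.extend(old[i1:i2])   # text that vanished from its old place
        if tag in ("insert", "replace"):
            inserted.extend(new[j1:j2])  # text that appeared somewhere new
    return [p for p in deleted if p in inserted]


# Hypothetical document: paragraph "A" moves from second place to the end.
old = ["Intro", "A", "B", "C"]
new = ["Intro", "B", "C", "A"]

print(find_moved_paragraphs(old, new))
```

A check like this on the two versions of the original can tell you which "empty" conflicts are really moves, so you can carry the existing translation to the new location instead of translating the paragraph again.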

Conclusion

The branch-and-merge approach presented in this article also applies to any other version control system that supports branching and merge tracking and, ideally, external merge tools.

Translation is an extreme type of article editing. The concepts here can also apply to conventional editing. Make a branch for the article that you are proofreading. When the author sends an updated version, you can smerge the changes painlessly while retaining the modifications you made, or vice versa.

Chia-liang Kao is the original author of svk and the Perl bindings for Subversion.



Copyright © 2009 O'Reilly Media, Inc.