ONJava.com    
 Published on ONJava.com (http://www.onjava.com/)
 See this if you're having trouble printing code examples


Atomic File Transactions, Part 2

by Jonathan Amsterdam
02/06/2002

This article is the second of a two-part series. If you haven't read the first part yet, it would make sense to do so before reading this article. In this article, I describe my algorithm for atomic filesystem transactions, and tour the design of my package. A detailed justification of the algorithm can be found in the appendix. The code for the system is implemented in the Java package com.astrel.io.atomic, available here.

An Atomicity Algorithm

The first step in a correct atomicity algorithm is keeping a journal. (The term "log" is used in the transaction processing literature, but to avoid stepping on its common -- and related -- meaning of "outputting messages about the system for later analysis," we'll use journal. Writers of journalling filesystems have done the same.)

The journal for a transaction is a record of all the actions that have been taken under the transaction. (In my scheme, each transaction has its own journal. In traditional systems, there is a single journal for all transactions.) Armed with a journal, the recovery process knows what happened, and in what order, before the crash, and can reconstruct the original state from the information in the journal plus the backup files.

The algorithm has four parts:

The execution phase performs the actions as they are requested, writing enough information in the journal to undo each action, should that become necessary. The commit phase cleans up and deletes the journal. Rollback plays back the journal in reverse order, undoing the actions, then cleans up and deletes the journal. Recovery looks for existing journal files and cleans up after them, undoing uncommitted transactions.

Execution

Let's look at each part in detail, beginning with execution. The algorithm for executing a single action under a transaction first writes undo information to the journal, then performs the action:

E1. Write the action's name, its arguments (the file to delete) and any other necessary undo information (the name of the backup file) into the journal.

E2. Flush the journal's contents to disk.

E3. Write any backup files needed to undo the action.

E4. Perform the action.

E5. If there is an error performing the action, erase its entry from the journal.

Step E5 guarantees that every journal entry represents a successful action, except possibly the last.

To analyze the correctness of this algorithm with respect to crashes, you would have to consider every possible crash point, and show that all transactions are still atomic. I undertake this task in the appendix.

Commit

Now let's turn to the commit portion of the algorithm. When commit occurs, all the actions have been executed and the program is happy with the results, so there is little to do.

C1. Write in the journal that the transaction has committed.

C2. Delete the backup files for the transaction.

C3. Delete the journal.

The actual commit occurs when C1 completes successfully. Deleting the backup files and the journal is just cleanup. We delete the journal last because it is the only place that contains the locations of the backups.

Rollback

The rollback algorithm undoes the actions in reverse order, then cleans up and deletes the journal. The basic algorithm is complicated by the possibility of crashes before or during rollback:

R1. For each journal entry that has not already been undone, in reverse order:

R1.1 If the action needs to be undone, undo it.

R1.2 Record in the journal that the action has been undone.

R2. Delete the backup files for the transaction.

R3. Delete the journal.

In the absence of crashes, R1.2 would be unnecessary, and in R1.1 all of the information to determine whether an action needed to be undone would be in memory, since the system could keep track of the success or failure of each action. But if a crash occurs, then rollback will be run as part of recovery, and only the filesystem (including the saved journal information) will be available to determine whether the action needs to be undone. That is why step R1.2 keeps track in the journal of which actions have been undone. See the appendix for a detailed analysis.

Recovery

Recovery happens when the system is restarted after a crash. If it finds no journals in the TransactionManager's directory, then it has nothing to do. Otherwise, for each journal:

RE1. If the journal ends in a commit marker, perform steps C2 and C3.

RE2. Otherwise, perform rollback.

At the end of a successful recovery, all uncommitted transactions will have been rolled back, and all backup and journal files will have been deleted.

Optimizations

Related Reading

Learning JavaLearning Java
By Pat Niemeyer & Jonathan Knudsen
Table of Contents
Index
Sample Chapter
Full Description
Read Online -- Safari

A number of optimizations to the rollback/recovery part of this algorithm are possible. For example, if a transaction creates a file and then performs several other actions on it, there is no need to undo the actions -- it is enough to delete the file. Similarly, if a backup copy of a file is made, then subsequent actions on a file are irrelevant; recovery can simply restore the backup. After giving it some serious thought, I decided not to implement these optimizations, for four reasons. First, rollbacks are rare -- most transactions commit successfully -- so improving the speed of rollback would not be of much benefit. Second, typical transactions probably don't involve multiple operations on the same file. For instance, it seems unlikely that a transaction would write to nonexistent file once (thereby creating it), then close it and write to it again. A more common pattern would involve a single access to each of many files. Third, renaming complicates these optimizations; renamings still have to be undone carefully, and in reverse order. Finally, adding these optimizations would complicate the architecture of my package, making it more difficult to understand its correctness and harder for users to add their own actions.

Undo vs. Redo

The atomic transaction algorithm I've just discussed performs filesystem actions as they are requested, and handles rollback and recovery by undoing actions. In contrast to this undo-based algorithm, a redo-based algorithm adds actions to a journal when they are requested, but does not actually perform them until commit. In practice, a redo algorithm maintains "shadow" copies of all files changed during a transaction, and replaces the originals with the shadows when the transaction commits.

As a simple example, let's contrast the undo and redo versions of a basic atomic file write. Here is the undo version, repeated from above for convenience:

E1u. Rename the original file to a backup file. If there is an error doing this, then delete the backup file, if any, and report failure.

E2u. Write the file. If there is an error doing so, then restore the backup and report failure.

E3u. Otherwise, delete the backup and report success.

This is the redo version:

E1r. Write to a shadow file. If there is an error doing so, then report failure.

E2r. Rename the shadow file to the original file (deleting the original first, if this is necessary).

The redo version is a bit simpler, but its greatest advantage is in recovery. If there is a crash during or just after E1r, then the original file was never changed, so recovery isn't necessary (except for the detail of cleaning up the shadow file). In filesystems that permit renaming to an existing file, step E2r is atomic, and its successful completion commits the transaction. (In filesystems that require that the original is deleted before the rename can take place, a crash between the delete and rename necessitates recovery.)

We can generalize the redo approach to sequences of actions. Actions are written to a journal as they are requested, and performed at commit time. If a crash occurs before commit, then no recovery is necessary because nothing has been changed.

This lack of need for recovery makes the redo approach very attractive. The algorithm seems simpler, and more importantly, after a crash, the filesystem is in a consistent state, even if the program that generated the transactions never runs again. But there are some hidden problems with the redo approach.

First, if a crash occurs while the actions are being carried out during commit, then the file system is in an inconsistent state, requiring recovery. In other words, the redo approach doesn't eliminate the need for recovery; it just shortens the window of time when it is necessary. The recovery and rollback steps involved if a crash occurs during commit are essentially the same as in the undo approach.

A second and more important flaw with the redo approach is that problems with an action are not reported until commit, since the action doesn't actually take place until then. Say the user performs a delete under a transaction, and then executes other actions as part of that transaction, assuming that the delete succeeded. The user must wait until commit time to discover whether the delete failed, and thus whether the other actions were justified. In fact, it is impossible to write a transaction that behaves differently depending upon whether or not an action succeeded, since this fact can't be known until a commit is requested. Typical filesystems provide no way to test whether an action would succeed other than actually performing the action.

For these reasons, I believe that the undo approach is better than the redo approach for implementing atomic filesystem transactions.

A Tour of the Package

The com.astrel.io.atomic package implements the undo logic described above. In addition to the classes TransactionManager and Transaction, there is a Journal class that handles the details of reading and writing the journal, an Action class that represents a single filesystem action, and subclasses of Action for each of the supported actions. There are also several exception classes, which I won't discuss.

TransactionManager

This class manages a set of transactions. It maintains the directory where journal files are written and provides each transaction with a unique number. It also performs recovery by scanning the directory for journal files and asking the Journal class to perform rollback on each file.

Transaction

This class ensures that a sequence of actions is performed atomically. Its constructor creates a journal. The methods openOutputStream, openRandomAccess, delete, and rename are really just one-line convenience methods that create an instance of the appropriate Action subclass, and then call the run method with the action. The code for run (simplified slightly to ease explanation) is shown in Listing 1. Since it is one of the central methods of the system, I describe it in detail.

Transaction.run takes an action as argument and returns a value that depends on the action (a FileOutputStream, for example, in the case of the action corresponding to the openOutputStream method). The action has already been created, and so it has all the information necessary for it to be executed (e.g., the name of the file to open). The job of run is to generate undo information for the action, write it to the journal, execute the action, and return its results. The task is complicated by the need to handle errors correctly. Errors in the action's execution are returned to the caller normally. Problems with generating or saving the undo information are caught and rethrown as TransactionExceptions, after rolling back the transaction. Problems in rollback cause InconsistentStateException to be thrown.

The run method begins by calling the action's prepare method. This causes the action to determine all information necessary for undo. Typically, this is the name of a backup file. It may also be part of a file's current state. For instance, an action that changes the permissions on a file would need to know the current permissions for undo purposes. The action should place the information in its nontransient instance variables.

After prepare returns, the action is serialized into the journal. The undo information, saved in nontransient instance variables, is thus written to disk. Journal.writeAction ends with a call to FileDescriptor.sync to make sure the action's serialized form is on disk before proceeding. Then the action's createBackup method is called, which actually creates the backup file, if any.

If any one of these three methods -- Action.prepare, Journal.writeAction, and Action.createBackup -- fails, then the transaction is rolled back and a TransactionException is thrown to indicate a problem with the transaction machinery itself.

Finally, the action is executed and its result is returned. Errors here result in the journal entry for the action being erased (the job of Journal.actionFailed). Since the problem occurred while executing the action itself, and not in the transaction system, the original exception is rethrown. The transaction is not rolled back (unless Journal.actionFailed itself has a problem); it can be continued with a different action. (The assumption is that if an action fails, it does not change the state of the filesystem. Thus each action must itself be atomic.)

Action

This abstract class contains the methods required to perform and undo a single action, as well as convenience methods that can help in this task. Particular actions are subclasses of this class. Listing 2 shows the class for deleting a file, called DeleteAction. You can provide your own actions by writing your own subclass of Action.

The constructor will receive the arguments of the action, and can choose to throw exceptions if it determines that the action cannot be carried out. For example, the DeleteAction constructor throws a FileNotFound exception if it is asked to delete a file that does not exist.

An action's prepare method has the job of saving any information needed to undo the action. Often this involves naming a backup file. The Action method generateBackupFilename provides a robust way to do this. It generates a unique name in the same directory as the original file, using the original file name, a unique sequence number, and the extension .abk. (This extension can be changed by setting a property or by calling a method in the TransactionManager class.) At the time it returns, it guarantees that the file doesn't exist.

DeleteAction's prepare method simply invokes generateBackupFilename on the original filename and stores the result in an instance variable.

Action's createBackup method is responsible for actually making the backup. The Action methods renameNotDeleting and copyNotDeleting are useful for this. As their names suggest, these methods will not delete the target file if it exists; instead they throw an exception. This prevents inadvertent erasure of backup files that may have been generated by some other process. Since generateBackupFilename ensures that the filename created doesn't exist, and since createBackup is called milliseconds later, it is extremely unlikely that this error will occur. DeleteAction's createBackup method calls renameNotDeleting to rename the original to the backup.

When should a backup be created by copying the file, and when by renaming? Renaming is much faster, but copying makes more sense if the original must continue to exist -- for example, when appending to a file.

Action's execute method carries out the action, possibly returning an Object. DeleteAction's execute method has nothing to do, since renaming the original file to the backup has effectively deleted the file. The execute method of the Action subclass for opening a file creates and returns a FileOutputStream.

At the end of the transaction, Action's close method is called to close files or perform other normal ending operations.

In a successfully committed transaction, only the prepare, createBackup, execute, and close methods will be called.

During rollback, an action's undo method will be called. The undo method must undo the action if it in fact occurred. The method is responsible for detecting this -- that is, for implementing the local undo property. (See the appendix for a description of this property.) DeleteAction's undo method renames the backup file to the original file.

The cleanup method gives the action a chance to delete backups or other filesystem states. It is called after a successful commit or rollback.

Journal

This class uses a RandomAccessFile to manipulate the journal. It contains methods for writing actions and markers into the file. It is careful to flush all in-memory data to disk using FileDescriptor's sync method after each method call, to ensure that crashes are handled correctly. For example, if an action's undo information is not written to disk before the action executes, it will be impossible to undo the action after a crash.

Related Work
Frank O'Dwyer of Rainbow Diamond Limited in Ireland has independently written a similar file transaction package. The idea of using shouldCommit and end methods instead of commit and rollback methods is his, as is the method of creating backup files. There are a number of minor differences in our systems, but the main one is that his uses redo logic instead of undo logic. For reasons I discussed above, I think the undo approach is superior.

Conclusion

This article described a package that makes it possible to perform filesystem actions like writing, deleting, and renaming files atomically. Although the underlying algorithms are complex and subtle, using the package is quite straightforward. Consider it as an alternative to a database when only atomicity is required.

Continue to next page for Appendix: Correctness of the Algorithm

Jonathan Amsterdam is Senior Consulting Engineer at DataSynapse, Inc. Prior to that, he was president and founder of Astrel, Inc., a Java training and consulting firm.


Return to ONJava.com.

Appendix: Correctness of the Algorithm

Here I provide detailed justifications of the correctness (with respect to crashes) of my atomicity algorithm. For convenience, the steps of the algorithm are repeated.

Execution

E1. Write the action's name, its arguments (e.g. the file to delete) and any other necessary undo information (e.g. the name of the backup file) into the journal.

E2. Flush the journal's contents to disk.

E3. Write any backup files needed to undo the action.

E4. Perform the action.

E5. If there is an error performing the action, erase its entry from the journal.

Crash points

During E1 or E2. If a crash occurs during the first two steps, the action's entry may be partially written to the journal. Recovery will detect that the final entry is not complete and will ignore it.

After E2 and before E4. Backup files have been created, but the action never occurred. Recovery will detect this (see the recovery section below) and will delete the backup.

During E4. Each filesystem action supported by the package is either atomic (in which case we need not worry about a crash during the action), or it is possible to determine from the state of the filesystem whether the action began or not, and to undo its partial effects.

After E4 and before E5. Recovery takes into account that the last action in the journal may not have succeeded.

During E5. Erasing the action from the journal is atomic, on the assumption that setting the length of a RandomAccessFile is atomic.

Commit

C1. Write in the journal that the transaction has committed.

C2. Delete the backup files for the transaction.

C3. Delete the journal.

Crash points

Before or during C1. If a crash occurs previous to or while writing the commit marker, the commit fails. Recovery will notice the existence of the journal file, see that there is either no commit marker or a partial one at the end of the file, and roll back the transaction.

After C1 and before C3. Some backup files may not be deleted. The recovery process, noticing that the transaction is committed, will finish the job, reading the journal to find the locations of the backups.

During C3. Can't happen -- we assume delete is atomic.

The Local Undo Property

Before looking at the correctness of recovery/rollback in the face of crashes, we must discuss a requirement on actions that is central to the correctness argument. The requirement states that for each action, it must be possible to restore the system to the state before the action was attempted, even if the action failed to occur, partially occurred, or was already undone. This requirement is implicit in the first part of step R1.1: "If the action needs to be undone...". Detecting whether an action needs to be undone is difficult to achieve, in general, but the algorithm is designed so that the detecting code can assume that no other actions on the relevant files have intervened. Let's call this requirement on actions the local undo property.

We need the local undo property for two reasons. First, after a crash, it is possible that the last entry recorded in the journal represents an action that never happened. That could occur if the system crashed between step E4, when the journal entry was made, and E5, when the action was executed. In that case, recovery should not undo the last action. Since the journal entry claims the action happened, we must look to the filesystem to see whether it really did or not. For instance, to test whether a delete of file F occurred, we could see if F existed or not. If it did, we would not attempt to restore it from backup.

The second reason that we need the local undo property concerns crashes that occur during rollback itself. The subsequent restart and recovery may attempt to undo an action a second time. If the undo operation for an action is idempotent -- that is, it has the same effect whether it happens once or many times -- then repeated undo attempts aren't a problem. But typical undo operations are not idempotent. For example, the way to undo a file creation is to delete the file. Attempting the delete a second time after it succeeded the first time would result in an error, since the file doesn't exist anymore. Making delete idempotent would seem to be easy -- just test for the presence of the file before deleting it. Unfortunately, achieving true idempotence is difficult by simply looking at the state of the filesystem (i.e., which files exist), because such information is ambiguous. An example should help make this clear.

Consider an action that creates a file named F. The corresponding undo operation is deleting F, if it exists. This would seem to be idempotent, since it always leaves the filesystem in the same state: F does not exist. The problem is that it may delete the wrong file F.

To see this, assume rollback did not record undone actions (i.e., step R1.2 were not present). Say that initially there is a file named F, and that the transaction

Delete F
Create F

is carried out, but fails before commit. When recovery runs, it rolls back the transaction by first deleting F -- that is, the undo operation for "Create F" -- and then restoring the original F from a backup via renaming, thereby undoing the "Delete F" action. Recovery then crashes. At this point, a file named F exists. When recovery is run again, "Create F" will be undone again, deleting F. We have just deleted the original file F present before the transaction, a grave error. The backup is gone, so we cannot restore from it.

One fix would be to restore the backup by copying rather than renaming. But similar problems arise with other undo operations, where there is no obvious backup to copy.

For instance, the obvious way to undo a rename of F to G is to rename G back to F, provided G exists and F does not. But consider the transaction

Create F
Delete G
Rename F to G

in an initial state where F does not exist but G does. If the transaction crashes just before commit, then recovery will rename G to F, restore the backup of the original G, and finally delete F. If this first recovery attempt then crashes before deleting the journal, then the filesystem contains G but not F, so the next recovery attempt begins by renaming G to F. But as before, this G is the original G. When it is deleted as a result of undoing the Create F action, the original file will be lost.

These examples demonstrate that there is no obvious way to make undo operations idempotent by looking only at the existence of files; the problem is that there may be different versions of the files. The best solution is to record in the journal the fact that the undo action took place, so it will not be attempted in the event of a crash and restart. The only time an undo operation will be executed more than once is if the crash occurs after the operation happens but before the undo marker is written into the journal. But in that case, no other operations will intervene, so testing the existence of files will give the right results. This is exactly what I have called the local undo property: an action's undo operation must be able to detect whether the action needs to be undone from the state of the filesystem, provided no other actions on the relevant files have occurred in between.

Rollback

R1. For each journal entry that has not already been undone, in reverse order:

R1.1 If the action needs to be undone, undo it.

R1.2 Record in the journal that the action has been undone.

R2. Delete the backup files for the transaction.

R3. Delete the journal.

Before R1. In this case, the system crashed before rollback began -- in other words, while the transaction was in progress. Recovery, failing to see a commit marker at the end of the journal, will call rollback to undo the transaction's partial effects. There is one tricky point: if the crash happened after an action failed but before its record could be erased from the journal (that is, between steps E4 and E5), then the last journal entry represents a failed action, perhaps one that partially completed. Since recovery is the first thing that should happen after a crash, there will have been no intervening activity, so the local undo property will guarantee that the last journal entry is treated appropriately.

During R1.1. If the undo operation is atomic, then this can't happen. Otherwise, the local undo assumption says that the original state of the system can be recovered, even if the action was partial. A crash during undo will look like a partially-completed action, so it will be correctly recovered when the system restarts. The local undo property assumes that the files aren't touched between the crash and the subsequent recovery attempt. This is achieved by using the undo markers written in step R1.2. When recovery runs after a crash, it will use the undo markers to avoid any attempt to undo actions that have already been undone.

Between R1.1 and R1.2. The action has been undone, but that fact is not recorded in the journal. Recovery will call rollback, which will start again with this action. The local undo assumption covers this case. No intervening activity has occurred since the crash, so the system will detect that the action has been undone -- this is the same as detecting that the action has not occurred -- and will not attempt to undo it again.

During R1.2. Writing the last byte of the undo marker is atomic. (See the discussion above on crashing before or during C1.)

After R1.2 and before R1.1. Since these steps are in a loop, we must consider a crash after R1.2 runs, but before the next iteration begins. In this case, recovery will cause rollback to resume exactly where it left off, because there is an undo marker in the journal for each undone action.

After the last R1.2 and before R3. Since the fact that every action is undone is recorded in the journal, the next time rollback runs it will proceed immediately to step R2, picking up where it left off, and then to R3.

During R3. Deleting the journal is atomic.


Return to ONJava.com.

Copyright © 2009 O'Reilly Media, Inc.