The availability of cheap computing power and increased network bandwidth has given rise to distributed component-based applications. A distributed component-based application is a configuration of services provided by different application components running on physically independent computers, which appears to the users of the system as a single application running on a single physical machine. Several factors motivate the adoption of distributed component-based systems over traditional centralized systems.
To give the illusion to users of a single unified application running on a single physical machine, instead of a collection of disparate applications running on heterogeneous computers connected via a network, a distributed system needs to be transparent in the following ways.
The distributed system allows a user to store, access, and manipulate data transparently from many computers while maintaining the integrity of data during system failures. The management of distributed data and transactions is accomplished at the local and global levels. A local data manager, or resource manager, enables the access and manipulation of data or resources. These resource managers provide the transparency of data location, data models, and database security and authority control. A local transaction management system is responsible for initiating, monitoring, and terminating transactions in a computing system. A distributed transaction management system extends the scope of a local transaction management system by coordinating with the local resource managers to view related transactions over a network as a single transaction.
A transaction is a group of statements representing a unit of work that must be executed as a whole. Transactions are sequences of operations on resources -- such as read, write, or update -- that transform one consistent state of the system into a new consistent state. For the system to reflect the correct state of reality, a transaction should have the following properties.
Each transaction should preserve the ACID properties for the system to reflect the correct (i.e., consistent) state of reality. Unlike a centralized computing environment where application components and resources are located at a single site, and transaction management only involves a local data manager running on a single machine, in a distributed computing environment all the resources are distributed across multiple systems. In such a case transaction management needs to be done both at local and global levels. A local transaction is one which involves activities in a single local resource manager. A distributed or a global transaction is executed across multiple systems, and its execution requires coordination between the global transaction management system and all the local data managers of all the involved systems. The Resource Manager and Transaction Manager (TM), also known as a transaction processing monitor (TP monitor), are the two primary elements of any transactional system. In centralized systems, both the TP monitor and the resource manager are integrated into the DBMS server. To support advanced functionalities required in a distributed component-based system, separation of TP monitor from the resource managers is required.
Transaction management taxonomy
The most common configurations of transactional enterprise systems are the following.
There are two ways to specify transactions, namely, (i) programmatic and (ii) declarative.
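As a sketch of the programmatic style, the following self-contained example marks transaction boundaries explicitly in component code. The `UserTransaction` interface here is a simplified stand-in for JTA's `javax.transaction.UserTransaction`, and `DemoTransaction`, `ProgrammaticDemo`, and `transfer` are hypothetical names invented for illustration; in a real J2EE server the container supplies the `UserTransaction` implementation.

```java
// Simplified stand-in for JTA's javax.transaction.UserTransaction
// (illustration only; a real container provides the implementation).
interface UserTransaction {
    void begin();
    void commit();
    void rollback();
}

// A toy implementation so the sketch is runnable outside a container.
class DemoTransaction implements UserTransaction {
    String state = "none";
    public void begin()    { state = "active"; }
    public void commit()   { state = "committed"; }
    public void rollback() { state = "rolled back"; }
}

class ProgrammaticDemo {
    // Programmatic demarcation: the component code brackets the unit
    // of work with begin/commit, and rolls back on failure.
    static String transfer(UserTransaction utx, boolean fail) {
        utx.begin();
        try {
            // ... debit one account, credit another ...
            if (fail) throw new IllegalStateException("debit failed");
            utx.commit();
            return "committed";
        } catch (RuntimeException e) {
            utx.rollback();
            return "rolled back";
        }
    }
}
```

With declarative specification, by contrast, the same boundaries would be described in the deployment descriptor and enforced by the container, with no demarcation calls in the bean code.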
A global or distributed transaction consists of several subtransactions and is treated as a single recoverable atomic unit. The global transaction manager is responsible for managing distributed transactions by coordinating with different resource managers to access data at several different systems. Since multiple application components and resources participate in a transaction, it's necessary for the transaction manager to establish and maintain the state of the transaction as it occurs. This is achieved by using a transaction context, which is an association between the transactional operations on the resources and the components invoking the operations. During the course of a transaction, all the threads participating in the transaction share the same transaction context. The scope of a transaction context logically encapsulates all the operations performed on transactional resources during a transaction. The transaction manager needs to analyze the transaction request and decompose the transaction into many subtransactions, propagate the transaction context, and send them to associated resource managers. The transaction context is usually maintained transparently by the underlying transaction manager.
Resource managers inform the transaction manager of their participation in a transaction by means of a process called resource enlistment. The transaction manager keeps track of all the resources participating in a transaction through resource enlistment, and uses this information to coordinate the transactional work performed by the resource managers with the two-phase commit and recovery protocols. All the enlisted resources are delisted at the end of a transaction, i.e., after it either commits or rolls back. The transaction manager monitors the execution of the transaction and determines whether to commit or roll back the changes made by the transaction, to ensure its atomicity.
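The enlistment bookkeeping can be sketched as follows. The class and method names (`TransactionContext`, `enlist`, `complete`) are illustrative, not part of any standard API; a real transaction manager enlists `javax.transaction.xa.XAResource` instances and drives them through the commit protocol.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of resource enlistment bookkeeping (names are illustrative).
class TransactionContext {
    private final List<String> enlisted = new ArrayList<>();

    // Called by a resource manager to announce its participation.
    void enlist(String resourceName) {
        if (!enlisted.contains(resourceName)) {
            enlisted.add(resourceName);
        }
    }

    // The transaction manager coordinates commit/rollback across
    // exactly this set of participants.
    List<String> participants() {
        return new ArrayList<>(enlisted);
    }

    // At the end of the transaction (commit or rollback) every
    // enlisted resource is delisted.
    void complete() {
        enlisted.clear();
    }
}
```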
Two-phase commit (2PC)
The two-phase commit protocol between the transaction manager and all the resources enlisted for a transaction ensures that either all the resource managers commit the transaction or they all abort. In this protocol, shown in the figure below, when the application requests the commitment of a transaction, the transaction manager issues a PREPARE_TO_COMMIT request to all the resource managers involved. Each of these resources may in turn send a reply indicating whether it is ready for commit (PREPARED) or not (NO). Only when all the resource managers are ready for a commit does the transaction manager issue a commit request (COMMIT) to all the resource managers. Otherwise, the transaction manager issues a rollback request (ROLLBACK) and the transaction is rolled back.
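The decision logic of the protocol can be sketched as follows. `Participant`, `TwoPhaseCommitCoordinator`, and `DemoParticipant` are illustrative names invented for this example; a production coordinator would additionally write its decision to a durable log so it can complete the protocol after a crash.

```java
import java.util.List;

// Sketch of two-phase commit decision logic (illustrative names).
interface Participant {
    boolean prepare();   // vote: true = PREPARED, false = NO
    void commit();
    void rollback();
}

class TwoPhaseCommitCoordinator {
    // Phase 1: ask every participant to prepare and collect votes.
    // Phase 2: commit only if every vote was PREPARED; otherwise
    // roll every participant back.
    static boolean run(List<Participant> participants) {
        boolean allPrepared = true;
        for (Participant p : participants) {
            if (!p.prepare()) {
                allPrepared = false;
                break;
            }
        }
        for (Participant p : participants) {
            if (allPrepared) p.commit(); else p.rollback();
        }
        return allPrepared;
    }
}

// A trivial participant for demonstration: votes as configured.
class DemoParticipant implements Participant {
    final boolean vote;
    String outcome = "pending";
    DemoParticipant(boolean vote) { this.vote = vote; }
    public boolean prepare()  { return vote; }
    public void commit()      { outcome = "committed"; }
    public void rollback()    { outcome = "rolled back"; }
}
```

Note that a single NO vote forces every participant to roll back, which is exactly the all-or-nothing guarantee described above.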
Although 2PC guarantees the atomicity of the transaction, its processing load is quite heavy, creating frequent update conflicts, especially when data is duplicated across multiple sites. Replication of data is a way to alleviate this conflict problem, and is usable only when transaction-based update propagation is not required. Most distributed systems adopt these two methods in parallel to judiciously match the requirements of the application.
There are trade-offs between the two-phase commit (2PC) and replication server approaches. The fundamental difference between them is that one operates at the transaction level while the other is a periodic propagation of updates. The guidelines are to 1) keep data replication as minimal as possible; 2) if data must be synchronized as a transaction, keep the number of copies small and use 2PC; 3) use replication servers if concurrency requirements outweigh sub-second data integrity requirements; and 4) use replication servers if the network and nodes are unreliable.
Concurrency control is another crucial piece of functionality. It allows multiple users to access data at the same time, increasing the throughput and performance of the system, and it allows transactions to execute concurrently while achieving the same logical result as if they had executed serially. Concurrency control lets multiple transactions read and update data simultaneously, and it includes transaction scheduling and management of the resources needed by transactions during execution. Common methods to ensure serializability include the lock-based two-phase locking method, as well as the timestamp and multiversion methods. The two-phase locking (2PL) algorithm is the most commonly used technique in distributed transactional systems to accomplish update synchronization and concurrency control. Vendors often combine concurrency control techniques like 2PL, consistency control techniques like 2PC, and timeouts for deadlock resolution into a single implementation for global distributed transaction management.
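The core rule of strict 2PL can be sketched in a few lines: a transaction acquires locks only while it is active (the growing phase) and releases all of them together at commit or abort (the shrinking phase). The class name `TwoPhaseLockingTxn` and its methods are invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of strict two-phase locking (illustrative names). Never
// acquiring a lock after any lock has been released is what makes
// the resulting schedules serializable.
class TwoPhaseLockingTxn {
    private final Deque<ReentrantLock> held = new ArrayDeque<>();
    private boolean finished = false;

    // Growing phase: acquire a lock on a data item before touching it.
    void lock(ReentrantLock item) {
        if (finished) {
            throw new IllegalStateException("no new locks after releasing");
        }
        item.lock();
        held.push(item);
    }

    // Shrinking phase: release every held lock at once (strict 2PL).
    void commit() {
        finished = true;
        while (!held.isEmpty()) {
            held.pop().unlock();
        }
    }
}
```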
Queued transaction processing
Direct transaction processing is synchronous: the initiator of the transaction is blocked until the transaction manager runs the transaction. Unfortunately, there are situations where the client, the server, or the communication link between them fails, and sometimes there is even a need for priority scheduling of transaction requests. The synchronous model of transaction processing cannot handle all these cases, which led to the development of asynchronous transaction processing models using queues. The queue is a transactional resource: operations on the queue, namely enqueueing and dequeueing, are either made durable or completely undone depending on whether the transaction that issued them commits or aborts. The J2EE platform defines two mechanisms for handling queued semantics: one can use the native JMS API, or one can make use of message-driven beans as defined in the EJB 2.0 specification.
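The queued semantics described above can be modeled in memory as follows; `TransactionalQueue` and its methods are illustrative names, standing in for what a JMS provider implements durably.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// In-memory sketch of transactional queue semantics (illustrative
// names): enqueues performed inside a transaction become visible only
// if the transaction commits, and are completely undone if it aborts.
class TransactionalQueue {
    private final Deque<String> durable = new ArrayDeque<>();
    private final Deque<String> pending = new ArrayDeque<>();

    // Buffered until the enclosing transaction finishes.
    void enqueue(String message) { pending.addLast(message); }

    // Commit makes the buffered enqueues durable, in order.
    void commit()   { durable.addAll(pending); pending.clear(); }

    // Abort discards them as if they never happened.
    void rollback() { pending.clear(); }

    String dequeue() { return durable.pollFirst(); }
    int depth()      { return durable.size(); }
}
```

This is also why a reply cannot be received in the same transaction that sent the request: the request only becomes visible to the receiver once the sender's transaction has committed.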
JMS API in transactions
The application component developer should not use the JMS request-reply paradigm within a single transaction. Since a JMS message is not delivered to its final destination until the transaction commits, the receipt of the reply within the same transaction can never take place. Because the container manages the transactional enlistment of JMS sessions on behalf of a bean, the parameters of the createQueueSession(boolean transacted, int acknowledgeMode) and createTopicSession(boolean transacted, int acknowledgeMode) methods are ignored. It is recommended that the component developer specify that the session is transacted and provide 0 for the value of the acknowledgement mode. It is also important to keep in mind that the JMS acknowledge() method must not be used, either within a transaction or within an unspecified transaction context. Message acknowledgement in an unspecified transaction context is handled by the container with JMS AUTO_ACKNOWLEDGE semantics.
A message-driven bean is an asynchronous message consumer invoked by the container as a result of the arrival of a JMS message. The client's view of a message-driven bean is that of a JMS message consumer that implements some business logic running on the server. A client accesses a message-driven bean through JMS by sending messages to the JMS Destination (Queue or Topic) for which the message-driven bean class is the MessageListener. Message-driven beans have no home or remote interfaces and are stateless. They are thus like stateless session beans: all bean instances are equivalent as long as they are not involved in servicing a client message. A message-driven bean instance is created by the container to handle the message processing for the consumer. The container controls its lifetime. However, the instance variables of the message-driven bean instance can contain state across the handling of client messages.
Examples of such state include an open database connection and an object reference to an EJB object. The message-driven bean model makes developing an enterprise bean, which is asynchronously invoked to handle the processing of incoming JMS messages, as simple as developing the same functionality in any other JMS MessageListener. It also makes concurrent processing of a stream of messages possible by means of container provided pooling of message-driven bean instances.
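The shape of such a consumer can be sketched as follows. The `Message` and `MessageListener` interfaces here are deliberately simplified stand-ins for `javax.jms.Message` and `javax.jms.MessageListener`, and `OrderProcessorBean` is a hypothetical bean class; in a real deployment the container creates pooled instances and invokes onMessage for each delivery to the bean's Destination.

```java
// Simplified stand-ins for the JMS types (the real interfaces are
// javax.jms.Message and javax.jms.MessageListener).
interface Message { String getText(); }
interface MessageListener { void onMessage(Message message); }

// Sketch of a message-driven-bean-style consumer: stateless business
// logic invoked once per incoming JMS message by the container.
class OrderProcessorBean implements MessageListener {
    // Instance state may be retained across messages (e.g., an open
    // database connection); here a simple counter stands in for it.
    int processed = 0;

    public void onMessage(Message message) {
        // ... parse message.getText() and run the business logic ...
        processed++;
    }
}
```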
The goal of transparency in a distributed TP environment demands interoperability between transaction managers, and interoperability of TMs with resource managers. Currently there are two open standards, namely, the X/Open DTP model and the ISO TP model. The ISO TP protocol communicates between two distributed transaction managers, possibly from different vendors, in an open environment, but it does not have many followers. On the other hand, X/Open Distributed Transaction Processing (DTP) is a distributed transaction-processing model proposed by the Open Group. This model is a standard among most of the commercial vendors providing transaction processing and relational database solutions. Some of the major elements of the model are
The following are the major interfaces specified in the model:
The X/Open DTP model is fairly well-established in the industry. Commercial transaction management products like TXSeries/Encina, Tuxedo, TopEnd, and AT&T GIS support the TX interface. Most of the commercial databases such as Oracle, Sybase, Informix and Microsoft SQL Server, and messaging middleware products like IBM's MQSeries and Microsoft's MSMQ Server, provide an implementation of the XA interface.
The Java 2 Platform, Enterprise Edition (J2EE) is a component-based framework, and support for transactions is a major element of its architecture. The component provider can use the Java Transaction API (JTA) to specify transaction boundaries in the component code. Alternatively, with declarative transaction specification support in enterprise beans, transactions are started and completed automatically by the container. The J2EE server implements the necessary low-level transaction protocols between the transaction manager and database systems supporting the JDBC API, including mechanisms for transaction context propagation and the optional distributed two-phase commit. The J2EE platform currently supports only flat transactions, which cannot contain nested transactions. It is possible to have a transactional application that uses a combination of servlets and JSP pages accessing multiple EJBs within a single transaction; each component may acquire one or more connections to access one or more shared resource managers. It is important to emphasize that there are many different possibilities to choose from when architecting a solution for a specific application. However, out of all the possible configurations, only the following recommended ones are highlighted here.
Case I: Stand-alone Client <-> EJB Container <-> RDBMS/EIS Resources
This configuration is the first step in the evolutionary migration from a two-tier client server system. In general there is a fat client probably written using Swing that communicates with the business logic residing in the EJBs in an EJB container of some application server.
Case II: Browser <-> Web Container <-> RDBMS/EIS Resources
This configuration is popular in small-scale web applications that are not expected to serve an unpredictable number of users, and it is more appropriate for systems that reside within the intranet of an organization. It is suitable for applications where the use of an application server would be overkill.
Case III: Browser <-> Web Container <-> EJB Container <-> RDBMS/EIS Resources
This configuration is the recommended full-blown architecture where the application has to be robust and scalable with a multitude of users demanding the maximum performance out of the system.
Dibyendu Baksi is a J2EE transactions systems and frameworks designer and developer for Sun Microsystems, Inc.
Copyright © 2009 O'Reilly Media, Inc.