Published on ONJava.com (http://www.onjava.com/)

Clustering and Load Balancing in Tomcat 5, Part 1

by Srini Penchikala

The latest version of the Tomcat servlet container provides clustering and load balancing capabilities that are essential for deploying scalable and robust web applications. The first part of this article provides an overview of installation, configuration, usage, and extension of clustering and load balancing features. The second will introduce a sample web application to demonstrate the steps involved in configuring Tomcat server instances to enable clustering, and will study session persistence using in-memory replication in the cluster environment.

The Tomcat 5 server comes with a rules-based load balancer application. Two simple custom load balancing rules (extending the rules API) were written based on round-robin and random algorithms to redirect incoming web requests. Performance benchmarks for the sample web application running in the cluster environment are presented. The load testing tool JMeter was used to simulate multiple web users to study the load-balancing mechanism.

Since this article concentrates mainly on demonstrating the clustering capabilities in the Tomcat servlet container, J2EE application clustering to replicate EJB, JNDI, and JMS objects is not discussed here. Refer to the articles "J2EE Clustering" and "J2EE Clustering with JBoss" for EJB and JMS clustering.

Large-Scale System Design

Enterprise web portal applications must provide scalability and high availability (HA) for web services in order to serve the thousands of users hitting a corporate web site. Scalability is the system's ability to support increasing numbers of users by adding additional servers to the cluster. High availability essentially means redundancy in the system: if a cluster member fails for some reason, another member in the cluster can transparently take over its web requests. Deploying a web portal application in a cluster environment provides the scalability, reliability, and high availability the application requires. Fundamentally, the main goal of clustering is to prevent web site outages caused by a single point of failure (SPoF) in the system.

Large-scale system design provides mission-critical services with minimal downtime and maximum scalability in an enterprise application environment. Rather than running a single server, you run multiple cooperating servers. To scale, you add machines to the cluster; to minimize downtime, you make every component of the cluster redundant. The main ingredient of a large-scale system is clustering, which encompasses load balancing, fault tolerance, and session state persistence. Usually, for web applications, a hardware- or software-based load balancer sits in front of the application servers in the cluster. These load balancers distribute the load among the cluster nodes by redirecting web traffic to an appropriate cluster member, while also detecting any server failures.


A cluster is defined as a group of application servers that transparently run a J2EE application as if it were a single entity. There are two methods of clustering: vertical scaling and horizontal scaling. Vertical scaling is achieved by increasing the number of servers running on a single machine, whereas horizontal scaling is done by increasing the number of machines in the cluster. Horizontal scaling is more reliable than vertical scaling, since there are multiple machines involved in the cluster environment, as compared to only one machine. With vertical scaling, the machine's processing power, CPU usage, and JVM heap memory configurations are the main factors in deciding how many server instances should be run on one machine (also known as the server-to-CPU ratio).

Related Reading

Tomcat: The Definitive Guide
By Jason Brittain, Ian F. Darwin

The servers in a J2EE cluster are usually configured using one of three approaches. In the independent approach, each application server has its own file system with its own copy of the application files. Another approach is to use a shared file system, where the cluster uses a single storage device from which all application servers obtain the application files. A third approach is called the managed approach, where an administrative server controls access to application content and is responsible for "pushing" appropriate application content to the managed servers. The admin server ensures that all servers in the cluster have the application available. It also updates all servers when an application is deployed, and removes the application from all servers when it is undeployed.

Clustering can be done at various tiers in a J2EE application, including at the database tier. Some database vendors offer clustered databases that support data replication between multiple database servers while providing client transparency: the client (usually a servlet container or an application server) doesn't have to know which database server it's connecting to in order to get the data. Examples of clustered JDBC solutions are Oracle9i's Real Application Clusters (RAC) and Clustered JDBC (C-JDBC). RAC supports fail over of database connections, transparently rerouting JDBC connections and database requests away from a failed database node. C-JDBC is an open source database cluster that allows a web application to transparently access a cluster of databases through a JDBC driver. This implementation not only load balances JDBC connections among the database nodes in the cluster, but also fails over to a secondary database server.

Clustering in Tomcat

Clustering was available for the previous Tomcat version (4.1) as a third-party JAR file, but it wasn't easy to install or configure in order to make multiple Tomcat instances run as a cluster. JavaGroups has been a popular choice for adding clustering capabilities to open source servlet containers (Tomcat) and application servers (JBoss). In the latest version of Tomcat, however, clustering is part of the main installation package, which eliminates the extra effort of integrating a third-party clustering implementation into the Tomcat server.

In a typical cluster environment, for servers in the cluster to cooperate and replicate state, they need to communicate with each other. This group communication is achieved either by point-to-point RMI (over TCP/IP) or via IP multicast. Most J2EE application servers (such as JBoss, Oracle, WebLogic, and Borland) use IP multicast communication to send state, updates, and heartbeats to one another in the cluster. Here's how communication among the cluster members works in Tomcat: all of the cluster members talk to each other using multicast ping messages. Each Tomcat instance periodically sends out a message in which it broadcasts its IP address and the TCP port on which it listens for session replication. If the other members have not received an instance's message within a given time frame, that instance is considered down.
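As a rough sketch of this membership scheme, each node's "IP:port" announcement can be tracked and aged out as shown below. The class, message format, and drop-time value are illustrative assumptions for this article, not Tomcat's actual wire protocol.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of multicast-style membership: each node periodically announces
// "host:port"; a peer is considered down if no announcement arrives
// within the drop window. (Illustrative only.)
class HeartbeatTracker {
    private final long dropTimeMillis;                  // silence threshold before a member is dropped
    private final Map<String, Long> lastSeen = new HashMap<>();

    HeartbeatTracker(long dropTimeMillis) {
        this.dropTimeMillis = dropTimeMillis;
    }

    // Record a "host:port" announcement received at time 'now' (millis).
    void announce(String member, long now) {
        lastSeen.put(member, now);
    }

    // A member is alive only if it announced within the drop window.
    boolean isAlive(String member, long now) {
        Long seen = lastSeen.get(member);
        return seen != null && (now - seen) <= dropTimeMillis;
    }
}
```

A real implementation would send these announcements over a `MulticastSocket` on a fixed group address and port; the aging logic, however, is the essential part.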

Another popular concept in clustering, called farming, provides cluster-wide hot deployment of web applications. In a server farm, a web application is deployed by copying an application's WAR file to only one node in the cluster; farming will take care of deploying the web application across the entire cluster. Similarly, removing the WAR file from a single cluster node will result in undeploying the web application from all the nodes in the cluster. The Tomcat clustering documentation mentions that a future Tomcat version will support farming capability.

Load Balancing

Load balancing is a mechanism by which the server load is distributed across different nodes within the server cluster, based on a load balancing policy. Rather than executing an application on a single server, the system executes application code on a dynamically selected server. When a client requests a service, one (or more) of the cooperating servers is chosen to execute the request. Load balancers act as the single point of entry into the cluster and as traffic directors to the individual web or application servers.

Two popular methods of load balancing in a cluster are DNS round robin and hardware load balancing. DNS round robin provides a single logical name for the cluster, returning the IP address of any of its nodes. This option is inexpensive, simple, and easy to set up, but it doesn't provide any server affinity or high availability. In contrast, hardware load balancing overcomes the limitations of DNS round robin through virtual IP addressing. Here, the load balancer exposes a single IP address for the cluster, which maps to the addresses of each machine in the cluster. The load balancer receives each request and rewrites headers to point to a particular machine in the cluster. If a machine is removed from the cluster, the change takes effect immediately. The advantages of hardware load balancing are server affinity and high availability; the disadvantages are that it's expensive and complex to set up.

There are many different algorithms for defining the load distribution policy, ranging from a simple round robin to more sophisticated schemes. Commonly used algorithms include round robin, random, and weight-based selection.

Load-balancing algorithms affect statistical variance, speed, and simplicity. For example, the weight-based algorithm has a longer computational time than the other algorithms. For a more detailed explanation on load balancing, refer to the ONJava article "Load Balancing Web Applications."

Load Balancing in Tomcat

Load balancing capability was not provided in previous Tomcat versions; instead, integrating the Apache web server with the Tomcat servlet container has been a popular way to handle web requests and balance load. In such an Apache-Tomcat setup, a Tomcat instance called a Tomcat worker is configured to implement load balancing.

Tomcat 5 provides load balancing in three different ways: using the JK native connector, using Apache 2 with mod_proxy and mod_rewrite, or using the balancer web application. In this article, we concentrate on the third option: using the balancer web application to redirect web requests to different nodes in the cluster. The balancer is a rules-based application that uses the servlet filter mechanism to redirect incoming web requests to the next available member in the cluster. Servlet filters, introduced in the Servlet 2.3 specification, are used for a variety of tasks in a web application, such as JAAS authentication, encryption, logging and auditing, data compression, and XSLT transformation of XML content. As the Tomcat balancer web site notes, the balancer application is not designed as a replacement for more robust load-balancing mechanisms; rather, it's a simple and extensible way to direct traffic among multiple servers. Check out the sample Java classes provided with the balancer application to see how load balancing can be implemented with different rule criteria.

Load balancing is enabled by creating a rules configuration file (called rules.xml) that contains the rules and their redirection URLs. The balancer filter walks the RuleChain, checking the rules in the order in which they are defined in the rules.xml file. As soon as a Rule matches, the filter stops the evaluation and redirects the request to the URL specified for the matching rule.
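For illustration, a rules.xml along the following lines pairs a parameter-matching rule with a catch-all. The rule class names follow the samples shipped with the balancer application, but verify them against your Tomcat distribution; the hostnames and ports are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Rules are evaluated top to bottom; the first match wins. -->
<rules>
  <!-- Redirect requests that carry a particular request parameter value. -->
  <rule className="org.apache.webapp.balancer.rules.RequestParameterRule"
        paramName="node"
        paramValue="tomcat1"
        redirectUrl="http://localhost:8080" />

  <!-- Catch-all: anything not matched above goes here. -->
  <rule className="org.apache.webapp.balancer.rules.AcceptEverythingRule"
        redirectUrl="http://localhost:9080" />
</rules>
```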

Fault Tolerance

Fault tolerance is the system's ability to let a computation fail over to another available server, as transparently to the end user as possible, when a server in the cluster goes down. Ideally, the cluster service should detect when a server instance is no longer available to take requests and stop sending requests to that instance. It should also periodically check whether a cluster member has become available again and, if so, automatically return it to the pool of active cluster nodes.

Fault Tolerance in Tomcat

Tomcat 5 does not provide a built-in fail over mechanism to detect when a cluster member crashes. A future version of Tomcat will hopefully provide a fail over feature that checks the availability of a specific cluster member to make sure it's ready to service incoming web requests.

There are two levels of fail over capabilities typically provided by clustering solutions: request-level fail over, in which requests destined for a failed node are redirected to a surviving one, and session-level fail over, in which the user's session state must also be available on the node that takes over.

Session State Persistence

Fail over and load balancing require the session state to be replicated at different servers in a cluster. Session state replication allows a client to seamlessly get session information from another server in the cluster when the original server, on which the client established a session, fails. The state can be system state and/or application state (application state contains the objects and data stored in an HTTP session). The main goal of session replication is not to lose any session details if a cluster member crashes or is stopped for application updates or system maintenance.

As far as session persistence is concerned, clustering can be a simple scenario in which a cluster member doesn't have any knowledge of session state in the other cluster members. In this scenario, the user session lives entirely on one server, selected by the load balancer. This is called a sticky session (also known as session affinity), since the session data stays in the cluster member that received the web request.
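A sticky policy can be sketched as routing on a hash of the session id, so that every request in a given session lands on the same node. The class below is a hypothetical illustration, not part of Tomcat or the balancer application:

```java
import java.util.List;

// Sketch of session affinity ("sticky sessions"): the target member is
// derived deterministically from the session id, so all requests in a
// session go to the node that created it.
class StickyRouter {
    private final List<String> members;

    StickyRouter(List<String> members) {
        this.members = members;
    }

    String route(String sessionId) {
        // Mask the sign bit rather than using Math.abs, which can
        // overflow for Integer.MIN_VALUE.
        int h = sessionId.hashCode() & 0x7fffffff;
        return members.get(h % members.size());
    }
}
```

Note the weakness this illustrates: if the chosen node dies, the session is lost, because no other node holds a copy of its state.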

On the other hand, the cluster can be implemented in such a way that each cluster member is completely aware of session state in other cluster members, with the session state periodically propagated to all (or preferably, one or two) backup cluster members. This type of session is known as a replicated session.

There are three ways to implement session persistence: in-memory session replication, database session persistence, and file-system session persistence.

With in-memory session replication, the individual objects in the HttpSession are serialized to a backup server as they change, whereas with database session persistence, all of the objects in the session are serialized together whenever any one of them changes.

The main drawback of database/file system session persistence is limited scalability when storing large or numerous objects in the HttpSession. Every time a user adds an object to the HttpSession, all of the objects in the session are serialized and written to the database or shared file system.
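The cost difference can be seen by comparing the bytes written when only the changed attribute is serialized against the bytes written when the whole session is serialized. This is a self-contained sketch using plain Java serialization, not Tomcat code:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Measures the serialized size of an object, as a stand-in for the
// replication traffic it would generate.
class SessionCost {
    static int bytesFor(Serializable obj) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bos);
            oos.writeObject(obj);   // serialize the object graph
            oos.close();
            return bos.size();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Serializing a session holding one large attribute and one small one, and then serializing just the small changed attribute, shows why re-serializing everything on each change (the database/file-system approach) scales poorly as sessions grow.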

Session Replication in Tomcat

Session replication in the current version of Tomcat server is an all-to-all replication of session state, meaning the session attributes are propagated to all cluster members all the time. This algorithm is efficient when the clusters are small. For large clusters, the next Tomcat release will support primary-secondary session replication, where the session will only be stored at one or maybe two backup servers.
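For reference, in-memory replication in Tomcat 5 is configured with a Cluster element in server.xml. The snippet below is adapted from the commented-out sample in the default server.xml; verify the class names and attribute values against your own distribution. Part two of this article walks through the configuration in detail.

```xml
<!-- Goes inside the <Host> element of server.xml. -->
<Cluster className="org.apache.catalina.cluster.tcp.SimpleTcpCluster"
         managerClassName="org.apache.catalina.cluster.session.DeltaManager"
         expireSessionsOnShutdown="false"
         useDirtyFlag="true">

  <!-- Multicast membership: heartbeat every 500 ms; a member silent
       for 3 seconds is dropped. -->
  <Membership className="org.apache.catalina.cluster.mcast.McastService"
              mcastAddr="228.0.0.4"
              mcastPort="45564"
              mcastFrequency="500"
              mcastDropTime="3000"/>

  <!-- TCP listener for incoming session replication messages. -->
  <Receiver className="org.apache.catalina.cluster.tcp.ReplicationListener"
            tcpListenAddress="auto"
            tcpListenPort="4001"
            tcpSelectorTimeout="100"
            tcpThreadCount="6"/>

  <!-- Sends session changes to the other members. -->
  <Sender className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
          replicationMode="pooled"/>

  <!-- Replicate only after requests that could have changed the session;
       static resources are filtered out. -->
  <Valve className="org.apache.catalina.cluster.tcp.ReplicationValve"
         filter=".*\.gif;.*\.js;.*\.jpg;.*\.html;.*\.txt;"/>
</Cluster>
```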

There are three types of session replication mechanisms in Tomcat: in-memory replication, session persistence to a shared file system, and session persistence to a shared database.

Factors to Consider in Implementing a J2EE Cluster

There are many factors to take into account when designing a J2EE cluster. The following is a list of questions to be considered in a large-scale J2EE system design. (This list is taken from "Creating Highly Available and Scalable Applications Using J2EE" in the EJB Essentials Training document.)


Load Balancing

Fault Tolerance

Session State Persistence

Proposed Cluster Setup

Listed below are the main objectives I wanted to accomplish in the proposed cluster environment:


In part two of this article, we'll look at how to deploy a cluster (by running multiple Tomcat server instances) to achieve these goals. We will discuss the cluster architecture and configuration details to enable session replication in Tomcat 5.

Srini Penchikala is an information systems subject matter expert at Flagstar Bank.


Copyright © 2009 O'Reilly Media, Inc.