Modifying Slony Clusters
Pages: 1, 2
Swapping Providers
If your replication scheme has three or more nodes, you may want to switch providers. This is not the same as failover or promotion of a new master. It simply changes the source of the replication data for a node.
In the example case, Node 1, the origin of set 1, provided the information on set 1 to Node 2. When you added Node 3, you subscribed it to Table Set 1. The Table Set both originated on and was provided by Node 1.
Now the goal is to switch providers so that Node 3 retrieves Table Set 1 information from Node 2, instead of directly from Node 1. Node 1 remains the origin of the Table Set.
Node 3 could have either Node 1 or Node 2 as a provider of that set
information. Node 2 is available as a provider, because when you subscribed Node
2 to Table Set 1, you also enabled it as a forwarder of Table Set
1.
The listen paths, however, have Node 2 and Node 3 listening to
each other via Node 1. One necessary change is to ensure that 2 and 3 listen to
each other directly, because Node 2 will now provide the data for Node 1. Once
the paths and listens are already set up, simply resubscribe the set, setting
the provider to Node 2 instead of Node 1.

Figure 3.
#! /bin/bash
#
# varlena Slony Initialization
# ==============================
slonik << _END_
cluster name =gbcluster;
node 1 admin conninfo = 'dbname=gb host=localhost port=5434 user=postgres';
node 2 admin conninfo = 'dbname=gb host=localhost port=5430 user=postgres';
node 3 admin conninfo = 'dbname=gb2 host=localhost port=5430 user=postgres';
# ==============================
store listen (origin=3, receiver=2, provider=3);
store listen (origin=2, receiver=3, provider=2);
subscribe set ( id=1, provider=2, receiver=3, forward=yes);
_END_
After running this script, examine the log files to see that Slony stored the listens and updated the subscription.
Switching Masters
In the example, the database called gb (mastered on a PostgreSQL 7.4 installation) replicates to two nodes on 8.0 installation. The decision has been made to move the database forward so that the 8.0 installation is the master. (If you are using Slony-I, be prepared to take advantage of a fast upgrade using master promotion.)
Before you even consider swapping masters, you must have in hand a complete
set of steps to follow to switch your applications accessing the 7.4
installation to the 8.0 installation. These steps are application-dependent, and
so you are on your own. However, the steps probably consist of stopping each
application, changing the connection information (ideally in one
place), and bringing the application back up after the switchover. It is
imperative for a smooth and fast switchover that you have the application
switch information at hand. Write them down and save the instructions in your
source code control mechanism. You will never be sorry you did that.
One more cautionary note, I highly recommend that you test your application on a copy of the 8.0 database. This copy should not be the replica, if your application writes to the database; it should be another copy. Remember that replicas are read only databases. Oh, yes, one more thing. Back up your databases before performing any major operations on them. Switching replication masters is a major operation. No excuses. Back up!
Everything that happens in a Slony-I replication happens because of an
Event. One of the important Events is a SYNC event.
Every subscribing node replicates data up to a SYNC event and then commits the
changes to the database. All nodes capable of forwarding subscriptions also log
those changes until every node subscribed to that set has confirmed
the SYNC event. This ensures that replication data remains available in the
system until Slony is sure that no node needs it.
To change the master of a set, you must first ensure that there are listeners for any of the new paths. The example provided listening paths from and to both of the nodes already, so there are no new listener paths required. Before swapping a master on a subscribed set, the set must be locked to ensure that no updates occur during the swap-over part. Then you may move the set. Finally, the newly designated replica node, formerly the master node, must subscribe to the set.
Before you run the script, ensure that write access to your master database
is OFF.
#!/bin/sh
slonik << _EOF_
cluster name = gbcluster;
node 1 admin conninfo = 'dbname=gb host=localhost port=5434 user=postgres';
node 2 admin conninfo = 'dbname=gb host=localhost port=5430 user=postgres';
node 3 admin conninfo = 'dbname=gb2 host=localhost port=5430 user=postgres';
# add listener paths if required
#
# lock and move set
#
lock set (id=1, origin=1);
move set (id=1, old origin=1, new origin=2);
# subscribe set if required
subscribe set (id=1, provider=2, receiver=1, forward=yes);
_END_
After this script, Slony-I is ready for the change. Restart your application and have it point to the new master for writes. This process should only take seconds.
Failover
The replication example here began as a 7.4 database gb
(Node 1), replicating to an 8.0 installation (Node 2). Then you added a third
node on the 8.0 database called gb2. Initially the third node
replicated directly from Node 1. You switched Node 3's provider to Node 2, then
promoted Node 2 to be the master. Now you have an 8.0 master at Node 2 and two
replicas, one 7.4 and one in the same installation as gb2.
Consider the second replica, gb2, as if it were on another machine
or installation.
Suppose the master database failed or lost power and had to be taken offline. (Remember Node 3, for the sake of this discussion, is not on this machine, and so this hypothetical situation treats it as if it were live when the machine died.) What to do next is a human decision. You must prepare for this failover scenario. In this case, you have decided to failover to Node 1 in case of a failure on Node 2.
The applications communicating with the database will also start to fail, so you must take them offline and restart them, pointing at the new master quickly after the master switch takes effect.
The failover procedure for Slony is a combination of a provider change and a
master change, both of which "Building
and Configuring Slony" covered. Previously, you added Node 3 and had Node 1
provide for it. Then you changed Node 3's provider to Node 2, and finally,
promoted Node 2 to master. In the failover case, Slony must do the reverse using
the failover command:
- Promote Node 1 to master.
- Switch Node 3 to pull from Node 1.
Then you can safely remove Node 2 for repair.
#!/bin/sh
slonik <<_EOF_
cluster name = gbcluster;
node 1 admin conninfo = 'dbname=gb host=localhost user=postgres port=5434';
node 2 admin conninfo = 'dbname=gb host=localhost user=postgres port=5430';
node 3 admin conninfo = 'dbname=gb2 host=localhost user=postgres port=5430';
failover (id=2, backup node = 1);
_EOF_
At this point, the slon process for Node 2 should be dead. When
you are ready to put Node 2 back into place, add it as a fresh empty database
into the replication scheme. (See "Add Node," above.) When it catches up, then
you can switch masters so that Node 2 can again be the master of the cluster.
(See "Switching Masters.")
While resolving some of the problems that I ran into, I found that it was
easiest and clearest for me to drop Node 2. drop node "erases" the
node from the entire replication cluster. This is different than
uninstall node, which removes the Slony-I schema from a specific
database instance. Both tools are useful, but don't confuse them.
|
Related Reading
Practical PostgreSQL |
The other issue I ran into was that while quickly cutting and pasting, I had
mismatched paths, listens, and connection information. It is very
worthwhile to check each of these commands by hand to verify that the commands
are exactly what you meant. Also, don't forget that the connection information
for store path pertains to the server node. Undoing
bad paths and listen connections is a delicate operation and it is very
tempting to throw the whole thing away and start from scratch.
Schema Changes
Changing the schema of a database being replicated is not simple. The schema changes must occur at the exact transactional point in time. Direct schema changes could lead to serious corruption in Slony-I due to the handling and disabling of triggers, rules, and constraints in the replica.
Slony-I provides a way to execute SQL statements via the Event
mechanism. This provides the transactional integrity necessary for schema
changes, as well as the trigger, rule, and constraint changes required for the
replicas. You must initiate Events on the master node.
To add a "dummy" column to the artfile table, issue an Event to
the master Node 1 pointing to the ALTER TABLE script file so as to
synchronize it between databases. The EXECUTE SCRIPT command in
Slonik will do this. Remember, Slony must be able to find your change
script.
#!/bin/sh
#
# Create a new column in a table in the replicated set
#
echo "Creating new column in the artfile table"
slonik <<_EOF_
cluster name = gbcluster;
node 1 admin conninfo = 'dbname=gb host=localhost user=postgres port=5434';
node 2 admin conninfo = 'dbname=gb host=localhost user=postgres port=5430';
node 3 admin conninfo = 'dbname=gb2 host=localhost user=postgres port=5430';
execute script ( SET ID = 1, FILENAME = 'changes20050219.sql', EVENT NODE = 1);
_EOF_
Once this change propagates, you can do an update to populate the new column and verify it is being updated on all replicas.
References
- General Bits Slony Articles on Tidbits
- The Slony-I Project documentation on GBorg
- Slonik Commands
- Jan Wieck's Original Slony-I Talk and Scripts (PDF) July 2004 in Portland, OR, sponsored by Global Registry Services
- Information from IRC #slony on freenode.net
- Mailing list: Slony1-general@gborg.postgresql.org
A. Elein Mustain has more than 15 years of experience working with databases, 10 of those working exclusively with object relational database systems.
Return to ONLamp.com.
- Trackback from http://jira.io.aol.com:8080/confluence/display/IODOCS/MDDB+Replication+Failover
MDDB Replication Failover
2005-10-28 14:48:17 [View]
- Trackback from http://jira.io.aol.com:8080/confluence/display/IODOCS/PMS+Replication+Failover
PMS Replication Failover
2005-10-28 14:47:06 [View]



