ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)
 See this if you're having trouble printing code examples


Upgrading a MySQL Application

by Russell Dyer
10/16/2003

Most developers design MySQL databases for their own use or for the use of their employers. Occasionally, though, a developer will design a database for use by others, for sale as an application. Since an application developer usually isn't present when his application is installed and used, he must consider many factors when designing the database.

Despite excessive planning, though, an application is never perfect when it's first released. Invariably, tables will need to be changed and scripts will need to be replaced. This is fairly easy for a developer to do on his own server. The difficulty comes when creating an upgrade program that will make these changes, a program that non-technical clients can run on their own. Careful planning of an upgrade is required to keep from destroying the client's data.

In this article I will examine such a quandary and offer some suggestions for putting together an upgrade to a MySQL application. I'll provide examples of how such an upgrade might unfold. Finally, I'll also comment on how one might update the data in a reference table of an application.

The Application

For the examples in this article I will use the scenario of a developer who has created a MySQL application that runs on Linux. The application is designed for used-car dealers to track their inventory, sales, and other related information. It's accessed through the web browser; the interfaces are written in Perl using the CGI module. In addition to HTML pages for employee use, there are also pages for the public to view automobiles for sale, on-line.

Related Reading

MySQL Cookbook
By Paul DuBois

The developer has sold several copies of his application over the Web. His clients each pay a quarterly subscription fee for regular updates. Now that it's time for an update, the developer wants to make some changes to improve the program.

For one thing, two tables, sales_staff and employees, contain redundant information. The sales_staff table holds the commission pay-out rate, sales ranking, and other information, including each salesperson's name and telephone numbers (home and mobile). The employees table contains personnel data on all of a client's employees, not just salespeople, also including contact information. When a client has to add new salespeople, he must add them to both tables. To change a telephone number, he must make telephone number changes in both. Naturally, clients often forget to change both.

As part of the upgrade, the developer wants to eliminate the name and telephone columns from sales_staff, instead adding a column for the employee id to link the records to employees. He'll also have to change the Perl scripts that talk to these tables, but as this is primarily a MySQL article, we'll gloss over those parts. The upgrade will contain several other improvements, but this one change is sufficient for our example. Let's now go through each step of the process in the order in which the upgrade would be performed.

Playing it Safe

Since clients, particularly used-car dealers, can be pretty rude when their data is lost, the developer has tested the upgrade extensively. Additionally, he wants to take the precaution of having the upgrade program make a backup of the client's data before upgrading the application. There are a few MySQL statements and tools that may be used to backup data. However, since many of his clients have installed the application on virtual domains (i.e., on sites rented from web hosting companies), they havu limited choices. Therefore, the developer has decided to use mysqldump to backup the effected tables.

A client must first upload the upgrade package—a tarred and gzipped file containing the upgrade program (upgrade.plx) and other related files—to his web server. He then must log onto the server with administrative access to install the package and to run upgrade.plx. The upgrade package creates a backup sub-directory beneath the installation directory to store backups of data and scripts. After some opening fanfare, the user is prompted to provide an administrative username and password for MySQL. The user is also asked for the database name (if different from the default) for the dealership application; some web hosting companies assign names to their customers' databases. The upgrade program will store and export this client information as bash environment variables like this:

export USR='username'
export PW='password'
export DB='database'
export UP_DIR=`pwd`
export SCRIPT_DIR='/var/cgi/cardealer'

This method sets up variables that can be used by all of the scripts contained in the upgrade program. Of course, the actual username and other values will contain the client's specific information. As for the upgrade directory (UP_DIR), that is set by the pwd command. This could be incorrect, though, if the user started upgrade.plx from a different directory. Therefore, it's better to have the user confirm this value before proceeding. All of these shell variables can be retrieved by bash when needed by adding a dollar-sign in front of the variable name like this: $PW. For Perl, the variables are part of an environment hash and can be retrieved specifically like this: $ENV{PW}.

Getting back to the backup, for good measure, the user is then asked if the MySQL tables may be locked before backing up the data. This is to ensure that the data isn't changed during the backup. If the user answers affirmatively, a table can be locked using the bash variables like this:

$ mysql --user="$USR" --password="$PW" \
	-e "LOCK TABLES sales_staff READ" $DB

$ mysql --user="$USR" --password="$PW" \
	-e "FLUSH TABLES" $DB

These statements should be entered from the Linux command line, not the mysql client. The SQL statements (i.e., LOCK and FLUSH) could be invoked by Perl using the DBI module. However, since a mysqldump statement will follow, we'll stay on the command line for now.

The -e option indicates that what follows should be executed in MySQL. Within the quotes is a LOCK statement: the table that's to be locked is given; to lock more tables, just list them right after the first table, with spaces in between. A READ lock prevents users from writing to the tables during the backup while still allowing them to read the data. By the way, these command lines, like many in this article, are wrapped for nicer display. However, they may be entered on the same line or with a backslash as shown here at the end of each incomplete line to let the shell know that more follows. Once the table is locked, the upgrade program can back it up:

$ mysqldump --user="$USR" --password="$PW" \ 
	$DB sales_staff > ./backup/sales_staff.sql

$ mysql --user="$USR" --password="$PW" \ 
	-e "UNLOCK TABLES" $DB

The mysqldump utility creates a plain text file containing a CREATE TABLE statement and a series of INSERT statements to restore the data if necessary. The table name is given here and the output is redirected to the backup path and the backup filename. Incidentally, --add-locks could have been added to the mysqldump statement here instead of running the LOCK and FLUSH statements before, and so an UNLOCK statement would not have to be run later. Choose the approach you prefer.

If it's necessary to restore the data, the developer has written a separate program called restore.plx to restore the data (see last section below). In addition to backing up the data, the upgrade program will also make copies of the Perl scripts that will be replaced. This will be done using tar like this:

tar -cvzf $UP_DIR/backup/scripts.tar.gz \
	$SCRIPT_DIR/car_db/*

The c option creates an archive and the f option a tar file. The v option verbosely lists the files being archived. The z option compresses the file with gzip. Finally, the value of the $UP_DIR variable contains the installation directory of the upgrade determined at the beginning. The $SCRIPT_DIR is another environment variable that contains the directory of the application's CGI pages.

Migrating Data

To eliminate the redundancy in the sales_staff table, the upgrade program first needs to run an ALTER TABLE statement to add a column for the employee id.

For tidiness, the developer chose to add the emp_id column after the sales_id column (the first and primary key column):

ALTER TABLE sales_staff 
ADD COLUMN emp_id INT 
AFTER sales_id;

Next, the employee id number must be determined for each row. This can be messy and make the developer wish he had planned his application better. Nevertheless, that's in the past and it has to be dealt with now. To update most of the records, the upgrade program will run the following SQL statements:

CREATE TABLE temp_empid
   (empid INT, salesid INT);

INSERT INTO temp_empid
   SELECT   employees.emp_id, sales_id
   FROM     employees, sales_staff
   WHERE    emp_first = rep_first
     AND    emp_last  = rep_last; 

UPDATE   sales_staff, temp_empid
   SET   emp_id   = empid
   WHERE sales_id = salesid;

DROP TABLE temp_empid;

The first SQL statement creates a temporary table, temp_empid to hold each employee's id number and the related sales id. The second SQL statement inserts into temp_empid each emp_id selected from employees and the related sales_id selected from sales_staff, where the first and last name columns of each table match. The third SQL statement updates sales_staff by setting the values of the new column emp_id equal to the matching empid column in temp_empid where the sales id columns of each of these tables match. Finally, the fourth SQL statement deletes the temporary table. All of these SQL statements could be replaced with one statement without the use of a temporary table:

UPDATE sales_staff, employees
   SET   sales_staff.emp_id = employees.emp_id
   WHERE emp_first          = rep_first
     AND emp_last           = rep_last;

This will work, but being split apart into a set of three SQL statements like above can help to solve some data migration problems.

Migration Problems

There are some potential problems with running the set of three SQL statements above unchecked. First, before running the UPDATE statement it's a good idea to extract a list of rows from temp_empid at least for where there are duplicates (if not for all rows) so that the user can adjust the information before updating sales_staff. The following SQL statement will help:

SELECT COUNT(sales_id), sales_id, CONCAT(rep_first, ' ', rep_last)
FROM     employees, sales_staff
WHERE    emp_first = rep_first
AND      emp_last  = rep_last
GROUP BY employees.emp_id;

A Perl program would use this statement to capture the count of rows and the name and sales id of each sales representative. For counts greater than one, the user would be given a list of salespeople. By capturing sales_id, the upgrade program can run another SELECT statement to retrieve the entire related record from sales_staff if requested, so that the user can have more data to associate rows in temp_empid to salespeople. The programs, of course, will have to keep circling back and querying the database again for duplicates until there are none. Then the UPDATE statement can be run.

For rows in sales_staff where there are no matches (probably due to different spellings of names), the user should be asked to adjust the data. The simplest way to deal with these records is to run the following SQL statement after the UPDATE statement is run to select only records that were not updated (where emp_id in sales_staff is still NULL):

SELECT sales_id, CONCAT(rep_first, ' ', rep_last)
FROM  sales_staff
WHERE emp_id IS NULL;

One potential issue with the all-in-one UPDATE statement is it refers to multiple tables. This feature was added to MySQL in version 4.0. Clients using an earlier version will need a workaround. This requires getting Perl involved in performing the updates instead of using a multi-table UPDATE statement. Here's one way in which it could be done in Perl after the CREATE TABLE and the INSERT statements are run, but before the DROP statement:

#!/usr/bin/perl -w

use strict;
use DBI;

# Client Variables
my $db   = $ENV{DB};
my $pw   = $ENV{PW};
my $user = $ENV{USR};

# Connect to MySQL
my $dbh = DBI->connect( "DBI:mysql:$db:localhost", $user, $pw )
	or die "Failed: " . DBI->errstr();

# Get rows from temp_empid
my $sql = "SELECT empid, salesid FROM temp_empid";
my $sth = $dbh->prepare($sql);
$sth->execute();

my $empids = $sth->fetchall_arrayref();
$sth->finish();

# Loop through array of temp_empid rows 
# and update sales_staff table
foreach my $emp (@$empids) {    
	my ($empid,$salesid) = @$emp;

	my $sql = "UPDATE sales_staff 
		SET emp_id = ?
		WHERE sales_id = ?";

	my $sth = $dbh->prepare($sql);
	$sth->execute( $empid, $salesid );
	$sth->finish();
}

$dbh->disconnect();

exit;

The opening lines of this script (sales_staff_update.plx) use the Perl DBI module and set up the client variables. The script then establishes a connection to the MySQL server or dies if it cannot. Next, the program basically gets all of the records at once from the temporary table. Each row of the results, that is, each salesperson, is stored in an array, itself stored in a comprehensive array. The scalar $empids holds a reference to this parent array. The script then loops over the salespeople (each row of temp_empid), updating the sales_staff table where the sales id numbers match. Once this is done, the script disconnects from the MySQL server and exits. The script above is run by the upgrade program like this:

perl sales_staff_update.plx 2> \
	'./logs/sales_staff_update'

Any error messages are redirected to a text file which may be sent to the developer as needed.

Updating a Reference Table

The application's database has a table (makes_models) that contains the names of hundreds of makes and models of automobiles. This table can be joined to the inventory table by a column called model_id. It saves the user from having to type the make and model name for each automobile entered into inventory. The client does, however, have the option of adding models not already in the table. As part of the upgrade, the developer wants to seize the opportunity not only to add new models to the client's makes_models table, but also to get a copy of the user's additions so that the developer can add them to his master table for future releases.

In preparation for the upgrade, the developer exported makes_models from his master copy of the database using mysqldump:

$ mysqldump --user='root' --password='abc123' \
	--add-locks --no-create-info car_dealer \
	makes_models > makes_models.sql

The --no-create-info option stops the CREATE TABLE statement from being added to the output file. There is a problem with the file that this will create, though. The developer doesn't want to run INSERT statements because they might cause duplicates (if not errors). Instead, he wants to run REPLACE statements, which will only insert a row if there is not already a row with the same primary key in the table. If a row already exists, then MySQL will replace it with the imported record. Unfortunately, there isn't an option with mysqldump to generate REPLACE statements instead of INSERT statements. But since the syntax of both statements are identical, a simple Perl one-liner like the following can fix the exported file before it's distributed to clients:

$ perl -p -i.bak -e 's/INSERT/REPLACE/;' makes_models.sql

The -p switch tells Perl to process each line of the input file. The -i.bak here causes Perl to create a backup file called makes_models.sql.bak in the current directory. The original file will be replaced. The -e switch tells Perl that what follows in quotes is to be executed by Perl. The bit in quotes substitutes the first occurrence of INSERT on each line with REPLACE.

With the makes_models.sql dump file properly adjusted, the upgrade program only has to run it on the client's server to update the client's makes_models table. After that has been done, the upgrade program will then need to query the database for user generated rows in makes_models.

As mentioned before, the primary key of makes_models is model_id. This is not an automatically incremented numeric column. Instead, it's a fixed length alphanumeric character column (CHAR(9)). The first four characters are letters representing the make and the last five are sequential numbers (e.g., FORD12345) This is a little old fashioned, but the developer is more comfortable with this method. When the user creates a new model, the application generates an identifier with the letters USER followed by a five character number. One advantage to this is that user created rows won't be overwritten by updates. Another advantage is that it makes it easy to extract these rows. The upgrade program will use the following SQL statement to create a temporary table (exp_models) based on makes_models, select the user added rows, and insert them into the temporary table:

CREATE TABLE exp_models
   SELECT * 
   FROM  makes_models 
   WHERE model_id LIKE 'USER%';

The first line here creates the temporary export table based on the types of the columns selected from makes_models—in this case, all (*) columns. As for the WHERE clause, the % is a wildcard. Rows that start with USER and end in any number of characters will be selected.

After this, the upgrade program merely needs to run mysqldump to export the temporary table and then use the system's e-mail service to mail it to the developer (preferably with the client's permission). Once this is done, it can issue a DROP statement to delete the temporary table.

Undoing or Finishing

With the table and data changes completed, the new and replacement scripts can be installed. This is just a matter of copying files with the cp command. Once it's done, the user should be asked to verify the integrity of the data and to make sure that the new scripts are working properly. If the user responds that there are problems, the upgrade program should offer to undo the upgrade and to restore the data. The scripts that are tarred and gzipped in the backup directory could be restored like this:

cd $SCRIPT_DIR
tar -xvfz $UP_DIR/backup/scripts.tar.gz

The first line changes to the directory containing the application's scripts. In the second line, the tar command extracts all of the files backed up and restores them. In this case, the z switch will gunzip the files. The x switch will then untar the file (the f switch). The v option (for verbose) will allow the user to see what's happening. This is good for reassuring the user. As for scripts that the upgrade added, the restoration program can run a series of rm command statements to remove them individually. All that will remain then is to restore the tables that were tampered with and their data. This can be done by using the mysqldump files that were created at the beginning:

mysql --user="$USR" --password="$PW" \
      $DB < ./backup/sales_staff.sql

This command causes MySQL to run the SQL statements contained in the text file created by mysqldump. After this, the restoration program should e-mail the developer a message that the upgrade failed, along with copies of the log files that the upgrade generated.

If the upgrade was successful, though, the upgrade program could then delete the backup files and the installation program. However, it would be prudent to leave everything in place in case the client notices a problem with the upgrade a few days later. The files could be deleted during the next upgrade.

Closing Comments

This article has only illustrated one table change and one reference table update. However, it has touched on many of the factors that need to be considered in a MySQL application upgrade. A real upgrade would simply be more of the same. As with most situations with databases, a database application upgrade is tedious and requires a tremendous amount of planning and testing. To save yourself plenty of grief, be prepared for failure by making backups and by developing a restoration script. This will minimize client complaints and make it easier to recover the client's data and to keep the client's patronage. You should also try to consider every deviation that the client might make, and assume that the clients have no technical abilities. The smoother the upgrade, the less follow-up telephone support that will be required. This will lead to more referrals and more money per hour for your work in developing your application.

Russell Dyer has worked full-time for several years as a free-lance writer of computer articles, primarily on MySQL.


Return to ONLamp.com.

Copyright © 2009 O'Reilly Media, Inc.