Upgrading a MySQL Application
Pages: 1, 2
Migrating Data
To eliminate the redundancy in the sales_staff table,
the upgrade program first needs to run an ALTER TABLE
statement to add a column for the employee id.
For tidiness, the developer chose to add the emp_id
column after the sales_id column (the first and
primary key column):
ALTER TABLE sales_staff
ADD COLUMN emp_id INT
AFTER sales_id;
Next, the employee id number must be determined for each row. This can be messy and make the developer wish he had planned his application better. Nevertheless, that's in the past and it has to be dealt with now. To update most of the records, the upgrade program will run the following SQL statements:
CREATE TABLE temp_empid
(empid INT, salesid INT);
INSERT INTO temp_empid
SELECT employees.emp_id, sales_id
FROM employees, sales_staff
WHERE emp_first = rep_first
AND emp_last = rep_last;
UPDATE sales_staff, temp_empid
SET emp_id = empid
WHERE sales_id = salesid;
DROP TABLE temp_empid;
The first SQL statement creates a temporary
table, temp_empid to hold each employee's id number and
the related sales id. The second SQL statement inserts
into temp_empid each emp_id selected
from employees and the related sales_id
selected from sales_staff, where the first and last name
columns of each table match. The third SQL statement
updates sales_staff by setting the values of the new
column emp_id equal to the matching empid
column in temp_empid where the sales id columns of each
of these tables match. Finally, the fourth SQL statement deletes the
temporary table. All of these SQL statements could be replaced with
one statement without the use of a temporary table:
UPDATE sales_staff, employees
SET sales_staff.emp_id = employees.emp_id
WHERE emp_first = rep_first
AND emp_last = rep_last;
This will work, but being split apart into a set of three SQL statements like above can help to solve some data migration problems.
Migration Problems
There are some potential problems with running the set of three SQL
statements above unchecked. First, before running
the UPDATE statement it's a good idea to extract a list
of rows from
temp_empid at least for where there are duplicates (if not for all
rows) so that the user can adjust the information before updating
sales_staff. The following SQL statement will help:
SELECT COUNT(sales_id), sales_id, CONCAT(rep_first, ' ', rep_last)
FROM employees, sales_staff
WHERE emp_first = rep_first
AND emp_last = rep_last
GROUP BY employees.emp_id;
A Perl program would use this statement to capture the count of
rows and the name and sales id of each sales representative. For
counts greater than one, the user would be given a list of
salespeople. By capturing
sales_id, the upgrade program can run another SELECT
statement to retrieve the entire related record from sales_staff
if requested, so that the user can have more data to associate rows in
temp_empid to salespeople. The programs, of course, will have to
keep circling back and querying the database again for duplicates until there
are none. Then the UPDATE statement can be run.
For rows in sales_staff where there are no matches
(probably due to different spellings of names), the user should be
asked to adjust the data. The simplest way to deal with these records
is to run the following SQL statement after the UPDATE
statement is run to select only records that were not updated
(where emp_id in sales_staff is
still NULL):
SELECT sales_id, CONCAT(rep_first, ' ', rep_last)
FROM sales_staff
WHERE emp_id IS NULL;
One potential issue with the all-in-one UPDATE
statement is it refers to multiple tables. This feature was added to
MySQL in version 4.0. Clients using an earlier version will need a
workaround. This requires getting Perl involved in performing the
updates instead of using a multi-table
UPDATE statement. Here's one way in which it could be done in Perl
after the CREATE TABLE and the INSERT statements are
run, but before the DROP statement:
#!/usr/bin/perl -w
use strict;
use DBI;
# Client Variables
my $db = $ENV{DB};
my $pw = $ENV{PW};
my $user = $ENV{USR};
# Connect to MySQL
my $dbh = DBI->connect( "DBI:mysql:$db:localhost", $user, $pw )
or die "Failed: " . DBI->errstr();
# Get rows from temp_empid
my $sql = "SELECT empid, salesid FROM temp_empid";
my $sth = $dbh->prepare($sql);
$sth->execute();
my $empids = $sth->fetchall_arrayref();
$sth->finish();
# Loop through array of temp_empid rows
# and update sales_staff table
foreach my $emp (@$empids) {
my ($empid,$salesid) = @$emp;
my $sql = "UPDATE sales_staff
SET emp_id = ?
WHERE sales_id = ?";
my $sth = $dbh->prepare($sql);
$sth->execute( $empid, $salesid );
$sth->finish();
}
$dbh->disconnect();
exit;
The opening lines of this script
(sales_staff_update.plx) use the Perl DBI module and set
up the client variables. The script then establishes a connection to
the MySQL server or dies if it cannot. Next, the program basically
gets all of the records at once from the temporary table. Each row of
the results, that is, each salesperson, is stored in an array,
itself stored in a comprehensive array. The
scalar $empids holds a reference to this parent array.
The script then loops over the salespeople (each row
of temp_empid), updating the sales_staff
table where the sales id numbers match. Once this is done, the script
disconnects from the MySQL server and exits. The script above is run
by the upgrade program like this:
perl sales_staff_update.plx 2> \
'./logs/sales_staff_update'
Any error messages are redirected to a text file which may be sent to the developer as needed.
Updating a Reference Table
The application's database has a table (makes_models)
that contains the names of hundreds of makes and models of
automobiles. This table can be joined to the inventory table by a
column called model_id. It saves the user from having to
type the make and model name for each automobile entered into
inventory. The client does, however, have the option of adding models
not already in the table. As part of the upgrade, the developer wants
to seize the opportunity not only to add new models to the client's
makes_models table, but also to get a copy of the user's additions
so that the developer can add them to his master table for future releases.
In preparation for the upgrade, the developer exported
makes_models from his master copy of the database using
mysqldump:
$ mysqldump --user='root' --password='abc123' \
--add-locks --no-create-info car_dealer \
makes_models > makes_models.sql
The --no-create-info option stops the CREATE
TABLE statement from being added to the output file. There is a
problem with the file that this will create, though. The developer
doesn't want to run
INSERT statements because they might cause duplicates (if not
errors). Instead, he wants to run REPLACE statements, which will
only insert a row if there is not already a row with the same primary key in
the table. If a row already exists, then MySQL will replace it with the
imported record. Unfortunately, there isn't an option with
mysqldump to generate REPLACE statements instead of
INSERT statements. But since the syntax of both statements are
identical, a simple Perl one-liner like the following can fix the exported file
before it's distributed to clients:
$ perl -p -i.bak -e 's/INSERT/REPLACE/;' makes_models.sql
The -p switch tells Perl to process each line of the
input file. The -i.bak here causes Perl to create a
backup file called
makes_models.sql.bak in the current directory. The original file
will be replaced. The -e switch tells Perl that what follows in
quotes is to be executed by Perl. The bit in quotes substitutes the first
occurrence of INSERT on each line with REPLACE.
With the makes_models.sql dump file properly adjusted,
the upgrade program only has to run it on the client's server to
update the client's makes_models table. After that has
been done, the upgrade program will then need to query the database
for user generated rows in
makes_models.
As mentioned before, the primary key of makes_models
is
model_id. This is not an automatically incremented numeric column.
Instead, it's a fixed length alphanumeric character column
(CHAR(9)). The first four characters are letters representing the
make and the last five are sequential numbers (e.g., FORD12345)
This is a little old fashioned, but the developer is more comfortable with this
method. When the user creates a new model, the application generates an
identifier with the letters USER followed by a five character
number. One advantage to this is that user created rows won't be overwritten by
updates. Another advantage is that it makes it easy to extract these rows. The
upgrade program will use the following SQL statement to create a temporary
table (exp_models) based on makes_models, select the
user added rows, and insert them into the temporary table:
CREATE TABLE exp_models
SELECT *
FROM makes_models
WHERE model_id LIKE 'USER%';
The first line here creates the temporary export table based on the types of
the columns selected from makes_models—in this case, all
(*) columns. As for the WHERE clause, the
% is a wildcard. Rows that start with USER and end in
any number of characters will be selected.
After this, the upgrade program merely needs to run mysqldump
to export the temporary table and then use the system's e-mail service to mail
it to the developer (preferably with the client's permission). Once this is
done, it can issue a DROP statement to delete the temporary
table.
Undoing or Finishing
With the table and data changes completed, the new and replacement
scripts can be installed. This is just a matter of copying files with
the
cp command. Once it's done, the user should be asked to verify the
integrity of the data and to make sure that the new scripts are working
properly. If the user responds that there are problems, the upgrade program
should offer to undo the upgrade and to restore the data. The scripts that are
tarred and gzipped in the backup directory could be restored like this:
cd $SCRIPT_DIR
tar -xvfz $UP_DIR/backup/scripts.tar.gz
The first line changes to the directory containing the
application's scripts. In the second line, the tar
command extracts all of the files backed up and restores them. In this
case, the z switch will
gunzip the files. The x switch will then
untar the file (the f switch). The v
option (for verbose) will allow the user to see what's happening. This is good
for reassuring the user. As for scripts that the upgrade added, the restoration
program can run a series of rm command statements to remove them
individually. All that will remain then is to restore the tables that were
tampered with and their data. This can be done by using the
mysqldump files that were created at the beginning:
mysql --user="$USR" --password="$PW" \
$DB < ./backup/sales_staff.sql
This command causes MySQL to run the SQL statements contained in
the text file created by mysqldump. After this, the
restoration program should e-mail the developer a message that the
upgrade failed, along with copies of the log files that the upgrade
generated.
If the upgrade was successful, though, the upgrade program could then delete the backup files and the installation program. However, it would be prudent to leave everything in place in case the client notices a problem with the upgrade a few days later. The files could be deleted during the next upgrade.
Closing Comments
This article has only illustrated one table change and one reference table update. However, it has touched on many of the factors that need to be considered in a MySQL application upgrade. A real upgrade would simply be more of the same. As with most situations with databases, a database application upgrade is tedious and requires a tremendous amount of planning and testing. To save yourself plenty of grief, be prepared for failure by making backups and by developing a restoration script. This will minimize client complaints and make it easier to recover the client's data and to keep the client's patronage. You should also try to consider every deviation that the client might make, and assume that the clients have no technical abilities. The smoother the upgrade, the less follow-up telephone support that will be required. This will lead to more referrals and more money per hour for your work in developing your application.
Russell Dyer has worked full-time for several years as a free-lance writer of computer articles, primarily on MySQL.
Return to ONLamp.com.



