Most developers design MySQL databases for their own use or for the use of their employers. Occasionally, though, a developer will design a database for use by others, for sale as an application. Since an application developer usually isn't present when his application is installed and used, he must consider many factors when designing the database.
Despite excessive planning, though, an application is never perfect when it's first released. Invariably, tables will need to be changed and scripts will need to be replaced. This is fairly easy for a developer to do on his own server. The difficulty comes when creating an upgrade program that will make these changes, a program that non-technical clients can run on their own. Careful planning of an upgrade is required to keep from destroying the client's data.
In this article I will examine such a quandary and offer some suggestions for putting together an upgrade to a MySQL application. I'll provide examples of how such an upgrade might unfold. Finally, I'll also comment on how one might update the data in a reference table of an application.
For the examples in this article I will use the scenario of a developer who has created a MySQL application that runs on Linux. The application is designed for used-car dealers to track their inventory, sales, and other related information. It's accessed through the web browser; the interfaces are written in Perl using the CGI module. In addition to HTML pages for employee use, there are also pages for the public to view automobiles for sale, on-line.
|
Related Reading
|
The developer has sold several copies of his application over the Web. His clients each pay a quarterly subscription fee for regular updates. Now that it's time for an update, the developer wants to make some changes to improve the program.
For one thing, two tables, sales_staff
and employees, contain redundant information. The
sales_staff table holds the commission pay-out rate, sales
ranking, and other information, including each salesperson's name and telephone
numbers (home and mobile). The employees table contains personnel
data on all of a client's employees, not just salespeople, also including
contact information. When a client has to add new salespeople, he must add
them to both tables. To change a telephone number, he must make telephone
number changes in both. Naturally, clients often forget to change both.
As part of the upgrade, the developer wants to eliminate the name
and telephone columns from sales_staff, instead adding a
column for the employee id to link the records
to employees. He'll also have to change the Perl scripts
that talk to these tables, but as this is primarily a MySQL article,
we'll gloss over those parts. The upgrade will contain several other
improvements, but this one change is sufficient for our example.
Let's now go through each step of the process in the order in which
the upgrade would be performed.
Since clients, particularly used-car dealers, can be pretty rude
when their data is lost, the developer has tested the upgrade
extensively. Additionally, he wants to take the precaution of having
the upgrade program make a backup of the client's data before
upgrading the application. There are a few MySQL statements and tools
that may be used to backup data. However, since many of his clients
have installed the application on virtual domains (i.e., on sites
rented from web hosting companies), they havu limited
choices. Therefore, the developer has decided to
use mysqldump to backup the effected tables.
A client must first upload the upgrade package—a tarred and
gzipped file containing the upgrade program (upgrade.plx)
and other related files—to his web server. He then must log
onto the server with administrative access to install the package and
to run
upgrade.plx. The upgrade package creates
a backup sub-directory beneath the installation directory
to store backups of data and scripts. After some opening fanfare, the
user is prompted to provide an administrative username and password
for MySQL. The user is also asked for the database name (if different
from the default) for the dealership application; some web hosting
companies assign names to their customers' databases. The upgrade
program will store and export this client information as
bash environment variables like this:
export USR='username'
export PW='password'
export DB='database'
export UP_DIR=`pwd`
export SCRIPT_DIR='/var/cgi/cardealer'
This method sets up variables that can be used by all of the
scripts contained in the upgrade program. Of course, the actual
username and other values will contain the client's specific
information. As for the upgrade directory (UP_DIR), that
is set by the pwd command. This could be incorrect,
though, if the user started upgrade.plx from a different
directory. Therefore, it's better to have the user confirm this value
before proceeding. All of these shell variables can be retrieved by
bash when needed by adding a dollar-sign in front of the variable
name like this: $PW. For Perl, the variables are part of an
environment hash and can be retrieved specifically like this:
$ENV{PW}.
Getting back to the backup, for good measure, the user is then
asked if the MySQL tables may be locked before backing up the data.
This is to ensure that the data isn't changed during the backup. If
the user answers affirmatively, a table can be locked using
the bash variables like this:
$ mysql --user="$USR" --password="$PW" \
-e "LOCK TABLES sales_staff READ" $DB
$ mysql --user="$USR" --password="$PW" \
-e "FLUSH TABLES" $DB
These statements should be entered from the Linux command line, not
the
mysql client. The SQL statements (i.e., LOCK and
FLUSH) could be invoked by Perl using the DBI module. However, since a mysqldump statement will follow, we'll stay on the command line
for now.
The -e option indicates that what follows should be
executed in MySQL. Within the quotes is a LOCK statement:
the table that's to be locked is given; to lock more tables,
just list them right after the first table, with spaces in
between. A READ lock prevents users from writing to the
tables during the backup while still allowing them to read the
data. By the way, these command lines, like many in this article, are
wrapped for nicer display. However, they may be entered on the same
line or with a backslash as shown here at the end of each incomplete
line to let the shell know that more follows. Once the table is
locked, the upgrade program can back it up:
$ mysqldump --user="$USR" --password="$PW" \
$DB sales_staff > ./backup/sales_staff.sql
$ mysql --user="$USR" --password="$PW" \
-e "UNLOCK TABLES" $DB
The mysqldump utility creates a plain text file
containing a
CREATE TABLE statement and a series of INSERT
statements to restore the data if necessary. The table name is given here and
the output is redirected to the backup path and the backup filename.
Incidentally, --add-locks could have been added to the
mysqldump statement here instead of running the LOCK
and FLUSH statements before, and so an UNLOCK
statement would not have to be run later. Choose the approach you prefer.
If it's necessary to restore the data, the developer has written a
separate program called restore.plx to restore the data
(see last section below). In addition to backing up the data, the
upgrade program will also make copies of the Perl scripts that will be
replaced. This will be done using
tar like this:
tar -cvzf $UP_DIR/backup/scripts.tar.gz \
$SCRIPT_DIR/car_db/*
The c option creates an archive and the f
option a
tar file. The v option verbosely lists the files
being archived. The z option compresses the file with
gzip. Finally, the value of the $UP_DIR variable
contains the installation directory of the upgrade determined at the beginning.
The $SCRIPT_DIR is another environment variable that contains the
directory of the application's CGI pages.
|
To eliminate the redundancy in the sales_staff table,
the upgrade program first needs to run an ALTER TABLE
statement to add a column for the employee id.
For tidiness, the developer chose to add the emp_id
column after the sales_id column (the first and
primary key column):
ALTER TABLE sales_staff
ADD COLUMN emp_id INT
AFTER sales_id;
Next, the employee id number must be determined for each row. This can be messy and make the developer wish he had planned his application better. Nevertheless, that's in the past and it has to be dealt with now. To update most of the records, the upgrade program will run the following SQL statements:
CREATE TABLE temp_empid
(empid INT, salesid INT);
INSERT INTO temp_empid
SELECT employees.emp_id, sales_id
FROM employees, sales_staff
WHERE emp_first = rep_first
AND emp_last = rep_last;
UPDATE sales_staff, temp_empid
SET emp_id = empid
WHERE sales_id = salesid;
DROP TABLE temp_empid;
The first SQL statement creates a temporary
table, temp_empid to hold each employee's id number and
the related sales id. The second SQL statement inserts
into temp_empid each emp_id selected
from employees and the related sales_id
selected from sales_staff, where the first and last name
columns of each table match. The third SQL statement
updates sales_staff by setting the values of the new
column emp_id equal to the matching empid
column in temp_empid where the sales id columns of each
of these tables match. Finally, the fourth SQL statement deletes the
temporary table. All of these SQL statements could be replaced with
one statement without the use of a temporary table:
UPDATE sales_staff, employees
SET sales_staff.emp_id = employees.emp_id
WHERE emp_first = rep_first
AND emp_last = rep_last;
This will work, but being split apart into a set of three SQL statements like above can help to solve some data migration problems.
There are some potential problems with running the set of three SQL
statements above unchecked. First, before running
the UPDATE statement it's a good idea to extract a list
of rows from
temp_empid at least for where there are duplicates (if not for all
rows) so that the user can adjust the information before updating
sales_staff. The following SQL statement will help:
SELECT COUNT(sales_id), sales_id, CONCAT(rep_first, ' ', rep_last)
FROM employees, sales_staff
WHERE emp_first = rep_first
AND emp_last = rep_last
GROUP BY employees.emp_id;
A Perl program would use this statement to capture the count of
rows and the name and sales id of each sales representative. For
counts greater than one, the user would be given a list of
salespeople. By capturing
sales_id, the upgrade program can run another SELECT
statement to retrieve the entire related record from sales_staff
if requested, so that the user can have more data to associate rows in
temp_empid to salespeople. The programs, of course, will have to
keep circling back and querying the database again for duplicates until there
are none. Then the UPDATE statement can be run.
For rows in sales_staff where there are no matches
(probably due to different spellings of names), the user should be
asked to adjust the data. The simplest way to deal with these records
is to run the following SQL statement after the UPDATE
statement is run to select only records that were not updated
(where emp_id in sales_staff is
still NULL):
SELECT sales_id, CONCAT(rep_first, ' ', rep_last)
FROM sales_staff
WHERE emp_id IS NULL;
One potential issue with the all-in-one UPDATE
statement is it refers to multiple tables. This feature was added to
MySQL in version 4.0. Clients using an earlier version will need a
workaround. This requires getting Perl involved in performing the
updates instead of using a multi-table
UPDATE statement. Here's one way in which it could be done in Perl
after the CREATE TABLE and the INSERT statements are
run, but before the DROP statement:
#!/usr/bin/perl -w
use strict;
use DBI;
# Client Variables
my $db = $ENV{DB};
my $pw = $ENV{PW};
my $user = $ENV{USR};
# Connect to MySQL
my $dbh = DBI->connect( "DBI:mysql:$db:localhost", $user, $pw )
or die "Failed: " . DBI->errstr();
# Get rows from temp_empid
my $sql = "SELECT empid, salesid FROM temp_empid";
my $sth = $dbh->prepare($sql);
$sth->execute();
my $empids = $sth->fetchall_arrayref();
$sth->finish();
# Loop through array of temp_empid rows
# and update sales_staff table
foreach my $emp (@$empids) {
my ($empid,$salesid) = @$emp;
my $sql = "UPDATE sales_staff
SET emp_id = ?
WHERE sales_id = ?";
my $sth = $dbh->prepare($sql);
$sth->execute( $empid, $salesid );
$sth->finish();
}
$dbh->disconnect();
exit;
The opening lines of this script
(sales_staff_update.plx) use the Perl DBI module and set
up the client variables. The script then establishes a connection to
the MySQL server or dies if it cannot. Next, the program basically
gets all of the records at once from the temporary table. Each row of
the results, that is, each salesperson, is stored in an array,
itself stored in a comprehensive array. The
scalar $empids holds a reference to this parent array.
The script then loops over the salespeople (each row
of temp_empid), updating the sales_staff
table where the sales id numbers match. Once this is done, the script
disconnects from the MySQL server and exits. The script above is run
by the upgrade program like this:
perl sales_staff_update.plx 2> \
'./logs/sales_staff_update'
Any error messages are redirected to a text file which may be sent to the developer as needed.
The application's database has a table (makes_models)
that contains the names of hundreds of makes and models of
automobiles. This table can be joined to the inventory table by a
column called model_id. It saves the user from having to
type the make and model name for each automobile entered into
inventory. The client does, however, have the option of adding models
not already in the table. As part of the upgrade, the developer wants
to seize the opportunity not only to add new models to the client's
makes_models table, but also to get a copy of the user's additions
so that the developer can add them to his master table for future releases.
In preparation for the upgrade, the developer exported
makes_models from his master copy of the database using
mysqldump:
$ mysqldump --user='root' --password='abc123' \
--add-locks --no-create-info car_dealer \
makes_models > makes_models.sql
The --no-create-info option stops the CREATE
TABLE statement from being added to the output file. There is a
problem with the file that this will create, though. The developer
doesn't want to run
INSERT statements because they might cause duplicates (if not
errors). Instead, he wants to run REPLACE statements, which will
only insert a row if there is not already a row with the same primary key in
the table. If a row already exists, then MySQL will replace it with the
imported record. Unfortunately, there isn't an option with
mysqldump to generate REPLACE statements instead of
INSERT statements. But since the syntax of both statements are
identical, a simple Perl one-liner like the following can fix the exported file
before it's distributed to clients:
$ perl -p -i.bak -e 's/INSERT/REPLACE/;' makes_models.sql
The -p switch tells Perl to process each line of the
input file. The -i.bak here causes Perl to create a
backup file called
makes_models.sql.bak in the current directory. The original file
will be replaced. The -e switch tells Perl that what follows in
quotes is to be executed by Perl. The bit in quotes substitutes the first
occurrence of INSERT on each line with REPLACE.
With the makes_models.sql dump file properly adjusted,
the upgrade program only has to run it on the client's server to
update the client's makes_models table. After that has
been done, the upgrade program will then need to query the database
for user generated rows in
makes_models.
As mentioned before, the primary key of makes_models
is
model_id. This is not an automatically incremented numeric column.
Instead, it's a fixed length alphanumeric character column
(CHAR(9)). The first four characters are letters representing the
make and the last five are sequential numbers (e.g., FORD12345)
This is a little old fashioned, but the developer is more comfortable with this
method. When the user creates a new model, the application generates an
identifier with the letters USER followed by a five character
number. One advantage to this is that user created rows won't be overwritten by
updates. Another advantage is that it makes it easy to extract these rows. The
upgrade program will use the following SQL statement to create a temporary
table (exp_models) based on makes_models, select the
user added rows, and insert them into the temporary table:
CREATE TABLE exp_models
SELECT *
FROM makes_models
WHERE model_id LIKE 'USER%';
The first line here creates the temporary export table based on the types of
the columns selected from makes_models—in this case, all
(*) columns. As for the WHERE clause, the
% is a wildcard. Rows that start with USER and end in
any number of characters will be selected.
After this, the upgrade program merely needs to run mysqldump
to export the temporary table and then use the system's e-mail service to mail
it to the developer (preferably with the client's permission). Once this is
done, it can issue a DROP statement to delete the temporary
table.
With the table and data changes completed, the new and replacement
scripts can be installed. This is just a matter of copying files with
the
cp command. Once it's done, the user should be asked to verify the
integrity of the data and to make sure that the new scripts are working
properly. If the user responds that there are problems, the upgrade program
should offer to undo the upgrade and to restore the data. The scripts that are
tarred and gzipped in the backup directory could be restored like this:
cd $SCRIPT_DIR
tar -xvfz $UP_DIR/backup/scripts.tar.gz
The first line changes to the directory containing the
application's scripts. In the second line, the tar
command extracts all of the files backed up and restores them. In this
case, the z switch will
gunzip the files. The x switch will then
untar the file (the f switch). The v
option (for verbose) will allow the user to see what's happening. This is good
for reassuring the user. As for scripts that the upgrade added, the restoration
program can run a series of rm command statements to remove them
individually. All that will remain then is to restore the tables that were
tampered with and their data. This can be done by using the
mysqldump files that were created at the beginning:
mysql --user="$USR" --password="$PW" \
$DB < ./backup/sales_staff.sql
This command causes MySQL to run the SQL statements contained in
the text file created by mysqldump. After this, the
restoration program should e-mail the developer a message that the
upgrade failed, along with copies of the log files that the upgrade
generated.
If the upgrade was successful, though, the upgrade program could then delete the backup files and the installation program. However, it would be prudent to leave everything in place in case the client notices a problem with the upgrade a few days later. The files could be deleted during the next upgrade.
This article has only illustrated one table change and one reference table update. However, it has touched on many of the factors that need to be considered in a MySQL application upgrade. A real upgrade would simply be more of the same. As with most situations with databases, a database application upgrade is tedious and requires a tremendous amount of planning and testing. To save yourself plenty of grief, be prepared for failure by making backups and by developing a restoration script. This will minimize client complaints and make it easier to recover the client's data and to keep the client's patronage. You should also try to consider every deviation that the client might make, and assume that the clients have no technical abilities. The smoother the upgrade, the less follow-up telephone support that will be required. This will lead to more referrals and more money per hour for your work in developing your application.
Russell Dyer has worked full-time for several years as a free-lance writer of computer articles, primarily on MySQL.
Return to ONLamp.com.
Copyright © 2007 O'Reilly Media, Inc.