ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)
 See this if you're having trouble printing code examples


Pitfalls of Transactions with PHP

by Kimberlee Jensen
12/18/2003

Why Transactions?

Transactions -- a database management system (DBMS) feature that enables data recovery on internal errors -- raise significant controversy among developers. Some view transactions as a crutch, while others view them as a necessity.

Using transactions can save time and money, because transactions enable databases to restore themselves. But inappropriate use of transactions means wasted overhead and impeded performance. And transactions do not replace programming; sadly, some developers have foregone error trapping and data validation, thinking that transactions will catch all errors.

Transaction support in a DBMS or its table types is defined using the ACID test:

Initially, the MySQL DBMS did not support transactions. As of version 3.23-max, MySQL supports transactions with two additional table types, InnoDB and BDB. InnoDB is recommended for its CPU efficiency and support of row-level and multiple-concurrency locking. Now that MySQL has full transaction support, PHP developers need to incorporate them effectively. Although PHP 4.x lacks native transaction functions, transaction statements can be used the same way as standard SQL queries.

Related Reading

Web Database Applications with PHP, and MySQL
By Hugh E. Williams, David Lane

When to Use Transactions

To maximize the use of transactions, developers should follow these simple guidelines:

By supporting transactions, the DBMS provides an extreme level of reliability and drastically reduces the need for manual recovery and data integrity validation. However, any transaction, by definition, adds overhead because at some level, the DBMS must lock data and take a snapshot of the data at the outset. So transactions must be used appropriately.

Transactions Versus Errors

Developers can sometimes misunderstand the definition of a transaction failure. The transaction will fail when the database engine itself has an error that causes a query or queries not to complete. The most common misconception is that 0 rows returned or affected are considered a failure. For example, if a query of DELETE FROM tablex WHERE fieldy = 10 returns 0 affected rows, that is considered a successful query according to the transaction definition because the query completed and returned its affected rows. Whether the number of affected rows matches the expected result is a matter for programming, not transactions.

Some developers believe transactions mean the elimination of programming, even using transactions in lieu of programmatic data validation. This is wrong: transactions are intended to recover errors made internally by the database engine or externally by extreme measures such as a power outage. Transactions were never intended to account for user errors.

Web developers must trap errors because transactions do not prevent data from being lost between the database and the web application. Theoretically, a transaction could complete successfully, but the results of the transaction could fail to reach the web application. If the web application connects to the database server through a local host connection (using the UNIX socket), this is generally not an issue unless the socket becomes corrupted or disabled. However, if the web application connects to the database via the TCP/IP socket, a packet can be dropped in transmission. From the database standpoint, the transaction has succeeded, but if the web application has lost data, it has failed.

For example, the database could execute a SELECT query successfully and return a result set, but if those data rows are lost in the transfer to the web application, the application would show 0 as the number of rows. This is why before looping through a result set, a developer always needs to verify that mysql_num_rows(result identifier) is non-zero to make sure the web application has received the data successfully.

Assuming transactions cover all possible failures is a mistake. Programmers still need to program!

As well, database developers still need to plan. Before the programmers begin to code the web application, the database developers must make informed decisions about when to use a transaction-safe table. MyISAM table types typically execute SELECT queries much faster than InnoDB types. According to recent PHP Oracle/MySQL benchmarks, InnoDB table types handle more inserts per second than BDB table types.

For this example we will use InnoDB for our transaction-essential tables and MyISAM for tables not requiring transactions.

An Illustrated Example

Let's model a fictional e-commerce site that sells MP3 files. The database will need a table to store customer contact information and another table to represent each individual MP3 file the customer is purchasing. We will also assume the use of temporary data to hold the user's shopping cart information. We will then transfer the information to permanent tables upon checkout. Any shopping cart information that is not transferred to the permanent tables can be purged. We will use temporary tables instead of sessions to hold the shopping cart data because we want to keep statistics on user behavior, such as placing items in a shopping cart but not completing the transaction. (Once our fictional web site is as successful as Amazon, we may switch to sessions that hold shopping cart data).

Example Tables

The tables involved with the completion of an order will include:

Here are two of the proposed tables in the model:

The order_detail table:

CREATE TABLE order_detail (
    detail_ID INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    order_ID  INT UNSIGNED,
    mp3_ID    INT UNSIGNED)

The order table:

CREATE TABLE orders (
    order_ID      INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    customer_ID   INT UNSIGNED,
    order_date    TIMESTAMP(8),
    total         FLOAT (6, 2),
    temp_order_ID INT UNSIGNED)

Transactions in the Example

The pseudo-code for what transpires during checkout reads as such:

  1. Insert customer data into the customer table.
  2. Transfer data from the temporary order tables to the permanent order tables.
  3. Update the permanent order tables to reflect their relationships to the customer table.
  4. Delete data from the temporary order tables (if necessary).

This process exemplifies a transaction, as all stages contribute to a single goal — the completion of a sale. If the database crashes during a single step of this process, we immediately want to recover and restore the information. For example, if the customer information fails to load, we want to revert the database state back to the previous state, preserving the shopping cart data in the temporary order tables.

These queries must execute in the specified order. If the deletion of data from the temporary tables executes before the data is transferred to the permanent tables, the operation will fail. Using transaction-safe tables will prevent that problem. To ensure that these tables use the InnoDB handler, the developer can use the ALTER statement to change the table type after creation, or tack the table type on at the end of the CREATE statement. The latter is preferred.

CREATE TABLE orders (
    order_ID      INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    customer_ID   INT UNSIGNED,
    order_date    TIMESTAMP(8),
    total         FLOAT (6, 2),
    temp_order_ID INT UNSIGNED) Type=InnoDB

One database can use numerous table types. Assuming we are selling MP3 files legitimately for download, we can assume a table to hold the name, description, and price of each song. We will also assume separate linking tables to link songs to categories (assuming a many-to-many relationship), songs to artists, songs to CDs, and songs to record labels. These tables will be searched frequently. The process of adding to the inventory is done in batch format when traffic is slow. Also, since these are MP3 files, the purchasing of a song does not affect inventory; adding a song to a shopping cart does not impact the song table whatsoever. Therefore, these tables should be MyISAM, which processes SELECT queries the fastest. This the default type in MySQL as of version 3.23.

The programming logic can catch any SQL queries that fail or return no results. To make these SELECT queries that search the inventory part of a transaction would be a waste of overhead.

The best strategy for working with transactions is to apply the try/catch programming strategy from Java. Let's examine a model of troubleshooting with PHP.

Web application coding model
Figure 1. Web application coding model.

The Code

Now, for a code example:

Let's assume the customer is performing a keyword search and this code represents the results page:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Mega MP3 Store Search Results</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>

<body bgcolor="#FFFFCC" text="#000000">
<?php

// check to see that the form has been submtited and that the keyword field is
// empty
if (($_GET["Submit"] == "Search it!") &&
    (!empty($_GET["keywords"]))       &&
    (strlen($_GET["keywords"]) > 1))
{
    // If the user did not bother to type in any keywords, or typed in one
    // letter why bother with the rest of the code. Note, a regular expression
    // is also useful to validate the quality of the keywords escape any
    // characters that will affect the database
    $keywords = addslashes($_GET["keywords"]);

    // the @ symbol is warning suppression so a warning will not be thrown back
    // to the user, be careful not to over-rely on warning suppression, every
    // warning suppression should be modified with an if else to catch the
    // warning
    if ($connect = @mysql_connect("localhost", "specialwebuser",
                                  "good4pasSwor9d"))
    {
        mysql_select_db("music_for_sale");

        // we will keep the query simple and ignore the record labels and
        // genres for now

        $query = "SELECT songs.name, songs.description, songs.price, songs.ID,
                  artists.artist_group_name FROM songs
                  INNER JOIN artists ON songs.ID = artists.songID WHERE
                  ((songs.name LIKE '%".$keywords."%') OR
                  (songs.description LIKE '%".$keywords."%'))";

        // test for errors in the query, again, if this generates an SQL error,
        // do not display the error to the user, but rather trap it

        if ($result_query = @mysql_query($query, $connect))
        {

            // test if there are results returned by the query, if not,
            if (mysql_num_rows($result_query) > 0)
            {
                print ("<p> We found: " . mysql_num_rows($result_query) .
                    "match(es) to your search for <b>$keywords</b>lt;/p>

                while($row = mysql_fetch_array ($result_query))
                {

                // obviously you can use more sophisticated HTML than merely
                // paragraph formatting
                print ("<p>" . $row["title"] . " " . $row["description"] .
                    "</p>");

                } // end while - looping through the result set

            } // end if there is a result set
            else
            {
                print ("<p>We failed to find a match for <b>$keywords</b>lt;/p>

            } // end else - there were no results

        } // end if no error in the query
        else
        {
            // here you may also want to email details of the error to the
            // Webmaster, notice we take ownership of the blame, do not blame
            // the customer for a query error

            print ("<p>We are having technical difficulties, please
                contact our customer service at 1-800-555-5555.</p>");

        } // end else - query had an error
    } // end if connection was successful
    else
    {
        // here you may also want to email details of the error to the
        // Webmaster, noting the connection failed
        print ("<p>We are having technical difficulties, please
            contact our customer service at 1-800-555-5555.</p>");
    }
} // end if keywords are not empty
else
{
    // here you can blame the customer for not entering a keyword
    print ("<p>You did not enter any keywords, please enter
        something.</p>");
} // end else keywords are in fact empty

?>

<form action="results.php" method="get" name="search" id="search">
<input name="keywords" type="text" id="keywords">
<input type="submit" name="Submit" value="Search it!">
</form>
</body>
</html>

Some developers execute the try/catch concept by using exit or or die statements. This is a Perl and Unix-scripting strategy that is programmatically sound, but HTML faulty. For example, let's examine this code:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Mega MP3 Store Search Results</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>

<body bgcolor="#FFFFCC" text="#000000">

<?php

$connect = @mysql_connect ("localhost", "specialwebuser", "good4pasSwor9d")
    or die ("<p>We are having technical difficulties, please contact our
             customer service at 1-800-555-5555.</p>");

$query = "SELECT songs.name, songs.description, songs.price, songs.ID,
          artists.artist_group_name FROM songs
          INNER JOIN artists ON songs.ID = artists.songID WHERE
          ((songs.name LIKE '%".$keywords."%')
          OR (songs.description LIKE '%".$keywords."%'))";

// we will keep the query simple and ignore the record labels and genres for
// now
$result_query = @mysql_query($query, $connect)
    or die ("<p>We are having technical difficulties, please contact our
             customer service at 1-800-555-5555.</p>");

// more code to follow
?>

</body>
</html>

By using the exit statement here, the server never sends the </body> and </html> tags to the client browser if there is an error in the database connection or in the query. This breaks the rules of XHTML compliance and also hurts compliance with assistive technologies. As the industry moves further into client-side XML, developers should expect to make their code comply with these standards.

To come full circle with our strategy, let's examine code for performing our sale completion and incorporate transactions with PHP. Unlike languages such as Perl and Java, PHP does not provide a native API for transactions, so developers must use existing functions in the MySQL API. We will include some pseudo-code and eliminate the surrounding HTML for brevity.

<?php

// we will assume that we have client-side validation with JavaScript, but that
// our site may be accessed with non-JavaScript browsers such as Lynx, we will
// also assume our customers table has three required fields: first_name,
// last_name, and email

$form_error="";

if (empty($_POST["first_name"])
{
    $form_error .= "first name";
}

if (empty($_POST["last_name"])
{
    $form_error .= "last name";
}

if (!preg_match("/([\w]{1,}[@]{1}[\w]{1,}[\.]{1}[\w]{2,})/", $_POST["email"]))
{
    // we'll perform a very basic email validation
    $form_error .= "email";
}

if ($form_error == "")
{
    // if the user completed the form to our minimal standards

    if ($connect = @mysql_connect("localhost", "specialwebuser",
        "good4pasSwor9d"))
    {
        mysql_select_db("music_for_sale");

        // make sure we flag the database to not commit after each executed
        // query
        $query        = "SET AUTOCOMMIT=0";
        $result_query = @mysql_query($query, $connect);

        $query        = "BEGIN";
        $result_query = @mysql_query($query, $connect);

        // generally the BEGIN statement will not cause an SQL error, so
        // trapping of this query is not essential

        // initialize a flag to true, and if a single query fails to return the
        // correct affected rows, or if it generates an error, we will set the
        // flag to false
        $success = true;

         // note that the insert into customers could be outside of the
         // transaction, we may want to capture that data regardless of whether
         // or not the sale is complete, so this query could be executed
         // outside of the transaction
        $query = "INSERT INTO customers (first_name, last_name, email)
                  VALUES ('$first_name,' '$last_name', '$email'))";

        // test for errors in the query, again, if this generates an SQL error,
        // do not display the error to the user, but rather trap it
        if ($result_query = @mysql_query($query, $connect)) 
        {
            // verify the INSERT statement took effect
            if (mysql_affected_rows($connect) == 1)
            { 
                /* Next we will use pseudo-code for the following operations.
                We could choose to nest all the queries, but instead we will
                take advantage of transactions and error trapping. We do want
                to trap an error and point out to the user that the transaction
                was not complete. We want to tell the user ourselves, rather
                than have the database generate a warning or PHP generate a
                syntax error.

                * Use SELECT LAST_INSERT_ID() to capture the customer ID number
                generated by virtual of being an AUTO_INCREMENT field

                * Assign the customer ID to a scalar variable

                * SELECT the values from the temp_orders table matching the
                temp order ID to the cookie

                * INSERT that information into the orders table, which
                represents the permanent orders

                * Use SELECT LAST_INSERT_ID() to capture the order ID, assign
                it to a scalar variable, so that you can email it to the
                customer and print it out as a tracking number for the customer

                * SELECT the values from the temp_order_details table matching
                the temp order ID to the cookie

                * INSERT that information into the order_details table

                * DELETE from the temp_order_details table

                * DELETE from the temp_orders table

                To execute each query we will use this syntax, which will
                enable us to set a flag if there is an error. This is essential
                because a query that completes execution but returns 0 rows
                affected is not considered a failure in terms of transactions.

                $result_query = @mysql_query($query, $connect);
                if (($result_query == false) &&
                   (mysql_affected_rows($connect) == 0))
                 {
                    // verify the query executed completely and verify that it
                    // had impact on the table

                    $success = false;

                    // here also, the developer could choose to add a ROLLBACK
                    // statement
                }
            */
            } // end if affected rows is 1 for the INSERT statement
        } // end if no error in the insert query query
        else
        {
               print ("<p>We are having technical difficulties, please
                    contact our customer service at 1-800-555-5555.</p>");
        }// end else - problems with the query
        if ($success == true)
        {
            $query = "COMMIT";
            $result_query = @mysql_query($query, $connect)
        } // end if success is true
        else
        {
            print ("<p>We are having technical difficulties, please
                    contact our customer service at 1-800-555-5555.</p>");
        } // end else - the success flag is false
    } // end if connection is successful
    else
    {
        print ("<p>We are having technical difficulties, please contact
                our customer service at 1-800-555-5555.</p>");
        // here you may also want to email details of the error to the Webmaster
    }
} // end if form validation passed
else
{
    // give the user some instructions on how to return to the form, also pass
    // along back to the form data via the query string to be used in
    // generating an error message on the form page
    print ("<p>Your forgot to provide us with some information, there
            was a problem with the following field(s): $form_error. Please
            either hit the back button on your browser or <a href =
            \"form.php?error=".urlencode($form_error)."\">click
            here</a> to return to the form .</p>");
}
?>

Error Handling with Transactions

Using the "if/else" structures to trap errors may seem like extraneous code, but a good developer will use include files and functions to break the code into reusable modules. To further modularize this code, a good developer will create functions and include files. For example, one function could be the error message shown to the user and an email alert to the site administrator:

function show_error($error_code)
{
    print ("<p>We are having technical difficulties, please contact our
            customer service at 1-800-555-5555.</p>");
    mail("websiteguru@mysite.com ", "error with database", $error_code,
            "From: site@mysite.com");
} // end function

By calling this function, the else blocks can simply read:

else
{
    show_error("connection failed");
}

Include files could also be used to hold the connection parameters so that when the user names and passwords change every six months, the information is updated in one spot. Note that any include file should be named .php not .inc so that the code is never seen as plain text by a browser (or you must change your web server configuration).

Additionally, any include file that holds the user name and password should include a check to see how it is being loaded. If the REQUEST_URI is equal to the same name as the include file, then redirect to the home page so that the user cannot see the file.

Conclusion

There is a significant risk of data loss when sending data to and from a socket (UNIX or TCP/IP). Transactions again will not recover that. Transactions can only recover when the DBMS fails, but once the queries have completed, the transaction completes.

Whether the data reaches the application successfully is a matter of programming. With error trapping, the developer can force a rollback if a single piece of the information fails to reach the application. Also, the developer should always write custom error messages. Printing out the errors generated from the DBMS using mysql_error() is not only completely useless to the site visitor, but also a potential security risk to the database.

Transactions offer wonderful reliability for mission-critical operations where not only is the data at risk of being corrupted or lost, but also the loss of data would severely jeopardize the company's business. Transactions are best used in conjunction with data validation and error trapping on behalf of the programmer. Because MySQL offers the flexibility among table types, developers should choose the transaction-safe and non-transaction safe tables appropriately.

Kimberlee Jensen is a freelance Web developer and an IT instructor at Seattle Central Community College.


Return to PHP DevCenter.

Copyright © 2009 O'Reilly Media, Inc.