 Published on ONLamp.com (http://www.onlamp.com/)


Autofilled PHP Forms

by Gavin Andresen
03/16/2006

I hate typing, but I like writing code. A while ago I started to dread all the rote typing required to handle forms in PHP, and began looking for a Better Way. This article describes how I used PHP's regular expression functions to do most of the heavy lifting required to process forms properly, saving lots of typing and giving me time to do stuff I don't hate, such as writing code (and playing NetHack).

The Problem

Jeff Cogswell described the general problem in his article User-Friendly Form Validation with PHP and CSS: you have to display the form, validate the input, and then either display some sort of thank-you page or, if validation fails, redisplay the form with any errors shown along with the values that the user entered. Because I generally code forms for clients who are paying lots of money for a snazzy-looking web site, the forms also have to look nice and match the appearance of the rest of the site.

The Tedious Typing Solution

The most straightforward way to solve the problem is to create a nice-looking form in your favorite WYSIWYG HTML editor and then insert bits of PHP code to display form values and error messages in all the right spots. For example, if the form has a required field named "email," I use PHP code like this to validate it:

$validationData          = array();
$validationData['email'] =
  array('isRequired', 'type' => 'emailAddress');

$formErrors = validateForm($_POST, $validationData);

That's not too bad--I can write the validateForm() function once and use it over and over again, extending it whenever I need to deal with another type of value.
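The article's own validateForm() lives in the downloadable source and isn't shown here. What follows is only a hypothetical sketch of what such a function might look like, assuming scalar field values, PHP 5.2 or later (for filter_var), and rule arrays shaped like the one above. The error message wording is made up for illustration.

```php
<?php
// Hypothetical sketch of validateForm(); not the version shipped with the article.
function validateForm($input, $validationData)
{
    $errors = array();
    foreach ($validationData as $field => $rules) {
        // Assumes scalar values; array-valued fields would need extra handling.
        $value = isset($input[$field]) ? trim($input[$field]) : '';

        // 'isRequired' is a flag stored as a plain array value.
        if (in_array('isRequired', $rules, true) && $value === '') {
            $errors[$field] = "$field is required.";
            continue;
        }

        // 'type' => '...' selects a format check; only email addresses here.
        $type = isset($rules['type']) ? $rules['type'] : null;
        if ($value !== '' && in_array($type, array('email', 'emailAddress'), true)
            && filter_var($value, FILTER_VALIDATE_EMAIL) === false) {
            $errors[$field] = "$field is not a valid email address.";
        }
    }
    return $errors; // Empty array means the form passed validation.
}
```

Supporting another type of value then means adding one more branch next to the email check.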

Redisplaying the form with errors and correct values is where things get messy. Some simple HTML like:

<td align="right">Email:</td>
<td><input name="email" value="" /></td>

becomes a maze of HTML and PHP:

<td align="right"
<?php if (isset($formErrors['email'])) {
          echo 'class="error"';
      } ?> >
Email:</td>
<td><input name="email"
<?php if (isset($_POST['email'])) {
          echo 'value="'.htmlspecialchars($_POST['email']).'"';
      } ?> /></td>

Repeating that for every field in the form is annoying. Repeating it for every <option> value in a 50-value Select Your State drop-down list is almost painful enough to make me give up coding and become a plumber.

The Problem, Restated

Start with an HTML form containing no PHP code. You have PHP arrays containing the values that should be shown and any form validation errors. You want a PHP function that takes the arrays and HTML, and returns the HTML modified a little bit so it shows all the right stuff--give it <input name="email"> and $_POST['email'] = "gavin@mailinator.com", and it should return <input name="email" value="gavin@mailinator.com">. After searching the PHP manual and the Web, I couldn't find a function that did that, so I wrote it myself and called it fillInFormValues().

fillInFormValues Details

fillInFormValues takes three arguments: fillInFormValues($html, $values, $errors)

  1. $html contains the HTML markup (in a PHP string) for the form. fillInFormValues isn't fussy about the HTML you pass in; it will handle HTML3, HTML4, or XHTML, and it doesn't care if you pass in an entire page's worth of HTML or just HTML fragments containing form input fields. fillInFormValues does require that you pass in valid HTML; it isn't as forgiving as some web browsers, so don't pass in HTML with missing </textarea> tags.
  2. $values is a PHP array containing the values that the form should show, where $values['fieldName'] = fieldValue. If you're redisplaying the form after a form validation error, you can just pass in $_POST or $_GET or $_REQUEST.
  3. $errors is a PHP array containing validation errors, where $errors['fieldName'] = "error message". fillInFormValues looks for a <ul class="error"></ul> element in the HTML and inserts the error message into it. Use HTML <label> tags to mark up your form text:

    <label for="address">Street Address:</label>
    <input name="address" id="address" />

    If $errors['address'] is set, fillInFormValues will add class="error" to the corresponding <label>. Defining a simple CSS rule like label.error { color: red; } will then make the labels of erroneous fields bright red. Using <label> tags also makes your forms more accessible to people using screen readers or other accessibility aids and also tells the web browser to put the cursor in the address field if the user clicks on the Street Address text, making the form easier to use for everybody. Note that labels match to input fields using the id attribute, not the name attribute (though I always give my form fields the same name and ID so I don't confuse myself).

fillInFormValues returns a string that is $html modified as little as possible to display the $values and $errors. If you pass in empty arrays, it returns $html unchanged, which makes it easier to display the form the first time. Here's a complete working example:

<?php
ob_start();
?>
<html>
<head>
<title>fillInFormValues: short example</title>
<style>
.error { color: red; }
</style>
</head>
<body>
<h1>Sign up for our newsletters</h1>
<ul class="error"><li>PLACEHOLDER FOR FORM ERRORS</li></ul>
<form action="<?php echo $_SERVER['PHP_SELF']; ?>" method="GET">
<table border="0">
<tr>
 <td><label for="email">Your Email Address:</label></td>
 <td><input type="text" name="email" id="email"></td>
</tr>
<tr>
 <td><label for="newsletter">Sign up for these newsletters:</label></td>
 <td>
 <input type="checkbox" name="news" id="news">
  <label for="news">News</label><br />
 <input type="checkbox" name="security" id="security">
  <label for="security">Security Notices</label><br />
 <input type="checkbox" name="specials" id="specials">
  <label for="specials">Specials</label>
 </td>
</tr>
<tr><td> </td>
 <td><input type="submit" name="submit" value="Sign Up"></td>
</tr>
</table>
</form>
</body>
</html>

<?php
$html = ob_get_contents();
ob_end_clean();

require_once("fillInFormValues.php");
require_once("validateForm.php");

$request = (get_magic_quotes_gpc() ?
             array_map('stripslashes', $_REQUEST) : $_REQUEST);

$validationData['email'] = array('isRequired', 'type' => 'email');
if (isset($request['submit'])) {
  $formErrors = validateForm($request, $validationData);
  if (count($formErrors) == 0) {
    // Normally there would be code here to process the form
    // and redirect to a thank you page...
    // ... but for this example, we just always re-display the form.
    $info = "No errors; got these values:".
      nl2br(htmlspecialchars(print_r($request, 1)));
    $html = preg_replace('/<body>/', "<body><p>$info</p>", $html);
  }
}
else {
  $formErrors = array();
}

echo fillInFormValues($html, $request, $formErrors);

?>

I used the PHP output buffering routines (ob_start/ob_get_contents/ob_end_clean) to get the page's HTML into a PHP string to then pass to fillInFormValues. Apart from the $validationData array, which you fill in for each form, this same PHP code will work for any form, even the great big ugly ones on your insurance company's web site.
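Stripped down to the buffering part alone, the technique looks roughly like this. It is a minimal sketch with a made-up form; it uses ob_get_clean(), which is a one-call shortcut for the ob_get_contents/ob_end_clean pair used above.

```php
<?php
// Start capturing output instead of sending it to the browser.
ob_start();
?>
<form method="post">
  <input type="text" name="email" id="email">
</form>
<?php
// ob_get_clean() returns the buffered markup and discards the buffer.
$html = ob_get_clean();
```

At this point nothing has reached the browser yet, and $html holds the form markup as an ordinary string, ready to be modified and echoed.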

Implementing fillInFormValues

fillInFormValues uses preg_replace_callback to find and replace the bits of HTML that need modifying. For example, here's the regular expression used to find <label> tags that might need class="error" added to them:

/<label([^>]*)>/i

Reading from left to right, the first / just starts the regular expression. <label matches exactly that. The gobbledygook in parentheses matches anything except a > character. The > after the parentheses matches that character, and the /i ends the regular expression--the i makes it case-insensitive so it finds <LABEL ...> or <label ...>.
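To see what that pattern picks up, here is a small sketch that runs it through preg_match_all. The sample HTML is made up for illustration.

```php
<?php
$html = '<LABEL for="email">Email:</LABEL> <label for="zip" class="x">Zip:</label>';

preg_match_all('/<label([^>]*)>/i', $html, $matches);

// $matches[0] holds each whole opening tag:
//   '<LABEL for="email">' and '<label for="zip" class="x">'
// $matches[1] holds what the parentheses captured (the attribute text):
//   ' for="email"' and ' for="zip" class="x"'
// The closing </LABEL> and </label> tags start with "</", so they don't match.
```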

This is the quick-and-dirty way to parse HTML. For example, it doesn't take into account tags that are surrounded by HTML comments, although in this case it's harmless if commented-out label tags change. It can also get confused if you pass it HTML4 code with bizarre attribute values like <input value="<label for='foo'>">. Don't do that. If you need a real HTML parser, use XML_HTMLSax.

Now the code only needs a callback function that looks at the label tag and just returns it unchanged or returns it with class="error" inserted:

function fillInLabel($matches)
{
  global $formErrors;
  global $idToNameMap;

  $tag = $matches[0];

  $for = getAttributeVal($tag, "for");
  if (empty($for) or !isset($idToNameMap[$for])) { return $tag; }
  $name = $idToNameMap[$for];
  if (array_key_exists($name, $formErrors)) {
    return replaceAttributeVal($tag, 'class', 'error');
  }
  return $tag; // No error.
}

Callback functions passed to preg_replace_callback always have one argument--an array of stuff matched by the regular expression. $matches[0] is the entire match; $matches[1] is the stuff matched by the first set of parentheses; and so on. In this case, $matches[0] is the entire <label> tag. The code gets the label's for attribute and sees whether it corresponds to an entry in the $formErrors array; if so, it replaces the label's class attribute with class="error". If the label doesn't correspond to a form error, then the function returns the tag unchanged. I've used global variables to pass in the extra information that fillInLabel() needs.
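Here is a self-contained sketch of the same mechanics. It is not the article's fillInLabel(): it assumes PHP 5.3 or later and uses a closure with a hypothetical $badIds list instead of globals, and it only understands double-quoted for attributes.

```php
<?php
// Hypothetical list of field ids that failed validation.
$badIds = array('email');

$markLabel = function ($matches) use ($badIds) {
    // $matches[0] is the whole tag; $matches[1] is the text captured by ([^>]*).
    if (preg_match('/\bfor="([^"]*)"/i', $matches[1], $m)
        && in_array($m[1], $badIds, true)) {
        return '<label' . $matches[1] . ' class="error">';
    }
    return $matches[0]; // Not an error field: leave the tag alone.
};

$html = '<label for="email">Email:</label> <label for="zip">Zip:</label>';
$result = preg_replace_callback('/<label([^>]*)>/i', $markLabel, $html);
// $result:
// <label for="email" class="error">Email:</label> <label for="zip">Zip:</label>
```

Whatever the callback returns replaces the matched text, so returning $matches[0] is how a callback says "no change."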

The patterns and callbacks for <input>, <select>, <textarea>, and <ul class="error"> are similar, just more complicated. The <select> callback is the trickiest--it uses preg_replace_callback recursively to find and replace <option> tags between the <select> and </select> tags. The regular expression for <input> tags is hairy because values like this are legal HTML 4:

<input name="foo" value="hello <smile>">

The > character in the quoted value means that the "anything besides a >" ([^>]*) regular expression won't work to get the guts of the input tag. Note that a raw < isn't legal inside an XML/XHTML attribute value; it must be encoded as &lt; (and > is normally encoded as &gt; too). Life will be so much simpler when HTML 4 is obsolete.
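The difference is easy to demonstrate. The sketch below compares the naive pattern with a simplified quote-aware one. The second pattern is an illustration written for this example, not the pattern that fillInFormValues actually builds.

```php
<?php
$html = '<input name="foo" value="hello <smile>">';

// Naive pattern: stops at the first >, which sits inside the quoted value.
preg_match('/<input[^>]*>/i', $html, $naive);
// $naive[0] is '<input name="foo" value="hello <smile>' (tag cut short)

// Quote-aware pattern: the tag body is a run of double-quoted strings,
// single-quoted strings, or single characters that are neither quotes nor >.
preg_match('/<input(?:"[^"]*"|\'[^\']*\'|[^\'">])*>/i', $html, $aware);
// $aware[0] is the entire tag, including the > inside the value.
```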

The getAttributeVal and replaceAttributeVal functions use another powerful PHP regular expression function--preg_match_all. They use this regular expression to match attributes inside HTML tags:

/(\w+)((\s*=\s*".*?")|(\s*=\s*'.*?')|(\s*=\s*\w+)|())/s

It isn't quite as incomprehensible as it looks at first glance--for example, if passed a string like:

name="foo" value='123' style=purple checked

the first part of the regular expression (\w+) will match name, value, style, and checked--it matches one or more "word" characters. The rest of the regular expression matches one of the four ways you can specify attribute values in HTML. (\s*=\s*".*?") matches ="foo"; (\s*=\s*'.*?') matches ='123'; (\s*=\s*\w+) matches =purple; and () allows the checked attribute to match even though it's not followed by any value at all.

Because (\w+) is greedy and the alternatives are tried from left to right, with the empty () as a last resort, preg_match_all with the above regular expression will do exactly what you want, returning four matches when passed name="foo" value='123' style=purple checked.
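Running the pattern against that sample string shows the four matches. This is just a quick check of the regular expression on its own, outside of any helper function.

```php
<?php
$attrs = 'name="foo" value=\'123\' style=purple checked';

preg_match_all('/(\w+)((\s*=\s*".*?")|(\s*=\s*\'.*?\')|(\s*=\s*\w+)|())/s',
               $attrs, $matches, PREG_PATTERN_ORDER);

// $matches[1] is the attribute names:
//   name, value, style, checked
// $matches[2] is the raw values, still carrying the = sign and any quotes:
//   ="foo", ='123', =purple, and an empty string for checked
```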

Here's the complete code for getAttributeVal:

  /**
   * Returns value of $attribute, given guts of an HTML tag.
   * Returns false if attribute isn't set.
   * Returns empty string for no-value attributes.
   *
   * @param string $tag  Guts of HTML tag, with or without the <tag and >.
   * @param string $attribute E.g. "name" or "value" or "width"
   * @return string|false Returns value of attribute (or false)
   */
  function getAttributeVal($tag, $attribute) {
    $matches = array();
    // This regular expression matches attribute="value" or
    // attribute='value' or attribute=value or attribute
    // It's also constructed so $matches[1][...] will be the
    // attribute names, and $matches[2][...] will be the
    // attribute values.
    preg_match_all('/(\w+)((\s*=\s*".*?")|(\s*=\s*\'.*?\')|(\s*=\s*\w+)|())/s',
                   $tag, $matches, PREG_PATTERN_ORDER);

    for ($i = 0; $i < count($matches[1]); $i++) {
      if (strtolower($matches[1][$i]) == strtolower($attribute)) {
        // Gotta trim off whitespace, = and any quotes:
        $result = ltrim($matches[2][$i], " \n\r\t=");
        if ($result !== '' && $result[0] == '"') { $result = trim($result, '"'); }
        else { $result = trim($result, "'"); }
        return $result;
      }
    }
    return false;
  }

Passing PREG_PATTERN_ORDER to preg_match_all returns the attribute names in $matches[1][$i] and the attribute values in $matches[2][$i], which is exactly what you want. replaceAttributeVal's code is very similar, except it passes in PREG_OFFSET_CAPTURE (available in PHP 4.3.0 or later) to determine where in the string all the attributes are, and then uses substr_replace to either replace an existing value or add the value to the HTML tag.
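The offset-and-splice idea can be sketched in a few lines. This example uses a deliberately simple pattern that only handles a double-quoted value attribute, so treat it as an illustration of PREG_OFFSET_CAPTURE and substr_replace rather than a stand-in for replaceAttributeVal.

```php
<?php
$tag = '<input name="email" value="old">';

// With PREG_OFFSET_CAPTURE each match becomes array(text, byte offset).
preg_match('/value\s*=\s*"[^"]*"/i', $tag, $m, PREG_OFFSET_CAPTURE);
list($text, $offset) = $m[0];   // 'value="old"' found at offset 20

// Build the replacement, escaping the new value so it can't break the tag.
$new = 'value="' . htmlspecialchars('a "quoted" <value>') . '"';

// Splice it over the old attribute: start at $offset, cover strlen($text) bytes.
$result = substr_replace($tag, $new, $offset, strlen($text));
// $result: <input name="email" value="a &quot;quoted&quot; &lt;value&gt;">
```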

Tidying Up

The first version of fillInFormValues() put its arguments into global variables so the callback functions that do the real work could get to them. Yuck. All those callback functions clutter up the PHP function namespace, too; double yuck.

There's a straightforward fix--encapsulate the arguments and callback functions in a "helper" class, and then use the array(&$this, "function") callback syntax supported by all the PHP functions that have callback arguments. fillInFormValues() creates a helper object, then calls a method on that object to do all the work:
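In miniature, the pattern looks like the sketch below. The ValueFiller class and its toy regular expression are hypothetical and written in PHP 5 style (visibility keywords, __construct, and array($this, ...) without the &, which PHP 5 and later no longer need); the real helper class follows.

```php
<?php
// Hypothetical helper: carries $values so the regex callback needs no globals.
class ValueFiller
{
    private $values;

    public function __construct(array $values)
    {
        $this->values = $values;
    }

    // Callback: receives the match array and reads its state from $this.
    public function fillTag($matches)
    {
        $name = $matches[1];
        if (!array_key_exists($name, $this->values)) {
            return $matches[0]; // No value for this field: leave tag alone.
        }
        $value = htmlspecialchars($this->values[$name]);
        return '<input name="' . $name . '" value="' . $value . '">';
    }

    public function fill($html)
    {
        // array($this, 'fillTag') is a callable pointing at an instance method.
        return preg_replace_callback('/<input name="(\w+)">/',
                                     array($this, 'fillTag'), $html);
    }
}

$filler = new ValueFiller(array('email' => 'gavin@mailinator.com'));
$result = $filler->fill('<input name="email"> <input name="zip">');
// $result: <input name="email" value="gavin@mailinator.com"> <input name="zip">
```

Everything the callback needs travels with the object, so nothing leaks into the global namespace.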

function fillInFormValues($formHTML, $request = null, $formErrors = null)
{
  if ($request === null) {
    // magic_quotes on: gotta strip slashes:
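    if (get_magic_quotes_gpc()) {
      // Guard the definition so calling fillInFormValues() twice
      // doesn't try to redeclare the function:
      if (!function_exists('stripslashes_deep')) {
        function stripslashes_deep($val) {
          return is_array($val) ? array_map('stripslashes_deep', $val)
            : stripslashes($val);
        }
      }
      $request = stripslashes_deep($_REQUEST);
    }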
    else {
      $request = $_REQUEST;
    }
  }
  if ($formErrors === null) { $formErrors = array(); }

  $h = new fillInFormHelper($request, $formErrors);
  return $h->fill($formHTML);
}

/**
 * Helper class, exists to encapsulate info needed between regex callbacks.
 */
class fillInFormHelper
{
  var $request;  // Normally $_REQUEST, passed into constructor
  var $formErrors;
  var $idToNameMap; // Map form element ids to names

  function fillInFormHelper($r, $e)
  {
    $this->request = $r;
    $this->formErrors = $e;
  }

  function fill($formHTML)
  {
    $s = fillInFormHelper::getTagPattern('input');
    $formHTML = preg_replace_callback("/$s/is",
       array(&$this, "fillInInputTag"), $formHTML);

    // Using simpler regex for textarea/select/label, because in practice
    // they never have >'s inside them:
    $formHTML = preg_replace_callback('!(<textarea([^>]*>))(.*?)(</textarea\s*>)!is',
       array(&$this, "fillInTextArea"), $formHTML);

    $formHTML = preg_replace_callback('!(<select([^>]*>))(.*?)(</select\s*>)!is',
       array(&$this, "fillInSelect"), $formHTML);

    // Form errors:  tag <label> with class="error", and fill in
    // <ul class="error"> with form error messages.
    $formHTML = preg_replace_callback('!<label([^>]*)>!is',
       array(&$this, "fillInLabel"), $formHTML);
    $formHTML = preg_replace_callback('!<ul class="error">.*?</ul>!is',
       array(&$this, "getErrorList"), $formHTML);
    
    return $formHTML;
  }

  /**
   * Returns a pattern to match a given HTML/XHTML/XML tag.
   * NOTE: Set up so only the whole expression is captured
   * (subpatterns use (?: ...) so they don't capture).
   * Inspired by http://www.cs.sfu.ca/~cameron/REX.html
   *
   * @param string $tag  E.g. 'input'
   * @return string $pattern
   */
  function getTagPattern($tag)
  {
    $p = '(';  // This is a hairy regex, so build it up bit-by-bit:
    $p .= '(?is-U)'; // Set options: case-insensitive, . matches newlines, greedy
    $p .= "<$tag";  // Match <tag
    $sQ = "(?:'.*?')"; // Attr val: single-quoted...
    $dQ = '(?:".*?")'; // double-quoted...
    $nQ = '(?:\w*)'; // or not quoted at all, but no wacky characters.
    $attrVal = "(?:$sQ|$dQ|$nQ)"; // 'value' or "value" or value
    $attr = "(?:\s*\w*\s*(?:=$attrVal)?)"; // attribute or attribute=
    $p .= "(?:$attr*)"; // any number of attr=val ...
    $p .= '(?:>|(?:\/>))';  // End tag: > or />
    $p .= ')';
    return $p;
  }

  /**
   * Returns value of $attribute, given guts of an HTML tag.
   * Returns false if attribute isn't set.
   * Returns empty string for no-value attributes.
   * 
   * @param string $tag  Guts of HTML tag, with or without the <tag and >.
   * @param string $attribute E.g. "name" or "value" or "width"
   * @return string|false Returns value of attribute (or false)
   */
  function getAttributeVal($tag, $attribute) {
    $matches = array();
    // This regular expression matches attribute="value" or
    // attribute='value' or attribute=value or attribute
    // It's also constructed so $matches[1][...] will be the
    // attribute names, and $matches[2][...] will be the
    // attribute values.
    preg_match_all('/(\w+)((\s*=\s*".*?")|(\s*=\s*\'.*?\')|(\s*=\s*\w+)|())/s',
                   $tag, $matches, PREG_PATTERN_ORDER);

    for ($i = 0; $i < count($matches[1]); $i++) {
      if (strtolower($matches[1][$i]) == strtolower($attribute)) {
        // Gotta trim off whitespace, = and any quotes:
        $result = ltrim($matches[2][$i], " \n\r\t=");
        if ($result !== '' && $result[0] == '"') { $result = trim($result, '"'); }
        else { $result = trim($result, "'"); }
        return $result;
      }
    }
    return false;
  }
  /**
   * Returns new guts for HTML tag, with an attribute replaced
   * with a new value.  Pass null for new value to remove the
   * attribute completely.
   * 
   * @param string $tag  Guts of HTML tag.
   * @param string $attribute E.g. "name" or "value" or "width"
   * @param string $newValue
   * @return string
   */
  function replaceAttributeVal($tag, $attribute, $newValue) {
    if ($newValue === null) {
      $pEQv = '';
    }
    else {
      // htmlspecialchars here to avoid potential cross-site-scripting attacks:
      $newValue = htmlspecialchars($newValue);
      $pEQv = $attribute.'="'.$newValue.'"';
    }

    // Same regex as getAttribute, but we wanna capture string offsets
    // so we can splice in the new attribute="value":
    preg_match_all('/(\w+)((\s*=\s*".*?")|(\s*=\s*\'.*?\')|(\s*=\s*\w+)|())/s',
                   $tag, $matches, PREG_PATTERN_ORDER|PREG_OFFSET_CAPTURE);

    for ($i = 0; $i < count($matches[1]); $i++) {
      if (strtolower($matches[1][$i][0]) == strtolower($attribute)) {
        $spliceStart = $matches[0][$i][1];
        $spliceLength = strlen($matches[0][$i][0]);
        $result = substr_replace($tag, $pEQv, $spliceStart, $spliceLength);
        return $result;
      }
    }

    if (empty($pEQv)) { return $tag; }

    // No match: add attribute="newval" to $tag (before closing tag, if any):
    $closed = preg_match('!(.*?)((>|(/>))\s*)$!s', $tag, $matches);
    if ($closed) {
      return $matches[1] . " $pEQv" . $matches[2];
    }
    return "$tag $pEQv";
  }

  /**
   * Returns modified <input> tag, based on values in $request.
   * 
   * @param array $matches
   * @return string Returns new guts.
   */
  function fillInInputTag($matches) {
    $tag = $matches[0];

    $type = fillInFormHelper::getAttributeVal($tag, "type");
    if (empty($type)) { $type = "text"; }
    $name = fillInFormHelper::getAttributeVal($tag, "name");
    if (empty($name)) { return $tag; }
    $id = fillInFormHelper::getAttributeVal($tag, "id");
    if (!empty($id)) { $this->idToNameMap[$id] = $name; }

    switch ($type) {
      /*
       * Un-comment this out at your own risk (users shouldn't be
       * able to modify hidden fields):
       *    case 'hidden':
       */
    case 'text':
    case 'password':
      if (!array_key_exists($name, $this->request)) {
        return $tag;
      }
      return fillInFormHelper::replaceAttributeVal($tag, 'value', $this->request[$name]);
      break;
    case 'radio':
    case 'checkbox':
      $value = fillInFormHelper::getAttributeVal($tag, "value");
      if (empty($value)) { $value = "on"; }

      if (strpos($name, '[]')) {
        $name = str_replace('[]', '', $name);
      }

      if (!array_key_exists($name, $this->request)) {
        return fillInFormHelper::replaceAttributeVal($tag, 'checked', null);
      }
      $vals = (is_array($this->request[$name])?$this->request[$name]:array($this->request[$name]));

      if (in_array($value, $vals)) {
        return fillInFormHelper::replaceAttributeVal($tag, 'checked', 'checked');
      }
      return fillInFormHelper::replaceAttributeVal($tag, 'checked', null);
    }
    return $tag;
  }
  /**
   * Returns modified <textarea...> tag, based on values in $request.
   * 
   * @param array $matches
   * @return string Returns new value.
   */
  function fillInTextArea($matches) {
    $tag = $matches[1]; // The <textarea....> tag
    $val = $matches[3]; // Stuff between <textarea> and </textarea>
    $endTag = $matches[4]; // The </textarea> tag

    $name = fillInFormHelper::getAttributeVal($tag, "name");
    if (empty($name)) { return $matches[0]; }
    $id = fillInFormHelper::getAttributeVal($tag, "id");
    if (!empty($id)) { $this->idToNameMap[$id] = $name; }

    if (!array_key_exists($name, $this->request)) { return $matches[0]; }
    return $tag.htmlspecialchars($this->request[$name]).$endTag;
  }
  /**
   * Returns modified <option value="foo"> tag, based on values in $vals.
   * 
   * @param array $matches
   * @return string Returns tag with selected="selected" or not.
   */
  function fillInOption($matches)
  {
    $tag = $matches[1];  // The option tag
    $valueAfter = $matches[2]; // Potential value (stuff after option tag)
    $val = fillInFormHelper::getAttributeVal($tag, "value");
    if (empty($val)) { $val = trim($valueAfter); }
    if (in_array($val, $this->selectVals)) {
      return fillInFormHelper::replaceAttributeVal($tag, 'selected', 'selected').$valueAfter;
    }
    else {
      return fillInFormHelper::replaceAttributeVal($tag, 'selected', null).$valueAfter;
    }
  }

  var $selectVals;

  /**
   * Returns modified <select...> tag, based on values in $request.
   * 
   * @param array $matches
   * @return string
   */
  function fillInSelect($matches) {
    $tag = $matches[1];
    $options = $matches[3];
    $endTag = $matches[4];

    $name = fillInFormHelper::getAttributeVal($tag, "name");
    if (empty($name)) { return $matches[0]; }
    $id = fillInFormHelper::getAttributeVal($tag, "id");
    if (!empty($id)) { $this->idToNameMap[$id] = $name; }

    if (strpos($name, '[]')) {
      $name = str_replace('[]', '', $name);
    }
    if (!array_key_exists($name, $this->request)) { return $matches[0]; }

    $this->selectVals = (is_array($this->request[$name])?$this->request[$name]:array($this->request[$name]));

    // Handle all the various flavors of:
    // <option value="foo" /> OR <option>foo</option> OR <option>foo
    $s = fillInFormHelper::getTagPattern('option');
    $pat = "!$s(.*?)(?=($|(</option)|(</select)|(<option)))!is";
    $options = preg_replace_callback($pat, array(&$this, "fillInOption"), $options);
    return $tag.$options.$endTag;
  }

  /**
   * Returns modified <label...> tag, based on $formErrors.
   * 
   * @param array $matches
   * @return string
   */
  function fillInLabel($matches) {
    $tag = $matches[0];
    $for = fillInFormHelper::getAttributeVal($tag, "for");
    if (empty($for) or !isset($this->idToNameMap[$for])) { return $tag; }
    $name = $this->idToNameMap[$for];

    if (array_key_exists($name, $this->formErrors)) {
      return fillInFormHelper::replaceAttributeVal($tag, 'class', 'error');
    }
    return $tag; // No error.
  }

  /**
   * Returns modified <ul class="error"> list with $formErrors error messages.
   * 
   * @return string
   */
  function getErrorList() {
    $result = "";
    foreach (array_unique($this->formErrors) AS $f => $msg) {
      if (!empty($msg)) {
        $result .= "<li>".htmlspecialchars($msg)."</li>\n";
      }
    }
    if (empty($result)) { return ""; }  // No errors: return empty string.
    $result = '<ul class="error">'.$result.'</ul>';
    return $result;
  }
} // End of helper class.

The Other Way

Several packages can help you generate HTML for forms from your PHP code. (I've heard nice things about HTML_QuickForm, for example.) Most of them also automate form validation and redisplay. Still, I don't like using PHP to generate HTML; I like separating application logic (PHP code) from display (HTML) as much as possible. It's very nice to be able to edit the look of a web page, including any forms on the page, in a WYSIWYG editor like Dreamweaver.

Simple Is Best

fillInFormValues() has a very simple interface--it's just a function call. The implementation isn't terribly complicated, either. It's under 400 lines of code, including comments. I like simple things; they're easier to integrate into bigger projects. I use fillInFormValues() to prepopulate forms with values fetched from a database. I register it as a Smarty block function, so forms in my page templates redisplay themselves properly. Since I've started using it, I haven't been tempted to take up plumbing.

Download the Source

All the source code for this article, plus unit tests and source for a fillInFormValues Smarty extension, are available for download as a .zip archive.

Gavin Andresen spends his time writing core content management system code for Gravity Switch, creating online games for the blind to play with each other and their sighted friends and family at All inPlay, and playing with his children.



Copyright © 2009 O'Reilly Media, Inc.