Custom Error Pages with PHP and Apache
by David Sklar, coauthor of PHP Cookbook
02/13/2003
Using PHP and Apache, you can turn your "Page Not Found" messages into more than bland error reports. You can serve an alternate page based on the name of the page that was not found, create a page on the fly from a database, or send an email about the missing page to a webmaster.
Building a custom error page with PHP and Apache requires two steps. You need to tell Apache to run a PHP program when it encounters a 404 ("Page Not Found") error. And you need to write the corresponding program that takes the appropriate action.
Configuring Apache
To tell Apache what to do on a 404 error, use the
ErrorDocument directive:
ErrorDocument 404 /error-404.php
This tells Apache to serve up error-404.php in the
document root directory when it encounters a 404 error. The
ErrorDocument directive can go in Apache's
httpd.conf file, but it also works in .htaccess files in
individual directories. You can have a site-wide error-handling page or
different error-handling pages for different parts of your site. Apache
also sets some server variables that the error-handling page can
access:
REDIRECT_URL: the URL-path that was not found. If a user asks for the nonexistent page http://www.example.com/lunch/pastrami.html, for example, this variable is set to /lunch/pastrami.html.
REDIRECT_STATUS: the HTTP response status resulting from the request for the original page. In our case, this is always "404". You can use ErrorDocument with other status codes, though, so if you have one error-handling page for multiple statuses, you can use this variable to determine which error status caused the error-handling page to be loaded.
REDIRECT_ERROR_NOTES: a brief description of what went wrong, for example, "File does not exist: /usr/local/apache/docroot/lunch/pastrami.html".
REDIRECT_REQUEST_METHOD: the method of the request for the original page, such as GET or POST.
If there is a query string in the original request, it is stored in
REDIRECT_QUERY_STRING. The error page does not have access to
the GET or POST variables via
$_GET, $_POST, or $_REQUEST, but
cookie variables are still available in $_COOKIE.
These REDIRECT variables are available in the PHP
superglobal array $_SERVER:
$_SERVER['REDIRECT_URL'],
$_SERVER['REDIRECT_STATUS'], and so forth.
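As a minimal sketch of reading these variables, the helpers below collect the REDIRECT values into one array and pick a message based on the status, which is useful when one error-handling page serves several status codes. The function names redirect_info() and status_message() and the message text are hypothetical, not part of Apache or PHP.

```php
<?php
// Hypothetical helper: gather the REDIRECT_* values Apache sets,
// using null for any that are absent from the request.
function redirect_info(array $server)
{
    $names = array('URL', 'STATUS', 'ERROR_NOTES',
                   'REQUEST_METHOD', 'QUERY_STRING');
    $info = array();
    foreach ($names as $name) {
        $key = 'REDIRECT_' . $name;
        $info[$name] = isset($server[$key]) ? $server[$key] : null;
    }
    return $info;
}

// Hypothetical helper: choose a message for the status that caused
// the error-handling page to be loaded.
function status_message($status)
{
    $messages = array('403' => 'Access denied.',
                      '404' => 'Page not found.',
                      '500' => 'Server error.');
    $status = (string) $status;
    return isset($messages[$status]) ? $messages[$status]
                                     : 'Unknown error.';
}
```

In an error-handling page, this could be used as $info = redirect_info($_SERVER); followed by print status_message($info['STATUS']);.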
Taking Action
The information in the REDIRECT variables can be used to
do many different things in response to a request for a nonexistent
page. If your site has been recently reorganized, you can transparently
redirect users to the new URL that corresponds to a particular old
URL:
<?php
$map = array('/old/1' => '/new/2.html',
             '/old/2' => '/new/3.html');
if (isset($map[$_SERVER['REDIRECT_URL']])) {
    $new_loc = 'http://' .
               $_SERVER['HTTP_HOST'] .
               $map[$_SERVER['REDIRECT_URL']];
    if (isset($_SERVER['REDIRECT_QUERY_STRING'])) {
        $new_loc .= '?' .
                    $_SERVER['REDIRECT_QUERY_STRING'];
    }
    header("Location: $new_loc");
} else {
    print "This page is really not found.";
}
?>
A redirect response needs to include the query string in the redirect
URL if the query string was present in the original request. Redirects
always use the GET method. Including the query string
preserves any GET variables from the original request, but
POST data is lost.
Additionally, the protocol and host name need to be at the beginning of
the redirect URL sent with the Location header. This example hardcodes
"http" as the protocol and gets the host name from the
HTTP_HOST server variable. To work transparently under https
as well as http, your code should test for the presence of
$_SERVER['HTTPS']. If this variable is set to "on", then the
protocol should be "https" instead of "http".
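The protocol test described above might look like the following sketch. The function name absolute_url() is hypothetical; it takes the server variables as an argument so it can be tried outside a web request.

```php
<?php
// Hypothetical helper: build an absolute URL for a Location header,
// choosing the protocol from the HTTPS server variable.
function absolute_url(array $server, $path)
{
    // Use "https" only when HTTPS is set to "on", as described above
    $https = isset($server['HTTPS']) && $server['HTTPS'] == 'on';
    $url = ($https ? 'https' : 'http') . '://' .
           $server['HTTP_HOST'] . $path;
    // Preserve the original query string, if there was one
    if (isset($server['REDIRECT_QUERY_STRING'])) {
        $url .= '?' . $server['REDIRECT_QUERY_STRING'];
    }
    return $url;
}
```

An error-handling page could then send header('Location: ' . absolute_url($_SERVER, '/new/2.html')); instead of assembling the URL by hand.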
Basic redirection could also be accomplished with a list of Apache
Redirect or RedirectMatch directives, but you
can construct more complicated expressions in PHP. You can easily redirect
multiple old URLs to the same new URL:
<?php
// each new URL-path needs a leading slash so that it can be
// appended directly to the host name
$rev_map = array('/new.html' =>
                 array('/old-1.html',
                       '/old-2.html',
                       '/old-3.html'));
// build the old => new lookup table
$map = array();
foreach ($rev_map as $new => $ar) {
    foreach ($ar as $old) {
        $map[$old] = $new;
    }
}
if (isset($map[$_SERVER['REDIRECT_URL']])) {
    $new_loc = 'http://' .
               $_SERVER['HTTP_HOST'] .
               $map[$_SERVER['REDIRECT_URL']];
    if (isset($_SERVER['REDIRECT_QUERY_STRING'])) {
        $new_loc .= '?' .
                    $_SERVER['REDIRECT_QUERY_STRING'];
    }
    header("Location: $new_loc");
} else {
    print "This page is really not found.";
}
?>
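When a fixed list of old URLs isn't enough, the mapping can be driven by regular expressions instead. This is only a sketch: the function name map_old_url() and the two rules are hypothetical examples, not URLs from a real site.

```php
<?php
// Hypothetical helper: return the new URL-path for $old using the
// first matching rule, or null if no rule matches.
function map_old_url($old, array $rules)
{
    foreach ($rules as $pattern => $replacement) {
        if (preg_match($pattern, $old)) {
            return preg_replace($pattern, $replacement, $old);
        }
    }
    return null; // no rule matched
}

// Hypothetical rules: each key is a pattern for an old URL-path and
// each value is the replacement that builds the new one.
$rules = array('#^/old/(\d+)$#'          => '/new/$1.html',
               '#^/archive/(\d+)/(.*)$#' => '/$1/archive/$2');
```

With these rules, a request for /old/7 maps to /new/7.html, and a path that matches no rule falls through to the "not found" message.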
You can look up the new URLs to which the old ones map in a database:
<?php
mysql_connect('localhost','user','password');
mysql_select_db('pages');
// escape quotes and SQL wildcards from the old URL
$old_page = mysql_real_escape_string($_SERVER['REDIRECT_URL']);
$old_page = strtr($old_page,array('_' => '\_',
                                  '%' => '\%'));
$r = mysql_query("SELECT new FROM pages
                  WHERE old LIKE '$old_page'");
// redirect only if exactly one row matches the old URL
if (mysql_num_rows($r) == 1) {
    $ob = mysql_fetch_object($r);
    $new_loc = 'http://' .
               $_SERVER['HTTP_HOST'] . $ob->new;
    if (isset($_SERVER['REDIRECT_QUERY_STRING'])) {
        $new_loc .= '?' .
                    $_SERVER['REDIRECT_QUERY_STRING'];
    }
    header("Location: $new_loc");
} else {
    print "This page is really not found.";
}
?>
If you need to extract values from
$_SERVER['REDIRECT_QUERY_STRING'] into variables to determine
the new URL, parse the query string with parse_str(). If
$_SERVER['REDIRECT_QUERY_STRING'] is
artist=weird+al&album=dare+to+be+stupid, then
parse_str($_SERVER['REDIRECT_QUERY_STRING'],$vars) sets
$vars['artist'] to "weird al" and $vars['album']
to "dare to be stupid".
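The parsed values can then feed into the new URL. The sketch below assumes a hypothetical site layout of /music/artist/album.html; the function name album_path() is likewise an illustration.

```php
<?php
// Hypothetical helper: build a new URL-path from the artist and album
// values in the original query string, or return null if either is
// missing or is not a plain string.
function album_path($query_string)
{
    parse_str($query_string, $vars);
    if (!isset($vars['artist'], $vars['album']) ||
        !is_string($vars['artist']) ||
        !is_string($vars['album'])) {
        return null;
    }
    // rawurlencode() makes the decoded values safe to use in a path
    return '/music/' . rawurlencode($vars['artist']) .
           '/' . rawurlencode($vars['album']) . '.html';
}
```

Passing $_SERVER['REDIRECT_QUERY_STRING'] to album_path() gives a path that can be appended to the protocol and host name for the Location header.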
You can even use the error document to make a simple caching system. If a page isn't found, get its contents from your database and write them to disk. Then, redirect the user to the same URL they just asked for. Since the page now exists, they'll get it, and not the error page:
<?php
mysql_connect('localhost','user','password');
mysql_select_db('pages');
// escape quotes and SQL wildcards from the old URL
$url = mysql_real_escape_string($_SERVER['REDIRECT_URL']);
$url = strtr($url,array('_' => '\_',
                        '%' => '\%'));
// look for the page in the database
$r = mysql_query("SELECT page FROM pages
                  WHERE url LIKE '$url'");
if (mysql_num_rows($r) == 1) {
    $ob = mysql_fetch_object($r);
    if ($fp = fopen($_SERVER['DOCUMENT_ROOT'] .
                    $_SERVER['REDIRECT_URL'],'w')) {
        // write the page to disk
        fwrite($fp,$ob->page);
        fclose($fp);
        // send the user back to the same URL
        $new_loc = 'http://' .
                   $_SERVER['HTTP_HOST'] .
                   $_SERVER['REDIRECT_URL'];
        if (isset($_SERVER['REDIRECT_QUERY_STRING'])) {
            $new_loc .= '?' .
                        $_SERVER['REDIRECT_QUERY_STRING'];
        }
        header("Location: $new_loc");
    } else {
        // couldn't generate the page
        print "This page is really not found.";
    }
} else {
    // couldn't find the page in the database
    print "This page is really not found.";
}
?>
In this example, the entire contents of a page are stored in the page
column of the pages table and are written to a file with
fwrite(). You could do more interesting or complicated things
when generating a page, like pull multiple pieces of the page from
different places or populate a template with dynamic data. However you
generate the page, publishing a new version of it is easy. Just update the
database and delete the file from disk. The next time a user asks for that
page, it won't be found. The error-handling page will load the updated
page (or its components) from the database and write the new version to a
file.
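The "delete the file from disk" half of publishing could be sketched as below. The function name expire_page() is hypothetical, and the database update that goes with it is left out because it depends on your schema.

```php
<?php
// Hypothetical helper: remove the generated file for a URL-path so
// that the next request for it reaches the error-handling page and
// the page is regenerated from the database.
function expire_page($document_root, $url_path)
{
    $file = $document_root . $url_path;
    if (is_file($file)) {
        return unlink($file);
    }
    return true; // nothing on disk, so nothing to expire
}
```

After updating a page's row in the database, a call such as expire_page($_SERVER['DOCUMENT_ROOT'], '/lunch/pastrami.html') clears the old copy.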
If you're sending a user to a new PHP page, it's important to use a
redirect instead of just loading the page with include(). The
error page doesn't have GET or POST variables
set, and some server variables are different (for example,
$_SERVER['PHP_SELF'] points to the error page, not the
original URL). If you're sending the user to a static page, however,
including content without a redirect can be useful. You can use an
error-handling page to provide access to a library of files without
keeping the files under the web server document root, for example:
<?php
$file_root = '/usr/local/songs/';
// drop the leading slash from the requested URL-path
$song = strtolower(substr($_SERVER['REDIRECT_URL'],1));
// songs are stored in subdirectories named for their first letter
$song_file = realpath($file_root .
                      substr($song,0,1) .
                      "/$song.mp3");
// only serve files that really live under $file_root
if (preg_match("{^$file_root}",$song_file) &&
    is_readable($song_file)) {
    header('Status: 200 OK');
    header('Content-type: audio/mpeg');
    header('Content-disposition: attachment; filename=' .
           basename($song_file));
    readfile($song_file);
} else {
    print "Unknown song.";
}
?>
If this error-handling page is set up for the root directory of
http://www.example.com/, asking for
http://www.example.com/EatIt sends you the file
/usr/local/songs/e/eatit.mp3, if that file exists. Checking to
see whether the output of realpath() begins with
$file_root prevents a user from passing directory-changing
strings like "/../" in the URL. If a file is found, the page
sends the right status code and headers to tell the user that they're
getting an MP3 file and then sends the contents of the song file.
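The realpath() check can be isolated into a small function, which makes it easier to try against sample paths. This is a sketch under two assumptions: the name path_under_root() is hypothetical, and paths use "/" as the directory separator. It compares prefixes with strncmp() rather than preg_match(), which is a substitution for the technique in the listing above, not the article's own code.

```php
<?php
// Hypothetical helper: resolve $relative under $root and return the
// real path only if it stays inside $root; otherwise return false.
function path_under_root($root, $relative)
{
    $real_root = realpath($root);
    $real_file = realpath($root . '/' . $relative);
    if ($real_root === false || $real_file === false) {
        return false; // the root or the file does not exist
    }
    // Compare against the root plus a trailing slash so that a
    // sibling such as /usr/local/songs-private is not accepted
    $prefix = rtrim($real_root, '/') . '/';
    if (strncmp($real_file, $prefix, strlen($prefix)) !== 0) {
        return false; // the resolved path escaped the root
    }
    return $real_file;
}
```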
The error-handling page doesn't just have to find a new page to send to users. It can notify the webmaster that a page is missing. You can use this to find out if your own site has bad links to itself:
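<?php
// HTTP_REFERER is not set when the browser sends no referrer, and
// the host name is quoted so its dots match only literal dots
if (isset($_SERVER['HTTP_REFERER']) &&
    preg_match('{^http(s)?://'.preg_quote($_SERVER['HTTP_HOST']).'}',
               $_SERVER['HTTP_REFERER'])) {
    ob_start();
    print_r($_SERVER);
    $data = ob_get_contents();
    ob_end_clean();
    mail($_SERVER['SERVER_ADMIN'],
         'Page Not Found: '.$_SERVER['REDIRECT_URL'],
         $data);
}
?>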
The preg_match() statement finds referrer URLs that are on
the same host as the current request by comparing the beginning of the
referring URL to the $_SERVER['HTTP_HOST']. If they match,
the output of print_r($_SERVER) is stored in
$data using output buffering:
ob_start() tells PHP to capture output in a buffer instead of printing it.
ob_get_contents() returns the contents of that buffer.
ob_end_clean() turns off output buffering without printing the buffer.
The mail() function sends a message to the server
administrator. The body of the message (all the $_SERVER
variables in $data) contains the referring URL and other
information that you can use to fix the page with the bad link.
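The referrer test and the message body can also be written as two small functions, sketched below. The names is_internal_referer() and report_body() are hypothetical. The second function relies on print_r() returning its output when its second argument is true (available since PHP 4.3.0), which is an alternative to the output buffering shown above.

```php
<?php
// Hypothetical helper: true if $referer is a URL on $host. The host
// must be followed by "/" or the end of the string, so a referrer on
// www.example.com.other.test does not count as www.example.com.
function is_internal_referer($host, $referer)
{
    $pattern = '#^https?://' . preg_quote($host, '#') . '(/|$)#i';
    return preg_match($pattern, $referer) === 1;
}

// Hypothetical helper: format the server variables as the body of
// the notification message without using output buffering.
function report_body(array $server)
{
    return print_r($server, true);
}
```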
More Information
Documentation for Apache custom error responses is at http://httpd.apache.org/docs/custom-error.html. The www.php.net site uses a custom error response to turn handy shortcut URLs like http://www.php.net/xml into the correct URL for the XML section of the manual. You can see the source code to it at http://cvs.php.net/co.php/phpweb/error/index.php?r=HEAD.
O'Reilly & Associates recently released (November 2002) PHP Cookbook.
David Sklar is an independent consultant in New York City, the author of O'Reilly's Learning PHP 5, and a coauthor of PHP Cookbook.
