ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)
 See this if you're having trouble printing code examples


Allowing Registration-Required Binary Downloads

by Robert Bernier
10/21/2004

One of the hallmarks of a well-developed web presence is the ability to offer downloads. We're all familiar with the site that offers you a free application. You receive something of value in exchange for supplying the site with a contact name and email address. This isn't the only example: remote code compilation, translation services, and forum requests are others.

This article demonstrates how to create a web site that offers a binary download in exchange for user-supplied information in a manner that protects your data. The site will store information gleaned from a form on the site as well as the downloadable binary in a PostgreSQL database.

It's easy to validate the form's values, such as the email's domain, by carefully crafting the table's definitions and using the appropriate PHP functions. You can also keep user statistics, such as tracking the number of downloads on a per-user basis.

For the sake of expediency, I'll only discuss the contents of the server pages in the broadest terms. I expect that you already know PHP and SQL.

The Scenario

Your company supplies consulting services. You're offering a software tool in exchange for the name and email address of the downloader. After a user invokes a download request, your system sends an email with the download URL. The applicant clicks on the URL and the download begins. Here's where the first problem crops up. You saw this coming; most of those emails that you've received are bogus. You now face the daunting task of validating each record to see who really downloaded your binary.

Creating a download site shouldn't be brain surgery, but there are several facts that you should keep in mind.

  1. Some people will take the time to fill out forms with bad email addresses, even if the site is plastered with notices saying that they can't download the binary without a valid email address.
  2. Some people will take the extraordinary amount of effort of registering an email address on a free email service just for the download.
  3. A small number of people who practice fact number 2 have purposely introduced viruses/trojans into an email addressed to you under the guise of a legitimate communication.

Related Reading

Upgrading to PHP 5
By Adam Trachtenberg

Therefore, it's necessary to institute a mechanism that reduces frivolous form submissions and maximizes security. It seems there are a lot of people out there with free time on their hands.

Who Should Mind the Store?

There's a lot of technology out there, but it's a bit of a trick to decide what to use. Developing a viable solution depends upon understanding the context. You must know the people who run the site and appreciate the system's operating conditions.

Here are some questions you should ask:

Creating the Simple Downloader

Let's start by looking at some pages with minimal code and little validation. (You can download the sample server pages and table definitions if you like.)

This is the SQL used to generate the client table:

CREATE TABLE client(
	firstname text,
	lastname  text,
	email     text,
	CONSTRAINT unique_nameAndEmail 
   	   UNIQUE (firstname, lastname, email)
)

The PostgreSQL server runs on localhost with a database named mycompany. The user name is applicant with a password of 123 It's more secure if you've created the tables and functions as the superuser, but don't forget that you'll need to assign the correct privileges for applicant as well.

There are four pages:

How It Works

When a user submits the form in index.html, the parameters go to downloadVerification.php. This page obtains three parameters: firstname, lastname, and email.

The page concatenates these parameters into a single string using the md5() function and appends it to the download URL. It sends this URL to the applicant's email address using PHP's mail() function:

$email   = trim($_POST['email']);
$str_md5 = md5($f_name.$l_name.$email);
$msg     = "http://localhost/down1.php?vericode=".str_md5;

mail($email, "Binary Download Verification Code", $msg,
	"From:sales@mycompany.com");

It also inserts the name and email variables into the client table:

$query  = "INSERT INTO client (firstname,lastname,email)
            VALUES('$f_name','$l_name','$email')";

$dbconn = pg_connect("host=127.0.0.1 port=5432 dbname=mycompany user=applicant password=123") 
          or die("<h2>Warning... system is congested. Please wait, then try again</h2>");

Clicking on the link directs the applicant to down1.php. Note that this page is really an intermediary page, in that it provides something to read during the download. The JavaScript in this page redirects to the real download page, down.php.

<html>
  <head>
    <title>The download page</title>
    <?php
      $str_md5 = $_GET['vericode'];
      
      echo "
        <script language='JavaScript' type='text/JavaScript'>\n
          function redirect()\n
          {\n
               window.location = 'http://localhost/down.php?vericode=$str_md5'\n
          }\n
             
          setTimeout('redirect();', 2000)\n
        </script>\n";
    ?>
  </head>
  <body>This page gives a nice message before the download actually begins</body>
</html>

The down.php page extracts the vericode parameter with the following query:

SELECT firstname,lastname,email FROM client 
         WHERE 
           md5(firstname||lastname||email) = '$str_download'"

The binary download code is cool not only because it stores the binary in the database, as opposed to the filesystem, but because of its utter simplicity. The secret is sending the browser the correct headers:

// large objects must be obtained from within a transaction
pg_query($dbconn, "begin");

// current large object resource number of the zipped file is "17899"
$lo_oid    = 17899;

$handle_lo = @pg_lo_open($dbconn,$lo_oid,"r") or die("<h2>Error.. can't get handle</h2>");

//headers to send to the browser before beginning the binary download
header('Accept-Ranges: bytes');
header('Content-Length: 32029974'); //this is the size of the zipped file
header('Keep-Alive: timeout=15, max=100');
header('Content-type: Application/x-zip');
header('Content-Disposition: attachment; filename="superjob.zip"');

@pg_lo_read_all($handle_lo) or 
  die("<h2>Error, can't read large object.</h2>");

// committing the data transaction
pg_query ($dbconn, "commit");

Note: I submitted this snippet to the PHP homepage at pg-lo-read-all documentation.

I've previously uploaded this binary into the database as a large object using the psql utility. This process returned an object identifier (oid) to use to refer to the object. Using that oid, the code fetches a handle to it:

$handle_lo = @pg_lo_open($dbconn,$lo_oid,"r") or die("<h2>Error.. can't get handle</h2>");

The browser needs to know that it's about to receive a zipped file before the data transfer. The page sends the appropriate headers:

header('Accept-Ranges: bytes');
header('Content-Length: 32029974');        // the size of the zipped file
header('Keep-Alive: timeout=15, max=100');
header('Content-type: Application/x-zip'); // a zipped file is coming
header('Content-Disposition: attachment; filename="superjob.zip"');

Then the download begins:

@pg_lo_read_all($handle_lo) or 
  die("<h2>Error, can't read large object.</h2>");

Caveat! Large object transactions require that you enclose the instructions within a transaction; that is, between

pg_query ($dbconn, "begin");

and

pg_query ($dbconn, "commit");

The Problem of Validation

Validation protects your data and makes the entire transaction more robust, too. You need to decide where to place the validation instructions, either in the server pages or the PostgreSQL back end. This will depend on the site's unique operating conditions.

Normally, I opt to place the bulk of validation on the database side, with only the most basic validation in the server pages. Since the DBA uses the data, it only makes sense to say that he is also the one who is in the best position to protect the data.

Validation in Forms

There's not much I want to say here other than to remind you that protecting your forms is the first step to preventing problems:

Validation in Server Pages

You should, at the very least, repeat the same validation as for forms. Avoid composing strings in PHP scripts as SQL statements. For example, instead of writing:

// DO NOT USE; INSECURE
$query = "INSERT INTO client (firstname,lastname,email)
            VALUES('$f_name','$l_name','$email')";

Use this:

$query = SELECT f_insert($firstname,$lastname,$email);

The user-defined function f_insert() is a plpgsql function created in the database client. User-defined functions allow you to define additional validation rules without changing the PHP source code. They also mitigate exploits such as SQL injection.

Validation in the Database

There are two goals when defining this example database. First, verify the inserted variables. Second, prevent uncontrolled downloads of the zipped binary.

A good table definition is the first step to maintaining data integrity. Let's rewrite the client table:

CREATE TABLE client(
	firstname varchar(15) NOT NULL,
	lastname  varchar(15) NOT NULL,
	email     varchar(50) NOT NULL,
	counter   int         DEFAULT 0,
	CONSTRAINT unique_nameAndEmail 
    	UNIQUE (firstname, lastname, email),
	CONSTRAINT firstname_length CHECK(length(rtrim(ltrim(firstname)))>=3),
	CONSTRAINT lastname_length  CHECK(length(rtrim(ltrim(lastname)))>=3),
	CONSTRAINT email_include    CHECK(email ~~ '%_@%.%'),
	CONSTRAINT email_exclude    CHECK(email !~ '[\,#,$,%,&,(,), ]'),
	CONSTRAINT email_exclude1   CHECK(email !~ 'hotmail' AND email !~ 'yahoo'),
	CONSTRAINT max_download     CHECK(counter <= 3)
);

The table definition carries out the following constraints:

I've also added the counter column to control the maximum number of downloads. In this case, it's three.

The function f_insert() is:

CREATE OR REPLACE FUNCTION f_insert(
	varchar, --firstname
	varchar, --lastname
	varchar  --email
)

RETURNS INT AS '
	BEGIN
	 INSERT INTO client 
	  (firstname, lastname, email) 
	  VALUES ($1, $2, $3);
	RETURN 0;
	END;
'
LANGUAGE plpgsql;

Controlling the number of downloads is as easy as defining a CHECK constraint on the counter column in the table:

CONSTRAINT max_download     CHECK(counter <= 3)

The column value increments by one for each download. When the value of counter exceeds the CONSTRAINT value, the SQL query fails, rejecting additional downloads.

Implementing a counter requires an UPDATE query and a trigger, which fires a function that increments the counter column.

The new query in down.php becomes:

$query = "UPDATE client SET counter= -1
           WHERE
             md5(firstname||lastname||email) = '$str_download'";

The function f_update is:

CREATE OR REPLACE FUNCTION f_tr_update()
 RETURNS TRIGGER AS '
	BEGIN
	  IF NEW.counter = -1 THEN
	    NEW.counter= OLD.counter + 1;
	  END IF;
	RETURN NEW;
	END;
'
LANGUAGE plpgsql;

The value -1 is a dummy variable, which flags the update trigger that an increment is about to occur. In addition, the presence of the flag allows the DBA to update records that have nothing to do with a download request.

The command that creates the trigger is:

CREATE TRIGGER tr_update 
  BEFORE UPDATE ON client
  FOR EACH ROW
  EXECUTE PROCEDURE f_tr_update();

The Email Conundrum

Obtaining a valid email address is our single most critical validation. Although much of the previous discussion has gone a long way to mitigating the problem, there's still the challenge of validating the domain name itself. Fortunately, PHP contains a function that provides a readymade solution.

Insert these two lines of code into downLoadVerification.php:

$email_domain = explode("@",$email);
if (!checkdnsrr($email_domain[1],"MX")) die($errormsg);

The explode() function parses the email address based on the @ symbol and checkdnsrr() contacts the Domain Name Server and verifies that the domain name has a working SMTP server.

Conclusion

This article has covered two issues: how to download a binary through an interactive process of filling out a form, and how to protect your site through the efficient use of validation in a PostgreSQL database. Creating an interactive web site can be a rewarding experience, if you do it right.

Robert Bernier is the PostgreSQL business intelligence analyst for SRA America, a subsidiary of Software Research America (SRA).


Return to the PHP DevCenter

Copyright © 2009 O'Reilly Media, Inc.