Allowing Registration-Required Binary Downloadsby Robert Bernier
One of the hallmarks of a well-developed web presence is the ability to offer downloads. We're all familiar with the site that offers you a free application. You receive something of value in exchange for supplying the site with a contact name and email address. This isn't the only example: remote code compilation, translation services, and forum requests are others.
This article demonstrates how to create a web site that offers a binary download in exchange for user-supplied information in a manner that protects your data. The site will store information gleaned from a form on the site as well as the downloadable binary in a PostgreSQL database.
It's easy to validate the form's values, such as the email's domain, by carefully crafting the table's definitions and using the appropriate PHP functions. You can also keep user statistics, such as tracking the number of downloads on a per-user basis.
For the sake of expediency, I'll only discuss the contents of the server pages in the broadest terms. I expect that you already know PHP and SQL.
Your company supplies consulting services. You're offering a software tool in exchange for the name and email address of the downloader. After a user invokes a download request, your system sends an email with the download URL. The applicant clicks on the URL and the download begins. Here's where the first problem crops up. You saw this coming; most of those emails that you've received are bogus. You now face the daunting task of validating each record to see who really downloaded your binary.
Creating a download site shouldn't be brain surgery, but there are several facts that you should keep in mind.
- Some people will take the time to fill out forms with bad email addresses, even if the site is plastered with notices saying that they can't download the binary without a valid email address.
- Some people will take the extraordinary amount of effort of registering an email address on a free email service just for the download.
- A small number of people who practice fact number 2 have purposely introduced viruses/trojans into an email addressed to you under the guise of a legitimate communication.
Therefore, it's necessary to institute a mechanism that reduces frivolous form submissions and maximizes security. It seems there are a lot of people out there with free time on their hands.
Who Should Mind the Store?
There's a lot of technology out there, but it's a bit of a trick to decide what to use. Developing a viable solution depends upon understanding the context. You must know the people who run the site and appreciate the system's operating conditions.
Here are some questions you should ask:
- Is the business logic in the server pages or in the database back end?
- What kind of validation is necessary to protect both the data integrity and the site itself?
- Is there a division of labor? Does one person maintain the site while another processes the data?
- Do the people involved understand what can go wrong?
- Do you host the site on your own machine or does it belong to a third-party, such as an ISP?
Creating the Simple Downloader
Let's start by looking at some pages with minimal code and little validation. (You can download the sample server pages and table definitions if you like.)
This is the SQL used to generate the
CREATE TABLE client( firstname text, lastname text, email text, CONSTRAINT unique_nameAndEmail UNIQUE (firstname, lastname, email) )
The PostgreSQL server runs on
localhost with a database named
mycompany. The user name is applicant with a password
of 123 It's more secure if you've created the tables and
functions as the superuser, but don't forget that you'll need to assign the
correct privileges for applicant as well.
There are four pages:
- index.html, the application form
- downloadVerification.php, the form's target page
- down1.php, the target page for the applicant's emailed URL
- down.php, which transmits the binary to the applicant's browser
How It Works
When a user submits the form in index.html, the parameters go to
downloadVerification.php. This page obtains three parameters:
The page concatenates these parameters into a single string using the
md5() function and appends it to the download URL. It sends this URL
to the applicant's email address using PHP's
$email = trim($_POST['email']); $str_md5 = md5($f_name.$l_name.$email); $msg = "http://localhost/down1.php?vericode=".str_md5; mail($email, "Binary Download Verification Code", $msg, "From:firstname.lastname@example.org");
It also inserts the
$query = "INSERT INTO client (firstname,lastname,email) VALUES('$f_name','$l_name','$email')"; $dbconn = pg_connect("host=127.0.0.1 port=5432 dbname=mycompany user=applicant password=123") or die("<h2>Warning... system is congested. Please wait, then try again</h2>");
The down.php page extracts the
vericode parameter with
the following query:
SELECT firstname,lastname,email FROM client WHERE md5(firstname||lastname||email) = '$str_download'"
The binary download code is cool not only because it stores the binary in the database, as opposed to the filesystem, but because of its utter simplicity. The secret is sending the browser the correct headers:
// large objects must be obtained from within a transaction pg_query($dbconn, "begin"); // current large object resource number of the zipped file is "17899" $lo_oid = 17899; $handle_lo = @pg_lo_open($dbconn,$lo_oid,"r") or die("<h2>Error.. can't get handle</h2>"); //headers to send to the browser before beginning the binary download header('Accept-Ranges: bytes'); header('Content-Length: 32029974'); //this is the size of the zipped file header('Keep-Alive: timeout=15, max=100'); header('Content-type: Application/x-zip'); header('Content-Disposition: attachment; filename="superjob.zip"'); @pg_lo_read_all($handle_lo) or die("<h2>Error, can't read large object.</h2>"); // committing the data transaction pg_query ($dbconn, "commit");
Note: I submitted this snippet to the PHP homepage at
I've previously uploaded this binary into the database as a large object
psql utility. This process returned an object
oid) to use to refer to the object. Using that
oid, the code
fetches a handle to it:
$handle_lo = @pg_lo_open($dbconn,$lo_oid,"r") or die("<h2>Error.. can't get handle</h2>");
The browser needs to know that it's about to receive a zipped file before the data transfer. The page sends the appropriate headers:
header('Accept-Ranges: bytes'); header('Content-Length: 32029974'); // the size of the zipped file header('Keep-Alive: timeout=15, max=100'); header('Content-type: Application/x-zip'); // a zipped file is coming header('Content-Disposition: attachment; filename="superjob.zip"');
Then the download begins:
@pg_lo_read_all($handle_lo) or die("<h2>Error, can't read large object.</h2>");
Caveat! Large object transactions require that you enclose the instructions within a transaction; that is, between
pg_query ($dbconn, "begin");
pg_query ($dbconn, "commit");
The Problem of Validation
Validation protects your data and makes the entire transaction more robust, too. You need to decide where to place the validation instructions, either in the server pages or the PostgreSQL back end. This will depend on the site's unique operating conditions.
Normally, I opt to place the bulk of validation on the database side, with only the most basic validation in the server pages. Since the DBA uses the data, it only makes sense to say that he is also the one who is in the best position to protect the data.
Validation in Forms
There's not much I want to say here other than to remind you that protecting your forms is the first step to preventing problems:
- Always use a maximum-length parameter for the form's text variables.
- When possible, use radio buttons and checkboxes rather than text boxes, because this limits the kinds of variables expected at the back end. Don't forget, though, that determined users can send any kind of data they want.
- Trim white spaces on text input.
Validation in Server Pages
You should, at the very least, repeat the same validation as for forms. Avoid composing strings in PHP scripts as SQL statements. For example, instead of writing:
// DO NOT USE; INSECURE $query = "INSERT INTO client (firstname,lastname,email) VALUES('$f_name','$l_name','$email')";
$query = SELECT f_insert($firstname,$lastname,$email);
The user-defined function f_insert() is a
plpgsql function created
in the database client. User-defined functions allow you to define additional
validation rules without changing the PHP source code. They also mitigate
exploits such as SQL injection.
Validation in the Database
There are two goals when defining this example database. First, verify the inserted variables. Second, prevent uncontrolled downloads of the zipped binary.
A good table definition is the first step to maintaining data integrity.
Let's rewrite the
CREATE TABLE client( firstname varchar(15) NOT NULL, lastname varchar(15) NOT NULL, email varchar(50) NOT NULL, counter int DEFAULT 0, CONSTRAINT unique_nameAndEmail UNIQUE (firstname, lastname, email), CONSTRAINT firstname_length CHECK(length(rtrim(ltrim(firstname)))>=3), CONSTRAINT lastname_length CHECK(length(rtrim(ltrim(lastname)))>=3), CONSTRAINT email_include CHECK(email ~~ '%_@%.%'), CONSTRAINT email_exclude CHECK(email !~ '[\,#,$,%,&,(,), ]'), CONSTRAINT email_exclude1 CHECK(email !~ 'hotmail' AND email !~ 'yahoo'), CONSTRAINT max_download CHECK(counter <= 3) );
The table definition carries out the following constraints:
- There must be no white spaces at the beginning or end of the variables.
- The variables
lastnamemust be at least three and less than 15 characters long.
- The variable
@symbol and a period to the right of it. The email address cannot contain the symbols
], or any white spaces. This isn't strictly RFC-822-compliant, but it's a good first-order approximation.
- The email address cannot a the domain name containing the strings
yahoo, because many spammers use these email domains.
I've also added the
counter column to control the maximum
number of downloads. In this case, it's three.
CREATE OR REPLACE FUNCTION f_insert( varchar, --firstname varchar, --lastname varchar --email ) RETURNS INT AS ' BEGIN INSERT INTO client (firstname, lastname, email) VALUES ($1, $2, $3); RETURN 0; END; ' LANGUAGE plpgsql;
Controlling the number of downloads is as easy as defining a
CHECK constraint on the
counter column in the
CONSTRAINT max_download CHECK(counter <= 3)
The column value increments by one for each download. When the value of
counter exceeds the
CONSTRAINT value, the SQL query fails,
rejecting additional downloads.
Implementing a counter requires an
UPDATE query and a trigger, which fires a
function that increments the
The new query in down.php becomes:
$query = "UPDATE client SET counter= -1 WHERE md5(firstname||lastname||email) = '$str_download'";
CREATE OR REPLACE FUNCTION f_tr_update() RETURNS TRIGGER AS ' BEGIN IF NEW.counter = -1 THEN NEW.counter= OLD.counter + 1; END IF; RETURN NEW; END; ' LANGUAGE plpgsql;
-1 is a dummy variable, which flags the update trigger that an
increment is about to occur. In addition, the presence of the flag allows the
DBA to update records that have nothing to do with a download request.
The command that creates the trigger is:
CREATE TRIGGER tr_update BEFORE UPDATE ON client FOR EACH ROW EXECUTE PROCEDURE f_tr_update();
The Email Conundrum
Obtaining a valid email address is our single most critical validation. Although much of the previous discussion has gone a long way to mitigating the problem, there's still the challenge of validating the domain name itself. Fortunately, PHP contains a function that provides a readymade solution.
Insert these two lines of code into downLoadVerification.php:
$email_domain = explode("@",$email); if (!checkdnsrr($email_domain,"MX")) die($errormsg);
explode() function parses the email address based on the
@ symbol and
checkdnsrr() contacts the Domain Name
Server and verifies that the domain name has a working SMTP server.
This article has covered two issues: how to download a binary through an interactive process of filling out a form, and how to protect your site through the efficient use of validation in a PostgreSQL database. Creating an interactive web site can be a rewarding experience, if you do it right.
Robert Bernier is the PostgreSQL business intelligence analyst for SRA America, a subsidiary of Software Research America (SRA).
Return to the PHP DevCenter