The previous advice is nice and reasonable, especially in an ideal world where third-party software is bug-free and everything is configured correctly. Unfortunately, the real world is a bit different.
For instance, you must be aware that your application doesn't work on some buggy browser, or that it cannot work in specific circumstances with some database. Also, you may have a nice and comprehensive test suite that runs flawlessly on your development machine, but the application may not work correctly when installed on a different machine because the database could be installed improperly, or the mail server settings could be incorrect, or the internet connection could be down. In the same vein, if you want to really be sure that when the user--with a specific browser in a specific environment--clicks on that button she gets that result, you have to emulate exactly that situation.
It looks as if this advice takes you back to square one, where you need to test everything. Hopefully you've learned something in the process: whereas in principle you would like to test everything, in practice you can effectively prioritize your tests, focusing on some more than on others, and splitting them into categories to run separately at different times.
You definitely need to test that the application is working as intended when deployed on a different machine. From the failures of these installation tests, you may also infer what is wrong and correct the problem. Keep these installation tests (those that check the environment where your software is running) decoupled from the unit tests checking the application logic. If you're sure that the logic is right, then you are also sure that the problems are in the environment and can focus your debugging skills in the right area.
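One way to keep the two categories apart, sketched here with the standard unittest module, is to put the logic checks and the environment checks in separate TestCase classes and run each class as its own suite. The function apply_discount and the SMTP_HOST setting are hypothetical stand-ins for your application code and configuration; they are not part of any real project.

```python
import os
import unittest


def apply_discount(price, percent):
    """Hypothetical piece of application logic."""
    return round(price * (100 - percent) / 100.0, 2)


class LogicTests(unittest.TestCase):
    """Unit tests: no network, no database, no configuration needed."""

    def test_discount(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)


class InstallationTests(unittest.TestCase):
    """Environment checks, meant to run on the deployment machine."""

    def test_mail_server_is_configured(self):
        # SMTP_HOST is an assumed setting name, used only for illustration.
        host = os.environ.get("SMTP_HOST")
        if host is None:
            self.skipTest("SMTP_HOST is not set on this machine")
        self.assertTrue(host.strip())


def make_suite(case):
    """Build a suite from one TestCase class so each category runs alone."""
    return unittest.defaultTestLoader.loadTestsFromTestCase(case)


def run(case):
    """Run one category quietly and return the unittest result object."""
    with open(os.devnull, "w") as sink:
        runner = unittest.TextTestRunner(stream=sink)
        return runner.run(make_suite(case))
```

Calling run(LogicTests) on your development machine and run(InstallationTests) after each deployment keeps a failure in one category from being confused with a failure in the other.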
In any case, you need both high-level tests (functional, integration, and installation tests) and low-level tests (unit tests and doctests). High-level tests include those of the user interface. In particular, you need a test to make sure that if a user clicks on x he gets y, so you are sure that the internet connection, the web server, the database, the mail server, your application, and the browser all work nicely together. Be careful, though, not to focus only on these global kinds of tests. You don't need to write thousands of these high-level tests if you already have many specific low-level tests checking that the logic and the various components of your application are working.
Having structured your application properly, you need a smaller number of user interface tests, but you still need at least a few. How do you write those tests, then?
There are two possibilities: the hard way and the easy way.
The hard way is just doing everything by hand, using the web libraries of your favorite programming language to perform POST requests and verify the results. The easy way is to leverage tools built by others. Of course, internally these tools work just by calling the low-level libraries, so it is worth saying a couple of words on the hard way just to understand what is going on, in case the high-level tools give you some problem. Moreover, there is always the possibility that you need something more customized, and knowledge of the low-level libraries can be valuable.
Any modern programming language has libraries to interact with the HTTP protocol; here, I will give examples in Python, since it is a common and readable language for web programming. Python's urllib libraries manage the interaction with the Web. There are two of them: urllib, which works in the absence of authentication, and urllib2, which can also manage cookie-based authentication. A complete discussion of these two libraries would take a long time, but explaining the basics is pretty simple. I will give just a couple of recipes based on urllib2, the newest and most powerful library.
The support for cookies in Python 2.4 has improved (essentially by including the third-party ClientCookie library), so you may not be aware of the trick I am going to explain, even if you have used the urllib libraries in the past. So, don't skip the next two sections. ;)
Suppose you want to access a site that does not require authentication. Making a GET request is pretty easy: at the interpreter prompt, just type
>>> from urllib2 import urlopen
>>> page = urlopen("http://www.example.com")
Now you have a file-like object that contains the HTML code of the page http://www.example.com/:
>>> for line in page: print line,
<HTML>
<HEAD>
  <TITLE>Example Web Page</TITLE>
</HEAD>
<body>
<p>You have reached this web page by typing "example.com", "example.net", or "example.org" into your web browser.</p>
<p>These domain names are reserved for use in documentation and are not available for registration. See <a href="http://www.rfc-editor.org/rfc/rfc2606.txt">RFC 2606</a>, Section 3.</p>
</BODY>
</HTML>
If you try to access a nonexistent page or your internet connection is down, you will get a urllib2.URLError instead. Incidentally, this is why the urllib2.urlopen function is better than the older urllib.urlopen, which would just silently retrieve a page containing the error message.
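A test usually wants to tell "the page is there" apart from "the request failed". The sketch below shows one way to do that. It is written for Python 3, where urlopen lives in urllib.request and URLError in urllib.error (in Python 2.4 both come from urllib2). The helper name fetch and its opener parameter are assumptions made for illustration; passing the opener in makes the helper easy to exercise without a network.

```python
from urllib.error import URLError
from urllib.request import urlopen


def fetch(url, opener=urlopen):
    """Return (body, None) on success or (None, message) on failure.

    `opener` defaults to the real urlopen; a test can pass a stand-in.
    """
    try:
        page = opener(url)
    except URLError as exc:
        # HTTPError (404, 500, ...) is a subclass of URLError, so a
        # nonexistent page and a dead connection both end up here.
        return None, str(exc)
    try:
        return page.read(), None
    finally:
        page.close()
```

A caller can then assert on the second element of the returned pair instead of wrapping every request in its own try block.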
You can easily imagine how to use urlopen to check your web application. For instance, you could retrieve a page, extract all the links, and verify that they refer to existing pages; or verify that the retrieved page contains the right information, for instance by matching it with a regular expression. In practice, urlopen (possibly coupled with a third-party HTML parsing tool, such as Beautiful Soup) gives you all the fine-grained control you may wish for.
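As a rough sketch of both checks, the following uses only the standard library (html.parser rather than Beautiful Soup, with Python 3 module names). The class name LinkCollector and the helper links_in are made up for this example, and the HTML is an abridged copy of the example.com page, held in a string so the code runs without a network.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_in(html):
    """Return the list of link targets found in an HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


html = """<HTML><HEAD><TITLE>Example Web Page</TITLE></HEAD>
<body><p>See <a href="http://www.rfc-editor.org/rfc/rfc2606.txt">RFC 2606</a>,
Section 3.</p></BODY></HTML>"""

# Check 1: which links would have to be fetched and verified?
found = links_in(html)

# Check 2: does the page carry the expected title (case-insensitively)?
pattern = r"<title>\s*Example Web Page\s*</title>"
title_ok = re.search(pattern, html, re.IGNORECASE) is not None
```

Each entry in found could then be passed to urlopen to confirm that the target exists.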
urlopen also lets you make a POST: just pass the query string as the second argument to urlopen. As an example, I will make a POST to http://issola.caltech.edu/~t/qwsgi/qwsgi-demo.cgi/widgets, which is a page containing the example form that comes with Quixote, a nice, small Pythonic web framework.
>>> page = urlopen("http://issola.caltech.edu/~t/qwsgi/qwsgi-demo.cgi/widgets",
...     "name=MICHELE&password=SECRET&time=1118766328.56")
>>> print page.read()
<html>
<head><title>Quixote Widget Demo</title></head>
<body>
<h3>You entered the following values:</h3>
<table>
  <tr><th align="left">name</th><td>MICHELE</td></tr>
  <tr><th align="left">password</th><td>SECRET</td></tr>
  <tr><th align="left">confirmation</th><td>False</td></tr>
  <tr><th align="left">eye colour</th><td><i>nothing</i></td></tr>
  <tr><th align="left">pizza size</th><td><i>nothing</i></td></tr>
  <tr><th align="left">pizza toppings</th><td><i>nothing</i></td></tr>
</table>
<p>It took you 163.0 sec to fill out and submit the form</p>
</body>
</html>
Here, page contains the result of your POST. Notice that I had to explicitly pass a value for time, which is a hidden widget in the form.
That was easy, wasn't it?
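Writing the query string by hand works for simple values, but anything containing spaces, ampersands, or non-ASCII characters has to be escaped. Here is a hedged sketch of how the same request could be prepared with urlencode, again using Python 3 module names (in Python 2.4, urlencode is in urllib and Request in urllib2). The helper name build_post is an assumption; the field names and URL come from the example above. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post(url, fields):
    """Return a Request that urlopen would send as a POST.

    `fields` is a list of (name, value) pairs, so the order is preserved.
    """
    body = urlencode(fields).encode("ascii")  # Python 3 expects bytes
    # Supplying data is what turns the request into a POST.
    return Request(url, data=body)


request = build_post(
    "http://issola.caltech.edu/~t/qwsgi/qwsgi-demo.cgi/widgets",
    [("name", "MICHELE"), ("password", "SECRET"), ("time", "1118766328.56")],
)
```

Passing request to urlopen would perform the POST and return the same kind of file-like object as before.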