ONJava.com -- The Independent Source for Enterprise Java

Uploading Files with Beans

by Budi Kurniawan
04/05/2001

How many times have you wondered how the developers at Hotmail or Yahoo Mail process the attachments to your email? Rest assured that you are not the only one. Too often, Java Internet developers concentrate only on processing strings from an HTML form, and when the boss asks whether they can handle file upload, they have to do some research before they can come back with an answer. Even respectable Java literature rarely discusses file upload.

With the growth of the Internet, file upload now plays a significant role beyond email applications. Other Internet and intranet applications, such as Web-based document management systems and "Secure File Transfer via HTTP" services, rely extensively on uploading files to the server. This article discusses all you need to know about file upload. But first things first: before you jump into coding, you need to understand the underlying theory, namely the HTTP request. Knowledge of the HTTP request is critical because when you process an uploaded file, you work with raw data that you cannot obtain from the methods of an HttpServletRequest object, such as getParameter, getParameterNames, or getParameterValues.

The HTTP Request

Each HTTP request from a Web browser or other Web client application consists of three parts:

  • A line containing the HTTP request method, the Uniform Resource Identifier (URI), and the protocol and the protocol version
  • HTTP Request headers
  • The entity body

These three parts are explained in the following sections.

The Request Method, URI and Protocol

The first subpart of the first part, the HTTP request method, indicates the method used in the HTTP request. In HTTP 1.0, it could be one of the following three: GET, HEAD, or POST. HTTP 1.1 adds four more methods: PUT, DELETE, TRACE, and OPTIONS. Of these seven, the two most frequently used are GET and POST. GET is the default method; your browser uses it, for example, when you type a URL such as http://www.onjava.com in the Location or Address box to request a page. POST is common too; you normally specify it as the value of the <form> tag's method attribute. When uploading a file, you must use the POST method.

The second subpart of the first part, the URI, specifies an Internet resource. A URI is normally interpreted as being relative to the Web server's root directory. Thus, it starts with a forward slash (/) and has the following format:

/virtualRoot/pageName

For example, in a typical JavaServer Pages application the URI could be the following.

/eshop/login.jsp


The third subpart of the first part is the protocol and protocol version understood by the requester (the browser). The protocol must be HTTP, and the version can be 1.0 or 1.1. Most Web servers understand both HTTP 1.0 and HTTP 1.1 and can therefore serve requests in either version. If you are still using an old HTTP 1.0 Web server, you could run into trouble when your users' modern browsers send requests using HTTP 1.1.

Combining the three subparts, the first part of an HTTP request (the request line) looks like the following:

POST /virtualRoot/pageName HTTP/version

For instance:

POST /eshop/login.jsp HTTP/1.1
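To make the three subparts concrete, here is a minimal Java sketch that splits a request line into its method, URI, and protocol version. The class name RequestLine and its parse method are hypothetical, chosen only for illustration; in practice a servlet container does this parsing for you and exposes the results through methods such as getMethod, getRequestURI, and getProtocol. The sketch assumes single spaces between the parts and accepts only the seven methods and two versions discussed above. It uses Set.of, so it needs Java 9 or later.

```java
import java.util.Set;

/**
 * Sketch: splits a request line such as "POST /eshop/login.jsp HTTP/1.1"
 * into method, URI, and protocol version. RequestLine is a hypothetical name.
 */
public final class RequestLine {
    // The seven methods defined by HTTP 1.0 and HTTP 1.1, as listed in the text.
    private static final Set<String> KNOWN_METHODS =
            Set.of("GET", "HEAD", "POST", "PUT", "DELETE", "TRACE", "OPTIONS");

    public final String method;
    public final String uri;
    public final String version;

    private RequestLine(String method, String uri, String version) {
        this.method = method;
        this.uri = uri;
        this.version = version;
    }

    public static RequestLine parse(String line) {
        // Assumption: the three subparts are separated by single spaces.
        String[] parts = line.trim().split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + line);
        }
        String method = parts[0];
        // Method names are case-sensitive, so "post" is rejected here.
        if (!KNOWN_METHODS.contains(method)) {
            throw new IllegalArgumentException("Unknown method: " + method);
        }
        String version = parts[2];
        if (!version.equals("HTTP/1.0") && !version.equals("HTTP/1.1")) {
            throw new IllegalArgumentException("Unsupported protocol: " + version);
        }
        return new RequestLine(method, parts[1], version);
    }

    public static void main(String[] args) {
        RequestLine rl = RequestLine.parse("POST /eshop/login.jsp HTTP/1.1");
        System.out.println(rl.method + " | " + rl.uri + " | " + rl.version);
        // prints "POST | /eshop/login.jsp | HTTP/1.1"
    }
}
```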

The HTTP Request Headers

The second part of an HTTP request consists of a number of HTTP headers. There are four types of HTTP headers: general, entity, request, and response. The first three types are summarized in Tables 1, 2, and 3. Response headers apply only to HTTP responses and are therefore not discussed here.

Table 1: HTTP General Headers
Header Description
Pragma

The Pragma general header is used to include implementation-specific directives that may apply to any recipient along the request/response chain. In other words, pragmas tell the servers that relay this request to behave in a certain way. The Pragma header may contain multiple values. For example, the following header line tells all proxy servers that relay this request not to use a cached version of the object but to download the object from the specified location:

Pragma: no-cache
Date

The Date general header represents the date and time at which the message was originated.


Table 2: HTTP Entity Headers
Header Description
Allow

This header lists the set of methods supported by the resource identified by the requested URL. The purpose of this field is strictly to inform the recipient of valid methods associated with the resource. The Allow header is not permitted in a request using the POST method and thus should be ignored if it is received as part of a POST entity. For instance,

Allow: GET, HEAD
Content-Encoding

This header is used to describe the type of encoding used on the entity. When present, its value indicates the decoding mechanism that must be applied to obtain the media type referenced by the Content-Type header. For example,

Content-Encoding: x-gzip
Content-Length

This header indicates the size of the entity-body, as a decimal number of octets, sent to the recipient or, in the case of the HEAD method, the size of the entity-body that would have been sent had the request been a GET. Applications should use this field to indicate the size of the entity-body to be transferred, regardless of the media type of the entity. A valid Content-Length field value is required on all HTTP/1.0 request messages containing an entity-body. Any Content-Length value greater than or equal to zero is valid. For example,

Content-Length: 32345
Content-Type

The Content-Type header indicates the media type of the entity-body sent to the recipient or, in the case of the HEAD method, the media type that would have been sent had the request been a GET. For example,

Content-Type: text/html
Expires

The Expires header gives the date and time after which the entity should be considered invalid. This allows information providers to suggest the volatility of the resource or a date after which the information may no longer be accurate. Applications must not cache this entity beyond the date given. The presence of an Expires header does not imply that the original resource will change or cease to exist at, before, or after that time. However, if an information provider knows or suspects that the resource will change by a certain date, it should include an Expires header with that date. For example,

Expires: Thu, 29 Mar 2001 13:34:00 GMT
Last-Modified

The Last-Modified header indicates the date and time at which the sender believes the resource was last modified. The exact semantics of this field are defined in terms of how the recipient should interpret it. If the recipient has a copy of this resource that is older than the date given by the Last-Modified field, that copy should be considered stale. For example,

Last-Modified: Thu, 10 Aug 2000 12:12:12 GMT
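Content-Length matters for file upload because it tells the server how many bytes of entity body to expect. The following sketch reads exactly that many bytes from a stream. The names EntityBodyReader and readBody are hypothetical, and the sketch assumes that the request line and headers have already been consumed and that the Content-Length value has been extracted as a string. It loops because InputStream.read may return fewer bytes than requested.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/**
 * Sketch: reads exactly Content-Length bytes of entity body from a stream.
 * EntityBodyReader and readBody are hypothetical names used for illustration.
 */
public final class EntityBodyReader {

    public static byte[] readBody(InputStream in, String contentLengthHeader) throws IOException {
        long length;
        try {
            length = Long.parseLong(contentLengthHeader.trim());
        } catch (NumberFormatException e) {
            throw new IOException("Invalid Content-Length: " + contentLengthHeader, e);
        }
        // Values >= 0 are valid; negative values, and values too large
        // for a single Java array, are rejected by this sketch.
        if (length < 0 || length > Integer.MAX_VALUE) {
            throw new IOException("Unsupported Content-Length: " + length);
        }
        byte[] body = new byte[(int) length];
        int offset = 0;
        // read() may return fewer bytes than requested, so keep reading until full.
        while (offset < body.length) {
            int n = in.read(body, offset, body.length - offset);
            if (n == -1) {
                throw new EOFException("Stream ended after " + offset + " of " + length + " bytes");
            }
            offset += n;
        }
        return body;
    }

    public static void main(String[] args) throws IOException {
        // Bytes beyond Content-Length ("EXTRA") are left unread in the stream.
        byte[] raw = "hello worldEXTRA".getBytes(StandardCharsets.US_ASCII);
        byte[] body = readBody(new ByteArrayInputStream(raw), "11");
        System.out.println(new String(body, StandardCharsets.US_ASCII)); // prints "hello world"
    }
}
```

Buffering the whole body in a byte array keeps the sketch short; for large uploads, you would more likely copy the bytes to disk as they arrive.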

Table 3: HTTP Request Headers
Header Description
From

The From header specifies who is taking responsibility for the request. This field contains the email address of the user submitting the request. For example,

From: dragonlancer@labsale.com
Accept

This header contains a comma-separated list of media types (MIME types) that the client accepts. The server uses this information to determine which data types are safe to send to the client in the HTTP response. Although the Accept field can contain multiple values, the Accept line itself can also be used more than once to specify additional accept types (this has the same effect as specifying multiple accept types on a single line). If the Accept field is not present in the request header, the client is assumed to accept all media types. For example,

Accept: text/plain, text/html
Accept: image/gif, image/jpeg
Accept-Encoding

This header is very similar to the Accept header in syntax. However, it specifies the content-encoding schemes that are acceptable in the response. For instance,

Accept-Encoding: x-compress, x-zip
Accept-Language

This header is also similar to the Accept header. It specifies the preferred response language. The following example specifies English as the accepted language:

Accept-Language: en
User-Agent

The User-Agent header, if present, specifies the name of the client browser. The first word should be the name of the software, followed by a slash and an optional version number. Any other product names that are part of the complete software package may also be included. Each name/version pair should be separated by white space. This field is used mostly for statistical purposes. It allows servers to track software usage and protocol violations. For example,

User-Agent: Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)

Referer

This header specifies the URI of the resource from which the request URI was obtained. In HTML, it would be the address of the page that contained the link to the requested object. Like the User-Agent header, this header is not required and is used mostly for the server's statistical and tracking purposes. For example,

Referer: http://localhost/Atoms/Details.htm
Authorization

The Authorization header contains authorization information. The first word contained in this header specifies the type of authorization system to use. Then, separated by white space, it should be followed by the authorization information such as a user name, password, and so forth. For example,

Authorization: user ken:dragonlancer
If-Modified-Since

This header is used with the GET method to make the request conditional: if the object has not changed since the date and time specified by this header, the server does not send it (it responds with status 304, Not Modified), and the client uses its locally cached copy instead. For example,

If-Modified-Since: Thu, 10 Aug 2000 12:12:29 GMT
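Because header names are case-insensitive and some headers, such as Accept, may appear more than once, a header parser should tolerate both. The sketch below reads "Name: value" lines up to the blank line that ends the header section and collects the values in a case-insensitive map; HeaderParser is a hypothetical name. It splits each line at the first colon only, so values that themselves contain colons, such as URLs and dates, survive intact. For simplicity it assumes there are no folded (continued) header lines, and because it reads characters through a BufferedReader, it suits only the header section, not a binary entity body such as an uploaded file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Sketch: parses the header lines that follow the request line into a
 * case-insensitive map. Repeated headers keep every value, in order.
 */
public final class HeaderParser {

    public static Map<String, List<String>> parse(BufferedReader reader) throws IOException {
        // Header names are case-insensitive, so order keys ignoring case.
        Map<String, List<String>> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        String line;
        // An empty line marks the end of the headers; the entity body follows it.
        while ((line = reader.readLine()) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                throw new IOException("Malformed header line: " + line);
            }
            String name = line.substring(0, colon).trim();
            // Split at the first colon only; the value may contain more colons.
            String value = line.substring(colon + 1).trim();
            headers.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return headers;
    }

    public static void main(String[] args) throws IOException {
        String raw = "Accept: text/plain, text/html\r\n"
                + "accept: image/gif, image/jpeg\r\n"
                + "If-Modified-Since: Thu, 10 Aug 2000 12:12:29 GMT\r\n"
                + "\r\n"
                + "(entity body would follow here)";
        Map<String, List<String>> headers = parse(new BufferedReader(new StringReader(raw)));
        System.out.println(headers.get("ACCEPT").size());            // prints 2
        System.out.println(headers.get("if-modified-since").get(0)); // prints Thu, 10 Aug 2000 12:12:29 GMT
    }
}
```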
