An Amble Through Apache Configuration
by Rael Dornfest
03/02/2000
The next attraction on our Apache tour is an amble through httpd.conf, the web server's main configuration file.
At first glance, the httpd.conf file can seem intimidating. Heck, the third introductory paragraph reads, "You have been warned." But despite this rather ominous beginning, Apache is surprisingly simple to configure thanks to its well-thought-out default settings.
In this article, I'll cover a selection of Apache configuration directives, the settings that define how Apache should actually run. These are things like: where files are located on your server, how much of the machine's resources Apache may use, which content visitors are allowed to see, and how many concurrent visitors the server can handle. I won't describe every directive as some are self-evident, others are perfectly fine left at the default, and still others are so involved they warrant their own columns.
Defaults and Why They're Default
Apache's default settings came about through a collaborative effort between the core development team and Apache end-users -- people like you. As a result, a newcomer can relatively easily download Apache, unpack it from its virtual box, plug it in, and get it running in only a few minutes.
Don't, however, confuse simplicity with a lack of power. As we amble through httpd.conf, you'll notice scores of directives we'll pass up without mention. Heed the warning at the beginning of the file to read the documentation carefully before trampling through some of the more complicated settings. Beyond the defaults, the httpd.conf file reflects the preferences of the user and localized security issues.
Visit http://www.apache.org/docs for detailed documentation on all Apache runtime configuration directives.
Access.conf, Srm.conf, and Why You Don't Care
Take a gander at the contents of your Apache installation's configuration
directory, called either conf or etc
depending upon your layout preferences and installation method.
Mine (a source install using the default prefix and layout) is
/usr/local/apache/conf. The binary installs I've seen
tend to be the same. The Apache 1.3.9 RPM installation under Red Hat
6.1 creates a conf directory at /etc/httpd/conf.
% cd /usr/local/apache/conf
% ls
access.conf httpd.conf magic mime.types srm.conf
You'll notice a few other files aside from our friend, httpd.conf.
For simplicity's sake, the two other .conf files, access.conf
and srm.conf, have been deprecated in favor of consolidating
all configuration directives inside httpd.conf. While, in some
instances, it may make sense to keep a few directives in these two
files (or another arbitrary file), it's not a particularly standard
practice anymore. For further information, take a look at the
AccessConfig and ResourceConfig directives.
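As a rough sketch of the consolidated setup, Apache 1.3 installations commonly point both directives at /dev/null so that only httpd.conf is read. Treat this as an illustration rather than a required setting, and check the documentation for your version before copying it.

```apache
# Keep every directive in httpd.conf and skip the two legacy files.
# Pointing these directives at /dev/null tells Apache 1.3 not to read them.
ResourceConfig /dev/null
AccessConfig /dev/null
```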
As for mime.types and magic, if enough readers
are interested, I'll discuss MIME-types and the mod_magic module in later columns.
Diving In
Alright, let's dive in. Open your httpd.conf file in the text editor of your choice.
httpd.conf is organized in the following manner:
- Section 1. Global Environment
Configuration directives dealing with the Apache server itself.
- Section 2. "Main" server configuration
The parameters of the default server, as opposed to ...
- Section 3. Virtual Hosts
Parameters specific to a virtual host, which override some of the main server configuration defaults.
I don't cover Virtual Hosts in this column, but if you're dying to give this feature a whirl, visit the Resources section at the end of this article.
Section 1. Global Environment
KeepAlive
The HTTP protocol is "stateless," meaning that each request/response
pair between web browser and server is independent. If, for example,
you visit a web page that contains three embedded images, your browser
actually makes four separate connections to that web server -- one
for the page itself, and one for each of the images in turn.
KeepAlive, an extension to HTTP, provides a persistent connection between browser and server so that the same connection can handle multiple request/response pairs. The result is lower latency, because the time otherwise spent establishing a new connection for every request is saved.
I'll leave the details of this set of directives to the more-than-ample httpd.conf comments.
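For orientation, the KeepAlive block in a stock Apache 1.3 httpd.conf looks roughly like the following. The values are the usual 1.3 defaults as far as I know; your own file may differ, so verify before relying on them.

```apache
# Allow more than one request per connection.
KeepAlive On
# Maximum number of requests served over a single persistent connection
# (0 means unlimited).
MaxKeepAliveRequests 100
# Seconds to wait for the next request on an idle persistent connection
# before closing it.
KeepAliveTimeout 15
```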
Server-Pool Regulation
Apache under Unix is multi-process, meaning that each request is
handled by a separate copy, or child process, of the httpd program.
Win32 Apache is multi-threaded -- the server handles each request
internally rather than generating another instance of the program.
If this sounds like bad elevator music to you, don't worry about it
-- unless you're running under Win32, in which case I direct you to
Apache.org's
"Using Apache With Microsoft Windows" documentation.
Server-pool regulation balances the overhead required to spawn child processes with the memory and processor resources associated with running multiple copies of the httpd program. While the defaults are a reasonable place to start, it's only by watching a) the number of httpd processes running at any one point, and b) your server's memory and CPU usage when you're receiving the largest number of concurrent hits that you can start tuning Apache to your particular circumstances.
For example,
say you're running an old 486 with little memory as your experimental web
server and you want the bare minimum of resources devoted to Apache. You
might set MinSpareServers, MaxSpareServers,
and StartServers to a very low number. Someone running
an overloaded server which handles mail, news, and web traffic might want
to limit MaxClients. That way, when their site is rumored
to be the last place to purchase the toy-to-have du jour, the sudden
flurry of web hits won't disrupt mail services.
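To make the low-resource case concrete, here is a minimal sketch of the server-pool directives for a small test machine. The numbers are assumptions chosen for illustration, not tuned recommendations; the right values only come from watching your own server.

```apache
# Illustrative values for a small, low-memory test machine (assumed, not tuned).
# Child processes started when Apache launches.
StartServers 1
# Keep at least this many idle children waiting for requests.
MinSpareServers 1
# Kill off idle children beyond this number.
MaxSpareServers 2
# Hard ceiling on simultaneous child processes, and so on concurrent requests.
MaxClients 10
```

The overloaded mail, news, and web machine from the example would leave the first three alone and lower only MaxClients.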
It boils down to this: Watch, tune, watch, tune. Incidentally, if you have a particularly nice configuration you're willing to share, please post a message on the O'Reilly Network Apache Forum.
Section 2. 'Main' Server Configuration
We now wander into Section 2 to take a peek at some of the basic directives that define the server's main web site.
Port
Think of a port as a television channel. Just as you
expect to find "The B-Movie Channel" consistently on channel 123, web
browsers expect to find web servers at port 80. (This analogy is an
oversimplification of how ports work, but it'll do for the purposes of
our discussion.) This doesn't mean that your web server has to run on
port 80 -- that's only necessary if you want your web site to be found
at its plain address, without a port number in the URL.
Suppose you wish to hide a test or experimental server from the outside world. You have a DSL line and have configured your router to only allow requests directed at port 80, served by your public web server. Configure your experimental server to listen on a different port number -- 8000 for instance. You can choose any port number as long as the port is not reserved for use by another service, and the number is greater than 1023 if you're not running as root.
To visit a web site hosted on a port other than 80, your visitors must
include the port number in the URL they type into their browsers.
For example, to visit the web site at port 8000 on your local machine,
use the following URL: http://localhost:8000
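In httpd.conf, the experimental server described above might be set up with a single line. Port 8000 is simply the example number used in this article, not a convention.

```apache
# Serve the experimental site on port 8000 instead of the standard port 80.
# 8000 is only an example; any free port above 1023 works for a non-root user.
Port 8000
```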
ServerAdmin
(default: you@your.address)
I'm always surprised by the occasional "500 Internal Server Error"
message I encounter directing me to send e-mail to the server
administrator at you@your.address. Be sure to set the
ServerAdmin directive so that your visitors
don't have to resort to guesswork to report problems they
have with your site.
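A minimal example follows; the address is a placeholder I've made up for illustration.

```apache
# webmaster@example.com is a placeholder; use a mailbox someone actually reads.
ServerAdmin webmaster@example.com
```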
DocumentRoot
(default: usually {ServerRoot}/htdocs)
As the name suggests, this is, in the simplest case, the location where
the static content of your web site lives -- HTML files, images, or sounds -- content that doesn't change on a request-by-request basis.
This is where you would store the files you wish to make available for
public viewing. Most folks use the directory as the root of an organized
hierarchy of directories. Here's an example:
- htdocs
The top-level web pages such as the site's home page, "about us" page, site map, etc.
- htdocs/images
A repository for images used throughout the web site
- htdocs/services
Pages having to do with our services
- htdocs/products
Pages having to do with our products
- htdocs/products/images
Pictures of our products
On occasion, non-static content such as Server Side Includes (SSI), embedded PHP scripting code, and CGI scripts (to name a few examples) resides right along with static content in the document root directory. Be sure, however, to think this strategy through -- you must understand how these dynamic content generators affect your site's performance and security. For more information on some of the dynamic content generators I mention here, visit the Resources section at the end of this article.
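For reference, a DocumentRoot line matching the htdocs hierarchy sketched above might look like this. The path assumes the default source-install prefix of /usr/local/apache; substitute your own ServerRoot.

```apache
# Assumes the default source-install prefix mentioned earlier, /usr/local/apache.
DocumentRoot "/usr/local/apache/htdocs"
```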
ScriptAlias
(default: usually {ServerRoot}/cgi-bin/)
CGI scripts, historically the most common dynamic content generators,
usually reside outside of the main document tree in the location specified
by the ScriptAlias directive. The ScriptAlias directive
indicates that anything in the specified directory should be run as
a program rather than simply sent to the browser as a file.
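A typical ScriptAlias line takes a URL prefix followed by a filesystem directory. The sketch below again assumes the /usr/local/apache prefix; adjust the path to your layout.

```apache
# Map URLs beginning with /cgi-bin/ to a directory outside the document root.
# Files found there are executed as CGI programs rather than sent as-is.
# The filesystem path assumes the default /usr/local/apache prefix.
ScriptAlias /cgi-bin/ "/usr/local/apache/cgi-bin/"
```

With this in place, a request for http://localhost/cgi-bin/hello.cgi (hello.cgi being a hypothetical script) would run /usr/local/apache/cgi-bin/hello.cgi.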
DirectoryIndex
(default: index.html)
You've probably noticed when visiting a web site that the URL you enter
to get there often doesn't contain the name of the document
itself; the same holds true when clicking on many links within a site.
The URL looks something like http://www.oreillynet.com rather
than http://www.oreillynet.com/documentname.html. Behind the
scenes, the web server is looking in the document root directory for the
file specified in the DirectoryIndex directive to display by default. In
other words, while you enter only http://www.oreillynet.com
into your browser, you're probably actually viewing a document called
index.html within the O'Reilly Network web site's document root.
By default, httpd.conf specifies this document as index.html; this, as with almost everything else in Apache, is configurable. If you're used to the Windows three-character file suffix limit or are in an environment where some folks will be editing documents for online publication under Windows and uploading these to your server, you might use index.htm as your default. Some servers assume a default.html or default.htm document. Thankfully, you may specify a space-delimited list of one or more directory indexes in order of preference from left to right:
DirectoryIndex index.html index.htm default.html default.htm
If you're using a dynamic content generator within your document
hierarchy, you can just as easily designate its extension as the default:
DirectoryIndex index.php index.cgi index.html default.html
Amble Over
And thus ends our brief amble through the Apache server configuration file. I hope you've enjoyed the tour and now have a bit more of a handle on just what Apache configuration is all about (and how wonderfully configurable it is). I may have glossed over a few topics that interest you, so I'll end by suggesting several excellent tutorials and detailed documentation already available on the Web. (Why reinvent the wheel when someone else has already gone to the trouble of constructing perfectly round ones?)
An Important Note on Security and Performance
This article should in no way be considered a comprehensive tutorial. When it comes to security and performance, almost everything's situation-specific. Be careful: educate yourself by reading the documentation, ask an Apache-savvy friend for help, consult your system administrator, and join in on (or just lurk in the corner of) newsgroup and mailing list discussions.
Resources
The following is a list of starting points from which to explore some of the topics covered (or not) in this article.
- Apache Documentation
- Virtual Hosts
  - ApacheWeek's "Using Virtual Hosts" article
  - The Apache Virtual Host Documentation
- Access Control
  - The O'Reilly Network Apache DevCenter's "Access Control" Topic bookmarks
  - LinuxPlanet's very nice Security and Apache: An Essential Primer tutorial
- Dynamic Content
  - mod_cgi, the Apache CGI module
  - CGI.pm, A Perl CGI module
  - The Perl CGI Programming FAQ
  - PHP: Hypertext Preprocessor
  - mod_perl, the Apache/Perl Integration project
Tune in Next Time ...
Fun with Logs!
As always, if you'd like me to cover anything in particular in this column, feel free to post your suggestions to the O'Reilly Network Apache Forum.