 Published on ONLamp.com (http://www.onlamp.com/)


Nagios, Part 2

by Oktay Altunergil
09/26/2002

In the first part of this article, we saw what Nagios is and how to install it along with its plugins. We also looked briefly at which configuration files are necessary and how to install the sample configuration files. Now we will go through the configuration files one by one and set up monitoring for one host, 'example.com', and two services on it, 'http' and 'ping'. If something goes wrong with these services, two users, 'oktay' and 'verty', will be notified.

Configuring Monitoring

We first need to add our host definition and configure some options for that host. You can add as many hosts as you like, but we will stick with one host for simplicity.

Contents of hosts.cfg

# Generic host definition template
define host{
 # The name of this host template - referenced in other host definitions,
 # used for template recursion/resolution
 name                            generic-host
 # Host notifications are enabled
 notifications_enabled           1     
 # Host event handler is enabled   
 event_handler_enabled           1        
 # Flap detection is enabled  
 flap_detection_enabled          1     
 # Process performance data
 process_perf_data               1
 # Retain status information across program restarts       
 retain_status_information       1   
 # Retain non-status information across program restarts    
 retain_nonstatus_information    1       
 # DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST,
 # JUST A TEMPLATE!
 register                        0        
}


# Host Definition

define host{
 # Name of host template to use
 use                     generic-host             

 host_name               example.com
 alias                   An Example Domain
 address                 www.example.com
 check_command           check-host-alive
 max_check_attempts      10
 notification_interval   120
 notification_period     24x7
 notification_options    d,u,r
}

The first definition is not a real host but a template from which other host definitions are derived: any definition that names it in a 'use' directive inherits its settings. The same mechanism appears in other configuration files and makes it easy to build a configuration on a predefined set of defaults.

With this setup we are monitoring only one host, 'www.example.com', to see if it is alive. The 'host_name' parameter is important because the other configuration files refer to this server by that name.

Now we need to add this host to a hostgroup. Even though we will keep the configuration simple by defining a single host, we still have to associate it with a group so that the application knows which contact group (see below) to send notifications to.

Contents of hostgroups.cfg

define hostgroup{
 hostgroup_name  flcd-servers
 alias           The Free Linux CD Project Servers
 contact_groups  flcd-admins
 members         example.com
}

Above, we have defined a new hostgroup and associated the 'flcd-admins' contact group with it. Now let's look at the contact group settings.

Contents of contactgroups.cfg

define contactgroup{
 contactgroup_name       flcd-admins
 alias                   FreeLinuxCD.org Admins
 members                 oktay, verty
}

We have defined the contact group 'flcd-admins' and added two members, 'oktay' and 'verty', to it. This configuration ensures that both users will be notified when something goes wrong with a server that 'flcd-admins' is responsible for. (Individual notification preferences can override this.) The next step is to set the contact information and notification preferences for these users.

Contents of contacts.cfg

define contact{
 contact_name                    oktay
 alias                           Oktay Altunergil
 service_notification_period     24x7
 host_notification_period        24x7
 service_notification_options    w,u,c,r
 host_notification_options       d,u,r
 service_notification_commands   notify-by-email,notify-by-epager
 host_notification_commands      host-notify-by-email,host-notify-by-epager
 email                           oktay@example.com
 pager                           dummypagenagios-admin@localhost.localdomain
 }

define contact{
 contact_name                    verty
 alias                           David 'Verty' Ky
 service_notification_period     24x7
 host_notification_period        24x7
 service_notification_options    w,u,c,r
 host_notification_options       d,u,r
 service_notification_commands   notify-by-email
 host_notification_commands      host-notify-by-email
 email                           verty@example.com
 }
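
The names listed under 'members' in contactgroups.cfg have to match 'contact_name' entries in contacts.cfg exactly, including case. A quick cross-check of the two files could look like the following sketch. The function names are hypothetical, and the parsing assumes one directive per line, as in the listings above; it is not a full parser for Nagios object files.

```shell
# Hypothetical consistency check between contactgroups.cfg and contacts.cfg.

# Print every contact_name defined in a contacts file, one per line.
list_contact_names() {
    awk '$1 == "contact_name" { print $2 }' "$1"
}

# Print every name found on a "members" line, one per line.
list_group_members() {
    awk '$1 == "members" {
        sub(/^[ \t]*members[ \t]+/, "")   # drop the directive name
        sub(/[ \t\r]+$/, "")              # drop trailing whitespace
        n = split($0, names, /[ \t]*,[ \t]*/)
        for (i = 1; i <= n; i++) print names[i]
    }' "$1"
}

# Print group members that have no matching contact definition.
# The comparison is case-sensitive. Returns 1 if any name is missing.
missing_contacts() {
    groups_cfg=$1
    contacts_cfg=$2
    missing=$(list_group_members "$groups_cfg" |
        while read -r name; do
            list_contact_names "$contacts_cfg" |
                grep -F -x -q -- "$name" || echo "$name"
        done)
    if [ -n "$missing" ]; then
        echo "$missing"
        return 1
    fi
    return 0
}

# Example (paths assume the layout used in this article):
#   missing_contacts /usr/local/nagios/etc/contactgroups.cfg \
#                    /usr/local/nagios/etc/contacts.cfg
```

Any name it prints is a group member that Nagios would not be able to resolve to a contact.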

In addition to providing contact details for a particular user, the 'contact_name' in contacts.cfg is also used by the CGI scripts (i.e., the Web interface) to determine whether a particular user is allowed to access a particular resource. You will need to configure .htaccess-based basic HTTP authentication before you can use the Web interface, but logging in is not enough by itself: users can access a resource only if their login name is also defined as a 'contact_name', as seen above.

Now that we have our hosts and contacts configured, we can start configuring the individual services to be monitored on our server.

Contents of services.cfg

# Generic service definition template
define service{
 # The 'name' of this service template, referenced in other service definitions
 name    generic-service  
 # Active service checks are enabled
 active_checks_enabled  1 
 # Passive service checks are enabled/accepted
 passive_checks_enabled  1 
 # Active service checks should be parallelized 
 # (disabling this can lead to major performance problems)
 parallelize_check  1  
 # We should obsess over this service (if necessary)
 obsess_over_service  1  
 # Default is to NOT check service 'freshness'
 check_freshness   0  
 # Service notifications are enabled
 notifications_enabled  1 
 # Service event handler is enabled
 event_handler_enabled  1 
 # Flap detection is enabled
 flap_detection_enabled  1 
 # Process performance data
 process_perf_data  1 
 # Retain status information across program restarts
 retain_status_information 1  
 # Retain non-status information across program restarts
 retain_nonstatus_information 1  
 # DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
 register   0
 }

# Service definition
define service{
 # Name of service template to use
 use    generic-service   

 host_name   example.com 
 service_description  HTTP
 is_volatile   0
 check_period   24x7
 max_check_attempts  3
 normal_check_interval  5
 retry_check_interval  1
 contact_groups   flcd-admins
 notification_interval  120
 notification_period  24x7
 notification_options  w,u,c,r
 check_command   check_http
 }


# Service definition
define service{
 # Name of service template to use
 use    generic-service   

 host_name   example.com
 service_description  PING
 is_volatile   0
 check_period   24x7
 max_check_attempts  3
 normal_check_interval  5
 retry_check_interval  1
 contact_groups   flcd-admins
 notification_interval  120
 notification_period  24x7
 notification_options  c,r
 check_command   check_ping!100.0,20%!500.0,60%
 }

Using the above setup, we are configuring two services to be monitored. The first service definition, which we have called HTTP, monitors whether the Web server is up and notifies us if there's a problem. The second definition monitors the ping statistics from the server and notifies us if the response time increases too much or if there's too much packet loss, either of which is a sign of network trouble. The two arguments given to 'check_ping' after the '!' separators are its warning and critical thresholds: a round-trip time of 100.0 ms or 20% packet loss counts as a warning, and 500.0 ms or 60% packet loss counts as critical.

The commands we use to accomplish this are 'check_http' and 'check_ping', which were installed into the 'libexec' directory when we installed the plugins. Take your time to get familiar with the other available plugins and configure them along the lines of the above definitions.

You can also write your own plugins to do custom monitoring. For instance, there's no plugin to check whether Tomcat is up or down. You could simply write a script that loads a default JSP page on a remote Tomcat server and returns a success or failure status based on the presence or absence of a predefined text value (e.g., "Tomcat is up") on the page. (In such a case you would need to add a definition for this custom command to your checkcommands.cfg file, which we have not touched.)
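
The custom Tomcat check described above could be sketched as a small shell plugin. This is only an illustration under assumptions: the function names, the URL, and the marker text are hypothetical, and it relies on 'curl' being installed. It follows the usual plugin convention of returning 0 for OK and 2 for CRITICAL.

```shell
#!/bin/sh
# Sketch of a custom Nagios plugin: report OK only if a page contains a marker.
# Hypothetical names throughout; the URL and marker below are placeholders.

# Read page content on stdin and look for the marker text.
# Prints one status line and returns a plugin status: 0 = OK, 2 = CRITICAL.
check_marker() {
    marker=$1
    if grep -F -q -- "$marker"; then
        echo "TOMCAT OK - found '$marker'"
        return 0
    fi
    echo "TOMCAT CRITICAL - '$marker' not found"
    return 2
}

# Fetch the page with curl (assumed to be installed) and check it.
# An unreachable server yields empty input, which is reported as CRITICAL.
check_tomcat() {
    url=$1
    marker=$2
    curl -s -m 10 "$url" | check_marker "$marker"
}

# Example invocation (placeholder URL):
#   check_tomcat "http://www.example.com:8080/status.jsp" "Tomcat is up"
```

To run it as a real plugin, the script would call check_tomcat with its command-line arguments and exit with that function's status, and a matching command definition would go into checkcommands.cfg.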

Starting Nagios

Now that we have configured the hosts and the services to monitor, we are ready to fire up Nagios and start monitoring. We will start Nagios using the init script that we installed earlier.

root@ducati:/usr/local/nagios/etc# /etc/rc.d/rc.nagios start 
Starting network monitor: nagios 
/bin/bash: -l: unrecognized option 
 [ ... ]

If you receive the above error message, it means the 'su' command installed on your server does not support the '-l' option. To fix it, open /etc/rc.d/rc.nagios (or its equivalent on your system) and remove the 'l' where it says 'su -l'. You will end up with 'su -', which means the same thing. After making the change, run the above startup command again. If you receive 'permission denied' errors, just reset the ownership of your Nagios installation directory and they should be resolved.

root@ducati:/usr/local/nagios# chown -R nagios  /usr/local/nagios 
root@ducati:/usr/local/nagios# chgrp -R nagios  /usr/local/nagios
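
If startup still fails, it can help to have Nagios verify its configuration before starting the daemon. The sketch below wraps the '-v' (verify) option of the nagios binary. The function name is hypothetical, and the default paths assume the /usr/local/nagios layout used in this article.

```shell
# Hypothetical helper: verify the Nagios configuration before starting.
# Defaults assume the /usr/local/nagios layout used in this article.
verify_nagios_config() {
    nagios_bin=${1:-/usr/local/nagios/bin/nagios}
    main_cfg=${2:-/usr/local/nagios/etc/nagios.cfg}

    if [ ! -r "$main_cfg" ]; then
        echo "cannot read $main_cfg" >&2
        return 1
    fi
    # 'nagios -v' parses the main config file and the object files it
    # references; it is expected to exit non-zero when it finds errors.
    "$nagios_bin" -v "$main_cfg"
}

# Start the daemon only when the check passes:
#   verify_nagios_config && /etc/rc.d/rc.nagios start
```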

If everything went smoothly, Nagios should now be running. The following command shows whether Nagios is running and, if it is, the process ID associated with it.

root@ducati:/usr/local/nagios# /etc/rc.d/rc.nagios status
  PID TTY          TIME CMD
  22645 ?        00:00:00 nagios

The same command will stop Nagios when called with the 'stop' parameter instead of 'start' or 'status'.

The Web Interface

Although Nagios has already started monitoring and will send us notifications if and when something goes wrong, we need to set up the Web interface to be able to monitor services and hosts interactively in real time. The Web interface also gives a view of the big picture by making use of graphics and statistical information.

Naturally, we need a Web server already set up in order to access the Nagios Web interface. For this article we will assume that we are running the Apache Web server. I will use the exact configuration that is included in the official Nagios documentation because it works fine.

Addition to httpd.conf


ScriptAlias /nagios/cgi-bin/ /usr/local/nagios/sbin/
<Directory "/usr/local/nagios/sbin/">
 AllowOverride AuthConfig
 Options ExecCGI
 Order allow,deny
 Allow from all
</Directory>

Alias /nagios/ /usr/local/nagios/share/
<Directory "/usr/local/nagios/share">
 Options None
 AllowOverride AuthConfig
 Order allow,deny
 Allow from all
</Directory>

This configuration creates a Web alias '/nagios/cgi-bin/' and directs it to the CGI scripts in your Nagios 'sbin' directory; a second alias, '/nagios/', points to the 'share' directory. Assuming your main Web site is set up at http://127.0.0.1, you will be able to access the Nagios Web interface at http://127.0.0.1/nagios/. At this point, the Nagios Web interface should come up properly, but you will notice that you cannot access any of the pages. You will get an error message that looks like the following.

It appears as though you do not have permission to view information for any of the hosts you requested... If you believe this is an error, check the HTTP server authentication requirements for accessing this CGI and check the authorization options in your CGI configuration file.

This is a security precaution designed to allow only authorized people to access the monitoring interface. The authentication is handled by your Web server using Basic HTTP Authentication (i.e., .htaccess). Nagios then takes the username of the logged-in user and matches it against the 'contact_name' entries in contacts.cfg to determine which sections of the Web interface that user can access.

Configuring .htaccess based authentication is easy provided that your Web server is already configured to use it. Please refer to the documentation for your Web server if it's not configured. We will assume that our Apache server is configured to look at the .htaccess file and apply the directives found in it.

First, create a file called .htaccess in the /usr/local/nagios/sbin directory. If you would like to lock up your Nagios Web interface completely, you can also put a copy of the same file in the /usr/local/nagios/share directory.

Put the following in this .htaccess file.

AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd.users
require valid-user

When you're adding your first user, the password file that .htaccess refers to will not be present. You need to run the 'htpasswd' command with the -c option to create the file.

htpasswd -c /usr/local/nagios/etc/htpasswd.users oktay
New password: ******
Re-type new password: ******
Adding password for user oktay

For the rest of your users, use the 'htpasswd' command without the '-c' option so as not to overwrite the existing file. After you add all of your users, you can go back to the Web interface, which will now pop up an authentication dialog. Upon successful authentication, you can start using the Web interface. I will not go into detail about using it since it's pretty self-explanatory.

Notice that your users will only be able to access information for servers that they are associated with in the Nagios configuration files. Also, some sections of the Web interface are disabled for everyone by default. If you would like to enable those, take a look at 'etc/cgi.cfg'. For instance, to allow the user 'oktay' to access the 'Process Info' section, uncomment the 'authorized_for_system_information' line and add 'oktay' to its comma-delimited list of names.
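
Returning to the password file: because 'htpasswd -c' creates the file from scratch, running it a second time would discard the users already added. A small wrapper can make that mistake harder. The sketch below is only illustrative; the function name is hypothetical, and it assumes the 'htpasswd' utility is on the PATH.

```shell
# Hypothetical helper: add a Web user without clobbering the password file.
# 'htpasswd -c' creates (and overwrites) the file, so it is used only when
# the file does not exist yet.
add_web_user() {
    passwd_file=$1
    user=$2
    if [ -e "$passwd_file" ]; then
        htpasswd "$passwd_file" "$user"
    else
        htpasswd -c "$passwd_file" "$user"
    fi
}

# Example: htpasswd prompts for each user's password in turn.
#   for u in oktay verty; do
#       add_web_user /usr/local/nagios/etc/htpasswd.users "$u"
#   done
```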

This is all you need to install and configure Nagios for basic monitoring of your servers and the individual services on them. You can then fine-tune your monitoring system by going through all of the configuration files and modifying them to match your needs. Going through the plugins in the libexec directory will also give you a lot of ideas about what local and remote services you can monitor. Nagios also comes with software that can be used to monitor a server's disk and load status remotely. Finally, Nagios has so many features that no single article could explain all of them. Please refer to the official documentation for the more advanced topics that aren't covered here.

Happy hacking.

Web Resources:

Official Nagios Web Site: http://www.nagios.org
Official NetSaint Web Site: http://www.netsaint.org
Nagios Plugins: http://nagiosplug.sourceforge.net
Nagios ScreenShots: http://www.nagios.org/screenshot.php
htpasswd man Page: http://www.rt.com/man/htpasswd.1.html

Oktay Altunergil works for a national web hosting company as a developer concentrating on web applications on the Unix platform.



Copyright © 2009 O'Reilly Media, Inc.