Top Five Open Source Packages for System Administrators
Object Configuration Files
The bulk of Nagios configuration occurs in the object configuration files. These files define hosts and services to be monitored, how various status conditions should be interpreted, and what actions should be taken when they occur. These files are used to define the following items:
Hosts: Computers and other network devices
Host Groups: Named groups of hosts
Services: Important daemons providing specific network services
Contacts: Users to be contacted in the event of a problem
Contact Groups: Named groups of contacts
Time Periods: Day and/or time ranges within a week, used to specify when checks are to be performed, notifications are to be sent, and the like
Commands: Commands to be run for all purposes (host/service checking, notifications, event handling, and so on). Nagios provides two files containing many predefined commands: checkcommands.cfg and misccommands.cfg.
Host Dependencies: Specifications of host reachability dependencies. When an intermediate host is down, checks are skipped for all hosts that are dependent on that one.
Service Dependencies: Specifications of service dependency requirements. When a service is down, checks are skipped for all other services that depend on it.
Host Escalations: Definitions of optional escalation levels for host problems
Host Group Escalations: Definitions of optional escalation levels for host groups
Service Escalations: Definitions of optional escalation levels for failed services
The first seven items (hosts through commands) will need to be defined for virtually every Nagios installation; the dependency and escalation definitions are optional. In the sample Nagios configuration provided with the package, each type of object is defined in a separate configuration file (named after the object type, excluding any spaces). However, you can arrange your definitions in any form that makes sense to you.
Hosts and Host Groups
All of these items are defined via templates: named sets of attributes and settings that can be easily applied to any number of actual objects. For example, here is a template definition for hosts:
define host{
; Template name
name normal
; This is only a template (not a real host)
register 0
; Host notifications are enabled
notifications_enabled 1
; Command to check if host is available
check_command check-host-alive
; Recheck failures this many times
max_check_attempts
; Repeat failure notifications every 2 hours
notification_interval 120
; When to send notifications (time period name)
notification_period 24x7
; Notify when down, unreachable and on recovery
notification_options d,u,r
; Host event handler is enabled
event_handler_enabled 1
; Event handler command (defined elsewhere)
event_handler host-eh
; Flap detection is disabled
flap_detection_enabled 0
; Save performance data
process_perf_data 1
; Save status information across restarts
retain_status_information 1
}
This template defines a variety of host-monitoring settings, each explained by the comment that precedes it (comments begin with a semicolon). Here is a host definition that uses this template:
define host{
; Template on which to base host
use normal
; Note the attribute is not "name" as above
host_name beulah
; Longer description
alias beulah: SuSE 8.1
; IP address
address 192.168.1.44
; Overrides template value
max_check_attempts 8
}
Other hosts may be defined in a similar way. Host definitions themselves can
also be used as templates, provided that a name attribute is included.
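To make the template mechanism concrete, here is a minimal Python sketch of how a use reference might be resolved: the template's settings are copied first, and the object's own attributes then override them. It models the idea only and is not Nagios's actual parser. The resolve function, the dictionary layout, and the template's max_check_attempts value are assumptions made for this illustration.

```python
# Illustrative only: models template inheritance, not Nagios's real parser.
templates = {
    "normal": {
        "register": "0",
        "check_command": "check-host-alive",
        "max_check_attempts": "3",  # placeholder value for this sketch
        "notification_interval": "120",
    },
}

def resolve(obj, templates):
    """Return the effective attributes of obj after applying its template chain."""
    merged = {}
    parent = obj.get("use")
    if parent is not None:
        # Resolve the parent first so templates can build on other templates.
        merged.update(resolve(templates[parent], templates))
    # The object's own attributes override anything inherited.
    merged.update(obj)
    # These keys describe the definition itself rather than monitoring settings.
    for key in ("use", "name", "register"):
        merged.pop(key, None)
    return merged

beulah = {
    "use": "normal",
    "host_name": "beulah",
    "address": "192.168.1.44",
    "max_check_attempts": "8",
}
effective = resolve(beulah, templates)
print(effective["max_check_attempts"])  # 8 (the host's own value wins)
print(effective["check_command"])       # check-host-alive (inherited)
```

Resolving the parent before applying the object's own attributes is what lets a single line in the host definition override a template value.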
Once hosts have been defined, they may be placed into host groups via directives like this one:
define hostgroup{
hostgroup_name bldg2
alias Building 2
contact_groups admins1
members beulah,callisto,ariadne,leah,lovelace,valley
}
This definition creates the host group named bldg2, consisting of six
hosts (all previously defined via define host directives). The
contact_groups attribute names the contact group that receives notifications
for these hosts; the group itself is defined elsewhere (as we'll see).
You can use as many host groups as you want to. Hosts can be part of multiple host groups, and host groups themselves may be nested.
Services
Here are two service templates and a service definition:
define service{ ; Define defaults for all services
name generic
register 0
; Check service every 30 minutes
normal_check_interval 30
; Retry failing checks every 3 minutes, up to 5 times
retry_check_interval 3
max_check_attempts 5
event_handler_enabled 1
check_period 24x7
; Repeat notifications for failures every 2 hours
notification_interval 120
notification_period 6to22
; Notify contacts about critical failures/recoveries
notification_options c,r
notifications_enabled 1
contact_groups admins
}
define service{ ; Define the SMTP service
use generic
name generic-smtp
register 0
service_description Check SMTP
check_command check_smtp
event_handler eh_smtp
contact_groups mailadmins
}
define service{ ; Define services to be monitored
use generic-smtp
; Monitor SMTP for all hosts in this host group
host_groups mailhosts
}
The first template (generic) defines some settings that can be applied to a variety of service types. The second template, generic-smtp, uses the first template as a starting point and adds to it in order to create a generic SMTP monitoring service. Specifically, it defines a check command, an event handler, and a contact group that are appropriate for the SMTP service. The final define service stanza sets up SMTP monitoring for all of the hosts in the mailhosts host group.
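The combined effect of a template chain and a host group can be sketched in a few lines of Python. This is a simplified model rather than Nagios's own logic. The host names placed in mailhosts are hypothetical, and the effective_settings and expand helpers exist only for this example.

```python
# Illustrative sketch: chained service templates plus host group expansion.
hostgroups = {"mailhosts": ["mailhub", "mx2"]}  # hypothetical members

service_templates = {
    "generic": {
        "normal_check_interval": "30",
        "contact_groups": "admins",
    },
    "generic-smtp": {
        "use": "generic",
        "check_command": "check_smtp",
        "contact_groups": "mailadmins",
    },
}

def effective_settings(name):
    """Walk the use chain from the root template down to the named one."""
    definition = service_templates[name]
    base = effective_settings(definition["use"]) if "use" in definition else {}
    own = {key: value for key, value in definition.items() if key != "use"}
    return {**base, **own}  # later (more specific) settings win

def expand(template_name, group):
    """Produce one concrete service entry per host in the group."""
    settings = effective_settings(template_name)
    return [{"host_name": host, **settings} for host in hostgroups[group]]

for service in expand("generic-smtp", "mailhosts"):
    print(service["host_name"], service["check_command"], service["contact_groups"])
```

Each generated entry keeps the 30-minute check interval from generic but reports to mailadmins, because the more specific template overrides the contact group.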
Contacts and Contact Groups
Here are two stanzas defining a contact and a contact group:
define contact{
contact_name nagadmin
alias Nagios Admin
; When to notify about service problems
service_notification_period 6to22
; When to notify about host problems
host_notification_period 24x7
; Notify on critical problems and recoveries
service_notification_options c,r
; Notify on host down and recoveries
host_notification_options d,r
service_notification_commands notify-by-email
host_notification_commands host-notify-by-epager
email nagios-admins@ahania.com
pager $ADMINPAGER$
}
define contactgroup{
contactgroup_name mailadmins
alias Mail Admins
members mailadm,chavez,catfemme
}
The first stanza defines a contact named nagadmin, the events this contact
is notified about, and the time periods during which notifications should be
sent. It also names the commands used to generate the alerts (see below),
along with the email address and pager number that are passed to them. The
second stanza defines the contact group mailadmins and lists its members.
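As a rough illustration of how the option letters act as a filter, the following Python sketch decides whether a contact wants to hear about a given service state. The letter meanings (w for warning, u for unknown, c for critical, r for recovery) follow the Nagios documentation. The function name is hypothetical, and treating every OK state as a recovery is a simplification made for this example.

```python
# Illustrative mapping from service states to notification option letters.
SERVICE_OPTION_LETTERS = {
    "WARNING": "w",
    "UNKNOWN": "u",
    "CRITICAL": "c",
    "OK": "r",  # simplification: any return to OK counts as a recovery
}

def wants_service_notification(options, new_state):
    """Return True if the comma-separated options cover the new state."""
    letters = {letter.strip() for letter in options.split(",")}
    return SERVICE_OPTION_LETTERS[new_state] in letters

# The nagadmin contact uses service_notification_options c,r
print(wants_service_notification("c,r", "CRITICAL"))  # True
print(wants_service_notification("c,r", "WARNING"))   # False
print(wants_service_notification("c,r", "OK"))        # True
```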
Time Periods
Time period definitions are quite simple. Here are the definitions of the two time periods we have used so far:
define timeperiod{
timeperiod_name 24x7
alias 24 Hours A Day, 7 Days A Week
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
}
define timeperiod{
timeperiod_name 6to22
alias Weekdays, 6 AM to 10 PM
monday 06:00-22:00
tuesday 06:00-22:00
wednesday 06:00-22:00
thursday 06:00-22:00
friday 06:00-22:00
}
Note that only the applicable days need be included in the definition.
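To show how such a definition is interpreted, here is a small Python sketch that tests whether a given moment falls inside the 6to22 period. It is an approximation written for this article, not Nagios code, and the in_period and to_minutes helpers are hypothetical names.

```python
from datetime import datetime

DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]

# The 6to22 period, keyed by day name; unlisted days are simply absent.
PERIOD_6TO22 = {day: "06:00-22:00" for day in DAYS[:5]}

def to_minutes(hhmm):
    """Convert HH:MM to minutes past midnight (24:00 becomes 1440)."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

def in_period(period, when):
    """Return True if the datetime falls inside one of the day's ranges."""
    ranges = period.get(DAYS[when.weekday()])
    if ranges is None:
        return False  # days not listed are excluded entirely
    now = when.hour * 60 + when.minute
    for span in ranges.split(","):  # a day may list several ranges
        start, end = span.split("-")
        if to_minutes(start) <= now < to_minutes(end):
            return True
    return False

print(in_period(PERIOD_6TO22, datetime(2024, 1, 1, 7, 30)))  # Monday morning: True
print(in_period(PERIOD_6TO22, datetime(2024, 1, 1, 23, 0)))  # Monday night: False
print(in_period(PERIOD_6TO22, datetime(2024, 1, 6, 12, 0)))  # Saturday: False
```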
Commands
The commands referred to in many of the preceding object definitions also must be defined. For example, here is the SMTP service check command definition:
define command{
command_name check_smtp
command_line $USER1$/check_smtp -H $HOSTADDRESS$
}
This command runs the check_smtp script stored in the directory
defined in the macro $USER1$ (defined in the resource.cfg file--see
below); this macro conventionally holds the path to the Nagios plug-ins
directory. The command is passed the option -H, followed by the IP address
of the host to be checked (the latter is expanded from the built-in
$HOSTADDRESS$ macro).
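The macro substitution described here can be approximated in Python as follows. This is a sketch of the idea, not the way Nagios itself expands macros. The plug-in directory given for $USER1$ is an assumed value, and leaving unknown macros untouched is a choice made for this example only.

```python
import re

def expand_macros(command_line, macros):
    """Replace each $NAME$ with its value; unknown macros are left as-is."""
    def substitute(match):
        return macros.get(match.group(1), match.group(0))
    return re.sub(r"\$([A-Z0-9_]+)\$", substitute, command_line)

macros = {
    "USER1": "/usr/local/nagios/libexec",  # assumed plug-in directory
    "HOSTADDRESS": "192.168.1.44",
}

print(expand_macros("$USER1$/check_smtp -H $HOSTADDRESS$", macros))
# /usr/local/nagios/libexec/check_smtp -H 192.168.1.44
```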
You can determine the syntax for any plug-in by running it with the
--help option. You can also extend Nagios by adding custom plug-ins of
your own. See the documentation for details on how to accomplish this.
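As a rough idea of what a custom plug-in involves, the Python sketch below follows the convention in the Nagios plug-in guidelines: print a single status line and report the result through the exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN). The disk-usage check, the function names, and the 80 and 90 percent thresholds are hypothetical choices for this example.

```python
import shutil

# Exit codes from the plug-in guidelines.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def evaluate(percent_used, warn=80.0, crit=90.0):
    """Map a disk-usage percentage to an (exit_code, status_line) pair."""
    if percent_used >= crit:
        return CRITICAL, f"DISK CRITICAL - {percent_used:.0f}% used"
    if percent_used >= warn:
        return WARNING, f"DISK WARNING - {percent_used:.0f}% used"
    return OK, f"DISK OK - {percent_used:.0f}% used"

def check_path(path):
    """Measure a filesystem and evaluate it; report UNKNOWN on errors."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    return evaluate(100.0 * usage.used / usage.total)

# Demonstration with a fixed value rather than a live measurement.
code, line = evaluate(93.0)
print(code, line)  # 2 DISK CRITICAL - 93% used

# A real plug-in would instead end by printing the status line from
# check_path() and calling sys.exit() with the returned code.
```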
Event handlers are defined in the same way, as in this example:
define command{
command_name eh_smtp
command_line /usr/local/nagios/eh/fix_mail $HOSTADDRESS$ $STATETYPE$
}
Here, we define the command named eh_smtp. It specifies the full path
to a program to run, passing two arguments: the host's IP address and the value
of the $STATETYPE$ macro. This macro is set to SOFT while Nagios is still
retrying a failed check and to HARD once the problem has persisted through
all max_check_attempts checks.
Here are the definitions of commands used for notifications (we've wrapped
the command_line setting for clarity):
define command{
command_name notify-by-email
command_line /usr/bin/printf "%b" "***** Nagios 1.0 *****\n\n
Notification Type: $NOTIFICATIONTYPE$\n\n
Service: $SERVICEDESC$\n
Host: $HOSTALIAS$\n
Address: $HOSTADDRESS$\n
State: $SERVICESTATE$\n\n
Date/Time: $DATETIME$\n\n
Additional Info:\n\n$OUTPUT$" |
/usr/bin/mail -s "** $NOTIFICATIONTYPE$
alert - $HOSTALIAS$/$SERVICEDESC$
is $SERVICESTATE$ **" $CONTACTEMAIL$
}
This command constructs a simple email message using the printf
command and many built-in Nagios macros. It then sends the message using the
mail command, specifying the recipient as the $CONTACTEMAIL$
macro. The latter contains the value of the email attribute of the
contact who is being notified about the alert.