

Network Management With OpenNMS

by Shane O'Donnell

So you've deployed your networks, systems, applications, and all of the lubricant necessary to make them work. (Usually, that lubricant comes in the form of aspirin and/or alcohol for the technician, but that's another story altogether.)

Once everything is handily deployed, you suddenly find yourself thinking, "Whew, that's a job well done." But then it dawns on you: not only are you not done, but you are now stuck tending the monster you've just created. Dr. Frankenstein found himself in a similar situation at one point.

What you need is a way to make sure these resources are available to your users, as well as to provide yourself a handy way to consolidate the critical information from each of those devices into a single console. This will allow you to verify system availability, provide a single clearinghouse of all the information you need to do your job, and enable you to keep your sanity intact.

Enter OpenNMS.

OpenNMS was designed from the ground up to be a one-for-one replacement for HP's OpenView, IBM's Tivoli, CA's Unicenter, and the like. With that in mind, the OpenNMS team designed it as a network management tool, complete with SNMP hooks and a system-monitoring tool that can measure the availability of critical network services. It also has a configurable, event-driven messaging subsystem that allows you to plug in event streams from other sources, such as vulnerability information from Nessus, tailed log files, and /proc-based monitors. And in good open source fashion (OpenNMS is released under the terms of the GPL), the product was designed to leverage preexisting tools where it made sense. Therefore, the SNMP performance data storage and graphing system uses RRDTool (MRTG, anyone?), the Web server/JSP container/servlet engine is Apache's Jakarta Tomcat, and the underlying RDBMS is PostgreSQL.

Built from the ground up in Java, the project has covered a lot of territory in a short period of time. Version 0.4 was the first public release at the end of 2000, with 0.9.6 currently available and 1.0 slated for sometime in April. But enough about releases.

Why should you be interested in the tool in the first place?

Good question. The simple fact is that the people who rely on your network most likely don't care about the network structure. They probably do care about the services it provides, namely Web servers, databases, and email, and perhaps the other services that make accessing those possible, such as DNS and DHCP.

As the network administrator, you have different needs. You need a tool that can help manage the network infrastructure as a means of providing access to these network services. OpenNMS' approach is to manage each device as a host of specific services, whether they be simple network connectivity (e.g., ICMP pings) or complex Web transactions. Once installed and minimally configured, OpenNMS will automatically discover everything attached to the network and scan those devices for the services they support (e.g., HTTP on port 80, SMTP on port 25, etc.). And those scans go deeper than a rudimentary port scan, actually exercising the protocol instead of just issuing a socket connect.
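To make the difference between a socket connect and a protocol-level check concrete, here is a minimal sketch in Python. It is not OpenNMS code (OpenNMS itself is written in Java); the function names are hypothetical, and SMTP is used only because a conforming SMTP server opens the conversation with a 220 reply.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Plain port scan: succeeds as soon as the TCP handshake completes."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def is_smtp_greeting(reply: bytes) -> bool:
    """True if the reply starts with the 220 code an SMTP server greets with."""
    return reply[:3] == b"220" and reply[3:4] in (b" ", b"-")


def smtp_is_available(host: str, port: int = 25, timeout: float = 3.0) -> bool:
    """Protocol-level check: connect, then require a valid SMTP greeting."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            return is_smtp_greeting(conn.recv(512))
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

A port that accepts connections but answers with something other than an SMTP greeting (or with nothing at all) passes `port_is_open` and fails `smtp_is_available`, which is the distinction the scan is after.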

Once discovered, the services are committed to the database and scheduled to be polled every five minutes by default. This verifies that they are still available. If a service doesn't respond, "critical services" are checked to begin the problem-isolation process. Following that, an appropriate event that reflects the isolated problem is generated. In turn, this event can be configured to create an outage record, invoke some sort of auto-response action (e.g., run a script), create a trouble ticket, send a page or email notification, and/or eventually end up in OpenNMS' event browser.
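The poll-isolate-dispatch flow described above can be sketched roughly as follows. This is an illustration in Python rather than anything taken from OpenNMS: the event labels, the action functions, and the `ACTIONS` table are all hypothetical stand-ins for what would be configuration in the real product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

POLL_INTERVAL_SECONDS = 5 * 60  # the default polling interval named in the article


@dataclass(frozen=True)
class Event:
    kind: str      # hypothetical label: "service-lost" or "node-down"
    node: str
    service: str


def record_outage(event: Event) -> str:
    return f"outage record: {event.kind} for {event.service} on {event.node}"


def send_notification(event: Event) -> str:
    return f"notify on-call: {event.kind} for {event.service} on {event.node}"


Action = Callable[[Event], str]

# Hypothetical configuration: which actions each kind of event triggers.
ACTIONS: Dict[str, List[Action]] = {
    "service-lost": [record_outage],
    "node-down": [record_outage, send_notification],
}


def isolate(node: str, service: str,
            service_up: bool, critical_up: bool) -> Optional[Event]:
    """Turn one poll result into an event, or None if the service responded."""
    if service_up:
        return None
    # If the critical service still answers, only this one service is lost.
    kind = "service-lost" if critical_up else "node-down"
    return Event(kind, node, service)


def dispatch(event: Event) -> List[str]:
    """Run every action configured for this kind of event."""
    return [action(event) for action in ACTIONS.get(event.kind, [])]
```

Keeping the event-to-action mapping in a table rather than in the polling logic mirrors the point that the response to an event is something you configure.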

But identifying outages is the simple stuff. Once an outage has been determined, what then? OpenNMS implements intelligent behaviors, such as dynamically changing the polling interval based on the duration of an outage. For example, it can poll every 30 seconds during the initial five minutes of an outage, back off to minute intervals for the next hour, and then correlate the polls to determine the root cause. In practice, that means it would generate one message that says a network interface went down, instead of five separate messages that say various services on that interface were unavailable. But before we can get any deeper into the workings of OpenNMS, let's talk about installing the package(s).
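Before moving on, the back-off schedule and the one-message-instead-of-five idea can be sketched in a few lines of Python. The assumptions here are mine, not the article's: "minute intervals" is read as 60 seconds, polling is assumed to fall back to the five-minute default after the first hour, and the correlation rule (every monitored service on an interface has failed) is a deliberately simple stand-in for whatever OpenNMS actually does.

```python
from typing import List, Set

DEFAULT_INTERVAL = 300  # five-minute default, in seconds


def poll_interval(outage_age: float) -> int:
    """Seconds until the next poll, given the outage's age in seconds."""
    if outage_age < 5 * 60:
        return 30                # first five minutes: poll every 30 seconds
    if outage_age < 5 * 60 + 60 * 60:
        return 60                # next hour: assumed one-minute intervals
    return DEFAULT_INTERVAL      # assumption: back to the default afterwards


def correlate(interface: str, monitored: Set[str], failed: Set[str]) -> List[str]:
    """Collapse per-service failures into one message when everything is down."""
    if monitored and failed >= monitored:
        return [f"interface {interface} is down"]
    return [f"{service} unavailable on {interface}" for service in sorted(failed)]
```

With five monitored services all failing on one interface, `correlate` returns a single "interface ... is down" message; with only some of them failing, it reports each lost service separately.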
