Monitoring with SNMP, Part 3: Automate active monitoring with Nagios

My last post showed how to monitor networked devices with SNMP. You could try to remember to manually check the status of things periodically, but that would be missing the point of computers. Instead, automate your monitoring with Nagios, a web-based monitoring tool for Linux that automates the process of actively querying devices and doing something with the information. Nagios is available as free open source software (Nagios Core), and the company offers additional non-free products with premium features. The open-source version is fine for getting started and setting up basic monitoring. Nagios does a lot more than just SNMP monitoring. I’ll refer you to the Nagios Core documentation to get Nagios up and running, and I’ll focus on how to set up Nagios to actively monitor devices with SNMP.

Step 1: Retrieve information manually using snmpget

In Part 2, I described how to use snmpwalk and snmpget to query a device. You have to get that operation working manually before you can automate it! Please read the previous post and get SNMP working before proceeding.

Step 2: Run the check_snmp plugin manually

Nagios uses the check_snmp plugin to query devices via SNMP. The best way to get started is to run the plugin manually on the command line and get everything working. Then, edit the Nagios configs (or update the nconf database) with the parameters from the command line. For example, in my previous post I used snmpget to check the total current on a PDU. The same information can be obtained through check_snmp:
/usr/lib64/nagios/plugins/check_snmp --protocol=2c --community=tripplite --hostname=pdu-01 --oid=UPS-MIB::upsOutputCurrent.1 --warning=140 --critical=150

SNMP OK - 60 | UPS-MIB::upsOutputCurrent.1=60

The --warning and --critical arguments set the thresholds that determine when Nagios generates a warning or a critical alert. In this case, a warning is generated if the value returned by the SNMP query is greater than 140, and a critical alert occurs if it is greater than 150. Since only 60 tenth-amps (6.0 A) are being drawn, the check returns OK. Please read the plugin documentation for the full threshold syntax.
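To make the threshold behaviour concrete, here is a minimal shell sketch of how a plain numeric threshold pair is evaluated. It is not the check_snmp source: classify_reading is a hypothetical helper, it handles integers only, and it covers just the simple "alert when outside 0..N" form of the threshold syntax. The exit codes follow the usual Nagios plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL).

```shell
#!/bin/sh
# Sketch only: classify_reading is a hypothetical helper, not part of Nagios.
# A plain threshold N means "alert when the value is outside the range 0..N".
classify_reading() {
    value=$1
    warning=$2
    critical=$3
    if [ "$value" -lt 0 ] || [ "$value" -gt "$critical" ]; then
        echo "CRITICAL"
        return 2
    elif [ "$value" -gt "$warning" ]; then
        echo "WARNING"
        return 1
    fi
    echo "OK"
    return 0
}

# Try the reading from the example (60) and two higher values.
for reading in 60 145 155; do
    status=$(classify_reading "$reading" 140 150) && code=0 || code=$?
    echo "$reading -> $status (exit code $code)"
done
```

Nagios reads the plugin's exit code, not the text, to decide the state of the service. After running check_snmp by hand, echo $? shows which state Nagios would record.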

Step 3: Define services to automate SNMP checks

Once you have the check_snmp command working manually, it’s time to add a service to Nagios. Here is what the above command might look like when defined as a service in Nagios:
define service {
    service_description  Check current draw
    check_command        check_snmp!--protocol=2c --community=tripplite --oid=UPS-MIB::upsOutputCurrent.1 --warning=140 --critical=150
    hostgroup_name       pdus,ups,ats
    use                  generic-service
}

Note that I have created three host groups for PDUs, UPSs, and automatic transfer switches (ATS). This service will run on any host that is added to any of those groups. It’s a good idea to structure your Nagios configs so that hosts are part of host groups, and services run on host groups.
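The service definition leaves out --hostname because the text after the ! in check_command is passed to a command object as $ARG1$, and that command supplies the host address. As a rough sketch, the supporting objects might look like the following. This assumes a check_snmp command defined the way the Nagios Core sample configuration does it and a generic-host template. The alias text and the pdu-01 host entry are illustrative assumptions, so adjust them to your own setup.

```cfg
# Assumed command definition: $USER1$ is the plugin directory,
# $HOSTADDRESS$ comes from the host object, $ARG1$ is the text after "!".
define command {
    command_name  check_snmp
    command_line  $USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$
}

# One of the three host groups the service is attached to.
define hostgroup {
    hostgroup_name  pdus
    alias           Power distribution units
}

# Illustrative host; joining the group is what attaches the service to it.
define host {
    use         generic-host
    host_name   pdu-01
    address     pdu-01
    hostgroups  pdus
}
```

With this layout, adding a new PDU means adding one host object to the pdus group, and the current-draw service applies to it automatically.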

References

  1. Nagios Core Documentation
  2. Nagios Plugin Docs
  3. Nagios Plugin Development Guidelines
