Writing SNMP based checks


This article is obsolete and may no longer be valid!

1. Preparations and SNMP basics

SNMP based checks work exactly like the agent based ones, with one exception: instead of using a section from an agent's output, you specify a list of subtrees in an SNMP MIB as the data source. Check_MK retrieves each of the subtrees with a separate snmpwalk and combines the output into a table compatible with those used by the agent based checks.
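
To make the table idea concrete, here is a minimal sketch of how separately walked subtrees could be merged into rows. It is an illustration only and not Check_MK's actual implementation; the function name combine_columns and the sample values are assumptions.

```python
def combine_columns(columns):
    """Merge per-subtree walk results into rows, one row per index.

    columns is a list of dicts, one per walked subtree, each mapping
    the OID index (the part after the sub OID) to the value as a string.
    """
    # The indexes of the first column determine which rows exist.
    indexes = sorted(columns[0], key=int)
    return [[column.get(index, "") for column in columns] for index in indexes]


# Hypothetical results of three separate walks (name, type, state).
names = {"1": "lo", "2": "eth0", "3": "eth1"}
types = {"1": "24", "2": "6", "3": "6"}
states = {"1": "1", "2": "1", "3": "2"}

table = combine_columns([names, types, states])
print(table)
```

Each inner list corresponds to one interface, which is the shape a check function would later iterate over.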

1.1. SNMPv1 versus SNMPv2

Before we start, a few words about the different SNMP versions. Check_MK supports three of them: v1, v2c and v3. When calling snmpwalk you always have to specify the SNMP version to use (-v1, -v2c or -v3). Please note that:

  • Some information is only available via -v2c and -v3.
  • Some older devices do not support -v2c and -v3.
  • -v3 is about encryption and security and is inconvenient for testing.

Do not be surprised if you cannot find any 64 bit counters when using -v1: SNMPv1 does not support them, so try -v2c in that case. Please note that such SNMPv2c hosts must be declared in bulkwalk_hosts in main.mk.
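
A minimal sketch of such a declaration in main.mk, assuming the example host used later in this article and assuming that bulkwalk_hosts is given as a plain list of host names:

```python
# main.mk (sketch): hosts listed here are treated as SNMPv2c hosts.
# The host name is the example host used in this article.
bulkwalk_hosts = [
    "192.168.56.2",
]
```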

1.2. Vendor specific MIBs

Many people have an unclear understanding of what a MIB (file) is. Some assume that once a MIB file is installed, a monitoring system automatically knows how to monitor the device in question. The truth is that:

  • A MIB file is just a translation from numeric OIDs and enumeration values into texts.
  • It also might contain some human readable explanation of its variables.
  • Check_MK does not need (nor use) MIB files.

MIB files are quite helpful during the development of checks, since they tell us which OIDs exist and what they mean. They help to find the OIDs that are interesting for monitoring and the values those OIDs can take. There are two ways of installing a MIB file:

  1. Install it as root in /usr/share/snmp/mibs
  2. Install it as user in ~/.snmp/mibs/ (do a mkdir -p ~/.snmp/mibs before this)

If you have installed a MIB file correctly, snmpwalk will show names instead of numbers in the relevant places.
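
The idea of a MIB as a pure translation table can be sketched in a few lines. The dictionaries below are a tiny hand-written excerpt for illustration, not a parsed MIB file, and the function name translate is an assumption.

```python
# Names for numeric OIDs and for enumeration values, as a MIB would
# provide them. Real MIB files hold many more entries.
OID_NAMES = {
    ".1.3.6.1.2.1.2.2.1.2": "ifDescr",
    ".1.3.6.1.2.1.2.2.1.3": "ifType",
    ".1.3.6.1.2.1.2.2.1.8": "ifOperStatus",
}
ENUM_NAMES = {
    "ifType": {"6": "ethernetCsmacd", "24": "softwareLoopback"},
    "ifOperStatus": {"1": "up", "2": "down"},
}


def translate(oid, value):
    """Render a numeric OID and raw value the way a MIB-aware tool might."""
    prefix, index = oid.rsplit(".", 1)
    name = OID_NAMES.get(prefix)
    if name is None:
        return "%s = %s" % (oid, value)  # no translation known
    text = ENUM_NAMES.get(name, {}).get(value)
    if text is not None:
        value = "%s(%s)" % (text, value)
    return "%s.%s = %s" % (name, index, value)


print(translate(".1.3.6.1.2.1.2.2.1.8.3", "2"))
print(translate(".1.3.6.1.2.1.2.2.1.2.2", "eth0"))
```

Without the dictionaries the numeric data is still complete, which is why a check can work on numeric OIDs alone.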

1.3. Finding the correct OIDs

Once you have set up your SNMP environment, the next step in implementing our own SNMP check is to find the interesting places (OIDs) in the whole tree of SNMP variables your device supports.

In this tutorial we want to write a simple check for testing the operational state of network interfaces. Check_MK already ships a powerful check that can do this and much more (if/if64). But as an example this will do perfectly (and it will work on almost every SNMP device).

After some investigation with snmpwalk we find the place in the OID tree where information about local network interfaces is provided:

root@linux# snmpwalk -v1 -c public 192.168.56.2 ifTable
IF-MIB::ifIndex.1 = INTEGER: 1
IF-MIB::ifIndex.2 = INTEGER: 2
IF-MIB::ifIndex.3 = INTEGER: 3
IF-MIB::ifDescr.1 = STRING: lo
IF-MIB::ifDescr.2 = STRING: eth0
IF-MIB::ifDescr.3 = STRING: eth1
IF-MIB::ifType.1 = INTEGER: softwareLoopback(24)
IF-MIB::ifType.2 = INTEGER: ethernetCsmacd(6)
IF-MIB::ifType.3 = INTEGER: ethernetCsmacd(6)
IF-MIB::ifMtu.1 = INTEGER: 16436

Since Check_MK generally works with numeric OIDs, let's call snmpwalk again, this time with the option -On:

root@linux# snmpwalk -On -v2c -c public 192.168.56.2 ifTable
.1.3.6.1.2.1.2.2.1.1.1 = INTEGER: 1
.1.3.6.1.2.1.2.2.1.1.2 = INTEGER: 2
.1.3.6.1.2.1.2.2.1.1.3 = INTEGER: 3
.1.3.6.1.2.1.2.2.1.2.1 = STRING: lo
.1.3.6.1.2.1.2.2.1.2.2 = STRING: eth0
.1.3.6.1.2.1.2.2.1.2.3 = STRING: eth1
.1.3.6.1.2.1.2.2.1.3.1 = INTEGER: softwareLoopback(24)
.1.3.6.1.2.1.2.2.1.3.2 = INTEGER: ethernetCsmacd(6)
.1.3.6.1.2.1.2.2.1.3.3 = INTEGER: ethernetCsmacd(6)
.1.3.6.1.2.1.2.2.1.4.1 = INTEGER: 16436
.1.3.6.1.2.1.2.2.1.4.2 = INTEGER: 1500
.1.3.6.1.2.1.2.2.1.4.3 = INTEGER: 1500
.1.3.6.1.2.1.2.2.1.5.1 = Gauge32: 10000000
.1.3.6.1.2.1.2.2.1.5.2 = Gauge32: 10000000
.1.3.6.1.2.1.2.2.1.5.3 = Gauge32: 10000000
.1.3.6.1.2.1.2.2.1.6.1 = STRING:
.1.3.6.1.2.1.2.2.1.6.2 = STRING: 8:0:27:38:45:b1
.1.3.6.1.2.1.2.2.1.6.3 = STRING: 8:0:27:f4:e2:e
.1.3.6.1.2.1.2.2.1.7.1 = INTEGER: up(1)
.1.3.6.1.2.1.2.2.1.7.2 = INTEGER: up(1)
.1.3.6.1.2.1.2.2.1.7.3 = INTEGER: up(1)
.1.3.6.1.2.1.2.2.1.8.1 = INTEGER: up(1)
.1.3.6.1.2.1.2.2.1.8.2 = INTEGER: up(1)
.1.3.6.1.2.1.2.2.1.8.3 = INTEGER: down(2)

The interesting OIDs in the example output are the ones for the interface name, type and operational state. The longest common prefix of all relevant OIDs is .1.3.6.1.2.1.2.2.1 in our case. We'll remember that for later. Now we have to decide which subtrees are needed. In our case we will use the following sub OIDs:

  2    The name of the interface
  3    The type of the interface
  8    The operational state of the interface
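
The split into base OID and sub OIDs can also be computed. The following is a small sketch with a hypothetical helper common_oid_prefix, applied to three OIDs from the numeric walk above:

```python
def common_oid_prefix(oids):
    """Return the longest prefix shared by all OIDs, compared part by part."""
    split = [oid.strip(".").split(".") for oid in oids]
    prefix = []
    for parts in zip(*split):
        if len(set(parts)) != 1:
            break
        prefix.append(parts[0])
    return "." + ".".join(prefix)


# One OID each for name, type and state, taken from the walk.
oids = [
    ".1.3.6.1.2.1.2.2.1.2.1",
    ".1.3.6.1.2.1.2.2.1.3.2",
    ".1.3.6.1.2.1.2.2.1.8.3",
]
base = common_oid_prefix(oids)
# The first component after the base is the sub OID (the column).
sub_oids = sorted({oid[len(base) + 1:].split(".")[0] for oid in oids}, key=int)
print(base, sub_oids)
```

Comparing whole components rather than characters avoids cutting an OID in the middle of a number.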

1.4. Declaration of the OIDs in the check file

Now let's create a check file with the name mynics. The needed OIDs are declared in snmp_info. You specify a pair consisting of:

  • The base OID (= the prefix)
  • A Python list with the sub OIDs to retrieve below this base OID

/usr/share/check_mk/checks/mynics
snmp_info["mynics"] = ( ".1.3.6.1.2.1.2.2.1", [ "2", "3", "8" ] )

Note: If you are using OMD, put your check files as site user directly into your site's local area, in ~/local/share/check_mk/checks.

As soon as an snmp_info entry exists for a check, Check_MK knows that it is of type SNMP (and not TCP). During inventory and during checking, Check_MK will fetch the three sub OIDs 2, 3 and 8 below the prefix .1.3.6.1.2.1.2.2.1 with three separate calls to snmpwalk.
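
How the pair expands into three walks can be sketched as follows. The helper walk_commands is hypothetical, and the exact options Check_MK passes to snmpwalk are an assumption here; only the resulting OIDs follow from the declaration.

```python
snmp_info = (".1.3.6.1.2.1.2.2.1", ["2", "3", "8"])


def walk_commands(info, host, community="public"):
    """Build one snmpwalk command line per sub OID (illustration only)."""
    base, sub_oids = info
    return [
        ["snmpwalk", "-v1", "-c", community, "-On", host, "%s.%s" % (base, sub)]
        for sub in sub_oids
    ]


for command in walk_commands(snmp_info, "192.168.56.2"):
    print(" ".join(command))
```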

2. The implementation of the check

2.1. Dummy implementation

Just as in the tutorial for agent based checks, we first write dummy functions and print the data arriving at the inventory function. Here is a complete check implementation:

mynics
def inventory_mynics(info):
   # Debug: let's see what the data we get looks like
   print(info)
   return []

def check_mynics(item, params, info):
   return (3, "UNKNOWN - not yet implemented")

check_info["mynics"] = {
    "check_function"        : check_mynics,
    "inventory_function"    : inventory_mynics,
    "service_description"   : "NIC %s",
    "snmp_info"             : ( ".1.3.6.1.2.1.2.2.1", [ "2", "3", "8" ] )
}

An explanation of the check_info entries for those who have not read the tutorial for agent based checks:

  • check_function is the check function.
  • inventory_function is the inventory function, or no_inventory_possible if your check does not support inventory.
  • service_description is the Nagios service description. The %s will be replaced with the check item. If your check does not have an item (uses None) then you must not use a %s.
  • snmp_info is the pair of base OID and sub OIDs described above.
  • has_perfdata (not used in this example) is 1 if the check outputs performance data and 0 otherwise.

We know that this declaration is a bit strange. This has historical reasons, of course. It will probably be cleaned up during version 1.1.11...

2.2. A first test of our check

Before we can try a first inventory, we must declare the host as an SNMP host in main.mk. Otherwise Check_MK will not even contact the host via SNMP. (This changed in version 1.1.9; please read the migration notes for details.)

Simply add the host tag snmp:

main.mk
all_hosts = [
 "192.168.56.2|snmp",
]

A test inventory of that host now shows this debug output:

root@linux# check_mk --checks mynics -I 192.168.56.2
[['lo', '24', '1'], ['eth0', '6', '2'], ['wlan0', '6', '1'], ['usb0', '6', '2'],
['vboxnet0', '6', '2'], ['pan0', '6', '2']]

If your SNMP info is more complex, that output quickly becomes unreadable. Fortunately Python provides the module pprint for pretty-printing Python objects. Here is a variant that uses pprint:

mynics
def inventory_mynics(info):
   # Debug: let's see what the data we get looks like
   import pprint
   pprint.pprint(info)
   return []

And its output:

root@linux# check_mk --checks mynics -I 192.168.56.2
[['lo', '24', '1'],
 ['eth0', '6', '2'],
 ['wlan0', '6', '1'],
 ['usb0', '6', '2'],
 ['vboxnet0', '6', '2'],
 ['pan0', '6', '2']]

You might have noticed from this that Check_MK transforms the SNMP output so that each item is on one line (one list). So from now on everything works exactly as with agent based checks.

When looping over the lines in info, we can make use of Python's list assignment feature and directly unpack each line into the three variables nic, type and state:

mynics
def inventory_mynics(info):
   for nic, type, state in info:

Now let's make our inventory function skip the loopback device lo, since it is the inventory function's task to decide which items are worth checking, and the loopback device certainly is not. This is why we declared the second column in snmp_info: the interface type. As a look into the MIB file /usr/share/snmp/mibs/IANAifType-MIB.txt will confirm, type 6 means Ethernet and is used for "normal" interfaces. Furthermore, we only want to monitor NICs that are currently up.

When making comparisons, please keep in mind that, although SNMP sometimes sends numbers, Check_MK provides everything as strings. So we need to check for type "6" and state "1" (up):

def inventory_mynics(info):
   for nic, type, state in info:
       if type == "6" and state == "1":

So what if we find a match? We simply yield a pair of item and parameter to the inventory. Since our check does not use a parameter, we specify None as the second element. Here is the complete inventory function:

def inventory_mynics(info):
   for nic, type, state in info:
       if type == "6" and state == "1":
          yield nic, None

An inventory will now find one check for our host:

root@linux# check_mk --checks mynics -I 192.168.56.2
mynics                1 new checks
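
The inventory function can also be exercised outside of Check_MK. This sketch feeds it the table from the debug output earlier in this section; the variable names sample_info and found are assumptions made for the illustration.

```python
def inventory_mynics(info):
    for nic, type, state in info:
        # All values arrive as strings, so compare against "6" and "1".
        if type == "6" and state == "1":
            yield nic, None


# The table shown in the debug output earlier in this section.
sample_info = [
    ["lo", "24", "1"], ["eth0", "6", "2"], ["wlan0", "6", "1"],
    ["usb0", "6", "2"], ["vboxnet0", "6", "2"], ["pan0", "6", "2"],
]

found = list(inventory_mynics(sample_info))
print(found)
```

Only one interface is both of type "6" and in state "1", which matches the single new check reported by the inventory.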

2.3. The check function

For each item (in this case NIC) the check function will be called once, with the following arguments:

  • The item (e.g. "eth0")
  • The parameters for the check
  • The agent data (just the same as for inventory)

How to work with parameters is explained in the tutorial for agent based checks. It is good style to name the argument _no_params if your check does not process any parameters.

The check is now free in how it computes its result. It returns a pair or triple of values:

  • The Nagios status (0, 1, 2 or 3)
  • The plugin output for Nagios (free text)
  • optional: performance data

Our check just checks the operational status of the NIC and returns a corresponding state:

def check_mynics(item, _no_params, info):
   for nic, type, state in info:
      if nic == item:
         if state == "1":
            return 0, "OK - link is up"
         else:
            return 2, "CRITICAL - link is " + state
   # The item was not found in the SNMP data
   return 3, "UNKNOWN - interface not found"

Now we can try and check the host:

root@linux# cmk -nv localhost
Check_mk version 1.1.9i9
NIC eth0             OK - link is up
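
As with the inventory function, the check function can be tried on sample data without contacting a host. This is a minimal sketch; sample_info is a hand-made table in the format shown earlier, not output from a real device.

```python
def check_mynics(item, _no_params, info):
    for nic, type, state in info:
        if nic == item:
            if state == "1":
                return 0, "OK - link is up"
            else:
                return 2, "CRITICAL - link is " + state


sample_info = [["lo", "24", "1"], ["eth0", "6", "2"], ["wlan0", "6", "1"]]

print(check_mynics("wlan0", None, sample_info))
print(check_mynics("eth0", None, sample_info))
```

The first element of each result is the Nagios status, the second the plugin output.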

2.4. What's next?

If you got this far, you can further improve your check.
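
One possible improvement, sketched here as an assumption rather than taken from this article, is to print the state by name instead of by number and to handle a vanished interface explicitly. The state names are the ifOperStatus values from the standard IF-MIB; the dictionary name OPER_STATE_NAMES is hypothetical.

```python
# ifOperStatus values as defined in the standard IF-MIB.
OPER_STATE_NAMES = {
    "1": "up", "2": "down", "3": "testing", "4": "unknown",
    "5": "dormant", "6": "notPresent", "7": "lowerLayerDown",
}


def check_mynics(item, _no_params, info):
    for nic, type, state in info:
        if nic == item:
            name = OPER_STATE_NAMES.get(state, "unknown state " + state)
            if state == "1":
                return 0, "OK - link is " + name
            return 2, "CRITICAL - link is " + name
    # Assumed behavior: report UNKNOWN if the interface has vanished.
    return 3, "UNKNOWN - interface not found"


print(check_mynics("eth1", None, [["eth0", "6", "1"], ["eth1", "6", "2"]]))
```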

If you think that your check is really useful and also well implemented, then consider donating it to the official Check_MK project. Our check development guidelines tell you what criteria a check must fulfill to be accepted into our distribution.

If your check does not completely match our guidelines but is still of use to others, then you can make an MKP package out of it and upload it to the Check_MK Exchange.