How to write your own checks
November 06, 2009
Why not use local checks or MRPE?
Adding your own self-written checks to Check_MK via local checks or MRPE is easy. Even inventory and performance data are supported. So why would you want to write native checks for Check_MK? Well, there can be several reasons:
If one or more of those issues are relevant to you, you'll find all the information needed for writing your own checks in this article.
Do I have to learn Python?
Well, to be honest: yes, at least to a basic degree. People have suggested changing Check_MK so that checks can be written in other languages as well. I understand this request very well. But from a technical point of view I cannot imagine how such an integration could be done in a clean, simple and performant way. Check_MK's checks are not standalone programs or scripts. They are closely integrated into the check mechanism and need access to some of Check_MK's internal functions. Moreover, for each host a single Python program is created by combining a base and all checks used by that host into one new program. That feature saves about 75% of the CPU resources compared to calling check_mk directly for checking. On the other hand, Python is a cleanly designed, elegant and easy-to-learn language. I'm sure you'll like it once you have some experience with it. For this tutorial I assume that you have some basic knowledge of Python. If you are new to Python, looking at the code of some of the existing checks might help.
How Check_MK's checks work
Each check consists of at least the following three components:
Two further components are optional but strongly recommended:
The data source
Everything begins with the data source, i.e. the source of the data the check operates on. Currently there are two different kinds of data sources: agent sections (tcp) and SNMP queries (snmp). An agent section is a part of an agent's output, for example the output of the Linux command df. An SNMP based data source returns data retrieved by one or several SNMP queries on certain OIDs. Both kinds of data sources are presented to the check function as a table (a Python list of lists). We will call that data the "agent data".
The inventory function
If you want your check to support inventory, you have to supply an inventory function. That function extracts from the agent data a list of all items to be checked. An item uniquely identifies a thing to be checked on a host within that type of check. Some examples of items are:
Some checks do not need to distinguish between different items, because the thing they check exists at most once on a host. An example is the check mem. But Check_MK always requires an item, so those checks simply use None as their item. Please note that this does not mean that you cannot do an inventory on mem. It's just that the inventory returns at most one item. In some cases it even returns none, namely when the agent output does not contain the information needed for the check. This is a very useful feature and enables the Nagios administrator to automatically perform the right checks on the right operating systems. Your inventory function does not need to worry whether a certain item has already been configured manually or detected by a previous inventory. Check_MK handles this in a general way and makes sure that only newly detected items are added to the list of services.
The check function
When an actual check of a host is done, all services for that host are checked in turn. When it is your check's turn, Check_MK calls your check function for each item that is automatically or manually configured for that check and host. Just as with the inventory, your function is provided with the agent data. In addition it gets the item and the (optional) parameters of the check. The check function then:
This is very similar to what standard Nagios plugins do, with the important difference that our check is already provided with data from the agent and does not have to retrieve it by itself.
The manual page
If you want to pass your check along to others, a manual page for the check is strongly recommended. Check_MK has its own concept and syntax for check manuals. You do not need to learn NROFF syntax or anything like that. A check manual is a relatively simple text file named after
the check and usually installed in
Let's jump to practice: Preparing the agent
Let's now jump into practice and write our own check. We start with a tcp based check. That means that our first step is to prepare our agent to output a new section. For the sake of simplicity we will use the Linux agent as an example.
S.M.A.R.T.
For our example we are going to implement monitoring of the hardware health of hard disks using S.M.A.R.T. The Linux package smartmontools contains a program named smartctl. On hosts where that utility is available, our agent shall send several hard disk parameters reported by that tool. Here is a small demonstration of smartctl:
root@linux# smartctl -d ata -A /dev/sda
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   129   127   021    Pre-fail  Always       -       6541
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       251
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1495
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       246
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       22
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       249
194 Temperature_Celsius     0x0022   108   098   000    Old_age   Always       -       39
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
From that output we are going to use only the lines containing the word Always. All other lines contain either no data or invalid data. We do this by appending a simple grep:
root@linux# smartctl -d ata -A /dev/sda | grep ' Always '
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   129   127   021    Pre-fail  Always       -       6541
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       251
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1496
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       246
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       22
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       249
194 Temperature_Celsius     0x0022   106   098   000    Old_age   Always       -       41
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
We neither want nor need to decide here which of these values we are going to use for monitoring. The agent simply sends all of them. That way, we won't have to change the agent when we want to change the way we use the information. Our next issue: we do not want to hardcode a particular hard disk but query all available hard disks.
A simple loop over all hard disk devices in /dev will help here:
root@linux# for disk in /dev/[sh]d[a-z] /dev/sd[a-z][a-z]
> do smartctl -d ata -A $disk | grep ' Always '
> done
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   129   127   021    Pre-fail  Always       -       6541
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       251
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
Our last problem is that the output does not show which hard disk was queried. We solve that issue by using the stream editor sed to prefix each output line with the device name of the disk:
root@linux# for disk in /dev/[sh]d[a-z] /dev/sd[a-z][a-z]
> do smartctl -d ata -A $disk | grep ' Always ' | sed "s@^@$disk @"
> done
/dev/sda   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
/dev/sda   3 Spin_Up_Time            0x0027   129   127   021    Pre-fail  Always       -       6541
/dev/sda   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       251
/dev/sda   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
Integration into the agent
Now we know which command line outputs the data we want. Our next step is to integrate that command into our agent. There are two ways of doing that:
The second method makes updating the agent to a newer official version simpler. So let's put our code into a script in that directory on each target host. It is important that our script also outputs a section header. That header will be the name of the data source in Check_MK. We decide to use the header smart.
/usr/lib/check_mk_agent/plugins/smart
#!/bin/sh
# send nothing at all if smartctl is not installed on this host
command -v smartctl >/dev/null 2>&1 || exit 0
echo '<<<smart>>>'
for disk in /dev/[sh]d[a-z] /dev/sd[a-z][a-z]
do
    # skip glob patterns that did not match an existing block device
    [ -b "$disk" ] || continue
    smartctl -d ata -A "$disk" | grep ' Always ' | sed "s@^@$disk @"
done
Do not forget to make the script executable. Also please make sure that you do not leave editor backup files lying around in that directory:
root@linux# cd /usr/lib/check_mk_agent/plugins
root@linux# chmod +x smart
root@linux# rm *~
We can make sure that everything works by fetching the agent's output from our Nagios server and grepping for our new section:
user@host> check_mk -d Eiger | fgrep -A 5 smart
<<<smart>>>
/dev/sda   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
/dev/sda   3 Spin_Up_Time            0x0027   129   127   021    Pre-fail  Always       -       6541
/dev/sda   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       251
/dev/sda   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
/dev/sda   7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
Our agent is now ready!
Creating a Hello World check
Writing a check basically means writing a text file containing some Python code. Since the agent section containing our data is named <<<smart>>>, the file our check is implemented in must be named smart and copied to /usr/share/check_mk/checks. Our example check will not examine all SMART information but just one value: Temperature_Celsius. Since further checks using the smart section might follow in the future, we name our check smart.temp. The dot in the name tells Check_MK that the part left of the dot is the agent section providing the data for the check. The following minimal version will do for a first test:
/usr/share/check_mk/checks/smart
# the inventory function (dummy)
def inventory_smart_temp(checkname, info):
    print(info)
    return []  # return empty list: nothing found

# the check function (dummy)
def check_smart_temp(item, params, info):
    return (3, "Sorry - not implemented")

# declare the check to Check_MK
check_info['smart.temp'] = \
    (check_smart_temp, "SMART drive %s", 0, inventory_smart_temp)
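To see how the four entries of the declaration fit together, the sketch below puts the dummy functions into a plain dictionary and looks them up again. It runs without Check_MK. Here `check_info` is just a local dict standing in for Check_MK's global of the same name, and the helpers `describe_service` and `run_check` are hypothetical. The entry order (check function, service description template, performance data flag, inventory function) is read off the declaration in this article, not taken from Check_MK's source.

```python
# Stand-in for Check_MK's global check_info (assumption: a plain dict here)
check_info = {}

def inventory_smart_temp(checkname, info):
    return []  # dummy: nothing found

def check_smart_temp(item, params, info):
    return (3, "Sorry - not implemented")

check_info['smart.temp'] = \
    (check_smart_temp, "SMART drive %s", 0, inventory_smart_temp)

def describe_service(checkname, item):
    """Hypothetical helper: fill the item into the description template."""
    _check_function, description, _has_perfdata, _inventory = check_info[checkname]
    return description % item

def run_check(checkname, item, params, info):
    """Hypothetical helper: call the declared check function for one item."""
    check_function = check_info[checkname][0]
    return check_function(item, params, info)

print(describe_service('smart.temp', '/dev/sda'))     # SMART drive /dev/sda
print(run_check('smart.temp', '/dev/sda', None, []))  # (3, 'Sorry - not implemented')
```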
Inventory function
A few explanations: The inventory function is called with two arguments: the check name and the agent data. The check name is useful if you want to use the same inventory function for more than one check. We do not use that information for our check. The second argument is the smart section of the agent output. Our function simply prints it to standard output for debugging. After that it returns an empty list, meaning that the inventory has found nothing. We will change that soon, of course.
The check function
The check function is called by Check_MK once for each item to be checked. It gets three parameters: the item, the check parameters and the agent output. It must return a tuple with the following components:
We omit the performance data in our example and just return a hard-coded dummy result.
The declaration of the check
The third section in our example makes the check known to Check_MK. check_info is a dictionary of all check types. Each entry is a four-tuple with the following entries:
Testing
If we got everything right, we can check whether Check_MK recognizes our new check:
root@linux# check_mk -L | grep smart
smart.temp tcp no yes SMART drive %s
That looks good. Now let's have a look at the agent output. We do this by running an inventory on our new check type, which shows the output of our debug print statement:
root@linux# check_mk -I smart.temp Eiger
[['/dev/sda', '1', 'Raw_Read_Error_Rate', '0x002f', '200', '200', '051', 'Pre-fail', 'Always', '-', '0'], ['/dev/sda', '3', 'Spin_Up_Time', '0x0027', '129', '127', '021', 'Pre-fail', 'Always', '-', '6541'], ['/dev/sda', '4', 'Start_Stop_Count', '0x0032', '100', '100', '000', 'Old_age', 'Always', '-', '251'], ['/dev/sda', '5', 'Reallocated_Sector_Ct', '0x0033', '200', '200', '140', 'Pre-fail', 'Always', '-', '0'], ['/dev/sda', '7', 'Seek_Error_Rate', '0x002e', '200', '200', '000', 'Old_age', 'Always', '-', '0'], ['/dev/sda', '9', 'Power_On_Hours', '0x0032', '098', '098', '000', 'Old_age', 'Always', '-', '1497'], ['/dev/sda', '10', 'Spin_Retry_Count', '0x0032', '100', '100', '000', 'Old_age', 'Always', '-', ...
As you can see from that output, Check_MK has already split the agent output at whitespace. Each line of agent output is transformed into a list of strings. The whole section is a list of those lists.
The inventory function
The task of the inventory function is now to extract from this list of lists a list of items to be checked on that particular host. In our case we want to create a check for each hard disk providing a Temperature_Celsius field. The name of the field is in the third column. The name of the disk is in the first column. A simple loop will do:
smart
def inventory_smart_temp(checkname, info):
    # begin with empty inventory
    inventory = []
    # loop over all output lines of the agent
    for line in info:
        disk = line[0]   # device name is in the first column
        field = line[2]  # SMART variable name in the third
        if field == "Temperature_Celsius":
            # found an interesting line, add to inventory
            inventory.append( (disk, "", None) )
    return inventory
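Since the inventory function is plain Python, you can try it out without a monitored host. In this sketch the agent section is a made-up string modelled on the smartctl output above. The /dev/sdb line and both temperature values are invented. The list comprehension imitates what this article describes Check_MK doing: splitting each line of the section at whitespace.

```python
def inventory_smart_temp(checkname, info):
    inventory = []
    for line in info:
        disk = line[0]   # device name is in the first column
        field = line[2]  # SMART variable name in the third
        if field == "Temperature_Celsius":
            inventory.append((disk, "", None))
    return inventory

# Invented <<<smart>>> section for a host with two disks
agent_section = """\
/dev/sda   9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1497
/dev/sda 194 Temperature_Celsius     0x0022   106   098   000    Old_age   Always       -       40
/dev/sdb 194 Temperature_Celsius     0x0022   105   098   000    Old_age   Always       -       41
"""

# One list of whitespace-separated words per line, as the check receives it
info = [line.split() for line in agent_section.splitlines()]

print(inventory_smart_temp("smart.temp", info))
# [('/dev/sda', '', None), ('/dev/sdb', '', None)]
```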
Our inventory function looks for lines containing Temperature_Celsius and adds their first column (the disk device) to the inventory. But the inventory is not simply a list of items. Each entry is a triple of:
Let's now try our inventory on a host with two hard disks:
root@linux# check_mk -I smart.temp Eiger
smart.temp 2 new checks
If something goes wrong, try calling check_mk with the option --debug. Check_MK will then not catch Python exceptions but let them through, so that you see the full traceback:
root@linux# check_mk --debug -I smart.temp Eiger
Traceback (most recent call last):
  File "/usr/share/check_mk/modules/check_mk.py", line 2883, in <module>
    make_inventory(checkname, args)
  File "/usr/share/check_mk/modules/check_mk.py", line 1505, in make_inventory
    inventory = inventory_function(checkname, info) # inventory is a list of
  File "/usr/share/check_mk/checks/smart", line 5, in inventory_smart_temp
    this_is_rubbish
NameError: global name 'this_is_rubbish' is not defined
The check function
During normal operation of Nagios the inventory function is never called. Instead our check function is called for each item to be checked. Its main task is to decide the service's status. We can first try our dummy function with our two newly inventorized services on our test host Eiger. We do not need Nagios for that but simply call check_mk with the options -n and -v:
root@linux# check_mk -nv Eiger
Check_mk version 1.1.0beta4
SMART drive /dev/sda Sorry - not implemented
SMART drive /dev/sdb Sorry - not implemented
OK - Agent Version 1.0.36, processed 2 host infos
That looks good, but it's just dummy output. Let's now do some real coding. We want the check to become CRITICAL if the disk's temperature is above 40 degrees and WARNING if it is above 35. Our first task is to find the correct line in the agent output. We code a loop similar to the one in the inventory function. But remember: now we are looking for one specific item (a hard disk device). The line we are looking for has the item in its first column and the word Temperature_Celsius in the third.
smart
def check_smart_temp(item, params, info):
    # loop over all lines
    for line in info:
        # is this our line?
        if line[0] == item and line[2] == "Temperature_Celsius":
Now remember the output of our agent. The current raw value of the SMART attribute is in the eleventh column (and thus has index 10). We take that value and convert it into an integer:
            celsius = int(line[10])
Now we can check against our hard-coded levels. We also want the current temperature to be part of the plugin output:
            if celsius > 40:
                return (2, "CRITICAL - Temperature is %dC" % celsius)
            elif celsius > 35:
                return (1, "WARNING - Temperature is %dC" % celsius)
            else:
                return (0, "OK - Temperature is %dC" % celsius)
What if we cannot find our disk in the output at all? The disk might be missing. Or the user might have manually configured a check for a non-existing disk. In that case we simply return an UNKNOWN state once the loop has finished without finding a matching line:
    return (3, "UNKNOWN - disk %s not found in agent output" % item)
Here is our complete check so far in one piece:
/usr/share/check_mk/checks/smart
def inventory_smart_temp(checkname, info):
    inventory = []
    for line in info:
        disk = line[0]
        field = line[2]
        if field == "Temperature_Celsius":
            inventory.append( (disk, "", None) )
    return inventory

def check_smart_temp(item, params, info):
    for line in info:
        if line[0] == item and line[2] == "Temperature_Celsius":
            celsius = int(line[10])
            if celsius > 40:
                return (2, "CRITICAL - Temperature is %dC" % celsius)
            elif celsius > 35:
                return (1, "WARNING - Temperature is %dC" % celsius)
            else:
                return (0, "OK - Temperature is %dC" % celsius)
    return (3, "UNKNOWN - disk %s not found in agent output" % item)

check_info['smart.temp'] = \
    (check_smart_temp, "SMART drive %s", 0, inventory_smart_temp)
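You can exercise the check function the same way. The following sketch feeds the complete check above with invented agent data for two disks at 40C and 41C, and also asks for a disk that is absent from the data. Note that 40C yields WARNING rather than CRITICAL, because both comparisons are strictly greater-than.

```python
def check_smart_temp(item, params, info):
    for line in info:
        if line[0] == item and line[2] == "Temperature_Celsius":
            celsius = int(line[10])
            if celsius > 40:
                return (2, "CRITICAL - Temperature is %dC" % celsius)
            elif celsius > 35:
                return (1, "WARNING - Temperature is %dC" % celsius)
            else:
                return (0, "OK - Temperature is %dC" % celsius)
    return (3, "UNKNOWN - disk %s not found in agent output" % item)

# Invented agent data: /dev/sda at 40C, /dev/sdb at 41C, no /dev/sdc
info = [
    "/dev/sda 194 Temperature_Celsius 0x0022 106 098 000 Old_age Always - 40".split(),
    "/dev/sdb 194 Temperature_Celsius 0x0022 105 098 000 Old_age Always - 41".split(),
]

for disk in ["/dev/sda", "/dev/sdb", "/dev/sdc"]:
    state, output = check_smart_temp(disk, None, info)
    print("SMART drive %s %s" % (disk, output))
# SMART drive /dev/sda WARNING - Temperature is 40C
# SMART drive /dev/sdb CRITICAL - Temperature is 41C
# SMART drive /dev/sdc UNKNOWN - disk /dev/sdc not found in agent output
```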
Now we can try a real check:
root@linux# check_mk -nv Eiger
Check_mk version 1.1.0beta4
SMART drive /dev/sda WARNING - Temperature is 40C
SMART drive /dev/sdb CRITICAL - Temperature is 41C
OK - Agent Version 1.0.36, processed 2 host infos
Check parameters
Hard-coding levels like 35 and 40 degrees is clearly not the way to go if your check is to be of any real use. What we need are parameters. From a technical point of view a check parameter is an arbitrary Python value. That can be a single value, a tuple or maybe even a complex Python data object. Most checks use tuples to group several values into one parameter. Our check shall have two parameters: the levels for warning and critical. Those levels shall be two integers grouped together into a pair (or a 2-tuple, as some people might say). So if our check function is called with such a pair of integers, we can use Python's handy tuple unpacking to extract our levels:
def check_smart_temp(item, params, info):
    # unpack check parameters
    warn, crit = params
The rest is easy. We simply replace 35 and 40 with the two new variables:
    for line in info:
        if line[0] == item and line[2] == "Temperature_Celsius":
            celsius = int(line[10])
            if celsius > crit:
                return (2, "CRITICAL - Temperature is %dC" % celsius)
            elif celsius > warn:
                return (1, "WARNING - Temperature is %dC" % celsius)
            else:
                return (0, "OK - Temperature is %dC" % celsius)
    return (3, "UNKNOWN - disk %s not found in agent output" % item)
If you test this change, the result might be somewhat surprising at first glance:
Check_mk version 1.1.0beta4
SMART drive /dev/sda UNKNOWN - invalid output from plugin section <<<smart.temp>>> or error in check type smart.temp
SMART drive /dev/sdb UNKNOWN - invalid output from plugin section <<<smart.temp>>> or error in check type smart.temp
OK - Agent Version 1.0.36, processed 2 host infos
A look into the autochecks directory, where our inventorized checks are stored, clears things up:
/var/lib/check_mk/autochecks/smart.temp-2009-11-06_16.34.56.mk
[
# === Eiger ===
("Eiger", "smart.temp", '/dev/sda', None), #
("Eiger", "smart.temp", '/dev/sdb', None), #
]
Our check is called with None as its check parameter! And Python cannot unpack that into warn and crit. So we also need to change our inventory function so that it creates the checks with correct parameters.
The inventory function must set correct default parameters
But what parameters shall we use for inventorized checks? The Check_MK way is to use a variable for that, which can be configured in main.mk. The trick is to enter, during inventory, not the current value of that variable as the parameter but the variable itself. It is also important to define that variable with a default value. Otherwise all users who do not define the variable in main.mk will run into an error, even those who do not use our check. Here is an updated inventory function:
# set default value of variable (user can override in main.mk)
smart_temp_default_values = (35, 40)
def inventory_smart_temp(checkname, info):
    inventory = []
    for line in info:
        disk = line[0]
        field = line[2]
        if field == "Temperature_Celsius":
            # use default variable as parameter. Note the quotes!
            inventory.append( (disk, "", "smart_temp_default_values") )
    return inventory
We need to reinventorize our test host. We delete the autochecks file and rerun check_mk -I:
root@linux# rm /var/lib/check_mk/autochecks/smart.temp-2009-11-06_16.34.56.mk
root@linux# check_mk -I smart.temp Eiger
smart.temp 2 new checks
A look into the newly created autochecks file shows that our variable is now used as the check parameter:
/var/lib/check_mk/autochecks/smart.temp-2009-11-07_12.56.22.mk
[
# === Eiger ===
("Eiger", "smart.temp", '/dev/sda', smart_temp_default_values), #
("Eiger", "smart.temp", '/dev/sdb', smart_temp_default_values), #
]
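Why does writing the variable name instead of its value make overrides in main.mk work? Check_MK's loading code is not shown in this article, so the following is only a simplified model. It assumes that an autochecks entry is evaluated as Python in a namespace that already contains the user's configuration. The helper `autochecks_entry` is hypothetical, and `eval` merely stands in for however Check_MK actually reads the file.

```python
# Simplified model (assumption): autochecks entries are Python expressions
# evaluated in a namespace holding the check defaults and main.mk settings.

def autochecks_entry(host, checkname, item, params_source):
    """Hypothetical helper: params_source is inserted verbatim, without quotes."""
    return "(%r, %r, %r, %s)" % (host, checkname, item, params_source)

line = autochecks_entry("Eiger", "smart.temp", "/dev/sda", "smart_temp_default_values")
print(line)  # ('Eiger', 'smart.temp', '/dev/sda', smart_temp_default_values)

# Default value defined in the check file ...
config = {"smart_temp_default_values": (35, 40)}
print(eval(line, config)[3])    # (35, 40)

# ... and the same entry after the user overrides the variable in main.mk
config["smart_temp_default_values"] = (50, 60)
print(eval(line, config)[3])    # (50, 60)

# An entry with the literal value baked in ignores the override
frozen = autochecks_entry("Eiger", "smart.temp", "/dev/sdb", "(35, 40)")
print(eval(frozen, config)[3])  # (35, 40)
```

The last entry shows what would happen if the inventory wrote the current value instead of the variable name: later changes in main.mk would have no effect on already inventorized checks.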
Now our check should work again:
root@linux# check_mk -nv Eiger
Check_mk version 1.1.0beta4
SMART drive /dev/sda WARNING - Temperature is 40C
SMART drive /dev/sdb CRITICAL - Temperature is 41C
OK - Agent Version 1.0.36, processed 2 host infos
It should be possible to set alternative levels in main.mk:
main.mk
smart_temp_default_values = (50, 60)
A test shows that both checks are now OK:
root@linux# check_mk -nv Eiger
Check_mk version 1.1.0beta4
SMART drive /dev/sda OK - Temperature is 40C
SMART drive /dev/sdb OK - Temperature is 41C
OK - Agent Version 1.0.36, processed 2 host infos
If a user wants to change levels just for individual items, she or he can do that as usual by defining an explicit check in main.mk:
main.mk
checks += [
  ( "Eiger", "smart.temp", "/dev/sda", (20, 30) )
]
Now one of our disks becomes CRITICAL:
root@linux# check_mk -nv Eiger
Check_mk version 1.1.0beta4
SMART drive /dev/sda CRITICAL - Temperature is 40C
SMART drive /dev/sdb OK - Temperature is 41C
OK - Agent Version 1.0.36, processed 2 host infos
Performance data
If you are using a graphing tool like PNP4Nagios, you know that each Nagios check can optionally output "performance data". That data can be used for visualizing numbers in round robin databases or other systems. Creating performance data with a Check_MK check is simple. You just need to:
The declaration is done by changing a 0 into a 1 here:
check_info['smart.temp'] = \
(check_smart_temp, "SMART drive %s", 1, inventory_smart_temp)
The third argument of the result tuple of the check function is a list of entries. Each entry is a tuple with the following components:
Only the variable name and the current value are mandatory. Insert an empty string if you want to skip an unneeded value. Trailing empty strings can be left out. The following example shows a check function returning a valid list of performance values:
def check_foobar(item, params, info):
    return (0, "OK - Foobar", [
        ( "size", 125 ),                  # simple value, no levels, no range
        ( "used", 88.5, "", "", 0, 100),  # no levels, range is from 0 to 100
        ( "guzzi", -14.5, -20, -30),      # warning at -20, crit at -30
        ( "argl", 66, 80, 90, 0, 100),    # levels at 80/90, min/max at 0/100
    ])
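Check_MK's real conversion code is not part of this article. The following sketch is a simplified model that produces the field layout visible in the -p output further down (temp=38;20;30;;). It writes the label, value, warning level, critical level, minimum and maximum separated by semicolons and leaves missing trailing fields empty. Multiple values are separated by spaces, as in standard Nagios performance data. The function name `format_perfdata` is hypothetical, and labels containing spaces or quotes, which Nagios would require to be quoted, are ignored here.

```python
def format_perfdata(perfdata):
    """Render perfdata tuples as 'label=value;warn;crit;min;max' (simplified model)."""
    parts = []
    for entry in perfdata:
        # pad missing trailing fields with empty strings, then keep six fields
        name, value, warn, crit, min_, max_ = (list(entry) + [""] * 6)[:6]
        parts.append("%s=%s;%s;%s;%s;%s" % (name, value, warn, crit, min_, max_))
    return " ".join(parts)

perfdata = [
    ("size", 125),
    ("used", 88.5, "", "", 0, 100),
    ("guzzi", -14.5, -20, -30),
    ("argl", 66, 80, 90, 0, 100),
]
print(format_perfdata(perfdata))
# size=125;;;; used=88.5;;;0;100 guzzi=-14.5;-20;-30;; argl=66;80;90;0;100
```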
Check_MK converts that list into standard Nagios syntax when sending the check information to Nagios. If you have activated direct RRD updates, Check_MK analyses the data itself and writes it into the correct RRD database.
Performance data in our example
Our temperature check will yield one performance value: the current temperature. There is no minimum or maximum value available, but we will output the levels. Some graphing tools are able to visualize those levels in their graphs. Here is the updated and final version of our complete check:
smart
smart_temp_default_values = (35, 40)
def inventory_smart_temp(checkname, info):
    inventory = []
    for line in info:
        disk = line[0]
        field = line[2]
        if field == "Temperature_Celsius":
            inventory.append( (disk, "", "smart_temp_default_values") )
    return inventory

def check_smart_temp(item, params, info):
    # unpack check parameters
    warn, crit = params
    for line in info:
        if line[0] == item and line[2] == "Temperature_Celsius":
            celsius = int(line[10])
            perfdata = [ ( "temp", celsius, warn, crit ) ]
            if celsius > crit:
                return (2, "CRITICAL - Temperature is %dC" % celsius, perfdata)
            elif celsius > warn:
                return (1, "WARNING - Temperature is %dC" % celsius, perfdata)
            else:
                return (0, "OK - Temperature is %dC" % celsius, perfdata)
    return (3, "UNKNOWN - disk %s not found in agent output" % item)

check_info['smart.temp'] = \
    (check_smart_temp, "SMART drive %s", 1, inventory_smart_temp)
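As before, you can try the final check outside Check_MK. This sketch uses invented agent data for /dev/sda (40C) and /dev/sdb (41C). It covers the three parameter situations discussed in this article: the default variable, an explicit per-item tuple as in the checks += [...] example, and the None left behind by the first inventory, which makes the unpacking fail.

```python
smart_temp_default_values = (35, 40)

def check_smart_temp(item, params, info):
    warn, crit = params  # raises TypeError if params is None
    for line in info:
        if line[0] == item and line[2] == "Temperature_Celsius":
            celsius = int(line[10])
            perfdata = [("temp", celsius, warn, crit)]
            if celsius > crit:
                return (2, "CRITICAL - Temperature is %dC" % celsius, perfdata)
            elif celsius > warn:
                return (1, "WARNING - Temperature is %dC" % celsius, perfdata)
            else:
                return (0, "OK - Temperature is %dC" % celsius, perfdata)
    return (3, "UNKNOWN - disk %s not found in agent output" % item)

# Invented agent data
info = [
    "/dev/sda 194 Temperature_Celsius 0x0022 106 098 000 Old_age Always - 40".split(),
    "/dev/sdb 194 Temperature_Celsius 0x0022 105 098 000 Old_age Always - 41".split(),
]

# Default levels from the check file
print(check_smart_temp("/dev/sda", smart_temp_default_values, info))
# (1, 'WARNING - Temperature is 40C', [('temp', 40, 35, 40)])

# Explicit levels for a single item
print(check_smart_temp("/dev/sda", (20, 30), info))
# (2, 'CRITICAL - Temperature is 40C', [('temp', 40, 20, 30)])

# Parameters of None, as written by the first inventory, cannot be unpacked
try:
    check_smart_temp("/dev/sda", None, info)
except TypeError as exc:
    print("TypeError:", exc)
```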
When you try your check function, do not forget to add the option -p, which activates the output of performance data:
root@linux# check_mk -nvp Eiger
Check_mk version 1.1.0beta4
SMART drive /dev/sda CRITICAL - Temperature is 40C (temp=38;20;30;;)
SMART drive /dev/sdb OK - Temperature is 41C (temp=39;50;60;;)
OK - Agent Version 1.0.36, processed 2 host infos
SNMP based checks
SNMP based checks work exactly like the agent based ones, with one exception: instead of using a section of an agent's output, you specify a list of subtrees in an SNMP MIB as the data source. Check_MK retrieves each of the subtrees with a separate SNMP walk and combines the output into a table compatible with those used by the agent based checks.
Finding the correct OIDs
The first step in implementing our own SNMP check is to find the interesting parts of the whole tree of variables. Let's assume that you have found the following place in the tree. It contains information about local network interfaces:
root@linux# snmpwalk -v1 -c public 192.168.56.2 ifTable
IF-MIB::ifIndex.1 = INTEGER: 1
IF-MIB::ifIndex.2 = INTEGER: 2
IF-MIB::ifIndex.3 = INTEGER: 3
IF-MIB::ifDescr.1 = STRING: lo
IF-MIB::ifDescr.2 = STRING: eth0
IF-MIB::ifDescr.3 = STRING: eth1
IF-MIB::ifType.1 = INTEGER: softwareLoopback(24)
IF-MIB::ifType.2 = INTEGER: ethernetCsmacd(6)
IF-MIB::ifType.3 = INTEGER: ethernetCsmacd(6)
IF-MIB::ifMtu.1 = INTEGER: 16436
To be independent of whether MIB files are installed on the Nagios server, let snmpwalk display numeric OIDs with the option -On:
root@linux# snmpwalk -On -v1 -c public 192.168.56.2 ifTable
.1.3.6.1.2.1.2.2.1.1.1 = INTEGER: 1
.1.3.6.1.2.1.2.2.1.1.2 = INTEGER: 2
.1.3.6.1.2.1.2.2.1.1.3 = INTEGER: 3
.1.3.6.1.2.1.2.2.1.2.1 = STRING: lo
.1.3.6.1.2.1.2.2.1.2.2 = STRING: eth0
.1.3.6.1.2.1.2.2.1.2.3 = STRING: eth1
.1.3.6.1.2.1.2.2.1.3.1 = INTEGER: softwareLoopback(24)
.1.3.6.1.2.1.2.2.1.3.2 = INTEGER: ethernetCsmacd(6)
.1.3.6.1.2.1.2.2.1.3.3 = INTEGER: ethernetCsmacd(6)
.1.3.6.1.2.1.2.2.1.4.1 = INTEGER: 16436
.1.3.6.1.2.1.2.2.1.4.2 = INTEGER: 1500
.1.3.6.1.2.1.2.2.1.4.3 = INTEGER: 1500
.1.3.6.1.2.1.2.2.1.5.1 = Gauge32: 10000000
.1.3.6.1.2.1.2.2.1.5.2 = Gauge32: 10000000
.1.3.6.1.2.1.2.2.1.5.3 = Gauge32: 10000000
.1.3.6.1.2.1.2.2.1.6.1 = STRING:
.1.3.6.1.2.1.2.2.1.6.2 = STRING: 8:0:27:38:45:b1
.1.3.6.1.2.1.2.2.1.6.3 = STRING: 8:0:27:f4:e2:e
.1.3.6.1.2.1.2.2.1.7.1 = INTEGER: up(1)
.1.3.6.1.2.1.2.2.1.7.2 = INTEGER: up(1)
.1.3.6.1.2.1.2.2.1.7.3 = INTEGER: up(1)
.1.3.6.1.2.1.2.2.1.8.1 = INTEGER: up(1)
.1.3.6.1.2.1.2.2.1.8.2 = INTEGER: up(1)
.1.3.6.1.2.1.2.2.1.8.3 = INTEGER: up(1)
In our case the longest common prefix of all relevant OIDs is .1.3.6.1.2.1.2.2.1. We'll remember that for later. Now we have to decide which subtrees are needed. In our case we will use the following sub-OIDs:
Now let's create a check file with the name mynics. We begin with the declaration of the SNMP data source:
/usr/share/check_mk/checks/mynics
snmp_info["mynics"] = ( ".1.3.6.1.2.1.2.2.1", [ "2", "3", "8" ] )
This line declares the data source mynics to be of type SNMP. Check_MK will fetch the three sub-OIDs 2, 3 and 8 below the prefix .1.3.6.1.2.1.2.2.1. As in our previous example, we first write dummy functions and output the data arriving at the inventory function. Here is a complete check implementation:
mynics
def inventory_mynics(checkname, info):
    print(info)
    return []

def check_mynics(item, params, info):
    return (3, "UNKNOWN - not yet implemented")

check_info["mynics"] = \
    (check_mynics, "NIC %s", 0, inventory_mynics)
snmp_info["mynics"] = ( ".1.3.6.1.2.1.2.2.1", [ "2", "3", "8" ] )
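The merging of several SNMP walks into one table, as described at the start of this section, can be pictured with a few lines of Python. This is a simplified model, not Check_MK's implementation. `walks` holds hypothetical results keyed by sub-OID, each a list of (index, value) pairs, with values copied from the debug output of the test inventory in this article. The hypothetical helper `build_table` assumes that every walk returns the same indices in the same order.

```python
# Hypothetical walk results below .1.3.6.1.2.1.2.2.1, keyed by sub-OID
walks = {
    "2": [("1", "lo"), ("2", "eth0"), ("3", "eth1")],                  # ifDescr
    "3": [("1", "softwareLoopback"), ("2", "ethernetCsmacd"),
          ("3", "ethernetCsmacd")],                                    # ifType
    "8": [("1", "up"), ("2", "up"), ("3", "up")],                      # ifOperStatus
}

def build_table(columns, walks):
    """Combine one walk per column into one row per interface (simplified model)."""
    # Assumes every walk returns the same indices in the same order
    value_columns = [[value for _index, value in walks[column]] for column in columns]
    return [list(row) for row in zip(*value_columns)]

print(build_table(["2", "3", "8"], walks))
# [['lo', 'softwareLoopback', 'up'], ['eth0', 'ethernetCsmacd', 'up'], ['eth1', 'ethernetCsmacd', 'up']]
```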
A test inventory of the same host we did the walk on shows this debug output:
root@linux# check_mk -I mynics 192.168.56.2
[['lo', 'softwareLoopback', 'up'], ['eth0', 'ethernetCsmacd', 'up'], ['eth1', 'ethernetCsmacd', 'up']]
You might have noticed that Check_MK transforms the SNMP output so that each item is in one line (one list). From now on everything works exactly as with agent based checks. Our inventory function skips non-Ethernet NICs and outputs only NICs that are currently up. The loop relies on each element of the list having exactly three members and unpacks them into the three variables nic, type and state:
def inventory_mynics(checkname, info):
    inventory = []
    for nic, type, state in info:
        if type == "ethernetCsmacd" and state == "up":
            inventory.append( (nic, "", None) )
    return inventory
An inventory will now find two checks for our host:
root@linux# check_mk -I mynics 192.168.56.2
mynics 2 new checks
The check function is easy and needs no parameters. We just make sure that the NIC is really up:
def check_mynics(item, params, info):
    for nic, type, state in info:
        if nic == item:
            if state == "up":
                return (0, "OK - link is up")
            else:
                return (2, "CRITICAL - link is " + state)
    return (3, "UNKNOWN - NIC not found")
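Both mynics functions can be exercised with the table from the test inventory. This sketch copies them with one change: the loop variable `type` is renamed to `if_type` so that it does not shadow Python's built-in `type`. The down state for eth1 and the missing eth2 are invented to reach the CRITICAL and UNKNOWN branches.

```python
def inventory_mynics(checkname, info):
    inventory = []
    for nic, if_type, state in info:
        # only Ethernet interfaces that are currently up
        if if_type == "ethernetCsmacd" and state == "up":
            inventory.append((nic, "", None))
    return inventory

def check_mynics(item, params, info):
    for nic, if_type, state in info:
        if nic == item:
            if state == "up":
                return (0, "OK - link is up")
            else:
                return (2, "CRITICAL - link is " + state)
    return (3, "UNKNOWN - NIC not found")

# Table as printed by the test inventory
info = [['lo', 'softwareLoopback', 'up'],
        ['eth0', 'ethernetCsmacd', 'up'],
        ['eth1', 'ethernetCsmacd', 'up']]

print(inventory_mynics("mynics", info))  # [('eth0', '', None), ('eth1', '', None)]

# Invented failure: eth1 loses its link
info_down = [row if row[0] != 'eth1' else ['eth1', 'ethernetCsmacd', 'down'] for row in info]
print(check_mynics("eth1", None, info_down))  # (2, 'CRITICAL - link is down')
print(check_mynics("eth2", None, info))       # (3, 'UNKNOWN - NIC not found')
```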
Now we can try to check the host:
root@linux# check_mk -nv 192.168.56.2
Check_mk version 1.1.0beta4
Fetching OID .1.3.6.1.2.1.2.2.1 from IP 192.168.56.2 with snmpbulkwalk -v2c
NIC eth0 OK - link is up
NIC eth1 OK - link is up
CRIT - Cannot get data from TCP port 192.168.56.2:6556: (111, 'Connection refused')
The error message in the last line has nothing to do with our check. It appears because Check_MK tries to contact the normal agent in parallel to SNMP. You can easily fix this by declaring the host as SNMP-only in main.mk. Simply add the host tag snmp:
main.mk
all_hosts = [
  "192.168.56.2|snmp",
]
Still missing in this tutorial