Writing checks for Check_MK


This article is obsolete and may no longer be valid!

1. Why not use local checks or MRPE?

Using local checks or MRPE for adding your own self-written checks to Check_MK is easy. Even inventory and performance data are supported. So why should you want to write native checks for Check_MK? Well, there can be several reasons:

  • You want to define your check parameters in main.mk or WATO rather than locally on each target host.
  • You want to exploit currently unused information sent already by an agent (for example Windows' numerous performance counters)
  • You want to implement SNMP based checks.
  • You want your check to be easily ported to other installations of Check_MK.
  • You want your check to become an official part of Check_MK.
  • You are simply interested in how Check_MK works.

If one or more of these reasons apply to you, then you'll find all the information needed for writing your own checks in this article and a couple of further articles.

2. Do I have to learn Python?

Well, to be honest: yes - at least to a certain basic degree. People have suggested changing Check_MK so that checks can be written in other languages as well. I understand this request very well. But from a technical point of view I cannot imagine how such an integration could be done in a clean, simple and performant way. Check_MK's checks are not standalone programs or scripts but are closely integrated into the check mechanism. They need to have access to some of Check_MK's internal functions. In the end, one Python program is created for each host by combining a base and all checks used by this host into one new program. This feature saves about 75% of the CPU resources when compared to directly calling check_mk for checking.

On the other hand, Python is a language which is cleanly designed, elegant and easy to learn. I'm sure you'll like it once you have some experience with it (even if you dislike its style of indentation).

Within this tutorial I assume that you have some basic knowledge of Python. Looking at the code of some of the existing checks might help if you are new to Python.

3. How Check_MK's checks work

Each check consists of at least the following three components:

  • a unique name
  • a data source definition
  • a check function

Two further components are optional but strongly recommended:

  • an inventory function
  • a manual page

If your check outputs performance data, then two further components form a perfect check:

  • a PNP graph template
  • a Multisite Perf-O-Meter

3.1. The data source

Everything begins with the data source, i.e. the source of the data the check operates on. Currently there are two different kinds of data sources: agent sections (tcp) and SNMP queries (snmp). An agent section is a part of the output of an agent, for example the output of the Linux command df. An SNMP based data source returns data retrieved by one or several SNMP queries on certain OIDs. Both data sources are presented to the check function as a table (a Python list of lists). We will call this data the "agent data".
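To make the "list of lists" shape concrete, here is a minimal sketch. The sample lines and the helper name section_to_table are made up for illustration, and splitting on whitespace is an assumption; this article does not describe how Check_MK itself builds the table.

```python
# Hypothetical sample: two lines as a df-like agent section might deliver them.
SAMPLE_SECTION = """\
/dev/sda1 ext4 10321208 4316872 5480048 45% /
/dev/sda2 ext4 41284928 2088668 37099108 6% /var/log
"""


def section_to_table(text):
    """Turn raw section text into a table: one list of columns per line.

    Assumption: columns are separated by whitespace.
    """
    return [line.split() for line in text.splitlines() if line.strip()]


agent_data = section_to_table(SAMPLE_SECTION)
for row in agent_data:
    print(row)
```

Each row is one line of agent output and each element is one column, so a check can address a value as agent_data[row][column].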

3.2. The agent plugin

If you write a TCP based check you need a plugin for the agent. This is typically a small executable script which is placed in the plugins directory of the agent. It uses standard operating system methods for retrieving the data of interest.

It is important to understand the philosophy of Check_MK at this point. The plugin should:

  • use standard operating system commands generally available
  • remove unnecessary output (such as headings)
  • not remove any of the actual data, even if it's not needed in the first version of your check
  • not decide about the status of a check
  • not process the data by any means (other than removing garbage output)
  • not break anything on hosts that do not support the used commands
  • not run longer than a couple of seconds
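The rules above could be followed by a plugin along these lines. This is only a sketch: agent plugins are often small shell scripts, and Python is used here merely for consistency with the checks themselves. The section name my_section, the choice of df -P and the 5 second timeout are assumptions for illustration. The <<<...>>> header line follows the convention Check_MK agents use to mark the start of a section.

```python
import shutil
import subprocess

SECTION_NAME = "my_section"  # hypothetical section name
COMMAND = ["df", "-P"]       # example of a generally available command
TIMEOUT_SECONDS = 5          # assumed limit, "a couple of seconds"


def strip_heading(output):
    """Drop the first line (the column headings), keep every data line."""
    return output.splitlines()[1:]


def run_plugin():
    """Return the lines the plugin would print, or [] if nothing can be done."""
    if shutil.which(COMMAND[0]) is None:
        return []  # command not available: stay silent, break nothing
    try:
        result = subprocess.run(
            COMMAND, capture_output=True, text=True, timeout=TIMEOUT_SECONDS
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    # No status decision and no processing here, only the raw data.
    return ["<<<%s>>>" % SECTION_NAME] + strip_heading(result.stdout)


if __name__ == "__main__":
    for line in run_plugin():
        print(line)
```

Note that the plugin neither evaluates thresholds nor drops columns; all interpretation is left to the check function on the monitoring server.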

3.3. The inventory function

If you want your check to support inventory (which is always a good idea), you have to supply an inventory function. This function examines the agent data of a host and creates a list of all items to be checked on this specific host. An item uniquely identifies a thing to be checked on a host within that type of check. Some examples of items are:

  • The check "df" uses the mount point as the item, for example "/var/log".
  • The check "services" uses the Windows service name as its item, for example "TnsListener".
  • The check "ps" uses an artificial user supplied item.
  • The check "local" uses the service description as output by the local check.

Some checks do not need to distinguish items. This is because the thing they check does not exist more than once on a host. An example is the check mem. But Check_MK always requires an item, so these checks simply use None as the item.

Please note that this does not mean you cannot do an inventory on mem. It's just that the number of items the inventory returns is at most one. In some cases it is even zero: when the agent output does not contain the information needed for the check. This is a very useful feature and enables the Nagios administrator to automatically perform the right checks on the right operating systems.

Your inventory function does not need to worry whether a certain item was already configured manually or detected by a previous inventory. Check_MK handles this in a general way, and makes sure that only newly detected items are added to the list of services.
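The idea can be sketched as follows. The function names, the signature and the plain list of items as return value are assumptions made for illustration; a real inventory function may have to return more than the bare item (for example default parameters), which this article does not specify.

```python
def inventory_df_sketch(info):
    """One item per file system.

    Assumption: the mount point is the last column of each line.
    """
    return [line[-1] for line in info if line]


def inventory_mem_sketch(info):
    """A check without distinguishable items.

    Returns the single item None if the agent sent data, and no item at all
    if the section is missing or empty.
    """
    return [None] if info else []


# Hypothetical agent data, already split into a list of lists.
df_info = [
    ["/dev/sda1", "ext4", "10321208", "4316872", "5480048", "45%", "/"],
    ["/dev/sda2", "ext4", "41284928", "2088668", "37099108", "6%", "/var/log"],
]
print(inventory_df_sketch(df_info))
print(inventory_mem_sketch([["MemTotal:", "8046368", "kB"]]))
print(inventory_mem_sketch([]))
```

Neither function looks at which services already exist; as described above, filtering out known items is left to Check_MK.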

3.4. The check function

When an actual check of a host is done, all services for this host will be checked in turn. When it's your check's turn, Check_MK will call your check function for each item that is automatically or manually configured for the check and host. Your function will be provided with the checked item, the (optional) parameters of the check and the agent data.

The check function then

  • extracts the information relevant for the item in question from the agent data
  • decides on the Nagios status of the service
  • creates one line of text output for Nagios
  • optionally computes performance data
  • returns status, line of text and performance data as a Python tuple

This is very similar to what standard Nagios plugins do, with the important difference that our check is already provided with data from the agent and does not have to retrieve it itself.
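A minimal sketch of these steps for a df-like check might look like this. The function name, the column layout, the meaning of params as a (warn, crit) pair in percent and the shape of the performance data are all assumptions for illustration; only the argument list (item, parameters, agent data) and the returned tuple of status, text and performance data are taken from the description above.

```python
OK, WARN, CRIT, UNKNOWN = 0, 1, 2, 3  # Nagios status codes


def check_df_sketch(item, params, info):
    """Check one mount point and return (status, text, perfdata)."""
    warn, crit = params  # assumed: thresholds in percent used
    for line in info:
        # Assumed columns: device, type, size kB, used kB, avail kB, used %, mount
        if line and line[-1] == item:
            size_kb, used_kb = int(line[2]), int(line[3])
            if size_kb == 0:
                return (UNKNOWN, "%s has size 0" % item, [])
            percent = 100.0 * used_kb / size_kb
            if percent >= crit:
                status = CRIT
            elif percent >= warn:
                status = WARN
            else:
                status = OK
            text = "%.1f%% used (%d of %d kB)" % (percent, used_kb, size_kb)
            perfdata = [("used_kb", used_kb)]  # assumed perfdata shape
            return (status, text, perfdata)
    return (UNKNOWN, "%s not found in agent data" % item, [])


# Hypothetical agent data, already split into a list of lists.
info = [
    ["/dev/sda1", "ext4", "10321208", "4316872", "5480048", "45%", "/"],
    ["/dev/sda2", "ext4", "41284928", "2088668", "37099108", "6%", "/var/log"],
]
print(check_df_sketch("/var/log", (80, 90), info))
```

The function only reads the table it is given. It never runs a command or opens a connection, which is exactly the difference from a standalone Nagios plugin.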

3.5. The manual page

If you want to pass your check along to others, a manual page for the check is strongly recommended. Check_MK has its own concept and syntax of check manuals. You do not need to learn NROFF syntax or stuff like that. A check manual is a relatively simple text file named after the check and usually installed in /usr/share/doc/check_mk/checkman.

Please read our article on how to write man pages for further information.

3.6. The PNP template

If your check delivers performance data (i.e. not only returns a status and an explanatory text but also values like memused=77364), you should provide a template for PNP4Nagios which nicely displays how the value evolves over time.

If you are using another graphing tool, or no graphing tool at all, then a PNP template is not useful for you - of course. You only need one if you want your check to be officially part of Check_MK.

3.7. Perf-O-Meter

The same holds for the Perf-O-Meter for Multisite. People like Perf-O-Meters. If you do not use Multisite then Perf-O-Meters are of no use to you. Checks that are to become part of Check_MK must provide Perf-O-Meters (even if some older checks of Check_MK still do not have one).

4. Let's jump to practice: Preparing the agent

Let's now write our first check. For a start we offer two tutorials.