Services


1. Introduction

Services are the core of a monitoring system. Each one represents an important cog in your complex IT landscape. The usefulness of the complete monitoring stands or falls with how accurately and sensibly the services have been configured. After all, the monitoring should reliably notify whenever a problem becomes apparent somewhere, but it should also minimise false or useless alarms.

Check_MK demonstrates possibly its greatest strength when configuring services: it possesses an unrivalled and very powerful system for the automatic detection and configuration of services. With Check_MK there is no need to define every single service via templates and individual allocations. Check_MK can automatically and reliably detect the list of services to be monitored, and first and foremost, keep it up to date. This not only saves a lot of time - it also makes the monitoring accurate. It ensures that the daily changes in a data centre are always promptly covered and that no important service goes unmonitored.

The service discovery in Check_MK is based on an important basic principle: the separation of what from how:

  • What should be monitored? → The filesystem /var on the host lnxsrv015
  • How should it be monitored? → at 90% used space WARN, at 95% CRIT

The what is automatically determined by the service discovery. It is a combination of the host name (lnxsrv015), the check plugin (df: filesystem check on Linux) and the item (/var). Check plugins that can create a maximum of one service on a host do not require an item (e.g., the check plugin for CPU utilisation). The results from a service discovery are presented in a table as shown below:

Host        Check plugin   Item
lnxsrv015   df             /
lnxsrv015   df             /var
lnxsrv015   cpu.util
...         ...            ...
app01cz2    hr_fs          /
...         ...            ...

The how - i.e. the thresholds or check parameters for the individual services - is configured independently via rules. You can, e.g., define a rule that monitors all filesystems with the mount point /var using the 90%/95% thresholds, without needing to think about which hosts even have such a filesystem. This is what makes configuring with Check_MK so clear and simple!

A few services cannot be set up by automatic discovery. Among these are, e.g., checks that query specified websites via HTTP. These are created via rules, as you will learn below.

2. Host services in WATO

2.1. Incorporating a new host

Once you have added a new host in WATO the next step is to call up the list of services. With this action the automatic service discovery takes place for the host. You can also call up this list at any time later in order to restart the discovery or to carry out modifications to the configuration. There are various ways of opening the service list:

  • via the button or in the host details in WATO
  • via the symbol in the list of hosts in a folder in WATO
  • via the Edit services entry in the context menu of a host's Check_MK Discovery service

When a host has been newly-incorporated its services have not yet been configured, and therefore all discovered services appear in the Available (missing) services category:

The usual method is to simply save with , followed by an Activate changes - and subsequently the host will be in monitoring.

2.2. Adding missing services

For a host that is already being monitored this list looks different. Instead of Available (missing) services you will see Already configured services. If Check_MK detects something on a host that is not yet being monitored but should be, the list will look something like this:

The above illustration shows a Windows host on which a filesystem D:/ has been found which is not currently being monitored! In this case it could be that a colleague has created a new LUN. A click on simply adds all of the missing services so that the monitoring is again complete. If you only want to add some of the missing services, you can alternatively select them via the check boxes and save them with .

2.3. Vanished services

In data centres things can not only newly appear, but also disappear. A database instance can be decommissioned, a LUN unmounted, a filesystem removed, etc. Check_MK automatically recognises such services as vanished. In the service list it will then, e.g., look like this:

The simplest way to be rid of these services is a click on the button that appears in such a case. Attention: the reason for the disappearance can of course be a problem! The disappearance of a filesystem can also mean that due to an error it could not be mounted. The monitoring is after all there for such cases! You should therefore only remove the service when you know for certain that it really no longer needs monitoring.

2.4. Removing unwanted services

You won't necessarily want to monitor everything that Check_MK finds. The discovery of course works in a targeted way and excludes much unnecessary data in advance. Nonetheless, how should Check_MK know, e.g., that a particular database instance has been set up only 'to play around with' and is not in production? There are two ways of eliminating such services:

Temporarily disabling services

Use the check boxes to simply deselect the services that are not to be monitored, and then save with . And naturally, don't forget the usual Activate changes...

This is however only intended for temporary and smaller actions, as the services deselected in this way will continue to be flagged as missing by Check_MK, and the Discovery Check (which we will discuss further below) will likewise be unhappy. In any case, in an environment with many thousands of services this would simply be too much work and not really practical...

Permanently disabling services

It is far more elegant and enduring to permanently ignore services with the aid of the Disabled services rule set. Here you can not only exclude individual services from monitoring, but also formulate rules like "I don't want to monitor filesystems on test systems that begin with /mnt/dsk". The symbol in the service bar simplifies the creation of such new rules, so that you don't need to take the longer route via the Host & Service parameters WATO module:

It takes you directly to the creation of a new rule that will be prepopulated for the current folder, host and service:

You can very easily generalise this rule for all hosts: simply remove the check mark at Specify explicit host names and - importantly - set the Folder to Main directory. Naturally, as always, you can formulate any other desired conditions in the rule.

Once you have saved the rules, and return to the host's service list, you will discover the new table Disabled services (configured away by admin), as shown below:

2.5. Refreshing services

There are a number of plugins that record certain things during a discovery. For example, the plugin for network interfaces records the speed set on the interface at discovery time. Why? In order to be able to warn you in case it changes! It is rarely a good sign when an interface is sometimes set to 10 MBit and sometimes to 1 GBit - this is rather an indication of defective autonegotiation.

What happens when this change is desired and is to be accepted as OK from now on?

Either - remove the service via the check box (you will need to save after the removal), and re-add it later.

Or - click on - with this all of the host's services will be refreshed and newly discovered. This is naturally much easier - unless you deliberately want to keep some other services in their current error state.

2.6. Special conditions with SNMP

For SNMP devices the service discovery internally works quite differently from hosts monitored by Check_MK agents. For the latter, Check_MK can simply look at the agent's output and - with the help of the individual check plugins - find the relevant items. Somewhat more work is required for SNMP. During a discovery Check_MK could in principle simply perform an SNMP walk over the complete device and look for interesting OIDs within. There are however devices for which a single such walk would take several hours!

Check_MK therefore proceeds more intelligently. It initially retrieves only the first two OIDs from the device (sysDescr and sysObjectID) and, depending on their contents, around ten further OIDs. Based on these results, each of the more than 600 standard SNMP check plugins decides whether the device is supported by the plugin. This phase is referred to as the SNMP scan, and it results in a list of check plugins.

In a second step the actual discovery runs. The plugins found retrieve, via targeted SNMP walks, precisely the data that they need, and from this data determine which services are to be monitored. The data retrieved here is exactly the same as that which will later be retrieved regularly for the monitoring.

For devices in a LAN the whole procedure generally doesn't take long - more than a few seconds is really an exception. If however you monitor devices over WAN routes with high latency, the complete scan can take several minutes. It would be very impractical if you had to wait that long every time you opened the services page.

Therefore WATO normally bypasses the scan and performs the discovery only with the check plugins already in use on the host. The SNMP data is then available as cache data from the regular monitoring, and the discovery takes only fractions of a second. Of course, in this way you can only find new items for existing plugins (e.g. new switch ports, hard drives, sensors, VPNs, etc.), but not services from completely new plugins.

The button forces an SNMP scan and the retrieval of fresh data via SNMP. In this way services from completely new plugins are found as well. With slow-responding devices this can take a while.

3. Bulk Discovery - simultaneous discovery on multiple hosts

If you want to perform a discovery for multiple hosts with a single action, you can make the work easier with WATO's bulk operations. First, select the hosts on which the discovery is to be performed. You have several options for this:

  1. In a folder, select the check boxes for individual hosts and press
  2. Search for hosts with Host search, and then press in the search results
  3. Click on in a folder

With the third variant you can also perform the service discovery recursively in all subfolders. In all of the above three options the next step will take you to the following dialogue:

In Mode you will find exactly the same options as in the WATO service list that we have previously discussed.

Under Selection you can again control the host selection. This is primarily sensible if you have selected these via the folder rather than via the check boxes. Most of the options are intended to accelerate the discovery:

Only include hosts that failed on previous discovery: Hosts for which an earlier service discovery via bulk operations has failed (e.g. because the host was not reachable at the time) are flagged with the symbol. This option allows the discovery to be repeated only for these hosts.
Only include hosts with a failed discovery check: This restricts the discovery to hosts for which the Discovery Check failed. If you work with the Discovery Check this is a good method for greatly accelerating a discovery across many hosts. The combination with the Refresh all services (tabula rasa) option makes less sense in this case however, as it can distort the status of existing services.
Exclude hosts where the agent is unreachable: Hosts that are not reachable cause long delays during discovery due to connection timeouts. This can greatly impede a discovery's performance on larger numbers of hosts. If the hosts are already in monitoring - and it is known that they are DOWN - you can exclude them here and thus avoid the timeouts.

The Performance Options are preset so that a Full Scan is always performed on SNMP devices. If you are not interested in new plugins, the discovery can be greatly accelerated by deselecting this option. Working without cache data is only advisable in exceptional cases: especially for hosts that are monitored with Check_MK agents it can unfortunately happen that log messages are "consumed" by the discovery and are then not received by the production check.

The 10 set in Number of hosts to handle at once means that ten hosts are always processed in one action; internally, each such batch is handled with a single HTTP request. If you encounter timeout problems because some hosts require a long time to discover, you can try setting this number lower (at the cost of a longer total run time).

As soon as you confirm the dialogue the procedure will start and you can observe its progress - and also interrupt it if necessary:

4. Check parameters in services

Many of the check plugins can be configured using parameters. The most common case is the setting of thresholds for WARN and CRIT. Parameters can however be much more complex, as this example of temperature monitoring with Check_MK shows:

The check parameter for a service is composed in three steps:

  1. Every plugin has a default value for its parameters.
  2. Some plugins set parameter values during the discovery (see above).
  3. Parameters can be set via rules.

Parameters from rules have priority over those set by a discovery, and these in turn have priority over default values. For complex parameters in which individual sub-parameters are enabled with check boxes (as with temperature, for example), these priorities apply separately for each sub-parameter. So if you set only one sub-parameter via rules, the others retain their respective default values. In this way you can, e.g., activate the trend calculation of the temperatures with one rule, and with another rule set the temperature threshold values for a particular device. The complete parameter set will then be composed from both rules.

The exact parameters that a service eventually has can be found on the service's parameter page. This can be accessed via the symbol in the host's service list. If you wish to see the parameters of all services directly in the service table, you can show them with the button. It will look something like this:
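Incidentally, a quick (if less comfortable) way to inspect a host's services together with their effective parameters is also available on the command line: cmk -D dumps the complete configuration of a host, including the table of its checks and their parameters (the exact output layout varies between versions):

OMD[mysite]:~$ cmk -D myhost123 | less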

5. Customising the service discovery

We have already shown above how you can configure the service discovery so that undesired services are excluded. In addition, for a number of plugins there are further rule sets that influence the behaviour of the discovery for these plugins. Not only are there settings for omitting items, there are also those that explicitly add items or collect them into groups. The naming of items is sometimes also an issue - e.g. for switch ports, where you can decide that the description or alias is to be used as the item (and thus appear in the service name) instead of the interface ID.

All rule sets relevant for service discovery can be found under Host & Services parameters ➳ Parameters for discovered services ➳ Discovery - automatic service detection. Please don't confuse these rule sets with those intended for parameterising the actual services. A number of plugins in fact have two rule sets - one for the discovery, and one for the parameters. Below are a couple of examples.

5.1. Monitoring of processes

It would make little sense for Check_MK to simply define a service to monitor every process found on a host. Most processes are either of no interest or are only temporarily present. At the very least there are hundreds of processes running on a typical Linux server.

For the monitoring of processes you therefore work either with manual checks or - and this is much more elegant - you use the Process discovery rule set to tell the service discovery which processes it should look out for. In this way monitoring can be set up automatically whenever a process of interest is found on a host.

The following image shows a rule in the Process discovery rule set which searches for processes that execute the program /usr/sbin/apache2. In this example a separate service (Grab user from found processes) will be created for every operating system user under which such a process is found. The service will be named Apache %u, where %u is replaced by the user name. The thresholds for the number of process instances are set to 1/1 (minimum) and 30/60 (maximum):

Please note that the predefined thresholds are referred to as Default parameters for detected services. You can override these - as with all other services - via rules. As a reminder: the above rule configures the service discovery - the what. Once the services exist, the rule set State and count of processes is responsible for the thresholds.

The fact that you can set thresholds during a discovery is a convenience. There is a catch though: changes to the discovery rule only take effect at the next discovery. If you change the thresholds there, you will need to run a new discovery for them to take effect. If, however, you only use the rule to discover the services (the what), and the rule set State and count of processes for the how, then you won't have this problem.
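Before formulating such a rule it can help to look at which processes the agent actually reports, so that your match pattern really fits. The raw agent output can be displayed with cmk -d - here simply filtered with grep for the <<<ps>>> section written by the Linux agent (the host name is only an example):

OMD[mysite]:~$ cmk -d lnxsrv015 | grep -A 10 '<<<ps>>>'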

Further information on process discovery can be found in the online help for this rule set.

5.2. Monitoring services under Windows

The discovery and parameterising of the monitoring of Windows services is analogous to the processes and is controlled via the rule sets Windows Service Discovery (what) and Windows services (how) respectively. Here is an example of a rule that watches out for two services:

Exactly as with the processes, here too the service discovery is only one option. If, on the basis of host characteristics and folders, you can formulate precise rules for which services are to be expected on which hosts, then you can also work with manual checks. This is then independent of what is actually currently running - it can however require considerably more effort, since you need many rules in order to describe exactly which service is to be expected on which host.

5.3. Monitoring of switch ports

Check_MK uses the same logic for monitoring network interfaces on servers and ports on Ethernet switches. With switch ports the existing options for controlling the service discovery are especially interesting, even though (in contrast to the processes and Windows services) the discovery initially functions without rules. That is to say, by default Check_MK automatically monitors all physical ports that currently have an UP state. The applicable rule set is called Network Interface and Switch Port Discovery and offers numerous setting options that are only briefly described here:

The following options are the most important:

  • The use of the Description or the Alias in service names
  • The restriction or expansion of the types or names of interfaces being monitored

6. Setting up services manually

There are some situations in which an automatic service discovery makes no sense. This is always the case when you want to enforce compliance with a specific guideline. As we saw in the previous chapter, you can let the monitoring of Windows services be set up automatically when these are found. But what happens when the absence of such a service is precisely the problem? For example:

  • A particular virus scanner should be installed on every Windows host.
  • NTP should be configured on every Linux host.

In such cases you can install the services manually. The starting point for this is the Manual Checks WATO module. Underlying this is a collection of Rule sets which have exactly the same names as the rule sets used for configuring the parameters for these checks.

The rules differ in two points however:

  • These are rules for hosts, not for services. The services will be created by the rules
  • Since no discovery takes place, you must select the check plugin to be used for the check

The following example shows the body of the State of NTP time synchronisation rule under Manual Checks:

Alongside the thresholds, here you set the check plugin (e.g. chrony or ntp.time). For check plugins that require an item you must also specify this. For example, this is necessary for the <> plugin, which requires the database SID to be monitored:

A manual service defined in this way will be set up on all hosts to which these rules apply. There are then three possible outcomes in the actual monitoring:

  1. The host is set up correctly and the service goes to OK.
  2. The agent reports that the requested service is not running or has a problem. The service then goes to CRIT or UNKNOWN.
  3. The agent provides no information at all, e.g. because NTP is not even installed. The service then remains PEND and the Check_MK service goes to WARN with the notice that the relevant section in the agent data is missing (a quick way to verify this on the command line is shown below).
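Whether the agent delivers the relevant section at all can be checked quickly using the raw agent output. Depending on which time synchronisation daemon is installed, the section is called <<<ntp>>> or <<<chrony>>> (the host name here is only an example):

OMD[mysite]:~$ cmk -d lnxsrv015 | grep -A 3 '<<<ntp>>>'
OMD[mysite]:~$ cmk -d lnxsrv015 | grep -A 3 '<<<chrony>>>'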

You will probably never need most of the rule sets in the Manual Checks module; they are only present for the sake of completeness. The most common cases for manual checks are:

  • Monitoring of Windows services (Rule set: Windows Services)
  • Monitoring of processes (Rule set: State and count of processes)

7. The discovery check

In the introduction we promised that Check_MK not only detects the list of services automatically, but can also keep it up to date. One possibility for this would of course be to manually run a bulk discovery over all hosts from time to time.

7.1. Automatic check for unmonitored services

Much better suited for this however is a regular Discovery Check, which is set up automatically on new instances (from Check_MK version 1.2.8). This service exists on every host and becomes WARN whenever it finds unmonitored services:

The details of unmonitored or vanished services can be found in the Long output of check plugin in the details of the service:

The host's service list in WATO can be easily accessed via the Discovery Check's context menu using the Edit services entry.

If your instance has been updated from an older version you must set up this check manually. Setting up and parameterising the Discovery Check is done very simply via the rule set Periodic service discovery. In the rule's value area you have the following setting options:

Alongside the interval at which the check should run, and the monitoring states to be used for unmonitored or vanished services, for SNMP devices you can also select whether an SNMP scan should take place.

7.2. Adding services automatically

The Discovery Check can also add missing services automatically. To this end activate the Automatically update service configuration option, which makes further options available.

Alongside adding services, in Mode you can also choose to remove superfluous services, or even to remove all existing services and perform a completely new discovery (Refresh). Both options should be used with care! A vanished service can indicate a problem! The Discovery Check would simply delete such a service and lull you into thinking everything is in order. The Refresh is especially risky. For example, the check for switch ports only takes ports that are 'up' into the monitoring. Ports that happen to be 'down' at that moment would be regarded as vanished and promptly removed by the Discovery Check!

A further problem needs to be considered: the automatic adding of services, and above all the automatic Activate changes, can get in your way when you - the admin - are in the middle of a configuration. It can theoretically happen that, just while you are working on rules and settings, a Discovery Check activates your changes. WATO can only ever activate all pending changes at once! To preclude such situations you can restrict this function to certain times, e.g. to night time. The above image shows an example of this.

The Group discovery and activation for up to setting ensures that not every single newly-found service immediately triggers an Activate changes - instead there is a specified waiting time so that multiple changes can be activated in a single action. Even if the Discovery Check is set to an interval of two hours or more, this interval applies per host. The checks do not run simultaneously for all hosts - which is a good thing, as a Discovery Check requires significantly more resources than a normal check.

8. Network checks and other active checks

8.1. Network checks

As shown in the article on the monitoring agents, besides the queries via SNMP and via the Check_MK agents it can also be useful to monitor network services directly. Alongside the normal PING, which Check_MK sets up anyway for host monitoring, the most common case is a check that accesses and monitors a website via HTTP.

Such services cannot be set up by Check_MK automatically. You can however install them very easily via Rules, which you can find under Host & Service parameters ➳ Active checks.

Similar to the manual checks, these are host rules that create services. And likewise the principle applies that not only the first matching rule is evaluated, but all rules that apply to the host. Only in this way can you, for example, set up multiple HTTP checks on one host.

During monitoring these checks will be executed by invoking Nagios-compatible plugins. Check_MK additionally provides all of the plugins from the Monitoring plugins project, which also includes the check_http plugin. There are also a number of further plugins that have been developed in the Check_MK project.

Check_MK assists you with convenient input masks for invoking the plugins, so that you don't need to study the numerous command line options of the individual plugins. Here, for example, is the input mask for checking DNS with check_dns:

From the selected options, at the next Activate changes the appropriate command line will be constructed for the plugin and a corresponding service set up.
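You can also try such a plugin out manually from the site user's shell in order to get a feeling for the command line that is generated. The standard plugins are located under lib/nagios/plugins within the site; here a sketch for check_dns, where -H names the record to query and -s an optional DNS server to use (host and server are only examples):

OMD[mysite]:~$ lib/nagios/plugins/check_dns -H www.example.com -s 8.8.8.8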

8.2. Using own plugins

You can also use your own Nagios-compatible plugins with Check_MK - either ones that you have written yourself, or ones found somewhere on the internet. The procedure is as follows:

Write a script which outputs a line of text and which ends with one of these exit codes: 0 = OK, 1 = WARN, 2 = CRIT or 3 = UNKNOWN:

#!/bin/bash
echo "I am a self-written check and I feel well."
exit 0

Copy the plugin into the ~/local/lib/nagios/plugins folder in your instance.

OMD[mysite]:~$ cp check_foo ~/local/lib/nagios/plugins

Make it executable:

OMD[mysite]:~$ chmod 755 ~/local/lib/nagios/plugins/check_foo

Then create a rule in the Classical active and passive monitoring checks rule set. Choose a suitable service name. In the command line you don't have to enter an absolute path, just the actual command line for your plugin. The macros $HOSTNAME$ and $HOSTADDRESS$ are always available for this. A complete overview of all available macros can be found here. The check_foo from above naturally needs no arguments at all...

If your plugin produces performance data that you want to process further, activate the Performance data check box. With the Internal command name field you can optionally specify a name for the core's command definition. From this, in the Check_MK Raw Edition the graphing tool PNP4Nagios determines which template to use for the graph. The Freshness setting is only relevant for passive services and is not needed here.
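Performance data is appended by the plugin itself after a pipe character, following the usual Nagios plugin conventions (label=value;warn;crit;min;max). A minimal sketch of check_foo extended by one metric - the metric name sessions is of course only an example:

#!/bin/bash
# Status text first, then performance data after the pipe character
echo "I am a self-written check and I feel well. | sessions=17;20;30;0"
exit 0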

Following an Activate changes a new service appears on every host to which the rule applies. That this is an active service can be recognised by the green symbol in its context menu, via which you can trigger an immediate execution of the check. The result:

9. Passive services

Passive services are those that are not actively initiated by Check_MK, rather by check results regularly channelled from external sources. This generally occurs via the core's command pipe. Here is a step-by-step procedure for creating a passive service:

First, you need to make the core aware of the service. This is done with the same rule set as for your own active checks, except that you omit the Command line:

The image also shows how you can verify if check results are being regularly received. If these fail to appear for longer than ten minutes then the service will be automatically flagged as UNKNOWN.

After an Activate Changes the new service will start its life in the PEND state:

Sending a check result is now done on the command line by echoing the PROCESS_SERVICE_CHECK_RESULT command into the ~/tmp/run/nagios.cmd command pipe.

The syntax follows the usual Nagios conventions - including a current time stamp in square brackets. As arguments the command requires the host name (e.g. myhost123) and the chosen service name (e.g. BAR). The two subsequent arguments are the status (0 ... 3) and the plugin's output. The time stamp is created with $(date +%s):

OMD[mysite]:~$ echo "[$(date +%s)] PROCESS_SERVICE_CHECK_RESULT;myhost123;BAR;2;Something bad has happened" > ~/tmp/run/nagios.cmd

The service now immediately shows its new status:

If you are familiar with the Nagios NSCA tool, you can continue using it with Check_MK as well. Activate the NSCA receiver with omd config, and as needed modify the NSCA configuration, which is found under etc/nsca/nsca.cfg:

OMD[mysite]:~$ omd stop
OMD[mysite]:~$ omd config set NSCA on
OMD[mysite]:~$ omd config set NSCA_TCP_PORT 5667
OMD[mysite]:~$ vim etc/nsca/nsca.cfg
OMD[mysite]:~$ omd start

The system is now ready to receive passive check results via NSCA.
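From the remote host a check result is then transmitted with the send_nsca client. Its standard input expects one line per result with the fields host name, service name, status and output separated by tabs - here a sketch with example names for the remote side and the monitoring server:

user@remotehost:~$ printf "myhost123\tBAR\t2\tSomething bad has happened\n" | send_nsca -H mymonitoring.example.com -p 5667 -c /etc/send_nsca.cfg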

10. Service discovery on the command line

A GUI is fine, but the good old command line is sometimes still practical - whether for automation or simply because it lets an experienced user work quickly. A service discovery can be triggered on the command line with the cmk -I command. There are several variants of this. For all of them the -v option is recommended, so that you can see what is happening. Without -v Check_MK behaves like a good old traditional Unix tool - as long as everything is OK it says nothing.

With a simple -I you search all hosts for new services:

OMD[mysite]:~$ cmk -vI
switch-cisco-c4000:
nothing new

switch-cisco-c4500:
nothing new

switch-cisco-c4500-2:
nothing new

switch-cisco-c4500-3:
 nothing new

With -I you can also specify one or more host names in order to discover only these. This has a second effect: whereas an -I over all hosts by default works only with cached data, Check_MK always fetches fresh data from explicitly-named hosts!

OMD[mysite]:~$ cmk -vI myhost123

With the --cache and --no-cache options respectively, you can explicitly control the use of cached data.

Additional outputs can be received with a second -v. With SNMP-based devices you can even see every single OID retrieved from the device:

OMD[mysite]:~$ cmk -vvI myswitch123
Discovering services on myswitch123:
myswitch123:
 SNMP scan:
       Getting OID .1.3.6.1.2.1.1.1.0: Executing SNMP GET of .1.3.6.1.2.1.1.1.0 on switch
=> ['24G Managed Switch'] OCTETSTR
24G Managed Switch
       Getting OID .1.3.6.1.2.1.1.2.0: Executing SNMP GET of .1.3.6.1.2.1.1.2.0 on switch
=> ['.1.3.6.1.4.1.11863.1.1.3'] OBJECTID
.1.3.6.1.4.1.11863.1.1.3
       Getting OID .1.3.6.1.4.1.231.2.10.2.1.1.0: Executing SNMP GET of .1.3.6.1.4.1.231.2.10.2.1.1.0 on switch
=> [None] NOSUCHOBJECT
failed.
       Getting OID .1.3.6.1.4.1.232.2.2.4.2.0: Executing SNMP GET of .1.3.6.1.4.1.232.2.2.4.2.0 on switch
=> [None] NOSUCHOBJECT
failed.

A complete renewal of the services (tabula rasa) can be performed by giving the option twice: -II:

OMD[mysite]:~$ cmk -vII myhost123
Discovering services on myhost123:
myhost123:
    1 cpu.loads
    1 cpu.threads
    6 cups_queues
    3 df
    1 diskstat
    3 kernel
    1 kernel.util
    3 livestatus_status
    1 lnx_if
    1 lnx_thermal

You can also restrict all of this to a single check plugin. For this the option is --checks=, and it must be placed before the host name:

OMD[mysite]:~$ cmk -vII --checks=df myhost123
Discovering services on myhost123:
myhost123:
    3 df

When you are finished you can activate the changes with cmk -O (cmk -R with Nagios Core):

OMD[mysite]:~$ cmk -O
Generating configuration for core (type cmc)...OK
Packing config...OK
Reloading monitoring core...OK

And when you encounter an error with a discovery...

OMD[mysite]:~$ cmk -vII --checks=df myhost123
  WARNING: Exception in discovery function of check type 'df': global name 'bar' is not defined
  nothing

... with an additional --debug you can produce a detailed Python stack trace of the fault location:

OMD[mysite]:~$ cmk --debug -vII --checks=df myhost123
Discovering services on today:
today:
Traceback (most recent call last):
  File "/omd/sites/heute/share/check_mk/modules/check_mk.py", line 5252, in <module>
    do_discovery(hostnames, check_types, seen_I == 1)
  File "/omd/sites/heute/share/check_mk/modules/discovery.py", line 76, in do_discovery
    do_discovery_for(hostname, check_types, only_new, use_caches, on_error)
  File "/omd/sites/heute/share/check_mk/modules/discovery.py", line 96, in do_discovery_for
    new_items = discover_services(hostname, check_types, use_caches, do_snmp_scan, on_error)
  File "/omd/sites/heute/share/check_mk/modules/discovery.py", line 677, in discover_services
    for item, paramstring in discover_check_type(hostname, ipaddress, check_type, use_caches, on_error):
  File "/omd/sites/heute/share/check_mk/modules/discovery.py", line 833, in discover_check_type
    discovered_items = discovery_function(info)
  File "/omd/sites/heute/share/check_mk/checks/df", line 91, in inventory_df
    foo = bar
NameError: global name 'bar' is not defined

10.1. Overview of options

To recap - all options at a glance:

cmk -I          Discover new services
cmk -II         Delete and rediscover all services (tabula rasa)
-v              verbose: display hosts and detected services
-vv             very verbose: display a precise protocol of all operations
--checks=foo    Perform the discovery (and the tabula rasa) only for the specified check plugin
--cache         Force the use of cached data (normally the default only when no host is specified)
--no-cache      Fetch fresh data (normally the default only when a host name is specified)
--debug         Abort in an error situation and display the complete Python stack trace
cmk -O          Activate changes (Check_MK Enterprise Edition with CMC as core)
cmk -R          Activate changes (Check_MK Raw Edition with Nagios as core)

10.2. Storage in files

The result of a service discovery - thus, as explained earlier, the tables of host names, check plugins, items and discovered parameters - is stored in the var/check_mk/autochecks folder. Here, for every host there is a file that holds the automatically-discovered services. As long as you don't damage the file's Python syntax you can alter or delete individual lines manually. Deleting the file removes all of the host's services, so that they count as 'unmonitored' again.

var/check_mk/autochecks/myhost123.mk
[
  ('cpu.loads', None, cpuload_default_levels),
  ('cpu.threads', None, threads_default_levels),
  ('diskstat', u'SUMMARY', diskstat_default_levels),
  ('kernel', u'Context Switches', kernel_default_levels),
  ('kernel', u'Major Page Faults', kernel_default_levels),
  ('kernel', u'Process Creations', kernel_default_levels),
  ('kernel.util', None, {}),
  ('livestatus_status', u'stable', {}),
  ('lnx_if', u'2', {'state': ['1'], 'speed': 0}),
  ('lnx_thermal', u'Zone 0', {}),
  ('mem.linux', None, {}),
  ('mknotifyd', u'today', {}),
  ('mknotifyd', u'stable', {}),
  ('mounts', u'/', [u'data=ordered', u'errors=remount-ro', u'relatime', u'rw']),
  ('ntp.time', None, ntp_default_levels),
  ('omd_apache', u'stable', None),
  ('tcp_conn_stats', None, tcp_conn_stats_default_levels),
  ('uptime', None, {}),
]

11. Service groups

11.1. Why have service groups?

So far you have learned how to include services in the monitoring. Now, it makes little sense to have to look through lists of thousands of services, or always have to go via host views. If you want to view, for example, all filesystem or update services together, you can simply assemble groups - in a similar way as with host groups.

Service groups make it easy for you to bring much more order into the monitoring via views and NagVis maps, and to set up targeted notifications and alert handlers. By the way - you could almost always construct corresponding views purely with the view filters - but service groups are more clearly arranged and easier to work with.

11.2. Creating service groups

Service groups can be found at WATO ➳ Host & Service Groups. By default the host groups appear here, so first click on . There you will find a similar menu with which the service groups can then be defined:

Creating a service group is simple: Create a group via and assign a name that cannot be subsequently changed, and likewise a meaningful alias:

11.3. Adding services to a service group

To assign services to service groups you need the rule set found under WATO ➳ Host & Service Parameters ➳ Grouping. Now use to create a new rule in the desired folder. First you specify which service group the services are to be assigned to, for example myservicegroup, or rather its alias My Service Group 1.

The exciting part now follows in the Conditions section. On the one hand, you can use folders, host characteristics and explicit host names to restrict the rule independently of the services. On the other hand, you name the services you would like to group, such as Filesystem and CIFS mount to create a group of filesystems. The services are specified here in the form of regular expressions, which allows you to define the groups precisely.

11.4. Checking the service groups for a service

You can check the assignment of services on the detail page of a particular service. There, by default, you will find the line Service groups the service is member of.

11.5. Using service groups

As already mentioned, the service groups are used in several places: views, NagVis maps, notifications and alert handlers. For new views it is important that you use the Servicegroups as the data source. Of course, the Views widget also contains predefined views for service groups, for example a clear summary:

With a click on the service group names you will receive a complete view of all of the services of the respective group.

If you use service groups in NagVis maps, hovering over a single icon opens a menu with a summary of the service group:

When you use service groups in notifications and alert handlers, they are available as conditions/filters, of which you can use one or more:

12. More on Check plugins

12.1. A short description of their functionality

Check plugins are required to generate services in Check_MK. Each service is based on a check plugin, which determines its status, generates its metrics, and so on. A single plugin can create one or more services per host. So that multiple services from the same plugin can be distinguished, an item is needed. For example, for the service Filesystem /var the item is the text /var. For plugins that can only generate at most one service per host (CPU utilization, for example), the item is empty and not shown.

12.2. Available check plugins

A list of all available check plugins can be found in WATO under Check Plugins. Here you can search for individual plugins or filter them by various categories:

For each plugin three columns of information will be shown – the name of the check plugin, its compatible data sources, and a description of the service:
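Much of this information is also available on the command line: cmk -L lists all available check plugins with a short title, and cmk -M shows the detailed manual page of an individual plugin:

OMD[mysite]:~$ cmk -L | less
OMD[mysite]:~$ cmk -M df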