In Checkmk you configure parameters for hosts and services by using rules. This feature makes Checkmk very effective in complex environments, and also brings a number of advantages to smaller installations. In order to clarify the principle of rule-based configuration we will compare it to the classic method:
1.1. The classic approach
As an example, let's take the configuration of the thresholds for WARN and CRIT in the monitoring of file systems. With a database-oriented configuration, one would enter a line for every file system into a table:
This is relatively clear - but only because the table in this example is short. In practice there tend to be hundreds or thousands of file systems. Tools like copy & paste, and bulk operations can simplify the work but the basic problem remains - how can you identify and implement a policy? What is the general rule? How should thresholds for future hosts be preset?
1.2. Rules-based is better!
A rules-based configuration however consists of the policy! We will replace the logic of the above table with a set of four rules. If we assume that mysrv123 is a test system, and that in each case the first relevant rule applies to every file system, the result will be the same thresholds as in the table above:
- File systems with the mount point /var/trans have a 100/100% threshold.
- The /sapdata file system on mysrv124 has a 85/95% threshold.
- File systems on test systems have a 90/95% threshold.
- All (unspecified) file systems have a 85/90% threshold.
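The first-match logic behind these four rules can be sketched in a few lines of Python. This is purely an illustration and not Checkmk's actual rule format: the `Rule` class, the `HOST_TAGS` inventory, the `thresholds` function and the `test` tag ID are assumptions. The host names, mount points and threshold values are taken from the list above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Rule:
    levels: Tuple[int, int]             # (WARN, CRIT) in percent
    mount_point: Optional[str] = None   # None means "any file system"
    host: Optional[str] = None          # None means "any host"
    tag: Optional[str] = None           # None means "no tag required"


# Hypothetical host inventory: mysrv123 is the test system.
HOST_TAGS = {"mysrv123": {"test"}, "mysrv124": set()}

# The four rules from the list above, in order of priority.
RULES = [
    Rule((100, 100), mount_point="/var/trans"),
    Rule((85, 95), mount_point="/sapdata", host="mysrv124"),
    Rule((90, 95), tag="test"),
    Rule((85, 90)),  # catch-all for every other file system
]


def thresholds(host, mount_point):
    """Return the levels of the first rule whose conditions all match."""
    for rule in RULES:
        if rule.mount_point is not None and rule.mount_point != mount_point:
            continue
        if rule.host is not None and rule.host != host:
            continue
        if rule.tag is not None and rule.tag not in HOST_TAGS[host]:
            continue
        return rule.levels
    raise LookupError("no rule matched")
```

A new host only needs an entry in the inventory; `thresholds("mysrv124", "/")`, for instance, falls through to the catch-all rule without any further configuration.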
Granted, for only two hosts that doesn't achieve much, but if there are only a few more it can quickly make quite a big difference. The advantages of the rules-based configuration are obvious:
- The policy is clearly recognisable and can be reliably implemented.
- You can change the policy at any time without needing to handle thousands of data sets.
- Exceptions are always still possible, but are documented in the form of rules.
- The incorporation of new hosts is simple and less fault-prone.
In summary, then: less work - more quality! For this reason, with Checkmk you will find an abundance of rules to customize hosts and services, such as thresholds, monitoring settings, responsibilities, alerting, agent configuration and many more.
1.3. Types of rule sets
WATO organises rules in rule sets. Every rule set has the task of defining a specific parameter for hosts or services. From Version 1.2.8, Checkmk contains more than 700 rule sets! Here are some examples:
- Host check command - defines how to determine whether hosts are UP.
- Alternative display name for services - defines alternative names for services' displays.
- JVM memory levels - sets thresholds and other parameters for the monitoring of Java VMs' memory usage.
Every rule set is responsible either for hosts or for services - never for both. If a parameter can be defined for hosts as well as for services, there is a pair of corresponding rule sets - e.g., Normal check interval for host checks and Normal check interval for service checks.
A few rule sets, strictly speaking, don't define parameters but instead create services. One example is the rules in the Active checks category. With these you can, e.g., set up an HTTP check for specific hosts. These rules are classified as host rules, because the existence of such a check on a host is regarded as a characteristic of that host.
Further, there are rule sets that control the Service discovery. With these you can, for example, define via Windows service discovery for which Windows-services automatic checks should be created if they are found on a system. These are also host rules.
The bulk of the rule sets determine parameters for specific check plug-ins. An example is Network interfaces and switch ports. The settings in these rules are tailored very specifically to their respective plug-in. Such rule sets are only relevant to services that are based on this plug-in. If you are uncertain which rule set is responsible for which services, the best approach is to navigate directly from the service to the relevant rule. How to do this will be explained later.
1.4. Host tags
One thing we have so far not mentioned: In the above example there is a rule for all test systems. Where is it actually defined which host is a test system?
In Checkmk, something like test system is known as a host tag. You can freely define which tags are available, and some are already predefined. Tags are applied to hosts either in the host's detail mask or through inheritance in the folder hierarchy. How to do this is explained in the article on the hosts. How you can create your own tags, and what the predefined tags are about, will be explained later in this article.
2. Determining the correct rule sets
2.1. Host rule sets
If you wish to create a new rule that defines a parameter for one or more hosts there are several ways to this end. The direct way is via the Host & service parameters WATO module:
The quickest way from here is the search field. For this you do of course need to know the rule set's name. Here as an example is a search for host check. The numbers show the number of rules already present in the relevant rule sets:
Another way is via the button in the details for an existing host in WATO, or via the symbol in a folder's list of hosts. Here you will not only find all rule sets applicable to the host, but also the relevant parameters currently in effect for this host. In this example for Host check command, no rules apply to the host shown and thus it has the default setting PING (active check with ICMP echo request):
Click on Host check command in order to see the complete rule set.
If a rule already exists, instead of the Default value the number of the rule defining this parameter appears. Clicking on this takes you directly to the rule.
2.2. Service rule sets
The way to the rule sets for services is similar. The general access is also over the Host & service parameters WATO module and again via the search field.
If you are not yet very experienced with the rule sets' names, then the procedure via the service is simpler. Similarly to the hosts, here there is also a page in which all of a service's parameters are shown and where you have the possibility of directly accessing the applicable rule sets. You can access this parameter page with the symbol in a host's list of services in WATO. The symbol takes you directly to the rule set that defines the parameter for the check plug-in for this service.
The symbol is also found in the status window in every service's context menu:
2.3. Manual checks
Some rule sets are not included in the Host & service parameters module, but are located in the Manual checks module instead. These apply not to services created by a service discovery, but to those that have been created manually. Relevant details may be found in the article on services.
2.4. Rule sets in use
In the main dashboard under Host & Service Parameters you will find the button. This displays all rule sets in which you have defined at least one rule. This is often an easy starting point when you wish to make changes to your already existing rules.
2.5. Ineffective rules
Monitoring is a complex matter. It can sometimes occur that there are rules that don't apply to any hosts or services - either because you have made an error, or because the associated hosts and services have disappeared. Such ineffective rules can be displayed with the button.
2.6. Obsolete rule sets
Checkmk is constantly being developed. From time to time elements will be harmonised and it can occur that some rule sets will be replaced by others. An example is the harmonisation of all the check plug-ins that monitor temperature. From Checkmk's Version 1.2.8, without exception these will be configured with a single rule set. A number of the previous rule sets have been rendered ineffective through this action. Such rule sets can be found under . There you can also see if any of your defined rules are present, so that you can clone them in appropriate new rule sets as needed.
3. Creating and editing rules
The following image shows the Filesystems (used space and growth) rule set in which exactly the same four rule examples as shown in the introduction have been configured:
New rules are created either with the Create rule in folder button or by cloning an existing rule with . Cloning creates an identical copy of the rule that you can then edit with . A new rule created using the Create rule in folder button will always appear at the end of the list of rules, whereas a cloned rule will be displayed as a copy below the source rule.
The rule's sequence can be changed with the , , , and buttons. The sequence is important because rules positioned higher in the list always have priority over those located lower.
Rules are stored in the same folders that you use to manage the hosts. A rule's scope is restricted to the hosts in its folder or in subfolders. In the case of conflicting rules, the rule lower in the folder structure has priority. In this way, for example, users whose rights are limited to certain folders can create rules for their hosts without affecting the rest of the system. In a rule's characteristics you can change its folder and thus 'relocate' it.
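How folder scope and priority interact can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not Checkmk's implementation: folders are represented as slash-separated paths (with an empty string for the main directory), and the folder and rule names are hypothetical.

```python
def depth(folder):
    """Nesting depth of a folder path; the main directory ("") has depth 0."""
    return len(folder.split("/")) if folder else 0


def applies_to(rule_folder, host_folder):
    """A rule covers hosts in its own folder and in all subfolders."""
    return (
        rule_folder == ""
        or host_folder == rule_folder
        or host_folder.startswith(rule_folder + "/")
    )


def effective_order(rules, host_folder):
    """Return the names of applicable rules, highest priority first.

    `rules` is a list of (folder, name) tuples, listed in their on-screen
    order within each folder.
    """
    applicable = [rule for rule in rules if applies_to(rule[0], host_folder)]
    # Deeper folders win; sorted() is stable, so the order inside a folder is kept.
    ranked = sorted(applicable, key=lambda rule: -depth(rule[0]))
    return [name for _, name in ranked]


rules = [
    ("", "global default"),
    ("linux", "linux A"),
    ("linux", "linux B"),
    ("linux/db", "db exception"),
    ("windows", "windows default"),
]
```

For a host in the hypothetical folder `linux/db`, `effective_order(rules, "linux/db")` puts the rule from `linux/db` first, then the two `linux` rules in their listed order, and the main directory rule last. The `windows` rule is out of scope for that host.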
3.1. The analysis mode with 'traffic lights'
When you access a rule set via a host or service - for example, by using the or symbols in the host or service - WATO shows you the rule set in the analysis mode:
This mode has two features. Firstly, a second button for setting rules appears - Create mount point specific rule for as an example here. With this you can create a new rule which has the appropriate current host or service already preselected. You can create an exceptional rule very easily and directly in this way. Secondly, a 'traffic light' symbol appears in every line, the colour of which shows whether and/or how this rule affects the current host, or respectively, service. The following conditions are possible:
|This rule has no effect on the current host or service.|
|This rule takes effect and defines parameters.|
|The rule is applicable. But because another rule higher in the hierarchy has priority this rule is ineffective.|
|This rule is applicable. Another rule higher in the hierarchy in fact has priority but doesn't define all parameters, so that at least one parameter is defined by this lower rule.|
The last condition, a partial match, can only occur for rule sets in which a rule can define multiple parameters by selecting individual check boxes. In theory, each individual parameter can be set by a different rule. More on this later.
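The four conditions can be modelled in Python as follows. This is a sketch under simplifying assumptions, not the logic WATO actually uses: each rule is reduced to a flag saying whether its conditions match the current host or service, plus a dictionary of the parameters it defines, and the status names are hypothetical labels for the four symbols.

```python
def classify(rules):
    """Return one status string per rule.

    `rules` is a list of (matches, params) tuples in priority order,
    highest priority first.
    """
    defined = set()   # parameters already set by a rule with higher priority
    statuses = []
    for matches, params in rules:
        if not matches:
            statuses.append("no match")
            continue
        new = set(params) - defined
        if not new:
            statuses.append("overridden")
        elif len(new) == len(params):
            statuses.append("match")
        else:
            statuses.append("partial match")
        defined |= new
    return statuses


example = [
    (True, {"warn": 80}),
    (False, {"warn": 70, "crit": 90}),
    (True, {"warn": 85, "crit": 95}),
    (True, {"crit": 99}),
]
```

In this example the third rule is a partial match: its `warn` value is overridden by the first rule, but it still supplies `crit`. The fourth rule then has nothing left to define.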
4. Rule characteristics
4.1. General options
Every rule is assembled from three blocks. Everything in the first block Rule options is optional, and serves primarily for documentation:
- The Description will be shown in the table of all rules in a rule set.
- The Comment field can be used for a longer description. It only appears in a rule's edit mode. Via the symbol you can insert a date stamp and your login name in the text (here, for example, 2016-05-06 mk:).
- The Documentation-URL is intended for a link to internal documentation that you maintain in another system (e.g., a CMDB). It will appear as the clickable symbol in the rules table.
- With the Do not apply this rule check box you can temporarily disable this rule. It will then be flagged as in the table and is thus ineffective.
4.2. The predefined parameters
The second block is different for every rule. The following image shows a widely-used type of rule (DB2 Tablespaces). Using check boxes you can determine which individual parameters the rule should define. As described earlier, Checkmk ascertains, separately for each individual parameter, which rules will set the parameters. The rule in the image simply deactivates the verification of auto extend, and leaves all other settings unaffected.
Some rule sets define no parameters, rather they only decide which hosts are in and which are not. An example is the Hosts to be monitored rule set with which you can remove some hosts completely from the monitoring. The parameter area then looks like this:
In the third block, Conditions, you can define for which hosts or services the rule should apply. Here there are four different conditions, all of which must be met in order for the rule to be applied. The conditions are thus logically AND-linked:
With the Folder condition you define that the rule only applies to hosts in this folder (or subfolder). If the setting is Main Directory, this condition is applicable to all hosts. As described above, the folders have an effect on the rule's sequence. Rules in lower folders always have priority over higher ones.
Host tags restrict rules to hosts according to whether they have - or do not have - specific host tags. Here AND-links are also always used. Every other host tag condition in a rule reduces the number of hosts affected by the rule.
If you wish to make a rule applicable for two possible values for a tag, (for example, Criticality as well as Productive system and Business critical), you can't do this with a single rule. You will require a copy of the rule for every variable. Sometimes a negation can also help here. You can also define that a tag is not present as a condition (e.g., not test system). The so-called auxiliary tags are another possibility.
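The following Python sketch illustrates how AND-linked tag conditions behave, and why two tag values from the same group need either two copies of a rule or a negation. The function names and the tag IDs `prod`, `critical`, `test`, `lan` and `wan` are assumptions made for this illustration.

```python
def tags_match(host_tags, required=(), forbidden=()):
    """All required tags must be present and no forbidden tag may be present."""
    has_all = all(tag in host_tags for tag in required)
    has_none = not any(tag in host_tags for tag in forbidden)
    return has_all and has_none


# Hypothetical hosts, each with one Criticality tag and one networking tag.
prod_host = {"prod", "lan"}
critical_host = {"critical", "lan"}
test_host = {"test", "lan"}


def rule_for_prod_or_critical(host_tags):
    """Two copies of the same rule, one per tag value."""
    return (
        tags_match(host_tags, required=["prod"])
        or tags_match(host_tags, required=["critical"])
    )


def rule_for_not_test(host_tags):
    """A single rule using a negated condition instead."""
    return tags_match(host_tags, forbidden=["test"])
```

Note that the negated variant also matches hosts carrying any other tag of the group, so it is only equivalent to the two copies if the group contains no further values.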
The explicit host names condition is intended for exception rules. Here you can list one or more host names. The rule will apply only to these hosts. Please note that if you check the Specify explicit host names box but enter no hosts, then the rule will be completely ineffective.
Via the Negate option you can define a reversed exception. With this you can exclude explicitly-named hosts from the rule.
Important: all host names entered here will be checked for an exact match. Checkmk is fundamentally case-sensitive!
You can change this behaviour to regular expressions by prefixing host names with a tilde (~). In this case, as always in WATO:
- The match is applied to the beginning of the host name
- The match is not case-sensitive
The sequence point-asterisk (.*) in a regular expression matches an arbitrary sequence of characters. The following example shows a condition that matches all hosts whose names contain the character sequence test (or Test, TEST, tEsT etc.):
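In Python terms, the matching behaviour described above might be sketched as follows. The function names are hypothetical and Checkmk's real implementation may differ in detail; the sketch only reproduces the rules stated in the text: exact, case-sensitive comparison by default, and a prefix match that ignores case once the entry starts with a tilde.

```python
import re


def host_matches(condition, host_name):
    """Check a single entry of the explicit host list against a host name."""
    if condition.startswith("~"):
        # Regular expression: anchored at the start of the name, case-insensitive.
        return re.match(condition[1:], host_name, re.IGNORECASE) is not None
    # Plain entry: exact, case-sensitive comparison.
    return condition == host_name


def rule_applies(conditions, host_name):
    """The rule applies if any listed entry matches; an empty list never matches."""
    return any(host_matches(condition, host_name) for condition in conditions)
```

With these definitions, `~.*test` matches a host named `dbTest01`, whereas `~test` does not, because the match starts at the beginning of the name.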
For rules that are applicable to services there is a fourth and last type of condition that defines a match on a service's name, or respectively - for rules that set check parameters - the check item's name. With what exactly the match will be made can be seen in the caption. In our example it is the name of a tablespace:
Here the match always uses regular expressions. Because the match is always applied to the start of the name, the leading .* in .*temp is needed in order to match all tablespaces containing temp. The dollar sign at the end of transfer$ represents the end of the name and thereby forces an exact match. A tablespace with the name transfer2 will thus not match.
Please don't forget: for rules concerning explicit services a match with the service name is required (e.g. Tablespace transfer). For check parameter rules a match with the item applies (e.g. transfer). The item is, so to speak, the variable part of the service name, and determines to which tablespace it applies.
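The difference between matching the item and matching the service name can be tried out with Python's `re.match`, which, like the match described here, is anchored at the start of the string. The function name `pattern_matches` and the tablespace name `mytemp01` are assumptions for this sketch, and case sensitivity is deliberately left aside.

```python
import re


def pattern_matches(pattern, name):
    """Prefix match: the pattern must match from the first character of the name."""
    return re.match(pattern, name) is not None


# Check parameter rules: the pattern is compared with the item.
item_results = {
    (".*temp", "mytemp01"): pattern_matches(".*temp", "mytemp01"),
    ("temp", "mytemp01"): pattern_matches("temp", "mytemp01"),
    ("transfer$", "transfer"): pattern_matches("transfer$", "transfer"),
    ("transfer$", "transfer2"): pattern_matches("transfer$", "transfer2"),
}

# Rules for explicit services: the pattern is compared with the whole service name.
service_result = pattern_matches("Tablespace transfer$", "Tablespace transfer")
```

The pattern `temp` without the leading `.*` does not match `mytemp01`, because the name does not begin with `temp`.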
Incidentally, there are also services without an item. An example is CPU load. This exists only once for each host - so no item is required. It follows that rules for such check types do not have this condition.
5. Types of rule analysis
In the introduction to the principle of rules we stated that the first applicable rule determines the result of an analysis. That is not the whole truth - there are altogether three different types of analysis:
|The first rule||The first rule that applies defines the value. Subsequent rules will not be analysed. This is the normal situation with rules that set simple parameters.|
|The first rule per parameter||Every individual parameter will be set by the first rule that defines this parameter (check box selected). This is the normal situation for all rules with subparameters that are activated with check boxes.|
|All rules||All applicable rules add elements to the results. This type is used for the allocation of hosts and services to host, service and contact groups for example.|
From version 1.2.8p1 of Checkmk this information is shown at the top of every rule set.
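The three types of analysis can be compared side by side in a small Python sketch. The function names are hypothetical, and each function receives only the values of the rules that already matched, in priority order (highest first). This is an illustration of the principle, not Checkmk's code.

```python
def first_rule(values):
    """'The first rule': the first applicable rule defines the value."""
    return values[0] if values else None


def first_rule_per_parameter(values):
    """'The first rule per parameter': each key is taken from the first rule that sets it."""
    merged = {}
    for params in values:
        for key, value in params.items():
            merged.setdefault(key, value)  # later rules never overwrite earlier ones
    return merged


def all_rules(values):
    """'All rules': every applicable rule contributes its elements."""
    result = []
    for elements in values:
        result.extend(elements)
    return result
```

For example, `first_rule_per_parameter([{"warn": 80}, {"warn": 85, "crit": 95}])` takes `warn` from the first rule and `crit` from the second, whereas `all_rules([["linux"], ["db", "prod"]])` simply collects all three group names.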
6. Host tags in detail
As we have seen, host tags are an important basis for defining rules. They are however also useful in other places. In views, for example, there is a filter for host tags. The side bar element Virtual host tree can arrange your folders by host tags into a tree. And on the command line, with many commands you can select all hosts having the foo tag by using the @foo syntax.
So that everything makes sense you should set up your own host tags scheme that optimally suits your environment. But before we show you how you can define your own host tags with WATO, we should now explain a few terms.
6.1. Tag groups, check box tags, topics and auxiliary tags
Host tags are organised in groups. A host can have a maximum of one tag from each group! A good example of a custom group would be Datacenter, with the possible tags DC 1 and DC 2. With these, every host will be assigned to one or the other of the data centres. Should you wish to install a host that is located in neither of the data centres you will need a third option for selection - for example, Not in a datacenter.
Some users have attempted to display the application running on a host in a tag group. One such group was called, let's say, Application, and had the characteristics ORACLE, SAP, MS Exchange, etc. ... This WILL work until the day a host has two applications - and that day will certainly come!
The correct solution for this situation would be: create a tag group for each application, each with only two options - yes or no. Checkmk simplifies this by allowing you to create tag groups with only a single tag. These will not be shown as a selection field in the host mask, rather they will be displayed as a check box. Selecting the check box sets the tag, otherwise the tag will not apply. Such tag groups are also called check box tags.
So that this doesn't get confusing if you have many tag groups (e.g., because you have numerous different applications), you can collate the tag groups into topics. All tag groups with the same topic are then ...
- ... consolidated into their own box in the host details.
- ... displayed in the rule conditions as a list which can be expanded and collapsed using a small triangle icon.
The topics have 'only' a visualisation function, and have no influence on the actual configuration.
Auxiliary tags solve the following problem: Imagine that you have defined an operating system tag group with the characteristics Linux, AIX, Windows 2008 and Windows 2012. Now you want to define a rule which will be valid for all Windows hosts. This cannot work, because in a situation as described above you can only ever choose one tag per group.
In order to get around this problem you can define a Windows auxiliary tag. Assign this auxiliary tag to both the Windows 2008 and the Windows 2012 tags. A host that possesses either of these tags will always automatically receive the Windows auxiliary tag from WATO. In the rules, Windows will appear as its own tag for formulating conditions.
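How an auxiliary tag is derived from the regular tags can be shown in a few lines of Python. The mapping, the function name and the tag IDs (`win2008`, `win2012`, `linux`, `windows`) are assumptions for this sketch; in WATO you define the assignment in the GUI.

```python
# Hypothetical assignment: which auxiliary tags each regular tag brings along.
AUX_TAGS = {
    "win2008": {"windows"},
    "win2012": {"windows"},
}


def effective_tags(assigned):
    """Return the assigned tags plus all auxiliary tags they imply."""
    tags = set(assigned)
    for tag in assigned:
        tags |= AUX_TAGS.get(tag, set())
    return tags


def is_windows(assigned):
    """A rule condition on the auxiliary tag covers both Windows versions."""
    return "windows" in effective_tags(assigned)
```

A single condition on `windows` now covers hosts tagged with either `win2008` or `win2012`, while a host tagged `linux` is left out.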
6.2. Predefined tags
During installation, Checkmk furnishes you with numerous tag groups:
|Agent type||Defines what type of data the host receives from its agents.|
|Criticality||The system's service level. For the Do not monitor this host tag a predefined rule is provided which will disable the host monitoring. The other tags are merely examples without function. You can however assign these to hosts and then use them in rules.|
|Networking segment||Treat this tag group only as an example. For the WAN (high latency) tag an example rule is provided which adapts the thresholds for PING response times to the higher latency in a WAN.|
|IP Address Family||Defines whether the host should be monitored per IPv4 or IPv6, or both. The group has the status builtin and can not be modified. This is necessary as the tags are required internally by Checkmk during the creation of the configuration.|
Modifying predefined tag groups
In theory, you can customise the predefined tag groups as long as they are not marked as builtin. Modifications to Criticality or Networking segment are non-critical as these are only provided as examples. The Agent type group should under no circumstances be altered or extended - even though it is not marked as builtin! The tags for this group are referenced internally by Checkmk.
6.3. Editing tag groups using WATO
Creating your own tags is achieved using the Host tags WATO module. Depending on the Checkmk version, in a freshly-installed system it will look something like this:
Creating a new tag group is performed with the button, which opens the following entry mask:
The Internal ID is used internally to identify the tag group. It must be unique and may not be subsequently changed. The standard syntax for permitted characters applies (only letters, digits, underscore).
The Title will be used everywhere in the GUI in connection with the tag group. Because this is purely a display text it can be changed at any time without affecting the existing configuration.
You can leave Topic blank. Your tag group will then be displayed with the predefined groups. You can also create your own topics and use these to arrange your tags clearly in a summary.
The Choices are naturally of most importance. It is essential that each Tag ID is unique - not only within the group but also across all groups! In case of doubt you can simply work with prefixes - e.g., loc_dc1 - instead of only dc1.
The sequence - which you can as usual change with the , , and buttons - does not only have a visual function: the first tag in the list is deemed to be the default value! That means that all hosts without an explicit setting for this tag group will be automatically set to this value.
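Both requirements, tag IDs that are unique across all groups and the first tag acting as the default, can be expressed in a short Python sketch. The group IDs, most tag IDs and the function names are hypothetical; only `loc_dc1` and `dc1` are taken from the text.

```python
# Hypothetical tag groups: group ID -> ordered list of tag IDs.
TAG_GROUPS = {
    "location": ["loc_dc1", "loc_dc2", "loc_none"],
    "os": ["os_linux", "os_windows"],
}


def duplicate_tag_ids(groups):
    """Return all tag IDs that occur more than once across all groups."""
    seen, duplicates = set(), set()
    for tags in groups.values():
        for tag in tags:
            if tag in seen:
                duplicates.add(tag)
            seen.add(tag)
    return duplicates


def tag_for(host_settings, group):
    """Explicit setting if present, otherwise the first tag of the group."""
    return host_settings.get(group, TAG_GROUPS[group][0])
```

A host without an explicit location setting therefore receives `loc_dc1`, simply because that tag is listed first.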
6.4. Editing Auxiliary tags
You can create new Auxiliary tags with . As usual you assign a fixed ID and an informative title in the following dialog. You can add a Topic in the same way as in the tag groups.
6.5. Deleting and modifying existing tags and tag groups
Modifying an existing tag group configuration appears to be a simple operation at first - but that is unfortunately not always the case, as it can have big impacts on an existing configuration. Changes that solely affect the display or only add new selections are no problem and have no effect on the existing hosts and rules:
- A change to the title or topic of tags and tag groups
- Adding an additional tag to a tag group
All other changes can impact existing hosts or rules that use the affected tags. WATO does not simply forbid such changes; instead, it attempts to adapt your existing configuration so that everything functions correctly again. What exactly that means depends on the type of operation.
Deleting tag groups
Information from the affected tags will be erased from all hosts. If the tag group is used as a condition in existing rules you will receive the following warning:
Here you need to decide whether you wish to remove the conditions from the existing rules or whether you wish to delete the rules completely. Both actions can make sense, but WATO cannot decide which action is better for you. If uncertain, you should go through the rule set (linked via the warning) and manually delete or modify all of the conditions for the affected group as needed.
Deleting single tags
Deleting tags is achieved by editing the group, removing the tag and then saving the data. This action can trigger a similar warning to that when deleting a tag group.
Hosts that had set the affected tag will be automatically reset to the default value. This will always be (as described above) the top tag in the list.
Rules that have a negative condition for the Tag simply lose this condition, without comment. If you have, for example, a rule for all hosts that don't have the loc_dc2 tag, and you delete the loc_dc2 tag completely from the configuration, then this condition is obviously superfluous.
If however a positive condition with the tag exists, you will receive the above warning and must decide how to adapt the configuration.
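The different treatment of negative and positive conditions can be sketched in Python as follows. The rule representation (dictionaries with `required` and `forbidden` tag sets), the function name and the rule names are assumptions made for illustration.

```python
def remove_tag(rules, tag):
    """Simulate deleting `tag` from the configuration.

    Returns (cleaned_rules, needs_decision): negative conditions on the tag
    are dropped silently, while rules with a positive condition are left
    unchanged and reported so that the user can decide what to do.
    """
    cleaned, needs_decision = [], []
    for rule in rules:
        if tag in rule["required"]:
            needs_decision.append(rule["name"])
            cleaned.append(rule)
        else:
            # Build a new dict so that the original rule is not modified.
            cleaned.append({**rule, "forbidden": rule["forbidden"] - {tag}})
    return cleaned, needs_decision


rules = [
    {"name": "everything outside DC 2", "required": set(), "forbidden": {"loc_dc2"}},
    {"name": "only DC 2", "required": {"loc_dc2"}, "forbidden": set()},
]
cleaned, pending = remove_tag(rules, "loc_dc2")
```

After the call, the first rule has lost its now superfluous negative condition, and `pending` lists the second rule as one that requires a decision.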
Unlike the IDs of tag groups, the IDs of tags can in fact be changed retrospectively. This is, so to speak, an exception to the Checkmk principle that IDs once set are unchangeable. It can however be useful if you want to prepare an import of data from an existing system for which you need to accommodate a different tag scheme.
To rename tag IDs, go into the tag group's edit mode and simply change the IDs there, leaving the titles unaltered. This last point is important so that Checkmk recognises that a rename has occurred rather than an option simply being removed and a new one added.
Before Checkmk executes the changes to the configuration, it will inform you of the consequences:
WATO will now update all relevant hosts, folders and rules as appropriate.
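Why the unchanged title matters can be illustrated with a Python sketch that tells renames apart from removals and additions by comparing titles. This is only a plausible model of the behaviour described above; the function name and the data layout are assumptions, and WATO's actual detection logic is not documented here.

```python
def detect_changes(old, new):
    """Compare two versions of a tag group.

    `old` and `new` are lists of (tag_id, title) tuples.
    Returns (renamed, removed, added), where `renamed` maps old IDs to new IDs.
    """
    old_by_title = {title: tag_id for tag_id, title in old}
    old_ids = {tag_id for tag_id, _ in old}
    new_ids = {tag_id for tag_id, _ in new}

    renamed = {}
    for tag_id, title in new:
        previous = old_by_title.get(title)
        # Same title, different ID, and the old ID is gone: treat as a rename.
        if previous is not None and previous != tag_id and previous not in new_ids:
            renamed[previous] = tag_id

    removed = old_ids - new_ids - set(renamed)
    added = new_ids - old_ids - set(renamed.values())
    return renamed, removed, added


old = [("dc1", "DC 1"), ("dc2", "DC 2")]
with_same_titles = [("loc_dc1", "DC 1"), ("loc_dc2", "DC 2")]
with_new_title = [("loc_dc1", "Datacenter 1"), ("dc2", "DC 2")]
```

With unchanged titles both tags are recognised as renamed. If the title is changed at the same time, the sketch can no longer pair the IDs and reports `dc1` as removed and `loc_dc1` as added.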
Please be aware that there can nonetheless be situations in which manual corrections need to be made in some places. For example, tag IDs are components of URLs that call up views which filter by tags. WATO cannot alter these URLs for you. Likewise, filter configurations in reports and dashboards cannot be automatically updated. It is therefore a good idea to give enough thought to the tag scheme at the beginning so that later renames can be minimised.