Last updated: July 04, 2012
Working with events does not come for free. Devices and applications tend to send far more messages than are relevant to you. Your task is to select the messages that matter for your business and ignore all others, and that involves a bit of work.
As a first step in deciding which messages are important, you can make use of the existing log levels or severities of messages. Syslog, for example, uses eight different levels, ranging from debug to emerg. In many cases, however, those classifications will not match your needs. Even if something is a critical problem for a specific application, that application as a whole might not be relevant to you at all.
For that reason the Event Console allows you to configure a list of rules that describe what to do with which message. Each rule can use any available attribute of a message for its decision. This article describes how these rules are configured and exactly how messages are processed.
2. WATO Module for Event Console
In general the Event Console can be configured on the command line with text files, just like Check_MK itself. All of its features can also be configured through a new dedicated WATO module.
This module becomes visible as soon as you activate the Event Console. It allows a complete graphical configuration of the event processing rules.
The following screenshot shows a situation with three configured rules, the first of which is disabled.
If you create a new rule, it is automatically placed at the top of the list. If you clone an existing rule, the clone is inserted at the same position as the original.
Every message received by the event daemon is matched against the rules in this list, starting from the top. Only the first matching rule is executed. If no rule matches, the message is silently dropped. Disabled rules are silently ignored.
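The first-match behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the event daemon's actual code: the `Rule` class, its fields and the function `first_matching_rule` are hypothetical names, and a rule is reduced to a single text pattern.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    rule_id: str
    pattern: str            # regular expression searched for in the message text
    disabled: bool = False


def first_matching_rule(rules: List[Rule], message: str) -> Optional[Rule]:
    """Return the first enabled rule whose pattern is found in the message."""
    for rule in rules:                       # the list is evaluated from the top
        if rule.disabled:
            continue                         # disabled rules are ignored
        if re.search(rule.pattern, message):
            return rule                      # only the first match is executed
    return None                              # no rule matched: message is dropped


rules = [
    Rule("ntp17", r"ntp", disabled=True),
    Rule("nagios_warn", r"WARN"),
    Rule("_TEST", r".*"),
]

hit = first_matching_rule(rules, "ntp: WARN clock skew")
print(hit.rule_id if hit else "dropped")
```

Because the first rule is disabled, the message falls through to the second one, even though the third rule would match as well.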
3. Structure of a Rule
The configuration of a rule is divided into five sections:
4. General Properties
4.1. Rule ID
Each rule must have a unique ID. You can use an arbitrary sequence of digits, letters and underscores, such as nagios_warn, ntp17 or _TEST. You cannot change the ID of a rule later! The ID is not only useful for your convenience but is also used internally for mapping events to rules. For each event you can see which rule it originated from.
4.3. Rule activation
5. Matching Criteria
Whenever a message arrives at the event daemon, all non-disabled rules are checked one after another. The first matching rule decides the outcome. If no rule matches then the message is silently dropped.
5.1. Text to match
This is surely the most important part of a rule. Here you can write a regular expression that must be found in the text of the message in order for the rule to match. The syntax of the regular expression is that of Python. It is compatible with extended regular expressions (egrep) and with those of Perl. If you want an exact match instead of an infix match, you can use, as always with regular expressions, the special characters ^ and $ to match the beginning and end of the message.
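The difference between an infix match and an anchored match can be tried out directly with Python's `re` module. The message text below is made up for illustration; `re.search` is used because it looks for the pattern anywhere in the string.

```python
import re

message = "kernel: Foobar instance PROD18 failed"

# Infix match: the pattern may occur anywhere in the message text.
infix = re.search(r"instance PROD18", message) is not None

# Anchored with ^ and $: the pattern must cover the whole message.
exact_miss = re.search(r"^instance PROD18$", message) is not None
exact_hit = re.search(r"^kernel: Foobar instance PROD18 failed$", message) is not None

print(infix, exact_miss, exact_hit)
```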
For details about regular expressions please use available resources. The most important special characters are:
5.2. Match groups
The notion of grouping is important for event processing and correlation. In the example in the screenshot you see the expression Foobar instance (.*) failed. The expression .* matches any sequence of characters. Putting it into parentheses captures the actual match for later reference. We call this the first match group. Let's assume that the rule is matched against the message
Foobar instance PROD18 failed.
The expression will match and the sub-expression (.*) will match exactly the text PROD18. This will be saved together with the event. Such match groups can be used to distinguish multiple instances of events when it comes to cancelling and counting. They are also important when you want to rewrite messages (as we will see later).
When you try out your rule set and hover with the mouse over the green ball that indicates a matching rule, the match groups are displayed:
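Since the rule syntax is that of Python, the example can be reproduced with the `re` module. This is a small sketch of the matching step only; how the event daemon stores the groups internally is not shown here.

```python
import re

pattern = r"Foobar instance (.*) failed"
message = "Foobar instance PROD18 failed."

# re.search performs an infix match, so the trailing "." does not matter.
match = re.search(pattern, message)

# groups() returns one entry per pair of parentheses in the pattern.
match_groups = match.groups() if match else ()
print(match_groups)
```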
5.3. Match host
This is a regular expression that must match the complete hostname. No infix match is done here, and no match groups are saved. As already stated above, the case (upper, lower) is ignored during the match.
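A complete, case-insensitive match of this kind can be approximated in Python with `re.fullmatch` and the `re.IGNORECASE` flag. The helper name `host_matches` and the host names are hypothetical, and whether the event daemon uses exactly these calls is an assumption.

```python
import re


def host_matches(pattern: str, hostname: str) -> bool:
    """True if the pattern covers the whole hostname, ignoring case."""
    return re.fullmatch(pattern, hostname, flags=re.IGNORECASE) is not None


print(host_matches(r"srv\d+", "SRV12"))   # whole name matches, case ignored
print(host_matches(r"srv", "srv12"))      # an infix match is not enough
```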
5.4. Match original source IP address
New in 1.2.7: The Event Console now stores the original source IP address when receiving an event directly via an SNMP trap or the built-in syslog server. This IP address is put into a new dedicated field in the event and is shown in the details of the event. In the event rules you can now match on this address using the network notation X.X.X.X/Y.
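To get a feeling for which addresses a given X.X.X.X/Y expression covers, Python's standard `ipaddress` module can be used. This is a stand-alone sketch; the helper `ip_matches` and the addresses are made up, and it does not claim to mirror the Event Console's implementation.

```python
import ipaddress


def ip_matches(network: str, source_ip: str) -> bool:
    """True if source_ip lies inside the network given as X.X.X.X/Y."""
    # strict=False also accepts notations with host bits set, e.g. 10.1.2.3/16
    net = ipaddress.ip_network(network, strict=False)
    return ipaddress.ip_address(source_ip) in net


print(ip_matches("10.1.0.0/16", "10.1.42.7"))
print(ip_matches("10.1.0.0/16", "10.2.0.1"))
```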
5.5. Match syslog application (tag)
Syslog messages always contain an application tag, such as postfix/pickup or rsyslog. When parsing a message, the event daemon splits away the information about the process ID, so the match in our example would be against postfix/pickup.
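The separation of application and process ID can be illustrated as follows. The sketch assumes the conventional syslog form `application[pid]`; the function `split_tag` and the PID value are hypothetical and only serve the example.

```python
import re
from typing import Optional, Tuple


def split_tag(tag: str) -> Tuple[str, Optional[int]]:
    """Split 'application[pid]' into the application and the process ID."""
    m = re.fullmatch(r"(.*?)\[(\d+)\]", tag)
    if m:
        return m.group(1), int(m.group(2))
    return tag, None          # no process ID present in the tag


print(split_tag("postfix/pickup[1234]"))
print(split_tag("rsyslog"))
```

The rule's application criterion would then be compared against the first element only.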
5.6. Match Syslog Priority
Criteria prefixed with checkboxes are ignored if the box is not checked. Syslog knows eight levels of priority, which can be used for matching here. For messages not originating from syslog this information might not be available. The event daemon assumes notice in such a case.
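For reference, the eight syslog priorities and the fallback to notice can be written down like this. The numeric values follow the syslog standard (0 is the most severe); the function `effective_priority` is a hypothetical helper for illustration.

```python
from typing import Optional

# Index in the list equals the numeric syslog priority (0 = most severe).
SYSLOG_PRIORITIES = [
    "emerg", "alert", "crit", "err", "warning", "notice", "info", "debug",
]


def effective_priority(priority: Optional[int]) -> int:
    """Fall back to 'notice' if the message carries no priority."""
    if priority is None:
        return SYSLOG_PRIORITIES.index("notice")
    return priority


print(SYSLOG_PRIORITIES[effective_priority(None)])
print(SYSLOG_PRIORITIES[effective_priority(2)])
```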
5.7. Match Syslog Facility
Syslog defines a number of different facilities. These are hard-coded into the syslog standard and do not reflect the modern variety of applications. Matching against the facility can be helpful nevertheless. Some applications let you choose the facility they use for logging.
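On the wire, a syslog message carries facility and priority combined in one number, the PRI value in angle brackets at the start of the message (facility * 8 + priority). The following sketch decodes it. The facility table is deliberately incomplete, and the function name `decode_pri` is an assumption made for this example.

```python
from typing import Tuple

# A subset of the facilities defined by the syslog standard.
FACILITIES = {
    0: "kern", 1: "user", 2: "mail", 3: "daemon", 4: "auth", 5: "syslog",
    16: "local0", 17: "local1", 18: "local2", 19: "local3",
    20: "local4", 21: "local5", 22: "local6", 23: "local7",
}


def decode_pri(pri: int) -> Tuple[int, int]:
    """Split a syslog PRI value into (facility, priority)."""
    return divmod(pri, 8)


facility, priority = decode_pri(34)        # a message starting with "<34>"
print(FACILITIES.get(facility, str(facility)), priority)
```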
5.8. Match Service Level
The service level is a user-supplied parameter for rules and also for monitored hosts and services. If you let Nagios alert into the Event Console and a service level is attached to the service triggering the alert, then you can match against this level here.
6. Cancelling Events
Cancelling is used to automatically close events when "OK" messages arrive that correspond to previous error messages. To use this feature, set the following rule parameters:
6.1. Text to cancel event
If a message arrives that matches this text and also matches the host, syslog application and syslog facility, then all previous events that originate from this rule are cancelled, if:
6.2. Syslog priority to cancel event
If you activate this option you can make the cancelling depend on the message having a certain syslog priority.
7. Outcome & Action
7.1. Drop Message
Here you can assign a severity to the event that is generated by the rule. For optimal integration with Nagios, the Event Console supports the four states OK, WARN, CRIT and UNKNOWN. The state OK is rarely used; do not confuse it with cancelling. OK events are displayed in the console just like all other events.
7.3. Service Level
The service level is chosen from a list that you can configure yourself. This is done in the global settings of WATO in the section Notification under Service Levels for Event Console. The idea is that you assign some level of criticality or importance to each rule. When the rule is triggered, the service level is attached to the event and displayed to the operator. You can also create specialized views that show only the most (or least) important events.
Actions allow you to send emails or run custom scripts at the moment a rule triggers and creates an event. Please refer to this document for details.
8. Further Rule Features
Rules allow you to do even more things with events. This is explained in specialized articles: