Regular expressions in Check_MK


1. Introduction

Regular expressions – regexes for short – are used in Check_MK for specifying service names, and they are used in many other functions as well. They are character strings serving as templates that (match) or (do not match) strings in specific texts. Regexes can be employed for many practical tasks, for example, to formulate flexible rules that affect all services whose names include foo or bar.

Regexes are often confused with search patterns for file names, because both use the special characters * and ?. These so-called globbing patterns however have a quite different syntax, and are not nearly as powerful as the regular expressions. If you are uncertain whether a regular expression is allowed in a particular situation, activate the online help for advice.

In this article we will explain the most important uses for regular expressions – but by no means all of them. When the options shown here are insufficient for your needs, for further reference below you can find more comprehensive information. And of course there is always the internet.

1.1. Normal characters and the point

With regular expressions it is always a question of a template – the expression – matching a specific text – e.g, a service name. A template can include a string of special characters that have 'magic' significances. All normal characters in the expression simply match themselves.

Check_MK does not distinguish between capital and non-capital letters. The CPU load expression thus matches the text CPU load as well as the text cpu LoAd. Note: for entry fields where – without regular expressions – an exact match is required (mainly with host names), case sensitivity will always be essential!

The most important special character is the . point. It matches any single character:

Example:

Regular Expression Match Match No match
Me.er Meier Meyer Meyyer
.var.log 1var2log /var/log /var//log

1.2. Using a backslash to mask special characters

Since the point matches everything, it naturally follows that it also matches a point. Should you wish to explicitly match a point, then the point must be masked by a \ backslash (escape). This similarly applies to all other special characters, as we shall see. These are: \ . * + ? { } ( ) [ ] | & ^ and $.

Regular Expression Match No match No match
example\.com example.com example\.com example-com
How\? How? How\? How
C:\\Programs C:\Programs C:Programs C:\\Programs

1.3. Repeating characters

One will very often want to define that any string of characters may appear somewhere in an expression. In regexes this is coded with .* (point asterisk). This is actually only a special case. The asterisk can represent any character, which can appear any number of times in a search text. An empty sequence is also a valid sequence. This means that .* matches any character string and that * matches the preceeding character any number of times:

Regular Expression Match Match No match
State.*OK State is OK State = OK StatOK
State*OK StateOK StatOK State OK
a *= *5 a=5 a = 5 a==5

The + is almost the same as *, but it allows no empty sequences. The leading character must occur at least once:

Regular Expression Match Match No match
State +OK State OK State  OK StateOK
switch +off switch off switch  off switchoff

Should you wish to restrict the number of repetitions, for this purpose there is a syntax with braces with which a precise number or a range can be specified:

Regular Expression Match Match No match
Ax{3}B AxxxB AxB
Ax{2,4} Axx Axxxx Ax

A question mark is the abreviation for {0,1} – i.e. something that appears once, or never. It thus designates the preceeding character as optional:

Regular Expression Match Match No match
a-?b ab a-b a--b
Meyi?er Meyer Meyier Meyiier

1.4. Character classes, numerals and letters

Character classes allow situations such as 'a numeral must occur here'. To this end set all permitted characters in square brackets. You can also enter ranges with a minus sign. Note: The sequence in ASCII-character sets applies here.

For example, [abc] specifically stands for one of the letters a, b or c and [0-9] for any character – both can be combined. A negation for all of these is also possible: Adding a ^ in the brackets thus allows [^abc] to stand for any character except for a, b, c..

Character classes can of course be combined with other operations. Here are some abstract examples:

Character class Meaning
[abc] Stands for exactly one of the letters a, b or c.
[0-9a-z_] Exactly a numeral, a letter or an underscore.
[^abc] Any character except for a, b, c.
[ --] Exactly one character between blank characters and minus, in accordance with the ASCII-Table.
[0-9a-z]{1,20} A designator with a maximum or 20 letters or numerals.

The following are a few practical examples:

Regular Expression Match Match No match
[0-7] 0 5 9
[0-7]{2} 00 53 123
myhost_[0-9a-z_]{3} myhost_1a3 myhost_1_5 myhost_1234
[+0-9/ --]+ +49 89 9982 09700 089 / 9982 097-00 089 : 9982 097-00

Note: If you need one or the other of the characters - or -> you will need a trick. Simply code - directly at the end of the class – as shown in the preceeding example. With this it will be clear to the regex interpreter that it can't be a sequence. Code the square brackets as the first character in the class. Since no empty classes are permitted it will be interpreted as a normal character. A class with precisely these two characters will look like this: []-].

1.5. Beginning and end, prefix, suffix and infix

When comparing regular expressions with service names and other elements, Check_MK always verifies that the text matches the beginning of the expression. The reason is that this is what you usually need. A rule in which for services the terms CPU and core are coded thus applies to all services whose name begins with one of these terms:

This is described as a prefix match. Should you require an exact match, this can be accomplished by appending a $. This effectively matches the end of the text. It is sufficient if the expression matches at any location in the text – a so-called infix match. This is achieved in advance with the familiar .*:

Regular Expression Match Match No match
/var /var /var/log /test/var
/var$ /var /var/log
.*/var$ /var /test/var /var/log
.*/var /test/var /test/var/log \test\var\log

An exception to the rule that Check_MK always uses a prefix match is the Event Console (EC), which always works with an infix match – so that only containedness is checked. Here, by prefixing ^, a match for the beginning can be forced – a prefix match in other words.

Regular Expression in EC Match Match Kein Match
ORA- ORACLEserver myORACLEserver myoracleserver
^ORA- ORACLEserver ORACLEhost myORACLEserver

1.6. Alternatives

With a | vertical bar – an OR-link – you can define alternatives: 1|2|3 thus matches with 1, 2 or 3. If the alternatives are required in the middle of an expression, enclose them in brackets '()'.

Regular Expression Match Match No match
CPU load|core|memory CPU load core CPU utilisation
01|02|1[1-5] 01 11 to 15 05
server\.(intern|dmz|123)\.net server.intern.net server.dmz.net server.extern.net

1.7. Match groups

In the Event Console, in Business Intelligence (BI) and also in Bulk renaming of hosts there is the possibilty of relating to text components that are found in the original text. For this patterns in regular expressions are marked with brackets. The text component that matches the first bracketed expression will be available in the substitution as \1, the second expression as \2, etc.

Regular Expression Text Group 1 Group 2
([a-z])+([123])+ abc123 abc 123
server-(.*)\.local server-lnx02.local lnx02

The image below shows such a rename. All host names that match the regular expression server-(.*)\.local will be substituted with \1.servers.local. In doing so the \1 represents the exact text that will be 'captured' by the .* in the brackets:

In a concrete example, server-lnx02.local will be renamed to lnx02.servers.local.

Groups can of course also be combined with the repetition operators *, +, ? und {...}. Thus for example the expression (/local)?/share matches /local/share, as well as /share.

2. Table of all special characters

Here is a summary of all of the special characters as described above and the functions performed by the regular expressions as used in Check_MK:

. Matches any character
\ Treats the next special character as a normal character
* The preceeding character may appear any number of times – or never
+ The preceeding character must appear at least once
{5} The preceeding character must appear precisely five times
{5,10} The preceeding character must appear between five and ten times
? The preceeding character may appear once, or not at all
[abc] Represents exactly one of the characters a, b or c
[0-9] Represents explicitly one of the characters n 0, 1 ... 9 (i.e., a numeral)
[0-9a-z_] Represents exactly ONE numeral, letter or underscore
[^"'] Represents any single character except single or double quotes
$ Matches to the end of a text
^ Matches to the beginning of a text
A|B|C Matches A or B or C
(A) Combines the sub-expression A into a group

The following characters must be escaped with a backslash if they are to be explicitly used: \ . * + ? { } ( ) [ ] | & ^ $

3. If you'd like to learn the full details

Back in the '60s, Ken Thompson, one of the inventors of UNIX, had already developed the first regular expressions in their current form – including today's standard Unix command grep. Since then countless extensions and dialects have been derived from standard expressions – including extended regexes, Perl-compatible regexes and a very similar variant in Python.

Under Filters in views Check_MK utilises POSIX extended regular expressions (extended REs). These are analysed in the monitoring core using C with the Regex function of the C-Bibliothek. A complete reference for this subject can be found in the Linux-Manpage for regex(7):

OMD[mysite]:~$ man 7 regex

REGEX(7)                   Linux Programmer's Manual                   REGEX(7)

NAME
       regex - POSIX.2 regular expressions

DESCRIPTION
       Regular  expressions  ("RE"s), as defined in POSIX.2, come in two forms:
       modern REs (roughly those of egrep; POSIX.2 calls these "extended"  REs)
       and  obsolete  REs (roughly those of ed(1); POSIX.2 "basic" REs). Obsolete
       REs mostly exist for backward compatibility in some  old  programs;

In all other locations all of Python's other options for regular expressions are additionally available. These apply to, among others, the Configurations rules, the Event Console and Business Intelligence (BI). The Python-regexes are an enhancement of the extended REs, and they are very similar to those from Perl. They support, e.g., the so-called negative lookahead, a non-greedy asterisk *, or a forced differentiation between upper and lower cases. The detailed options for these regexes can be found in the Python online help for the re module:

OMD[mysite]:~$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> help(re)
Help on module re:

NAME
    re - Support for regular expressions (RE).

FILE
    /usr/lib/python2.7/re.py

MODULE DOCS
    http://docs.python.org/library/re

DESCRIPTION
    This module provides regular expression matching operations similar to
    those found in Perl.  It supports both 8-bit and Unicode strings; both
    the pattern and the strings being processed can contain null bytes and
    characters outside the US ASCII range.

    Regular expressions can contain both special and ordinary characters.
    Most ordinary characters, like "A", "a", or "0", are the simplest
    regular expressions; they simply match themselves.  You can
    concatenate ordinary characters, so last matches the string 'last'.

    The special characters are:
        "."      Matches any character except a newline.
        "^"      Matches the start of the string.
        "$"      Matches the end of the string or just before the newline at
                 the end of the string.
        "*"      Matches 0 or more (greedy) repetitions of the preceding RE.
                 Greedy means that it will match as many repetitions as possible.
        "+"      Matches 1 or more (greedy) repetitions of the preceding RE.
        "?"      Matches 0 or 1 (greedy) of the preceding RE.
        *?,+?,?? Non-greedy versions of the previous three special characters.
        {m,n}    Matches from m to n repetitions of the preceding RE.
        {m,n}?   Non-greedy version of the above.
        "\\"     Either escapes special characters or signals a special sequence.
        []       Indicates a set of characters.
                 A "^" as the first character indicates a complementing set.
        "|"      A|B, creates an RE that will match either A or B.
        (...)    Matches the RE inside the parentheses.
                 The contents can be retrieved or matched later in the string.
        (?iLmsux) Set the I, L, M, S, U, or X flag for the RE (see below).
        (?:...)  Non-grouping version of regular parentheses.
        (?P<name>...) The substring matched by the group is accessible by name.
        (?P=name)     Matches the text matched earlier by the group named name.
        (?#...)  A comment; ignored.
        (?=...)  Matches if ... matches next, but doesn't consume the string.
        (?!...)  Matches if ... doesn't match next.
        (?<=...) Matches if preceded by ... (must be fixed length).
        (?<!...) Matches if not preceded by ... (must be fixed length).
        (?(id/name)yes|no) Matches yes pattern if the group with id/name matched,
                           the (optional) no pattern otherwise.

    The special sequences consist of "\\" and a character from the list
    below.  If the ordinary character is not on the list, then the
    resulting RE will match the second character.
        \number  Matches the contents of the group of the same number.
        \A       Matches only at the start of the string.
        \Z       Matches only at the end of the string.
        \b       Matches the empty string, but only at the start or end of a word.
        \B       Matches the empty string, but not at the start or end of a word.
        \d       Matches any decimal digit; equivalent to the set [0-9].
        \D       Matches any non-digit character; equivalent to the set [^0-9].
        \s       Matches any whitespace character; equivalent to [ \t\n\r\f\v].
        \S       Matches any non-whitespace character; equiv. to [^ \t\n\r\f\v].
        \w       Matches any alphanumeric character; equivalent to [a-zA-Z0-9_].
                 With LOCALE, it will match the set [0-9_] plus characters defined
                 as letters for the current locale.
        \W       Matches the complement of \w.
        \\       Matches a literal backslash.

Copyright © 2001-2018 Python Software Foundation. All rights reserved.
Copyright © 2000 BeOpen.com. All rights reserved.
Copyright © 1995-2000 Corporation for National Research Initiatives. All rights reserved.
Copyright © 1991-1995 Stichting Mathematisch Centrum. All rights reserved.

License: https://docs.python.org/2/license.html

A very comprehensive explanation covering regular expressions can be found in the Wikipedia.