Monitoring instances (sites)


1. OMD - The Open Monitoring Distribution

The Check_MK Monitoring System is an Open-Source-Project founded by Mathias Kettner that revolves around the comfortable and flexible installation of a monitoring solution assembled from various components. The System builds on the Open Monitoring Distribution (OMD). The abreviation OMD will already be familiar as a part of the name of the RPM/DEB-Package installation.

An OMD-based installation is distinguished by:

  • the ability to run multiple Instances (or ‘sites’) in parallel
  • the ability to operate in differing versions of the monitoring software
  • an intelligent and easy to operate upgrade/downgrade-mechanism
  • uniform data paths – regardless of which Linux-platform is installed
  • a clear separation of data and software
  • a very simple installation – with no dependence on third-party software
  • a perfect preconfiguration of all components

2. Creating instances (or ‘sites’)

Perhaps the best thing about OMD is that it can manage any chosen number of monitoring instances on a server. These can also be referred to as Sites. Each ‘instance’ is a self-contained monitoring system which runs independently of the others.

An instance always has a distinct name, specified at its creation. This name is the same as that of the Linux-user which is created at the same time. The instance's name conforms to the same conventions as user names under Linux.

The creation is performed with the omd create command. This must be executed as root:

root@linux# omd create mysite
Adding /opt/omd/sites/mysite/tmp to /etc/fstab.
Restarting Apache...OK
Creating temporary filesystem /omd/sites/mysite/tmp...OK
Created new site mysite with version 2014.11.17.mmk.

  The site can be started with omd start mysite.
  The default web UI is available at http://Klappfisch/mysite/
  The admin user for the web applications is omdadmin with password omd.
  Please do a su - mysite for administration of this site.

From 1.4.0, when creating the cmkadmin user a password will be randomly-generated and issued.

What takes place during the creation of an instance ‘mysite’?

  • An operating system user mysite, and a group mysite will be created.
  • A new home directory /omd/sites/mysite will be created and assigned.
  • This home directory will be populated with configuration files and sub-directories.
  • A basic configuration will be created for the new instance.

Important: Avoid using a name which is already allocated in another service. A duplicated allocation can cause problems.

2.1. User and group IDs

In some cases it is also desirable to specify the user/group ID of the new user to be created. This is performed with the -u and -g options, e.g.:

root@linux# omd create -u 6100 -g 180 mysite

An overview of the further options can be shown with omd create --help. The most important options are:

-u UID The new user will be created with the User-ID UID.
-g GID The new user's group will be created with the Group-ID GID.
--reuse OMD assumes that the new user already exists, and does not create it.
-t SIZE The new site's tmpfs will be created with the ‘SIZE’ parameter. SIZE has the suffix M (Megabyte), G (Gigabyte) or % (percentage of RAM). Example: -t 4G

3. Instance User (Site User)

The further administration of the instance is always best performed with the rights of the newly-created user. Switching users is done with su:

root@linux# su - mysite

Please note that the ‘minus sign’ following the su is essential. It ensures that the user switch processes ALL of the operations that take place during a normal login. In particular, all environment variables will be correctly set, and your session will continue as mysite in the home directory of the /omd/sites/mysite instance.

As an instance-user you can execute all important operations affecting this site. Entering the instance ID then of course becomes unnecessary when issuing the relevant omd-commands.

4. Starting and stopping instances

Your instance is now ready to be started – which can be done as root with omd start mysite. It is fundamentally better though to work with the instance as the instance user (site user):

OMD[mysite]:~$ omd start
Starting Livestatus Proxy-Daemon...OK
Starting rrdcached...OK
Starting CMC Rushing Ahead Daemon...OK
Starting Check_MK Micro Core...OK
Starting dedicated Apache for site mysite...OK
Initializing Crontab...OK

Unsurprisingly, stopping is achieved with omd stop:

OMD[mysite]:~$ omd stop
Removing Crontab...
Stopping dedicated Apache for site mysite....OK
Stopping Check_MK Micro Core...killing 15085...OK
Stopping CMC Rushing Ahead Daemon...killing 15071....OK
Stopping rrdcached...waiting for termination...OK
Stopping Livestatus Proxy-Daemon...killing 15049....OK

Starting and stopping an instance is nothing other than starting or stopping a collection of services. These can also be individually managed by specifying the name of the service, e.g.:

OMD[mysite]:~$ omd start apache
Starting dedicated Apache for site mysite...OK

The names of the various services can be found in the ~/etc/init.d directory. Please note the leading tilde – this represents the home directory for the instance-user (the site-directory). This is not the same as /etc/init.d!

Alongside start and stop, there are also the restart, reload and status commands. Reloading Apache is, for example, always necessary following a manual change to the Apache-configuration. Please note that this does not apply to the global Apache-process on the Linux-server, but rather the site's own dedicated Apache-process:

OMD[mysite]:~$ omd reload apache
Reloading dedicated Apache for site mysite....OK

In order to be able to maintain an overview of state of the site following all of the starts and stops, simply use omd status:

OMD[mysite]:~$ omd status
liveproxyd:     stopped
rrdcached:      running
cmcrushd:       running
cmc:            stopped
apache:         running
crontab:        running
-----------------------
Overall state:  partially running

5. Deleting instances

Deleting an instance is as easy as creating one – with the omd rm command. The instance will first be automatically stopped.

root@linux# omd rm mysite
omd rm mysite
omd rm mysite
PLEASE NOTE: This action removes all configuration files
             and variable data of the site.

In detail the following steps will be done:
- Stop all processes of the site
- Unmount tmpfs of the site
- Remove tmpfs of the site from fstab
- Remove the system user <SITENAME>
- Remove the system group <SITENAME>
- Remove the site home directory
- Restart the system wide apache daemon
 (yes/NO): yes

It goes without saying that this action also deletes all of the instance's data!

If you are no fan of confirmation prompts, or wish to perform the deletion as part of a script, the deletion can be forced with the -f option. Attention: here the -f must be placed before the rm:

root@linux# omd -f rm mysite

6. Configuring the components

As already mentioned, OMD is a system that integrates multiple software components into a monitoring system. In so doing, some components are optional, and for some there are alternatives or different operational settings. All of this can be comfortably configured with omd config. There are also scripting and interactive modes. This latter can be simply opened by a site-user with:

OMD[mysite]:~$ omd config

If you alter a setting, the OMD will be immediately notified that the site must be stopped (if that is not already the case), and does this as needed:

Please don't forget to restart the site following the completion of the work. omd config will NOT do this for you automatically.

6.1. Script-interfaces

Those who don't like the interactive mode, or prefer to work with scripts, can set the individual variables using commands. For this there is the omd config set command. The following example sets the CORE variable to cmc:

OMD[mysite]:~$ omd config set CORE cmc

As always, this can be performed as root if the site's name is added as an argument:

root@linux# omd config mysite set CORE cmc

The current configuration of all variables can be viewed using omd config show:

OMD[mysite]:~$ omd config show
APACHE_MODE: own
APACHE_TCP_ADDR: 127.0.0.1
APACHE_TCP_PORT: 5000
AUTOSTART: off
CMCRUSHD: on
CORE: cmc
[...]

6.2. Commonly used settings

There are numerous settings in omd config. The most important are:

Variable Standard Function
CORE cmc Selection of the monitoring core. As well as the Check_MK Micro Core (CMC), the standard Nagios core is still available. In earlier versions this was set as the default.
MKEVENTD on Activates the Check_MK Event Console, with which the syslog messages, SNMP-Traps and other events can be processed
MKNOTIFYD on  Check_MK Enterprise Edition: Activates the Notification-Spooler. Firstly, this forwards remotely-generated notifications to a central system. This will require mknotifyd on the central and remote sites respectively. Additionally, an asynchronous delivery of messages can be performed using this.
AUTOSTART on Set this to off if you want to suppress an automatic starting of the instance when the computer is started. This is primarily of interest for test installations that should not normally start by themselves.
LIVESTATUS_TCP off Allows external access to the status data for this site. A distributed monitoring can be constructed with this. The status of this instance can be incorporated into the central instance. Please only activate it in a secure network.

7. Copying and renaming instances

It is sometimes useful to create a copy of an instance, for testing purposes or for the preparation of an update. Of course one could simply copy the /omd/sites/alt directory to /omd/sites/neu. That will however not work because:

  • Many configuration files include the site's name.
  • In addition, at numerous locations there are absolute data paths with the /omd/sites/alt prefix.
  • Not least, a user and a group with the site's name to which everything belongs, must be available.

To simplify the copying of an instance, there is the omd cp command, which takes all of these factors into consideration. Its use is very simple. As argument simply enter the name of the existing site followed by the name of the new one. For example:

root@linux# omd cp alt neu

The copy can only work if:

  • The site has been stopped.
  • No processes that belong to the instance user are running.

Both points ensure that at the time of the copy the instance is in a consistent state and cannot change during the action.

7.1. Limiting data volume

If a large number of hosts are being monitored, the volume of data to be copied can be quite substantial. The greater part of this is the performance data which is stored in RRD-files. But the log files containing historic events can also produce larger data volumes. If the history is not required (for example, if only testing is being performed), these can be omitted from the copy. In such cases the following options can be added to omd cp:

--no-rrds The copy will exclude performance data (RRDs)
--no-logs All log files and remaining historic data will be excluded
-N This is an abreviation of --no-rrds --nologs

The order of the options is important:

root@linux# omd cp --no-rrds alt neu

7.2. Renaming instances

Renaming an instance is performed with the omd mv command. This functions similarly to the copy command and has the same prerequisites. The options to restrict the data volume are not available since the data is only being moved to another directory and is not being duplicated. For example:

root@linux# omd mv alt neu

7.3. Further options for cp and mv

Both operations will create new Linux-users in exactly the same way as create does, thus some of the options for omd create are also available for use:

-u UIDThe new user will be created with the User-ID UID.
-g GIDThe new user's group will be created with the Group-ID GID.
--reuseOMD assumes that the new user already exists and does not create it.
-t SIZEThe new site's tmpfs will be created with the ‘SIZE’ parameter. SIZE has the suffix M (Megabyte), G (Gigabyte) or % (percentage of RAM). Example: -t 4G

8. Showing changes with omd diff

When creating a new Check_MK-instance the omd create command populates the etc directory with numerous predefined configuration files. A number of directories will also be created under var and local.

Now it is probably the case that in the course of time a number of the files will have been customised. When after a time you wish to determine which files are no longer in the condition as originally supplied, the omd diff command can provide the answer. Amongst other things, this is useful before an update of Check_MK, since your changes could conflict with changes in the default files.

In a request without additional arguments, all changed files will be listed:

OMD[mysite]:~$ omd diff
 * Deleted var/log/nagios.log
 * Changed content var/check_mk/wato/auth/auth.php
 * Changed content etc/htpasswd
 ! Changed permissions etc/htpasswd
 * Changed content etc/diskspace.conf
 * Changed content etc/auth.secret
 * Changed content etc/apache/apache.conf

You can also enter a query for a specific directory:

OMD[mysite]:~$ omd diff etc/apache
 * Changed content etc/apache/apache.conf

If you wish to see the changes in detail, simply enter the complete file name:

OMD[mysite]:~$ omd diff etc/apache/apache.conf
--- /dev/fd/63  2017-01-24 09:14:46.248968199 +0100
+++ /omd/sites/mysite/etc/apache/apache.conf    2017-01-24 09:12:37.705355164 +0100
@@ -66,8 +66,8 @@
 StartServers         1
 MinSpareServers      1
 MaxSpareServers      5
-ServerLimit          128
-MaxClients           128
+ServerLimit          64
+MaxClients           64
 MaxRequestsPerChild  4000

 ###############################################################################

9. Backing-up and restoring instances

9.1. Backing-up instances with omd backup

The site management in Check_MK has a built-in mechanism for backing up and restoring Check_MK-instances. The omd backup and omd restore commands are the basics for packing all of an instance's data into a tar archive, and respectively, extracting that data for a restore.

From Version 1.4.0 Check_MK additionally uses the Backup WATO-module which makes a backup and restore possible without the command line, and which also enables the setting-up of regular backup jobs.

Backing up an instance with omd backup does not require root-permissions. An instance user can perform this. Simply enter as an argument the name for the backup file to be created:

OMD[mysite]:~$ omd backup /tmp/mysite.tar.gz

Please note however:

  • The created file type is a gzip-compressed tar archive. Therefore use .tar.gz or .tgz as the file extension.
  • Do not store the backup in the instance directory, since this will of course be completely backed up – thus every subsequent backup will contain a copy of ALL of its predecessors!

If the backup's target directory is not writable for an instance user, the backup can otherwise be performed as a root-user. In this case an additional argument is always required specifying the name of the instance to be backed up:

root@linux# omd backup mysite /var/backups/mysite.tar.gz

The backup contains all of the instance's data – except for the volatile data under tmp/. With the tar tzf command one can easily have a look at the file's contents:

OMD[mysite]:~$ tar tvzf /tmp/mysite.tar.gz  | less
lrwxrwxrwx mysite/mysite     0 2017-01-24 09:02 mysite/version -> ../../versions/2017.01.16.cee
drwxr-xr-x mysite/mysite     0 2017-01-24 09:12 mysite/
drwxr-xr-x mysite/mysite     0 2017-01-24 09:02 mysite/local/
drwxr-xr-x mysite/mysite     0 2017-01-24 09:02 mysite/local/share/
drwxr-xr-x mysite/mysite     0 2017-01-24 09:02 mysite/local/share/nagvis/
drwxr-xr-x mysite/mysite     0 2017-01-24 09:02 mysite/local/share/nagvis/htdocs/
drwxr-xr-x mysite/mysite     0 2017-01-24 09:02 mysite/local/share/nagvis/htdocs/userfiles/
drwxr-xr-x mysite/mysite     0 2017-01-24 09:02 mysite/local/share/nagvis/htdocs/userfiles/styles/
drwxr-xr-x mysite/mysite     0 2017-01-24 09:02 mysite/local/share/nagvis/htdocs/userfiles/scripts/
drwxr-xr-x mysite/mysite     0 2017-01-24 09:02 mysite/local/share/nagvis/htdocs/userfiles/templates/
drwxr-xr-x mysite/mysite     0 2017-01-24 09:02 mysite/local/share/nagvis/htdocs/userfiles/gadgets/

9.2. Backup without history

The lion's share of an instance's data is the performance data retained in the RRDs. The monitoring history can also be very large. If neither of these are absolutely required, with the following options the history data can be omitted, thus making the backup smaller and faster running. The options must be coded after the word ‘backup’:

--no-rrds Omits backing up the RRD-databases (performance data)
--no-logs Omits the monitoring history stored in the log files
-N An abreviation of --no-rrds --no-logs

Example:

OMD[mysite]:~$ omd backup -N /tmp/mysite.tar.gz

9.3. Backing up a running instance

A backup does not require the instance to be stopped, and therefore can be executed while the system is running. In order to ensure a consistent condition of the RRDs used for recording the performance data, the omd backup command automatically alters the Round-Robin-Cache to a mode with which the running updates are written only to the journal, and no longer to the RRDs. The journal files are the last to be backed up – thus it can be achieved that as much as possible of the performance data that has been generated during the backup is also included in the backup.

9.4. Restore

The restoring of a backup is as simple as the backup itself. The omd restore command restores an instance from a backup. This is even possible for a user. The instance must be stopped for this procedure. The instance will not be newly-generated (which would require root-permissions), rather it will be completely emptied and then refilled:

OMD[mysite]:~$ omd stop
OMD[mysite]:~$ omd restore /tmp/mysite.tar.gz

Following the restore the instance can be started:

OMD[mysite]:~$ omd start

A restore can also be performed by a root-user. If an instance with the same name already exists, this must first be deleted. This can be performed either with an omd rm, or by simply including the --reuse option. A --kill additionally ensures that the existing instance is first stopped. It is not necessary to use the instance's name with the restore, since this is contained in the backup:

root@linux# omd restore --reuse --kill /var/backup/mysite.tar.gz
root@linux# omd start mysite

When operating as root, you can restore the instance with a different name from that in the backup. Include the desired alternative name as an argument following the restore command:

root@linux# omd restore mysite2 /var/backup/mysite.tar.gz
Restoring site mysite2 from /tmp/mysite.tar.gz...
 * Converted      ./.modulebuildrc
 * Converted      ./.profile
 * Converted      .pip/pip.conf
 * Converted      etc/logrotate.conf

The long list of conversions found here has the same function as for the renaming of instances described earlier: The instance's name is included in numerous configuration files, and with this these occurrences will be replaced automatically by the new name.

9.5. Live migration of instances with backup & restore

The omd backup and omd restore commands can – in the good old Unix tradition – instead of files, also work with the standard input/output. Instead of a data path for the tar file, simply enter a hyphen (-).

In this way a pipe can be constructed and the data ‘streamed’ directly to another computer without requiring intermediate files. The larger the backup, the more advantageous this will be since no temporary space in the backed up server's file system will be needed.

The following command backs up an instance to another computer using SSH:

root@linux# omd backup mysite - | ssh user@otherserver "cat > /var/backup/mysite.tar.gz"

If you want to reverse the SSH-access – by which you prefer to log in TO the Check_MK-instance FROM the backup server – that is also possible, as shown in the following example. For this, first an SSH-Login as an instance user must be permitted:

root@otherserver# ssh mysite@checkmkserver "omd backup -" > /var/backup/mysite.tar.gz

If you are clever, and combine the above with an omd restore which reads the data from the standard input, you can copy a complete, running instance from one server to another – and without needing any additional space for a backup file:

root@otherserver# ssh mysite@checkmkserver "omd backup -" | omd restore - 

And now, the same procedure with a reversed SSH-access – but this time from the source system to the target system:

root@linux# omd backup mysite - | ssh root@otherserver "omd restore -"