Monitoring instances (sites)
Last updated: June 07. 2017
1. OMD - The Open Monitoring Distribution
The Check_MK Monitoring System is an open-source project founded by Mathias Kettner that revolves around the convenient and flexible installation of a monitoring solution assembled from various components. The system builds on the Open Monitoring Distribution (OMD). The abbreviation OMD will already be familiar as part of the name of the RPM/DEB installation package.
An OMD-based installation is distinguished by:
2. Creating instances (or ‘sites’)
Perhaps the best thing about OMD is that it can manage any number of monitoring instances on a single server. These are also referred to as sites. Each instance is a self-contained monitoring system which runs independently of the others.
An instance always has a distinct name, specified at its creation. This name is the same as that of the Linux-user which is created at the same time. The instance's name conforms to the same conventions as user names under Linux.
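Because the instance name doubles as a Linux user name, it can be worth checking a candidate name before calling omd create. The following is only a minimal sketch: the function name is_valid_site_name is hypothetical, and the pattern (a lowercase letter or underscore first, then lowercase letters, digits and underscores) is a deliberately conservative assumption, not the exact rule applied by omd or by your distribution.

```shell
#!/usr/bin/env bash
# Conservative pre-check for a candidate instance name.
# ASSUMPTION: the accepted pattern below is illustrative only; omd and
# the Linux distribution may accept or reject other names.
is_valid_site_name() {
    local LC_ALL=C          # make [a-z] match ASCII lowercase only
    local name=$1
    [ -n "$name" ] || return 1
    case $name in
        [!a-z_]*)     return 1 ;;   # first character not a letter or underscore
        *[!a-z0-9_]*) return 1 ;;   # some character outside the allowed set
    esac
    return 0
}

for candidate in mysite 1site "my site"; do
    if is_valid_site_name "$candidate"; then
        echo "ok:       $candidate"
    else
        echo "rejected: $candidate"
    fi
done
```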
The creation is performed with the omd create command. This must be executed as root:
root@linux# omd create mysite
Adding /opt/omd/sites/mysite/tmp to /etc/fstab.
Restarting Apache...OK
Creating temporary filesystem /omd/sites/mysite/tmp...OK
Created new site mysite with version 2014.11.17.mmk.

  The site can be started with omd start mysite.
  The default web UI is available at http://Klappfisch/mysite/
  The admin user for the web applications is omdadmin with password omd.
  Please do a su - mysite for administration of this site.

From version 1.4.0 onwards, a randomly generated password is issued for the cmkadmin user when the instance is created.
What takes place during the creation of an instance ‘mysite’?
2.1. User and group IDs
In some cases you may want to specify the user ID and group ID of the new user yourself. This is done with the -u and -g options, e.g.:
root@linux# omd create -u 6100 -g 180 mysite
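Before choosing fixed IDs it can help to confirm that they are not already in use on the server. The sketch below assumes a Linux system where getent is available; the helper names uid_is_free and gid_is_free are hypothetical, and the script only prints the omd create command rather than running it.

```shell
#!/usr/bin/env bash
# Check whether a numeric user ID or group ID is already taken before
# passing it to "omd create -u UID -g GID SITE".

# Succeeds (exit 0) if no account uses the given UID.
uid_is_free() {
    ! getent passwd "$1" >/dev/null
}

# Succeeds (exit 0) if no group uses the given GID.
gid_is_free() {
    ! getent group "$1" >/dev/null
}

# Example values taken from the command above; adjust as needed.
uid=6100
gid=180

if uid_is_free "$uid" && gid_is_free "$gid"; then
    echo "IDs are free; as root you could now run: omd create -u $uid -g $gid mysite"
else
    echo "UID $uid or GID $gid is already in use" >&2
fi
```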
An overview of the further options can be shown with omd create --help. The most important options are:
3. Instance User (Site User)
The further administration of the instance is always best performed with the rights of the newly-created user. Switching users is done with su:
root@linux# su - mysite
Please note that the ‘minus sign’ following the su is essential. It ensures that the user switch performs all of the steps that take place during a normal login. In particular, all environment variables will be set correctly, and your session will start as mysite in the instance's home directory, /omd/sites/mysite.
As an instance user you can execute all important operations affecting this site. Specifying the instance's name is then of course unnecessary when issuing the relevant omd commands.
4. Starting and stopping instances
Your instance is now ready to be started, which can be done as root with omd start mysite. In general, though, it is better to work with the instance as the instance user (site user):
OMD[mysite]:~$ omd start
Starting Livestatus Proxy-Daemon...OK
Starting rrdcached...OK
Starting CMC Rushing Ahead Daemon...OK
Starting Check_MK Micro Core...OK
Starting dedicated Apache for site mysite...OK
Initializing Crontab...OK
Unsurprisingly, stopping is achieved with omd stop:
OMD[mysite]:~$ omd stop
Removing Crontab...
Stopping dedicated Apache for site mysite....OK
Stopping Check_MK Micro Core...killing 15085...OK
Stopping CMC Rushing Ahead Daemon...killing 15071....OK
Stopping rrdcached...waiting for termination...OK
Stopping Livestatus Proxy-Daemon...killing 15049....OK
Starting and stopping an instance is nothing other than starting or stopping a collection of services. These can also be individually managed by specifying the name of the service, e.g.:
OMD[mysite]:~$ omd start apache
Starting dedicated Apache for site mysite...OK
The names of the various services can be found in the ~/etc/init.d directory. Please note the leading tilde – this represents the home directory for the instance-user (the site-directory). This is not the same as /etc/init.d!
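To see which service names can be passed to omd start or omd stop, the contents of that directory can be listed. This is a small sketch under the assumption that each service corresponds to one executable file in the directory; the function name list_site_services is hypothetical.

```shell
#!/usr/bin/env bash
# Print the names of a site's services, one per line.
# $1: the site's init.d directory (for a site user: "$HOME/etc/init.d").
# ASSUMPTION: every service is represented by one executable file there.
list_site_services() {
    local dir=$1 entry
    for entry in "$dir"/*; do
        # Skip anything that is not an executable regular file.
        [ -f "$entry" ] && [ -x "$entry" ] || continue
        basename "$entry"
    done
}

# As a site user you would call:
#   list_site_services ~/etc/init.d
```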
Alongside start and stop, there are also the restart, reload and status commands. Reloading Apache is, for example, always necessary following a manual change to the Apache-configuration. Please note that this does not apply to the global Apache-process on the Linux-server, but rather the site's own dedicated Apache-process:
OMD[mysite]:~$ omd reload apache
Reloading dedicated Apache for site mysite....OK
To keep an overview of the state of the site after all of the starts and stops, simply use omd status:
OMD[mysite]:~$ omd status
liveproxyd: stopped
rrdcached: running
cmcrushd: running
cmc: stopped
apache: running
crontab: running
-----------------------
Overall state: partially running
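For use in scripts, the status listing can be reduced to the overall state or to the names of the stopped services. The helpers below are a sketch that assumes the output has exactly the "name: state" layout shown above; the function names are hypothetical, and the sample text is fed in so the example runs without an OMD installation.

```shell
#!/usr/bin/env bash
# Helpers that read "omd status" output from standard input.

# Print the text following "Overall state:".
overall_state() {
    sed -n 's/^Overall state:[[:space:]]*//p'
}

# Print the names of all services reported as "stopped".
stopped_services() {
    awk -F': *' '$2 == "stopped" { print $1 }'
}

# Sample output copied from the status listing in this section.
sample='liveproxyd: stopped
rrdcached: running
cmcrushd: running
cmc: stopped
apache: running
crontab: running
-----------------------
Overall state: partially running'

printf '%s\n' "$sample" | overall_state
printf '%s\n' "$sample" | stopped_services
```

As a site user the same helpers could be fed directly, for example with omd status | stopped_services.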
5. Deleting instances
Deleting an instance is as easy as creating one – with the omd rm command. The instance will first be automatically stopped.
root@linux# omd rm mysite
PLEASE NOTE: This action removes all configuration files
 and variable data of the site.

In detail the following steps will be done:
- Stop all processes of the site
- Unmount tmpfs of the site
- Remove tmpfs of the site from fstab
- Remove the system user <SITENAME>
- Remove the system group <SITENAME>
- Remove the site home directory
- Restart the system wide apache daemon
 (yes/NO): yes
It goes without saying that this action also deletes all of the instance's data!
If you are no fan of confirmation prompts, or wish to perform the deletion as part of a script, the deletion can be forced with the -f option. Attention: here the -f must be placed before the rm:
root@linux# omd -f rm mysite
6. Configuring the components
As already mentioned, OMD is a system that integrates multiple software components into a monitoring system. Some of these components are optional, and for some there are alternatives or different operational settings. All of this can be conveniently configured with omd config, which offers both a scripting mode and an interactive mode. The latter can be opened by a site user with:
OMD[mysite]:~$ omd config
As soon as you alter a setting, omd points out that the site must first be stopped (if that is not already the case), and stops it as needed.
Those who don't like the interactive mode, or prefer to work with scripts, can set the individual variables using commands. For this there is the omd config set command. The following example sets the CORE variable to cmc:
OMD[mysite]:~$ omd config set CORE cmc
As always, this can be performed as root if the site's name is added as an argument:
root@linux# omd config mysite set CORE cmc
The current configuration of all variables can be viewed using omd config show:
OMD[mysite]:~$ omd config show
APACHE_MODE: own
APACHE_TCP_ADDR: 127.0.0.1
APACHE_TCP_PORT: 5000
AUTOSTART: off
CMCRUSHD: on
CORE: cmc
[...]
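In a script it is often useful to read a single variable from this listing, for instance to change a setting only when it differs from the desired value. The following sketch assumes the "NAME: value" layout shown above; config_value is a hypothetical helper name, and values that themselves contain ": " would be cut short.

```shell
#!/usr/bin/env bash
# Print the value of one variable from "omd config show" output on stdin.
# $1: variable name, e.g. CORE
config_value() {
    awk -F': *' -v key="$1" '$1 == key { print $2 }'
}

# Sample output copied from the listing in this section.
sample='APACHE_MODE: own
APACHE_TCP_ADDR: 127.0.0.1
APACHE_TCP_PORT: 5000
AUTOSTART: off
CMCRUSHD: on
CORE: cmc'

printf '%s\n' "$sample" | config_value CORE

# As a site user, a setting could be changed only when necessary:
#   if [ "$(omd config show | config_value CORE)" != cmc ]; then
#       omd config set CORE cmc
#   fi
```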
6.2. Commonly used settings
There are numerous settings in omd config. The most important are:
7. Copying and renaming instances
It is sometimes useful to create a copy of an instance, for testing purposes or for the preparation of an update. Of course one could simply copy the /omd/sites/alt directory to /omd/sites/neu. That will however not work because:
To simplify the copying of an instance, there is the omd cp command, which takes all of these factors into consideration. Its use is very simple. As arguments, simply enter the name of the existing site followed by the name of the new one. For example:
root@linux# omd cp alt neu
The copy can only work if:
7.1. Limiting data volume
If a large number of hosts are being monitored, the volume of data to be copied can be quite substantial. The greater part of this is the performance data which is stored in RRD-files. But the log files containing historic events can also produce larger data volumes. If the history is not required (for example, if only testing is being performed), these can be omitted from the copy. In such cases the following options can be added to omd cp:
The position of the options is important. As the following example shows, they are placed directly after the cp:
root@linux# omd cp --no-rrds alt neu
7.2. Renaming instances
Renaming an instance is performed with the omd mv command. This functions similarly to the copy command and has the same prerequisites. The options to restrict the data volume are not available since the data is only being moved to another directory and is not being duplicated. For example:
root@linux# omd mv alt neu
7.3. Further options for cp and mv
Both operations will create new Linux-users in exactly the same way as create does, thus some of the options for omd create are also available for use:
8. Showing changes with omd diff
When creating a new Check_MK-instance the omd create command populates the etc directory with numerous predefined configuration files. A number of directories will also be created under var and local.
Over time, a number of these files will probably have been customised. If you later wish to determine which files are no longer in their originally supplied condition, the omd diff command provides the answer. Amongst other things, this is useful before an update of Check_MK, since your changes could conflict with changes in the default files.
When called without additional arguments, omd diff lists all changed files:
OMD[mysite]:~$ omd diff
 * Deleted var/log/nagios.log
 * Changed content var/check_mk/wato/auth/auth.php
 * Changed content etc/htpasswd
 ! Changed permissions etc/htpasswd
 * Changed content etc/diskspace.conf
 * Changed content etc/auth.secret
 * Changed content etc/apache/apache.conf
You can also enter a query for a specific directory:
OMD[mysite]:~$ omd diff etc/apache
 * Changed content etc/apache/apache.conf
If you wish to see the changes in detail, simply enter the complete file name:
OMD[mysite]:~$ omd diff etc/apache/apache.conf
--- /dev/fd/63 2017-01-24 09:14:46.248968199 +0100
+++ /omd/sites/mysite/etc/apache/apache.conf 2017-01-24 09:12:37.705355164 +0100
@@ -66,8 +66,8 @@
 StartServers 1
 MinSpareServers 1
 MaxSpareServers 5
-ServerLimit 128
-MaxClients 128
+ServerLimit 64
+MaxClients 64
 MaxRequestsPerChild 4000

 ###############################################################################
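Before an update, the summary produced by omd diff can be condensed into a plain list of affected paths, for example to archive them or to review them one by one. This is a sketch that assumes the summary layout shown in this section (a "*" or "!" marker, a description, then the path as the last word); changed_paths is a hypothetical name, and paths containing spaces are not handled.

```shell
#!/usr/bin/env bash
# Read the "omd diff" summary from stdin and print each affected path once.
changed_paths() {
    awk '$1 == "*" || $1 == "!" { print $NF }' | LC_ALL=C sort -u
}

# Sample summary copied from the listing in this section.
sample=' * Deleted var/log/nagios.log
 * Changed content var/check_mk/wato/auth/auth.php
 * Changed content etc/htpasswd
 ! Changed permissions etc/htpasswd
 * Changed content etc/diskspace.conf
 * Changed content etc/auth.secret
 * Changed content etc/apache/apache.conf'

printf '%s\n' "$sample" | changed_paths
```

As a site user the helper would be used as omd diff | changed_paths.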
9. Backing-up and restoring instances
9.1. Backing-up instances with omd backup
The site management in Check_MK has a built-in mechanism for backing up and restoring Check_MK instances. It is based on the omd backup and omd restore commands, which pack all of an instance's data into a tar archive and extract that data again for a restore.
From Version 1.4.0 Check_MK additionally uses the Backup WATO-module which makes a backup and restore possible without the command line, and which also enables the setting-up of regular backup jobs.
Backing up an instance with omd backup does not require root-permissions. An instance user can perform this. Simply enter as an argument the name for the backup file to be created:
OMD[mysite]:~$ omd backup /tmp/mysite.tar.gz
Please note however:
If the backup's target directory is not writable for the instance user, the backup can instead be performed as root. In this case an additional argument specifying the name of the instance to be backed up is always required:
root@linux# omd backup mysite /var/backups/mysite.tar.gz
The backup contains all of the instance's data, except for the volatile data under tmp/. With the tar tvzf command you can easily have a look at the file's contents:
OMD[mysite]:~$ tar tvzf /tmp/mysite.tar.gz | less
lrwxrwxrwx mysite/mysite 0 2017-01-24 09:02 mysite/version -> ../../versions/2017.01.16.cee
drwxr-xr-x mysite/mysite 0 2017-01-24 09:12 mysite/
drwxr-xr-x mysite/mysite 0 2017-01-24 09:02 mysite/local/
drwxr-xr-x mysite/mysite 0 2017-01-24 09:02 mysite/local/share/
drwxr-xr-x mysite/mysite 0 2017-01-24 09:02 mysite/local/share/nagvis/
drwxr-xr-x mysite/mysite 0 2017-01-24 09:02 mysite/local/share/nagvis/htdocs/
drwxr-xr-x mysite/mysite 0 2017-01-24 09:02 mysite/local/share/nagvis/htdocs/userfiles/
drwxr-xr-x mysite/mysite 0 2017-01-24 09:02 mysite/local/share/nagvis/htdocs/userfiles/styles/
drwxr-xr-x mysite/mysite 0 2017-01-24 09:02 mysite/local/share/nagvis/htdocs/userfiles/scripts/
drwxr-xr-x mysite/mysite 0 2017-01-24 09:02 mysite/local/share/nagvis/htdocs/userfiles/templates/
drwxr-xr-x mysite/mysite 0 2017-01-24 09:02 mysite/local/share/nagvis/htdocs/userfiles/gadgets/
9.2. Backup without history
The lion's share of an instance's data is the performance data retained in the RRDs. The monitoring history can also be very large. If neither of these is absolutely required, the history data can be omitted with the following options, which makes the backup smaller and faster. The options must be placed after the word ‘backup’:
OMD[mysite]:~$ omd backup -N /tmp/mysite.tar.gz
9.3. Backing up a running instance
A backup does not require the instance to be stopped, and can therefore be executed while the system is running. To ensure a consistent state of the RRDs used for recording the performance data, the omd backup command automatically switches the Round-Robin-Cache to a mode in which running updates are written only to the journal, and no longer to the RRDs. The journal files are the last to be backed up, so that as much as possible of the performance data generated during the backup is also included in it.
9.4. Restore
Restoring a backup is as simple as creating one. The omd restore command restores an instance from a backup. This can even be done by the instance user. The instance must be stopped for this procedure. The instance will not be newly created (which would require root permissions); instead it will be completely emptied and then refilled:
OMD[mysite]:~$ omd stop
OMD[mysite]:~$ omd restore /tmp/mysite.tar.gz
Following the restore the instance can be started:
OMD[mysite]:~$ omd start
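The three steps (stop, restore, start) can be chained so that each one runs only if the previous one succeeded. The following sketch uses only the commands shown above; the helper names run and restore_site and the DRY_RUN switch are hypothetical additions that let the sequence be printed instead of executed.

```shell
#!/usr/bin/env bash
# Run a command, or only print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Stop the site, restore it from the given archive, and start it again.
# Intended to be called as the site user; stops at the first failing step.
restore_site() {
    local archive=$1
    run omd stop &&
    run omd restore "$archive" &&
    run omd start
}

# Print the commands without executing them.
DRY_RUN=1 restore_site /tmp/mysite.tar.gz
```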
A restore can also be performed by a root-user. If an instance with the same name already exists, this must first be deleted. This can be performed either with an omd rm, or by simply including the --reuse option. A --kill additionally ensures that the existing instance is first stopped. It is not necessary to use the instance's name with the restore, since this is contained in the backup:
root@linux# omd restore --reuse --kill /var/backup/mysite.tar.gz
root@linux# omd start mysite
When operating as root, you can restore the instance with a different name from that in the backup. Include the desired alternative name as an argument following the restore command:
root@linux# omd restore mysite2 /var/backup/mysite.tar.gz
Restoring site mysite2 from /tmp/mysite.tar.gz...
 * Converted ./.modulebuildrc
 * Converted ./.profile
 * Converted .pip/pip.conf
 * Converted etc/logrotate.conf
The long list of conversions shown here serves the same purpose as when renaming instances, as described earlier: the instance's name is included in numerous configuration files, and these occurrences are automatically replaced by the new name.
9.5. Live migration of instances with backup & restore
In the good old Unix tradition, the omd backup and omd restore commands can also work with standard input and output instead of files. Instead of a path for the tar file, simply enter a hyphen (-).
In this way a pipe can be constructed and the data ‘streamed’ directly to another computer without requiring intermediate files. The larger the backup, the more advantageous this will be since no temporary space in the backed up server's file system will be needed.
The following command backs up an instance to another computer using SSH:
root@linux# omd backup mysite - | ssh user@otherserver "cat > /var/backup/mysite.tar.gz"
If you would rather reverse the SSH access, that is, log in to the Check_MK instance from the backup server, that is also possible, as shown in the following example. For this, an SSH login as the instance user must first be permitted:
root@otherserver# ssh mysite@checkmkserver "omd backup -" > /var/backup/mysite.tar.gz
If you are clever, and combine the above with an omd restore which reads the data from the standard input, you can copy a complete, running instance from one server to another – and without needing any additional space for a backup file:
root@otherserver# ssh mysite@checkmkserver "omd backup -" | omd restore -
And now, the same procedure with a reversed SSH-access – but this time from the source system to the target system:
root@linux# omd backup mysite - | ssh root@otherserver "omd restore -"
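The streaming idea can be tried out without a Check_MK installation. In the sketch below, tar stands in for omd backup - and omd restore - (this substitution is an assumption made purely for demonstration), and two temporary directories take the place of the source and target systems.

```shell
#!/usr/bin/env bash
# Stand-in demonstration of streaming an archive through a pipe.
# "tar" replaces "omd backup -" and "omd restore -" here, so the
# example runs on any system; no intermediate archive file is created.
src=$(mktemp -d)
dst=$(mktemp -d)
echo "example data" > "$src/file.txt"

# The writer sends the archive to standard output ("-"),
# the reader takes it from standard input ("-").
tar -C "$src" -czf - . | tar -C "$dst" -xzf -

cat "$dst/file.txt"

# Clean up when done:
#   rm -rf "$src" "$dst"
```

Placing ssh between the two halves of the pipe, as in the commands above, turns the same pattern into a transfer between servers.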