1200 Check_MK installations in a single night
February 2012
The central dashboard
Installations with more than 1200 independent Nagios servers are certainly not an everyday occurrence. As a rule, managing such an environment is more than a full-time job.
The other figures of this Nagios project are just as impressive.
EDEKA Minden-Hannover in brief
EDEKA Minden-Hannover, with a turnover of 6.9 billion euros in 2011, 32,000 employees and around 1,600 stores, is the largest of the seven EDEKA regional companies in Germany. Its region stretches from the Dutch to the Polish border, covering part of East Westphalia, most of Lower Saxony, and the federal states of Bremen, Saxony-Anhalt, Berlin and Brandenburg.
The starting position
EDEKA Minden-Hannover needed a cost-efficient, flexible and modern solution for monitoring its stores and a central system. The open source software Nagios soon came up for discussion. As in many other companies, Nagios was initially installed by hand and extended bit by bit with add-ons such as NSCA and NagVis. It soon became clear that such a classically installed Nagios was reaching its limits.
One problem was the heavy network load caused by the many individual checks. It also turned out that graphical configuration tools such as NConf had reached the limits of what they could manage. Over time, performance bottlenecks also became apparent on the central Nagios server.
The first contact with Mathias Kettner GmbH came through the Nagios add-on Check_MK. In a joint three-day workshop, ideas and concepts for modern, agile monitoring were developed.
The main requirements
In this project the challenge lay not only in the sheer scale but also in the special requirements of the retail environment.
The solution uses Check_MK version 1.1.13, the current release at the time.
Every store runs a Nagios installation based on the Open Monitoring Distribution (OMD) and Check_MK. OMD ensures a simple, standardised installation of the monitoring server, and it also allows the installation and configuration to be distributed with existing tools.
The Nagios instance in a store is responsible for all local systems and also monitors the central services from the store's point of view. In addition, the collected data can be used to build a dashboard showing the local status.
Check_MK collects its data through its own agent, which needs no configuration. This makes it straightforward to roll the agent out to the large number of systems to be monitored.
Check_MK's inventory function automatically determines what can be monitored on a system. Thresholds are configured centrally through flexible rules.
The store's local network is scanned regularly and automatically for new components using the standard tool nmap. When a new system is found, Check_MK's automatic inventory detects which of its services should be monitored. This process runs fully automatically, without manual intervention, which keeps the effort of managing monitoring in the stores to a minimum.
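The agent needs no configuration because it simply prints plain-text sections, each introduced by a header line of the form `<<<name>>>`, and leaves all interpretation to the server. The following is a minimal sketch of how such output could be split into sections, assuming that header convention and whitespace-separated fields; the function name and the sample output are illustrative, not Check_MK's own code.

```python
def parse_agent_output(text):
    """Split agent output into {section_name: [list of whitespace-split lines]}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.startswith("<<<") and line.endswith(">>>"):
            # Header line starts a new section; options after ':' are ignored here
            name = line[3:-3].split(":", 1)[0]
            current = sections.setdefault(name, [])
        elif current is not None and line:
            current.append(line.split())
    return sections


# Hypothetical sample output, loosely modelled on a Linux agent
sample = """<<<check_mk>>>
Version: 1.1.13
<<<df>>>
/dev/sda1 ext3 20642428 4123456 15470484 22% /
<<<uptime>>>
1234567.89 4567890.12
"""

sections = parse_agent_output(sample)
print(sorted(sections))          # ['check_mk', 'df', 'uptime']
print(sections["df"][0][-1])     # /
```

Keeping the agent "dumb" in this way is what makes a large rollout easy: every host runs the same agent, and all decisions about what to check happen centrally.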
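Rule-based thresholds can be pictured as an ordered list of rules, each restricted by host tags and service-name patterns, where the first matching rule supplies the parameters. The sketch below assumes this first-match behaviour; the `Rule` structure, the tag names and the `fs_` service names are hypothetical and only loosely modelled on Check_MK's rule sets.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    value: tuple              # parameters, e.g. (warn %, crit %) for a filesystem
    host_tags: frozenset      # tags a host must carry; empty means "any host"
    service_patterns: list    # regexes matched against the start of the service name


# Hypothetical rules: stricter thresholds for the root filesystem on POS systems
RULES = [
    Rule((90.0, 95.0), frozenset({"store", "pos"}), [r"fs_/$"]),
    Rule((80.0, 90.0), frozenset(), [r"fs_"]),
]


def effective_parameters(host_tags, service, rules, default=None):
    """Return the value of the first rule matching both host tags and service."""
    for rule in rules:
        if rule.host_tags <= set(host_tags) and any(
            re.match(p, service) for p in rule.service_patterns
        ):
            return rule.value
    return default


print(effective_parameters({"store", "pos"}, "fs_/", RULES))  # (90.0, 95.0)
```

Because rules match on tags and patterns rather than on individual host names, one central rule set covers every store without per-host configuration.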
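The discovery step can be sketched as: run a ping scan with nmap's grepable output (for example `nmap -sn -oG - 10.20.30.0/24`), extract the hosts reported as up, and compare them with the hosts already being monitored. This is a minimal sketch under those assumptions; the helper names are hypothetical, the addresses are made up, and the `cmk -I` command lines are only built, not executed, since a new host must first be added to the Check_MK configuration.

```python
import ipaddress
import re

# Host lines in nmap's grepable output (-oG), e.g.
# "Host: 10.20.30.41 (pos-01.store)\tStatus: Up"
HOST_UP = re.compile(r"^Host:\s+(\S+)\s+\(([^)]*)\)\s+Status:\s+Up")


def hosts_up(grepable_output):
    """Return {ip: name} for every host nmap reported as up."""
    found = {}
    for line in grepable_output.splitlines():
        m = HOST_UP.match(line)
        if m:
            ip, name = m.groups()
            found[ip] = name or ip  # fall back to the IP if there is no DNS name
    return found


def new_hosts(scan, known_ips):
    """IPs seen in the scan that are not yet monitored, in numeric order."""
    return sorted(set(scan) - set(known_ips), key=ipaddress.ip_address)


def inventory_commands(ips):
    # Hypothetical follow-up: one inventory run ('cmk -I') per new host
    return [["cmk", "-I", ip] for ip in ips]


scan_output = (
    "# comment lines like this one are ignored\n"
    "Host: 10.20.30.1 (router.store)\tStatus: Up\n"
    "Host: 10.20.30.41 ()\tStatus: Up\n"
)
scan = hosts_up(scan_output)
print(inventory_commands(new_hosts(scan, {"10.20.30.1"})))
# [['cmk', '-I', '10.20.30.41']]
```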
Detail of the store aggregation
The states of all of a store's systems are combined into one overall status. This aggregation is done with Check_MK Business Intelligence (BI), which applies a generically formulated rule set to whatever systems exist in a store. The rule set consists of 26 different rules. This avoids the considerable effort of configuring each store's aggregation explicitly.
The central instance queries the overall status of every store, using Check_MK's JSON-based web service. It requests the status of each store's aggregation every minute, which ensures that the information at the central site reflects the current status in the store.
To present the environment clearly, a dashboard was created in Check_MK Multisite; it serves as an overview page and is displayed continuously on two 55-inch TFT monitors. The dashboard is made up of individual views (dashlets) which, among other things, list stores with connection problems and show host and service problems in separate lists.
As part of the project, NagVis was extended with Geomap functionality. Using the GPS coordinates of all locations and freely available map material from the OpenStreetMap project, NagVis creates a map on which all locations are shown.
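The idea of one generic rule set for all stores can be illustrated with a "worst state" aggregation: hosts are grouped by role, each group takes the worst state of its members, and the store takes the worst state of its groups. This is a minimal sketch, not the 26 BI rules used in the project; the role names are hypothetical, and the severity order (CRIT ranked above UNKNOWN) is an assumption about how "worst" is defined.

```python
# Nagios state codes
OK, WARN, CRIT, UNKNOWN = 0, 1, 2, 3

# Assumed severity order for "worst state": OK < WARN < UNKNOWN < CRIT
SEVERITY = {OK: 0, WARN: 1, UNKNOWN: 2, CRIT: 3}


def worst(states):
    """Return the most severe state; an empty group counts as OK."""
    return max(states, key=SEVERITY.__getitem__, default=OK)


def aggregate_store(hosts):
    """hosts: {hostname: (role, state)} -> (overall state, {role: state})."""
    by_role = {}
    for role, state in hosts.values():
        by_role.setdefault(role, []).append(state)
    groups = {role: worst(states) for role, states in by_role.items()}
    return worst(groups.values()), groups


# Hypothetical store: the same function works whatever systems a store has
store = {
    "router": ("network", OK),
    "pos-01": ("pos", WARN),
    "pos-02": ("pos", UNKNOWN),
    "srv-01": ("server", OK),
}
print(aggregate_store(store))  # (3, {'network': 0, 'pos': 3, 'server': 0})
```

Since the rule refers to roles rather than named hosts, a store with three POS systems and one with ten are aggregated by exactly the same logic.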
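The once-a-minute polling can be sketched with the Python standard library. This assumes that a Multisite view can be requested with `output_format=json` and that the response is a JSON list of rows whose first row holds the column names; the view name, column names and host URL are hypothetical, and authentication is omitted.

```python
import json
import time
import urllib.parse
import urllib.request

POLL_INTERVAL = 60  # seconds: the central instance polls once a minute


def view_url(base, view_name):
    """Build a Multisite view URL asking for JSON output (URL layout assumed)."""
    query = urllib.parse.urlencode({"view_name": view_name, "output_format": "json"})
    return f"{base}/view.py?{query}"


def rows_as_dicts(payload):
    """Turn a JSON list of rows, whose first row holds the column names, into dicts."""
    rows = json.loads(payload)
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def poll_store(base, view_name, handle, interval=POLL_INTERVAL):
    """Fetch a store's aggregation status forever, once per `interval` seconds.

    `handle` receives the parsed rows, or None if the store could not be
    reached; such failures could feed a "stores with connection problems" list.
    """
    url = view_url(base, view_name)
    while True:
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                handle(rows_as_dicts(resp.read().decode("utf-8")))
        except OSError:  # URLError and socket timeouts are OSError subclasses
            handle(None)
        time.sleep(interval)
```

A call such as `poll_store("http://store0815.example/store0815/check_mk", "aggr_all", print)` (hypothetical host, site and view names) would print the store's aggregation rows every minute.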
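To place a location on OpenStreetMap material, its GPS coordinates must be mapped onto the map tiles, which use the Web Mercator "slippy map" tiling scheme. The following sketch shows that standard conversion; it illustrates the principle and is not NagVis's own implementation.

```python
import math


def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Return the (x, y) OpenStreetMap tile containing a WGS84 coordinate."""
    n = 2 ** zoom                      # tiles per axis at this zoom level
    lat = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    # Web Mercator: asinh(tan(lat)) == ln(tan(lat) + sec(lat))
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    # Clamp so that longitude +180 still falls on the last tile column
    return min(x, n - 1), min(y, n - 1)


# Approximate coordinates of Hannover at zoom level 10
print(latlon_to_tile(52.37, 9.74, 10))
```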
The new Geomap function in NagVis
The central Nagios system was designed and built during a three-day workshop. After an introductory phase and an exchange of information, the basic installation of the central Nagios instance was completed in a single day thanks to OMD. During this step several new Check_MK checks were developed (for example for monitoring Bintec routers), and these have since been incorporated into the official version of Check_MK.
A few weeks of design work and development of the connections to the stores followed, and then a further four-day on-site visit. The first two days were used to prepare the big rollout to all stores, which started towards the end of the second day. Within six hours a total of 1200 Nagios systems were installed: more than three new installations per minute!
On the third day, with the stores installed, all systems were connected to the central instance. At the same time, the Geomap with the information from all stores was put into service.
Thanks to close cooperation as partners, a suitable solution for EDEKA was developed and successfully implemented with little effort. The entire project is built on software with no licence costs. At the same time, knowledge was passed on and the customer's monitoring know-how was strengthened.
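A Check_MK check of that era typically pairs an inventory function, which decides which items become services, with a check function that evaluates one item. The sketch below is loosely modelled on that style; the WAN-link data layout, status codes, service name and thresholds are hypothetical and it is not the Bintec check actually contributed. In a real plugin `check_info` is provided by Check_MK; here a plain dict stands in for it so the example runs on its own.

```python
# Stand-in for the registry that Check_MK provides to check plugins
check_info = {}


def inventory_bintec_wan(info):
    """Hypothetical: create one service per WAN link that is currently up."""
    return [(name, None) for name, status in info if status == "1"]


def check_bintec_wan(item, params, info):
    """Hypothetical: status "1" means the link is up, anything else is down."""
    for name, status in info:
        if name == item:
            if status == "1":
                return (0, "OK - link %s is up" % item)
            return (2, "CRIT - link %s is down (status %s)" % (item, status))
    return (3, "UNKNOWN - link %s not found" % item)


# (check function, service description, has perfdata, inventory function)
check_info["bintec_wan"] = (check_bintec_wan, "WAN %s", 0, inventory_bintec_wan)
```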
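As a quick check of the rollout rate, using the figures given above:

```latex
\frac{1200\ \text{installations}}{6\ \text{h} \times 60\ \text{min/h}}
  = \frac{1200}{360\ \text{min}}
  \approx 3.3\ \text{installations per minute}
  \quad (\text{one every } 18\ \text{s})
```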