As we mentioned in our article, DevOps is about gaining visibility into key metrics, communicating that information to the stakeholders, and reacting to that information with automation where and when possible.
The first step in this process is gaining visibility, and for that there is a wide array of monitoring tools available, both open source and proprietary. Below is a list of the monitoring tools we use in our DevOps consulting practice and that are worth checking out in this landscape. Please note, in the interest of fairness, this list is in alphabetical order:
CopperEgg is a SaaS-based cloud monitoring service for public and private clouds. CopperEgg services include monitoring of:
- Real User Experience
Its products seamlessly integrate with cloud providers, including Amazon EC2 and Rackspace. The distinguishing feature of CopperEgg is its cloud-friendly pricing, billed by the instance hour. Additionally, it offers excellent process-level metrics and instance-sizing recommendations.
Data Dog, Cloud Monitoring as a Service, is a monitoring service for
Data Dog is compatible with most existing software, platforms and databases. It takes a very social approach to handling events and data. It seamlessly integrates metrics and events across the entire DevOps stack, thereby providing a single view for both on-premise and cloud deployments. It also provides a team dashboard for viewing and coordinating over these events.
Ganglia is an open source, distributed and highly-scalable cluster monitoring tool based on a hierarchical model that supports a federation of clusters. It allows monitoring of memory, disk, CPU usage and other aspects of vital cluster health, and makes that information available for offline analysis. Ganglia has integrations with many existing open source solutions like Hadoop and Logstash. The Ganglia monitoring system consists of 3 main packages:
- The Ganglia monitor, named gmond, runs on every node that's being monitored.
- The Ganglia meta daemon, named gmetad, polls and collects metrics from gmond on remote machines.
- The Ganglia web frontend, named ganglia-web, resides on the same machine as gmetad and reads the RRD files that gmetad writes.
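To make this division of labor concrete, here is a minimal Python sketch of the kind of parsing a gmetad-like collector performs on the XML that gmond reports. The sample document is hand-written for illustration and only modeled on gmond's CLUSTER / HOST / METRIC layout; the cluster name, host names, values and the `collect_metrics` helper are all assumptions, not real Ganglia output or code.

```python
import xml.etree.ElementTree as ET

# Hand-written sample modeled on gmond's XML report.
# Every name and value below is hypothetical.
SAMPLE_XML = """
<GANGLIA_XML VERSION="3.6.0" SOURCE="gmond">
  <CLUSTER NAME="web" OWNER="unspecified">
    <HOST NAME="web01" IP="10.0.0.1">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=""/>
      <METRIC NAME="mem_free" VAL="204800" TYPE="float" UNITS="KB"/>
    </HOST>
    <HOST NAME="web02" IP="10.0.0.2">
      <METRIC NAME="load_one" VAL="1.75" TYPE="float" UNITS=""/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>
"""


def collect_metrics(xml_text):
    """Return {host_name: {metric_name: value}} from a gmond-style XML report."""
    root = ET.fromstring(xml_text.strip())
    metrics = {}
    for host in root.iter("HOST"):
        metrics[host.get("NAME")] = {
            metric.get("NAME"): float(metric.get("VAL"))
            for metric in host.iter("METRIC")
        }
    return metrics


if __name__ == "__main__":
    for host, values in collect_metrics(SAMPLE_XML).items():
        print(host, values)
```

In a real deployment the XML would be read from gmond over the network and the values written to RRD files rather than printed; the sketch only shows the per-host aggregation step.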
LogStash is an open source log management solution. Broadly speaking, it is a large system that encompasses the following activities:
- Search of logs.
LogStash works with ElasticSearch to provide search and storage of the logs and uses Kibana to provide a dashboard for viewing the events.
Nagios is an open source infrastructure monitoring and alerting tool developed by Ethan Galstad, et al. It offers monitoring of computer systems, networks, infrastructure, servers, switches and applications. Nagios is one of the most common solutions for monitoring uptime. Nagios supports three agents:
- NRPE: Nagios Remote Plugin Executor monitors remote systems using scripts hosted on the remote systems.
- NRDP: Nagios Remote Data Processor offers a flexible, customizable data transport mechanism.
- NSClient++: monitors Windows systems.
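The scripts that NRPE runs on a remote system follow the Nagios plugin convention: they report one line of status text and an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). Below is a minimal Python sketch of the decision logic of such a check. The function name `check_disk`, the thresholds and the message format are hypothetical choices for illustration, not part of Nagios itself.

```python
# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_disk(used_percent, warn=80.0, crit=90.0):
    """Map a disk usage percentage to a (exit_code, status_line) pair.

    The thresholds are illustrative defaults, not Nagios defaults.
    """
    if not 0.0 <= used_percent <= 100.0:
        return UNKNOWN, f"DISK UNKNOWN - invalid reading {used_percent}"
    if used_percent >= crit:
        return CRITICAL, f"DISK CRITICAL - {used_percent:.1f}% used"
    if used_percent >= warn:
        return WARNING, f"DISK WARNING - {used_percent:.1f}% used"
    return OK, f"DISK OK - {used_percent:.1f}% used"


if __name__ == "__main__":
    import shutil
    import sys

    usage = shutil.disk_usage("/")
    code, message = check_disk(100.0 * usage.used / usage.total)
    print(message)   # Nagios displays this line as the service status text
    sys.exit(code)   # Nagios derives the service state from the exit code
```

Keeping the threshold logic in a pure function makes the check easy to test separately from the code that reads the disk.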
New Relic is a SaaS solution developed for real-time monitoring of web and mobile applications. It runs in cloud, on-premise and hybrid environments. New Relic offers a dashboard to deploy and use more than 50 plugins from various technology partners, covering PaaS services, caching, databases, web servers and queuing. It's offered in the following flavors:
- APM: Application Performance Management
- Mobile: For native mobile applications
- Insights: Big Data Analytics for business decision making
- Servers: Monitors cloud and data centers
- Browser: Collect real-time data from end users through browsers
- Platform: Monitor entire stacks.
Stack Driver provides full-stack management for public and private cloud resources, with the aim of reducing alert fatigue to as near zero as possible. It automatically classifies resources into functional clusters and compares nodes against their peers. In addition, it uses trends to raise alerts earlier than fixed thresholds would.
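To illustrate why a trend can warn earlier than a threshold, here is a minimal Python sketch that fits a straight line to recent usage samples and estimates how long until a resource is full. This is not Stack Driver's actual algorithm, which the article does not describe; the function name `hours_until_full` and the sample data are assumptions made for the example.

```python
def hours_until_full(samples, capacity=100.0):
    """Estimate hours from the last sample until usage reaches capacity.

    samples: list of (hour, used_percent) pairs, oldest first.
    Uses an ordinary least-squares line through the samples.
    Returns None when there is too little data or usage is not growing.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    if slope <= 0:
        return None
    intercept = mean_v - slope * mean_t
    t_full = (capacity - intercept) / slope
    return max(0.0, t_full - samples[-1][0])


if __name__ == "__main__":
    # Hypothetical disk usage growing by 5 points per hour.
    usage = [(0, 50.0), (1, 55.0), (2, 60.0), (3, 65.0)]
    print(hours_until_full(usage))
```

With these samples a fixed 90% threshold would still be silent at 65% usage, while the fitted trend already indicates that the disk fills in a matter of hours.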
Splunk, an American multi-national corporation, provides software for searching, monitoring and analyzing data. Its log management product is considered to be an industry leader. Splunk provides both hosted and on-premise features. Splunk’s log management serves to lower risks, improve security and reduce operational complexity.
Sumo Logic is a cloud-based log management service that features an elastic petabyte-scale platform to handle and analyze large volumes of enterprise log data efficiently. It is built on a globally distributed data retention architecture, which makes the data available for instant analysis and reduces backup and storage costs. It uses a global view to correlate various metrics and provide additional insights and alerts.
Zabbix, an open source monitoring tool for network services, servers and network hardware, was developed by Zabbix SIA. It's fairly recent and efficiently handles autoscaled instances in the cloud. Zabbix modules include:
Its backend is written in C and its frontend in PHP. Zabbix's monitoring options include:
- Simple Checks: verifying availability of standard services (SMTP, HTTP) without any installation
- Zabbix Agent: installed on a Unix or Windows machine
- Monitoring via SNMP, TCP and ICMP checks using custom parameters
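The idea behind an agentless availability check can be sketched in a few lines of Python: try to open a TCP connection to the service port and report whether it succeeded. This illustrates the concept only and is not how Zabbix implements its simple checks; the function name `tcp_service_up` and the host in the usage example are hypothetical.

```python
import socket


def tcp_service_up(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical targets: SMTP on port 25 and HTTP on port 80.
    for name, port in (("SMTP", 25), ("HTTP", 80)):
        state = "up" if tcp_service_up("mail.example.com", port) else "down"
        print(f"{name}: {state}")
```

Nothing has to be installed on the monitored machine, which is the trade-off against an agent: the check sees that the port answers, but nothing about what happens inside the host.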
Which monitoring tool do you use and why? We'd like to know your favorite monitoring tool. Drop a comment in the section below.