Book description
Written by Ganglia designers and maintainers, this book shows you how to collect and visualize metrics from clusters, grids, and cloud infrastructures at any scale. Want to track CPU utilization from 50,000 hosts every ten seconds? Ganglia is just the tool you need, once you know how its main components work together. This hands-on book helps experienced system administrators take advantage of Ganglia 3.x.
Learn how to extend the base set of metrics you collect, fetch current values, see aggregate views of metrics, and observe time-series trends in your data. You’ll also examine real-world case studies of Ganglia installs that feature challenging monitoring requirements.
- Determine whether Ganglia is a good fit for your environment
- Learn how Ganglia’s gmond and gmetad daemons build a metric collection overlay
- Plan for scalability early in your Ganglia deployment, with valuable tips and advice
- Take data visualization to a new level with gweb, Ganglia’s web frontend
- Write plugins to extend gmond’s metric-collection capability
- Troubleshoot issues you may encounter with a Ganglia installation
- Integrate Ganglia with the sFlow and Nagios monitoring systems
Contributors include Robert Alexander, Jeff Buchbinder, Frederiko Costa, Alex Dean, Dave Josephsen, Peter Phaal, and Daniel Pocock. Case study writers include John Allspaw, Ramon Bastiaans, Adam Compton, Andrew Dibble, and Jonah Horowitz.
Table of contents
- Preface
- 1. Introducing Ganglia
- 2. Installing and Configuring Ganglia
- 3. Scalability
- 4. The Ganglia Web Interface
- 5. Managing and Extending Metrics
  - gmond: Metric Gathering Agent
  - Base Metrics
  - Extended Metrics
  - Extending gmond with Modules
  - Extending gmond with gmetric
  - How to Choose Between C/C++, Python, and gmetric
  - XDR Protocol
  - Java and gmetric4j
  - Real World: GPU Monitoring with the NVML Module
- 6. Troubleshooting Ganglia
  - Overview
  - Useful Resources
  - Monitoring the Monitoring System
  - General Troubleshooting Mechanisms and Tools
  - Common Deployment Issues
  - Typical Problems and Troubleshooting Procedures
    - Web Issues
      - Blank page appears in the browser
      - Browser displays white page with error message
      - Cluster view shows uppercase hostname, link doesn’t work
      - Host appears in the wrong cluster
      - Host appears multiple times in web, different variations of the hostname (or IP address)
      - Some hosts appear with shortname instead of FQDN
      - One or more hosts don’t appear in the web interface
      - Hosts don’t appear/data stale after UDP aggregator restarted
      - Dead/retired hosts still appearing in the Web
      - Wrong CPU count or other metrics are missing
      - Fonts in graphs are too big or too small
      - Spikes in graphs
      - Custom metrics don’t appear
      - Custom metric’s value is truncated
      - Gaps appear randomly in the graphs
      - Some host is completely missing from the cluster
      - gmetad hierarchy and federation; some grids don’t appear on the Web
    - gmetad Issues
    - rrdcached Issues
    - gmond Issues
- 7. Ganglia and Nagios
- 8. Ganglia and sFlow
- 9. Ganglia Case Studies
- A. Advanced Metric Configuration and Debugging
- B. Ganglia and Hadoop/HBase
- Index
- About the Authors
- Colophon
- Copyright
Product information
- Title: Monitoring with Ganglia
- Release date: November 2012
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781449329709