Book description
Monitoring is an essential part of a modern production system. If you can’t monitor a service, you don’t know what’s happening, and if you’re blind to what’s happening, your service can’t be reliable. In this excerpt from O’Reilly’s book Site Reliability Engineering, you’ll learn how and what to monitor, using implementation-agnostic best practices.
Author Rob Ewaschuk explains basic principles and best practices that he and other members of Google’s Site Reliability Engineering (SRE) teams use for building successful monitoring and alerting systems. You’ll learn guidelines for determining which issues are serious enough to involve human intervention, and how to deal with issues that aren’t.
Complete with case studies describing monitoring efforts with Bigtable and Gmail, this article helps you ask the right questions—regardless of your organization’s size or the complexity of your service or system.
About the author:
Rob Ewaschuk is a Staff Software Engineer at Google. He has a strong working background in high-availability, low-latency, many-petabyte globally distributed data storage and serving systems.
About Site Reliability Engineering:
This book is a collection of essays and articles written by key members of Google’s Site Reliability Engineering (SRE) teams. You’ll learn the principles and practices that enable Google engineers to make systems more scalable, reliable, and efficient, lessons you can apply directly to your organization.
Table of contents
- Monitoring Distributed Systems
  - Definitions
  - Why Monitor?
  - Setting Reasonable Expectations for Monitoring
  - Symptoms Versus Causes
  - Black-Box Versus White-Box
  - The Four Golden Signals
  - Worrying About Your Tail (or, Instrumentation and Performance)
  - Choosing an Appropriate Resolution for Measurements
  - As Simple as Possible, No Simpler
  - Tying These Principles Together
  - Monitoring for the Long Term
  - Conclusion
Product information
- Title: Monitoring Distributed Systems
- Author(s): Rob Ewaschuk
- Release date: August 2016
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491965245