Cloud Observability in Action

Book description

Don’t fly blind. Observability gives you actionable insights into your cloud native systems—from pinpointing errors, to increasing developer productivity, to tracking compliance.

Observability is the difference between an error message and an error explanation that comes with a recipe for resolving the error. You know exactly which service is affected, who’s responsible for its repair, and even how it can be optimized in the future. Cloud Observability in Action teaches you how to set up an observability system that learns from a cloud application’s signals, logging, and monitoring, all using free and open source tools.

In Cloud Observability in Action, you will learn how to:

  • Apply observability in cloud native systems
  • Understand observability signals, including their costs and benefits
  • Apply good practices around instrumentation and signal collection
  • Deliver dashboarding, alerting, and SLOs/SLIs at scale
  • Choose the correct signal types for given roles or tasks
  • Pick the right observability tool for any given function
  • Communicate the benefits of observability to management

A well-designed observability system provides insight into bugs and performance issues in cloud native applications. It helps your development team understand the impact of code changes, measure optimizations, and track user experience. Best of all, observability can even automate your error handling so that machine users apply their own fixes, putting an end to 3 a.m. calls for emergency outages.

About the Technology
Cloud native systems are made up of hundreds of moving parts. When something goes wrong, it’s not enough to know there is a problem—you need to know where it is, what it is, and how to fix it. This book takes you beyond traditional monitoring, explaining observability systems that turn application telemetry into actionable insights.

About the Book
Cloud Observability in Action gives you the background and techniques you need to successfully introduce observability into cloud-based serverless and Kubernetes environments. In it, you’ll learn to use open standards and tools like OpenTelemetry, Prometheus, and Grafana to build your own observability system and end reliance on proprietary software. You’ll discover insights from different telemetry signals, including logs, metrics, traces, and profiles. Plus, the book’s rigorous cost-benefit analysis ensures you’re getting a real return on your observability investment.

What’s Inside
  • Observability in and of cloud native systems
  • Dashboarding, alerting, and SLOs/SLIs at scale
  • Signal types for any role or task
  • State-of-the-art open source observability tools


About the Reader
For application developers, platform owners, DevOps engineers, and SREs.

About the Author
Michael Hausenblas is a Product Owner in the AWS Open Source Observability team.

Quotes
Incredible! This book gets you up to speed, and sets you up for the future. I was especially impressed with the depth and detail for continuous profiling, whose impact is only starting to be understood.
- Frederic Branczyk, Polar Signals

Pairs a wealth of knowledge on cloud native environments with best practices and insights into observability. A must-read for cloud engineers.
- Daniel Gomez Blanco, Author of Practical OpenTelemetry

Take the right steps to protect your precious infrastructure from those pesky outages and incidents.
- Kit Merker, Nobl9

Does a fantastic job distilling the key concepts for cloud observability. An important guide for practitioners.
- Ken Finnigan, Lumigo

Table of contents

  1. inside front cover
  2. Cloud Observability in Action
  3. Copyright
  4. dedication
  5. contents
  6. front matter
    1. preface
    2. acknowledgments
    3. about this book
      1. Who should read this book
      2. How this book is organized
      3. About the code
      4. liveBook discussion forum
      5. Online resources
    4. about the author
    5. about the cover illustration
  7. 1 End-to-end observability
    1. 1.1 What is observability?
    2. 1.2 Observability use cases
    3. 1.3 Roles and goals
    4. 1.4 Example microservices app
    5. 1.5 Challenges and how observability helps
      1. 1.5.1 Return on investment
      2. 1.5.2 Signal correlation
      3. 1.5.3 Portability
    6. Summary
  8. 2 Signal types
    1. 2.1 Reference example
    2. 2.2 Assessing instrumentation costs
    3. 2.3 Logs
      1. 2.3.1 Instrumentation
      2. 2.3.2 Telemetry
      3. 2.3.3 Costs and benefits
      4. 2.3.4 Observability with logs
    4. 2.4 Metrics
      1. 2.4.1 Instrumentation
      2. 2.4.2 Telemetry
      3. 2.4.3 Costs and benefits
      4. 2.4.4 Observability with metrics
    5. 2.5 Traces
      1. 2.5.1 Instrumentation
      2. 2.5.2 Telemetry
      3. 2.5.3 Costs and benefits
      4. 2.5.4 Observability with traces
    6. 2.6 Selecting signals
    7. Summary
  9. 3 Sources
    1. 3.1 Selecting sources
    2. 3.2 Compute-related sources
      1. 3.2.1 Basics
      2. 3.2.2 Containers
      3. 3.2.3 Kubernetes
      4. 3.2.4 Serverless compute
    3. 3.3 Storage-related sources
      1. 3.3.1 Relational databases and NoSQL data stores
      2. 3.3.2 File systems and object stores
    4. 3.4 Network-related sources
      1. 3.4.1 Network interfaces
      2. 3.4.2 Higher-level network sources
    5. 3.5 Your code
      1. 3.5.1 Instrumentation
      2. 3.5.2 Proxy sources
    6. Summary
  10. 4 Agents and instrumentation
    1. 4.1 Log routers
      1. 4.1.1 Fluentd and Fluent Bit
      2. 4.1.2 Other log routers
    2. 4.2 Metrics collection
      1. 4.2.1 Prometheus
      2. 4.2.2 Other metrics agents
    3. 4.3 OpenTelemetry
      1. 4.3.1 Instrumentation
      2. 4.3.2 Collector
    4. 4.4 Other agents
    5. 4.5 Selecting an agent
      1. 4.5.1 Security for and of the agent
      2. 4.5.2 Agent performance and resource usage
      3. 4.5.3 Agent nonfunctional requirements
    6. Summary
  11. 5 Backend destinations
    1. 5.1 Backend destination terminology
    2. 5.2 Backend destinations for logs
      1. 5.2.1 Cloud providers
      2. 5.2.2 Open source log backends
      3. 5.2.3 Commercial offerings for log backends
    3. 5.3 Backend destinations for metrics
      1. 5.3.1 Cloud providers
      2. 5.3.2 Open source metrics backends
      3. 5.3.3 Commercial offerings for metrics backends
    4. 5.4 Backend destinations for traces
      1. 5.4.1 Cloud providers
      2. 5.4.2 Open source traces backends
      3. 5.4.3 Commercial offerings for trace backends
    5. 5.5 Columnar data stores
    6. 5.6 Selecting backend destinations
      1. 5.6.1 Costs
      2. 5.6.2 Open standards
      3. 5.6.3 Back pressure
      4. 5.6.4 Cardinality and queries
    7. Summary
  12. 6 Frontend destinations
    1. 6.1 Frontends
      1. 6.1.1 Grafana
      2. 6.1.2 Kibana and OpenSearch Dashboards
      3. 6.1.3 Other open source frontends
      4. 6.1.4 Cloud providers and commercial frontends
    2. 6.2 All-in-ones
      1. 6.2.1 CNCF Jaeger
      2. 6.2.2 CNCF Pixie
      3. 6.2.3 Zipkin
      4. 6.2.4 Apache SkyWalking
      5. 6.2.5 SigNoz
      6. 6.2.6 Uptrace
      7. 6.2.7 Commercial offerings
    3. 6.3 Selecting frontends and all-in-ones
    4. Summary
  13. 7 Cloud operations
    1. 7.1 Incident management
      1. 7.1.1 Health and performance monitoring
      2. 7.1.2 Handling the incident
      3. 7.1.3 Learning from the incident after the fact
    2. 7.2 Alerting
      1. 7.2.1 Prometheus alerting
      2. 7.2.2 Using Grafana for alerting
      3. 7.2.3 Cloud providers
    3. 7.3 Usage tracking
      1. 7.3.1 Users
      2. 7.3.2 Costs
    4. Summary
  14. 8 Distributed tracing
    1. 8.1 Intro and terminology
      1. 8.1.1 Motivational example
      2. 8.1.2 Terminology
      3. 8.1.3 Use cases
    2. 8.2 Using distributed tracing in a microservices app
      1. 8.2.1 Example app overview
      2. 8.2.2 Implementing the example app
      3. 8.2.3 The “happy path”
      4. 8.2.4 Exploring a failure in the example app
    3. 8.3 Practical considerations
      1. 8.3.1 Sampling
      2. 8.3.2 Observability tax
      3. 8.3.3 Traces vs. metrics vs. logs
    4. Summary
  15. 9 Developer observability
    1. 9.1 Continuous profiling
      1. 9.1.1 The humble beginnings
      2. 9.1.2 Common technologies
      3. 9.1.3 Open source CP tooling
      4. 9.1.4 Commercial continuous profiling offerings
      5. 9.1.5 Using continuous profiling to assess continuous profiling
    2. 9.2 Developer productivity
      1. 9.2.1 Challenges
      2. 9.2.2 Tooling
    3. 9.3 Tooling considerations
      1. 9.3.1 Symbolization
      2. 9.3.2 Storing profiles
      3. 9.3.3 Querying profiles
      4. 9.3.4 Correlation
      5. 9.3.5 Standards
      6. 9.3.6 Using tooling in production
    4. Summary
  16. 10 Service level objectives
    1. 10.1 The fundamentals of SLOs
      1. 10.1.1 Types of services
      2. 10.1.2 Service level indicator
      3. 10.1.3 Service level objective
      4. 10.1.4 Service level agreement
    2. 10.2 Implementing SLOs
      1. 10.2.1 High-level example
      2. 10.2.2 Using Prometheus to implement SLOs
      3. 10.2.3 Commercial SLO offerings
    3. 10.3 Considerations
    4. Summary
  17. 11 Signal correlation
    1. 11.1 Correlation fundamentals
      1. 11.1.1 Correlation with OpenTelemetry
      2. 11.1.2 Correlating traces
      3. 11.1.3 Correlating metrics
      4. 11.1.4 Correlating logs
      5. 11.1.5 Correlating profiles
    2. 11.2 Using Prometheus, Jaeger, and Grafana for correlation
      1. 11.2.1 Metrics–traces correlation example setup
      2. 11.2.2 Using metrics–traces correlation
    3. 11.3 Signal correlation support in commercial offerings
    4. 11.4 Considerations
      1. 11.4.1 Early days
      2. 11.4.2 Signals
      3. 11.4.3 User experience
    5. 11.5 Conclusion
    6. Summary
  18. Appendix. A Kubernetes end-to-end example
    1. A.1 Overview
    2. A.2 Prerequisites
    3. A.3 Demo walk-through
      1. A.3.1 Installing the demo
      2. A.3.2 Using the demo
  19. index
  20. inside back cover

Product information

  • Title: Cloud Observability in Action
  • Author(s): Michael Hausenblas
  • Release date: January 2024
  • Publisher(s): Manning Publications
  • ISBN: 9781633439597