Cloud Native Observability

Book description

Cloud native technologies allow you to build scalable, resilient, and novel software architectures with idiomatic backend systems. Cloud native observability, in turn, measures how well you understand the total state of your system, with all of the complexities of highly interlinked, flexible, and scalable components running in containers on a microservices architecture in the cloud.

With this insightful guide, authors Kenichi Shibata, Rob Skillington, and Martin Mao take you through the differences between traditional and cloud native system observability. SREs, cloud native engineers, CIOs, and CTOs will learn that while many principles of cloud native and traditional systems are similar, highly scalable and dynamic cloud native systems present unique challenges to overcome.

In four succinct chapters, this report helps you explore:

  • Cloud native's impact on observability: Learn how interlinked, highly flexible, and dynamic cloud native systems present new observability challenges.
  • Challenges of cloud native in the real world: Understand performance on growing observability data.
  • Observability data growth and complexity: Learn the impact of uncontrolled data growth and weigh practical mitigation strategies.
  • Implementations of open source cloud native telemetry standards: Explore the rise of de facto standards like Prometheus, OpenTelemetry, and more.

Kenichi Shibata is a cloud native architect at esure.

Rob Skillington is the cofounder and CTO of Chronosphere.

Martin Mao is the cofounder and CEO of Chronosphere.

Table of contents

  1. The Cloud Native Impact on Observability
    1. Challenges of Cloud Native Observability
    2. Deep Dive into Observability Data
      1. Observability Data Is Growing in Scale
      2. Understanding Cardinality and Dimensionality
      3. Cloud Native Systems Are Flexible and Ephemeral
    3. The Goldilocks Zone of Cloud Native Observability
      1. Cloud Native Environments Emit Exponentially More Data Than Traditional Environments
      2. Delivering Reduced Business Outcomes
      3. Observability Practitioners Lose Focus
      4. Increasing Cost of Observability Data
    4. The Cloud Native Impact
      1. Slower Troubleshooting
      2. Tools Become Unreliable
      3. Use Context to Troubleshoot Faster
    5. The Three Phases of Observability: An Outcome-Focused Approach
    6. Remediating at Any Phase, with Any Signal
    7. Conclusion
  2. Cloud Native Challenges in the Real World
    1. Impact of Uncontrolled Data Growth on System Performance
    2. Controlling Cost
    3. Case Study 1: Improving Performance While Gaining Huge Cost Savings
      1. The Challenge
      2. Approach
    4. Impact of Uncontrolled Data Growth on Observability Reliability
    5. Poor Developer Experience Caused by Poor Observability Data
    6. Case Study 2: Increased Observability Reliability and Improved Developer Experience
      1. The Challenge
      2. Approach
    7. Making Way for Fast-Paced Innovation
    8. Regulatory Requirements
    9. Case Study 3: Navigating Observability Challenges in Balancing Rapid Fintech Growth and SLA Compliance
      1. The Challenge
      2. Approach
    10. Conclusion
  3. Strategies for Controlling Observability Data Growth and Complexity
    1. Emerging Solution Using a Repeatable Framework
    2. Using FinOps as an Inspiration
    3. Observability Data Optimization Cycle
    4. Step 0: Centralized Governance
      1. Autonomy and Allocations to Increase Responsibility and Improve Responsiveness
      2. Usable Capacity by Allocation to Optimize Use Cases
      3. Using Observability Team as Consultants Instead of as Bottlenecks
    5. Framework Components
    6. Step 1: Analyze
      1. Traffic Analysis
      2. Usage Analysis
      3. Combining Traffic and Usage Analysis to Make Decisions
      4. Output of Analyze Step
    7. Step 2: Refine
      1. Dropping
      2. Retention
      3. Resolution
      4. Downsampling
      5. Aggregation
      6. Output of Refine Step
    8. Step 3: Operate
      1. Expanding Visibility and Coverage
      2. Freeing Up More of the Observability Team’s Time to Tackle Strategic Projects
    9. Conclusion
  4. Open Source Telemetry Standards: Prometheus, OpenTelemetry, and Beyond
    1. Instrumentation Before Prometheus and OTel
      1. Data Collection Is Controlled by Users
    2. Prometheus
      1. Interoperability Between Different Observability Tools
      2. Standardization to Prometheus
      3. Prometheus Reliability
      4. Prometheus: The Good
      5. Prometheus: The Not-So-Good
    3. OpenTelemetry
      1. What Is OTel?
      2. The OTel Specification
      3. OTel: The Promise
      4. OTel: The Reality
      5. Where to Start with OTel
      6. Implications of OTel’s Approach
    4. Fluent Bit
  5. Conclusion
  6. About the Authors

Product information

  • Title: Cloud Native Observability
  • Author(s): Kenichi Shibata, Rob Skillington, Martin Mao
  • Release date: February 2024
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098158941