Designing Distributed Systems, 2nd Edition

Book description

Every distributed system strives for reliability, performance, and quality, but building such a system is hard. Establishing a set of design patterns enables software developers and system architects to use a common language to describe their systems and learn from the patterns and practices developed by others.

The popularity of containers and Kubernetes paves the way for core distributed system patterns and reusable containerized components. This practical guide presents a collection of repeatable, generic patterns and practices drawn from some of the highest-performing distributed systems in use today. These patterns make the systems you build far more approachable and efficient, even if you've never built a distributed system before.

Author Brendan Burns demonstrates how you can adapt existing software design patterns for designing and building reliable distributed applications. Systems engineers and application developers will learn how these long-established patterns provide a common language and framework for dramatically increasing the quality of their systems.

This fully updated second edition includes new chapters on AI inference, AI training, and building robust systems for the real world.

  • Understand how patterns and reusable components enable the rapid development of reliable distributed systems
  • Use the sidecar, adapter, and ambassador patterns to split your application into a group of containers on a single machine
  • Explore loosely coupled multinode distributed patterns for replication, scaling, and communication between components
  • Learn distributed system patterns for large-scale batch data processing covering work queues, event-based processing, and coordinated workflows

Table of contents

  1. Preface
    1. Who Should Read This Book
    2. Why I Wrote This Book
    3. The World of Distributed Systems Today
    4. Navigating This Book
    5. Conventions Used in This Book
    6. Online Resources
    7. Using Code Examples
    8. O’Reilly Online Learning
    9. How to Contact Us
    10. Acknowledgments
  2. I. Foundational Concepts
  3. 1. Introduction
    1. A Brief History of Systems Development
    2. A Brief History of Patterns in Software Development
      1. Formalization of Algorithmic Programming
      2. Patterns for Object-Oriented Programming
      3. The Rise of Open Source Software
    3. The Value of Patterns, Practices, and Components
      1. Standing on the Shoulders of Giants
      2. A Shared Language for Discussing Our Practice
      3. Shared Components for Easy Reuse
    4. Summary
  4. 2. Important Distributed System Concepts
    1. APIs and RPCs
    2. Latency
    3. Reliability
    4. Percentiles
    5. Idempotency
    6. Delivery Semantics
    7. Relational Integrity
    8. Data Consistency
    9. Orchestration and Kubernetes
    10. Health Checks
    11. Summary
  5. II. Single-Node Patterns
  6. 3. The Sidecar Pattern
    1. An Example Sidecar: Adding HTTPS to a Legacy Service
    2. Dynamic Configuration with Sidecars
    3. Modular Application Containers
      1. Hands On: Deploying the topz Container
    4. Building a Simple PaaS with Sidecars
    5. Designing Sidecars for Modularity and Reusability
      1. Parameterized Containers
      2. Define Each Container’s API
      3. Documenting Your Containers
    6. Summary
  7. 4. Ambassadors
    1. Using an Ambassador to Shard a Service
      1. Hands On: Implementing a Sharded Redis
    2. Using an Ambassador for Service Brokering
    3. Using an Ambassador to Do Experimentation or Request Splitting
      1. Hands On: Implementing 10% Experiments
    4. Summary
  8. 5. Adapters
    1. Monitoring
      1. Hands On: Using Prometheus for Monitoring
    2. Logging
      1. Hands On: Normalizing Different Logging Formats with fluentd
    3. Adding a Health Monitor
      1. Hands On: Adding Rich Health Monitoring for MySQL
    4. Summary
  9. III. Serving Patterns
  10. 6. Replicated Load-Balanced Services
    1. Stateless Services
      1. Readiness Probes for Load Balancing
      2. Hands On: Creating a Replicated Service in Kubernetes
    2. Session Tracked Services
    3. Application-Layer Replicated Services
    4. Introducing a Caching Layer
      1. Deploying Your Cache
      2. Hands On: Deploying the Caching Layer
    5. Expanding the Caching Layer
      1. Rate Limiting and Denial-of-Service Defense
      2. SSL Termination
      3. Hands On: Deploying nginx and SSL Termination
    6. Summary
  11. 7. Sharded Services
    1. Sharded Caching
      1. Why You Might Need a Sharded Cache
      2. The Role of the Cache in System Performance
      3. Replicated Sharded Caches
      4. Hands On: Deploying an Ambassador and Memcache for a Sharded Cache
    2. An Examination of Sharding Functions
      1. Selecting a Key
      2. Consistent Hashing Functions
      3. Hands On: Building a Consistent HTTP Sharding Proxy
    3. Sharded Replicated Serving
    4. Hot Sharding Systems
    5. Summary
  12. 8. Scatter/Gather
    1. Scatter/Gather with Root Distribution
      1. Hands On: Distributed Document Search
    2. Scatter/Gather with Leaf Sharding
      1. Hands On: Sharded Document Search
      2. Choosing the Right Number of Leaves
    3. Scaling Scatter/Gather for Reliability and Scale
    4. Summary
  13. 9. Functions and Event-Driven Processing
    1. Determining When FaaS Makes Sense
      1. The Benefits of FaaS
      2. The Challenges of FaaS
      3. The Need for Background Processing
      4. The Need to Hold Data in Memory
      5. The Costs of Sustained Request-Based Processing
    2. Patterns for FaaS
      1. The Decorator Pattern: Request or Response Transformation
      2. Hands On: Adding Request Defaulting Prior to Request Processing
      3. Handling Events
      4. Hands On: Implementing Two-Factor Authentication
      5. Event-Based Pipelines
      6. Hands On: Implementing a Pipeline for New User Signup
    3. Summary
  14. 10. Ownership Election
    1. Determining If You Even Need Leader Election
    2. The Basics of Leader Election
      1. Hands On: Deploying etcd
      2. Implementing Locks
      3. Hands On: Implementing Locks in etcd
      4. Implementing Ownership
      5. Hands On: Implementing Leases in etcd
    3. Handling Concurrent Data Manipulation
    4. Summary
  15. IV. Batch Computational Patterns
  16. 11. Work Queue Systems
    1. A Generic Work Queue System
      1. The Source Container Interface
      2. Work Queue API
      3. The Worker Container Interface
      4. The Shared Work Queue Infrastructure
    2. Hands On: Implementing a Video Thumbnailer
    3. Dynamic Scaling of the Workers
    4. The Multiworker Pattern
    5. Summary
  17. 12. Event-Driven Batch Processing
    1. Patterns of Event-Driven Processing
      1. Copier
      2. Filter
      3. Splitter
      4. Sharder
      5. Merger
    2. Hands On: Building an Event-Driven Flow for New User Signup
    3. Publisher/Subscriber Infrastructure
    4. Hands On: Deploying Kafka
    5. Resiliency and Performance in Work Queues
      1. Work Stealing
      2. Errors, Priority, and Retry
    6. Summary
  18. 13. Coordinated Batch Processing
    1. Join (or Barrier Synchronization)
    2. Reduce
      1. Hands On: Count
      2. Sum
      3. Histogram
      4. Hands On: An Image Tagging and Processing Pipeline
    3. Summary
  19. V. Universal Concepts
  20. 14. Monitoring and Observability Patterns
    1. Monitoring and Observability Basics
      1. Logging
      2. Metrics
      3. Basic Request Monitoring
      4. Advanced Request Monitoring
      5. Alerting
      6. Tracing
    2. Aggregating Information
    3. Summary
  21. 15. AI Inference and Serving
    1. The Basics of AI Systems
    2. Hosting a Model
    3. Distributing a Model
    4. Development with Models
    5. Retrieval-Augmented Generation
    6. Testing and Deployment
    7. Summary
  22. 16. Common Failure Patterns
    1. The Thundering Herd
    2. The Absence of Errors Is an Error
    3. “Client” and “Expected” Errors
    4. Versioning Errors
    5. The Myth of Optional Components
    6. Oops, We “Cleaned Up” Everything
    7. Challenges with the Breadth of Inputs
    8. Processing Obsolete Work
    9. The “Second System” Problem
    10. Summary
  23. Conclusion: A New Beginning?
  24. Index
  25. About the Author

Product information

  • Title: Designing Distributed Systems, 2nd Edition
  • Author(s): Brendan Burns
  • Release date: December 2024
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098156350