Designing Data-Intensive Applications, 2nd Edition

Book description

Data is at the center of many challenges in system design today. Difficult issues such as scalability, consistency, reliability, efficiency, and maintainability need to be resolved. In addition, there's an overwhelming variety of tools and analytical systems, including relational databases, NoSQL datastores, data warehouses, and data lakes. What are the right choices for your application? How do you make sense of all these buzzwords?

In this second edition, authors Martin Kleppmann and Chris Riccomini build on the foundation laid in the acclaimed first edition, integrating new technologies and emerging trends. You'll be guided through the maze of decisions and trade-offs involved in building a modern data system, from choosing the right tools like Spark and Flink to understanding the intricacies of data laws like the GDPR.

  • Peer under the hood of the systems you already use, and learn to use them more effectively
  • Make informed decisions by identifying the strengths and weaknesses of different tools
  • Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity
  • Understand the distributed systems research upon which modern databases are built
  • Peek behind the scenes of major online services, and learn from their architectures

Table of contents

  1. Brief Table of Contents (Not Yet Final)
  2. 1. Trade-offs in Data Systems Architecture
    1. Transaction Processing versus Analytics
      1. Characterizing Analytical and Operational Systems
      2. Data Warehousing
      3. Systems of Record and Derived Data
    2. Cloud versus Self-Hosting
      1. Pros and Cons of Cloud Services
      2. Cloud-Native System Architecture
      3. Operations in the Cloud Era
    3. Distributed versus Single-Node Systems
      1. Problems with Distributed Systems
      2. Microservices and Serverless
      3. Cloud Computing versus Supercomputing
    4. Data Systems, Law, and Society
    5. Summary
  3. 2. Defining Nonfunctional Requirements
    1. Case Study: Social Network Home Timelines
      1. Representing Users, Posts, and Follows
      2. Materializing and Updating Timelines
    2. Describing Performance
      1. Latency and Response Time
      2. Average, Median, and Percentiles
      3. Use of Response Time Metrics
    3. Reliability and Fault Tolerance
      1. Fault Tolerance
      2. Hardware and Software Faults
      3. Humans and Reliability
    4. Scalability
      1. Describing Load
      2. Shared-Memory, Shared-Disk, and Shared-Nothing Architecture
      3. Principles for Scalability
    5. Maintainability
      1. Operability: Making Life Easy for Operations
      2. Simplicity: Managing Complexity
      3. Evolvability: Making Change Easy
    6. Summary
  4. 3. Data Models and Query Languages
    1. Relational Model versus Document Model
      1. The Object-Relational Mismatch
      2. Normalization, Denormalization, and Joins
      3. Many-to-One and Many-to-Many Relationships
      4. Stars and Snowflakes: Schemas for Analytics
      5. When to Use Which Model
    2. Graph-Like Data Models
      1. Property Graphs
      2. The Cypher Query Language
      3. Graph Queries in SQL
      4. Triple-Stores and SPARQL
      5. Datalog: Recursive Relational Queries
      6. GraphQL
    3. Event Sourcing and CQRS
    4. Dataframes, Matrices, and Arrays
    5. Summary
  5. 4. Storage and Retrieval
    1. Storage and Indexing for OLTP
      1. Log-Structured Storage
      2. B-Trees
      3. Comparing B-Trees and LSM-Trees
      4. Multi-Column and Secondary Indexes
      5. Keeping Everything in Memory
    2. Data Storage for Analytics
      1. Cloud Data Warehouses
      2. Column-Oriented Storage
      3. Query Execution: Compilation and Vectorization
      4. Materialized Views and Data Cubes
    3. Multidimensional and Full-Text Indexes
      1. Full-Text Search
      2. Vector Embeddings
    4. Summary
  6. About the Authors

Product information

  • Title: Designing Data-Intensive Applications, 2nd Edition
  • Author(s): Martin Kleppmann, Chris Riccomini
  • Release date: December 2025
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098119065