Managing Cloud Native Data on Kubernetes

Book description

Is Kubernetes ready for stateful workloads? This open source system has become the primary platform for deploying and managing cloud native applications. But because it was originally designed for stateless workloads, working with data on Kubernetes has been challenging. If you want to avoid the inefficiencies and duplicative costs of having separate infrastructure for applications and data, this practical guide can help.

Using Kubernetes as your platform, you'll learn open source technologies that are designed and built for the cloud. Authors Jeff Carpenter and Patrick McFadin provide case studies to help you explore new use cases and avoid the pitfalls others have faced. You'll get an insider's view of what's coming from innovators who are creating next-generation architectures and infrastructure.

With this book, you will:

  • Learn how to use basic Kubernetes resources to compose data infrastructure
  • Automate the deployment and operations of data infrastructure on Kubernetes using tools like Helm and operators
  • Evaluate and select data infrastructure technologies for use in your applications
  • Integrate data infrastructure technologies into your overall stack
  • Explore emerging technologies that will enhance your Kubernetes-based applications in the future

Table of contents

  1. Foreword
  2. Preface
    1. Why We Wrote This Book
    2. Who Is This Book For?
    3. How to Read This Book
    4. Conventions Used in This Book
    5. Using Code Examples
    6. O’Reilly Online Learning
    7. How to Contact Us
    8. Acknowledgments
  3. Chapter 1. Introduction to Cloud Native Data Infrastructure: Persistence, Streaming, and Batch Analytics
    1. Infrastructure Types
    2. What Is Cloud Native Data?
    3. More Infrastructure, More Problems
    4. Kubernetes Leading the Way
      1. Managing Compute on Kubernetes
      2. Managing Network on Kubernetes
      3. Managing Storage on Kubernetes
    5. Cloud Native Data Components
    6. Looking Forward
    7. Getting Ready for the Revolution
      1. Adopt an SRE Mindset
      2. Embrace Distributed Computing
      3. Principles of Cloud Native Data Infrastructure
    8. Summary
  4. Chapter 2. Managing Data Storage on Kubernetes
    1. Docker, Containers, and State
      1. Managing State in Docker
      2. Bind Mounts
      3. Volumes
      4. Tmpfs Mounts
      5. Volume Drivers
    2. Kubernetes Resources for Data Storage
      1. Pods and Volumes
      2. PersistentVolumes
      3. PersistentVolumeClaims
      4. StorageClasses
    3. Kubernetes Storage Architecture
      1. Flexvolume
      2. Container Storage Interface
      3. Container Attached Storage
      4. Container Object Storage Interface
    4. Summary
  5. Chapter 3. Databases on Kubernetes the Hard Way
    1. The Hard Way
    2. Prerequisites for Running Data Infrastructure on Kubernetes
    3. Running MySQL on Kubernetes
      1. ReplicaSets
      2. Deployments
      3. Services
      4. Accessing MySQL
    4. Running Apache Cassandra on Kubernetes
      1. StatefulSets
      2. Accessing Cassandra
    5. Summary
  6. Chapter 4. Automating Database Deployment on Kubernetes with Helm
    1. Deploying Applications with Helm Charts
    2. Using Helm to Deploy MySQL
      1. How Helm Works
      2. Labels
      3. ServiceAccounts
      4. Secrets
      5. ConfigMaps
      6. Updating Helm Charts
      7. Uninstalling Helm Charts
    3. Using Helm to Deploy Apache Cassandra
      1. Affinity and Anti-Affinity
    4. Helm, CI/CD, and Operations
    5. Summary
  7. Chapter 5. Automating Database Management on Kubernetes with Operators
    1. Extending the Kubernetes Control Plane
      1. Extending Kubernetes Clients
      2. Extending Kubernetes Control Plane Components
      3. Extending Kubernetes Worker Node Components
    2. The Operator Pattern
      1. Controllers
      2. Custom Resources
      3. Operators
    3. Managing MySQL in Kubernetes Using the Vitess Operator
      1. Vitess Overview
      2. PlanetScale Vitess Operator
    4. A Growing Ecosystem of Operators
      1. Choosing Operators
      2. Building Operators
    5. Summary
  8. Chapter 6. Integrating Data Infrastructure in a Kubernetes Stack
    1. K8ssandra: Production-Ready Cassandra on Kubernetes
      1. K8ssandra Architecture
      2. Installing the K8ssandra Operator
      3. Creating a K8ssandraCluster
    2. Managing Cassandra in Kubernetes with Cass Operator
    3. Enabling Developer Productivity with Stargate APIs
    4. Unified Monitoring Infrastructure with Prometheus and Grafana
    5. Performing Repairs with Cassandra Reaper
    6. Backing Up and Restoring Data with Cassandra Medusa
      1. Creating a Backup
      2. Restoring from Backup
    7. Deploying Multicluster Applications in Kubernetes
    8. Summary
  9. Chapter 7. The Kubernetes Native Database
    1. Why a Kubernetes Native Approach Is Needed
    2. Hybrid Data Access at Scale with TiDB
      1. TiDB Architecture
      2. Deploying TiDB in Kubernetes
    3. Serverless Cassandra with DataStax Astra DB
    4. What to Look for in a Kubernetes Native Database
      1. Basic Requirements
      2. The Future of Kubernetes Native
    5. Summary
  10. Chapter 8. Streaming Data on Kubernetes
    1. Introduction to Streaming
      1. Types of Delivery
      2. Delivery Guarantees
      3. Feature Scope
    2. The Role of Streaming in Kubernetes
    3. Streaming on Kubernetes with Apache Pulsar
      1. Preparing Your Environment
      2. Securing Communications by Default with cert-manager
      3. Using Helm to Deploy Apache Pulsar
    4. Stream Analytics with Apache Flink
      1. Deploying Apache Flink on Kubernetes
    5. Summary
  11. Chapter 9. Data Analytics on Kubernetes
    1. Introduction to Analytics
    2. Deploying Analytic Workloads in Kubernetes
    3. Introduction to Apache Spark
    4. Deploying Apache Spark in Kubernetes
      1. Build Your Custom Container
      2. Submit and Run Your Application
    5. Kubernetes Operator for Apache Spark
    6. Alternative Schedulers for Kubernetes
      1. Apache YuniKorn
      2. Volcano
    7. Analytic Engines for Kubernetes
      1. Dask
      2. Ray
    8. Summary
  12. Chapter 10. Machine Learning and Other Emerging Use Cases
    1. The Cloud Native AI/ML Stack
      1. AI/ML Definitions
      2. Defining an AI/ML Stack
      3. Real-Time Model Serving with KServe
      4. Full Lifecycle Feature Management with Feast
      5. Vector Similarity Search with Milvus
    2. Efficient Data Movement with Apache Arrow
    3. Versioned Object Storage with lakeFS
    4. Summary
  13. Chapter 11. Migrating Data Workloads to Kubernetes
    1. The Vision: Application-Aware Platforms
    2. Charting Your Path to Success
      1. People
      2. Technology
      3. Process
    3. The Future of Cloud Native Data
    4. Summary
  14. Index
  15. About the Authors

Product information

  • Title: Managing Cloud Native Data on Kubernetes
  • Author(s): Jeff Carpenter, Patrick McFadin
  • Release date: December 2022
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098111397