Machine Learning Production Systems

Book description

Using machine learning for products, services, and critical business processes is quite different from using ML in an academic or research setting—especially for recent ML graduates and those moving from research to a commercial environment. Whether you currently work to create products and services that use ML, or would like to in the future, this practical book gives you a broad view of the entire field.

Authors Robert Crowe, Hannes Hapke, Emily Caveness, and Di Zhu help you identify topics to dive deeper into, along with reference materials and tutorials that teach you the details. You'll learn the state of the art of machine learning engineering across a wide range of topics, including modeling, deployment, and MLOps, covering both the basics and the advanced aspects of the production ML lifecycle.

This book provides four in-depth sections that cover all aspects of machine learning engineering:

  • Data: collecting, labeling, and validating data; data augmentation and preprocessing; feature engineering and selection; data journey and storage
  • Modeling: high-performance modeling; model resource management techniques; model analysis and interpretability; neural architecture search
  • Deployment: model serving patterns and infrastructure for ML models and LLMs; management and delivery; monitoring and logging
  • Productionalizing: ML pipelines; classifying unstructured text and images; GenAI model pipelines

Table of contents

  1. Foreword
  2. Preface
    1. Who Should Read This Book
    2. Why We Wrote This Book
    3. Navigating This Book
    4. Conventions Used in This Book
    5. Using Code Examples
    6. O’Reilly Online Learning
    7. How to Contact Us
    8. Acknowledgments
      1. Robert
      2. Hannes
      3. Emily
      4. Di
  3. 1. Introduction to Machine Learning Production Systems
    1. What Is Production Machine Learning?
    2. Benefits of Machine Learning Pipelines
      1. Focus on Developing New Models, Not on Maintaining Existing Models
      2. Prevention of Bugs
      3. Creation of Records for Debugging and Reproducing Results
      4. Standardization
      5. The Business Case for ML Pipelines
    3. When to Use Machine Learning Pipelines
    4. Steps in a Machine Learning Pipeline
      1. Data Ingestion and Data Versioning
      2. Data Validation
      3. Feature Engineering
      4. Model Training and Model Tuning
      5. Model Analysis
      6. Model Deployment
    5. Looking Ahead
  4. 2. Collecting, Labeling, and Validating Data
    1. Important Considerations in Data Collection
    2. Responsible Data Collection
    3. Labeling Data: Data Changes and Drift in Production ML
    4. Labeling Data: Direct Labeling and Human Labeling
    5. Validating Data: Detecting Data Issues
    6. Validating Data: TensorFlow Data Validation
      1. Skew Detection with TFDV
      2. Types of Skew
    7. Example: Spotting Imbalanced Datasets with TensorFlow Data Validation
    8. Conclusion
  5. 3. Feature Engineering and Feature Selection
    1. Introduction to Feature Engineering
    2. Preprocessing Operations
    3. Feature Engineering Techniques
      1. Normalizing and Standardizing
      2. Bucketizing
      3. Feature Crosses
      4. Dimensionality and Embeddings
      5. Visualization
    4. Feature Transformation at Scale
      1. Choose a Framework That Scales Well
      2. Avoid Training–Serving Skew
      3. Consider Instance-Level Versus Full-Pass Transformations
    5. Using TensorFlow Transform
      1. Analyzers
      2. Code Example
    6. Feature Selection
      1. Feature Spaces
      2. Feature Selection Overview
      3. Filter Methods
      4. Wrapper Methods
      5. Embedded Methods
      6. Feature and Example Selection for LLMs and GenAI
    7. Example: Using TF Transform to Tokenize Text
      1. Benefits of Using TF Transform
      2. Alternatives to TF Transform
    8. Conclusion
  6. 4. Data Journey and Data Storage
    1. Data Journey
    2. ML Metadata
    3. Using a Schema
      1. Schema Development
      2. Schema Environments
      3. Changes Across Datasets
    4. Enterprise Data Storage
      1. Feature Stores
      2. Data Warehouses
      3. Data Lakes
    5. Conclusion
  7. 5. Advanced Labeling, Augmentation, and Data Preprocessing
    1. Advanced Labeling
      1. Semi-Supervised Labeling
      2. Active Learning
      3. Weak Supervision
      4. Advanced Labeling Review
    2. Data Augmentation
      1. Example: CIFAR-10
      2. Other Augmentation Techniques
      3. Data Augmentation Review
    3. Preprocessing Time Series Data: An Example
      1. Windowing
      2. Sampling
    4. Conclusion
  8. 6. Model Resource Management Techniques
    1. Dimensionality Reduction: Dimensionality Effect on Performance
      1. Example: Word Embedding Using Keras
      2. Curse of Dimensionality
      3. Adding Dimensions Increases Feature Space Volume
      4. Dimensionality Reduction
    2. Quantization and Pruning
      1. Mobile, IoT, Edge, and Similar Use Cases
      2. Quantization
      3. Optimizing Your TensorFlow Model with TF Lite
      4. Optimization Options
      5. Pruning
    3. Knowledge Distillation
      1. Teacher and Student Networks
      2. Knowledge Distillation Techniques
      3. TMKD: Distilling Knowledge for a Q&A Task
      4. Increasing Robustness by Distilling EfficientNets
    4. Conclusion
  9. 7. High-Performance Modeling
    1. Distributed Training
      1. Data Parallelism
    2. Efficient Input Pipelines
      1. Input Pipeline Basics
      2. Input Pipeline Patterns: Improving Efficiency
      3. Optimizing Your Input Pipeline with TensorFlow Data
    3. Training Large Models: The Rise of Giant Neural Nets and Parallelism
      1. Potential Solutions and Their Shortcomings
      2. Pipeline Parallelism to the Rescue?
    4. Conclusion
  10. 8. Model Analysis
    1. Analyzing Model Performance
      1. Black-Box Evaluation
      2. Performance Metrics and Optimization Objectives
    2. Advanced Model Analysis
      1. TensorFlow Model Analysis
      2. The Learning Interpretability Tool
    3. Advanced Model Debugging
      1. Benchmark Models
      2. Sensitivity Analysis
      3. Residual Analysis
    4. Model Remediation
    5. Discrimination Remediation
    6. Fairness
      1. Fairness Evaluation
      2. Fairness Considerations
    7. Continuous Evaluation and Monitoring
    8. Conclusion
  11. 9. Interpretability
    1. Explainable AI
    2. Model Interpretation Methods
      1. Method Categories
      2. Intrinsically Interpretable Models
      3. Model-Agnostic Methods
      4. Local Interpretable Model-Agnostic Explanations
      5. Shapley Values
      6. The SHAP Library
      7. Testing Concept Activation Vectors
      8. AI Explanations
    3. Example: Exploring Model Sensitivity with SHAP
      1. Regression Models
      2. Natural Language Processing Models
    4. Conclusion
  12. 10. Neural Architecture Search
    1. Hyperparameter Tuning
    2. Introduction to AutoML
    3. Key Components of NAS
      1. Search Spaces
      2. Search Strategies
      3. Performance Estimation Strategies
    4. AutoML in the Cloud
      1. Amazon SageMaker Autopilot
      2. Microsoft Azure Automated Machine Learning
      3. Google Cloud AutoML
    5. Using AutoML
    6. Generative AI and AutoML
    7. Conclusion
  13. 11. Introduction to Model Serving
    1. Model Training
    2. Model Prediction
    3. Latency
    4. Throughput
    5. Cost
    6. Resources and Requirements for Serving Models
      1. Cost and Complexity
      2. Accelerators
      3. Feeding the Beast
    7. Model Deployments
      1. Data Center Deployments
      2. Mobile and Distributed Deployments
    8. Model Servers
    9. Managed Services
    10. Conclusion
  14. 12. Model Serving Patterns
    1. Batch Inference
      1. Batch Throughput
      2. Batch Inference Use Cases
      3. ETL for Distributed Batch and Stream Processing Systems
    2. Introduction to Real-Time Inference
      1. Synchronous Delivery of Real-Time Predictions
      2. Asynchronous Delivery of Real-Time Predictions
      3. Optimizing Real-Time Inference
    3. Real-Time Inference Use Cases
    4. Serving Model Ensembles
      1. Ensemble Topologies
      2. Example Ensemble
      3. Ensemble Serving Considerations
      4. Model Routers: Ensembles in GenAI
    5. Data Preprocessing and Postprocessing in Real Time
      1. Training Transformations Versus Serving Transformations
      2. Windowing
      3. Options for Preprocessing
      4. Enter TensorFlow Transform
      5. Postprocessing
    6. Inference at the Edge and at the Browser
      1. Challenges
      2. Model Deployments via Containers
      3. Training on the Device
      4. Federated Learning
      5. Runtime Interoperability
      6. Inference in Web Browsers
    7. Conclusion
  15. 13. Model Serving Infrastructure
    1. Model Servers
      1. TensorFlow Serving
      2. NVIDIA Triton Inference Server
      3. TorchServe
    2. Building Scalable Infrastructure
    3. Containerization
      1. Traditional Deployment Era
      2. Virtualized Deployment Era
      3. Container Deployment Era
      4. The Docker Containerization Framework
      5. Container Orchestration
    4. Reliability and Availability Through Redundancy
      1. Observability
      2. High Availability
      3. Automated Deployments
    5. Hardware Accelerators
      1. GPUs
      2. TPUs
    6. Conclusion
  16. 14. Model Serving Examples
    1. Example: Deploying TensorFlow Models with TensorFlow Serving
      1. Exporting Keras Models for TF Serving
      2. Setting Up TF Serving with Docker
      3. Basic Configuration of TF Serving
      4. Making Model Prediction Requests with REST
      5. Making Model Prediction Requests with gRPC
      6. Getting Predictions from Classification and Regression Models
      7. Using Payloads
      8. Getting Model Metadata from TF Serving
      9. Making Batch Inference Requests
    2. Example: Profiling TF Serving Inferences with TF Profiler
      1. Prerequisites
      2. TensorBoard Setup
      3. Model Profile
    3. Example: Basic TorchServe Setup
      1. Installing the TorchServe Dependencies
      2. Exporting Your Model for TorchServe
      3. Setting Up TorchServe
      4. Making Model Prediction Requests
      5. Making Batch Inference Requests
    4. Conclusion
  17. 15. Model Management and Delivery
    1. Experiment Tracking
      1. Experimenting in Notebooks
      2. Experimenting Overall
      3. Tools for Experiment Tracking and Versioning
    2. Introduction to MLOps
      1. Data Scientists Versus Software Engineers
      2. ML Engineers
      3. ML in Products and Services
      4. MLOps
    3. MLOps Methodology
      1. MLOps Level 0
      2. MLOps Level 1
      3. MLOps Level 2
      4. Components of an Orchestrated Workflow
    4. Three Types of Custom Components
      1. Python Function–Based Components
      2. Container-Based Components
      3. Fully Custom Components
    5. TFX Deep Dive
      1. TFX SDK
      2. Intermediate Representation
      3. Runtime
      4. Implementing an ML Pipeline Using TFX Components
      5. Advanced Features of TFX
    6. Managing Model Versions
      1. Approaches to Versioning Models
      2. Model Lineage
      3. Model Registries
    7. Continuous Integration and Continuous Deployment
      1. Continuous Integration
      2. Continuous Delivery
    8. Progressive Delivery
      1. Blue/Green Deployment
      2. Canary Deployment
      3. Live Experimentation
    9. Conclusion
  18. 16. Model Monitoring and Logging
    1. The Importance of Monitoring
    2. Observability in Machine Learning
      1. What Should You Monitor?
      2. Custom Alerting in TFX
    3. Logging
    4. Distributed Tracing
    5. Monitoring for Model Decay
      1. Data Drift and Concept Drift
      2. Model Decay Detection
      3. Supervised Monitoring Techniques
      4. Unsupervised Monitoring Techniques
      5. Mitigating Model Decay
    6. Retraining Your Model
      1. When to Retrain
      2. Automated Retraining
    7. Conclusion
  19. 17. Privacy and Legal Requirements
    1. Why Is Data Privacy Important?
      1. What Data Needs to Be Kept Private?
      2. Harms
      3. Only Collect What You Need
      4. GenAI Data Scraped from the Web and Other Sources
    2. Legal Requirements
      1. The GDPR and the CCPA
      2. The GDPR’s Right to Be Forgotten
    3. Pseudonymization and Anonymization
    4. Differential Privacy
      1. Local and Global DP
      2. Epsilon-Delta DP
      3. Applying Differential Privacy to ML
      4. TensorFlow Privacy Example
    5. Federated Learning
    6. Encrypted ML
    7. Conclusion
  20. 18. Orchestrating Machine Learning Pipelines
    1. An Introduction to Pipeline Orchestration
      1. Why Pipeline Orchestration?
      2. Directed Acyclic Graphs
    2. Pipeline Orchestration with TFX
      1. Interactive TFX Pipelines
      2. Converting Your Interactive Pipeline for Production
    3. Orchestrating TFX Pipelines with Apache Beam
    4. Orchestrating TFX Pipelines with Kubeflow Pipelines
      1. Introduction to Kubeflow Pipelines
      2. Installation and Initial Setup
      3. Accessing Kubeflow Pipelines
      4. The Workflow from TFX to Kubeflow
      5. OpFunc Functions
      6. Orchestrating Kubeflow Pipelines
    5. Google Cloud Vertex Pipelines
      1. Setting Up Google Cloud and Vertex Pipelines
      2. Setting Up a Google Cloud Service Account
      3. Orchestrating Pipelines with Vertex Pipelines
      4. Executing Vertex Pipelines
    6. Choosing Your Orchestrator
      1. Interactive TFX
      2. Apache Beam
      3. Kubeflow Pipelines
      4. Google Cloud Vertex Pipelines
    7. Alternatives to TFX
    8. Conclusion
  21. 19. Advanced TFX
    1. Advanced Pipeline Practices
      1. Configure Your Components
      2. Import Artifacts
      3. Use Resolver Node
      4. Execute a Conditional Pipeline
      5. Export TF Lite Models
      6. Warm-Starting Model Training
      7. Use Exit Handlers
      8. Trigger Messages from TFX
    2. Custom TFX Components: Architecture and Use Cases
      1. Architecture of TFX Components
      2. Use Cases of Custom Components
    3. Using Function-Based Custom Components
    4. Writing a Custom Component from Scratch
      1. Defining Component Specifications
      2. Defining Component Channels
      3. Writing the Custom Executor
      4. Writing the Custom Driver
      5. Assembling the Custom Component
      6. Using Our Basic Custom Component
    5. Implementation Review
    6. Reusing Existing Components
    7. Creating Container-Based Custom Components
    8. Which Custom Component Is Right for You?
    9. TFX-Addons
    10. Conclusion
  22. 20. ML Pipelines for Computer Vision Problems
    1. Our Data
    2. Our Model
    3. Custom Ingestion Component
    4. Data Preprocessing
    5. Exporting the Model
    6. Our Pipeline
      1. Data Ingestion
      2. Data Preprocessing
      3. Model Training
      4. Model Evaluation
      5. Model Export
      6. Putting It All Together
    7. Executing on Apache Beam
    8. Executing on Vertex Pipelines
    9. Model Deployment with TensorFlow Serving
    10. Conclusion
  23. 21. ML Pipelines for Natural Language Processing
    1. Our Data
    2. Our Model
    3. Ingestion Component
    4. Data Preprocessing
    5. Putting the Pipeline Together
    6. Executing the Pipeline
    7. Model Deployment with Google Cloud Vertex
      1. Registering Your ML Model
      2. Creating a New Model Endpoint
      3. Deploying Your ML Model
      4. Requesting Predictions from the Deployed Model
      5. Cleaning Up Your Deployed Model
    8. Conclusion
  24. 22. Generative AI
    1. Generative Models
    2. GenAI Model Types
    3. Agents and Copilots
    4. Pretraining
      1. Pretraining Datasets
      2. Embeddings
      3. Self-Supervised Training with Masks
    5. Fine-Tuning
      1. Fine-Tuning Versus Transfer Learning
      2. Fine-Tuning Datasets
      3. Fine-Tuning Considerations for Production
      4. Fine-Tuning Versus Model APIs
    6. Parameter-Efficient Fine-Tuning
      1. LoRA
      2. S-LoRA
    7. Human Alignment
      1. Reinforcement Learning from Human Feedback
      2. Reinforcement Learning from AI Feedback
      3. Direct Preference Optimization
    8. Prompting
    9. Chaining
    10. Retrieval Augmented Generation
    11. ReAct
    12. Evaluation
      1. Evaluation Techniques
      2. Benchmarking Across Models
    13. LMOps
    14. GenAI Attacks
      1. Jailbreaks
      2. Prompt Injection
    15. Responsible GenAI
      1. Design for Responsibility
      2. Conduct Adversarial Testing
      3. Constitutional AI
    16. Conclusion
  25. 23. The Future of Machine Learning Production Systems and Next Steps
    1. Let’s Think in Terms of ML Systems, Not ML Models
    2. Bringing ML Systems Closer to Domain Experts
    3. Privacy Has Never Been More Important
    4. Conclusion
  26. Index
  27. About the Authors

Product information

  • Title: Machine Learning Production Systems
  • Author(s): Robert Crowe, Hannes Hapke, Emily Caveness, Di Zhu
  • Release date: October 2024
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098156015