Feature Store for Machine Learning

Book description

Learn how to leverage feature stores to make the most of your machine learning models

Key Features

  • Understand the significance of feature stores in the ML life cycle
  • Discover how features can be shared, discovered, and reused
  • Learn to make features available for online models during inference

Book Description

A feature store is one of the storage layers in machine learning (ML) operations, where data scientists and ML engineers can store transformed and curated features for ML models. This makes the features available for model training, inference (batch and online), and reuse in other ML pipelines. Knowing how to use feature stores to their fullest potential can save you a lot of time and effort, and this book will teach you everything you need to know to get started.

Feature Store for Machine Learning is for data scientists who want to learn how to use feature stores to share and reuse each other's work and expertise. You'll be able to implement practices that help eliminate reprocessing of data, make models reproducible, and reduce duplication of work, thus shortening the time to production of the ML model. While this ML book offers some theoretical groundwork for developers who are just getting to grips with feature stores, there's plenty of practical know-how for those ready to put their knowledge to work. With a hands-on approach to implementation and associated methodologies, you'll get up and running in no time.

By the end of this book, you'll understand why feature stores are essential and how to use them in your ML projects, both on your local system and in the cloud.

What you will learn

  • Understand the significance of feature stores in a machine learning pipeline
  • Become well-versed in how to curate, store, share, and discover features using feature stores
  • Explore the different components and capabilities of a feature store
  • Discover how to use feature stores with batch and online models
  • Accelerate your model life cycle and reduce costs
  • Deploy your first feature store for production use cases

Who this book is for

If you have a solid grasp of machine learning basics but need a comprehensive overview of feature stores to start using them, then this book is for you. Data/machine learning engineers and data scientists who build machine learning models for production systems in any domain, those supporting data engineers in productionizing ML models, and platform engineers who build data science (ML) platforms for the organization will also find plenty of practical advice in the later chapters of this book.

Table of contents

  1. Feature Store for Machine Learning
  2. Contributors
  3. About the author
  4. About the reviewer
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Download the color images
    6. Conventions used
    7. Get in touch
    8. Share Your Thoughts
  6. Section 1 – Why Do We Need a Feature Store?
  7. Chapter 1: An Overview of the Machine Learning Life Cycle
    1. Technical requirements
    2. The ML life cycle in practice
      1. Problem statement (plan and create)
      2. Data (preparation and cleaning)
      3. Model
      4. Package, release, and monitor
    3. An ideal world versus the real world
      1. Reusability and sharing
      2. Everything in a notebook
    4. The most time-consuming stages of ML
      1. Figuring out the dataset
      2. Data exploration and feature engineering
      3. Modeling to production and monitoring
    5. Summary
  8. Chapter 2: What Problems Do Feature Stores Solve?
    1. Importance of features in production
    2. Ways to bring features to production
      1. Batch model pipeline
      2. Online model pipeline
    3. Common problems with the approaches used for bringing features to production
      1. Re-inventing the wheel
      2. Feature re-calculation
      3. Feature discoverability and sharing
      4. Training vs Serving skew
      5. Model reproducibility
      6. Low latency
    4. Feature stores to the rescue
      1. Standardizing ML with a feature store
      2. Feature store avoids reprocessing data
      3. Features are discoverable and sharable with the feature store
      4. Serving features at low latency with feature stores
    5. Philosophy behind feature stores
    6. Summary
    7. Further reading
  9. Section 2 – A Feature Store in Action
  10. Chapter 3: Feature Store Fundamentals, Terminology, and Usage
    1. Technical requirements
    2. Introduction to Feast and installation
    3. Feast terminology and definitions
    4. Feast initialization
    5. Feast usage
      1. Register feature definitions
      2. Browsing the feature store
      3. Adding an entity and FeatureView
      4. Generate training data
      5. Load features to the online store
    6. Feast behind the scenes
      1. Data flow in Feast
    7. Summary
    8. Further reading
  11. Chapter 4: Adding Feature Store to ML Models
    1. Technical requirements
    2. Creating Feast resources in AWS
      1. Amazon S3 for storing data
      2. AWS Redshift for an offline store
      3. Creating an IAM user to access the resources
    3. Feast initialization for AWS
    4. Exploring the ML life cycle with Feast
      1. Problem statement (plan and create)
      2. Data (preparation and cleaning)
      3. Model (feature engineering)
    5. Summary
    6. References
  12. Chapter 5: Model Training and Inference
    1. Prerequisites
    2. Technical requirements
    3. Model training with the feature store
      1. Dee's model training experiments
      2. Ram's model training experiments
    4. Model packaging
    5. Batch model inference with Feast
    6. Online model inference with Feast
      1. Syncing the latest features from the offline to the online store
      2. Packaging the online model as a REST endpoint with Feast code
    7. Handling changes to the feature set during development
      1. Step 1 – Change feature definitions
      2. Step 2 – Add/update schema in the Glue/Lake Formation console
      3. Step 3 – Update notebooks with the changes
    8. Summary
    9. Further reading
  13. Chapter 6: Model to Production and Beyond
    1. Technical requirements
    2. Setting up Airflow for orchestration
      1. S3 bucket for Airflow metadata
      2. Amazon MWAA environment for orchestration
    3. Productionizing the batch model pipeline
    4. Productionizing an online model pipeline
      1. Orchestration of a feature engineering job
      2. Deploying the model as a SageMaker endpoint
    5. Beyond model production
      1. Feature drift monitoring and model retraining
      2. Model reproducibility and prediction issues
      3. A headstart for the next model
      4. Changes to feature definition after production
    6. Summary
  14. Section 3 – Alternatives, Best Practices, and a Use Case
  15. Chapter 7: Feast Alternatives and ML Best Practices
    1. Technical requirements
    2. The available feature stores on the market
      1. The Tecton Feature Store
      2. Databricks Feature Store
      3. Google's Vertex AI Feature Store
      4. The Hopsworks Feature Store
      5. SageMaker Feature Store
    3. Feature management with SageMaker Feature Store
      1. Resources to use SageMaker
      2. Generating features
      3. Defining the feature group
      4. Feature ingestion
      5. Getting records from an online store
      6. Querying historical data with Amazon Athena
      7. Cleaning up a SageMaker feature group
    4. ML best practices
      1. Data validation at source
      2. Breaking down ML pipeline and orchestration
      3. Tracking data lineage and versioning
      4. The feature repository
      5. Experiment tracking, model versioning, and the model repository
      6. Feature and model monitoring
      7. Miscellaneous
    5. Summary
  16. Chapter 8: Use Case – Customer Churn Prediction
    1. Technical requirements
    2. Infrastructure setup
    3. Introduction to the problem and the dataset
    4. Data processing and feature engineering
    5. Feature group definitions and feature ingestion
    6. Model training
    7. Model prediction
    8. Feature monitoring
    9. Model monitoring
    10. Summary
    11. Why subscribe?
  17. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share Your Thoughts

Product information

  • Title: Feature Store for Machine Learning
  • Author(s): Jayanth Kumar M J
  • Release date: June 2022
  • Publisher(s): Packt Publishing
  • ISBN: 9781803230061