Book description
Take your machine learning skills to the next level by mastering Databricks and building robust ML pipeline solutions for future ML innovations.
Key Features
- Learn to build robust ML pipeline solutions as you transition to Databricks
- Master commonly available features like AutoML and MLflow
- Leverage data governance and model deployment using the MLflow Model Registry
- Purchase of the print or Kindle book includes a free PDF eBook
Book Description
Unleash the potential of Databricks for end-to-end machine learning with this comprehensive guide, tailored for experienced data scientists and developers transitioning from DIY or other cloud platforms. Building on a strong foundation in Python, Practical Machine Learning on Databricks serves as your roadmap from development to production, covering all intermediary steps using the Databricks platform.
You’ll start with an overview of machine learning applications, Databricks platform features, and MLflow. Next, you’ll dive into data preparation, model selection, and training essentials, and discover the power of the Databricks Feature Store for precomputing feature tables. You’ll also learn to kickstart your projects using Databricks AutoML and automate retraining and deployment through Databricks Workflows.
By the end of this book, you’ll have mastered MLflow for experiment tracking, collaboration, and advanced use cases like model interpretability and governance. The book is enriched with hands-on example code at every step. While primarily focused on generally available features, the book equips you to easily adapt to future innovations in machine learning, Databricks, and MLflow.
What you will learn
- Transition smoothly from DIY setups to Databricks
- Master AutoML for quick ML experiment setup
- Automate model retraining and deployment
- Leverage the Databricks Feature Store for data prep
- Use MLflow for effective experiment tracking
- Gain practical insights for scalable ML solutions
- Find out how to handle model drift in production environments
Who this book is for
This book is for experienced data scientists, engineers, and developers who are proficient in Python, statistics, and the ML lifecycle and want to transition to Databricks from DIY setups or other cloud platforms. Introductory Spark knowledge is a must to make the most of this book; end-to-end ML workflows, however, will be covered. If you aim to accelerate your machine learning workflows and deploy scalable, robust solutions, this book is an indispensable resource.
Table of contents
- Contributors
- Preface
- Part 1: Introduction
- Chapter 1: The ML Process and Its Challenges
- Understanding the typical machine learning process
- Discovering the roles associated with machine learning projects in organizations
- Challenges with productionizing machine learning use cases in organizations
- Understanding the requirements of an enterprise-grade machine learning platform
- Exploring Databricks and the Lakehouse architecture
- Scalability – the growth catalyst
- Performance – ensuring efficiency and speed
- Security – safeguarding data and models
- Governance – steering the machine learning life cycle
- Reproducibility – ensuring trust and consistency
- Ease of use – balancing complexity and usability
- Simplifying machine learning development with the Lakehouse architecture
- Summary
- Further reading
- Chapter 2: Overview of ML on Databricks
- Part 2: ML Pipeline Components and Implementation
- Chapter 3: Utilizing the Feature Store
- Chapter 4: Understanding MLflow Components on Databricks
- Chapter 5: Create a Baseline Model Using Databricks AutoML
- Part 3: ML Governance and Deployment
- Chapter 6: Model Versioning and Webhooks
- Chapter 7: Model Deployment Approaches
- Technical requirements
- Understanding ML deployments and paradigms
- Deploying ML models for batch and streaming inference
- Deploying ML models for real-time inference
- Incorporating custom Python libraries into MLflow models for Databricks deployment
- Packaging dependencies with MLflow models
- Summary
- Further reading
- Chapter 8: Automating ML Workflows Using Databricks Jobs
- Chapter 9: Model Drift Detection and Retraining
- Chapter 10: Using CI/CD to Automate Model Retraining and Redeployment
- Index
- Other Books You May Enjoy
Product information
- Title: Practical Machine Learning on Databricks
- Author(s):
- Release date: November 2023
- Publisher(s): Packt Publishing
- ISBN: 9781801812030