Book description
Learn how to build end-to-end scalable machine learning solutions with Apache Spark. With this practical guide, author Adi Polak introduces data and ML practitioners to creative solutions that supersede today's traditional methods. You'll learn a more holistic approach that takes you beyond specific requirements and organizational goals, allowing data and ML practitioners to collaborate and understand each other better.
Scaling Machine Learning with Spark examines several technologies for building end-to-end distributed ML workflows based on the Apache Spark ecosystem with Spark MLlib, MLflow, TensorFlow, and PyTorch. If you're a data scientist who works with machine learning, this book shows you when and why to use each technology.
You will:
- Explore machine learning, including distributed computing concepts and terminology
- Manage the ML lifecycle with MLflow
- Ingest data and perform basic preprocessing with Spark
- Explore feature engineering, and use Spark to extract features
- Train a model with MLlib and build a pipeline to reproduce it
- Build a data system to combine the power of Spark with deep learning
- Get a step-by-step example of working with distributed TensorFlow
- Explore PyTorch's internal architecture and use it to scale machine learning
Table of contents
- Preface
- 1. Distributed Machine Learning Terminology and Concepts
  - The Stages of the Machine Learning Workflow
  - Tools and Technologies in the Machine Learning Pipeline
  - Distributed Computing Models
  - Introduction to Distributed Systems Architecture
  - Introduction to Ensemble Methods
  - The Challenges of Distributed Machine Learning Systems
  - Setting Up Your Local Environment
  - Summary
- 2. Introduction to Spark and PySpark
- 3. Managing the Machine Learning Experiment Lifecycle with MLflow
- 4. Data Ingestion, Preprocessing, and Descriptive Statistics
- 5. Feature Engineering
- 6. Training Models with Spark MLlib
- 7. Bridging Spark and Deep Learning Frameworks
- 8. TensorFlow Distributed Machine Learning Approach
- 9. PyTorch Distributed Machine Learning Approach
- 10. Deployment Patterns for Machine Learning Models
- Index
- About the Author
Product information
- Title: Scaling Machine Learning with Spark
- Author(s): Adi Polak
- Release date: March 2023
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781098106829