Book description
Bringing a deep-learning project into production at scale is challenging. Scaling successfully requires a foundational understanding of full stack deep learning: the knowledge at the intersection of hardware, software, data, and algorithms.
This book illustrates the complex concepts of full stack deep learning and reinforces them with hands-on exercises, arming you with the tools and techniques to scale your project. Because a scaling effort pays off only when it is both effective and efficient, this guide explains the concepts and techniques that help you achieve both.
You'll gain a thorough understanding of:
- How data flows through the deep-learning network and the role computation graphs play in building your model
- How accelerated computing speeds up your training and how best to utilize the resources at your disposal
- How to train your model using distributed training paradigms such as data, model, and pipeline parallelism
- How to leverage the PyTorch ecosystem in conjunction with NVIDIA libraries and Triton to scale your model training
- How to debug, monitor, and investigate the bottlenecks that slow down your model training
- How to expedite the training lifecycle and streamline your feedback loop for iterative model development
- A set of data tricks and techniques and how to apply them to scale your model training
- How to select the right tools and techniques for your deep-learning project
- Options for managing the compute infrastructure when running at scale
Table of contents
- Preface
- 1. What Nature and History Have Taught Us About Scale
- I. Foundational Concepts of Deep Learning
- 2. Deep Learning
- 3. The Computational Side of Deep Learning
- 4. Putting It All Together: Efficient Deep Learning
- II. Distributed Training
- 5. Distributed Systems and Communications
- 6. Theoretical Foundations of Distributed Deep Learning
- 7. Data Parallelism
- 8. Scaling Beyond Data Parallelism: Model, Pipeline, Tensor, and Hybrid Parallelism
- 9. Gaining Practical Expertise with Scaling Across All Dimensions
- III. Extreme Scaling
- 10. Data-Centric Scaling
- 11. Scaling Experiments: Effective Planning and Management
- 12. Efficient Fine-Tuning of Large Models
- 13. Foundation Models
- Index
- About the Author
Product information
- Title: Deep Learning at Scale
- Author(s):
- Release date: June 2024
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781098145286