Book description
Whether you're part of a small startup or a multinational corporation, this practical book shows data scientists, software and site reliability engineers, product managers, and business owners how to establish and run ML reliably, effectively, and accountably within their organizations. You'll gain insight into everything from how to monitor models in production to how to run a well-tuned model development team in a product organization.
By applying an SRE mindset to machine learning, authors and engineering professionals Cathy Chen, Kranti Parisa, Niall Richard Murphy, D. Sculley, Todd Underwood, and featured guest authors show you how to run an efficient and reliable ML system. Whether you want to increase revenue, optimize decision making, solve problems, or understand and influence customer behavior, you'll learn how to perform day-to-day ML tasks while keeping the bigger picture in mind.
You'll examine:
- What ML is: how it functions and what it relies on
- Conceptual frameworks for understanding how ML "loops" work
- How effective productionization can make your ML systems easily monitorable, deployable, and operable
- Why ML systems make production troubleshooting more difficult, and how to compensate accordingly
- How ML, product, and production teams can communicate effectively
Table of contents
- Foreword
- Preface
- 1. Introduction
- 2. Data Management Principles
- 3. Basic Introduction to Models
- 4. Feature and Training Data
- 5. Evaluating Model Validity and Quality
- 6. Fairness, Privacy, and Ethical ML Systems
- 7. Training Systems
  - Requirements
  - Basic Training System Implementation
  - General Reliability Principles
    - Most Failures Will Not Be ML Failures
    - Models Will Be Retrained
    - Models Will Have Multiple Versions (at the Same Time!)
    - Good Models Will Become Bad
    - Data Will Be Unavailable
    - Models Should Be Improvable
    - Features Will Be Added and Changed
    - Models Can Train Too Fast
    - Resource Utilization Matters
    - Utilization != Efficiency
    - Outages Include Recovery
  - Common Training Reliability Problems
  - Structural Reliability
  - Conclusion
- 8. Serving
- 9. Monitoring and Observability for Models
- 10. Continuous ML
- 11. Incident Response
- 12. How Product and ML Interact
- 13. Integrating ML into Your Organization
- 14. Practical ML Org Implementation Examples
- 15. Case Studies: MLOps in Practice
- Index
- About the Authors
Product information
- Title: Reliable Machine Learning
- Author(s): Cathy Chen, Kranti Parisa, Niall Richard Murphy, D. Sculley, Todd Underwood
- Release date: September 2022
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781098106225