Spotlight on Data: Machine Learning in Production at Google Scale with Todd Underwood
Published by O'Reilly Media, Inc.
Making machine learning reliable
Watch the video recording of this event.
You can propel your business forward with AI-centric approaches to solving customer needs, but to be successful, you need to deploy your machine learning models at scale. Yet engineers face unique challenges when using machine learning-based products in production environments, such as managing specialized resources and measuring user happiness.
Join us for this edition of Spotlight on Data as Todd Underwood, Google’s director and lead for machine learning in site reliability engineering (SRE), explains how to sustainably run machine learning systems at scale. You’ll learn why machine learning is essential to Google’s core functions, where it provides key advantages across most of the company’s products (including Search, Ads, Payments, Billing, and Shopping), and how SRE supports these production machine learning systems. You’ll also discover how the company is working to democratize access to AI by making machine learning technologies available to customers via its Cloud AI products.
O’Reilly Spotlight explores emerging business and technology topics and ideas through a series of one-hour interactive events. You’ll engage in a live conversation with experts, sharing your questions and ideas while hearing their unique perspectives, insights, fears, and predictions for the future.
In every edition of Spotlight on Data, you’ll learn about, discuss, and debate the tools, techniques, questions, and quandaries in the world of data. You’ll discover how successful companies leverage data effectively and how you can follow their lead to transform your organization and prepare for the Next Economy.
What you’ll learn and how you can apply it
- Key considerations for deploying your machine learning models and services at scale
- How SRE can best support production machine learning systems
This live event is for you because...
- You're an engineer or other technical contributor to machine learning projects, and you need to know how to scale and support your services in production environments.
Prerequisites
- Come with your questions for Todd Underwood
- Have a pen and paper handy to capture notes, insights, and inspiration
Recommended follow-up:
- Read AI and Analytics in Production (report)
- Read Machine Learning Logistics (report)
- Take Deploying Machine Learning Models to Production (live online training course with Armen Donigian)
- Explore Google ML Crash Course (online training)
Schedule
The time frames are only estimates and may vary according to how the event is progressing.
Monday, August 5, 2019, at 9:00am PT / 12:00pm ET
- Introduction and presentation (15 minutes)
- Interactive discussion and Q&A (45 minutes)
Your Guest
Todd Underwood
Todd Underwood is a site reliability engineering director at Google in Pittsburgh, leading several teams of engineers working on machine learning, Ads, Payments, Billing, Shopping, and data center and cluster infrastructure. Todd’s expertise includes distributed systems, especially for machine learning and AI pipelines, and he has a background in systems engineering and networking. He’s presented work on the future of systems and software reliability engineering at LISA13, LISA16, and SREcon EU15. He’s coauthor of a chapter in the O'Reilly Site Reliability Engineering book and has published a paper in USENIX’s ;login: magazine. Todd has presented work related to internet routing dynamics and relationships at NANOG, RIPE, and various internet interconnection meetings and was previously chair of the NANOG Program Committee and the RIPE Programme Committee.