Optimization: Gradient Descent and Deep Learning (ML Foundations Series)
Published by Pearson
State-of-the-Art Approaches for Accurate and Efficient Model Fitting
- Capstone ML class: Serves as the final class in the 14-part "ML Foundations" series by Jon Krohn, blending material from the fields of linear algebra, calculus, probability, statistics, and algorithms.
- Gradient descent mastery: Develop a deep understanding of the essential theory behind the ubiquitous gradient descent approach to optimization, along with hands-on experience applying it using PyTorch and TensorFlow.
- Latest optimization techniques: Learn about state-of-the-art optimizers, such as Adam and Nadam, widely used for training deep neural networks, while also receiving guidance on next steps in your ML journey.
The Machine Learning Foundations series of online trainings provides a comprehensive overview of all of the subjects — mathematics, statistics, and computer science — that underlie contemporary machine learning techniques, including deep learning and other artificial intelligence approaches. Extensive curriculum detail can be found at the course’s GitHub repo.
All of the classes in the ML Foundations series bring theory to life through the combination of vivid full-color illustrations, straightforward Python examples within hands-on Jupyter notebook demos, and comprehension exercises with fully worked solutions.
The focus is on providing you with a practical, functional understanding of the content covered. Context will be given for each topic, highlighting its relevance to machine learning. You will be better positioned to understand cutting-edge machine learning papers and you will be provided with resources for digging even deeper into topics that pique your curiosity.
There are 14 classes in the series, organized into four subject areas:
Linear Algebra (three classes)
- Linear Algebra for Machine Learning: Intro
- Linear Algebra for Machine Learning, Level II: Matrix Tensors
- Linear Algebra for Machine Learning, Level III: Eigenvectors
Calculus (four classes)
- Calculus for Machine Learning: Intro
- Calculus for Machine Learning, Level II: Automatic Differentiation
- Calculus for Machine Learning, Level III: Partial Derivatives
- Calculus for Machine Learning, Level IV: Gradients & Integrals
Probability and Statistics (four classes)
- Intro to Probability Theory
- Probability II and Information Theory
- Intro to Statistics
- Statistics II: Regression and Bayesian
Computer Science (three classes)
- Intro to Data Structures and Algorithms
- DSA II: Hashing, Trees, and Graphs
- Optimization
The four subject areas are fairly independent of one another; however, theory within a given subject area generally builds over its three or four classes — topics in later classes often assume an understanding of topics from earlier classes. Work through the individual classes based on your particular interests or your existing familiarity with the material.
(Note that at any given time, only a subset of the ML Foundations classes will be scheduled and open for registration.)
This class, Optimization, is the final class in the 14-part Machine Learning Foundations series. It builds upon the material from each of the other classes in the series — on linear algebra, calculus, probability, statistics, and algorithms — to provide a detailed introduction to training ML models. Through the measured exposition of theory paired with interactive examples, you’ll develop a working understanding of the essential theory behind the ubiquitous gradient descent approach to optimization as well as how to apply it yourself — both at a granular, matrix-operations level and at a quick, abstract level — with TensorFlow and PyTorch. You’ll also learn about the latest optimizers, such as Adam and Nadam, that are widely used for training deep neural networks. By the end of class, you’ll be well equipped with all of the foundational knowledge underlying ML and will receive guidance on where to proceed next on your ML journey.
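To make the "granular level" concrete, here is a minimal sketch (illustrative only, not the course's own notebook) of gradient descent in PyTorch: fitting a line to synthetic data by repeatedly stepping each parameter against its gradient.

```python
# A minimal sketch of "granular" gradient descent in PyTorch (illustrative
# only; the class works through its own notebooks). We fit y = m*x + b to
# synthetic data by manually stepping each parameter against its gradient.
import torch

torch.manual_seed(42)
x = torch.linspace(0., 1., 50)
y = 2.0 * x + 0.5 + 0.1 * torch.randn(50)   # noisy data; true m = 2, b = 0.5

m = torch.zeros(1, requires_grad=True)       # slope, initialized at 0
b = torch.zeros(1, requires_grad=True)       # intercept, initialized at 0
lr = 0.1                                     # learning rate (step size)

for _ in range(1000):
    cost = ((m * x + b - y) ** 2).mean()     # mean squared error
    cost.backward()                          # autodiff fills m.grad and b.grad
    with torch.no_grad():
        m -= lr * m.grad                     # step downhill along the gradient
        b -= lr * b.grad
        m.grad.zero_()                       # reset gradients for the next step
        b.grad.zero_()

print(m.item(), b.item())                    # converges toward m ≈ 2, b ≈ 0.5
```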
What you’ll learn and how you can apply it
- Discover how the statistical and machine learning approaches to optimization differ, and why you would select one or the other for a given problem you’re solving.
- Find out how the extremely versatile (stochastic) gradient descent optimization algorithm works, including how to apply it — from a low, in-depth level as well as from a high, abstracted level — within the most popular deep learning libraries, TensorFlow and PyTorch.
- Get acquainted with the “fancy” optimizers that are available for advanced machine learning approaches (e.g., deep learning) and when you should consider using them (a minimal sketch follows this list).
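As a hedged preview of that abstracted level, the loop below uses PyTorch's high-level torch.optim API; the hyperparameter values are arbitrary placeholders, and the details of choosing them are exactly what the class covers.

```python
# Illustrative sketch of the "high, abstracted level": the training loop is
# identical whichever optimizer you pick, so trying a "fancy" optimizer like
# Adam instead of plain SGD is a one-line change.
import torch

model = torch.nn.Linear(1, 1)                # a tiny one-in, one-out model
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# optimizer = torch.optim.Adam(model.parameters(), lr=0.01)   # the "fancy" swap

x = torch.rand(50, 1)
y = 2.0 * x + 0.5                            # a simple linear target

for _ in range(500):
    optimizer.zero_grad()                    # clear old gradients
    loss = loss_fn(model(x), y)
    loss.backward()                          # backpropagate
    optimizer.step()                         # optimizer applies its update rule
```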
This live event is for you because...
- You use high-level software (e.g., scikit-learn, the Keras API, PyTorch Lightning) to train or deploy machine learning algorithms, and would now like to understand the fundamentals underlying the abstractions, enabling you to expand your capabilities
- You’re a software developer who would like to develop a firm foundation for the deployment of machine learning algorithms into production systems
- You’re a data scientist who would like to reinforce your understanding of the subjects at the core of your professional discipline
- You’re a data analyst or A.I. enthusiast who would like to become a data scientist or data/ML engineer, and so you’re keen to deeply understand the field you’re entering from the ground up (very wise of you!)
Prerequisites
- Programming: All code demos will be in Python, so experience with it or another object-oriented programming language would be helpful for following along with the code examples.
- Mathematics: You should either have attended the Calculus IV: Gradients and Integrals live training or be familiar with the content in Lessons 1-7 of Jon Krohn’s Calculus for ML LiveLessons.
Course Set-up
- During class, we’ll work on Jupyter notebooks interactively in the cloud via Google Colab. This requires zero setup, and instructions will be provided in class.
Recommended Preparation
If you’re feeling extremely ambitious, you can get a head start on the content we’ll be covering in class by viewing Lessons 8-9 of Jon Krohn’s Data Structures, Algorithms, and ML Optimization LiveLessons.
Note: The remainder of Jon’s ML Foundations curriculum is split across the following videos:
- Watch: Linear Algebra for Machine Learning
- Watch: Probability and Statistics for Machine Learning LiveLessons
- Watch: Data Structures, Algorithms, and Machine Learning Optimization LiveLessons
Recommended Follow-up
- Watch: Data Structures, Algorithms, and ML Optimization LiveLessons by Jon Krohn
- Explore: Math for Machine Learning by Jon Krohn
- Explore: Deep Learning: The Complete Guide by Jon Krohn
Schedule
The time frames are only estimates and may vary according to how the class is progressing.
Segment 1: Optimization Approaches (30 min)
- The Statistical Approach to Regression: Ordinary Least Squares
- When Statistical Approaches to Optimization Break Down
- The Machine Learning Solution
Q&A: 5 minutes
Break: 10 minutes
Segment 2: Gradient Descent (105 min)
- Objective Functions
- Cost / Loss / Error Functions
- Minimizing Cost with Gradient Descent
- Learning Rate
- Critical Points, incl. Saddle Points
- Gradient Descent from Scratch with PyTorch
- Checkpoint, Q&A, and Break
- The Global Minimum and Local Minima
- Mini-Batches and Stochastic Gradient Descent (SGD)
- Learning Rate Scheduling (this and mini-batch SGD are sketched in code after this list)
- Maximizing Reward with Gradient Ascent
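As a taste of two of the Segment 2 topics above, here is a hedged sketch of mini-batch SGD with learning rate scheduling, using PyTorch's DataLoader and lr_scheduler utilities; the batch size, schedule, and hyperparameters are arbitrary placeholders for illustration.

```python
# A hedged sketch of mini-batch SGD plus learning rate scheduling in PyTorch.
# All hyperparameter values here are illustrative placeholders, not advice.
import torch
from torch.utils.data import DataLoader, TensorDataset

x = torch.rand(256, 1)
y = 2.0 * x + 0.5
loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)

model = torch.nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(30):
    for xb, yb in loader:                    # one update per mini-batch (SGD)
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(xb), yb)
        loss.backward()
        optimizer.step()
    scheduler.step()                         # halve the learning rate every 10 epochs
```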
Q&A: 5 minutes
Break: 10 minutes
Segment 3: Fancy Deep Learning Optimizers (60 min)
- A Layer of Artificial Neurons in PyTorch
- Jacobian Matrices
- Hessian Matrices and Second-Order Optimization
- Momentum (update rules for momentum and Adam are sketched after this list)
- Nesterov Momentum
- AdaGrad
- AdaDelta
- RMSProp
- Adam
- Nadam
- Training a Deep Neural Net
- Resources for the Further Study of Machine Learning
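As a taste of what distinguishes these optimizers, here is one common formulation of the update rules for vanilla gradient descent, momentum, and Adam, with parameters \(\theta\), learning rate \(\eta\), and gradient \(g_t\); exact conventions vary across papers and libraries.

```latex
% One common formulation (conventions vary across papers and libraries).
% Vanilla gradient descent:
\theta_{t+1} = \theta_t - \eta\, g_t

% Momentum: accumulate a velocity v that smooths successive gradients
v_{t+1} = \beta\, v_t + g_t, \qquad
\theta_{t+1} = \theta_t - \eta\, v_{t+1}

% Adam: per-parameter step sizes from bias-corrected first and second moments
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2
\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad
\hat{v}_t = \frac{v_t}{1-\beta_2^t}, \qquad
\theta_{t+1} = \theta_t - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
```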
Q&A: 15 minutes
Course wrap-up and next steps (15 minutes)
Your Instructor
Jon Krohn
Jon Krohn is Co-Founder and Chief Data Scientist at the machine learning company Nebula. He authored the book Deep Learning Illustrated, an instant #1 bestseller that was translated into seven languages. He is also the host of SuperDataScience, the data science industry’s most listened-to podcast. Jon is renowned for his compelling lectures, which he offers at leading universities and conferences, as well as via his award-winning YouTube channel. He holds a PhD from Oxford and has been publishing on machine learning in prominent academic journals since 2010.