Inside Deep Learning

Book description

Journey through the theory and practice of modern deep learning, and apply innovative techniques to solve everyday data problems.

In Inside Deep Learning, you will learn how to:

  • Implement deep learning with PyTorch
  • Select the right deep learning components
  • Train and evaluate a deep learning model
  • Fine-tune deep learning models to maximize performance
  • Understand deep learning terminology
  • Adapt existing PyTorch code to solve new problems

Inside Deep Learning is an accessible guide to implementing deep learning with the PyTorch framework. It demystifies complex deep learning concepts and teaches you to understand the vocabulary of deep learning so you can keep pace in a rapidly evolving field. No detail is skipped—you’ll dive into math, theory, and practical applications. Everything is clearly explained in plain English.

About the Technology
Deep learning doesn’t have to be a black box! Knowing how your models and algorithms actually work gives you greater control over your results. And you don’t have to be a mathematics expert or a senior data scientist to grasp what’s going on inside a deep learning system. This book gives you the practical insight you need to understand and explain your work with confidence.

About the Book
Inside Deep Learning illuminates the inner workings of deep learning algorithms in a way that even machine learning novices can understand. You’ll explore deep learning concepts and tools through plain language explanations, annotated code, and dozens of instantly useful PyTorch examples. Each type of neural network is clearly presented without complex math, and every solution in this book can run using readily available GPU hardware!

What's Inside
  • Select the right deep learning components
  • Train and evaluate a deep learning model
  • Fine-tune deep learning models to maximize performance
  • Understand deep learning terminology

About the Reader
For Python programmers with basic machine learning skills.

About the Author
Edward Raff is a Chief Scientist at Booz Allen Hamilton, and the author of the JSAT machine learning library.

Quotes
Pick up this book, and you won’t be able to put it down. A rich, engaging knowledge base of deep learning math, algorithms, and models—just like the title says!
- From the Foreword by Kirk Borne, Ph.D., Chief Science Officer, DataPrime.ai

The clearest and easiest book for learning deep learning principles and techniques I have ever read. The graphical representations for the algorithms are an eye-opening revelation.
- Richard Vaughan, Purple Monkey Collective

A great read for anyone interested in understanding the details of deep learning.
- Vishwesh Ravi Shrimali, MBRDI

Table of contents

  1. inside front cover
  2. Inside Deep Learning
  3. Copyright
  4. dedication
  5. contents
  6. front matter
    1. Foreword
    2. Preface
    3. Acknowledgments
    4. About this book
      1. Who should read this book?
      2. How this book is organized: A road map
      3. About the mathematical notations
      4. About the exercises
      5. About Google Colab
      6. About the code
      7. liveBook discussion forum
      8. Other online resources
    5. About the author
    6. About the cover
  7. Part 1. Foundational methods
  8. 1 The mechanics of learning
    1. 1.1 Getting started with Colab
    2. 1.2 The world as tensors
      1. 1.2.1 PyTorch GPU acceleration
    3. 1.3 Automatic differentiation
      1. 1.3.1 Using derivatives to minimize losses
      2. 1.3.2 Calculating a derivative with automatic differentiation
      3. 1.3.3 Putting it together: Minimizing a function with derivatives
    4. 1.4 Optimizing parameters
    5. 1.5 Loading dataset objects
      1. 1.5.1 Creating a training and testing split
    6. Exercises
    7. Summary
  9. 2 Fully connected networks
    1. 2.1 Neural networks as optimization
      1. 2.1.1 Notation of training a neural network
      2. 2.1.2 Building a linear regression model
      3. 2.1.3 The training loop
      4. 2.1.4 Defining a dataset
      5. 2.1.5 Defining the model
      6. 2.1.6 Defining the loss function
      7. 2.1.7 Putting it together: Training a linear regression model on the data
    2. 2.2 Building our first neural network
      1. 2.2.1 Notation for a fully connected network
      2. 2.2.2 A fully connected network in PyTorch
      3. 2.2.3 Adding nonlinearities
    3. 2.3 Classification problems
      1. 2.3.1 Classification toy problem
      2. 2.3.2 Classification loss function
      3. 2.3.3 Training a classification network
    4. 2.4 Better training code
      1. 2.4.1 Custom metrics
      2. 2.4.2 Training and testing passes
      3. 2.4.3 Saving checkpoints
      4. 2.4.4 Putting it all together: A better model training function
    5. 2.5 Training in batches
    6. Exercises
    7. Summary
  10. 3 Convolutional neural networks
    1. 3.1 Spatial structural prior beliefs
      1. 3.1.1 Loading MNIST with PyTorch
    2. 3.2 What are convolutions?
      1. 3.2.1 1D convolutions
      2. 3.2.2 2D convolutions
      3. 3.2.3 Padding
      4. 3.2.4 Weight sharing
    3. 3.3 How convolutions benefit image processing
    4. 3.4 Putting it into practice: Our first CNN
      1. 3.4.1 Making a convolutional layer with multiple filters
      2. 3.4.2 Using multiple filters per layer
      3. 3.4.3 Mixing convolutional layers with linear layers via flattening
      4. 3.4.4 PyTorch code for our first CNN
    5. 3.5 Adding pooling to mitigate object movement
      1. 3.5.1 CNNs with max pooling
    6. 3.6 Data augmentation
    7. Exercises
    8. Summary
  11. 4 Recurrent neural networks
    1. 4.1 Recurrent neural networks as weight sharing
      1. 4.1.1 Weight sharing for a fully connected network
      2. 4.1.2 Weight sharing over time
    2. 4.2 RNNs in PyTorch
      1. 4.2.1 A simple sequence classification problem
      2. 4.2.2 Embedding layers
      3. 4.2.3 Making predictions using the last time step
    3. 4.3 Improving training time with packing
      1. 4.3.1 Pad and pack
      2. 4.3.2 Packable embedding layer
      3. 4.3.3 Training a batched RNN
      4. 4.3.4 Simultaneous packed and unpacked inputs
    4. 4.4 More complex RNNs
      1. 4.4.1 Multiple layers
      2. 4.4.2 Bidirectional RNNs
    5. Exercises
    6. Summary
  12. 5 Modern training techniques
    1. 5.1 Gradient descent in two parts
      1. 5.1.1 Adding a learning rate schedule
      2. 5.1.2 Adding an optimizer
      3. 5.1.3 Implementing optimizers and schedulers
    2. 5.2 Learning rate schedules
      1. 5.2.1 Exponential decay: Smoothing erratic training
      2. 5.2.2 Step drop adjustment: Better smoothing
      3. 5.2.3 Cosine annealing: Greater accuracy but less stability
      4. 5.2.4 Validation plateau: Data-based adjustments
      5. 5.2.5 Comparing the schedules
    3. 5.3 Making better use of gradients
      1. 5.3.1 SGD with momentum: Adapting to gradient consistency
      2. 5.3.2 Adam: Adding variance to momentum
      3. 5.3.3 Gradient clipping: Avoiding exploding gradients
    4. 5.4 Hyperparameter optimization with Optuna
      1. 5.4.1 Optuna
      2. 5.4.2 Optuna with PyTorch
      3. 5.4.3 Pruning trials with Optuna
    5. Exercises
    6. Summary
  13. 6 Common design building blocks
    1. 6.1 Better activation functions
      1. 6.1.1 Vanishing gradients
      2. 6.1.2 Rectified linear units (ReLUs): Avoiding vanishing gradients
      3. 6.1.3 Training with LeakyReLU activations
    2. 6.2 Normalization layers: Magically better convergence
      1. 6.2.1 Where do normalization layers go?
      2. 6.2.2 Batch normalization
      3. 6.2.3 Training with batch normalization
      4. 6.2.4 Layer normalization
      5. 6.2.5 Training with layer normalization
      6. 6.2.6 Which normalization layer to use?
      7. 6.2.7 A peculiarity of layer normalization
    3. 6.3 Skip connections: A network design pattern
      1. 6.3.1 Implementing fully connected skips
      2. 6.3.2 Implementing convolutional skips
    4. 6.4 1 × 1 Convolutions: Sharing and reshaping information in channels
      1. 6.4.1 Training with 1 × 1 convolutions
    5. 6.5 Residual connections
      1. 6.5.1 Residual blocks
      2. 6.5.2 Implementing residual blocks
      3. 6.5.3 Residual bottlenecks
      4. 6.5.4 Implementing residual bottlenecks
    6. 6.6 Long short-term memory RNNs
      1. 6.6.1 RNNs: A fast review
      2. 6.6.2 LSTMs and the gating mechanism
      3. 6.6.3 Training an LSTM
    7. Exercises
    8. Summary
  14. Part 2. Building advanced networks
  15. 7 Autoencoding and self-supervision
    1. 7.1 How autoencoding works
      1. 7.1.1 Principal component analysis is a bottleneck autoencoder
      2. 7.1.2 Implementing PCA
      3. 7.1.3 Implementing PCA with PyTorch
      4. 7.1.4 Visualizing PCA results
      5. 7.1.5 A simple nonlinear PCA
    2. 7.2 Designing autoencoding neural networks
      1. 7.2.1 Implementing an autoencoder
      2. 7.2.2 Visualizing autoencoder results
    3. 7.3 Bigger autoencoders
      1. 7.3.1 Robustness to noise
    4. 7.4 Denoising autoencoders
      1. 7.4.1 Denoising with Gaussian noise
    5. 7.5 Autoregressive models for time series and sequences
      1. 7.5.1 Implementing the char-RNN autoregressive text model
      2. 7.5.2 Autoregressive models are generative models
      3. 7.5.3 Changing samples with temperature
      4. 7.5.4 Faster sampling
    6. Exercises
    7. Summary
  16. 8 Object detection
    1. 8.1 Image segmentation
      1. 8.1.1 Nuclei detection: Loading the data
      2. 8.1.2 Representing the image segmentation problem in PyTorch
      3. 8.1.3 Building our first image segmentation network
    2. 8.2 Transposed convolutions for expanding image size
      1. 8.2.1 Implementing a network with transposed convolutions
    3. 8.3 U-Net: Looking at fine and coarse details
      1. 8.3.1 Implementing U-Net
    4. 8.4 Object detection with bounding boxes
      1. 8.4.1 Faster R-CNN
      2. 8.4.2 Using Faster R-CNN in PyTorch
      3. 8.4.3 Suppressing overlapping boxes
    5. 8.5 Using the pretrained Faster R-CNN
    6. Exercises
    7. Summary
  17. 9 Generative adversarial networks
    1. 9.1 Understanding generative adversarial networks
      1. 9.1.1 The loss computations
      2. 9.1.2 The GAN games
      3. 9.1.3 Implementing our first GAN
    2. 9.2 Mode collapse
    3. 9.3 Wasserstein GAN: Mitigating mode collapse
      1. 9.3.1 WGAN discriminator loss
      2. 9.3.2 WGAN generator loss
      3. 9.3.3 Implementing WGAN
    4. 9.4 Convolutional GAN
      1. 9.4.1 Designing a convolutional generator
      2. 9.4.2 Designing a convolutional discriminator
    5. 9.5 Conditional GAN
      1. 9.5.1 Implementing a conditional GAN
      2. 9.5.2 Training a conditional GAN
      3. 9.5.3 Controlling the generation with conditional GANs
    6. 9.6 Walking the latent space of GANs
      1. 9.6.1 Getting models from the Hub
      2. 9.6.2 Interpolating GAN output
      3. 9.6.3 Labeling latent dimensions
    7. 9.7 Ethics in deep learning
    8. Exercises
    9. Summary
  18. 10 Attention mechanisms
    1. 10.1 Attention mechanisms learn relative input importance
      1. 10.1.1 Training our baseline model
      2. 10.1.2 Attention mechanism mechanics
      3. 10.1.3 Implementing a simple attention mechanism
    2. 10.2 Adding some context
      1. 10.2.1 Dot score
      2. 10.2.2 General score
      3. 10.2.3 Additive attention
      4. 10.2.4 Computing attention weights
    3. 10.3 Putting it all together: A complete attention mechanism with context
    4. Exercises
    5. Summary
  19. 11 Sequence-to-sequence
    1. 11.1 Sequence-to-sequence as a kind of denoising autoencoder
      1. 11.1.1 Adding attention creates Seq2Seq
    2. 11.2 Machine translation and the data loader
      1. 11.2.1 Loading a small English-French dataset
    3. 11.3 Inputs to Seq2Seq
      1. 11.3.1 Autoregressive approach
      2. 11.3.2 Teacher-forcing approach
      3. 11.3.3 Teacher forcing vs. an autoregressive approach
    4. 11.4 Seq2Seq with attention
      1. 11.4.1 Implementing Seq2Seq
      2. 11.4.2 Training and evaluation
    5. Exercises
    6. Summary
  20. 12 Network design alternatives to RNNs
    1. 12.1 TorchText: Tools for text problems
      1. 12.1.1 Installing TorchText
      2. 12.1.2 Loading datasets in TorchText
      3. 12.1.3 Defining a baseline model
    2. 12.2 Averaging embeddings over time
      1. 12.2.1 Weighted average over time with attention
    3. 12.3 Pooling over time and 1D CNNs
    4. 12.4 Positional embeddings add sequence information to any model
      1. 12.4.1 Implementing a positional encoding module
      2. 12.4.2 Defining positional encoding models
    5. 12.5 Transformers: Big models for big data
      1. 12.5.1 Multiheaded attention
      2. 12.5.2 Transformer blocks
    6. Exercises
    7. Summary
  21. 13 Transfer learning
    1. 13.1 Transferring model parameters
      1. 13.1.1 Preparing an image dataset
    2. 13.2 Transfer learning and training with CNNs
      1. 13.2.1 Adjusting pretrained networks
      2. 13.2.2 Preprocessing for pretrained ResNet
      3. 13.2.3 Training with warm starts
      4. 13.2.4 Training with frozen weights
    3. 13.3 Learning with fewer labels
    4. 13.4 Pretraining with text
      1. 13.4.1 Transformers with the Hugging Face library
      2. 13.4.2 Freezing weights with no-grad
    5. Exercises
    6. Summary
  22. 14 Advanced building blocks
    1. 14.1 Problems with pooling
      1. 14.1.1 Aliasing compromises translation invariance
      2. 14.1.2 Anti-aliasing by blurring
      3. 14.1.3 Applying anti-aliased pooling
    2. 14.2 Improved residual blocks
      1. 14.2.1 Effective depth
      2. 14.2.2 Implementing ReZero
    3. 14.3 MixUp training reduces overfitting
      1. 14.3.1 Picking the mix rate
      2. 14.3.2 Implementing MixUp
    4. Exercises
    5. Summary
  23. Appendix. Setting up Colab
    1. A.1 Creating a Colab session
      1. Adding a GPU
      2. Testing your GPU
  24. Index
  25. inside back cover

Product information

  • Title: Inside Deep Learning
  • Author(s): Edward Raff
  • Release date: June 2022
  • Publisher(s): Manning Publications
  • ISBN: 9781617298639