Deep Learning for Vision Systems

Book description

Computer vision is central to many leading-edge innovations, including self-driving cars, drones, augmented reality, facial recognition, and much, much more. Amazing new computer vision applications are developed every day, thanks to rapid advances in AI and deep learning (DL). Deep Learning for Vision Systems teaches you the concepts and tools for building intelligent, scalable computer vision systems that can identify and react to objects in images, videos, and real life. With author Mohamed Elgendy's expert instruction and illustration of real-world projects, you’ll finally grok state-of-the-art deep learning techniques, so you can build, contribute to, and lead in the exciting realm of computer vision!

About the Technology
How much has computer vision advanced? One ride in a Tesla is the only answer you’ll need. Deep learning techniques have led to exciting breakthroughs in facial recognition, interactive simulations, and medical imaging, but nothing beats seeing a car respond to real-world stimuli while speeding down the highway.

About the Book
How does the computer learn to understand what it sees? Deep Learning for Vision Systems answers that by applying deep learning to computer vision. Using only high school algebra, this book illuminates the concepts behind visual intuition. You'll understand how to use deep learning architectures to build vision system applications for image generation and facial recognition.

What's Inside
  • Image classification and object detection
  • Advanced deep learning architectures
  • Transfer learning and generative adversarial networks
  • DeepDream and neural style transfer
  • Visual embeddings and image search

About the Reader
For intermediate Python programmers.

About the Author
Mohamed Elgendy is the VP of Engineering at Rakuten. A seasoned AI expert, he previously built and managed AI products at Amazon and Twilio.

Quotes
From text and object detection to DeepDream and facial recognition ... this book is comprehensive, approachable, and relevant for modern applications of deep learning to computer vision systems!
- Bojan Djurkovic, DigitalOcean

Real-world problem solving without drowning you in details. It elaborates concepts bit by bit, making them easy to assimilate.
- Burhan Ul Haq, Audit XPRT

An invaluable and comprehensive tour for anyone looking to build real-world vision systems.
- Richard Vaughan, Purple Monkey Collective

Shows you what’s behind modern technologies that allow computers to see things.
- Alessandro Campeis, Vimar

Table of contents

  1. Deep Learning for Vision Systems
  2. Copyright
  3. dedication
  4. contents
  5. front matter
    1. preface
    2. acknowledgments
    3. about this book
      1. Who should read this book
      2. How this book is organized: A roadmap
      3. About the code
      4. liveBook discussion forum
    4. about the author
    5. about the cover illustration
  6. Part 1. Deep learning foundation
  7. 1 Welcome to computer vision
    1. 1.1 Computer vision
      1. 1.1.1 What is visual perception?
      2. 1.1.2 Vision systems
      3. 1.1.3 Sensing devices
      4. 1.1.4 Interpreting devices
    2. 1.2 Applications of computer vision
      1. 1.2.1 Image classification
      2. 1.2.2 Object detection and localization
      3. 1.2.3 Generating art (style transfer)
      4. 1.2.4 Creating images
      5. 1.2.5 Face recognition
      6. 1.2.6 Image recommendation system
    3. 1.3 Computer vision pipeline: The big picture
    4. 1.4 Image input
      1. 1.4.1 Image as functions
      2. 1.4.2 How computers see images
      3. 1.4.3 Color images
    5. 1.5 Image preprocessing
      1. 1.5.1 Converting color images to grayscale to reduce computation complexity
    6. 1.6 Feature extraction
      1. 1.6.1 What is a feature in computer vision?
      2. 1.6.2 What makes a good (useful) feature?
      3. 1.6.3 Extracting features (handcrafted vs. automatic extracting)
    7. 1.7 Classifier learning algorithm
    8. Summary
  8. 2 Deep learning and neural networks
    1. 2.1 Understanding perceptrons
      1. 2.1.1 What is a perceptron?
      2. 2.1.2 How does the perceptron learn?
      3. 2.1.3 Is one neuron enough to solve complex problems?
    2. 2.2 Multilayer perceptrons
      1. 2.2.1 Multilayer perceptron architecture
      2. 2.2.2 What are hidden layers?
      3. 2.2.3 How many layers, and how many nodes in each layer?
      4. 2.2.4 Some takeaways from this section
    3. 2.3 Activation functions
      1. 2.3.1 Linear transfer function
      2. 2.3.2 Heaviside step function (binary classifier)
      3. 2.3.3 Sigmoid/logistic function
      4. 2.3.4 Softmax function
      5. 2.3.5 Hyperbolic tangent function (tanh)
      6. 2.3.6 Rectified linear unit
      7. 2.3.7 Leaky ReLU
    4. 2.4 The feedforward process
      1. 2.4.1 Feedforward calculations
      2. 2.4.2 Feature learning
    5. 2.5 Error functions
      1. 2.5.1 What is the error function?
      2. 2.5.2 Why do we need an error function?
      3. 2.5.3 Error is always positive
      4. 2.5.4 Mean square error
      5. 2.5.5 Cross-entropy
      6. 2.5.6 A final note on errors and weights
    6. 2.6 Optimization algorithms
      1. 2.6.1 What is optimization?
      2. 2.6.2 Batch gradient descent
      3. 2.6.3 Stochastic gradient descent
      4. 2.6.4 Mini-batch gradient descent
      5. 2.6.5 Gradient descent takeaways
    7. 2.7 Backpropagation
      1. 2.7.1 What is backpropagation?
      2. 2.7.2 Backpropagation takeaways
    8. Summary
  9. 3 Convolutional neural networks
    1. 3.1 Image classification using MLP
      1. 3.1.1 Input layer
      2. 3.1.2 Hidden layers
      3. 3.1.3 Output layer
      4. 3.1.4 Putting it all together
      5. 3.1.5 Drawbacks of MLPs for processing images
    2. 3.2 CNN architecture
      1. 3.2.1 The big picture
      2. 3.2.2 A closer look at feature extraction
      3. 3.2.3 A closer look at classification
    3. 3.3 Basic components of a CNN
      1. 3.3.1 Convolutional layers
      2. 3.3.2 Pooling layers or subsampling
      3. 3.3.3 Fully connected layers
    4. 3.4 Image classification using CNNs
      1. 3.4.1 Building the model architecture
      2. 3.4.2 Number of parameters (weights)
    5. 3.5 Adding dropout layers to avoid overfitting
      1. 3.5.1 What is overfitting?
      2. 3.5.2 What is a dropout layer?
      3. 3.5.3 Why do we need dropout layers?
      4. 3.5.4 Where does the dropout layer go in the CNN architecture?
    6. 3.6 Convolution over color images (3D images)
      1. 3.6.1 How do we perform a convolution on a color image?
      2. 3.6.2 What happens to the computational complexity?
    7. 3.7 Project: Image classification for color images
    8. Summary
  10. 4 Structuring DL projects and hyperparameter tuning
    1. 4.1 Defining performance metrics
      1. 4.1.1 Is accuracy the best metric for evaluating a model?
      2. 4.1.2 Confusion matrix
      3. 4.1.3 Precision and recall
      4. 4.1.4 F-score
    2. 4.2 Designing a baseline model
    3. 4.3 Getting your data ready for training
      1. 4.3.1 Splitting your data for train/validation/test
      2. 4.3.2 Data preprocessing
    4. 4.4 Evaluating the model and interpreting its performance
      1. 4.4.1 Diagnosing overfitting and underfitting
      2. 4.4.2 Plotting the learning curves
      3. 4.4.3 Exercise: Building, training, and evaluating a network
    5. 4.5 Improving the network and tuning hyperparameters
      1. 4.5.1 Collecting more data vs. tuning hyperparameters
      2. 4.5.2 Parameters vs. hyperparameters
      3. 4.5.3 Neural network hyperparameters
      4. 4.5.4 Network architecture
    6. 4.6 Learning and optimization
      1. 4.6.1 Learning rate and decay schedule
      2. 4.6.2 A systematic approach to find the optimal learning rate
      3. 4.6.3 Learning rate decay and adaptive learning
      4. 4.6.4 Mini-batch size
    7. 4.7 Optimization algorithms
      1. 4.7.1 Gradient descent with momentum
      2. 4.7.2 Adam
      3. 4.7.3 Number of epochs and early stopping criteria
      4. 4.7.4 Early stopping
    8. 4.8 Regularization techniques to avoid overfitting
      1. 4.8.1 L2 regularization
      2. 4.8.2 Dropout layers
      3. 4.8.3 Data augmentation
    9. 4.9 Batch normalization
      1. 4.9.1 The covariate shift problem
      2. 4.9.2 Covariate shift in neural networks
      3. 4.9.3 How does batch normalization work?
      4. 4.9.4 Batch normalization implementation in Keras
      5. 4.9.5 Batch normalization recap
    10. 4.10 Project: Achieve high accuracy on image classification
    11. Summary
  11. Part 2. Image classification and detection
  12. 5 Advanced CNN architectures
    1. 5.1 CNN design patterns
    2. 5.2 LeNet-5
      1. 5.2.1 LeNet architecture
      2. 5.2.2 LeNet-5 implementation in Keras
      3. 5.2.3 Setting up the learning hyperparameters
      4. 5.2.4 LeNet performance on the MNIST dataset
    3. 5.3 AlexNet
      1. 5.3.1 AlexNet architecture
      2. 5.3.2 Novel features of AlexNet
      3. 5.3.3 AlexNet implementation in Keras
      4. 5.3.4 Setting up the learning hyperparameters
      5. 5.3.5 AlexNet performance
    4. 5.4 VGGNet
      1. 5.4.1 Novel features of VGGNet
      2. 5.4.2 VGGNet configurations
      3. 5.4.3 Learning hyperparameters
      4. 5.4.4 VGGNet performance
    5. 5.5 Inception and GoogLeNet
      1. 5.5.1 Novel features of Inception
      2. 5.5.2 Inception module: Naive version
      3. 5.5.3 Inception module with dimensionality reduction
      4. 5.5.4 Inception architecture
      5. 5.5.5 GoogLeNet in Keras
      6. 5.5.6 Learning hyperparameters
      7. 5.5.7 Inception performance on the CIFAR dataset
    6. 5.6 ResNet
      1. 5.6.1 Novel features of ResNet
      2. 5.6.2 Residual blocks
      3. 5.6.3 ResNet implementation in Keras
      4. 5.6.4 Learning hyperparameters
      5. 5.6.5 ResNet performance on the CIFAR dataset
    7. Summary
  13. 6 Transfer learning
    1. 6.1 What problems does transfer learning solve?
    2. 6.2 What is transfer learning?
    3. 6.3 How transfer learning works
      1. 6.3.1 How do neural networks learn features?
      2. 6.3.2 Transferability of features extracted at later layers
    4. 6.4 Transfer learning approaches
      1. 6.4.1 Using a pretrained network as a classifier
      2. 6.4.2 Using a pretrained network as a feature extractor
      3. 6.4.3 Fine-tuning
    5. 6.5 Choosing the appropriate level of transfer learning
      1. 6.5.1 Scenario 1: Target dataset is small and similar to the source dataset
      2. 6.5.2 Scenario 2: Target dataset is large and similar to the source dataset
      3. 6.5.3 Scenario 3: Target dataset is small and different from the source dataset
      4. 6.5.4 Scenario 4: Target dataset is large and different from the source dataset
      5. 6.5.5 Recap of the transfer learning scenarios
    6. 6.6 Open source datasets
      1. 6.6.1 MNIST
      2. 6.6.2 Fashion-MNIST
      3. 6.6.3 CIFAR
      4. 6.6.4 ImageNet
      5. 6.6.5 MS COCO
      6. 6.6.6 Google Open Images
      7. 6.6.7 Kaggle
    7. 6.7 Project 1: A pretrained network as a feature extractor
    8. 6.8 Project 2: Fine-tuning
    9. Summary
  14. 7 Object detection with R-CNN, SSD, and YOLO
    1. 7.1 General object detection framework
      1. 7.1.1 Region proposals
      2. 7.1.2 Network predictions
      3. 7.1.3 Non-maximum suppression (NMS)
      4. 7.1.4 Object-detector evaluation metrics
    2. 7.2 Region-based convolutional neural networks (R-CNNs)
      1. 7.2.1 R-CNN
      2. 7.2.2 Fast R-CNN
      3. 7.2.3 Faster R-CNN
      4. 7.2.4 Recap of the R-CNN family
    3. 7.3 Single-shot detector (SSD)
      1. 7.3.1 High-level SSD architecture
      2. 7.3.2 Base network
      3. 7.3.3 Multi-scale feature layers
      4. 7.3.4 Non-maximum suppression
    4. 7.4 You only look once (YOLO)
      1. 7.4.1 How YOLOv3 works
      2. 7.4.2 YOLOv3 architecture
    5. 7.5 Project: Train an SSD network in a self-driving car application
      1. 7.5.1 Step 1: Build the model
      2. 7.5.2 Step 2: Model configuration
      3. 7.5.3 Step 3: Create the model
      4. 7.5.4 Step 4: Load the data
      5. 7.5.5 Step 5: Train the model
      6. 7.5.6 Step 6: Visualize the loss
      7. 7.5.7 Step 7: Make predictions
    6. Summary
  15. Part 3. Generative models and visual embeddings
  16. 8 Generative adversarial networks (GANs)
    1. 8.1 GAN architecture
      1. 8.1.1 Deep convolutional GANs (DCGANs)
      2. 8.1.2 The discriminator model
      3. 8.1.3 The generator model
      4. 8.1.4 Training the GAN
      5. 8.1.5 GAN minimax function
    2. 8.2 Evaluating GAN models
      1. 8.2.1 Inception score
      2. 8.2.2 Fréchet inception distance (FID)
      3. 8.2.3 Which evaluation scheme to use
    3. 8.3 Popular GAN applications
      1. 8.3.1 Text-to-photo synthesis
      2. 8.3.2 Image-to-image translation (Pix2Pix GAN)
      3. 8.3.3 Image super-resolution GAN (SRGAN)
      4. 8.3.4 Ready to get your hands dirty?
    4. 8.4 Project: Building your own GAN
    5. Summary
  17. 9 DeepDream and neural style transfer
    1. 9.1 How convolutional neural networks see the world
      1. 9.1.1 Revisiting how neural networks work
      2. 9.1.2 Visualizing CNN features
      3. 9.1.3 Implementing a feature visualizer
    2. 9.2 DeepDream
      1. 9.2.1 How the DeepDream algorithm works
      2. 9.2.2 DeepDream implementation in Keras
    3. 9.3 Neural style transfer
      1. 9.3.1 Content loss
      2. 9.3.2 Style loss
      3. 9.3.3 Total variance loss
      4. 9.3.4 Network training
    4. Summary
  18. 10 Visual embeddings
    1. 10.1 Applications of visual embeddings
      1. 10.1.1 Face recognition
      2. 10.1.2 Image recommendation systems
      3. 10.1.3 Object re-identification
    2. 10.2 Learning embedding
    3. 10.3 Loss functions
      1. 10.3.1 Problem setup and formalization
      2. 10.3.2 Cross-entropy loss
      3. 10.3.3 Contrastive loss
      4. 10.3.4 Triplet loss
      5. 10.3.5 Naive implementation and runtime analysis of losses
    4. 10.4 Mining informative data
      1. 10.4.1 Dataloader
      2. 10.4.2 Informative data mining: Finding useful triplets
      3. 10.4.3 Batch all (BA)
      4. 10.4.4 Batch hard (BH)
      5. 10.4.5 Batch weighted (BW)
      6. 10.4.6 Batch sample (BS)
    5. 10.5 Project: Train an embedding network
      1. 10.5.1 Fashion: Get me items similar to this
      2. 10.5.2 Vehicle re-identification
      3. 10.5.3 Implementation
      4. 10.5.4 Testing a trained model
    6. 10.6 Pushing the boundaries of current accuracy
    7. Summary
    8. References
  19. appendix A. Getting set up
    1. A.1 Downloading the code repository
    2. A.2 Installing Anaconda
    3. A.3 Setting up your DL environment
      1. A.3.1 Setting up your development environment manually
      2. A.3.2 Using the conda environment in the book’s repo
      3. A.3.3 Saving and loading environments
    4. A.4 Setting up your AWS EC2 environment
      1. A.4.1 Creating an AWS account
      2. A.4.2 Connecting remotely to your instance
      3. A.4.3 Running your Jupyter notebook
  20. index

Product information

  • Title: Deep Learning for Vision Systems
  • Author(s): Mohamed Elgendy
  • Release date: November 2020
  • Publisher(s): Manning Publications
  • ISBN: 9781617296192