Mastering Computer Vision with TensorFlow 2.x

Book description

Apply neural network architectures to build state-of-the-art computer vision applications using the Python programming language

Key Features

  • Gain a fundamental understanding of advanced computer vision and neural network models in use today
  • Cover tasks such as low-level vision, image classification, and object detection
  • Develop deep learning models on cloud platforms and optimize them using TensorFlow Lite and the OpenVINO toolkit

Book Description

Computer vision allows machines to gain a human-level understanding of images and videos so that they can visualize, process, and analyze them. This book focuses on using TensorFlow to help you learn advanced computer vision tasks such as image acquisition, processing, and analysis. You'll start with the key principles of computer vision and deep learning to build a solid foundation, before covering neural network architectures and understanding how they work rather than using them as a black box. Next, you'll explore architectures such as VGG, ResNet, Inception, R-CNN, SSD, YOLO, and MobileNet. As you advance, you'll learn to implement visual search methods using transfer learning. You'll also cover advanced computer vision concepts such as semantic segmentation, image inpainting with GANs, object tracking, video segmentation, and action recognition. Later, the book focuses on how machine learning and deep learning concepts can be used to perform tasks such as edge detection and face recognition. You'll then discover how to develop powerful neural network models on your PC and on various cloud platforms. Finally, you'll learn to apply model optimization methods to deploy models on edge devices for real-time inference. By the end of this book, you'll have a solid understanding of computer vision and be able to confidently develop models to automate tasks.

What you will learn

  • Explore methods of feature extraction and image retrieval and visualize different layers of the neural network model
  • Use TensorFlow to implement various visual search methods for real-world scenarios
  • Build neural networks or adjust parameters to optimize the performance of models
  • Use TensorFlow DeepLab to perform semantic segmentation on images and DCGAN to perform image inpainting
  • Evaluate your model, optimize it, and integrate it into your application to operate at scale
  • Get up to speed with techniques for performing manual and automated image annotation

Who this book is for

This book is for computer vision professionals, image processing professionals, machine learning engineers, and AI developers who have some knowledge of machine learning and deep learning and want to build expert-level computer vision applications. In addition to familiarity with TensorFlow, Python knowledge is required to get started with this book.

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. Mastering Computer Vision with TensorFlow 2.x
  3. About Packt
    1. Why subscribe?
  4. Contributors
    1. About the author
    2. About the reviewers
    3. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  6. Section 1: Introduction to Computer Vision and Neural Networks
  7. Computer Vision and TensorFlow Fundamentals
    1. Technical requirements
    2. Detecting edges using image hashing and filtering
      1. Using a Bayer filter for color pattern formation
      2. Creating an image vector
      3. Transforming an image
      4. Linear filtering—convolution with kernels
        1. Image smoothing
          1. The mean filter
          2. The median filter
          3. The Gaussian filter
          4. Image filtering with OpenCV
        2. Image gradient
        3. Image sharpening
      5. Mixing the Gaussian and Laplacian operations
      6. Detecting edges in an image
        1. The Sobel edge detector
        2. The Canny edge detector
    3. Extracting features from an image
      1. Image matching using OpenCV
    4. Object detection using Contours and the HOG detector
      1. Contour detection
      2. Detecting a bounding box
      3. The HOG detector
      4. Limitations of the contour detection method
    5. An overview of TensorFlow, its ecosystem, and installation
      1. TensorFlow versus PyTorch
        1. TensorFlow Installation
    6. Summary
  8. Content Recognition Using Local Binary Patterns
    1. Processing images using LBP
      1. Generating an LBP pattern
      2. Understanding the LBP histogram
        1. Histogram comparison methods
      3. The computational cost of LBP
    2. Applying LBP to texture recognition
    3. Matching face color with foundation color – LBP and its limitations
    4. Matching face color with foundation color – color matching technique
    5. Summary
  9. Facial Detection Using OpenCV and CNN
    1. Applying Viola-Jones AdaBoost learning and the Haar cascade classifier for face recognition
      1. Selecting Haar-like features
      2. Creating an integral image
      3. Running AdaBoost training
      4. Attentional cascade classifiers
      5. Training the cascade detector
    2. Predicting facial key points using a deep neural network
      1. Preparing the dataset for key-point detection
      2. Processing key-point data
        1. Preprocessing before being input into the Keras–Python code
        2. Preprocessing within the Keras–Python code
      3. Defining the model architecture
      4. Training the model to make key point predictions
    3. Predicting facial expressions using a CNN
    4. Overview of 3D face detection
      1. Overview of hardware design for 3D reconstruction
      2. Overview of 3D reconstruction and tracking
      3. Overview of parametric tracking
    5. Summary
  10. Deep Learning on Images
    1. Understanding CNNs and their parameters
      1. Convolution
      2. Convolution over volume – 3 x 3 filter
      3. Convolution over volume – 1 x 1 filter
      4. Pooling
      5. Padding
      6. Stride
      7. Activation
        1. Fully connected layers
      8. Regularization
      9. Dropout
      10. Internal covariate shift and batch normalization
      11. Softmax
    2. Optimizing CNN parameters
      1. Baseline case
      2. Iteration 1 – CNN parameter adjustment
      3. Iteration 2 – CNN parameter adjustment
      4. Iteration 3 – CNN parameter adjustment
      5. Iteration 4 – CNN parameter adjustment
    3. Visualizing the layers of a neural network
      1. Building a custom image classifier model and visualizing its layers
        1. Neural network input and parameters
        2. Input image
        3. Defining the train and validation generators
        4. Developing the model
        5. Compiling and training the model
        6. Inputting a test image and converting it into a tensor
        7. Visualizing the first layer of activation
        8. Visualizing multiple layers of activation
      2. Training an existing advanced image classifier model and visualizing its layers
    4. Summary
  11. Section 2: Advanced Concepts of Computer Vision with TensorFlow
  12. Neural Network Architecture and Models
    1. Overview of AlexNet
    2. Overview of VGG16
    3. Overview of Inception
      1. GoogLeNet detection
    4. Overview of ResNet
    5. Overview of R-CNN
      1. Image segmentation
        1. Clustering-based segmentation
        2. Graph-based segmentation
      2. Selective search
      3. Region proposal
      4. Feature extraction
      5. Classification of the image
      6. Bounding box regression
    6. Overview of Fast R-CNN
    7. Overview of Faster R-CNN
    8. Overview of GANs
    9. Overview of GNNs
      1. Spectral GNN
    10. Overview of Reinforcement Learning
    11. Overview of Transfer Learning
    12. Summary
  13. Visual Search Using Transfer Learning
    1. Coding deep learning models using TensorFlow
      1. Downloading weights
      2. Decoding predictions
      3. Importing other common features
      4. Constructing a model
      5. Inputting images from a directory
      6. Loop function for importing multiple images and processing using TensorFlow Keras
    2. Developing a transfer learning model using TensorFlow
      1. Analyzing and storing data
      2. Importing TensorFlow libraries
      3. Setting up model parameters
      4. Building an input data pipeline
        1. Training data generator
        2. Validation data generator
      5. Constructing the final model using transfer learning
      6. Saving a model with checkpoints
      7. Plotting training history
    3. Understanding the architecture and applications of visual search
      1. The architecture of visual search
      2. Visual search code and explanation
        1. Predicting the class of an uploaded image
        2. Predicting the class of all images
    4. Working with a visual search input pipeline using tf.data
    5. Summary
  14. Object Detection Using YOLO
    1. An overview of YOLO
      1. The concept of IOU
      2. How does YOLO detect objects so fast?
      3. The YOLO v3 neural network architecture
      4. A comparison of YOLO and Faster R-CNN
    2. An introduction to Darknet for object detection
      1. Detecting objects using Darknet
      2. Detecting objects using Tiny Darknet
    3. Real-time prediction using Darknet
    4. YOLO versus YOLO v2 versus YOLO v3
    5. When to train a model?
    6. Training your own image set with YOLO v3 to develop a custom model
      1. Preparing images
      2. Generating annotation files
      3. Converting .xml files to .txt files
      4. Creating a combined train.txt and test.txt file
      5. Creating a list of class name files
      6. Creating a YOLO .data file
      7. Adjusting the YOLO configuration file
      8. Enabling the GPU for training
      9. Start training
    7. An overview of the Feature Pyramid Network and RetinaNet
    8. Summary
  15. Semantic Segmentation and Neural Style Transfer
    1. Overview of TensorFlow DeepLab for semantic segmentation
      1. Spatial Pyramid Pooling
        1. Atrous convolution
        2. Encoder-decoder network
          1. Encoder module
          2. Decoder module
      2. Semantic segmentation in DeepLab – example
        1. Google Colab, Google Cloud TPU, and TensorFlow
    2. Artificial image generation using DCGANs
      1. Generator
      2. Discriminator
      3. Training
        1. Image inpainting using DCGAN
      4. TensorFlow DCGAN – example
    3. Image inpainting using OpenCV
    4. Understanding neural style transfer
    5. Summary
  16. Section 3: Advanced Implementation of Computer Vision with TensorFlow
  17. Action Recognition Using Multitask Deep Learning
    1. Human pose estimation – OpenPose
      1. Theory behind OpenPose
      2. Understanding the OpenPose code
    2. Human pose estimation – stacked hourglass model
      1. Understanding the hourglass model
      2. Coding an hourglass model
        1. argparse block
        2. Training an hourglass network
        3. Creating the hourglass network
          1. Front module
          2. Left half-block
          3. Connect left to right
          4. Right half-block
          5. Head block
        4. Hourglass training
    3. Human pose estimation – PoseNet
      1. Top-down approach
      2. Bottom-up approach
      3. PoseNet implementation
      4. Applying human poses for gesture recognition
    4. Action recognition using various methods
      1. Recognizing actions based on an accelerometer
      2. Combining video-based actions with pose estimation
      3. Action recognition using the 4D method
    5. Summary
  18. Object Detection Using R-CNN, SSD, and R-FCN
    1. An overview of SSD
    2. An overview of R-FCN
    3. An overview of the TensorFlow object detection API
    4. Detecting objects using TensorFlow on Google Cloud
    5. Detecting objects using TensorFlow Hub
    6. Training a custom object detector using TensorFlow and Google Colab
      1. Collecting and formatting images as .jpg files
      2. Annotating images to create a .xml file
      3. Separating the file by train and test folders
      4. Configuring parameters and installing the required packages
      5. Creating TensorFlow records
      6. Preparing the model and configuring the training pipeline
      7. Monitoring training progress using TensorBoard
        1. TensorBoard running on a local machine
        2. TensorBoard running on Google Colab
      8. Training the model
      9. Running an inference test
      10. Caution when using the neural network model
    7. An overview of Mask R-CNN and a Google Colab demonstration
    8. Developing an object tracker model to complement the object detector
      1. Centroid-based tracking
      2. SORT tracking
      3. DeepSORT tracking
      4. The OpenCV tracking method
      5. Siamese network-based tracking
      6. SiamMask-based tracking
    9. Summary
  19. Section 4: TensorFlow Implementation at the Edge and on the Cloud
  20. Deep Learning on Edge Devices with CPU/GPU Optimization
    1. Overview of deep learning on edge devices
    2. Techniques used for GPU/CPU optimization
    3. Overview of MobileNet
    4. Image processing with a Raspberry Pi
      1. Raspberry Pi hardware setup
      2. Raspberry Pi camera software setup
      3. OpenCV installation in Raspberry Pi
      4. OpenVINO installation in Raspberry Pi
      5. Installing the OpenVINO toolkit components
        1. Setting up the environmental variable
        2. Adding a USB rule
        3. Running inference using Python code
        4. Advanced inference
          1. Face detection, pedestrian detection, and vehicle detection
          2. Landmark models
          3. Models for action recognition
          4. License plate, gaze, and person detection
    5. Model conversion and inference using OpenVINO
      1. Running inference in a Terminal using ncappzoo
      2. Converting the pre-trained model for inference
        1. Converting from a TensorFlow model developed using Keras
    6. Converting a TensorFlow model developed using the TensorFlow Object Detection API
      1. Summary of the OpenVINO Model inference process
    7. Application of TensorFlow Lite
      1. Converting a TensorFlow model into tflite format
        1. Python API
        2. TensorFlow Object Detection API – tflite_convert
        3. TensorFlow Object Detection API – toco
      2. Model optimization
    8. Object detection on Android phones using TensorFlow Lite
    9. Object detection on Raspberry Pi using TensorFlow Lite
      1. Image classification
      2. Object detection
    10. Object detection on iPhone using TensorFlow Lite and Create ML
      1. TensorFlow Lite conversion model for iPhone
      2. Core ML
      3. Converting a TensorFlow model into Core ML format
    11. A summary of various annotation methods
      1. Outsource labeling work to a third party
      2. Automated or semi-automated labeling
    12. Summary
  21. Cloud Computing Platform for Computer Vision
    1. Training an object detector in GCP
      1. Creating a project in GCP
      2. The GCP setup
      3. The Google Cloud Storage bucket setup
        1. Setting up a bucket using the GCP API
        2. Setting up a bucket using Ubuntu Terminal
      4. Setting up the Google Cloud SDK
      5. Linking your terminal to the Google Cloud project and bucket
      6. Installing the TensorFlow object detection API
      7. Preparing the dataset
        1. TFRecord and labeling map data
          1. Data preparation
          2. Data upload
        2. The model.ckpt files
        3. The model config file
      8. Training in the cloud
      9. Viewing the model output in TensorBoard
      10. The model output and conversion into a frozen graph
      11. Executing export_tflite_graph.py from Google Colab
    2. Training an object detector in the AWS SageMaker cloud platform
      1. Setting up an AWS account, billing, and limits
      2. Converting a .xml file to JSON format
      3. Uploading data to the S3 bucket
      4. Creating a notebook instance and beginning training
      5. Fixing some common failures during training
    3. Training an object detector in the Microsoft Azure cloud platform
      1. Creating an Azure account and setting up Custom Vision
      2. Uploading training images and tagging them
    4. Training at scale and packaging
      1. Application packaging
    5. The general idea behind cloud-based visual search
    6. Analyzing images and search mechanisms in various cloud platforms
      1. Visual search using GCP
      2. Visual search using AWS
      3. Visual search using Azure
    7. Summary
  22. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Mastering Computer Vision with TensorFlow 2.x
  • Author(s): Krishnendu Kar
  • Release date: May 2020
  • Publisher(s): Packt Publishing
  • ISBN: 9781838827069