Deep Learning Approach for Natural Language Processing, Speech, and Computer Vision

Book description

Deep Learning Approach for Natural Language Processing, Speech, and Computer Vision provides an overview of general deep learning methodology and its applications to natural language processing (NLP), speech, and computer vision tasks.

Table of contents

  1. Cover
  2. Half Title
  3. Title
  4. Copyright
  5. Dedication
  6. Contents
  7. About the Authors
  8. Preface
  9. Acknowledgments
  10. Chapter 1 Introduction
    1. Learning Outcomes
    2. 1.1 Introduction
      1. 1.1.1 Subsets of Artificial Intelligence
      2. 1.1.2 Three Horizons of Deep Learning Applications
      3. 1.1.3 Natural Language Processing
      4. 1.1.4 Speech Recognition
      5. 1.1.5 Computer Vision
    3. 1.2 Machine Learning Methods for NLP, Computer Vision (CV), and Speech
      1. 1.2.1 Support Vector Machine (SVM)
      2. 1.2.2 Bagging
      3. 1.2.3 Gradient-boosted Decision Trees (GBDTs)
      4. 1.2.4 Naïve Bayes
      5. 1.2.5 Logistic Regression
      6. 1.2.6 Dimensionality Reduction Techniques
    4. 1.3 Tools, Libraries, Datasets, and Resources for the Practitioners
      1. 1.3.1 TensorFlow
      2. 1.3.2 Keras
      3. 1.3.3 Deeplearning4j
      4. 1.3.4 Caffe
      5. 1.3.5 ONNX
      6. 1.3.6 PyTorch
      7. 1.3.7 scikit-learn
      8. 1.3.8 NumPy
      9. 1.3.9 Pandas
      10. 1.3.10 NLTK
      11. 1.3.11 Gensim
      12. 1.3.12 Datasets
    5. 1.4 Summary
    6. Bibliography
  11. Chapter 2 Natural Language Processing
    1. Learning Outcomes
    2. 2.1 Natural Language Processing
    3. 2.2 Generic NLP Pipeline
      1. 2.2.1 Data Acquisition
      2. 2.2.2 Text Cleaning
    4. 2.3 Text Pre-processing
      1. 2.3.1 Noise Removal
      2. 2.3.2 Stemming
      3. 2.3.3 Tokenization
      4. 2.3.4 Lemmatization
      5. 2.3.5 Stop Word Removal
      6. 2.3.6 Parts of Speech Tagging
    5. 2.4 Feature Engineering
    6. 2.5 Modeling
      1. 2.5.1 Start with Simple Heuristics
      2. 2.5.2 Building Your Model
      3. 2.5.3 Metrics to Build Model
    7. 2.6 Evaluation
    8. 2.7 Deployment
    9. 2.8 Monitoring and Model Updating
    10. 2.9 Vector Representation for NLP
      1. 2.9.1 One Hot Vector Encoding
      2. 2.9.2 Word Embeddings
      3. 2.9.3 Bag of Words
      4. 2.9.4 TF-IDF
      5. 2.9.5 N-gram
      6. 2.9.6 Word2Vec
      7. 2.9.7 Glove
      8. 2.9.8 ElMo
    11. 2.10 Language Modeling with n-grams
      1. 2.10.1 Evaluating Language Models
      2. 2.10.2 Smoothing
      3. 2.10.3 Kneser-Ney Smoothing
    12. 2.11 Vector Semantics and Embeddings
      1. 2.11.1 Lexical Semantics
      2. 2.11.2 Vector Semantics
      3. 2.11.3 Cosine for Measuring Similarity
      4. 2.11.4 Bias and Embeddings
    13. 2.12 Summary
    14. Bibliography
  12. Chapter 3 State-of-the-Art Natural Language Processing
    1. Learning Outcomes
    2. 3.1 Introduction
    3. 3.2 Sequence-to-Sequence Models
      1. 3.2.1 Sequence
      2. 3.2.2 Sequence Labeling
      3. 3.2.3 Sequence Modeling
    4. 3.3 Recurrent Neural Networks
      1. 3.3.1 Unrolling RNN
      2. 3.3.2 RNN-based POS Tagging Use Case
      3. 3.3.3 Challenges in RNN
    5. 3.4 Attention Mechanisms
      1. 3.4.1 Self-attention Mechanism
      2. 3.4.2 Multi-head Attention Mechanism
      3. 3.4.3 Bahdanau Attention
      4. 3.4.4 Luong Attention
      5. 3.4.5 Global Attention versus Local Attention
      6. 3.4.6 Hierarchical Attention
    6. 3.5 Transformer Model
      1. 3.5.1 Bidirectional Encoder, Representations, and Transformers (BERT)
      2. 3.5.2 GPT3
    7. 3.6 Summary
    8. Bibliography
  13. Chapter 4 Applications of Natural Language Processing
    1. Learning Outcomes
    2. 4.1 Introduction
    3. 4.2 Word Sense Disambiguation
      1. 4.2.1 Word Senses
      2. 4.2.2 WordNet: A Database of Lexical Relations
      3. 4.2.3 Approaches to Word Sense Disambiguation
      4. 4.2.4 Applications of Word Sense Disambiguation
    4. 4.3 Text Classification
      1. 4.3.1 Building the Text Classification Model
      2. 4.3.2 Applications of Text Classification
      3. 4.3.3 Other Applications
    5. 4.4 Sentiment Analysis
      1. 4.4.1 Types of Sentiment Analysis
    6. 4.5 Spam Email Classification
      1. 4.5.1 History of Spam
      2. 4.5.2 Spamming Techniques
      3. 4.5.3 Types of Spams
    7. 4.6 Question Answering
      1. 4.6.1 Components of Question Answering System
      2. 4.6.2 Information Retrieval-based Factoid Question and Answering
      3. 4.6.3 Entity Linking
      4. 4.6.4 Knowledge-based Question Answering
    8. 4.7 Chatbots and Dialog Systems
      1. 4.7.1 Properties of Human Conversation
      2. 4.7.2 Chatbots
      3. 4.7.3 The Dialog-state Architecture
    9. 4.8 Summary
    10. Bibliography
  14. Chapter 5 Fundamentals of Speech Recognition
    1. Learning Outcomes
    2. 5.1 Introduction
    3. 5.2 Structure of Speech
    4. 5.3 Basic Audio Features
      1. 5.3.1 Pitch
      2. 5.3.2 Timbral Features
      3. 5.3.3 Rhythmic Features
      4. 5.3.4 MPEG-7 Features
    5. 5.4 Characteristics of Speech Recognition System
      1. 5.4.1 Pronunciations
      2. 5.4.2 Vocabulary
      3. 5.4.3 Grammars
      4. 5.4.4 Speaker Dependence
    6. 5.5 The Working of a Speech Recognition System
      1. 5.5.1 Input Speech
      2. 5.5.2 Audio Pre-processing
      3. 5.5.3 Feature Extraction
    7. 5.6 Audio Feature Extraction Techniques
      1. 5.6.1 Spectrogram
      2. 5.6.2 MFCC
      3. 5.6.3 Short-Time Fourier Transform
      4. 5.6.4 Linear Prediction Coefficients (LPCC)
      5. 5.6.5 Discrete Wavelet Transform (DWT)
      6. 5.6.6 Perceptual Linear Prediction (PLP)
    8. 5.7 Statistical Speech Recognition
      1. 5.7.1 Acoustic Model
      2. 5.7.2 Pronunciation Model
      3. 5.7.3 Language Model
      4. 5.7.4 Conventional ASR Approaches
    9. 5.8 Speech Recognition Applications
      1. 5.8.1 In Banking
      2. 5.8.2 In-Car Systems
      3. 5.8.3 Health Care
      4. 5.8.4 Experiments by Different Speech Groups for Large-Vocabulary Speech Recognition
      5. 5.8.5 Measure of Performance
    10. 5.9 Challenges in Speech Recognition
      1. 5.9.1 Vocabulary Size
      2. 5.9.2 Speaker-Dependent or -Independent
      3. 5.9.3 Isolated, Discontinuous, and Continuous Speech
      4. 5.9.4 Phonetics
      5. 5.9.5 Adverse Conditions
    11. 5.10 Open-source Toolkits for Speech Recognition
      1. 5.10.1 Frameworks
      2. 5.10.2 Additional Tools and Libraries
    12. 5.11 Summary
    13. Bibliography
  15. Chapter 6 Deep Learning Models for Speech Recognition
    1. Learning Outcomes
    2. 6.1 Traditional Methods of Speech Recognition
      1. 6.1.1 Hidden Markov Models (HMMs)
      2. 6.1.2 Gaussian Mixture Models (GMMs)
      3. 6.1.3 Artificial Neural Network (ANN)
      4. 6.1.4 HMM and ANN Acoustic Modeling
      5. 6.1.5 Deep Belief Neural Network (DBNN) for Acoustic Modelling
    3. 6.2 RNN-based Encoder–Decoder Architecture
    4. 6.3 Encoder
    5. 6.4 Decoder
    6. 6.5 Attention-based Encoder–Decoder Architecture
    7. 6.6 Challenges in Traditional ASR and the Motivation for End-to-End ASR
    8. 6.7 Summary
    9. Bibliography
  16. Chapter 7 End-to-End Speech Recognition Models
    1. Learning Outcomes
    2. 7.1 End-to-End Speech Recognition Models
      1. 7.1.1 Definition of End-to-End ASR System
      2. 7.1.2 Connectionist Temporal Classification (CTC)
      3. 7.1.3 Deep Speech
      4. 7.1.4 Deep Speech 2
      5. 7.1.5 Listen, Attend, Spell (LAS) Model
      6. 7.1.6 JASPER
      7. 7.1.7 QuartzNet
    3. 7.2 Self-supervised Models for Automatic Speech Recognition
      1. 7.2.1 Wav2Vec
      2. 7.2.2 Data2Vec
      3. 7.2.3 HuBERT
    4. 7.3 Online/Streaming ASR
      1. 7.3.1 RNN-transducer-Based Streaming ASR
      2. 7.3.2 Wav2Letter for Streaming ASR
      3. 7.3.3 Conformer Model
    5. 7.4 Summary
    6. Bibliography
  17. Chapter 8 Computer Vision Basics
    1. Learning Outcomes
    2. 8.1 Introduction
      1. 8.1.1 Fundamental Steps for Computer Vision
      2. 8.1.2 Fundamental Steps in Digital Image Processing
    3. 8.2 Image Segmentation
      1. 8.2.1 Steps in Image Segmentation
    4. 8.3 Feature Extraction
    5. 8.4 Image Classification
      1. 8.4.1 Image Classification Using Convolutional Neural Network (CNN)
      2. 8.4.2 Convolution Layer
      3. 8.4.3 Pooling or Down Sampling Layer
      4. 8.4.4 Flattening Layer
      5. 8.4.5 Fully Connected Layer
      6. 8.4.6 Activation Function
    6. 8.5 Tools and Libraries for Computer Vision
      1. 8.5.1 OpenCV
      2. 8.5.2 MATLAB
    7. 8.6 Applications of Computer Vision
      1. 8.6.1 Object Detection
      2. 8.6.2 Face Recognition
      3. 8.6.3 Number Plate Identification
      4. 8.6.4 Image-based Search
      5. 8.6.5 Medical Imaging
    8. 8.7 Summary
    9. Bibliography
  18. Chapter 9 Deep Learning Models for Computer Vision
    1. Learning Outcomes
    2. 9.1 Deep Learning for Computer Vision
    3. 9.2 Pre-trained Architectures for Computer Vision
      1. 9.2.1 LeNet
      2. 9.2.2 AlexNet
      3. 9.2.3 VGG
      4. 9.2.4 Inception
      5. 9.2.5 R-CNN
      6. 9.2.6 Fast R-CNN
      7. 9.2.7 Faster R-CNN
      8. 9.2.8 Mask R-CNN
      9. 9.2.9 YOLO
    4. 9.3 Summary
    5. Bibliography
  19. Chapter 10 Applications of Computer Vision
    1. Learning Outcomes
    2. 10.1 Introduction
    3. 10.2 Optical Character Recognition
      1. 10.2.1 Code Snippets
      2. 10.2.2 Result Analysis
    4. 10.3 Face and Facial Expression Recognition
      1. 10.3.1 Face Recognition
      2. 10.3.2 Facial Recognition System
      3. 10.3.3 Major Challenges in Recognizing Face Expression
      4. 10.3.4 Result Analysis
    5. 10.4 Visual-based Gesture Recognition
      1. 10.4.1 Framework Used
      2. 10.4.2 Code Snippets
      3. 10.4.3 Result Analysis
      4. 10.4.4 Major Challenges in Gesture Recognition
    6. 10.5 Posture Detection and Correction
      1. 10.5.1 Framework Used
      2. 10.5.2 Squats
      3. 10.5.3 Result Analysis
    7. 10.6 Summary
    8. Bibliography
  20. Index

Product information

  • Title: Deep Learning Approach for Natural Language Processing, Speech, and Computer Vision
  • Author(s): L. Ashok Kumar, D. Karthika Renuka
  • Release date: May 2023
  • Publisher(s): CRC Press
  • ISBN: 9781000875607