Natural Language Processing with TensorFlow - Second Edition

Book description

From introductory NLP tasks to Transformer models, this new edition teaches you to use powerful TensorFlow APIs to implement end-to-end NLP solutions driven by performant machine learning (ML) models.

Key Features

  • Learn to solve common NLP problems effectively with TensorFlow 2.x
  • Implement end-to-end data pipelines guided by the underlying ML model architecture
  • Use advanced LSTM techniques for complex data transformations, custom models, and metrics

Book Description

Learning how to solve natural language processing (NLP) problems is an important skill to master due to the explosive growth of data combined with the demand for machine learning solutions in production. Natural Language Processing with TensorFlow, Second Edition, will teach you how to solve common real-world NLP problems with a variety of deep learning model architectures.

The book starts by familiarizing you with NLP and the basics of TensorFlow, then gradually teaches you the different facets of TensorFlow 2.x. In the chapters that follow, you learn how to generate powerful word vectors, classify text, generate new text, and generate image captions, among other exciting use cases of real-world NLP.

TensorFlow has evolved into an ecosystem that supports the full machine learning workflow: ingesting and transforming data, building models, monitoring, and productionization. We will read text directly from files and perform the required transformations through a TensorFlow data pipeline. We will also see how to use TensorBoard, a versatile visualization tool, to visualize our models.
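
To give a flavor of what such a pipeline looks like, here is a minimal, hypothetical sketch (the file name reviews.txt is a placeholder, not an example from the book): it reads raw lines of text from disk, lowercases them, and groups them into shuffled, prefetched batches.

    import tensorflow as tf

    # Minimal tf.data text pipeline sketch (file name is a placeholder)
    dataset = (
        tf.data.TextLineDataset("reviews.txt")  # one element per line of text
        .map(tf.strings.lower)                   # simple per-element transformation
        .shuffle(buffer_size=1000)               # randomize the order of examples
        .batch(32)                               # group lines into mini-batches
        .prefetch(tf.data.AUTOTUNE)              # overlap preprocessing with training
    )

Later chapters build on this idea with the tf.data API and the TextVectorization layer when preparing text for specific models.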

By the end of this NLP book, you will be comfortable using TensorFlow to build deep learning models with many different architectures and to ingest data efficiently with TensorFlow data pipelines. Additionally, you’ll be able to confidently use TensorFlow throughout your machine learning workflow.

What you will learn

  • Learn the core concepts and techniques of NLP with TensorFlow
  • Use state-of-the-art Transformers and understand how they are used to solve NLP tasks
  • Perform sentence classification and text generation using CNNs and RNNs
  • Utilize advanced models for machine translation and image caption generation
  • Build end-to-end data pipelines in TensorFlow
  • Learn interesting facts and best practices related to the task at hand
  • Create word representations from large amounts of data for deep learning

Who this book is for

This book is for Python developers and programmers with a strong interest in deep learning who want to learn how to leverage TensorFlow to simplify NLP tasks. Fundamental Python skills are assumed, as well as basic knowledge of machine learning and undergraduate-level calculus and linear algebra. No previous natural language processing experience is required.

Table of contents

  1. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Get in touch
  2. Introduction to Natural Language Processing
    1. What is Natural Language Processing?
    2. Tasks of Natural Language Processing
    3. The traditional approach to Natural Language Processing
      1. Understanding the traditional approach
        1. Example – generating football game summaries
      2. Drawbacks of the traditional approach
    4. The deep learning approach to Natural Language Processing
      1. History of deep learning
      2. The current state of deep learning and NLP
      3. Understanding a simple deep model – a fully connected neural network
    5. Introduction to the technical tools
      1. Description of the tools
      2. Installing Anaconda and Python
        1. Creating a Conda environment
      3. TensorFlow (GPU) software requirements
      4. Accessing Jupyter Notebook
      5. Verifying the TensorFlow installation
    6. Summary
  3. Understanding TensorFlow 2
    1. What is TensorFlow?
      1. Getting started with TensorFlow 2
      2. TensorFlow 2 architecture – what happens during graph build?
      3. TensorFlow 2 architecture – what happens when you execute the graph?
      4. Café Le TensorFlow 2 – understanding TensorFlow 2 with an analogy
      5. Flashback: TensorFlow 1
    2. Inputs, variables, outputs, and operations
      1. Defining inputs in TensorFlow
        1. Feeding data as NumPy arrays
        2. Feeding data as tensors
        3. Building a data pipeline using the tf.data API
      2. Defining variables in TensorFlow
      3. Defining outputs in TensorFlow
      4. Defining operations in TensorFlow
        1. Comparison operations
        2. Mathematical operations
        3. Updating (scattering) values in tensors
        4. Collecting (gathering) values from a tensor
      5. Neural network-related operations
        1. Nonlinear activations used by neural networks
        2. The convolution operation
        3. The pooling operation
        4. Defining loss
    3. Keras: The model building API of TensorFlow
      1. Sequential API
      2. Functional API
      3. Sub-classing API
    4. Implementing our first neural network
      1. Preparing the data
      2. Implementing the neural network with Keras
        1. Training the model
        2. Testing the model
    5. Summary
  4. Word2vec – Learning Word Embeddings
    1. What is a word representation or meaning?
    2. Classical approaches to learning word representation
      1. One-hot encoded representation
      2. The TF-IDF method
      3. Co-occurrence matrix
    3. An intuitive understanding of Word2vec – an approach to learning word representation
      1. Exercise: does queen = king – he + she?
    4. The skip-gram algorithm
      1. From raw text to semi-structured text
      2. Understanding the skip-gram algorithm
      3. Implementing and running the skip-gram algorithm with TensorFlow
        1. Implementing the data generators with TensorFlow
        2. Implementing the skip-gram architecture with TensorFlow
        3. Training and evaluating the model
    5. The Continuous Bag-of-Words algorithm
      1. Generating data for the CBOW algorithm
      2. Implementing CBOW in TensorFlow
      3. Training and evaluating the model
    6. Summary
  5. Advanced Word Vector Algorithms
    1. GloVe – Global Vectors representation
      1. Understanding GloVe
      2. Implementing GloVe
      3. Generating data for GloVe
      4. Training and evaluating GloVe
    2. ELMo – Taking ambiguities out of word vectors
      1. Downloading ELMo from TensorFlow Hub
      2. Preparing inputs for ELMo
      3. Generating embeddings with ELMo
    3. Document classification with ELMo
      1. Dataset
      2. Generating document embeddings
      3. Classifying documents with document embeddings
    4. Summary
  6. Sentence Classification with Convolutional Neural Networks
    1. Introducing CNNs
      1. CNN fundamentals
        1. The power of CNNs
    2. Understanding CNNs
      1. Convolution operation
        1. Standard convolution operation
        2. Convolving with stride
        3. Convolving with padding
        4. Transposed convolution
      2. Pooling operation
        1. Max pooling
        2. Max pooling with stride
        3. Average pooling
      3. Fully connected layers
      4. Putting everything together
    3. Exercise – image classification on Fashion-MNIST with CNN
      1. About the data
      2. Downloading and exploring the data
      3. Implementing the CNN
      4. Analyzing the predictions produced with a CNN
    4. Using CNNs for sentence classification
      1. How data is transformed for sentence classification
      2. Implementation – downloading and preparing data
        1. Implementation – building a tokenizer
      3. The sentence classification CNN model
        1. The convolution operation
        2. Pooling over time
      4. Implementation – sentence classification with CNNs
      5. Training the model
    5. Summary
  7. Recurrent Neural Networks
    1. Understanding RNNs
      1. The problem with feed-forward neural networks
      2. Modeling with RNNs
      3. Technical description of an RNN
    2. Backpropagation Through Time
      1. How backpropagation works
      2. Why we cannot use BP directly for RNNs
      3. Backpropagation Through Time – training RNNs
      4. Truncated BPTT – training RNNs efficiently
      5. Limitations of BPTT – vanishing and exploding gradients
    3. Applications of RNNs
      1. One-to-one RNNs
      2. One-to-many RNNs
      3. Many-to-one RNNs
      4. Many-to-many RNNs
    4. Named Entity Recognition with RNNs
      1. Understanding the data
      2. Processing data
      3. Defining hyperparameters
      4. Defining the model
        1. Introduction to the TextVectorization layer
        2. Defining the rest of the model
      5. Evaluation metrics and the loss function
      6. Training and evaluating RNN on NER task
      7. Visually analyzing outputs
    5. NER with character and token embeddings
      1. Using convolution to generate token embeddings
      2. Implementing the new NER model
        1. Defining hyperparameters
        2. Defining the input layer
        3. Defining the token-based TextVectorization layer
        4. Defining the character-based TextVectorization layer
        5. Processing the inputs for the char_vectorize_layer
        6. Performing convolution on the character embeddings
      3. Model training and evaluation
      4. Other improvements you can make
    6. Summary
  8. Understanding Long Short-Term Memory Networks
    1. Understanding Long Short-Term Memory Networks
      1. What is an LSTM?
      2. LSTMs in more detail
      3. How LSTMs differ from standard RNNs
    2. How LSTMs solve the vanishing gradient problem
    3. Improving LSTMs
      1. Greedy sampling
      2. Beam search
      3. Using word vectors
      4. Bidirectional LSTMs (BiLSTMs)
    4. Other variants of LSTMs
      1. Peephole connections
      2. Gated Recurrent Units
    5. Summary
  9. Applications of LSTM – Generating Text
    1. Our data
      1. About the dataset
      2. Generating training, validation, and test sets
      3. Analyzing the vocabulary size
      4. Defining the tf.data pipeline
    2. Implementing the language model
      1. Defining the TextVectorization layer
      2. Defining the LSTM model
      3. Defining metrics and compiling the model
      4. Training the model
      5. Defining the inference model
      6. Generating new text with the model
    3. Comparing LSTMs to LSTMs with peephole connections and GRUs
      1. Standard LSTM
        1. Review
      2. Gated Recurrent Units (GRUs)
        1. Review
        2. The model
      3. LSTMs with peepholes
        1. Review
        2. The code
      4. Training and validation perplexities over time
    4. Improving sequential models – beam search
      1. Implementing beam search
      2. Generating text with beam search
    5. Improving LSTMs – generating text with words instead of n-grams
      1. The curse of dimensionality
      2. Word2vec to the rescue
      3. Generating text with Word2vec
    6. Summary
  10. Sequence-to-Sequence Learning – Neural Machine Translation
    1. Machine translation
    2. A brief historical tour of machine translation
      1. Rule-based translation
      2. Statistical Machine Translation (SMT)
      3. Neural Machine Translation (NMT)
    3. Understanding neural machine translation
      1. Intuition behind NMT systems
      2. NMT architecture
        1. The embedding layer
        2. The encoder
        3. The context vector
        4. The decoder
    4. Preparing data for the NMT system
      1. The dataset
        1. Adding special tokens
        2. Splitting training, validation, and testing datasets
        3. Defining sequence lengths for the two languages
        4. Padding the sentences
    5. Defining the model
      1. Converting tokens to IDs
      2. Defining the encoder
      3. Defining the decoder
      4. Attention: Analyzing the encoder states
        1. Computing Attention
        2. Implementing Attention
      5. Defining the final model
    6. Training the NMT
    7. The BLEU score – evaluating the machine translation systems
      1. Modified precision
      2. Brevity penalty
      3. The final BLEU score
    8. Visualizing Attention patterns
    9. Inference with NMT
    10. Other applications of Seq2Seq models – chatbots
      1. Training a chatbot
      2. Evaluating chatbots – the Turing test
    11. Summary
  11. Transformers
    1. Transformer architecture
      1. The encoder and the decoder
      2. Computing the output of the self-attention layer
      3. Embedding layers in the Transformer
      4. Residuals and normalization
    2. Understanding BERT
      1. Input processing for BERT
      2. Tasks solved by BERT
      3. How BERT is pre-trained
        1. Masked Language Modeling (MLM)
        2. Next Sentence Prediction (NSP)
    3. Use case: Using BERT to answer questions
      1. Introduction to the Hugging Face transformers library
      2. Exploring the data
      3. Implementing BERT
        1. Implementing and using the Tokenizer
        2. Defining a TensorFlow dataset
        3. BERT for answering questions
        4. Defining the config and the model
      4. Training and evaluating the model
      5. Answering questions with BERT
    4. Summary
  12. Image Captioning with Transformers
    1. Getting to know the data
      1. ILSVRC ImageNet dataset
      2. The MS-COCO dataset
    2. Downloading the data
    3. Processing and tokenizing data
      1. Preprocessing data
      2. Tokenizing data
    4. Defining a tf.data.Dataset
    5. The machine learning pipeline for image caption generation
      1. Vision Transformer (ViT)
      2. Text-based decoder Transformer
      3. Putting everything together
    6. Implementing the model with TensorFlow
      1. Implementing the ViT model
      2. Implementing the text-based decoder
        1. Defining the self-attention layer
        2. Defining the Transformer layer
        3. Defining the full decoder
    7. Training the model
    8. Evaluating the results quantitatively
      1. BLEU
      2. ROUGE
      3. METEOR
      4. CIDEr
    9. Evaluating the model
    10. Captions generated for test images
    11. Summary
  13. Appendix A: Mathematical Foundations and Advanced TensorFlow
    1. Basic data structures
      1. Scalar
      2. Vectors
      3. Matrices
      4. Indexing of a matrix
    2. Special types of matrices
      1. Identity matrix
      2. Square diagonal matrix
      3. Tensors
    3. Tensor/matrix operations
      1. Transpose
      2. Matrix multiplication
      3. Element-wise multiplication
      4. Inverse
      5. Finding the matrix inverse – Singular Value Decomposition (SVD)
      6. Norms
      7. Determinant
    4. Probability
      1. Random variables
        1. Discrete random variables
        2. Continuous random variables
      2. The probability mass/density function
      3. Conditional probability
      4. Joint probability
      5. Marginal probability
      6. Bayes’ rule
    5. Visualizing word embeddings with TensorBoard
      1. Starting TensorBoard
      2. Saving word embeddings and visualizing via TensorBoard
    6. Summary
  14. Other Books You May Enjoy
  15. Index

Product information

  • Title: Natural Language Processing with TensorFlow - Second Edition
  • Author(s): Thushan Ganegedara
  • Release date: July 2022
  • Publisher(s): Packt Publishing
  • ISBN: 9781838641351