Getting Started with Natural Language Processing

Book description

Hit the ground running with this in-depth introduction to the NLP skills and techniques that allow your computers to speak human.

In Getting Started with Natural Language Processing you’ll learn about:

  • Fundamental concepts and algorithms of NLP
  • Useful Python libraries for NLP
  • Building a search algorithm
  • Extracting information from raw text
  • Predicting sentiment of an input text
  • Author profiling
  • Topic labeling
  • Named entity recognition

Getting Started with Natural Language Processing is an enjoyable and understandable guide that helps you engineer your first NLP algorithms. Your tutor is Dr. Ekaterina Kochmar, lecturer at the University of Bath, who has helped thousands of students take their first steps with NLP. Full of Python code and hands-on projects, each chapter provides a concrete example with practical techniques you can put into practice right away. If you’re new to NLP and want to upgrade your applications with features like information extraction, user profiling, and automatic topic labeling, this is the book for you.
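
To give a flavor of the book’s hands-on style, here is a minimal sketch (not taken from the book) of the kind of task it covers: tagging parts of speech and extracting named entities with spaCy, one of the Python libraries used in later chapters. It assumes spaCy and its small English model (en_core_web_sm) are installed.

    # Minimal illustrative sketch (not from the book), assuming spaCy and the
    # en_core_web_sm model are installed:
    #   pip install spacy && python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Dr. Ekaterina Kochmar teaches NLP at the University of Bath.")

    # Part-of-speech tag for each token
    for token in doc:
        print(token.text, token.pos_)

    # Named entities detected in the sentence
    for ent in doc.ents:
        print(ent.text, ent.label_)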

About the Technology
From smart speakers to customer service chatbots, apps that understand text and speech are everywhere. Natural language processing, or NLP, is the key to this powerful form of human/computer interaction. And a new generation of tools and techniques makes it easier than ever to get started with NLP!

About the Book
Getting Started with Natural Language Processing teaches you how to upgrade user-facing applications with text- and speech-based features. From the accessible explanations and hands-on examples in this book you’ll learn how to apply NLP to sentiment analysis, user profiling, and much more. As you go, each new project builds on what you’ve previously learned, introducing new concepts and skills. Handy diagrams and intuitive Python code samples make it easy to get started, even if you have no background in machine learning!

What's Inside
  • Fundamental concepts and algorithms of NLP
  • Extracting information from raw text
  • Useful Python libraries
  • Topic labeling
  • Building a search algorithm


About the Reader
You’ll need basic Python skills. No experience with NLP required.

About the Author
Ekaterina Kochmar is a lecturer in the Department of Computer Science at the University of Bath, where she is part of the AI research group.

Quotes
An accessible entry point. Learn key NLP concepts by building real-world projects.
- Samantha Berk, AdaptX

A well-written, pragmatic book.
- James Richard Woodruff, SAIC

The best NLP resource.
- Najeeb Arif, ThoughtWorks

Get started with NLP and understand its fundamentals.
- Walter Alexander Mata López, University of Colima

Makes a difficult subject easy to understand.
- Tanya Wilke, .NET Engineer

Table of contents

  1. inside front cover
  2. Getting Started with Natural Language Processing
  3. Copyright
  4. dedication
  5. contents
  6. front matter
    1. preface
    2. acknowledgments
    3. about this book
      1. Who should read this book
      2. How this book is organized: A road map
      3. About the code
      4. liveBook discussion forum
      5. Other online resources
    4. about the author
    5. about the cover illustration
  7. 1 Introduction
    1. 1.1 A brief history of NLP
    2. 1.2 Typical tasks
      1. 1.2.1 Information search
      2. 1.2.2 Advanced information search: Asking the machine precise questions
      3. 1.2.3 Conversational agents and intelligent virtual assistants
      4. 1.2.4 Text prediction and language generation
      5. 1.2.5 Spam filtering
      6. 1.2.6 Machine translation
      7. 1.2.7 Spell- and grammar checking
    3. Summary
    4. Solution to exercise 1.1
  8. 2 Your first NLP example
    1. 2.1 Introducing NLP in practice: Spam filtering
    2. 2.2 Understanding the task
      1. 2.2.1 Step 1: Define the data and classes
      2. 2.2.2 Step 2: Split the text into words
      3. 2.2.3 Step 3: Extract and normalize the features
      4. 2.2.4 Step 4: Train a classifier
      5. 2.2.5 Step 5: Evaluate the classifier
    3. 2.3 Implementing your own spam filter
      1. 2.3.1 Step 1: Define the data and classes
      2. 2.3.2 Step 2: Split the text into words
      3. 2.3.3 Step 3: Extract and normalize the features
      4. 2.3.4 Step 4: Train the classifier
      5. 2.3.5 Step 5: Evaluate your classifier
    4. 2.4 Deploying your spam filter in practice
    5. Summary
    6. Solutions to miscellaneous exercises
  9. 3 Introduction to information search
    1. 3.1 Understanding the task
      1. 3.1.1 Data and data structures
      2. 3.1.2 Boolean search algorithm
    2. 3.2 Processing the data further
      1. 3.2.1 Preselecting the words that matter: Stopwords removal
      2. 3.2.2 Matching forms of the same word: Morphological processing
    3. 3.3 Information weighing
      1. 3.3.1 Weighing words with term frequency
      2. 3.3.2 Weighing words with inverse document frequency
    4. 3.4 Practical use of the search algorithm
      1. 3.4.1 Retrieval of the most similar documents
      2. 3.4.2 Evaluation of the results
      3. 3.4.3 Deploying search algorithm in practice
    5. Summary
    6. Solutions to miscellaneous exercises
  10. 4 Information extraction
    1. 4.1 Use cases
      1. 4.1.1 Case 1
      2. 4.1.2 Case 2
      3. 4.1.3 Case 3
    2. 4.2 Understanding the task
    3. 4.3 Detecting word types with part-of-speech tagging
      1. 4.3.1 Understanding word types
      2. 4.3.2 Part-of-speech tagging with spaCy
    4. 4.4 Understanding sentence structure with syntactic parsing
      1. 4.4.1 Why sentence structure is important
      2. 4.4.2 Dependency parsing with spaCy
    5. 4.5 Building your own information extraction algorithm
    6. Summary
    7. Solutions to miscellaneous exercises
  11. 5 Author profiling as a machine-learning task
    1. 5.1 Understanding the task
      1. 5.1.1 Case 1: Authorship attribution
      2. 5.1.2 Case 2: User profiling
    2. 5.2 Machine-learning pipeline at first glance
      1. 5.2.1 Original data
      2. 5.2.2 Testing generalization behavior
      3. 5.2.3 Setting up the benchmark
    3. 5.3 A closer look at the machine-learning pipeline
      1. 5.3.1 Decision Trees classifier basics
      2. 5.3.2 Evaluating which tree is better using node impurity
      3. 5.3.3 Selection of the best split in Decision Trees
      4. 5.3.4 Decision Trees on language data
    4. Summary
    5. Solutions to miscellaneous exercises
  12. 6 Linguistic feature engineering for author profiling
    1. 6.1 Another close look at the machine-learning pipeline
      1. 6.1.1 Evaluating the performance of your classifier
      2. 6.1.2 Further evaluation measures
    2. 6.2 Feature engineering for authorship attribution
      1. 6.2.1 Word and sentence length statistics as features
      2. 6.2.2 Counts of stopwords and proportion of stopwords as features
      3. 6.2.3 Distributions of parts of speech as features
      4. 6.2.4 Distribution of word suffixes as features
      5. 6.2.5 Unique words as features
    3. 6.3 Practical use of authorship attribution and user profiling
    4. Summary
  13. 7 Your first sentiment analyzer using sentiment lexicons
    1. 7.1 Use cases
    2. 7.2 Understanding your task
      1. 7.2.1 Aggregating sentiment score with the help of a lexicon
      2. 7.2.2 Learning to detect sentiment in a data-driven way
    3. 7.3 Setting up the pipeline: Data loading and analysis
      1. 7.3.1 Data loading and preprocessing
      2. 7.3.2 A closer look into the data
    4. 7.4 Aggregating sentiment scores with a sentiment lexicon
      1. 7.4.1 Collecting sentiment scores from a lexicon
      2. 7.4.2 Applying sentiment scores to detect review polarity
    5. Summary
    6. Solutions to exercises
  14. 8 Sentiment analysis with a data-driven approach
    1. 8.1 Addressing multiple senses of a word with SentiWordNet
    2. 8.2 Addressing dependence on context with machine learning
      1. 8.2.1 Data preparation
      2. 8.2.2 Extracting features from text
      3. 8.2.3 Scikit-learn’s machine-learning pipeline
      4. 8.2.4 Full-scale evaluation with cross-validation
    3. 8.3 Varying the length of the sentiment-bearing features
    4. 8.4 Negation handling for sentiment analysis
    5. 8.5 Further practice
    6. Summary
  15. 9 Topic analysis
    1. 9.1 Topic classification as a supervised machine-learning task
      1. 9.1.1 Data
      2. 9.1.2 Topic classification with Naïve Bayes
      3. 9.1.3 Evaluation of the results
    2. 9.2 Topic discovery as an unsupervised machine-learning task
      1. 9.2.1 Unsupervised ML approaches
      2. 9.2.2 Clustering for topic discovery
      3. 9.2.3 Evaluation of the topic clustering algorithm
    3. Summary
    4. Solutions to miscellaneous exercises
  16. 10 Topic modeling
    1. 10.1 Topic modeling with latent Dirichlet allocation
      1. 10.1.1 Exercise 10.1: Question 1 solution
      2. 10.1.2 Exercise 10.1: Question 2 solution
      3. 10.1.3 Estimating parameters for the LDA
      4. 10.1.4 LDA as a generative model
    2. 10.2 Implementation of the topic modeling algorithm
      1. 10.2.1 Loading the data
      2. 10.2.2 Preprocessing the data
      3. 10.2.3 Applying the LDA model
      4. 10.2.4 Exploring the results
    3. Summary
    4. Solutions to miscellaneous exercises
  17. 11 Named-entity recognition
    1. 11.1 Named entity recognition: Definitions and challenges
      1. 11.1.1 Named entity types
      2. 11.1.2 Challenges in named entity recognition
    2. 11.2 Named-entity recognition as a sequence labeling task
      1. 11.2.1 The basics: BIO scheme
      2. 11.2.2 What does it mean for a task to be sequential?
      3. 11.2.3 Sequential solution for NER
    3. 11.3 Practical applications of NER
      1. 11.3.1 Data loading and exploration
      2. 11.3.2 Named entity types exploration with spaCy
      3. 11.3.3 Information extraction revisited
      4. 11.3.4 Named entities visualization
    4. Summary
    5. Conclusion
    6. Solutions to miscellaneous exercises
  18. Appendix A Installation instructions
  19. index
  20. inside back cover

Product information

  • Title: Getting Started with Natural Language Processing
  • Author(s): Ekaterina Kochmar
  • Release date: October 2022
  • Publisher(s): Manning Publications
  • ISBN: 9781617296765