Book description
Use Python and NLTK (Natural Language Toolkit) to build out your own text classifiers and solve common NLP problems.
Key Features
- Assimilate key NLP concepts and terminologies
- Explore popular NLP tools and techniques
- Gain practical experience using NLP in application code
Book Description
If NLP hasn't been your forte, Natural Language Processing Fundamentals will help you get off to a steady start. This comprehensive guide shows you how to use Python libraries and NLP concepts effectively to solve a variety of problems.
You'll be introduced to natural language processing and its applications through examples and exercises. This will be followed by an introduction to the initial stages of solving a problem, which include problem definition, getting text data, and preparing it for modeling. With exposure to concepts like advanced natural language processing algorithms and visualization techniques, you'll learn how to create applications that can extract information from unstructured data and present it as impactful visuals. Although you will continue to learn NLP-based techniques, the focus will gradually shift to developing useful applications. In these sections, you'll learn how to apply NLP techniques to question answering of the kind used in chatbots.
By the end of this book, you'll be able to handle a range of assignments, from identifying the most suitable type of NLP task for a problem to using a tool like spaCy or Gensim to perform sentiment analysis. The book will equip you with the knowledge you need to build applications that interpret human language.
What you will learn
- Obtain, verify, and clean data before transforming it into a correct format for use
- Perform data analysis and machine learning tasks using Python
- Understand the basics of computational linguistics
- Build models for general natural language processing tasks
- Evaluate the performance of a model with the right metrics
- Visualize, quantify, and perform exploratory analysis on any text data
Who this book is for
Natural Language Processing Fundamentals is designed for novice and mid-level data scientists and machine learning developers who want to gather and analyze text data to build an NLP-powered product. Prior experience of coding in Python, including using data types, writing functions, and importing libraries, will help. Some experience with linguistics and probability is useful but not necessary.
Table of contents
- Preface
- 1. Introduction to Natural Language Processing
- Introduction
- History of NLP
- Text Analytics and NLP
- Various Steps in NLP
- Tokenization
- Exercise 2: Tokenization of a Simple Sentence
- PoS Tagging
- Exercise 3: PoS Tagging
- Stop Word Removal
- Exercise 4: Stop Word Removal
- Text Normalization
- Exercise 5: Text Normalization
- Spelling Correction
- Exercise 6: Spelling Correction of a Word and a Sentence
- Stemming
- Exercise 7: Stemming
- Lemmatization
- Exercise 8: Extracting the base word using Lemmatization
- NER
- Exercise 9: Treating Named Entities
- Word Sense Disambiguation
- Exercise 10: Word Sense Disambiguation
- Sentence Boundary Detection
- Exercise 11: Sentence Boundary Detection
- Activity 1: Preprocessing of Raw Text
- Kick Starting an NLP Project
- Summary
- 2. Basic Feature Extraction Methods
- Introduction
- Types of Data
- Cleaning Text Data
- Tokenization
- Exercise 12: Text Cleaning and Tokenization
- Exercise 13: Extracting n-grams
- Exercise 14: Tokenizing Texts with Different Packages – Keras and TextBlob
- Types of Tokenizers
- Exercise 15: Tokenizing Text Using Various Tokenizers
- Issues with Tokenization
- Stemming
- RegexpStemmer
- Exercise 16: Converting words in gerund form into base words using RegexpStemmer
- The Porter Stemmer
- Exercise 17: The Porter Stemmer
- Lemmatization
- Exercise 18: Lemmatization
- Exercise 19: Singularizing and Pluralizing Words
- Language Translation
- Exercise 20: Language Translation
- Stop-Word Removal
- Exercise 21: Stop-Word Removal
- Feature Extraction from Texts
- Extracting General Features from Raw Text
- Exercise 22: Extracting General Features from Raw Text
- Activity 2: Extracting General Features from Text
- Bag of Words
- Exercise 23: Creating a BoW
- Zipf's Law
- Exercise 24: Zipf's Law
- TF-IDF
- Exercise 25: TF-IDF Representation
- Activity 3: Extracting Specific Features from Texts
- Feature Engineering
- Summary
- 3. Developing a Text Classifier
- Introduction
- Machine Learning
- Unsupervised Learning
- Hierarchical Clustering
- Exercise 29: Hierarchical Clustering
- K-Means Clustering
- Exercise 30: K-Means Clustering
- Supervised Learning
- Classification
- Logistic Regression
- Naive Bayes Classifiers
- K-Nearest Neighbors
- Exercise 31: Text Classification (Logistic regression, Naive Bayes, and KNN)
- Regression
- Linear Regression
- Exercise 32: Regression Analysis Using Textual Data
- Tree Methods
- Random Forest
- GBM and XGBoost
- Exercise 33: Tree-Based Methods (Decision Tree, Random Forest, GBM, and XGBoost)
- Sampling
- Exercise 34: Sampling (Simple Random, Stratified, Multi-Stage)
- Developing a Text Classifier
- Feature Extraction
- Feature Engineering
- Removing Correlated Features
- Exercise 35: Removing Highly Correlated Features (Tokens)
- Dimensionality Reduction
- Exercise 36: Dimensionality Reduction (PCA)
- Deciding on a Model Type
- Evaluating the Performance of a Model
- Exercise 37: Calculate the RMSE and MAPE
- Activity 5: Developing End-to-End Text Classifiers
- Building Pipelines for NLP Projects
- Saving and Loading Models
- Summary
- 4. Collecting Text Data from the Web
- Introduction
- Collecting Data by Scraping Web Pages
- Requesting Content from Web Pages
- Dealing with Semi-Structured Data
- JSON
- Exercise 43: Dealing with JSON Files
- Activity 8: Dealing with Online JSON Files
- XML
- Exercise 44: Dealing with a Local XML File
- Using APIs to Retrieve Real-Time Data
- Exercise 45: Collecting Data Using APIs
- API Creation
- Activity 9: Extracting Data from Twitter
- Extracting Data from Local Files
- Exercise 46: Extracting Data from Local Files
- Exercise 47: Performing Various Operations on Local Files
- Summary
- 5. Topic Modeling
- Introduction
- Topic Discovery
- Topic Modeling Algorithms
- Latent Semantic Analysis
- LSA – How It Works
- Exercise 48: Analyzing Reuters News Articles with Latent Semantic Analysis
- Latent Dirichlet Allocation
- LDA – How It Works
- Exercise 49: Topics in Airline Tweets
- Topic Fingerprinting
- Exercise 50: Visualizing Documents Using Topic Vectors
- Activity 10: Topic Modelling Jeopardy Questions
- Summary
- 6. Text Summarization and Text Generation
- 7. Vector Representation
- Introduction
- Vector Definition
- Why Vector Representations?
- Encoding
- Character-Level Encoding
- Exercise 54: Character Encoding Using ASCII Values
- Exercise 55: Character Encoding with the Help of NumPy Arrays
- Positional Character-Level Encoding
- Exercise 56: Character-Level Encoding Using Positions
- One-Hot Encoding
- Key Steps in One-Hot Encoding
- Exercise 57: Character One-Hot Encoding – Manual
- Exercise 58: Character-Level One-Hot Encoding with Keras
- Word-Level One-Hot Encoding
- Exercise 59: Word-Level One-Hot Encoding
- Word Embeddings
- Word2Vec
- Exercise 60: Training Word Vectors
- Using Pre-Trained Word Vectors
- Exercise 61: Loading Pre-Trained Word Vectors
- Document Vectors
- Uses of Document Vectors
- Exercise 62: From Movie Dialogue to Document Vectors
- Activity 12: Finding Similar Movie Lines Using Document Vectors
- Summary
- 8. Sentiment Analysis
- Appendix
Product information
- Title: Natural Language Processing Fundamentals
- Author(s):
- Release date: March 2019
- Publisher(s): Packt Publishing
- ISBN: 9781789954043