Book description
Learn to build expert NLP and machine learning projects using NLTK and other Python libraries
About This Book
Break text down into its component parts for spelling correction, feature extraction, and phrase transformation
Work through NLP concepts with simple and easy-to-follow programming recipes
Gain insight into current and emerging research topics in NLP
Who This Book Is For
If you are an NLP or machine learning enthusiast and an intermediate Python programmer who wants to quickly master NLTK for natural language processing, then this Learning Path is for you. Students of linguistics and professionals in semantic or sentiment analysis will also find it invaluable.
What You Will Learn
Understand the scope and complexity of natural language and how it is processed by machines
Clean and wrangle text using tokenization and chunking to help you process data better
Tokenize text into sentences and sentences into words
Classify text and perform sentiment analysis
Implement string matching algorithms and normalization techniques
Understand and implement the concepts of information retrieval and text summarization
Find out how to implement various NLP tasks in Python
In Detail
Natural Language Processing is a field of computational linguistics and artificial intelligence that deals with human-computer interaction. It enables seamless interaction between computers and human beings and gives computers the ability to understand human speech with the help of machine learning. As the number of human-computer interactions keeps growing, it is becoming imperative that computers comprehend all major natural languages.
The first module, NLTK Essentials, is an introduction to building systems around NLP, with a focus on creating a customized tokenizer and parser from scratch. You will learn the essential concepts of NLP, gain practical insight into the open source tools and libraries available in Python, see how to analyze social media sites, and be given the tools to deal with large-scale text. This module also shows how to harness the capabilities of Python libraries such as NLTK, scikit-learn, pandas, and NumPy.
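As a taste of the kind of work this first module covers, here is a minimal sketch (not taken from the book) that uses NLTK's built-in tokenizers to split raw text into sentences and words; the sample text is invented for illustration, and the book goes further by showing how to build a customized tokenizer of your own.

import nltk

nltk.download('punkt')  # tokenizer models, downloaded once

# Made-up sample text for illustration
text = "NLTK makes text processing approachable. It also ships with many corpora."

sentences = nltk.sent_tokenize(text)                # split raw text into sentences
words = [nltk.word_tokenize(s) for s in sentences]  # split each sentence into word tokens

print(sentences)
print(words)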
The second module, Python 3 Text Processing with NLTK 3 Cookbook, teaches you the essential techniques of text and language processing through simple, straightforward examples. These include organizing text corpora, creating your own custom corpus, text classification with a focus on sentiment analysis, and distributed text processing methods.
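For example, a sentiment classifier of the kind the cookbook builds can be sketched in a few lines of NLTK: a bag-of-words feature extractor plus a Naive Bayes classifier. The toy training data below is invented purely for illustration; the book trains on real corpora.

import nltk

def bag_of_words(words):
    # The simplest bag-of-words feature set: each token maps to True
    return {word: True for word in words}

# Tiny, made-up training set of (feature dict, label) pairs
train_data = [
    (bag_of_words(['great', 'movie', 'loved', 'it']), 'pos'),
    (bag_of_words(['terrible', 'plot', 'boring', 'acting']), 'neg'),
]

classifier = nltk.NaiveBayesClassifier.train(train_data)
print(classifier.classify(bag_of_words(['loved', 'it'])))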
The third module, Mastering Natural Language Processing with Python, will help you become an expert and assist you in creating your own NLP projects using NLTK. You will be guided through model development with machine learning tools, shown how to create training data, and given insight into the best practices for designing and building NLP-based applications using Python.
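One of the building blocks this module works with is part-of-speech tagging. A minimal sketch using NLTK's off-the-shelf tagger (not the custom models developed in the book) looks like this; the sample sentence is invented for illustration.

import nltk

nltk.download('punkt')                       # tokenizer models
nltk.download('averaged_perceptron_tagger')  # default English POS tagger

tokens = nltk.word_tokenize("The quick brown fox jumps over the lazy dog")
print(nltk.pos_tag(tokens))  # list of (token, tag) pairs, e.g. ('fox', 'NN')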
This Learning Path combines some of the best that Packt has to offer in one complete, curated package and is designed to help you quickly learn text processing with Python and NLTK. It includes content from the following Packt products:
NLTK Essentials by Nitin Hardeniya
Python 3 Text Processing with NLTK 3 Cookbook by Jacob Perkins
Mastering Natural Language Processing with Python by Deepti Chopra, Nisheeth Joshi, and Iti Mathur
Style and approach
This comprehensive course provides a smooth learning path that teaches you how to get started with natural language processing using Python and NLTK, and then shows you how to build effective NLP and machine learning projects of your own.
Downloading the example code for this book: you can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the code files sent to you.
Table of contents
- Natural Language Processing: Python and NLTK
- Table of Contents
- Credits
- Preface
- 1. Module 1
- 1. Introduction to Natural Language Processing
- 2. Text Wrangling and Cleansing
- 3. Part of Speech Tagging
- 4. Parsing Structure in Text
- 5. NLP Applications
- Building your first NLP application
- Other NLP applications
- Summary
- 6. Text Classification
- 7. Web Crawling
- 8. Using NLTK with Other Python Libraries
- 9. Social Media Mining in Python
- 10. Text Mining at Scale
- 2. Module 2
- 1. Tokenizing Text and WordNet Basics
- Introduction
- Tokenizing text into sentences
- Tokenizing sentences into words
- Tokenizing sentences using regular expressions
- Training a sentence tokenizer
- Filtering stopwords in a tokenized sentence
- Looking up Synsets for a word in WordNet
- Looking up lemmas and synonyms in WordNet
- Calculating WordNet Synset similarity
- Discovering word collocations
- 2. Replacing and Correcting Words
- 3. Creating Custom Corpora
- Introduction
- Setting up a custom corpus
- Creating a wordlist corpus
- Creating a part-of-speech tagged word corpus
- Creating a chunked phrase corpus
- Creating a categorized text corpus
- Creating a categorized chunk corpus reader
- Lazy corpus loading
- Creating a custom corpus view
- Creating a MongoDB-backed corpus reader
- Corpus editing with file locking
- 4. Part-of-speech Tagging
- Introduction
- Default tagging
- Training a unigram part-of-speech tagger
- Combining taggers with backoff tagging
- Training and combining ngram taggers
- Creating a model of likely word tags
- Tagging with regular expressions
- Affix tagging
- Training a Brill tagger
- Training the TnT tagger
- Using WordNet for tagging
- Tagging proper names
- Classifier-based tagging
- Training a tagger with NLTK-Trainer
- 5. Extracting Chunks
- Introduction
- Chunking and chinking with regular expressions
- Merging and splitting chunks with regular expressions
- Expanding and removing chunks with regular expressions
- Partial parsing with regular expressions
- Training a tagger-based chunker
- Classification-based chunking
- Extracting named entities
- Extracting proper noun chunks
- Extracting location chunks
- Training a named entity chunker
- Training a chunker with NLTK-Trainer
- 6. Transforming Chunks and Trees
- Introduction
- Filtering insignificant words from a sentence
- Correcting verb forms
- Swapping verb phrases
- Swapping noun cardinals
- Swapping infinitive phrases
- Singularizing plural nouns
- Chaining chunk transformations
- Converting a chunk tree to text
- Flattening a deep tree
- Creating a shallow tree
- Converting tree labels
- 7. Text Classification
- Introduction
- Bag of words feature extraction
- Training a Naive Bayes classifier
- Training a decision tree classifier
- Training a maximum entropy classifier
- Training scikit-learn classifiers
- Measuring precision and recall of a classifier
- Calculating high information words
- Combining classifiers with voting
- Classifying with multiple binary classifiers
- Training a classifier with NLTK-Trainer
- 8. Distributed Processing and Handling Large Datasets
- Introduction
- Distributed tagging with execnet
- Distributed chunking with execnet
- Parallel list processing with execnet
- Storing a frequency distribution in Redis
- Storing a conditional frequency distribution in Redis
- Storing an ordered dictionary in Redis
- Distributed word scoring with Redis and execnet
- 9. Parsing Specific Data Types
- A. Penn Treebank Part-of-speech Tags
- 3. Module 3
- 1. Working with Strings
- 2. Statistical Language Modeling
- Understanding word frequency
- Applying smoothing on the MLE model
- Develop a back-off mechanism for MLE
- Applying interpolation on data to get mix and match
- Evaluate a language model through perplexity
- Applying Metropolis-Hastings in modeling languages
- Applying Gibbs sampling in language processing
- Summary
- 3. Morphology – Getting Our Feet Wet
- 4. Parts-of-Speech Tagging – Identifying Words
- 5. Parsing – Analyzing Training Data
- 6. Semantic Analysis – Meaning Matters
- 7. Sentiment Analysis – I Am Happy
- 8. Information Retrieval – Accessing Information
- 9. Discourse Analysis – Knowing Is Believing
- 10. Evaluation of NLP Systems – Analyzing Performance
- B. Bibliography
- Index
Product information
- Title: Natural Language Processing: Python and NLTK
- Author(s): Nitin Hardeniya, Jacob Perkins, Deepti Chopra, Nisheeth Joshi, Iti Mathur
- Release date: November 2016
- Publisher(s): Packt Publishing
- ISBN: 9781787285101