Natural Language Processing (NLP) for Everyone
Published by Pearson
The rise of online social platforms has resulted in an explosion of written text in the form of blogs, posts, tweets, wiki pages, and more. This new wealth of data provides a unique opportunity to explore natural language in its many forms, both as a way of automatically extracting information from written text and as a way of artificially producing text that looks natural.
In this class, we introduce attendees to natural language processing from scratch. Each concept is introduced and explained through coding examples that use nothing more than plain Python and numpy. In this way, attendees learn the underlying concepts and techniques in depth instead of just learning how to use a specific NLP library.
What you’ll learn and how you can apply it
- Text representation
- Topic modeling
- Sentiment analysis
- Language detection
- Text classification
- Document clustering
This live event is for you because...
- You're a data scientist who is interested in mastering the concepts and ideas behind natural language processing.
- You have no previous experience in NLP and want to take the first grounded steps.
- You have previous experience using NLP libraries such as NLTK or spaCy and wish to gain a greater understanding of what's going on "under the hood."
Prerequisites
- Attendees should understand basic Python
Course Set-up:
- Python - available here: https://www.python.org/
- Course GitHub repo - https://github.com/DataForScience/NLP
Recommended Preparation:
- (video) Python Programming Language LiveLessons by David Beazley
- (video) Modern Python LiveLessons: Big Ideas and Little Code in Python by Raymond Hettinger
Recommended Follow-up:
- (video) Natural Language Processing LiveLessons by Bruno Gonçalves
- Stay connected with Bruno and up-to-date on the world of data, science, and machine learning at https://data4sci.com/newsletter
Schedule
The time frames are only estimates and may vary according to how the class is progressing.
Segment 1 Text Representation (50m)
- Represent words and numbers
- Use One-Hot Encoding
- Implement Bag of Words
- Apply stopwords
- Understand TF-IDF
- Understand stemming
- Break 10m
Segment 2 Topic Modeling (60m)
- Find topics in documents
- Perform Explicit Semantic Analysis
- Understand document clustering
- Implement Latent Semantic Analysis
- Implement Non-negative Matrix Factorization
Segment 3 Sentiment Analysis (40m)
- Quantify words and feelings
- Use negations and modifiers
- Understand corpus-based approaches
- Break 10m
Segment 4 Applications (70m)
- Understand Word2vec word embeddings
- Define GloVe
- Apply Language detection
Your Instructor
Bruno Gonçalves
Bruno Gonçalves is currently a Head of Data Science working at the intersection of AI, Blockchain Technologies, and Finance. Previously, he was a Data Science Fellow at NYU's Center for Data Science while on leave from a tenured faculty position at Aix-Marseille Université. Since completing his PhD in the Physics of Complex Systems in 2008, he has applied Data Science and Machine Learning to the large-scale study of human behavior. In 2015, he was awarded the Complex Systems Society's Junior Scientific Award for "outstanding contributions in Complex Systems Science," and in 2018 he was named a Science Fellow of the Institute for Scientific Interchange in Turin, Italy.