3 Math with words (TF-IDF vectors)

This chapter covers

Counting words and term frequencies to analyze meaning
Predicting word occurrence probabilities with Zipf’s Law
Representing words as vectors and starting to use them
Finding relevant documents from a corpus using inverse document frequencies
Estimating the similarity of pairs of documents with cosine similarity and Okapi BM25

Having collected and counted words (tokens), and bucketed them into stems or lemmas, it’s time to do something interesting with them. Detecting words is useful for simple tasks, like getting statistics about word usage or doing keyword search. But you’d like to know which words are more important to a particular document and across the corpus as a whole. Then ...
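To make the idea of word importance concrete, here is a minimal sketch of TF-IDF weighting and cosine similarity using only the Python standard library. The toy corpus, the whitespace tokenizer, and every function name are assumptions made for illustration, and the unsmoothed `log(N / df)` weighting is just one common IDF variant, not necessarily the formulation the chapter goes on to use.

```python
import math
from collections import Counter

# Toy corpus: hypothetical sentences chosen for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption, no stemming)."""
    return text.lower().split()

def term_frequencies(tokens):
    """Count of each token divided by the document length."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}

def inverse_document_frequencies(corpus_tokens):
    """Unsmoothed IDF: log(N / df), where df is the number of documents containing the term."""
    n_docs = len(corpus_tokens)
    doc_freq = Counter()
    for tokens in corpus_tokens:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

def tfidf_vector(tokens, idf_weights):
    """Sparse TF-IDF vector stored as a dict of term -> weight."""
    return {term: tf * idf_weights[term]
            for term, tf in term_frequencies(tokens).items()}

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

corpus_tokens = [tokenize(doc) for doc in docs]
idf_weights = inverse_document_frequencies(corpus_tokens)
vectors = [tfidf_vector(tokens, idf_weights) for tokens in corpus_tokens]

sim_01 = cosine_similarity(vectors[0], vectors[1])  # share "the", "sat", "on"
sim_02 = cosine_similarity(vectors[0], vectors[2])  # share no tokens
print(f"doc0 vs doc1: {sim_01:.3f}")
print(f"doc0 vs doc2: {sim_02:.3f}")
```

In this sketch a term that appears in only one document, such as "cat", gets a larger IDF weight than one that appears in two, such as "the". Note also that without stemming or lemmatization "cat" and "cats" count as different terms, so the first and third documents score zero similarity.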

Natural Language Processing in Action by Cole Howard, Hobson Lane, Hannes Hapke
