Chapter 6. Text Classification Algorithms

The internet is often referred to as the great enabler: it allows us to accomplish a lot in our daily lives with the help of online tools and platforms. On the other hand, it can also be a source of information overload and endless searching. Whether we are communicating with colleagues, customers, partners, or vendors, email and other messaging tools are an inherent part of our daily work lives. Brands interact with customers and get valuable feedback on their products through social media channels like Facebook and Twitter. Software developers and product managers use ticketing applications like Trello to track development tasks, while open source communities use GitHub issues and Bugzilla to track software bugs that need to be fixed or new functionality that needs to be added.

While these tools are useful for getting work done, they can also become overwhelming and quickly turn into a deluge of information. Many emails contain promotional content, spam, and marketing newsletters that are often a distraction. Similarly, software developers can easily get buried under a mountain of bug reports and feature requests that eats into their productivity. To make the best use of these tools, we need techniques to categorize and filter incoming information and to prioritize the important items over the less relevant ones. Text classification is one such technique.

The most common example ...
