About This Book

Text processing generally involves three concerns. The first is acquiring the text to be processed and getting it into your program. This is the subject of Part I of this book, which deals with reading from plain text files, standard input, delimited files, and binary files such as PDFs and Word documents.

This first part is fundamentally an exploration of Ruby’s core and standard library, and of what’s possible with IO and its subclasses, such as File. Ruby’s history and design, and the high-level nature of these tasks, mean that we rarely need to dip into third-party libraries, though we’ll use one in particular, Nokogiri, when looking at scraping data from web pages.
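
As a rough illustration of why the standard library goes so far, here is a minimal sketch of a method that works on any IO-like object. The method name `count_nonblank_lines` and the sample data are hypothetical, invented for this example rather than taken from the book. StringIO stands in for a real file so that the snippet is self-contained.

```ruby
require "stringio"

# Hypothetical helper (not from the book): counts the lines that contain
# something other than whitespace. It relies only on #each_line, so it
# accepts a File, $stdin, or a StringIO interchangeably.
def count_nonblank_lines(io)
  io.each_line.count { |line| !line.strip.empty? }
end

# StringIO behaves like an open file backed by an in-memory string.
sample = StringIO.new("first line\n\nsecond line\n")
puts count_nonblank_lines(sample)

# The same method could be handed a real file or standard input, e.g.:
#   File.open("notes.txt") { |file| count_nonblank_lines(file) }
#   count_nonblank_lines($stdin)
```

Because the method depends only on `each_line`, the source of the text can change without touching the processing code.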

The second concern is with actually processing ...

