Part 2. Tika in detail

By now you should have a fairly good understanding of what Tika is, what it can do, and where it fits in the bigger picture of information-processing systems. If you read through chapter 2 and tried out the examples, you’ve seen Tika in action and written your first Tika-based application. But if you’re anything like us, you’re wondering how this toolkit is put together and what programming APIs it provides. Wait no more, because that’s what we’ll be covering in this part of the book!

We’ll start in chapter 4 by describing the internet media type system and how Tika can detect the type of virtually any kind of document. Once the type is known, Tika can parse the document to extract its content and any associated metadata. ...
