Preface

I first encountered Hadoop in the fall of 2008 when I was working on an internet crawl and analysis project at Verisign. My team was making discoveries similar to those that Doug Cutting and others at Nutch had made several years earlier about how to efficiently store and manage terabytes of crawled and analyzed data. At the time, we were getting by with our home-grown distributed system, but it could not absorb a new incoming data stream, or join that stream with our crawl data, within the required timelines.

After some research, we came across the Hadoop project, which seemed to be a perfect fit for our needs: it supported storing large volumes of data and provided a mechanism to combine ...
