Buying Options
Hadoop: The Definitive Guide
Print $44.99
Print+Ebook $49.49
Ebook $35.99
Safari Books Online
Print £34.50
Description
Apache Hadoop is ideal for organizations with a growing need to process massive application datasets. Hadoop: The Definitive Guide is a comprehensive resource for using Hadoop to build reliable, scalable, distributed systems. Programmers will find details for analyzing large datasets with Hadoop, and administrators will learn how to set up and run Hadoop clusters. The book includes case studies that illustrate how Hadoop is used to solve specific problems.
Table of Contents
  1. Chapter 1 Meet Hadoop

    1. Data!

    2. Data Storage and Analysis

    3. Comparison with Other Systems

    4. A Brief History of Hadoop

    5. The Apache Hadoop Project

  2. Chapter 2 MapReduce

    1. A Weather Dataset

    2. Analyzing the Data with Unix Tools

    3. Analyzing the Data with Hadoop

    4. Scaling Out

    5. Hadoop Streaming

    6. Hadoop Pipes

  3. Chapter 3 The Hadoop Distributed Filesystem

    1. The Design of HDFS

    2. HDFS Concepts

    3. The Command-Line Interface

    4. Hadoop Filesystems

    5. The Java Interface

    6. Data Flow

    7. Parallel Copying with distcp

    8. Hadoop Archives

  4. Chapter 4 Hadoop I/O

    1. Data Integrity

    2. Compression

    3. Serialization

    4. File-Based Data Structures

  5. Chapter 5 Developing a MapReduce Application

    1. The Configuration API

    2. Configuring the Development Environment

    3. Writing a Unit Test

    4. Running Locally on Test Data

    5. Running on a Cluster

    6. Tuning a Job

    7. MapReduce Workflows

  6. Chapter 6 How MapReduce Works

    1. Anatomy of a MapReduce Job Run

    2. Failures

    3. Job Scheduling

    4. Shuffle and Sort

    5. Task Execution

  7. Chapter 7 MapReduce Types and Formats

    1. MapReduce Types

    2. Input Formats

    3. Output Formats

  8. Chapter 8 MapReduce Features

    1. Counters

    2. Sorting

    3. Joins

    4. Side Data Distribution

    5. MapReduce Library Classes

  9. Chapter 9 Setting Up a Hadoop Cluster

    1. Cluster Specification

    2. Cluster Setup and Installation

    3. SSH Configuration

    4. Hadoop Configuration

    5. Post Install

    6. Benchmarking a Hadoop Cluster

    7. Hadoop in the Cloud

  10. Chapter 10 Administering Hadoop

    1. HDFS

    2. Monitoring

    3. Maintenance

  11. Chapter 11 Pig

    1. Installing and Running Pig

    2. An Example

    3. Comparison with Databases

    4. Pig Latin

    5. User-Defined Functions

    6. Data Processing Operators

    7. Pig in Practice

  12. Chapter 12 HBase

    1. HBasics

    2. Concepts

    3. Installation

    4. Clients

    5. Example

    6. HBase Versus RDBMS

    7. Praxis

  13. Chapter 13 ZooKeeper

    1. Installing and Running ZooKeeper

    2. An Example

    3. The ZooKeeper Service

    4. Building Applications with ZooKeeper

    5. ZooKeeper in Production

  14. Chapter 14 Case Studies

    1. Hadoop Usage at Last.fm

    2. Hadoop and Hive at Facebook

    3. Nutch Search Engine

    4. Log Processing at Rackspace

    5. Cascading

    6. TeraByte Sort on Apache Hadoop

  1. Appendix Installing Apache Hadoop

    1. Prerequisites

    2. Installation

    3. Configuration

  2. Appendix Cloudera’s Distribution for Hadoop

    1. Prerequisites

    2. Standalone Mode

    3. Pseudo-Distributed Mode

    4. Fully Distributed Mode

    5. Hadoop-Related Packages

  3. Appendix Preparing the NCDC Weather Data

  4. Colophon

Product Details
Title:
Hadoop: The Definitive Guide
By:
Tom White
Publisher:
O'Reilly Media
Formats:
  • Print
  • Ebook
  • Safari Books Online
Print Release:
June 2009
Ebook Release:
May 2009
Pages:
528
Print ISBN:
978-0-596-52197-4
Print ISBN 10:
0-596-52197-9
Ebook ISBN:
978-0-596-80470-1
Ebook ISBN 10:
0-596-80470-9
About the Author
  1. Tom White

    Tom White has been an Apache Hadoop committer since February 2007 and is a member of the Apache Software Foundation. He works for Cloudera, a company set up to offer Hadoop support and training. Previously, he was an independent Hadoop consultant, working with companies to set up, use, and extend Hadoop. He has written numerous articles for O'Reilly, java.net, and IBM's developerWorks, and has spoken at several conferences, including ApacheCon 2008, where he spoke on Hadoop. Tom has a Bachelor's degree in Mathematics from the University of Cambridge and a Master's in Philosophy of Science from the University of Leeds, UK.


Colophon
The animal on the cover of Hadoop: The Definitive Guide is an African elephant. African elephants are the largest land animals on earth (slightly larger than their cousin, the Asian elephant) and can be identified by their ears, which have been said to look somewhat like the continent of Africa. Males stand 12 feet tall at the shoulder and weigh 12,000 pounds, but they can get as big as 15,000 pounds, whereas females stand 10 feet tall and weigh 8,000 to 11,000 pounds. They have four molars; each weighs about 11 pounds and measures about 12 inches long. As the front pair wears down and drops out in pieces, the back pair shifts forward, and two new molars emerge in the back of the mouth. They replace their teeth six times throughout their lives, and between 40 and 60 years of age, they will lose all of their teeth and likely die of starvation (a common cause of death). Their tusks are teeth; in fact, it is the second set of incisors that becomes the tusks, which they use for digging for roots and stripping the bark off trees for food, fighting each other during mating season, and defending themselves against predators. Their tusks weigh between 50 and 100 pounds and are between 5 and 8 feet long.
African elephants live throughout sub-Saharan Africa. Most of the continent's elephants live on savannas and in dry woodlands. In some regions, they can be found in desert areas; in others, they are found in mountains.
Elephants are fond of water. They shower by sucking water into their trunks and spraying it all over themselves; afterward, they spray their skin with a protective coating of dust. An elephant's trunk is actually a long nose used for smelling, breathing, trumpeting, drinking, and grabbing things, especially food. The trunk alone contains about 100,000 different muscles. African elephants have two finger-like features on the end of their trunks that they can use to grab small items. They feed on roots, grass, fruit, and bark. An adult elephant can consume up to 300 pounds of food in a single day. These hungry animals do not sleep much; they roam great distances while foraging for the large quantities of food that they require to sustain their massive bodies.
Having a baby elephant is a serious commitment. Elephants have longer pregnancies than any other mammal: almost 22 months. At birth, elephants already weigh approximately 200 pounds and stand about 3 feet tall.
This species plays an important role in the forest and savanna ecosystems in which it lives. Many plant species are dependent on passing through an elephant's digestive tract before they can germinate; it is estimated that at least a third of tree species in West African forests rely on elephants in this way. Elephants grazing on vegetation also affect the structure of habitats and influence bush fire patterns. For example, under natural conditions, elephants make gaps through the rainforest, enabling the sunlight to enter, which allows the growth of various plant species. This in turn facilitates a more abundant and more diverse fauna of smaller animals. As a result of the influence elephants have over many plants and animals, they are often referred to as a keystone species because they are vital to the long-term survival of the ecosystems in which they live.