Book description
How can you get your data from frontend servers to Hadoop in near real time? With this complete reference guide, you’ll learn Flume’s rich set of features for collecting, aggregating, and writing large amounts of streaming data to the Hadoop Distributed File System (HDFS), Apache HBase, SolrCloud, Elasticsearch, and other systems.
Using Flume shows operations engineers how to configure, deploy, and monitor a Flume cluster, and teaches developers how to write Flume plugins and custom components for their specific use cases. You’ll learn about Flume’s design and implementation, as well as the features that make it highly scalable, flexible, and reliable. Code examples and exercises are available on GitHub.
- Learn how Flume provides a steady rate of flow by acting as a buffer between data producers and consumers
- Dive into key Flume components, including sources that accept data and sinks that write and deliver it
- Write custom plugins to customize the way Flume receives, modifies, formats, and writes data
- Explore APIs for sending data to Flume agents from your own applications
- Plan and deploy Flume in a scalable and flexible way—and monitor your cluster once it’s running
Table of contents
- Foreword
- Preface
- 1. Apache Hadoop and Apache HBase: An Introduction
- 2. Streaming Data Using Apache Flume
  - The Need for Flume
  - Is Flume a Good Fit?
  - Inside a Flume Agent
  - Configuring Flume Agents
  - Getting Flume Agents to Talk to Each Other
  - Complex Flows
  - Replicating Data to Various Destinations
  - Dynamic Routing
  - Flume’s No Data Loss Guarantee, Channels, and Transactions
  - Agent Failure and Data Loss
  - The Importance of Batching
  - What About Duplicates?
  - Running a Flume Agent
  - Summary
  - References
- 3. Sources
- 4. Channels
- 5. Sinks
- 6. Interceptors, Channel Selectors, Sink Groups, and Sink Processors
- 7. Getting Data into Flume
- 8. Planning, Deploying, and Monitoring Flume
- Index
Product information
- Title: Using Flume
- Author(s): Hari Shreedharan
- Release date: September 2014
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491905333