Chapter 7. MapReduce API
One advantage of Accumulo’s integration with Hadoop is that MapReduce jobs can read their input from Accumulo tables, write their results to Accumulo tables, or both. This is useful for ingesting large amounts of data quickly, for analyzing data already stored in Accumulo tables, and for exporting data from Accumulo tables to HDFS.
Formats
Accumulo provides MapReduce input and output formats that read from and write to Accumulo tables directly. There are input and output formats for both MapReduce APIs: org.apache.hadoop.mapred and org.apache.hadoop.mapreduce.
A MapReduce job can read input from an Accumulo table, write output to an Accumulo table, or both.
To configure a MapReduce job to read input from an Accumulo table, use code similar to the following:
job.setInputFormatClass(AccumuloInputFormat.class);
AccumuloInputFormat.setInputTableName(job, "table_name");

ClientConfiguration zkiConfig = new ClientConfiguration()
    .withInstance("myInstance")
    .withZkHosts("zoo1:2181,zoo2:2181");
AccumuloInputFormat.setZooKeeperInstance(job, zkiConfig);
AccumuloInputFormat.setConnectorInfo(job, "username", new PasswordToken("password"));

// Restrict the scan to particular columns
List<Pair<Text,Text>> columns = new ArrayList<>();
columns.add(new Pair<>(new Text("colFam"), new Text("colQual")));
AccumuloInputFormat.fetchColumns(job, columns); // optional

// Restrict the scan to particular ranges of rows
List<Range> ranges = new ArrayList<Range>();
ranges.add(new Range("a", "k"));
AccumuloInputFormat.setRanges(job, ranges); // optional

AccumuloInputFormat.setScanIsolation ...
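With the input format configured this way, each mapper receives Accumulo Key objects as its input keys and Value objects as its input values. The following is a minimal mapper sketch; the class name MyMapper, the output table name, and the decision to re-emit each entry as a Mutation are illustrative assumptions, not part of the original example:

public static class MyMapper extends Mapper<Key, Value, Text, Mutation> {
  @Override
  protected void map(Key key, Value value, Context context)
      throws IOException, InterruptedException {
    // Each call receives one entry from the configured ranges and columns.
    // Illustrative: copy the entry into a mutation destined for another table.
    Mutation m = new Mutation(key.getRow());
    m.put(key.getColumnFamily(), key.getColumnQualifier(), value);
    context.write(new Text("output_table"), m);
  }
}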
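The output side is configured in a similar way with AccumuloOutputFormat. The following is a minimal sketch, assuming the same ClientConfiguration and credentials as above; the table name and the choice to create missing tables are illustrative:

job.setOutputFormatClass(AccumuloOutputFormat.class);
AccumuloOutputFormat.setZooKeeperInstance(job, zkiConfig);
AccumuloOutputFormat.setConnectorInfo(job, "username", new PasswordToken("password"));
AccumuloOutputFormat.setDefaultTableName(job, "output_table");
AccumuloOutputFormat.setCreateTables(job, true); // create the table if it doesn't exist

// Mappers or reducers then emit (Text, Mutation) pairs, where the Text key
// names the destination table (a null key writes to the default table).
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Mutation.class);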