Appendix A. Built-in User Defined Functions and PiggyBank
This appendix covers UDFs that come as part of the Pig distribution, including built-in UDFs and user-contributed UDFs in PiggyBank.
Built-in UDFs
Pig comes prepackaged with many UDFs that can be used directly in Pig without using register or define. These include load, store, evaluation, and filter functions.
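As a minimal sketch of what this means in practice, the script below uses a built-in load and store function (PigStorage) and a built-in evaluation function (UPPER) with no register or define statement. The file names, field names, and schema are hypothetical and chosen only for illustration.

```pig
-- No register or define is needed: PigStorage and UPPER ship with Pig.
-- 'input.csv', 'output', and the schema are hypothetical.
records = LOAD 'input.csv' USING PigStorage(',') AS (name:chararray, score:int);

-- UPPER is a built-in evaluation function.
shouted = FOREACH records GENERATE UPPER(name) AS name, score;

-- PigStorage also serves as a store function; here it writes tab-separated output.
STORE shouted INTO 'output' USING PigStorage('\t');
```

A UDF from PiggyBank or your own JAR would, by contrast, need a register statement for the JAR before it could be used in the same way.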
Built-in Load and Store Functions
Pig’s built-in load functions are listed in Table A-1; Table A-2 lists the store functions.
Function | Location string indicates | Constructor arguments | Description |
---|---|---|---|
AccumuloStorage | Accumulo table | The first argument is a string describing the column family and column to Pig field mapping. The second is an option string (optional). | Load data from Accumulo. |
AvroStorage | HDFS file (Avro files) | The first argument is the input schema or record name (optional). The second is an option string (optional). | Load data from Avro files on HDFS. |
HBaseStorage | HBase table | The first argument is a string describing the column family and column to Pig field mapping. The second is an option string (optional). | Load data from HBase (see “HBase”). |
JsonLoader | HDFS file (JSON files) | The first argument is the input schema (optional). | Load data from JSON files on HDFS. |
OrcStorage | HDFS file (ORC files) | None. | Load data from ORC files on HDFS. |
ParquetLoader | HDFS file (Parquet files) | The first argument is a subset schema to load (optional). | Load data from Parquet files on HDFS. |
PigStorage | HDFS file | The first argument is a field separator ... |