Chapter 3. Pig’s Data Model
Before we take a look at the operators that Pig Latin provides, we first need to understand Pig’s data model. This includes Pig’s data types, how it handles concepts such as missing data, and how you can describe your data to Pig.
Types
Pig’s data types can be divided into two categories: scalar types, which contain a single value, and complex types, which contain other types.
Scalar Types
Pig’s scalar types are simple types that appear in most programming languages. With the exception of bytearrays, they are all represented in Pig interfaces by standard Java classes (mostly from java.lang; biginteger uses java.math.BigInteger), which makes them easy to work with in UDFs:
- Int
An integer. Ints are represented in interfaces by java.lang.Integer. They store four-byte signed integers. Constant integers are expressed as integer numbers: for example, 42.
- Long
A long integer. Longs are represented in interfaces by java.lang.Long. They store eight-byte signed integers. Constant longs are expressed as integer numbers with an L appended: for example, 5000000000L.
- Biginteger (since Pig 0.12)
An integer of effectively infinite size (it is bounded only by available memory). Bigintegers are represented in interfaces by java.math.BigInteger. There are no biginteger constants; instead, chararray and numeric values can be cast to biginteger to produce a constant value in the script. Note that bigintegers perform significantly worse than ints or longs, so whenever a value fits into one of those types, use that type rather than biginteger. ...