Chapter 5. HiveQL: Data Manipulation
This chapter continues our discussion of HiveQL, the Hive query language, focusing on the data manipulation language parts that are used to put data into tables and to extract data from tables to the filesystem.
This chapter uses SELECT ... WHERE
clauses extensively when we discuss populating tables with data queried from
other tables. So, why aren’t we covering SELECT ...
WHERE
clauses first, instead of waiting until the next chapter,
Chapter 6?
Since we just finished discussing how to create tables, we wanted to cover the next obvious topic: how to get data into these tables so you’ll have something to query! We assume you already understand the basics of SQL, so these clauses won’t be new to you. If they are, please refer to Chapter 6 for details.
Loading Data into Managed Tables
Since Hive has no row-level insert, update, and delete operations, the only way to put data into an table is to use one of the “bulk” load operations. Or you can just write files in the correct directories by other means.
We saw an example of how to load data into a managed table in Partitioned, Managed Tables, which we repeat here with an
addition, the use of the OVERWRITE
keyword:
LOAD
DATA
LOCAL
INPATH
'${env:HOME}/california-employees'
OVERWRITE
INTO
TABLE
employees
PARTITION
(
country
=
'US'
,
state
=
'CA'
);
This command will first create the directory for the partition, if it doesn’t already exist, then copy the data to it.
If the target table is not partitioned, you ...
Get Programming Hive now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.