Chapter 6. HiveQL: Queries
After learning the many ways we can define and format tables, let’s learn how to run queries. Of course, we have assumed all along that you have some prior knowledge of SQL. We’ve used some queries already to illustrate several concepts, such as loading query data into other tables in Chapter 5. Now we’ll fill in most of the details. Some special topics will be covered in subsequent chapters.
We’ll move quickly through details that are familiar to users with prior SQL experience and focus on what’s unique to HiveQL, including syntax and feature differences, as well as performance implications.
SELECT … FROM Clauses
SELECT
is the
projection operator in SQL. The FROM
clause identifies from which table, view,
or nested query we select records (see Chapter 7).
For a given record, SELECT
specifies the columns to keep, as well as the outputs of function calls on
one or more columns (e.g., the aggregation functions
like count(*)
).
Recall again our partitioned employees
table:
CREATE
TABLE
employees
(
name
STRING
,
salary
FLOAT
,
subordinates
ARRAY
<
STRING
>
,
deductions
MAP
<
STRING
,
FLOAT
>
,
address
STRUCT
<
street
:
STRING
,
city
:
STRING
,
state
:
STRING
,
zip
:
INT
>
)
PARTITIONED
BY
(
country
STRING
,
state
STRING
);
Let’s assume we have the same contents we showed in Text File Encoding of Data Values for four employees in the US state of Illinois (abbreviated IL). Here are queries of this table and the output they produce:
hive
>
SELECT
name
,
salary
FROM
employees
;
John
Doe
100000
.
0
Get Programming Hive now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.