The heart of all SQL queries is the SELECT
command. SELECT
is used to build queries (also known as SELECT
statements). Queries are the only SQL instructions by which your data can be retrieved from
tables and views. The data returned via a query is called a result set and
consists of rows, with columns, similar to a table.
The columns of a result set are not stored on the disk in any fixed form. They are purely a temporary result of the queryâs requested data. A query on a table may return a result set with the same column structure as the table, or it may differ drastically. Result sets may even have columns which are drawn from several other tables by a single query.
Since it is central to PostgreSQL, SELECT
is easily the most
complicated single command, having the most available clauses and parameters. The following is
the syntax for SELECT.
The terms used are summarized and described in
greater detail within the following sections. The term expression
is used to refer to either a column name, or a general
expression (such as a column being operated upon by a constant, or another column).
SELECT [ ALL | DISTINCT [ ON ( expression [, ...] ) ] ] target [ AS name ] [,...] [ FROM source [, ...] ] [ [ NATURAL ] join_type source [ ON condition | USING ( column_list ) ] ] [, ...] [ WHERE condition ] [ GROUP BY expression [, ...] ] [ HAVING condition [, ...] ] [ { UNION | INTERSECT | EXCEPT } [ ALL ] sub-query ] [ ORDER BY expression [ ASC | DESC | USING operator ] [, ...] ] [ FOR UPDATE [ OF table [, ...] ] ] [ LIMIT { count | ALL } [ { OFFSET | , } start ] ]
In this syntax diagram, source
may be either a
table name or a subselect. The syntax for these general forms is as follows:
FROM { [ ONLY ] table [ [ AS ] alias [ ( column_alias [, ...] ) ] ] | ( query ) [ AS ] alias [ ( column_alias [, ...] ) ] }
ALL
The
ALL
keyword may be specified as a noise term to make it clear that all rows should be returned.DISTINCT [ ON (
expression
[, ...] ) ]
The
DISTINCT
clause specifies a column (or expression) for which to retrieve only one row per unique value ofexpression
.target
[ AS
name
] [, ...]
The
SELECT
targets are usually column names, though they can be constants, identifiers, functions or general expressions. Eachtarget
requested must be separated by commas, and may be named dynamically toname
via theAS
clause. Supplying the asterisk symbol (*
) as a target is shorthand for requesting all non-system columns, and may be listed along with other targets.FROM
source
[, ... ]
The
FROM
clause dictates thesource
that PostgreSQL will look in for the specifiedtargets.
Thesource,
in this case, may be a table name or a sub-query. You can specify numerous sources, separated by commas. (This is roughly equivalent to a cross join). The syntax for theFROM
clause is described in more detail later in this section.[ NATURAL ]
join_type source
[ ON
condition
| USING (
column_list
) ]The
FROM
sources may be joined together via theJOIN
clause, which requires ajoin_type
(e.g.,INNER, FULL OUTER, CROSS)
and may require acondition
orcolumn_list
to further define the nature of the join, depending on thejoin_type
.WHERE
condition
The
WHERE
clause constrains the result set from theSELECT
statement to specified criteria, which are defined bycondition.
Conditions must return a single Boolean value (true or false), but may consist of several checks combined with logical operators (e.g., withAND,
andOR)
to indicate that available rows must meet all supplied conditions to be included in the statementâs results.GROUP BY
expression
[, ...]
The
GROUP BY
clause aggregates (groups) rows together by the criteria described inexpression
. This can be as simple as a column name (and often is) or an arbitrary expression applied to values of the result set.HAVING
condition
[, ...]
The
HAVING
clause is similar to theWHERE
clause, but checks its conditions on aggregated (grouped) sets instead of atomic rows.{ UNION | INTERSECT | EXCEPT } [ ALL ]
sub-query
Performs one of three set operations between the
SELECT
statement and a second query, returning their result sets in uniform column structure (which must be compatible). Duplicate rows are removed from the resultant set unless theALL
keyword is used.UNION
Returns the set of collected rows.
INTERSECT
Returns the set of rows where the values of the two sets overlap.
EXCEPT
Returns the set of rows which are found in the
SELECT
statement, but not found in the secondary query.
ORDER BY
expression
Sorts the results of the
SELECT
statement byexpression.
[ ASC | DESC | USING operator ]
Determines whether or not the
ORDER BY
expression
proceeds in ascending order(ASC),
or descending order(DESC).
Anoperator
may alternatively be specified with theUSING
keyword (e.g.,<
or>
).FOR UPDATE [ OF
table
[, ...] ]
Allows for exclusive locking of the returned rows. When used within a transaction block,
FOR UPDATE
locks the rows of the specified table until the transaction is committed. While locked, the rows cannot be updated by other transactions.LIMIT
{count
| ALL }Limits the number of rows returned to a maximum of
count,
or explicitly allowsALL
rows.{ OFFSET | , }
start
Instructs the
LIMIT
clause at what point to begin limiting the results. For example, aLIMIT
with acount
set to 100, and anOFFSET
clause with astart
value of 50 would return the rows from 50 to 150 (if there are that many results to return).
Terms used in the FROM
clauseâs syntax description are as
follows:
[ ONLY ]
table
The
table
name specifies what table to use as a source for theSELECT
statement. Specifying theONLY
clause causes the rows of any childâs table to be omitted from the query.[ AS ]
alias
An
alias
may optionally be assigned to aFROM
source, in order to simplify a query (e.g.,books
might be temporarily referenced with an alias ofb).
TheAS
term is considered noise, and is optional.(
query
) [ AS ]
alias
Any valid
SELECT
statement may be placed in parentheses as thequery
. This causes the result set created by the query to be used as aFROM
source, as if it had been a static table. This use of a sub-query requires a specifiedalias
.(
column_alias
[, ...] )
The
FROM
sources which have assigned aliases may also alias columns by specifying arbitrary column aliases. Eachcolumn_alias
must be separated by commas, and grouped within parentheses following theFROM
sourceâs alias. These aliases must match the order of the defined columns in the table to which it is applied.
A SELECT
statement may be as simple as a request for all rows and all
columns from a specified table. Use the following syntax to retrieve all rows and columns from
a table:
SELECT * FROM table_name;
The asterisk (*
) character, as mentioned in the explanation of
SELECT
âs syntax, is short-hand for all non-system columns. In essence, the
SELECT *
requests all non-system data in the table named
table_name
; this retrieves all columns and all rows,
because no row limit is specified. To demonstrate, Example 4-23 requests all columns (*
)
from Book Townâs books
table.
Example 4-23. Selecting all from the books table
booktown=# SELECT * FROM books;
id | title | author_id | subject_id
-------+-----------------------------+-----------+------------
7808 | The Shining | 4156 | 9
4513 | Dune | 1866 | 15
4267 | 2001: A Space Odyssey | 2001 | 15
1608 | The Cat in the Hat | 1809 | 2
1590 | Bartholomew and the Oobleck | 1809 | 2
25908 | Franklin in the Dark | 15990 | 2
1501 | Goodnight Moon | 2031 | 2
190 | Little Women | 16 | 6
1234 | The Velveteen Rabbit | 25041 | 3
2038 | Dynamic Anatomy | 1644 | 0
156 | The Tell-Tale Heart | 115 | 9
41472 | Practical PostgreSQL | 1212 | 4
41473 | Programming Python | 7805 | 4
41477 | Learning Python | 7805 | 4
41478 | Perl Cookbook | 7806 | 4
(15 rows)
While SELECT *
is a good example of a basic query, and is sometimes
very useful, you will probably be interested in retrieving only a few columns worth of
information at a time. To stay efficient, and to keep your queries clear, it is a good idea to
explicitly specify the intended target columns rather than to use the asterisk. This is
especially true when using the JOIN
clause, as will be discussed in the
section titled Joining Data Sets with JOIN.
To specify the target columns for a query, list the names of the columns following the
SELECT
keyword. The query will return data for only
those columns that you list. The order of these columns need not match their literal order in
the table, and columns may be listed more than once, or not at all, as shown in Example 4-24.
Example 4-24. Re-ordering columns
booktown=# SELECT id, author_id, title, id booktown-# FROM books; id | author_id | title | id -------+-----------+-----------------------------+------- 7808 | 4156 | The Shining | 7808 4513 | 1866 | Dune | 4513 4267 | 2001 | 2001: A Space Odyssey | 4267 1608 | 1809 | The Cat in the Hat | 1608 1590 | 1809 | Bartholomew and the Oobleck | 1590 25908 | 15990 | Franklin in the Dark | 25908 1501 | 2031 | Goodnight Moon | 1501 190 | 16 | Little Women | 190 1234 | 25041 | The Velveteen Rabbit | 1234 2038 | 1644 | Dynamic Anatomy | 2038 156 | 115 | The Tell-Tale Heart | 156 41472 | 1212 | Practical PostgreSQL | 41472 41473 | 7805 | Programming Python | 41473 41477 | 7805 | Learning Python | 41477 41478 | 7806 | Perl Cookbook | 41478 (15 rows)
As you can see, the data sets returned in both Example 4-24 and
Example 4-23 are nearly identical. The second set is
returned in a different column arrangement, (omitting the subject_id
column, and repeating the id
column twice) as a result of the target
list.
In addition to plain column names, targets in the SELECT
statement may
be arbitrary expressions (e.g., involving functions, or operators acting upon identifiers), or
constants. The syntax is simple, and only requires that each identifier, expression, or
constant be separated by commas. Conveniently, different types of targets may be arbitrarily
mixed in the target list.
In fact, the SELECT
command may be used to retrieve expressions and
constants without the use of a FROM
clause or specified columns, as in
Example 4-25.
Example 4-25. Using expressions and constants
testdb=# SELECT 2 + 2, testdb-# pi(), testdb-# 'PostgreSQL is more than a calculator!'; ?column? | pi | ?column? ----------+------------------+--------------------------------------- 4 | 3.14159265358979 | PostgreSQL is more than a calculator! (1 row)
The target list allows the use of an optional AS
clause for each
specified target, which re-names a column in the returned result set to an arbitrary name
specified in the clause. The rules and limitations for the specified name are the same as for
normal identifiers (e.g., they may be quoted to contain spaces, may not be keywords unless
quoted, and so on).
Using AS
has no lasting effect on the column itself, but only on the
result set which is returned by the query. AS
can be particularly useful
when selecting expressions or constants, rather than plain columns. Naming result set columns
with AS
can clarify the meaning of an otherwise ambiguous expression or
constant. This technique is demonstrated in Example 4-26, which shows the same results as Example 4-25, but with different column headings.
Example 4-26. Using the AS clause with expressions and constants
booktown=# SELECT 2 + 2 AS "2 plus 2", booktown-# pi() AS "the pi function", booktown-# 'PostgreSQL is more than a calculator!' AS comments; 2 plus 2 | the pi function | comments ----------+------------------+--------------------------------------- 4 | 3.14159265358979 | PostgreSQL is more than a calculator! (1 row)
The
FROM
clause allows you to choose either a table or a result set as a
source for your specified target list. Multiple sources may be entered following the FROM
clause, separated by commas. Specifying multiple sources in this fashion is
functionally similar to a CROSS JOIN,
discussed in the section titled Joining Data Sets with JOIN.
Take care when specifying multiple FROM
sources to PostgreSQL. The
result of performing a SELECT
on several comma-delimited sources without a
WHERE
or JOIN
clause to qualify the relationship
between the sources is that the complete Cartesian product of the sources
will be returned. This is a result set where each column from each source is combined in every
possible combination of rows between each other source.
Typically a WHERE
clause is used to define the relationship between
comma-delimited FROM
sources, as shown in Example 4-27 (see the section titled Qualifying with the WHERE Clause for more information about the WHERE
clause).
You must be careful when identifying column names and using multiple sources in the
FROM
clause, as it can introduce ambiguity between identifiers. Consider a
SELECT
that draws from both the books
table and the
authors
table. Each of these tables has a column called id.
If specified, PostgreSQL will be unable to determine if the id
column refers to the book, or the author:
booktown=# SELECT id FROM books, authors;
ERROR: Column reference "id" is ambiguous
As a result of the potential for ambiguity, âcompleteâ column names can be referenced
through a special syntax called dot-notation. Dot-notation refers to the
placement of a dot, or period, between the table name and a column name, in order to
explicitly reference a particular column. For example, books.id
refers to
the id
column within the books
table.
Dot-notation is only required in instances of ambiguity between data
sets. As shown in Example 4-27, you can use the
column name as an identifier source, as long as it is unique among the available sets defined
by the FROM
clause. (In this case, the title
column,
which is unique to the books
table, and the last_name
column, which is unique to the authors
tables).
Example 4-27. Selecting from multiple table sources
booktown=# SELECT books.id, title, authors.id, last_name booktown-# FROM books, authors booktown-# WHERE books.author_id = authors.id; id | title | id | last_name -------+-----------------------------+-------+-------------- 190 | Little Women | 16 | Alcott 156 | The Tell-Tale Heart | 115 | Poe 41472 | Practical PostgreSQL | 1212 | Worsley 2038 | Dynamic Anatomy | 1644 | Hogarth 1608 | The Cat in the Hat | 1809 | Geisel 1590 | Bartholomew and the Oobleck | 1809 | Geisel 4513 | Dune | 1866 | Herbert 4267 | 2001: A Space Odyssey | 2001 | Clarke 1501 | Goodnight Moon | 2031 | Brown 7808 | The Shining | 4156 | King 41473 | Programming Python | 7805 | Lutz 41477 | Learning Python | 7805 | Lutz 41478 | Perl Cookbook | 7806 | Christiansen 25908 | Franklin in the Dark | 15990 | Bourgeois 1234 | The Velveteen Rabbit | 25041 | Bianco (15 rows)
If you wish to use a sub-query to generate a result set as a source for your FROM
clause, the entire query must be surrounded by parentheses. This instructs
PostgreSQL to correctly interpret the query as a sub -SELECT
statement and
to execute it before the SELECT
statement within which it resides.
Example 4-28 demonstrates a peculiar query which
retrieves all column values (*
) from the books
table via
a sub-query. The query then retrieves a string constant of test and the
id
values from that result set (derived from the sub-query).
Example 4-28. Selecting from a sub-query
booktown=# SELECT 'test' AS test, id booktown-# FROM (SELECT * FROM books) booktown-# AS example_sub_query; test | id ------+------- test | 7808 test | 4513 test | 4267 test | 1608 test | 1590 test | 25908 test | 1501 test | 190 test | 1234 test | 2038 test | 156 test | 41472 test | 41473 test | 41477 test | 41478 (15 rows)
The query in Example 4-28 is rather peculiar because the
net effect is no different than if you had selected from the books
table.
This occurs because the result set from the sub-query is identical to the set of values in the
books
table. The use of this query demonstrates the combination of a
string constant from one SELECT
statement with a value drawn from the
result set of a second SELECT
statement. See the section titled Using Sub-Queries for more realistic examples of sub-queries once you have a
better understanding of the SELECT
statement itself.
Note
When specifying a table that is inherited by other tables, you may provide the optional
ONLY
keyword before the table name to indicate that you do not want to
draw from any sub-tables. (See Chapter 7 for more information on
inheritance.)
Like columns, FROM
sources (e.g., tables, or sub-queries) may be
aliased with the AS
clause. This is usually applied as a convenient
shorthand for the dot-notation described in the preceding section. Aliasing a data set allows
you to refer to it via dot-notation, which provides a more succinct and readable SQL
statement. Example 4-29 demonstrates the same query used in
Example 4-27, however you can see that it
simplifies the dot-notation with the AS
clause.
Example 4-29. Aliasing FROM sources
booktown=# SELECT b.id, title, a.id, last_name booktown-# FROM books AS b, authors AS a booktown-# WHERE b.author_id = a.id; id | title | id | last_name -------+-----------------------------+-------+-------------- 190 | Little Women | 16 | Alcott 156 | The Tell-Tale Heart | 115 | Poe 41472 | Practical PostgreSQL | 1212 | Worsley 2038 | Dynamic Anatomy | 1644 | Hogarth 1608 | The Cat in the Hat | 1809 | Geisel 1590 | Bartholomew and the Oobleck | 1809 | Geisel 4513 | Dune | 1866 | Herbert 4267 | 2001: A Space Odyssey | 2001 | Clarke 1501 | Goodnight Moon | 2031 | Brown 7808 | The Shining | 4156 | King 41473 | Programming Python | 7805 | Lutz 41477 | Learning Python | 7805 | Lutz 41478 | Perl Cookbook | 7806 | Christiansen 25908 | Franklin in the Dark | 15990 | Bourgeois 1234 | The Velveteen Rabbit | 25041 | Bianco (15 rows)
In addition to placing aliases on the FROM
clauseâs data sources, you
can place aliases on the columns within that source. This is done by
following a valid data sourceâs alias with a list of column aliases,
grouped in parentheses and separated by commas. A column alias list therefore consists of a
sequence of identifier aliases for each column, which correspond to the literal columns in the
order that the table is defined with (from left to right).
When describing a column alias list, you do not need to specify each column; any column that is left unspecified is accessible via its normal name within such a query. If the only column you wish to alias is to the right of any other columns that you do not necessarily wish to alias, you will need to explicitly list the preceding columns (it is valid to list the same name for an existing column as its âaliasâ). Otherwise, PostgreSQL will have no way of knowing which column you were attempting to alias and will assume you were addressing the first column from the left.
Note
The AS
keyword is technically considered noise, and may be omitted in
practice; PostgreSQL determines that any stray identifiers following a FROM
source may be used as aliases.
Example 4-30 illustrates the same query that is used in Example 4-29 but aliases the id
columns in
each table to unique identifiers in order to reference them directly (i.e., without
dot-notation). The syntax is functionally identical, aliasing only the books
tableâs id
column, thus making the authors
tableâs id
column non-ambiguous:
Example 4-30. Aliasing columns
booktown=# SELECT the_book_id, title, id, last_name booktown-# FROM books AS b (the_book_id), authors booktown-# WHERE author_id = id; the_book_id | title | id | last_name -------------+-----------------------------+-------+-------------- 190 | Little Women | 16 | Alcott 156 | The Tell-Tale Heart | 115 | Poe 41472 | Practical PostgreSQL | 1212 | Worsley 2038 | Dynamic Anatomy | 1644 | Hogarth 1608 | The Cat in the Hat | 1809 | Geisel 1590 | Bartholomew and the Oobleck | 1809 | Geisel 4513 | Dune | 1866 | Herbert 4267 | 2001: A Space Odyssey | 2001 | Clarke 1501 | Goodnight Moon | 2031 | Brown 7808 | The Shining | 4156 | King 41473 | Programming Python | 7805 | Lutz 41477 | Learning Python | 7805 | Lutz 41478 | Perl Cookbook | 7806 | Christiansen 25908 | Franklin in the Dark | 15990 | Bourgeois 1234 | The Velveteen Rabbit | 25041 | Bianco (15 rows)
The
optional DISTINCT
keyword excludes duplicate rows from the result set. If
supplied without the ON
clause, a query that specifies DISTINCT
will exclude any row whose target columns have already been retrieved
identically. Only columns in the SELECT
âs target list will be
evaluated.
For example, the books
table has 15 rows, each with an author_id.
Some authors may have several entries in the books
table, causing there to be several rows with the same author_id.
Supplying
the DISTINCT
clause, as shown in the first query in Example 4-31, ensures that the result set will not have two identical
rows.
Example 4-31. Using DISTINCT
booktown=# SELECT DISTINCT author_id booktown-# FROM books; author_id ----------- 16 115 1212 1644 1809 1866 2001 2031 4156 7805 7806 15990 25041 (13 rows) booktown=# SELECT DISTINCT ON (author_id) booktown-# author_id, title booktown-# FROM books; author_id | title -----------+----------------------- 16 | Little Women 115 | The Tell-Tale Heart 1212 | Practical PostgreSQL 1644 | Dynamic Anatomy 1809 | The Cat in the Hat 1866 | Dune 2001 | 2001: A Space Odyssey 2031 | Goodnight Moon 4156 | The Shining 7805 | Programming Python 7806 | Perl Cookbook 15990 | Franklin in the Dark 25041 | The Velveteen Rabbit (13 rows)
As you can see, the first query in
Example 4-31 returns only 13 rows from the books
table, even though there are 15 total rows within it. Two authors with two books each end up
being displayed only once.
The second query in Example 4-31 uses a different form of DISTINCT,
which specifies the columns (or expressions) to be checked for
redundancies. In this case, 13 rows are still returned, as the ON
clause
specifies to use the author_id
column as the basis for determining if a row
is redundant or not. Without the ON
clause, the second query would return
all 15 rows, because the DISTINCT
clause would cause PostgreSQL to look for
rows that are completely unique.
The titles that are omitted from the resultant data set by ON
are
arbitrarily determined by PostgreSQL, unless an ORDER BY
clause is
specified. If the ORDER BY
clause is used with DISTINCT,
you can specify the order in which columns are selected; hence, you can select which rows will
be considered distinct first. See the section titled Sorting Rows with ORDER BY for information about sorting rows.
If you are interested in grouping rows which have non-unique criteria, rather than
omitting all rows but one, see the description of the GROUP BY
clause in
the section titled Grouping Rows with GROUP BY.
The WHERE
clause allows you to provide Boolean (true or false)
conditions that rows must satisfy to be included in the resulting row set. In practice, a
SELECT
statement will almost always contain at least one qualification via
the WHERE
clause.
Suppose that you want to see all of the books in Book Townâs Computers section. The
subject_id
for the Computers subject is 4. Therefore, the WHERE
clause can be applied with an equivalence operation (the =
operator) to check for rows in the books
table with a
subject_id
equal to 4. This is shown in Example 4-32.
Example 4-32. A simple WHERE clause
booktown=# SELECT * FROM books booktown-# WHERE subject_id = 4; id | title | author_id | subject_id -------+----------------------+-----------+------------ 41472 | Practical PostgreSQL | 1212 | 4 41473 | Programming Python | 7805 | 4 41477 | Learning Python | 7805 | 4 41478 | Perl Cookbook | 7806 | 4 (4 rows)
The query in Example 4-32 returns only rows whose subject_id
column matches the integer constant value of 4. Thus, only the four
rows for computer books are returned, rather than the 15 rows shown by the simple query in
Example 4-23.
The WHERE
clause accepts numerous conditions, provided that they are
joined by valid logical keywords (e.g., the AND,
and OR
keywords) and returns a single Boolean condition. For example, you may be interested in seeing
all Book Town titles that fall under the Computers subject which are also
by the author Mark Lutz, thus joining two conditions to narrow the focus of your query.
Alternatively, you might be interested in seeing each of Book Townâs titles that fall under
either the Computers subject or the Arts subject, thereby joining two conditions to broaden
the focus of your intended result set. Example 4-33 demonstrates each of these scenarios
using the AND
keyword and OR
keyword,
respectively.
Example 4-33. Combining conditions in the WHERE clause
booktown=# SELECT title FROM books booktown-# WHERE subject_id = 4 booktown-# AND author_id = 7805; title -------------------- Programming Python Learning Python (2 rows) booktown=# SELECT title FROM books booktown-# WHERE subject_id = 4 booktown-# OR subject_id = 0; title ---------------------- Dynamic Anatomy Practical PostgreSQL Programming Python Learning Python Perl Cookbook (5 rows)
The first SELECT
statement in Example 4-33 combines one condition, which checks for
titles in the Computers subject (with a subject_id
of 4), with another
condition,
which checks if the author is Mark Lutz (with an author_id
of 7805) via the
AND
keyword. The result is a smaller data set, constrained to two rows
that fit both specified conditions.
The second SELECT
statement in Example 4-33 combines the same first condition (books
in the Computers subject) with a second condition: if the title falls under the Arts subject
(with a subject_id
of 0). The result is a slightly larger data set of five
rows that matched at least one of these conditions.
WHERE
conditions may be grouped together indefinitely, though after
two conditions you may wish to group the conditions with parentheses. Doing so explicitly
indicates how the conditions are interrelated. As a demonstration, the two statements in Example 4-34 have different effects based merely on
the addition of parentheses.
Example 4-34. Grouping WHERE conditions with parentheses
booktown=# SELECT * FROM books booktown-# WHERE author_id = 1866 booktown-# AND subject_id = 15 booktown-# OR subject_id = 3; id | title | author_id | subject_id ------+----------------------+-----------+------------ 4513 | Dune | 1866 | 15 1234 | The Velveteen Rabbit | 25041 | 3 (2 rows) booktown=# SELECT * FROM books booktown-# WHERE author_id = 1866 booktown-# AND (subject_id = 15 booktown(# OR subject_id = 3); id | title | author_id | subject_id ------+-------+-----------+------------ 4513 | Dune | 1866 | 15 (1 row)
The preceding example demonstrates two attempts to look up Book Town titles with an
author_id
of 1866. The titles also have a subject_id
of
either 15, or 3. As you can see from the first statement, when the three conditions are used
without parentheses, the intent of the statement is ambiguous, and interpreted incorrectly.
The addition of parentheses will cause the evaluations within parentheses to be considered
before any surrounding condition.
As demonstrated by the use of the WHERE
clause on two table sources in
the section titled Selecting Sources with the FROM Clause, you have the
ability to retrieve data from different data sources by combining their columns into joined
rows. In SQL, this process is formally called a join.
The essential concept behind a join is that two or more data sets, when joined, have
their columns combined into a new set of rows containing each of the
columns requested from each of the data sets. The foundation of all joins is the
Cartesian product, which is the set of all possible combinations between
two data sets. That product may then be refined into a smaller subset by a set of criteria in
the JOIN
syntax. These criteria describe a relationship between data sets,
though such a definition is not required.
There are three general types of joins:
Cross joins
Creates a Cartesian product (or cross product) between two sets of data. It is called a product because it does not define a relationship between the sets; instead, it returns every possible combination of rows between the joined sets, essentially multiplying the sources by one another.
Inner joins
Creates a subset of the Cartesian product between two sets of data, requiring a conditional clause to specify criteria upon which to join records. The condition must return a Boolean value to determine whether or not a row is included in the joined set.
Outer joins
Similar to an inner join, in that it accepts criteria which will match rows between two sets of data, but returns at least one instance of each row from a specified set. This is either the left set (the data source to the left of the
JOIN
keyword), the right set (the data source to the right of theJOIN
keyword), or both sets, depending on the variety of outer join employed. The missing column values for the empty half of the row which does not meet the join condition are returned asNULL
values.
A cross join is functionally identical to listing comma-delimited sources. It therefore
should almost always be accompanied by a WHERE
clause to qualify the
relationship between the joined data sets. Example 4-35 demonstrates
the same functional query used in Example 4-27,
substituting the formal JOIN
syntax for the comma.
Example 4-35. A simple CROSS JOIN
booktown=# SELECT b.id, title, a.id, last_name booktown-# FROM books AS b CROSS JOIN authors AS a booktown-# WHERE b.author_id = a.id; id | title | id | last_name -------+-----------------------------+-------+-------------- 190 | Little Women | 16 | Alcott 156 | The Tell-Tale Heart | 115 | Poe 41472 | Practical PostgreSQL | 1212 | Worsley 2038 | Dynamic Anatomy | 1644 | Hogarth 1608 | The Cat in the Hat | 1809 | Geisel 1590 | Bartholomew and the Oobleck | 1809 | Geisel 4513 | Dune | 1866 | Herbert 4267 | 2001: A Space Odyssey | 2001 | Clarke 1501 | Goodnight Moon | 2031 | Brown 7808 | The Shining | 4156 | King 41473 | Programming Python | 7805 | Lutz 41477 | Learning Python | 7805 | Lutz 41478 | Perl Cookbook | 7806 | Christiansen 25908 | Franklin in the Dark | 15990 | Bourgeois 1234 | The Velveteen Rabbit | 25041 | Bianco (15 rows)
This syntax is merely a more formal way of stating the relationship between the two data
sets. There is no functional difference between the CROSS JOIN
syntax and
using a simple comma delimited list of tables.
More useful are the inner and outer joins, which require a
qualification of the relationship between joined data sets in the JOIN
syntax itself. The following is the syntax for an inner or outer join:
source1 [ NATURAL ] join_type source2 [ ON ( condition [, ...] ) | USING ( column [, ...] ) ]
source1
Identifies the first data set that is being joined (i.e., a table name or sub-query).
[ NATURAL ]
Implies that the two data sets should be joined on equivalent values between like-named columns (e.g., if two tables have a column called
id,
it will join rows where theid
values are equivalent). TheNATURAL
clause will respect column aliases, if applied. The use of theNATURAL
clause makes it both unnecessary and invalid to try to specify either of theON
orUSING
clauses.join_type
Specifies the type of
JOIN
intended. Valid values in this context are[ INNER ] JOIN
(specifying justJOIN
implies anINNER JOIN), LEFT [ OUTER] JOIN, RIGHT [ OUTER ] JOIN,
andFULL [ OUTER ] JOIN.
source2
Identifies the second data set that is being joined (i.e., a table name, or sub-query).
ON (
condition
[, ...] )
Identifies the second data set that is being joined (i.e., a table name, or sub-query).
Specifies the relationship between
source1
andsource2.
Any arbitrary criteria may be specified within theON
clause, just as you would specify conditions following aWHERE
clause. Column and table aliases are allowed in this criteria.USING (
column
[, ...] )
Specifies like-named columns between
source1
andsource2
with which to join rows by equivalent values. Similar to aNATURAL JOIN,
but allows you to indicate what specific columns to join on, whereasNATURAL
will join on all like-named columns. Similar toNATURAL
joins, column aliases are respected in theUSING
clauseâs parameters.
The SQL92 INNER JOIN
syntax is a tool that helps differentiate the
conditions with which you are joining data sources (the JOIN
conditions)
from the conditions with which you are evaluating rows for inclusion in your data set (the
WHERE
conditions). For example, consider the two SELECT
statements in Example 4-36.
Example 4-36. Comparing INNER JOIN to WHERE
booktown=# SELECT title, last_name, first_name booktown-# FROM books, authors booktown-# WHERE (books.author_id = authors.id) booktown-# AND last_name = 'Geisel'; title | last_name | first_name -----------------------------+-----------+--------------- The Cat in the Hat | Geisel | Theodor Seuss Bartholomew and the Oobleck | Geisel | Theodor Seuss (2 rows) booktown=# SELECT title, last_name, first_name booktown-# FROM books AS b INNER JOIN authors AS a booktown-# ON (b.author_id = a.id) booktown-# WHERE last_name = 'Geisel'; title | last_name | first_name -----------------------------+-----------+--------------- The Cat in the Hat | Geisel | Theodor Seuss Bartholomew and the Oobleck | Geisel | Theodor Seuss (2 rows)
The two forms of syntax in Example 4-36 are
functionally identical, and return the same results. The INNER JOIN
syntax
allows you to segregate the relational criteria from your evaluation criteria by only
defining the set relationships in the ON
clause. This can make involved
queries much easier to read and maintain, as you do not need to interpret what each condition
described by the WHERE
clause is conceptually achieving.
Notice that the second query demonstrates the use of aliases b
and
a
in the ON
clause for the books
and authors
tables, respectively. The use of these aliases in the ON
clause is perfectly valid, and often preferable from a perspective of improved
readability.
In cases of simple equivalence joins, it may be more convenient for you to use either
the USING
or NATURAL
clauses instead of the ON
clause. These are only applicable on data sets with identically named columns.
If you have columns that define a relationship between two sets that are not identically
named, you may still use the USING
or NATURAL
clauses
by employing column aliases, as demonstrated in Example 4-37,
to re-name one or both of the columns to a uniform name.
Example 4-37. The NATURAL and USING clauses
booktown=# SELECT title, last_name, first_name booktown-# FROM books INNER JOIN authors AS a (author_id) booktown-# USING (author_id) booktown-# WHERE last_name = 'Geisel'; title | last_name | first_name -----------------------------+-----------+--------------- The Cat in the Hat | Geisel | Theodor Seuss Bartholomew and the Oobleck | Geisel | Theodor Seuss (2 rows) booktown=# SELECT title, last_name, first_name booktown-# FROM books NATURAL INNER JOIN authors AS a (author_id) booktown-# WHERE last_name = 'Geisel'; title | last_name | first_name -----------------------------+-----------+--------------- The Cat in the Hat | Geisel | Theodor Seuss Bartholomew and the Oobleck | Geisel | Theodor Seuss (2 rows)
The first SELECT
statement in Example 4-37 assigns the alias of author_id
to
the first column in the authors
table (which is actually named id).
By passing the author_id
identifier to the USING
clause, PostgreSQL then searches for a column identifier in each data set
with that name to join rows on values found to be equivalent.
Inner joins are adequate for a wide variety of queries, but there are times when an outer join is required to get all of the data you need. The key to understanding the difference between inner and outer joins is in knowing how each type of join handles rows that do not meet their defined relationship.
In short, an inner join will discard any row for which it cannot find a corresponding
value between the sets being joined (as specified by either the ON
or
USING
clause).
In contrast to inner joins, an outer join can retain rows where
corresponding values between sets are not found, populating the missing columns with NULL
values. Whether or not the outer join does retain that
row depends on which set is missing the value and the kind of outer join that is
specified.
There are three forms of outer joins:
Left outer joins
Will always return at least one instance of each row in the set of rows to the left of the
JOIN
keyword. Missing columns in the right set are populated withNULL
values.
Right outer joins
Will always return at least one instance of each row in the set of rows to the right of the
JOIN
keyword. Missing columns in the left set are populated withNULL
values.
Full outer joins
Will always return at least one instance of each row in each joined set. Missing columns on either side of the new set will be populated with
NULL
values.
Consider again Book Townâs books
table, and another Book Town table
called editions.
While the books
table stores general
information on a given title, the editions
table stores specific
information pertaining to each edition, such as an the bookâs ISBN, publisher, and
publication date. The editions
table has a column called book_id
which corresponds to the books
tableâs primary key
column, id.
Suppose that you want to retrieve each of Book Townâs titles, along with its isbn, if
applicable. Performing a query with an inner join between the books
and
editions
tables will correctly return a data set with title and isbn
columns. However, as demonstrated in Example 4-38, if a
book does not yet have a printed edition (or if that edition has not yet been entered into
Book Townâs database), those titles will not be displayed.
In contrast, the statement immediately following the inner join in Example 4-38 employs an outer join, returning 20 rows. Three of the returned rows do not have ISBN numbers, but are not omitted due to the definition of the join.
Example 4-38. Inner joins versus outer joins
booktown=# SELECT title, isbn booktown-# FROM books INNER JOIN editions booktown-# ON (books.id = editions.book_id); title | isbn -----------------------------+------------ The Tell-Tale Heart | 1885418035 The Tell-Tale Heart | 0929605942 Little Women | 0760720002 The Velveteen Rabbit | 0679803335 Goodnight Moon | 0694003611 Bartholomew and the Oobleck | 0394800753 The Cat in the Hat | 039480001X The Cat in the Hat | 0394900014 Dynamic Anatomy | 0823015505 2001: A Space Odyssey | 0451457994 2001: A Space Odyssey | 0451198492 Dune | 0441172717 Dune | 044100590X The Shining | 0451160916 The Shining | 0385121679 Franklin in the Dark | 0590445065 Programming Python | 0596000855 (17 rows) booktown=# SELECT title, isbn booktown-# FROM books LEFT OUTER JOIN editions booktown-# ON (books.id = editions.book_id); title | isbn -----------------------------+------------ The Tell-Tale Heart | 1885418035 The Tell-Tale Heart | 0929605942 Little Women | 0760720002 The Velveteen Rabbit | 0679803335 Goodnight Moon | 0694003611 Bartholomew and the Oobleck | 0394800753 The Cat in the Hat | 039480001X The Cat in the Hat | 0394900014 Dynamic Anatomy | 0823015505 2001: A Space Odyssey | 0451457994 2001: A Space Odyssey | 0451198492 Dune | 0441172717 Dune | 044100590X The Shining | 0451160916 The Shining | 0385121679 Franklin in the Dark | 0590445065 Practical PostgreSQL | Programming Python | 0596000855 Learning Python | Perl Cookbook | (20 rows)
The join specified by the second query in Example 4-38 uses the LEFT OUTER JOIN
clause to define its join type. This is
because the query focuses on titles from the books
table that have ISBN
numbers, and not those editions having ISBN numbers that do not correspond to titles. As the
books
table is to the left of the JOIN
keyword, it is
defined as a left outer join to achieve this. If the focus of the query was to see both ISBN
numbers without titles as well as titles without ISBN numbers, the same query could instead
be modified to be a full outer join with the FULL OUTER JOIN
clause.
The difference between inner and outer joins illustrated in Example 4-38 is a vital concept to understand, as misuse of joins can lead to both omitted and unexpected rows.
It should be understood that while a single JOIN
clause connects only
two sets of data, in practice, joins are not restricted to only two data sources. You may
arbitrarily specify numerous JOIN
clauses following sets that are
themselves constructed from joins, just as you may specify numerous data sources separated by
commas.
When connecting several joins together, it is a good practice to group each join and sub-join within parentheses. Explicitly grouping joins in this fashion insures that there is no ambiguity, to either PostgreSQL or a developer, as to which data sets are joined, and in what order.
Example 4-39. Joining many data sources
booktown=# SELECT a.last_name, p.name AS publisher, e.isbn, s.subject booktown-# FROM ((((authors AS a INNER JOIN books AS b booktown(# ON (a.id = b.author_id)) booktown(# INNER JOIN editions AS e ON (e.book_id = b.id)) booktown(# INNER JOIN publishers AS p ON (p.id = e.publisher_id)) booktown(# INNER JOIN subjects AS s ON (s.id = b.subject_id)); last_name | publisher | isbn | subject -----------+-----------------------------+------------+------------------ Hogarth | Watson-Guptill Publications | 0823015505 | Arts Brown | HarperCollins | 0694003611 | Children's Books Geisel | Random House | 0394800753 | Children's Books Geisel | Random House | 039480001X | Children's Books Geisel | Random House | 0394900014 | Children's Books Bourgeois | Kids Can Press | 0590445065 | Children's Books Bianco | Penguin | 0679803335 | Classics Lutz | O'Reilly & Associates | 0596000855 | Computers Alcott | Henry Holt & Company, Inc. | 0760720002 | Drama Poe | Mojo Press | 1885418035 | Horror Poe | Books of Wonder | 0929605942 | Horror King | Doubleday | 0451160916 | Horror King | Doubleday | 0385121679 | Horror Clarke | Roc | 0451457994 | Science Fiction Clarke | Roc | 0451198492 | Science Fiction Herbert | Ace Books | 0441172717 | Science Fiction Herbert | Ace Books | 044100590X | Science Fiction (17 rows)
An interesting observation to be made about Example 4-39 is that, while the books
table is
itself deeply involved in the join, none of its columns are retrieved in the final result
set. The books
table is included in the JOIN
clauses in
order to provide criteria through which other tables are joined together. Each of the tables
whose columns are retrieved in the query rely on the books table in order to draw
relationships with any other table through the id
column (with the
exception of the publishers
table, which relates to the publisher_id
column in the editions
table).
The GROUP BY
clause introduces a powerful SQL concept:
aggregation. To aggregate means to gather into a sum, or whole. The
practical effect of aggregating in a SQL query is that any rows whose results from the
GROUP BY
expression match identically are grouped together into a
single aggregate row. The GROUP BY
expression may
define a column, but it may also be any operation upon a column as well. If several columns or
expressions are specified (delimited by commas), the entire set of specified criteria must be
identical for rows to be grouped together.
To effectively use aggregation you must understand that any target columns requested by
an aggregating query which are not specified in the GROUP
BY
clause will be inaccessible, unless selected through an aggregate
function. An aggregate function accepts a column name (or expression involving at
least one column name) which can represent several values (i.e., from
several grouped rows), performs an operation on those values, and returns a single
value.
Common aggregate functions include count(),
which returns the number
of rows in the set, max()
, which returns the maximum value in the column,
and min()
, which returns the minimum value in the column. An aggregate
function operates only on rows in the queryâs result set, and is therefore executed
after conditional joins and WHERE
conditions have
been processed.
Imagine that you wanted to know how many books Book Town stores in its database for each
known publisher. You could perform a simple join between the editions
and
publishers
tables in order to associate each publisher name with a title
that they publish. It would be tedious to manually count how many titles each publisher
maintained, and in cases of larger data sets, it can become difficult to manage larger result
sets.
Example 4-40 demonstrates a join between these two Book Town tables,
but also introduces two new elements: the count()
function, and the
GROUP BY
clause.
Example 4-40. Using GROUP BY
booktown=# SELECT count(e.isbn) AS "number of books", booktown-# p.name AS publisher booktown-# FROM editions AS e INNER JOIN publishers AS p booktown-# ON (e.publisher_id = p.id) booktown-# GROUP BY p.name; number of books | publisher -----------------+----------------------------- 2 | Ace Books 1 | Books of Wonder 2 | Doubleday 1 | HarperCollins 1 | Henry Holt & Company, Inc. 1 | Kids Can Press 1 | Mojo Press 1 | O'Reilly & Associates 1 | Penguin 3 | Random House 2 | Roc 1 | Watson-Guptill Publications (12 rows)
The GROUP BY
clause in Example 4-40 instructs
PostgreSQL to group the rows in the joined data set by p.name,
which in
this query is a reference to the name
column in the publishers
table. Therefore, any rows that have the same publisher name will be
grouped together, or aggregated. The count()
function then counts the
number of isbn
values from the editions
table that are
in each aggregated row, and returns a single numeric value representing the number of rows
that were aggregated for each unique publisher.
Note that in Example 4-40 the argument of the editions
tableâs isbn
column is chosen simply to indicate the
objective of the example (to count how many books there are per publisher). Any column name
will return the same number, as count()
will always return the number of
rows grouped in the current aggregate row.
Something to watch out for when designing aggregate queries is that the WHERE
clause cannot accept criteria involving aggregate functions. Instead, use
the HAVING
clause. It functions identically to the WHERE
clause, but its conditions must be on aggregate functions rather than single-row conditions.
Syntactically, the HAVING
clause follows the GROUP BY
clause, as
demonstrated in Example 4-41.
Example 4-41. Using the HAVING clause
booktown=# SELECT count(e.isbn) AS "number of books", booktown-# p.name AS publisher booktown-# FROM editions AS e INNER JOIN publishers AS p booktown-# ON (e.publisher_id = p.id) booktown-# GROUP BY publisher booktown-# HAVING count(e.isbn) > 1; number of books | publisher -----------------+-------------- 2 | Ace Books 2 | Doubleday 3 | Random House 2 | Roc (4 rows)
Both Example 4-40 and Example 4-41
create a data set through an inner join between the editions
and publishers
table. However, Example 4-41 constrains
the final result to publishers having more than a single book in the Book Town database, as
set by the HAVING
clause.
As described in Chapter 3, row data is not stored in a
consistent order within tables. In fact, an identical query executed twice is in no way
guaranteed to return the rows in the same order each time. As order is commonly an important
part of retrieving data for database-dependent applications, use the ORDER
BY
clause to allow flexible sorting of your result set.
The ORDER BY
clause accepts as its parameters a list of
comma-delimited column names (or expressions upon columns), which are used as sorting
criteria. For each sort criteria, you may optionally apply either the ASC,
DESC,
or USING
keywords to control the type of sorting
employed:
ASC
Causes the rows to sort by the related criteria in an ascending fashion (e.g., numbers will be sorted lowest to highest, text will be sorted alphabetically from a to z).
ASC
is equivalent to specifyingUSING <
. Since it is the default behavior, specifyingASC
is only useful for explicit readability.DESC
Causes the rows to sort by the related criteria in a descending fashion (e.g., numbers will be sorted highest to lowest, text will be sorted alphabetically from z to a).
DESC
is equivalent to specifyingUSING >
.USING
operator
Allows the specification of the operator
operator
to be used to compare each column for precedence. This can be particularly useful for custom operators.
Example 4-42 demonstrates the use of the ORDER BY
clause on the editions
table. It specifies the publication
column as the source of values to sort by, and explicitly declares the
ordering method as an ascending (ASC)
sort.
Example 4-42. Using ORDER BY
booktown=# SELECT isbn, edition, publication booktown-# FROM editions booktown-# ORDER BY publication ASC; isbn | edition | publication ------------+---------+------------- 0760720002 | 1 | 1868-01-01 0679803335 | 1 | 1922-01-01 0694003611 | 1 | 1947-03-04 0394800753 | 1 | 1949-03-01 0394900014 | 1 | 1957-01-01 039480001X | 1 | 1957-03-01 0823015505 | 1 | 1958-01-01 0451160916 | 1 | 1981-08-01 0590445065 | 1 | 1987-03-01 0385121679 | 2 | 1993-10-01 1885418035 | 1 | 1995-03-28 0441172717 | 2 | 1998-09-01 0929605942 | 2 | 1998-12-01 044100590X | 3 | 1999-10-01 0451198492 | 3 | 1999-10-01 0451457994 | 3 | 2000-09-12 0596000855 | 2 | 2001-03-01 (17 rows)
As you can see in the result set from Example 4-42, the rows return
in ascending order, from the oldest date to the newest. It should be noted that even columns
and expressions that do not appear in the target list of the SELECT
statement may be used to sort the retrieved rows. Furthermore, aggregate functions and
expressions are allowed by the ORDER BY
clause if the query involves
aggregation. The ability to sort by such a wide scope of sources thus allows for a great deal
of flexibility in ordering results from a variety of query approaches.
Note
If a column alias in the result set has the same name as a literal column in an input
source from which it is drawing rows, and it is used in the ORDER BY
clause, PostgreSQL will assume that it is a reference to the named column in the result set,
not the column in the source set. This is an accepted inconsistency compared against the
default behavior of the GROUP BY
clause, as specified by the SQL92
standard.
When specifying multiple expressions to sort by, the result set will be ordered by the first criteria (from left to right), and will only process subsequent sorting criteria if the first conditionâs sort is inconclusive. For example, consider the sorting performed in Example 4-43.
Example 4-43. Using ORDER BY with multiple expressions
booktown=# SELECT edition, publication booktown-# FROM editions booktown-# ORDER BY edition ASC, booktown-# publication DESC; edition | publication ---------+------------- 1 | 1995-03-28 1 | 1987-03-01 1 | 1981-08-01 1 | 1958-01-01 1 | 1957-03-01 1 | 1957-01-01 1 | 1949-03-01 1 | 1947-03-04 1 | 1922-01-01 1 | 1868-01-01 2 | 2001-03-01 2 | 1998-12-01 2 | 1998-09-01 2 | 1993-10-01 3 | 2000-09-12 3 | 1999-10-01 3 | 1999-10-01 (17 rows)
The query in Example 4-43 selects the
numeric edition
and publication
date of each book from
the editions
table. The ORDER BY
clause then specifies
two columns to sort by: edition,
in ascending order, and publication,
in descending order.
As you can see in the result set for Example 4-43, each row is first sorted by edition, proceeding from the lower editions to the higher editions. Subsequently, wherever the editions are identical, the publication date is used to then sort again, from the most recent publication date to the least recent.
Sorting is extremely relevant when using the DISTINCT
keyword, as
discussed in the section titled Removing Duplicate Rows with DISTINCT. If
you are only interested in seeing the most recently published copy of each edition in the
editions
table, the ORDER BY
and DISTINCT
clauses can be combined to achieve an effect somewhat similar to the
GROUP BY
clause, as shown in Example 4-44.
Example 4-44. Using DISTINCT with ORDER BY
booktown=# SELECT DISTINCT ON (edition) booktown-# edition, publication booktown-# FROM editions booktown-# ORDER BY edition ASC, booktown-# publication DESC; edition | publication ---------+------------- 1 | 1995-03-28 2 | 2001-03-01 3 | 2000-09-12 (3 rows) booktown=# SELECT edition, max(publication) booktown-# FROM editions booktown-# GROUP BY edition; edition | max ---------+------------ 1 | 1995-03-28 2 | 2001-03-01 3 | 2000-09-12 (3 rows)
Since the ORDER BY
occurring before the DISTINCT
clause eliminates duplicate rows, the net effect can be very similar to using the max()
or min()
with a GROUP BY
clause. This
technique can sometimes be more efficient, depending on the complexity of the aggregation and
sorting involved.
Note
While never strictly necessary, PostgreSQL can accept integer constants as expressions
in the ORDER BY
clause, instead of column names or expressions. Such a
constant will be interpreted as representing the column that is at the numbered position in
the target list, from left to right, starting at 1 (e.g., ORDER BY 1 ASC
references the first column in the result set).
PostgreSQL enforces no limit upon the number of rows retrievable from a SQL query. If you attempt to execute a query that returns several million rows, it may take a while, but the server will not stop until it has returned the entire result set (or until it is interrupted).
Applications could conceivably be written to programmatically âpageâ through large sets
of data after retrieval, but SQL provides as a convenience the LIMIT
and
OFFSET
clauses, which allow for the retrieval of a specified portion of
the generated result set.
When the LIMIT
clause is specified, no more than the requested number
of rows will be returned (though there may be fewer if the result set is smaller than the
passed parameter). When the OFFSET
clause is specified, it skips the number
of rows defined by its parameters before returning rows. If both are specified, the number of
rows to be included as per the LIMIT
clause will not be counted until the
number of rows dictated by the OFFSET
clause have been skipped.
Example 4-45. Using LIMIT and OFFSET
booktown=# SELECT isbn, title, publication booktown-# FROM editions NATURAL JOIN books AS b (book_id) booktown-# ORDER BY publication DESC booktown-# LIMIT 5; isbn | title | publication ------------+-----------------------+------------- 0596000855 | Programming Python | 2001-03-01 0451457994 | 2001: A Space Odyssey | 2000-09-12 0451198492 | 2001: A Space Odyssey | 1999-10-01 044100590X | Dune | 1999-10-01 0929605942 | The Tell-Tale Heart | 1998-12-01 (5 rows) booktown=# SELECT isbn, title, publication booktown-# FROM editions NATURAL JOIN books AS b (book_id) booktown-# ORDER BY publication DESC booktown-# LIMIT 5 booktown-# OFFSET 2; isbn | title | publication ------------+-----------------------+------------- 0451198492 | 2001: A Space Odyssey | 1999-10-01 044100590X | Dune | 1999-10-01 0929605942 | The Tell-Tale Heart | 1998-12-01 0441172717 | Dune | 1998-09-01 1885418035 | The Tell-Tale Heart | 1995-03-28 (5 rows)
Example 4-45 demonstrates, in the first query, a simple use
of LIMIT,
by retrieving only 5 rows from the joined set of the editions
and books
table. Ordinarily, such a join would result
in 17 rows.
The second query in Example 4-45 shows the use of the
OFFSET
clause, to shift the scope of the result set down by two rows. You
can see that the last three rows of the first queryâs result set overlap with the first three
rows of the second queryâs result set. The ORDER BY
clause in each of these
queries insures the consistency of the sets returned.
While joins are used in SQL to combine column values into a single row, the UNION, INTERSECT
and EXCEPT
clauses exist to merge or omit row
data by comparing column values, returning a new result set based on this comparison. Each of
these keywords may be used at the end of a valid SQL query and followed by a second query, in
order to compare the resultant data sets, and then either merge or omit rows based on that
comparison.
When comparing data sets in this manner, it is required that they each have the same number of columns, as well as the same column type. Note that they do not need to have the same name, or be queried from the same table or data source.
UNION
A pair of queries merged with the
UNION
keyword will combine all non-distinct rows into a single data set. Like rows will not be duplicated.INTERSECT
A pair of queries merged with the
INTERSECT
keyword will cause any rows not found in both data sets to be omitted. As such, the only rows returned are those that overlap between the two query result sets.EXCEPT
A pair of queries merged with the
EXCEPT
keyword will cause any rows found in both data sets to be omitted from the returned data set. As such, only rows found in the query to the left of theEXCEPT
clause that are not found in the query to the right of the clause will be returned.
Example 4-46, Example 4-47, and Example 4-48 each demonstrate these keywords by combining and omitting rows from
comparative data sets. Example 4-46 creates a result set by combining several
authorsâ last names with book titles via the UNION
keyword.
Example 4-47 demonstrates the selection of ISBN numbers from the
books
table, limited to rows which intersect with the query on the
shipments
table for books which have records of more than two shipments.
Finally, Example 4-48 demonstrates the removal of any rows from the first
query which are matched completely in the second.
Example 4-46. Using UNION
booktown=# SELECT title FROM books booktown-# UNION booktown-# SELECT last_name FROM authors booktown-# LIMIT 11; title ----------------------------- 2001: A Space Odyssey Alcott Bartholomew and the Oobleck Bianco Bourgeois Brautigan Brite Brown Christiansen Clarke Denham (11 rows)
Example 4-47. Using INTERSECT
booktown=# SELECT isbn FROM editions booktown-# INTERSECT booktown-# SELECT isbn FROM shipments booktown-# GROUP BY isbn booktown-# HAVING count(id) > 2; isbn ------------ 039480001X 0394800753 0451160916 0590445065 0694003611 (5 rows)
Example 4-48. Using EXCEPT
booktown=# SELECT last_name, first_name booktown-# FROM authors booktown-# EXCEPT booktown-# SELECT last_name, first_name booktown-# FROM authors AS a (author_id) booktown-# NATURAL INNER JOIN books booktown-# ORDER BY first_name ASC; last_name | first_name -----------+------------ Denham | Ariel Gorey | Edward Brite | Poppy Z. Brautigan | Richard (4 rows)
In
Example 4-48, only rows that do not match the second query are returned.
Notice that the effective result of this is that only authors who do not have a book in the
books
table are returned. This is due to the INNER JOIN
clause, which causes the second query to omit any authors whose author_id
is not found in the books
table.
While the use of these keywords in a single SQL query precludes the ability to use the
LIMIT
clause, this limitation can be circumvented by PostgreSQLâs support
for sub-queries. By grouping in parentheses each of the queries involved between a UNION, EXCEPT,
or EXCEPT
clause, the returned result sets from
the sub-queries are compared, as demonstrated in Example 4-49.
Example 4-49. Comparing sub-query result sets
booktown=# (SELECT title FROM books ORDER BY title DESC LIMIT 7) booktown-# EXCEPT booktown-# (SELECT title FROM books ORDER BY title ASC LIMIT 11) booktown-# ORDER BY title DESC; title ---------------------- The Velveteen Rabbit The Tell-Tale Heart The Shining The Cat in the Hat (4 rows)
Notice that the query used in Example 4-49 creates a
set from the books
table that is constrained to the last seven rows and
sorted alphabetically by title. The EXCEPT
clause then removes from that
data set the first eleven rows, sorted alphabetically in an ascending fashion. The result
consists of the last four rows from the table, sorted from the bottom by the final ORDER BY
clause on the new exception set.
In order to achieve simple programmatic transformations without having to call out to a
procedural language, PostgreSQL supports standard SQL case expressions.
These use the SQL keywords CASE, WHEN, THEN,
and END
to
allow basic conditional transformations per each row.
The entirety of a case expression is syntactically placed within the SELECT
statementâs target list. A case expressionâs result column is named
case
by default, but it may be aliased in the same manner as any normal
target list. The general syntax for a case expression in a SELECT
statementâs target list is as follows:
CASE WHEN condition1 THEN result1 WHEN condition2 THEN result2 [ ... ] [ ELSE default_result ] END [ AS alias ]
The CASE, WHEN, THEN,
and ELSE
keywords are
somewhat similar to the if-then-else logic in programming languages. The condition
of a WHEN
clause must return a Boolean
result.
When a WHEN
condition is met, the result from its corresponding
THEN
clause will return in the result column for that row. If no
conditions are met, the ELSE
clause may be used to specify a default result
value. If there are no results found for a case expression, NULL
is
returned.
Example 4-50. Using case expressions in statements
booktown=# SELECT isbn, booktown-# CASE WHEN cost > 20 THEN 'over $20.00 cost' booktown-# WHEN cost = 20 THEN '$20.00 cost' booktown-# ELSE 'under $20.00 cost' booktown-# END AS cost_range booktown-# FROM stock booktown-# LIMIT 8; isbn | cost_range ------------+------------------- 0385121679 | over $20.00 cost 039480001X | over $20.00 cost 044100590X | over $20.00 cost 0451198492 | over $20.00 cost 0394900014 | over $20.00 cost 0441172717 | under $20.00 cost 0451160916 | over $20.00 cost 0679803335 | $20.00 cost (8 rows)
Adding to the power of case expressions are PostgreSQLâs sub-queries, described in the
section titled Using Sub-Queries. As demonstrated in Example 4-51, a sub-query may be provided as a
result
within a conditional expression.
Example 4-51. Using case expressions with sub-queries
booktown=# SELECT isbn, booktown-# CASE WHEN cost > 20 THEN 'N/A - (Out of price range)' booktown-# ELSE (SELECT title FROM books b JOIN editions e booktown(# ON (b.id = e.book_id) booktown(# WHERE e.isbn = stock.isbn) booktown-# END AS cost_range booktown-# FROM stock booktown-# ORDER BY cost_range ASC booktown-# LIMIT 8; isbn | cost_range ------------+----------------------------- 0451457994 | 2001: A Space Odyssey 0394800753 | Bartholomew and the Oobleck 0441172717 | Dune 0760720002 | Little Women 0385121679 | N/A - (Out of price range) 039480001X | N/A - (Out of price range) 044100590X | N/A - (Out of price range) 0451198492 | N/A - (Out of price range) (8 rows)
In Example 4-51,
any book found to have a cost of less than 20 has its title returned via a sub-select to the
books
table, along with its ISBN from the main query to the stock
table.
The INTO TABLE
clause may be used with any valid SELECT
query in order to create a new table with the column structure and row data
of the returned result set. The syntax for this is as follows:
SELECT select_targets INTO [ TABLE ] new_table FROM old_table;
This syntax performs an implicit CREATE TABLE
command, creating a
table with the same column names, value types, and row data as the result set from the
original table. When the message SELECT
is returned, you will know that the
statement was successfully performed, and the new table created. This is demonstrated in Example 4-52, which creates a backup table called stock_backup
out of the data in the stock
table.
Example 4-52. Using SELECT INTO
booktown=# SELECT * INTO stock_backup booktown-# FROM stock; SELECT
The table specified by the INTO
clause must not exist, or else an
error will be returned. Upon the error, the values of the query will not be inserted and the
query will fail. Note that the TABLE
keyword, in this query, is an optional
noise term.
Get Practical PostgreSQL now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.