Chapter 4. Reading Your Data
In this chapter, we will explore the key SQL statements and clauses needed to extract information from your database. You will learn about the fundamental SELECT
statement and the various subclauses you can use within it to select data and aggregate data into subtotals. By the end of this chapter, you will understand the concepts of JOIN
operations, SET
operations, aggregation, window operations, common table expressions, and subqueries. You will also know how to combine these effectively to query any data in the core database platforms.
How to Use This Chapter
When researching a command in this chapter:
Read “SQL Platform Support”.
Check Table 4-1.
Look up the specific SQL statement, check the syntax, and read the “Keywords,” “Rules at a glance,” and “Programming tips and gotchas” sections and read the section on the standard for SQL syntax and description. Do this even if you are looking for a specific platform implementation.
Finally, read the platform-specific implementation information.
You will note that the entry for a given platform implementation does not duplicate information on any clauses that do not differ from the standard. So, it is possible that you will need to flip between the descriptions for a vendor variation and the SQL standard to cover all possible details of that command.
In our discussions of MySQL, we will also include MariaDB, a fork of MySQL. For the most part, MySQL and MariaDB provide fully code-compatible syntax. In these cases we will refer to them collectively as MySQL. We will explicitly mention MariaDB only in situations where it deviates from MySQL in an important way.
SQL Platform Support
Table 4-1 provides a listing of the SQL statements covered in this chapter, the platforms that support them, and the degree to which they support them. The following list offers useful tips for reading Table 4-1, as well as an explanation of what each abbreviation stands for:
The first column contains the SQL commands, in alphabetical order.
The SQL statement class for each command is indicated in the second column.
The subsequent columns list the level of support for each vendor:
- Supported (S)
- The platform supports the SQL standard for the particular command.
- Supported, with variations (SWV)
- The platform supports the SQL standard for the particular command, using vendor-specific code or syntax.
- Supported, with limitations (SWL)
- The platform supports some but not all of the functions specified by the SQL standard for the particular command.
- Not supported (NS)
- The platform does not support the particular command according to the SQL standard.
The sections that follow the table describe the commands in detail.
Remember that even if a specific SQL command is listed in the table as “Not supported,” the platform usually has alternative coding or syntax to enact the same command or function. Therefore, be sure to read the discussion and examples for each command later in this chapter.
SQL command | SQL class | MySQL/MariaDB | Oracle | PostgreSQL | SQL Server |
---|---|---|---|---|---|
ALL/ANY/SOME |
SQL-data | S | SWV | SWV | SWV |
BETWEEN |
SQL-data | S | S | S | S |
EXCEPT |
SQL-data | NS/SWL | SWL | SWL | SWL |
EXISTS |
SQL-data | S | S | S | S |
FILTER |
SQL-data | NS | NS | S | NS |
GROUP BY |
SQL-data | SWV | SWV | SWV | SWV |
IN |
SQL-data | S | S | S | S |
INTERSECT |
SQL-data | NS/SWL | SWL | SWL | SWL |
IS |
SQL-data | S | S | S | S |
JOIN |
SQL-data | SWV | SWV | SWV | SWL |
LIKE |
SQL-data | S | S | SWV | SWV |
ORDER BY |
SQL-data | SWL | SWV | SWV | SWL |
OVER |
SQL-data | SWL | SWV | SWV | SWL |
SELECT |
SQL-data | SWV | SWV | SWV | SWV |
SUBQUERY |
SQL-data | SWL | S | S | S |
UNION |
SQL-data | S | SWL | SWL | SWL |
VALUES |
SQL-data | SWL | NS | S | SWL |
WHERE |
SQL-data | S | S | S | S |
WITH |
SQL-data | S | SWV | SWV | SWV |
WITH ORDINALITY |
SQL-data | NS | NS | S | NS |
SQL Command Reference
ALL/ANY/SOME Operators
The ALL
operator performs a Boolean test of a subquery for the existence of a value in all rows. The ANY
operator and its synonym SOME
perform a Boolean test of a subquery for the existence of a value in any of the rows tested.
You will also find the ALL
keyword used in conjunction with UNION
. This will be covered in “UNION Set Operator”.
Platform | Command |
---|---|
MySQL | Supported |
Oracle | Supported with variations |
PostgreSQL | Supported with variations |
SQL Server | Supported with variations |
SQL standard syntax
SELECT ... WHERE expression comparison {ALL | ANY | SOME} ( subquery )
Keywords
WHERE expression
- Tests a scalar expression (such as a column) against every value in the
subquery
forALL
and against every value until a match is found forANY
andSOME
. All rows must match the expression to return a BooleanTRUE
value for theALL
operator, while one or more rows must match the expression to return a BooleanTRUE
value for theANY
andSOME
operators. comparison
- Compares the
expression
to thesubquery
. Thecomparison
must be a standard comparison operator like=
,<>
,!=
,>
,>=
,<
, or<=
.
Rules at a glance
The ALL
operator returns a Boolean TRUE
value when one of two things happens: either the subquery returns an empty set (i.e., no records match), or every record in the set meets the comparison. ALL
returns FALSE
when any record in the set does not match the value comparison. The ANY
and SOME
operators return a Boolean TRUE
when at least one record in the subquery matches the comparison operation, and FALSE
when no record matches the comparison operation (or when a subquery returns an empty result set). If even one return value of the subquery is NULL, the operation evaluates as NULL, not as TRUE
.
Do not include special clauses like ORDER BY
, GROUP BY
, CUBE
, ROLLUP
, WITH
, etc. in your subquery.
For example, this query returns authors who currently have no titles:
SELECT au_id FROM authors WHERE au_id <> ALL(SELECT titleauthor.au_id FROM titleauthor);
You can use ANY
or SOME
to perform filtering checks of different kinds. For example, the following query will retrieve from the employee table any records that exist in the jobs table where the employee has the same job_lvl as the minimum required level of a job:
SELECT * FROM employee WHERE job_lvl = ANY(SELECT min_lvl FROM jobs);
Programming tips and gotchas
The ALL
and ANY
/SOME
operators are somewhat difficult to get used to. Most developers find it easier to use similar functions like IN
and EXISTS
.
EXISTS
is semantically equivalent to the ANY
/SOME
construct
MySQL
MySQL supports the SQL standard versions of ALL
and ANY
/SOME
.
Oracle
Oracle supports the SQL standard versions of ALL
and ANY
/SOME
with one minor variation, which is that you can supply a list of values instead of a subquery. For example, to find all employees who have a job_lvl value equal to 9 or 14:
SELECT * FROM employee WHERE job_lvl = ALL(9, 14);
PostgreSQL
PostgreSQL supports the SQL standard versions of ALL
and ANY
/SOME
. In addition, it supports their use with arrays. For example, to find all employees who have a job_lvl value equal to 9 or 14:
SELECT * FROM employee WHERE job_lvl = ANY(ARRAY[9, 14]);
PostgreSQL also supports the use of these terms in conjunction with LIKE
and the case-insensitive ILIKE
. A common use is to provide a shorthand for multiple LIKE
/ILIKE
clauses. For example, this:
SELECT * FROM employee WHERE name_last ILIKE ANY(ARRAY['smith', 'paris%', '%chin%']);
is equivalent to:
SELECT * FROM employee WHERE name_last ILIKE 'smith' OR name_last ILIKE 'paris%' OR name_last ILIKE '%chin%';
SQL Server
SQL Server supports the SQL standard versions of ALL
and ANY/SOME
. It also supports some additional comparison operators: not greater than (!>
) and not less than (!<
).
BETWEEN Operator
The BETWEEN
operator performs a Boolean test of a value against a range of values. It returns TRUE
when the value is included in the range and FALSE
when the value falls outside of the range. The result is NULL (unknown) if any of the range values are NULL.
Platform | Command |
---|---|
MySQL | Supported |
Oracle | Supported |
PostgreSQL | Supported |
SQL Server | Supported |
SQL standard syntax
SELECT ... WHERE expression [NOT] BETWEEN lower_range AND upper_range
Keywords
WHERE expression
- Compares a scalar expression, such as a column, to the range of values bounded by
upper_range
andlower_range
. [NOT] BETWEEN lower_range AND upper_range
- Compares the
expression
to thelower_range
andupper_range
. The comparison is inclusive, meaning that it is equivalent to saying “whereexpression
is [not] greater than or equal tolower_range
and less than or equal toupper_range
.”
Rules at a glance
The BETWEEN
operator is used to test an expression against a range of values. It may be used with any data type except BLOB
, CLOB
, NCLOB
, REF
, or ARRAY
.
For example, this query returns title_ids that have year-to-date sales of between 10,000 and 20,000:
SELECT title_id FROM titles WHERE ytd_sales BETWEEN 10000 AND 20000
BETWEEN
is inclusive of the range of values listed, so it includes the values 10,000 and 20,000 in the search. If you want an exclusive search, you must use the greater than (>) and less than (<
) symbols:
SELECT title_id FROM titles WHERE ytd_sales > 10000 AND ytd_sales < 20000
The NOT
operator allows you to search for values outside of the BETWEEN
range. For example, you can find all the title_ids that were not published during 2021:
SELECT title_id FROM titles WHERE pub_date NOT BETWEEN '01-JAN-2021' AND '31-DEC-2021'
Programming tips and gotchas
Some coders are very particular about how the keyword AND
is used in WHERE
clauses. To prevent a casual reviewer from thinking that the AND
used in a BETWEEN
operation is a logical AND
operator, you might want to use parentheses to encapsulate the entire BETWEEN
clause:
SELECT title_id FROM titles WHERE (ytd_sales BETWEEN 10000 AND 20000) AND pubdate >= '2021-06-12 00:00:00.000'
PostgreSQL also supports @> (the contains operator) and && (the overlaps operator), which serve the same purpose as BETWEEN
but for array types, range types, and multi-range types.
See also
ALL/ANY/SOME
EXISTS
SELECT
WHERE
EXCEPT Set Operator
The EXCEPT
set operator retrieves the result sets of two or more queries, including all the records retrieved by the first query that are not also found in subsequent queries. Whereas JOIN
clauses are used to return the rows of two or more queries that are in common, EXCEPT
is used to filter out the records that are present in only one of multiple similar tables.
EXCEPT
is in a class of keywords called set operators
. Other set operators include INTERSECT
and UNION
. (MINUS
is a synonym of the EXCEPT
keyword; EXCEPT
is the SQL standard.) All set operators are used to simultaneously manipulate the result sets of two or more queries, hence the term “set operators.”
Platform | Command |
---|---|
MySQL | Not supported |
MariaDB | Supported, with limitations |
Oracle | Supported, with limitations (as MINUS ) |
PostgreSQL | Supported, with limitations |
SQL Server | Supported, with limitations |
SQL standard syntax
There is technically no limit to the number of queries that you may combine with the EXCEPT
operator. The general syntax is:
{SELECT statement1 | VALUES (expr1[, ...])} EXCEPT [ALL | DISTINCT] [CORRESPONDING [BY (column1, column2, ...)]] {SELECT statement2 | VALUES (expr2[, ...])} EXCEPT [ALL | DISTINCT] [CORRESPONDING [BY (column1, column2, ...)]] ...
Keywords
VALUES (expr1[, ... ])
- Generates a derived result set with explicitly declared values as
expr1
,expr2
, etc.—it’s essentially aSELECT
statement result set without the SELECT ... FROM syntax. This is known as a row constructor, since the rows of the result set are manually constructed. According to the SQL standard, multiple handcoded rows in a row constructor must be enclosed in parentheses and separated by commas. EXCEPT
- Determines which rows will be excluded from the final result set.
ALL | DISTINCT
ALL
considers duplicate rows from all result sets in theEXCEPT
comparison.DISTINCT
drops duplicate rows from all result sets prior to theEXCEPT
comparison. Any columns containing a NULL value are considered duplicates. (If neitherALL
norDISTINCT
is used, theDISTINCT
behavior is the default.)CORRESPONDING
- Specifies that only columns with the same name in both queries are returned, even if both queries use the asterisk (*) shortcut.
BY (column1, column2, ... )
- Specifies that only the named columns are returned, even if more columns with corresponding names exist in the queries. Must be used with the
CORRESPONDING
keyword.
Rules at a glance
There is only one significant rule to remember when using EXCEPT
: the number and order of the columns should be the same in all queries, and the data types should be of the same category.
The data types do not have to be identical, but they must be compatible. For example, CHAR
and VARCHAR
are compatible data types. By default, the result set will default to the largest data type size of each column in each ordinal position. For example, a query retrieving rows from VARCHAR(10)
and VARCHAR(15)
columns will use the VARCHAR(15)
data type and size.
Programming tips and gotchas
None of the platforms supports the CORRESPONDING [BY (column1, column2, ... )]
clause.
According to the SQL standard, the UNION
and EXCEPT
set operators evaluate with equal precedence. However, the INTERSECT
set operator evaluates before the other set operators. We recommend that you explicitly control the precedence of the set operators using parentheses as a general best practice.
According to the SQL standard, only one ORDER BY
clause is allowed in the entire query. Include it at the end of the last SELECT
statement. To avoid column and table ambiguity, be sure to alias each column for each table with the same respective alias. For example:
SELECT au_lname AS lastname, au_fname AS firstname FROM authors EXCEPT SELECT emp_lname AS lastname, emp_fname AS firstname FROM employees ORDER BY lastname, firstname
While each of the column lists may list columns with correspondingly compatible data types, there may be variation in behavior across the DBMS platforms with regard to the length of the columns. For example, if the au_lname column in the previous example’s first query is markedly longer than the emp_lname column in the second query, the platforms may apply different rules as to which length is used for the final result. In general, though, the platforms will choose the longer (and less restrictive) column size for use in the result set.
Each DBMS may apply its own rules as to which column name is used if the names vary across column lists. In general, the column names of the first query are used.
On platforms that do not support EXCEPT
, you can substitute a LEFT JOIN
, NOT IN
, or NOT EXISTS.
The following queries are examples of how you can achieve EXCEPT
functionality using NOT EXISTS
and NOT IN
:
SELECT DISTINCT a.city FROM authors AS a WHERE NOT EXISTS (SELECT * FROM publishers AS p WHERE a.city = p.city) SELECT DISTINCT a.city FROM authors AS a WHERE a.city NOT IN (SELECT p.city FROM pubs.publishers AS p WHERE p.city IS NOT NULL)
In general, NOT EXISTS
is faster than NOT IN
. In addition, there is a subtle issue with NULLs that differentiates the IN
and NOT IN
operators and the EXISTS
and NOT EXISTS
set operators. To get around this different handling of NULLs, simply add the IS NOT NULL
clause to the WHERE
clause, as shown in the preceding example.
The following example illustrates the use of LEFT JOIN
:
SELECT DISTINCT a.city FROM authors AS a LEFT JOIN (SELECT city FROM publishers WHERE city IS NOT NULL) AS p ON a.city = p.city WHERE p.city IS NULL;
MySQL and MariaDB
MySQL does not support EXCEPT
. MariaDB 10.3 and later support the EXCEPT
, EXCEPT ALL
, and EXCEPT DISTINCT
set operators using the SQL standard syntax. For MySQL, you can use the NOT IN
or NOT EXISTS
operations as alternatives to EXCEPT
, as detailed in the previous section.
Oracle
Oracle versions below Oracle 21c do not support the EXCEPT
set operator. However, they have an alternative set operator, MINUS
, with identical functionality to EXCEPT
. Its syntax is as follows:
<SELECT statement1> MINUS <SELECT statement2> MINUS ...
MINUS
is the functional equivalent of MINUS DISTINCT
; the ALL
clause is not implemented.
Oracle does not support MINUS
on queries containing:
Columns whose data types are
LONG
,BLOB
,CLOB
,BFILE
, orVARRAY
A
FOR UPDATE
clauseTABLE
collection expressions
If the first query in a set operation contains any expressions in the SELECT
item list, you must include AS
clauses to associate aliases with those expressions. Also, only the last query in the set operation may contain an ORDER BY
clause.
For example, you could generate a list of all store IDs that do not have any records in the sales table as follows:
SELECT stor_id FROM stores MINUS SELECT stor_id FROM sales
The MINUS
command is functionally similar to a NOT IN
query. This query retrieves the same results:
SELECT stor_id FROM stores WHERE stor_id NOT IN (SELECT stor_id FROM sales)
PostgreSQL
PostgreSQL supports the EXCEPT
and EXCEPT ALL
set operators using the basic SQL standard syntax:
<SELECT statement1> EXCEPT [ALL] <SELECT statement2> EXCEPT [ALL] ...
EXCEPT DISTINCT
is not supported, but EXCEPT
is the functional equivalent. EXCEPT
or EXCEPT ALL
are not supported on queries with a FOR UPDATE
clause.
The first query in the set operation may not contain an ORDER BY
clause or a LIMIT
clause, although you can define a subquery for the SELECT
statement that does include these. Subsequent queries in the EXCEPT
or EXCEPT ALL
set operation may contain these clauses, but such queries must be enclosed in parentheses. Otherwise, the last occurrence of ORDER BY
or LIMIT
will be applied to the entire set operation.
PostgreSQL evaluates SELECT
statements in a multi-EXCEPT
statement from top to bottom, unless you use parentheses to change the evaluation hierarchy of the statements.
Normally, duplicate rows are eliminated from the two result sets, unless you add the ALL
keyword. For example, you could find all titles in the authors table that have no records in the sales table using this query:
SELECT title_id FROM authors EXCEPT ALL SELECT title_id FROM sales;
SQL Server
SQL Server supports EXCEPT
, but none of its subclauses. For comparison purposes, SQL Server considers NULL values equal when evaluating an EXCEPT
result set. If using the SELECT ... INTO
statement, only the first query may contain the INTO
clause. ORDER BY
is only allowed at the end of the statement and is not allowed with each individual query. Conversely, GROUP BY
and HAVING
clauses can only be used within individual queries and may not be used to affect the final result set. The FOR BROWSE
clause may not be used with statements that include EXCEPT
.
EXISTS Operator
The EXISTS
operator tests a subquery for the existence of rows. All the platforms support the SQL standard syntax.
Platform | Command |
---|---|
MySQL | Supported |
Oracle | Supported |
PostgreSQL | Supported |
SQL Server | Supported |
SQL standard syntax
SELECT ... WHERE [NOT] EXISTS (subquery)
Keywords
WHERE [NOT] EXISTS
- Tests the subquery for the existence of one or more rows. If even one row satisfies the subquery clause, it returns a Boolean
TRUE
value. The optionalNOT
keyword returns a BooleanTRUE
value when the subquery returns no matching rows. subquery
- Retrieves a result set based on a fully formed subquery.
Rules at a glance
The EXISTS
operator checks a subquery for the existence of one or more records against the records in the parent query.
For example, if we want to see whether there are any jobs where no employee is filling the position:
SELECT * FROM jobs WHERE NOT EXISTS (SELECT * FROM employee WHERE jobs.job_id = employee.job_id)
This example tests for the absence of records in the subquery using the optional NOT
keyword. The next example looks for specific records in the subquery to retrieve the main result set:
SELECT au_lname FROM authors WHERE EXISTS (SELECT * FROM publishers WHERE authors.city = publishers.city)
This query returns the last names of authors who live in the same city as their publishers. Note that the asterisk in the subquery is acceptable, since the subquery only needs to return a single record to provide a Boolean TRUE
value. Columns are irrelevant in these cases.
Programming tips and gotchas
EXISTS
, in many queries, does the same thing as ANY
(in fact, it is semantically equivalent to the ANY
operator). EXISTS
is usually most effective with correlated subqueries.
The EXISTS
subquery usually searches for only one of two things. Your first option is to use the asterisk wildcard (e.g., SELECT * FROM ...
) so that you are not retrieving any specific column or value. In this case, the asterisk means “any column.” The second option is to select only a single column in the subquery (e.g., SELECT au_id FROM ...
). The third option is to select a constant, such as (SELECT 1 FROM ...
).
EXISTS
can be rewritten using an IN
clause. Here is an example written with EXISTS
:
SELECT au_id FROM authors WHERE EXISTS(SELECT au_id FROM titleauthor AS ta WHERE ta.au_id = authors.au_id)
and its equivalent using IN
:
SELECT au_id FROM authors WHERE au_id IN(SELECT ta.au_id FROM titleauthor AS ta WHERE ta.au_id = authors.au_id)
FILTER Clause
The FILTER
clause is used in conjunction with aggregate functions, except when they are used as window aggregates. PostgreSQL supports the SQL standard syntax for this clause; the other platforms do not support it.
Platform | Command |
---|---|
MySQL | Not supported |
Oracle | Not supported |
PostgreSQL | Supported |
SQL Server | Not supported |
SQL standard syntax
The FILTER
clause is part of a SELECT
statement and qualifies an aggregate function call:
[aggregate_function(input_args) FILTER (WHERE search_condition) input_args := value[,..]
In databases where the FILTER
clause is not supported, the functionality can be simulated for aggregates that ignore NULLs with a CASE
statement, as follows:
[aggregate_function(input_args) input_args := CASE WHEN search_condition THEN value ELSE NULL END[,...]
Keywords
aggregate_function
- An aggregate function such as
AVG
,COUNT
,COUNT DISTINCT
,MAX
,MIN
, orSUM
. WHERE search_condition
- Any condition allowed in a
WHERE
clause is allowed.
Rules at a glance
The FILTER
clause is allowed only in queries that utilize aggregate functions. Here is an example that uses the FILTER
clause to count books by price in PostgreSQL:
SELECT SUM(ytd_sales) AS total_sales , SUM(ytd_sales) FILTER(WHERE price < '$20.00') AS sales_book_lt_20 FROM titles;
And here is an alternative using CASE
:
SELECT SUM(ytd_sales) AS total_sales , SUM(CASE WHEN price < '$20.00' THEN ytd_sales ELSE NULL END) AS sales_book_lt_20 FROM titles;
The results are:
total_sales sales_book_lt_20 ----------- ----------------------- 97446 83821
See also
CASE
in Chapter 7GROUP BY
SELECT
WHERE
GROUP BY Clause
The GROUP BY
clause is used to aggregate or dedupe data. It is often paired with aggregate functions and the HAVING
clause.
Platform | Command |
---|---|
MySQL | Supported, with variations |
Oracle | Supported, with variations |
PostgreSQL | Supported, with variations |
SQL Server | Supported, with variations |
SQL standard syntax
[GROUP BY group_by_expression [HAVING search_condition]] group_by_expression ::= { (grouping_column[, ...]) | ROLLUP (grouping_column[, ...]) | CUBE (grouping_column[, ...]) | GROUPING SETS (grouping_set_list) | () | grouping_set, grouping_set_list }
Keywords
GROUP BY group_by_expression
- Groups result sets into the categories defined in the
group_by_expression
. Used in queries that utilize aggregate functions such asAVG
,COUNT
,COUNT DISTINCT
,MAX
,MIN
, andSUM
. Thegroup_by_expression
of theGROUP BY
clause has an elaborate syntax of its own; refer to the following section for examples and more information onROLLUP
,CUBE
, andGROUPING SETS
. HAVING search_condition
- Adds search conditions on the results of the
GROUP BY
clause in a manner similar to theWHERE
clause.HAVING
does not affect the rows used to calculate the aggregates.HAVING
clauses may contain subqueries.
Rules at a glance
The GROUP BY
clause is needed only in queries that utilize aggregate functions. The HAVING
clause is almost always accompanied by a GROUP BY
clause, but a GROUP BY
clause is often used without a HAVING
clause.
The GROUP BY clause
The GROUP BY
clause is used to report an aggregated value for one or more rows returned by a SELECT
statement based on one or more non-aggregated columns called grouping columns. For example, here is a query that counts up how many people were hired each year during the years 2016 through 2021:
SELECT hire_year, COUNT(emp_id) AS nbr_emps FROM employee WHERE status = 'ACTIVE' AND hire_year BETWEEN 2016 AND 2021 GROUP BY hire_year;
The results are:
hire_year nbr_emps --------- -------- 2016 27 2017 17 2018 13 2019 19 2020 20 2021 32
Queries using aggregate functions provide many types of summary information. The most common aggregate functions include:
AVG
- Returns the average of all non-NULL values in the specified column(s)
AVG DISTINCT
- Returns the average of all unique non-NULL values in the specified column(s)
COUNT
- Counts the occurrences of all non-NULL values in the specified column(s)
COUNT DISTINCT
- Counts the occurrences of all unique non-NULL values in the specified column(s)
COUNT(*)
- Counts every record in the table
MAX
- Returns the highest non-NULL value in the specified column(s)
MIN
- Returns the lowest non-NULL value in the specified column(s)
SUM
- Totals all non-NULL values in the specified column(s)
SUM DISTINCT
- Totals all unique non-NULL values in the specified column(s)
Some queries that use aggregates return a single row and are used to aggregate the whole table. Single-value aggregates are known as scalar aggregates. Scalar aggregates do not need a GROUP BY
clause. For example:
-- Query SELECT AVG(price) FROM titles -- Results 14.77
Queries in which all returned columns are aggregates also do not need a GROUP BY
clause.
Queries that return both regular column values and aggregate function values are commonly called vector aggregates. Vector aggregates use the GROUP BY
clause and return one or many rows.
There are a few rules to follow when using GROUP BY
:
Place
GROUP BY
in the proper clause order—after theWHERE
clause and before theORDER BY
clause.Include all non-aggregate columns in the
GROUP BY
clause.Do not use column aliases in the
GROUP BY
clause (though table aliases are acceptable).
For example, let’s suppose we need to get the total purchase amount of several purchases from an Order_Details table that looks like this:
OrderID ProductID UnitPrice Quantity ----------- ----------- ------------------- -------- 10248 11 14.0000 12 10248 42 9.8000 10 10248 72 34.8000 5 10249 14 18.6000 9 10249 51 42.4000 40 10250 41 7.7000 10 10250 51 42.4000 35 10250 65 16.8000 15 ...
We can do this with a query like the following:
SELECT OrderID, SUM(UnitPrice * Quantity) AS Order_Amt FROM order_details WHERE orderid IN (10248, 10249, 10250) GROUP BY OrderID
The results are:
OrderID Order_Amt ----------- ---------------- 10248 440.0000 10249 1863.4000 10250 1813.0000
We could further refine the aggregations by using more than one grouping column. Consider the following query, which retrieves the average price of our products, grouped first by name and then by size:
SELECT name, size, AVG(unit_price) AS avg FROM product GROUP BY name, size
The results are:
name size avg ------------ ------ ----------------------- Flux Capacitor small 900 P32 Space Modulator small 1400 Transmogrifier medium 1400 Acme Rocket large 600 Land Speeder large 6500
In addition, the GROUP BY
clause supports a few very important subclauses:
{ROLLUP | CUBE} ([grouping_column[, ... ]])[, grouping_set_list]
- Groups the aggregate values of the result set by one or more grouping columns. (Without
ROLLUP
orCUBE
, theGROUP BY (grouping_column[, ... ])
clause is the simplest and most common form of theGROUP BY
clause.) ROLLUP
- Produces subtotals for each set of grouping columns as a hierarchical result set, adding subtotal and grand total rows into the result set in a hierarchical fashion.
ROLLUP
operations return one row per grouping column, with NULL appearing in the grouping column to show the subtotaled or totaled aggregate value. CUBE
- Produces subtotals and cross-tabulated totals for all grouping columns. In a sense, the
CUBE
clause enables you to quickly return multidimensional result sets from standard relational tables without a lot of programmatic work.CUBE
is especially useful when working with large amounts of data. LikeROLLUP
,CUBE
provides subtotals of the grouping columns, but it also includes subtotal rows for all possible combinations of the grouping columns specified in the query. GROUPING SETS [{ROLLUP | CUBE}] ([grouping_column[, ... ]])[, grouping_set_list]
- Enables aggregated groups on several different sets of grouping columns within the same query. This is especially useful when you want to return only a portion of an aggregated result set. The
GROUPING SETS
clause also lets you select which grouping columns to compare, whereasCUBE
returns all of the grouping columns andROLLUP
returns a hierarchical subset of the grouping columns. As the syntax shows, the SQL standard also allowsGROUPING SETS
to be paired withROLLUP
orCUBE
.
Table 4-2 illustrates the differences between the result sets returned by GROUP BY
on its own and with each of these subclauses.
GROUP BY syntax | Returns the following sets |
---|---|
GROUP BY (col_A, col_B, col_C) GROUP BY ROLLUP (col_A, col_B, col_C) GROUP BY CUBE (col_A, col_B, col_C) |
(col_A, col_B, col_C) (col_A, col_B, col_C) (col_A, col_B) (col_A) () (col_A, col_B, col_C) (col_A, col_B) (col_A) (col_B, col_C) (col_B) (col_C) () |
Each type of GROUP BY
clause returns a different set of aggregated values and, in the case of ROLLUP
and CUBE
, totals and subtotals.
The concepts of ROLLUP
, CUBE
, and GROUPING SETS
are much more intuitive when explained by example. In the following example, we query for data summarizing the number of sales_orders by order_year and by order_quarter:
SELECT order_year AS year, order_quarter AS quarter, COUNT (*) AS orders FROM order_details WHERE order_year IN (2020, 2021) GROUP BY ROLLUP (order_year, order_quarter) ORDER BY order_year, order_quarter;
The results are:
year quarter orders ---- ------- ------ NULL NULL 648 -- grand total 2020 NULL 380 -- total for year 2020 2020 1 87 2020 2 77 2020 3 91 2020 4 125 2021 NULL 268 -- total for year 2021 2021 1 139 2021 2 119 2021 3 10
Adding grouping columns to the query provides more details (and more subtotaling) in the result set. Now let’s modify the previous example by adding a region to the query (but since the number of rows increases, we’ll only look at the first and second quarters):
SELECT order_year AS year, order_quarter AS quarter, region, COUNT (*) AS orders FROM order_details WHERE order_year IN (2020, 2021) AND order_quarter IN (1,2) AND region IN ('USA', 'CANADA') GROUP BY ROLLUP (order_year, order_quarter,region) ORDER BY order_year, order_quarter, region;
The results are:
year quarter region orders ---- ------- ------ ------ NULL NULL NULL 183 -- grand total 2020 NULL NULL 68 -- subtotal for year 2020 2020 1 NULL 36 -- subtotal for all regions in q1 of 2020 2020 1 CANADA 3 2020 1 USA 33 2020 2 NULL 32 -- subtotal for all regions in q2 of 2021 2020 2 CANADA 3 2020 2 USA 29 2021 NULL NULL 115 -- subtotal for year 2021 2021 1 NULL 57 -- subtotal for all regions in q1 of 2021 2021 1 CANADA 11 2021 1 USA 46 2021 2 NULL 58 -- subtotal for all regions in q2 of 2021 2021 2 CANADA 4 2021 2 USA 54
The GROUP BY CUBE
clause is useful for performing multidimensional analyses on aggregated data. Like GROUP BY ROLLUP
, it returns subtotals, but unlike GROUP BY ROLLUP
, it returns subtotals combining all of the grouping columns named in the query. (As you will see, it also has the potential to increase the number of rows returned in the result set.)
In the following example, we query for data summarizing the number of sales_orders by order_year and by order_quarter:
SELECT order_year AS year, order_quarter AS quarter, COUNT (*) AS orders FROM order_details WHERE order_year IN (2020, 2021) GROUP BY CUBE (order_year, order_quarter) ORDER BY order_year, order_quarter;
The results are:
year quarter orders ---- ------- ------ NULL NULL 648 -- grand total NULL 1 226 -- subtotal for q1 of both years NULL 2 196 -- subtotal for q2 of both years NULL 3 101 -- subtotal for q3 of both years NULL 4 125 -- subtotal for q4 of both years 2020 NULL 380 -- total for year 2020 2020 1 87 2020 2 77 2020 3 91 2020 4 125 2021 NULL 268 -- total for year 2021 2021 1 139 2021 2 119 2021 3 10
The GROUP BY GROUPING SETS
clause lets you aggregate on more than one group in a single query. For each group set, the query returns subtotals with the grouping column marked as NULL. While the CUBE
and ROLLUP
clauses place predefined subtotals into the result set, the GROUPING SETS
clause allows you to control what subtotals to add to the query. The GROUPING SETS
clause will also return a grand total if you include a set with no columns, such as ()
.
Using a similar example query to the ones shown with ROLLUP
and CUBE
, this time we’ll subtotal by year and quarter and separately by year:
SELECT order_year AS year, order_quarter AS quarter, COUNT (*) AS orders FROM order_details WHERE order_year IN (2020, 2021) GROUP BY GROUPING SETS ((order_year, order_quarter), (order_year)) ORDER BY order_year, order_quarter;
The results are:
year quarter orders ---- ------- ------ 2020 NULL 380 -- total for year 2020 2020 1 87 2020 2 77 2020 3 91 2020 4 125 2021 NULL 268 -- total for year 2021 2021 1 139 2021 2 119 2021 3 10
Another way to think of GROUPING SETS
is to consider it to be like a UNION ALL
of more than one GROUP BY
query that references different parts of the same data. You can tell the database to add subtotals to a GROUPING SET
by simply adding in the ROLLUP
or CUBE
clause according to how you would like subtotaling to occur.
GROUPING SETS
can also be concatenated to concisely generate large combinations of groupings. Concatenated GROUPING SETS
yield the cross product of groupings from each of the sets within a GROUPING SETS
list. Concatenated GROUPING SETS
are compatible with CUBE
and ROLLUP
, but since they perform a cross product of all GROUPING SETS
, they will generate a very large number of final groupings from even a small number of concatenated groupings. This is demonstrated in the example in Table 4-3.
GROUP BY syntax |
Returns the following sets |
---|---|
GROUP BY (col_A, col_B, col_C) |
(col_A, col_B, col_C) |
... |
... |
GROUP BY GROUPING SETS (col_A, col_B) (col_Y, col_Z) |
(col_A, col_Y) (col_A, col_Z) (col_B, col_Y) (col_B, col_Z) |
You can imagine how large the result set would be if the concatenated GROUPING SETS
contained a large number of groupings! However, the information returned can be very valuable and hard to reproduce.
The HAVING clause
The HAVING
clause adds search conditions on the result of the GROUP BY
clause. The HAVING
clause works very much like the WHERE
clause, but it applies to the GROUP BY
clause. The HAVING
clause supports all the same search conditions as the WHERE
clause shown earlier. For example, using the same query as at the beginning of the previous section, say we now want to find only those jobs that are performed by more than three people:
-- Query SELECT j.job_desc "Job Description", COUNT(e.job_id) "Nbr in Job" FROM employee e JOIN jobs j ON e.job_id = j.job_id GROUP BY j.job_desc HAVING COUNT(e.job_id) > 3 -- Results Job Description Nbr in Job -------------------------------------------------- ----------- Acquisitions Manager 4 Managing Editor 4 Marketing Manager 4 Operations Manager 4 Productions Manager 4 Public Relations Manager 4 Publisher 7
Note that the SQL standard does not require that an explicit GROUP BY
clause appear with a HAVING
clause. For example, the following query against the employee table is valid because it has an implied GROUP BY
clause:
SELECT COUNT(dept_nbr) FROM employee HAVING COUNT(dept_nbr) > 30;
Although it’s valid, this application of the HAVING
clause is rather rare.
IN Operator
The IN
operator provides a way to delineate a list of values, either explicitly listed or from a subquery, and compare a value against that list in a WHERE
or HAVING
clause. In other words, it gives you a way to say “Is value A in this list of values?” All the platforms support the SQL standard syntax.
Platform | Command |
---|---|
MySQL | Supported |
Oracle | Supported |
PostgreSQL | Supported |
SQL Server | Supported |
SQL standard syntax
{WHERE | HAVING | {AND | OR}} value [NOT] IN ({comp_value1, comp_value2[, ...] | subquery})
Keywords
{WHERE | HAVING | {AND | OR}} value
IN
is permitted in either theWHERE
or theHAVING
clause. TheIN
comparison may also be a part of anAND
orOR
clause in a multiconditionWHERE
orHAVING
clause.value
may be of any data type, but is usually the name of a column of the table referenced by the transaction, or perhaps a host variable when used programmatically.NOT
- Optionally tells the database to look for a result set that contains values that are not in the list.
IN ({comp_value1, comp_value2[, ... ] | subquery})
- Defines the list of comparative values (hence,
comp_value
) to compare against. Eachcomp_value
must be of the same or a compatible data type as the initialvalue
. They are also governed by standard data type rules. For example, string values must be delimited by quotes, while integer values need no delimiters. As an alternative to listing specific values, you may use parentheses to enclose a subquery that returns one or more values of a compatible data type.
Rules at a glance
In the following example, generated on SQL Server, we look for all employees in the employee table of the hr database who have a home state of Georgia, Tennessee, Alabama, or Kentucky:
SELECT * FROM hr..employee WHERE home_state IN ('AL','GA','TN','KY')
Similarly, we can look for all employees in the hr database who are authors in the pubs database:
SELECT * FROM hr..employee WHERE emp_id IN (SELECT au_id FROM pubs..authors)
We can also use the NOT
keyword to return a result set based upon the absence of a value. In the following case, the company headquarters is located in New York, and many workers commute in from neighboring states. We want to see all such workers:
SELECT * FROM hr..employee WHERE home_state NOT IN ('NY','NJ','MA','CT','RI','DE','NH')
Note that Oracle, while fully supporting the SQL standard functionality, extends the functionality of the IN
operator by allowing multiple argument matches. For example, the following SELECT ... WHERE ... IN
statement is acceptable on Oracle:
SELECT * FROM hr..employee e WHERE (e.emp_id, e.emp_dept) IN ( (242, 'sales'), (442, 'mfg'), (747, 'mkt) )
See also
ALL/ANY/SOME
BETWEEN
EXISTS
LIKE
SELECT
INTERSECT Set Operator
The INTERSECT
set operator retrieves the result sets of two or more queries, including only the records retrieved by the first query that are also found in all subsequent queries (that is, it includes only those rows that appear in all the result sets). In some ways, INTERSECT
is similar to an INNER JOIN
operation (see “JOIN Subclause” for details).
INTERSECT
is in a class of keywords called set operators. Other set operators include EXCEPT
and UNION
. All set operators are used to simultaneously manipulate the result sets of two or more queries; hence the term “set operators.”
Platform | Command |
---|---|
MySQL | Not supported |
MariaDB | Supported, with limitations |
Oracle | Supported, with limitations |
PostgreSQL | Supported, with limitations |
SQL Server | Supported, with limitations |
SQL standard syntax
There is technically no limit to the number of queries that you may combine with the INTERSECT
set operator. The general syntax is:
<SELECT statement1> INTERSECT [ALL | DISTINCT] [CORRESPONDING [BY (column1, column2, ...)]] <SELECT statement2> INTERSECT [ALL | DISTINCT] [CORRESPONDING [BY (column1, column2, ...)]] ...
Keywords
INTERSECT
- Determines which rows will be included in the final single result set.
ALL | DISTINCT
ALL
includes duplicate rows from all result sets in theINTERSECT
comparison.DISTINCT
drops duplicate rows from all result sets prior to theINTERSECT
comparison. Any columns containing a NULL value are considered duplicates. (If neitherALL
norDISTINCT
is used, theDISTINCT
behavior is the default.)CORRESPONDING
- Specifies that only columns with the same name in both queries are returned, even if both queries use the asterisk (*) shortcut.
BY (column1, column2, ... )
- Specifies that only the named columns are returned, even if more columns with corresponding names exist in the queries. Must be used with the
CORRESPONDING
keyword.
Rules at a glance
There is only one significant rule to remember when using INTERSECT
: the order and number of columns must be the same in all of the queries.
Also, while the data types of the corresponding columns do not have to be identical, they must be compatible (for example, CHAR
and VARCHAR
are compatible data types). By default, the result set will default to the largest of the columns in each ordinal position.
Programming tips and gotchas
None of the platforms support the SQL standard CORRESPONDING [BY (column1, column2, ... )]
clause.
The SQL standard evaluates INTERSECT
as higher priority than other set operators, but not all platforms evaluate set operator precedence the same way. You can explicitly control the precedence of set operators using parentheses. Otherwise, the DBMS might evaluate the expressions either from leftmost to rightmost or from first to last.
According to the standard, only one ORDER BY
clause is allowed in the entire query. It should be included at the end of the last SELECT
statement. To avoid column and table ambiguity, be sure to alias each column of each table with the same respective alias. For example:
SELECT a.au_lname AS last_name, a.au_fname AS first_name FROM authors AS a INTERSECT SELECT e.emp_lname AS last_name, e.emp_fname AS last_name FROM employees AS e ORDER BY last_name, first_name
Also, be aware that while your column data types may be compatible throughout the queries in the INTERSECT
, there may be some variation in behavior across the DBMS platforms with regard to varying length of the columns. For example, if the au_lname column in the first query is markedly longer than the emp_lname column in the second query, different platforms may apply different rules as to which length is used for the final result. In general, though, the platforms will choose the longer (and less restrictive) column size for use in the result set.
Each DBMS may apply its own rules as to which column name is used if the columns across the tables have different names. In general, the column names from the first query are used.
On platforms that do not support INTERSECT
, substitute a query using INNER JOIN
.
Here is the earlier example rewritten as an INNER JOIN
:
SELECT DISTINCT a.au_lname AS last_name, a.au_fname AS first_name FROM authors AS a INNER JOIN employees AS e ON (a.au_lname = e.emp_lname AND a.au_fname = e.emp_fname) ORDER BY last_name, first_name
MySQL and MariaDB
MySQL does not support INTERSECT
. MariaDB 10.3 and later support the INTERSECT
, INTERSECT ALL
, and INTERSECT DISTINCT
set operators using the SQL standard syntax.
Oracle
Oracle supports the INTERSECT
and INTERSECT ALL
set operators using the basic SQL standard syntax. INTERSECT DISTINCT
is not supported, but INTERSECT
is the functional equivalent.
For example, you could find all store IDs that also have sales using this query:
SELECT stor_id FROM stores INTERSECT SELECT stor_id FROM sales
Oracle does not support INTERSECT
on the following types of queries:
Queries containing columns with
LONG
,BLOB
,CLOB
,BFILE
, orVARRAY
data typesQueries containing a
FOR UPDATE
clause or aTABLE
collection expression
If the first query in the set operation contains any expressions in the SELECT
item list, you should include the AS
keyword to associate an alias with the column resulting from the expression. Also, only the first query in the set operation may contain an ORDER BY
clause.
PostgreSQL
PostgreSQL supports the INTERSECT
and INTERSECT ALL
set operators using the basic SQL standard syntax, but not on queries with a FOR UPDATE
clause. INTERSECT DISTINCT
is not supported, but INTERSECT
is the functional equivalent.
For example, you can find all authors who are also employees and whose last last names start with “P” as follows:
SELECT a.au_lname FROM authors AS a WHERE a.au_lname LIKE 'P%' INTERSECT SELECT e.lname FROM employee AS e WHERE e.lname LIKE 'W%';
The first query in the set operation may not contain an ORDER BY
clause or a LIMIT
clause. Subsequent queries in the INTERSECT [ALL]
set operation may contain these clauses, but such queries must be enclosed in parentheses. Otherwise, the rightmost occurrence of ORDER BY
or LIMIT
will be assumed to apply to the entire set operation.
SQL Server
SQL Server supports INTERSECT
, but not its subclauses. The column names of the result set are those returned by the first query. Any column names or aliases referenced in an ORDER BY
clause must appear in the first query. When using INTERSECT
(or EXCEPT
) to compare more than two result sets, each pair of result sets (i.e., each pair of queries) is compared before moving to the next pair in the order of expressions in parentheses first, INTERSECT
set operators second, and EXCEPT
and UNION
last in order of appearance.
Also note that you can use NOT IN
or NOT EXISTS
operations in conjunction with a correlated subquery, as alternatives. Refer to the sections on IN
and EXISTS
for examples.
IS Operator
The IS
operator determines whether a value is NULL or not. All the platforms support the SQL standard syntax.
Platform | Command |
---|---|
MySQL | Supported |
Oracle | Supported |
PostgreSQL | Supported |
SQL Server | Supported |
SQL standard syntax
{WHERE | {AND | OR}} expression IS [NOT] NULL
Keywords
{WHERE | {AND | OR}} expression IS NULL
- Returns a Boolean value of
TRUE
if theexpression
is NULL, andFALSE
if theexpression
is not NULL. Theexpression
evaluated for NULL can be preceded by theWHERE
keyword or theAND
orOR
keywords. NOT
- Inverses the predicate: the statement will instead return a Boolean
TRUE
if the value ofexpression
is not NULL, andFALSE
if the value ofexpression
is NULL.
Rules at a glance
Because the value of NULL is unknown, you cannot use comparison expressions to determine whether a value is NULL. For example, the expressions X = NULL
and X
<> NULL
cannot be resolved because no value can equal, or not equal, an unknown.
Instead, you must use the IS NULL
operator. Be sure that you do not put the word NULL within quotation marks, because if you do that, the DBMS will interpret the value as the word “NULL” and not the special value NULL.
Programming tips and gotchas
Some platforms support the use of a comparison operator to determine whether an expression is NULL. However, all platforms covered by this book now support the SQL standard IS [NOT] NULL
syntax.
Sometimes, checking for NULL will make your WHERE
clause only slightly more complex. For example, rather than a simple predicate to test the value of stor_id, as shown here:
SELECT stor_id, ord_date FROM sales WHERE stor_id IN (6630, 7708)
you can add a second predicate to accommodate the possibility that stor_id might be NULL:
SELECT stor_id, ord_date FROM sales WHERE stor_id IN (6630, 7708) OR stor_id IS NULL
See also
SELECT
WHERE
JOIN Subclause
The JOIN
subclause enables you to retrieve rows from two or more logically related tables. You can define many different join conditions and types of joins, though the types of joins supported by the different platforms vary greatly.
Platform | Command |
---|---|
MySQL | Supported, with variations |
Oracle | Supported, with variations |
PostgreSQL | Supported, with variations |
SQL Server | Supported, with limitations |
SQL standard syntax
FROM table [AS alias] { [join_type] JOIN [LATERAL] joined_table [[AS] alias] { ON join_condition1 [{AND | OR} join_condition2] [...] | USING (column1[, ...]) }} | [ PARTITION BY (column1[, ...])] [...]
Keywords
FROM table
- Defines the first table or view in the join.
[join_type] JOIN [LATERAL] joined_table
- Specifies the type of
JOIN
and the second (and any subsequent) table(s) in the join. You may also define analias
on any of thejoined_table
s. The join types are:CROSS JOIN
Specifies the complete cross product of two tables. For each record in the first table, all the records in the second table are joined, creating a huge result set. This command has the same effect as leaving off the join condition, and its result set is also known as a Cartesian product.
Cross joins are not advisable or recommended.
[INNER] JOIN
- Specifies that unmatched rows in either table of the join should be discarded. If no join type is explicitly defined, this is the default.
LEFT [OUTER] JOIN
- Specifies that all records be returned from the table on the left side of the join statement. If a record returned from the left table has no matching record in the table on the right side of the join, it is still returned. Columns from the right table return NULL values when there is no matching row. It is a good idea to configure all your outer joins as left outer joins (rather than mixing left and right outer joins) wherever possible, for consistency.
RIGHT [OUTER] JOIN
- Specifies that all records be returned from the table on the right side of the join statement, even if the table on the left has no matching record. Columns from the left table return NULL values when there is no matching row.
FULL [OUTER] JOIN
- Specifies that all rows from both tables be returned, regardless of whether a row from one table matches a row in the other table. Any columns that have no value in the corresponding joined table are assigned a NULL value.
NATURAL
- Specifies that the join (either inner or outer) should be performed on all columns of identical name shared between the two tables. Consequently, you should not specify join conditions using the
ON
orUSING
clauses. The query will fail if you issue it on two tables that do not contain any columns with the same name(s). LATERAL
- The
LATERAL
keyword can be used with aLEFT JOIN
orCROSS JOIN
. It denotes a correlated subquery or function call where elements from previously specified tables are used in the subquery or as arguments to the function. The function used can return more than one row.
[AS] alias
- Specifies an alias or shorthand for the joined table. The
AS
keyword is optional when specifying an alias. ON join_condition
- Joins together the rows of the table shown in the
FROM
clause and the rows of the table declared in theJOIN
clause. You may have multipleJOIN
statements, all based on a common set of values. These values are usually contained in columns of the same name and data type appearing in both of the tables being joined. These columns, or possibly a single column from each table, are called the join key or common key. Most (but not all) of the time, the join key is the primary key of one table and a foreign key in the other table. As long as the values in the columns match, the join can be performed.
join_condition
s are syntactically depicted in the following form (note that join types are intentionally excluded in this example):
FROM table_name1 JOIN table_name2 ON table_name1.column1 = table_name2.column2 [{AND|OR} table_name1.column3 = table_name2.column4] [...] JOIN table_name3 ON table_name1.columnA = table_name3.columnA [{AND|OR} table_name1.column3 = table_name2.column4] [...] [JOIN...]
Use the AND
operator and the OR
operator to issue a JOIN
with multiple conditions. It is also a good idea to use brackets around each pair of joined tables if more than two tables are involved, as this makes reading the query much easier.
USING (column[, ... ])
- Assumes an equality condition on one or more named
column
s that appear in both tables. The column (or columns) must exist, as named, in both tables. Writing aUSING
clause is a little quicker than writing... ON table1.columnA = table2.columnA
, but the results are functionally equivalent. PARTITION BY (column1[, ... ])
- Useful for filling gaps in result sets. Only Oracle supports this clause. Refer to the Oracle section for an example.
Rules at a glance
Joins enable you to retrieve records from two (or more) logically related tables in a single result set. You can use a SQL standard JOIN
(detailed here) to perform this operation, or something called a theta join. Theta joins, which use a WHERE
clause to establish the filtering criteria, are the “old” way to do join operations.
For example, you might have a table called employee that tracks information about everyone employed in your company. The employee table, however, doesn’t contain extensive information about the job an employee holds; instead, it holds only job_ids. All information about the job, such as its description and title, are stored in a table called job. Using a join, you can easily return columns from both tables in a single set of records. The following sample queries illustrate the difference between a theta join and a SQL standard JOIN
:
/* Theta join */ SELECT emp_lname, emp_fname, job_title FROM employee, jobs WHERE employee.job_id = jobs.job_id; /* SQL standard join */ SELECT emp_lname, emp_fname, job_title FROM employee JOIN jobs ON employee.job_id = jobs.job_id;
Whenever you reference multiple columns in a single query, the columns must be unambiguous. In other words, the columns must either be unique to each table or be referenced with a table identifier, as is done for the job_id column in the preceding example (any columns in the query that don’t exist in both tables don’t need to be qualified by table identifiers). However, queries like this are often hard to read. The following variation of the previous SQL standard JOIN
is in better form, because it uses the short, easy-to-read aliases e and j to refer to the employee and jobs tables:
SELECT e.emp_lname, e.emp_fname, j.job_title FROM employee AS e JOIN jobs AS j ON e.job_id = j.job_id;
The previous examples were limited to equi-joins, or joins based on equality using an equals sign (=). However, most other comparison operators are also allowed: you can perform joins with >, <, >=, <=, <>, and so forth.
You cannot join on binary large object data types (e.g., BLOB
) or any other large object data types (e.g., CLOB
, NLOB
, etc.). Other data types are usually allowed in a join comparison.
Following are examples of each type of join:
CROSS JOIN
Here are some cross join examples. The first is a theta join that simply leaves off the join conditions, the second is written using the
CROSS JOIN
clause, and the final query is similar in concept to the first, with aJOIN
clause that omits the join conditions:SELECT * FROM employee, jobs; SELECT * FROM employee CROSS JOIN jobs; SELECT * FROM employee JOIN jobs;
As mentioned earlier, cross joins—joins between two or more tables that return all the data for all the rows in all possible variations (i.e., the Cartesian product of the tables—are a really bad idea. Take a careful look at these examples so you know what they look like, and then avoid them!
INNER JOIN
Following is an inner join written using the SQL standard syntax:
SELECT a.au_lname AS 'last name', a.au_fname AS 'first name', p.pub_name AS 'publisher' FROM authors AS a INNER JOIN publishers AS p ON a.city = p.city ORDER BY a.au_lname DESC
There are lots of authors in the authors table, but very few of them have cities that match their publishers’ cities in the publishers table. For example, the preceding query executed in the pubs database on SQL Server produces results like the following:
last name first name publisher --------------- -------------------- ------------------ Carson Cheryl Algodata Infosystems Bennet Abraham Algodata Infosystems
The join is called an inner join because only those records that meet the join condition in both tables are said to be “inside” the join. You could also issue the same query, on platforms that support it, by substituting the
USING
clause for theON
clause:SELECT a.au_lname AS 'last name', a.au_fname AS 'first name', p.pub_name AS 'publisher' FROM authors AS a INNER JOIN publishers AS p USING (city) ORDER BY a.au_lname DESC
- The results for this query would be the same.
LEFT [OUTER] JOIN
Following is an example of a left outer join, where we ask for the publisher for each author (we could also substitute the
USING
clause for theON
clause, as shown in the previous inner join example):SELECT a.au_lname AS "last name", a.au_fname AS "first name", p.pub_name AS "publisher" FROM authors AS a LEFT OUTER JOIN publishers AS p ON a.city = p.city ORDER BY a.au_lname DESC
In this example every author from the left (authors) table will be returned, along with the publisher’s name where there is a match, or a NULL value where there is no match. For example, in the SQL Server pubs database, the query returns:
last name first name publisher --------------- -------------------- ---------------- Yokomoto Akiko NULL White Johnson NULL Stringer Dirk NULL Straight Dean NULL ...
- As mentioned earlier, it’s best to avoid mixing left and right outer joins, for greater consistency. Left joins are the more common choice.
RIGHT [OUTER] JOIN
A right outer join is essentially the same as a left outer join, except it returns all the records from the table on the right side of the query. For example, the following query executed in the pubs database on SQL Server:
SELECT a.au_lname AS "last name", a.au_fname AS "first name", p.pub_name AS "publisher" FROM authors AS a RIGHT OUTER JOIN publishers AS p ON a.city = p.city ORDER BY a.au_lname DESC
returns the following result set:
last name first name publisher ---------------- ---------------- ------------------------ Carson Cheryl Algodata Infosystems Bennet Abraham Algodata Infosystems NULL NULL New Moon Books NULL NULL Binnet & Hardley ...
- Every publisher from the right (publishers) table is returned, along with the authors’ names where there is a match, or a NULL value where there is no match. Again, it’s best to avoid mixing right and left outer joins.
NATURAL [INNER | {LEFT | RIGHT} [OUTER]] JOIN
Natural joins are a substitute for the
ON
orUSING
clause, so do not useNATURAL
with those clauses. For example:SELECT a.au_lname AS "first name", a.au_fname AS "last name", p.pub_name AS "publisher" FROM authors AS a NATURAL RIGHT OUTER JOIN publishers AS p ORDER BY a.au_lname DESC
The preceding query will work the same as the earlier examples, but only if both tables possess a column called city and that is the only column that they hold in common. You could similarly perform any of the other types of joins (
INNER
,FULL
,OUTER
) using theNATURAL
prefix.We suggest you avoid natural joins. They save a couple of keystrokes, but at the expense of possible breakage of your code in the future. For example, suppose you have a date_add column in your authors table but not in your publishers table, but you later decide to add that column to the publishers table. Natural join queries you wrote previously will suddenly start giving very unexpected results because they will now also be joining by date_add.
FULL [OUTER] JOIN
If we take our previous example query and render it as a
FULL JOIN
, it looks like this (note that theOUTER
keyword is optional):SELECT a.au_lname AS "last name", a.au_fname AS "first name", p.pub_name AS "publisher" FROM authors AS a FULL JOIN publishers AS p ON a.city = p.city ORDER BY a.au_lname DESC;
The result set returned by the query is actually the accumulation of the result sets of issuing separate
LEFT
andRIGHT
join queries (some records have been excluded for brevity):last name first name publisher -------------------- -------------------- -------------------- Yokomoto Akiko NULL White Johnson NULL Stringer Dirk NULL ... Dull Ann NULL del Castillo Innes NULL DeFrance Michel NULL Carson Cheryl Algodata Infosystems Blotchet-Halls Reginald NULL Bennet Abraham Algodata Infosystems NULL NULL Binnet & Hardley NULL NULL Five Lakes Publishing NULL NULL New Moon Books ... NULL NULL Scootney Books NULL NULL Ramona Publishers NULL NULL GGG&G
- As you can see, with a
FULL JOIN
you get some records with all of the data (LEFT
andRIGHT JOIN
s), some with the NULLs on the right and data on the left (LEFT JOIN
), and some with NULLs on the left and data on the right (RIGHT JOIN
). LATERAL {query | function}
This example uses SQL standard syntax to return the three top-priced books by each author and will only include authors who have at least one title published. You can achieve the same results in MySQL and older versions of PostgreSQL by replacing the
FETCH FIRST 3 ROWS ONLY
withLIMIT 3
:SELECT a.au_lname AS "first name", a.au_fname AS "last name", topt.title, topt.pubdate FROM authors AS a CROSS JOIN LATERAL (SELECT t.title, t.pubdate FROM titles AS t INNER JOIN titleauthor AS ta ON t.title_id = ta.title_id WHERE ta.au_id = a.au_id ORDER BY t.pubdate DESC FETCH FIRST 3 ROWS ONLY ) AS topt ORDER BY a.au_lname ASC, topt.pubdate DESC;
If you wanted to list all authors even if they have no published titles, you would use a left join as follows:
SELECT a.au_lname AS "first name", a.au_fname AS "last name", topt.title, topt.pubdate FROM authors AS a LEFT JOIN LATERAL (SELECT t.title, t.pubdate FROM titles AS t INNER JOIN titleauthor AS ta ON t.title_id = ta.title_id WHERE ta.au_id = a.au_id ORDER BY t.pubdate DESC LIMIT 3 ) AS topt ON (1=1) ORDER BY a.au_lname ASC, topt.pubdate DESC;
The
LATERAL
keyword can be used only withCROSS JOIN
orLEFT JOIN
.
Programming tips and gotchas
As described in the preceding section, there are many types of joins, each with their own rules and behaviors. If an explicit join_type
is omitted, an INNER JOIN
is assumed.
In general, you should favor the JOIN
clause over the WHERE
clause for describing join expressions. This not only keeps your code cleaner, making it easy to differentiate join conditions from search conditions, but also avoids the possibility of buggy behavior resulting from some platform-specific implementations of outer joins specified using the WHERE
clause.
In general, we do not recommend the use of labor-saving keywords like NATURAL
, since the subclause will not automatically update itself when the structures of the underlying tables change. Consequently, statements using these constructs may fail when a table change is introduced without also changing the query.
Not all join types are supported by all platforms, so refer to the following sections for full details on platform-specific join support.
Joins involving more than two tables can be difficult. When joins involve three or more tables, it is a good idea to think of the query as a series of two table joins.
MySQL
MySQL supports most SQL standard syntax, except that natural joins are supported only on outer joins, not on inner joins. MySQL also does not support the PARTITION BY
clause. The JOIN
syntax is:
FROM table [AS alias] {[STRAIGHT_JOIN joined_table] | { {[INNER] | [CROSS] | [NATURAL] [ {LEFT | RIGHT | FULL} [OUTER] ]} JOIN [LATERAL] joined_table [AS alias] { ON join_condition1 [{AND|OR} join_condition2] [...] } | USING (column1[, ...]) }} [...]
where:
STRAIGHT_JOIN
Forces the optimizer to join tables in the exact order in which they appear in the
FROM
clause. TheSTRAIGHT_JOIN
keyword is functionally equivalent toJOIN
, except that it forces the join order from left to right. This option was supplied because MySQL might, rarely, join the tables in the wrong order.Refer to “Rules at a glance” for examples.
MySQL is very fluid in the way it supports joins. You can use several different syntaxes to perform a join; for example, you can explicitly declare a join in a query using the JOIN
clause, but then show the join condition in the WHERE
clause. The other platforms force you to pick one method or the other and do not allow you to mix them in a single query. However, we think it’s bad practice to mix methods, so our examples use SQL standard JOIN
syntax.
Oracle
Oracle fully supports the SQL standard JOIN
syntax. There’s also an older syntax for outer theta joins that involves adding “(+)” to the column names on the opposite side of the direction of the join (this comes from the fact that the table supplying the NULL value rows in effect has NULL value rows added to it). However, this syntax should be avoided because it doesn’t support some of the features of SQL standard joins, such as FULL JOIN
. Oracle also supports CROSS APPLY
and OUTER APPLY
clauses (also found in SQL Server), which are equivalent to the SQL standard’s LATERAL
clause.
For example, the following query does a RIGHT OUTER JOIN
on the authors and publishers tables:
SELECT a.au_lname AS 'first name', a.au_fname AS 'last name', p.pub_name AS 'publisher' FROM authors AS a RIGHT OUTER JOIN publishers AS p ON a.city = p.city ORDER BY a.au_lname DESC
The old Oracle syntax looks like this:
SELECT a.au_lname AS 'first name', a.au_fname AS 'last name', p.pub_name AS 'publisher' FROM authors a, publishers p WHERE a.city(+) = p.city ORDER BY a.au_lname DESC
Refer to “Rules at a glance” for more JOIN
examples.
Oracle is unique in offering partitioned outer joins, which are useful for filling gaps in result sets due to sparse data storage. For example, assume we store production records in a manufacturing table keyed on day and product ID. The table holds a row showing the quantity of each product produced during any day on which it is made, but there are no rows for the days it is not produced. This is considered sparse data, since a list of all rows will not show every day for every product. For calculation and reporting purposes, it’s very useful to be able to create result sets where each product has a row for every day, regardless of whether or not it was manufactured on that day. A partitioned outer join makes it simple to do that, since it lets you define a logical partition and apply an outer join to each partition value. The following example does a partitioned outer join with a times table to make sure each product_id has the full set of dates in a specified time range:
SELECT times.time_id AS time, product_id AS id, quantity AS qty FROM manufacturing PARTITION BY (product_id) RIGHT OUTER JOIN times ON (manufacturing.time_id = times.time_id) WHERE manufacturing.time_id BETWEEN TO_DATE('01/10/05', 'DD/MM/YY') AND TO_DATE('06/10/05', 'DD/MM/YY') ORDER BY 2, 1;
Here is the output from this query:
time id qty --------- ------ --- 01-OCT-05 101 10 02-OCT-05 101 03-OCT-05 101 04-OCT-05 101 17 05-OCT-05 101 23 06-OCT-05 101 01-OCT-05 102 02-OCT-05 102 03-OCT-05 102 43 04-OCT-05 102 99 05-OCT-05 102 06-OCT-05 102 87
Getting these results without using a partitioned outer join would require much more complex and less efficient SQL.
PostgreSQL
PostgreSQL fully supports the SQL standard syntax, except for the PARTITION BY
clause. Refer to “Rules at a glance” for examples. When using functions in a LATERAL
construct, the LATERAL
keyword is optional.
For example, if you wanted to create a set of dates from publication date to present date for each title, you could write your LATERAL
query as follows:
SELECT title_id, i AS cal_date FROM titles CROSS JOIN generate_series(titles.pubdate, CURRENT_DATE, interval '1 day') AS i ORDER BY title_id, cal_date
Or with the LATERAL
keyword:
SELECT title_id, i AS cal_date FROM titles CROSS JOIN LATERAL generate_series(titles.pubdate, CURRENT_DATE, interval '1 day') AS i ORDER BY title_id, cal_date
SQL Server
SQL Server supports INNER
, OUTER
, and CROSS
joins using the ON
clause. It does not support NATURAL
join syntax, nor PARTITION BY
or the USING
clause. SQL Server also does not support the LATERAL
clause, though it does support CROSS APPLY
, which is equivalent to CROSS JOIN LATERAL
, and OUTER APPLY
, which is equivalent to LEFT JOIN LATERAL
. SQL Server’s JOIN
syntax is:
FROM table [AS alias] { {[INNER] | [CROSS] | [ {LEFT | RIGHT | FULL} [OUTER] ]} [JOIN | APPLY] joined_table [AS alias] { ON join_condition1 [{AND|OR} join_condition2] [...] } } [...]
Refer to “Rules at a glance” for examples.
The equivalent SQL Server query for the LATERAL
example is:
SELECT a.au_lname AS "first name", a.au_fname AS "last name", topt.title, topt.pubdate FROM authors AS a CROSS APPLY (SELECT TOP 3 t.title, t.pubdate FROM titles AS t INNER JOIN titleauthor AS ta ON t.title_id = ta.title_id WHERE ta.au_id = a.au_id ORDER BY t.pubdate DESC ) AS topt ORDER BY a.au_lname ASC, topt.pubdate DESC;
If you wanted to make sure all authors are listed even if they have no published titles, you would use the OUTER APPLY
clause, as follows:
SELECT a.au_lname AS "first name", a.au_fname AS "last name", topt.title, topt.pubdate FROM authors AS a OUTER APPLY (SELECT TOP 3 t.title, t.pubdate FROM titles AS t INNER JOIN titleauthor AS ta ON t.title_id = ta.title_id WHERE ta.au_id = a.au_id ORDER BY t.pubdate DESC ) AS topt ORDER BY a.au_lname ASC, topt.pubdate DESC;
LIKE Operator
The LIKE
operator enables specified string patterns in SELECT
, INSERT
, UPDATE
, and DELETE
statements to be matched, specifically in the WHERE
clause. A specified pattern may include special wildcard characters. The specific wildcards supported vary from platform to platform.
Platform | Command |
---|---|
MySQL | Supported |
Oracle | Supported |
PostgreSQL | Supported, with variations |
SQL Server | Supported, with variations |
SQL standard syntax
WHERE expression [NOT] LIKE string_pattern [ESCAPE escape_sequence]
Keywords
WHERE expression [NOT] LIKE
- Returns a Boolean
TRUE
when the value ofexpression
matches thestring_pattern
. The expression may be a column, a constant, a host variable, a scalar function, or a concatenation of any of these. It should not be a user-defined type, nor should it be certainLOB
types. NOT
- Inverses the predicate, so the statement returns a Boolean
TRUE
if the value ofexpression
does not contain thestring_pattern
and returnsFALSE
if the value ofexpression
contains thestring_pattern
. ESCAPE escape_sequence
- Allows you to search for the presence of characters that would normally be interpreted as wildcards.
Rules at a glance
Matching string patterns is easy with LIKE
, but there are a couple of rules to remember:
All characters, including trailing and leading spaces, are important.
Differing data types may be compared using
LIKE
, but they store string patterns differently. In particular, be aware of the differences between theCHAR
,VARCHAR
, andDATE
data types.Using
LIKE
may negate indexes or force the DBMS to use alternative, less optimal indexes than a straight comparison operation.
The SQL standard standard currently supports two wildcard operators that are supported by all of the platforms covered in this book:
The first query in the following example retrieves any city record with “ville” in its name. The second query returns authors with a first name not like Sheryl or Cheryl (or Aheryl, Bheryl, Dheryl, 2heryl, and so forth):
SELECT * FROM authors WHERE city LIKE '%ville%'; SELECT * FROM authors WHERE au_fname NOT LIKE '_heryl';
Some of the platforms support additional wildcard symbols. These are described in the platform-specific sections that follow.
Use of the ESCAPE
clause allows you to look for wildcard characters in the strings stored in your database. Using this mechanism, you designate a character—typically a character that does not otherwise appear in the pattern string—as your escape character. For example, you might designate the tilde (~) because you know it never appears in the pattern string. Any wildcard character preceded by the escape sequence is then treated not as a wildcard, but rather as the character itself. For example, we can look through the comments column of the sales_detail table (on SQL Server) to see whether any customers have mentioned a newly introduced discount using this query:
SELECT ord_id, comment FROM sales_detail WHERE comment LIKE '%~%%' ESCAPE '~'
In this case, the first and last %s are interpreted as wildcards, but the second % character is interpreted as just that (a % character), because it is preceded by the designated escape sequence.
Programming tips and gotchas
The usefulness of LIKE
is based on the wildcard operators that it supports. LIKE
returns a Boolean TRUE
value when the comparison finds one or more matching values.
The default case sensitivity of the DBMS is very important to the behavior of LIKE
. For example, SQL Server is not case-sensitive by default (though it can be configured that way), so it will evaluate the strings 'DAD'
and 'dad'
to be equal. MySQL is also case-insensitive by default but has a LIKE BINARY
operator to force case sensitivity. Oracle and PostgreSQL, on the other hand, are case-sensitive. Thus, on those platforms a comparison of 'DAD'
and 'dad'
would show them to be unequal. PostgreSQL (but not Oracle) has an ILIKE
operator for case-insensitive matching. Here’s an example query to better illustrate this point:
SELECT * FROM authors WHERE lname LIKE 'LARS%'
This query on MySQL and SQL Server would find authors whose last names are stored as 'Larson'
or 'Lars'
, even though the search was for the uppercase 'LARS%'
. Oracle and PostgreSQL, however, would not find 'Larson'
or 'Lars'
, because they perform case-sensitive comparisons by default.
MySQL
MySQL supports the SQL standard syntax for LIKE
. Additionally, it supports the special functions REGEXP
, RLIKE
, NOT REGEXP
, and NOT RLIKE
for the evaluation of regular expressions.
Oracle
Oracle supports the SQL standard syntax for LIKE
. Its LIKE
syntax is as follows:
WHERE expression [NOT] {LIKE | LIKEC | LIKE2 | LIKE4} string_pattern [ESCAPE escape_sequence]
The Oracle-specific syntax elements have the following meanings:
LIKEC
- Uses Unicode complete characters
LIKE2
- Uses Unicode USC2 code points
LIKE4
- Uses Unicode UCS4 code points
Since Oracle is case-sensitive, you should enclose the expression
, the string_pattern
, or both with the UPPER
function. That way, you are always comparing apples to apples.
PostgreSQL
PostgreSQL supports the SQL standard syntax for LIKE
. It’s case-sensitive by default but provides the keyword ILIKE
for case-insensitive pattern matching. You can also use the operators ~~
as an equivalent to LIKE
, ~~*
for ILIKE
, and !~~
and !~~*
for NOT LIKE
and NOT ILIKE
, respectively. These are all extensions to the SQL standard syntax.
For example, the following queries are functionally the same:
SELECT * FROM authors WHERE city LIKE '%ville'; SELECT * FROM authors WHERE city ~~ '%ville';
Since these queries are in lowercase, you might run into a case-sensitivity problem. That is, the queries are looking for a lowercase '%ville'
, but the table might contain uppercase (and unequal) values such as 'BROWNSVILLE'
, 'NASHVILLE'
, and 'HUNTSVILLE'
. You can get around this as follows:
-- Convert the values to uppercase SELECT * FROM authors WHERE city LIKE UPPER('%ville'); -- Perform the pattern match using case insensitivity SELECT * FROM authors WHERE city ~~* '%ville'; SELECT * FROM authors WHERE city ILIKE '%ville';
Although beyond the scope of this text, you should be aware that PostgreSQL also supports POSIX regular expressions. See the platform documentation for details.
SQL Server
SQL Server supports the SQL standard syntax for LIKE
, and the following additional wildcard operators:
[ ]
- Matches any value in the specified set, as in [abc], or range, as in [k–n]
[^ ]
- Matches any characters not in the specified set or range
Using SQL Server’s additional wildcard operators, you have some added capabilities. For example, you can retrieve any author with a last name like Carson, Carsen, Karson, or Karsen:
SELECT * FROM authors WHERE au_lname LIKE '[CK]ars[eo]n'
or you can retrieve any author with a last name that ends in “arson” or “arsen,” but is not Larsen or Larson:
SELECT * FROM authors WHERE au_lname LIKE '[A-Z^L]ars[eo]n'
ORDER BY Clause
The ORDER BY
clause specifies the sort order of the result set retrieved by a SELECT
statement.
Platform | Command |
---|---|
MySQL | Supported, with limitations |
Oracle | Supported, with variations |
PostgreSQL | Supported, with variations |
SQL Server | Supported, with limitations |
SQL standard syntax
ORDER BY { sort_expression [COLLATE collation_name] [ASC | DESC] [NULLS {FIRST | LAST}] }[, ...] [ OFFSET int {ROW | ROWS} ] [ FETCH {FIRST | NEXT} numeric {ROW | ROWS | PERCENT} {ONLY | WITH TIES} ]
Keywords
ORDER BY
- Specifies the order in which rows should be returned by a query. You should not anticipate a specific ordering if you exclude the
ORDER BY
clause, even if you specify aGROUP BY
clause and it appears that a sort has been done. sort_expression
- Specifies an item in the query that will help determine the order of the result set. You can have multiple sort expressions. They are usually column names or column aliases from the query; however, they may also be expressions like
(salary * 1.02)
. COLLATE collation_name
- Overrides the default collation of the
sort_expression
and applies thecollation_name
to the expression for the purposes of theORDER BY
clause. ASC | DESC
- Specifies that the result set should be returned in either ascending order (
ASC
) or descending order (DESC
). NULLS {FIRST | LAST}
NULLS FIRST
andNULLS LAST
specify that the records containing NULLs should appear either first or last, respectively. By default, Oracle and PostgreSQL place NULLs last for ascending-order sorts and first for descending-order sorts.OFFSET int {ROW | ROWS}
- Specifies the number of rows to skip from the start of the
ORDER BY
set. FETCH {FIRST | NEXT} numeric {ROW | ROWS | PERCENT}
- Returns the first or next (from
OFFSET
)numeric
rows orPERCENT
of records. IfROW
orROWS
is used, the value must be an integer. When usingFIRST
, there should be noOFFSET
clause.PERCENT
is any number from 0 to 100 and can include fractional values. ONLY | WITH TIES
ONLY
returns at most the specified number or percentage of rows.WITH TIES
returns additional rows if, based on theORDER BY
clause, records within that count are tied. In other words, rather than returning one or more of the tied records at random, up to thenumeric
specified, all of the tied records that are within the count are returned even if the total count exceeds thenumeric
specified.
Rules at a glance
The ORDER BY
clause should reference columns as they appear in the item list of the SELECT
statement, preferably using their aliases (if aliases exist). For example:
SELECT au_fname AS first_name, au_lname AS last_name FROM authors ORDER BY first_name, last_name
The ORDER BY
clause uses a major-to-minor sort ordering. This means that the result set is ordered by the first column referenced; equal values in the first column are then ordered by the second column, equal values in the second column are ordered by the third column, and so forth.
The individual aspects of a column’s ordering—COLLATE
and ASC
/DESC
—are independent of the other columns in the ORDER BY
clause. Thus, you could order a result set in ascending order by one column, and then flip the next column and order it in descending order:
SELECT au_fname AS first_name, au_lname AS last_name FROM authors ORDER BY au_lname ASC, au_fname DESC
NULLs are always grouped together (i.e., considered equal) for the purposes of sorting. Depending on your platform, NULLs will be clumped together at the top or at the bottom of the result set. The following query on SQL Server:
SELECT title, price FROM titles ORDER BY price, title
provides this result set:
title price ---------------------------------------------------- ------- Net Etiquette NULL The Psychology of Computer Cooking NULL The Gourmet Microwave 2.9900 You Can Combat Computer Stress! 2.9900 Life Without Fear 7.0000 Onions, Leeks, and Garlic: Cooking Secrets of the Me 20.9500 Computer Phobic AND Non-Phobic Individuals: Behavior 21.5900 But Is It User Friendly? 22.9500 ...
You can force NULLs to appear at the top or bottom of the result set using ASC
or DESC
. Of course, all the non-NULL rows of the result set are also ordered in ascending or descending order.
Some platforms support specification of NULL sorting. In Oracle and PostgreSQL by default NULLs are sorted to the end, but you can change the behavior as follows to yield the previous result by doing:
SELECT title, price FROM titles ORDER BY price NULLS FIRST, title NULLS LAST
The SQL standard also supports for the sort_expression
the use of columns that are not referenced in the SELECT
item list. For example, the following query is valid:
SELECT title, price FROM titles ORDER BY title_id
Looking at this example, you can see that although the query does not select title_id, that column is the primary sort_expression
. The result set is returned in title_id order even though that column is not selected.
You can limit the number of records returned using the [OFFSET] FETCH {FIRST | NEXT}
subclauses. For example, the following query will skip the first 10 titles and return the next 10:
SELECT title, price FROM titles ORDER BY title OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY
Programming tips and gotchas
When using set operators (EXCEPT
, INTERSECT
, UNION
), only the last query may have an ORDER BY
clause.
A number of behaviors that were supported in SQL92 are deprecated in more recent SQL standards. You should avoid these usages although all the databases covered here still support them:
- References to table aliases
- For example,
ORDER BY e.emp_id
should be changed toORDER BY emp_id
. If there is an ambiguous column name, use an alias to compensate. - References to ordinal position
- Use explicitly defined column aliases to compensate.
You may sort not only on columns, but also on expressions involving columns, or even literals:
SELECT SUBSTRING(title,1,55) AS title, (price * 1.15) as price FROM titles WHERE price BETWEEN 2 and 19 ORDER BY price, title
When sorting on expressions from the SELECT
item list, you should use aliases to make the ORDER BY
sort_expression
column references easier to read.
MySQL
MySQL supports the SQL standard, except for the COLLATE
option, FETCH PERCENT
, and NULLS {FIRST | LAST}
. In addition to the SQL standard OFFSET FETCH
, it also supports a LIMIT OFFSET
subclause that is equivalent in purpose and predates the SQL standard’s subclause. MySQL’s ORDER BY
syntax is:
ORDER BY {sort_expression [ASC | DESC]}[, ...] [ OFFSET int {ROW | ROWS} ] [ FETCH {FIRST | NEXT} numeric {ROW | ROWS} {ONLY | WITH TIES} ] [ LIMIT int OFFSET int ]
where:
LIMIT int
- Specifies the maximum number of records to return
OFFSET int
- Specifies how many records to skip
You cannot use the LIMIT OFFSET
construct in conjunction with the OFFSET FETCH
construct.
You should not attempt to ORDER BY
columns of the BLOB
data type, because only the first bytes, defined by the MAX_SORT_LENGTH
setting, will be used in the sort. By default, MySQL sorts NULL values lowest (first) for ASC
order and highest (last) for DESC
order.
Oracle
Oracle supports the SQL standard, except for the COLLATE
option. It also supports a SIBLINGS
option. Oracle’s ORDER BY
syntax is:
ORDER [SIBLINGS] BY {sort_expression [ASC | DESC] [NULLS {FIRST | LAST}]}[, ...] [ OFFSET int {ROW | ROWS} ] [ FETCH {FIRST | NEXT} numeric {ROW | ROWS | PERCENT} {ONLY | WITH TIES} ]
where:
ORDER [SIBLINGS] BY sort_expression
- Sorts the result set of the query in order of the
sort_expression
(s). Asort_expression
may be a column name, an alias, an integer indicating a column’s ordinal position, or another expression (e.g., salary * 1.02). TheORDER SIBLINGS BY
clause tells Oracle to preserve any ordering specified by a hierarchical query clause (CONNECT BY
), and to use the sort expression order for ordering of siblings in the hierarchy.
You can emulate the behavior of the COLLATE
option for a single session by using the NLSSORT
function with the NLS_SORT
parameter. You can also emulate the behavior of the COLLATE
option for all sessions on the server either explicitly, by using the NLS_SORT
initialization parameter, or implicitly, with the NLS_LANGUAGE
initialization parameter.
You should not perform an ORDER BY
on any LOB
column, nested table, or VARRAY
.
PostgreSQL
PostgreSQL supports the SQL standard, with the exception of the COLLATE
and FETCH PERCENT
options. It also supports a LIMIT OFFSET
subclause similar to MySQL’s, and a USING
subclause. Support for OFFSET FETCH
was introduced in PostgreSQL 13. PostgreSQL’s ORDER BY
syntax is:
ORDER BY {sort_expression [ASC | DESC] [USING operator] [NULLS {FIRST | LAST}]}[, ...] [ OFFSET int {ROW | ROWS} ] [ FETCH {FIRST | NEXT} int {ROW | ROWS} {ONLY | WITH TIES} ] [ LIMIT int OFFSET int ]
where:
USING operator
- Specifies a specific comparison operator. Thus, you may sort by
>
,<
,=
,>=
,<=
, and so forth. Ascending order is the same as specifyingUSING <
, while descending order is the same asUSING >
. LIMIT int
- Specifies the maximum number of records to return.
OFFSET int
- Specifies how many records to skip.
ASC
and DESC
are SQL standards. If not specified, ASC
is the default. PostgreSQL sorts NULL values as higher than any other value by default, causing NULL values to appear at the end of ASC
sorts and at the beginning of DESC
sorts. You can use the NULLS {FIRST | LAST}
clause to change this behavior. For example:
SELECT stor_id, ord_date, qty AS quantity FROM sales ORDER BY stor_id, ord_date DESC, qty ASC NULLS FIRST;
You cannot use both OFFSET FETCH
and LIMIT OFFSET
, as they achieve equivalent goals. Although LIMIT OFFSET
is faster to type, the OFFSET FETCH
construct is SQL standard–compliant and allows for specifying how ties are treated in the result set. For example:
SELECT stor_id, ord_date, qty FROM sales ORDER BY qty FETCH FIRST 3 ROWS WITH TIES; stor_id | ord_date | qty --------+------------------------+----- 6380 | 1994-09-13 00:00:00-04 | 3 6380 | 1994-09-14 00:00:00-04 | 5 7896 | 1993-12-12 00:00:00-05 | 10 7067 | 1994-09-14 00:00:00-04 | 10 8042 | 1994-09-14 00:00:00-04 | 10 (5 rows)
Note that the output has five rows instead of three because the last three are tied for third place.
In SELECT
statements that involve single tables or use JOIN
s, but not set operations (UNION
, for example), you may also order by columns of the table that do not appear in the select_item
list. For example:
SELECT stor_name FROM sales ORDER BY stor_id, qty;
SQL Server
SQL Server supports the SQL standard, except for the FETCH
PERCENT
and NULLS {FIRST | LAST}
options. PERCENT
can be used with TOP
to simulate FETCH
PERCENT
. For example, the following query retrieves the authors’ first names from the authors table in the SQL_Latin1
collation:
SELECT au_fname FROM authors ORDER BY au_fname COLLATE SQL_Latin1_general_cp1_ci_as
By default, SQL Server sorts NULL values higher than all other values.
SQL Server allows a variety of collations that can affect how the result set is evaluated. Thus, under certain collations “SMITH” and “smith” might evaluate and sort differently. You should not use TEXT
, IMAGE
, or NTEXT
columns as sort_expression
s on SQL Server.
OVER Clause
The OVER
clause appears in the SELECT
clause as a function qualifier for window function–based columns and when aggregates are used in a window construct. The WINDOW
clause often accompanies one or more OVER
clauses and appears after the whole SELECT
statement. It is used to name a window specification that is then used by name in the OVER
clause(s).
Platform | Command |
---|---|
MySQL | Supported, with limitations |
Oracle | Supported, with variations |
PostgreSQL | Supported, with variations |
SQL Server | Supported, with limitations |
SQL standard syntax
FUNCTION_NAME
(expr) OVER {window_name | (window_specification)}
window_specification ::= [window_name] [partitioning]
[ordering] [framing]
partitioning ::= PARTITION BY value[, value...]
[COLLATE collation_name]
ordering ::= ORDER [SIBLINGS] BY rule[, rule...]
rule ::= {value | position | alias} [ASC | DESC]
[NULLS {FIRST | LAST}]
framing ::= {ROWS | RANGE | GROUPS} {start | between}
[exclusion]
start ::= {UNBOUNDED PRECEDING | unsigned-integer
PRECEDING | CURRENT ROW}
between ::= BETWEEN bound AND bound
bound ::= {start | UNBOUNDED FOLLOWING | unsigned-integer
FOLLOWING}
exclusion ::= {EXCLUDE CURRENT ROW | EXCLUDE GROUP |
EXCLUDE TIES | EXCLUDE NO OTHERS}
In addition to the OVER
clause, you can have a WINDOW
clause before the ORDER BY
clause and after the WHERE
clause in a SELECT
statement. It is followed by one or more definitions of named windows. The syntax is as follows:
WINDOW {window_name AS (window_specification)[, ...]}
Keywords
OVER {window_name | (window_specification)}
- The
OVER
clause may reference a predefined window (defined in aWINDOW
clause) that can be used across multiple columns or can provide a window specification consisting ofORDER BY
andPARTITION BY
clauses. A window of data consists of a subset of rows relative to the current row. partitioning
- Defines the group of rows that a row is a member of. The
value
s are column names or expressions that include column names. All records with these expressions in common belong to the same partition. If there is no partitioning clause, then all rows belong to the same partition. ordering
- Specifies the order in which rows should be sorted in the window.
framing
Denotes which rows are considered to be in the window (a subset of rows relative to the current row). The syntax of the
framing
subclause is as follows:{ROWS | RANGE} {start | between} [exclusion] start ::= {UNBOUNDED PRECEDING | unsigned-integer PRECEDING | CURRENT ROW} between ::= BETWEEN bound AND bound bound ::= {start | UNBOUNDED FOLLOWING | unsigned-integer FOLLOWING} exclusion ::= {EXCLUDE CURRENT ROW | EXCLUDE GROUP | EXCLUDE TIES | EXCLUDE NO OTHERS}
ASC | DESC
- Specifies that the result set should be returned in either ascending order (
ASC
) or descending order (DESC
) based on thevalue
|position
. NULLS {FIRST | LAST}
- Specifies that the records containing NULLs should appear either first or last in the result set, respectively.
WINDOW {window_name AS (window_specification)[, ... ]}
- Defines one or more named windows, providing a specification for each.
Rules at a glance
Each SELECT
statement may have zero or more OVER
clauses. Here are a few examples. The following query ranks each title across all titles (dr) and then within the group that has the same publisher:
SELECT t.title_id, DENSE_RANK() OVER(ORDER BY price) AS dr, DENSE_RANK() OVER(PARTITION BY pub_id ORDER BY price) AS dr_pub FROM titles AS t ORDER BY title_id;
The next query uses the SUM
aggregate function to produce a running total and an overall total for price. Note that the ORDER BY
clause used in an OVER
clause does not need to be the same as the SELECT ... ORDER BY
, but it does make it easier to debug:
SELECT t.title_id, SUM(price) OVER() AS overall_total, SUM(price) OVER(ORDER BY price) AS running_total FROM titles AS t ORDER BY title_id;
Finally, here’s an example of a query that uses named windows:
SELECT t.title_id, DENSE_RANK() OVER wprice AS dr, SUM(price) OVER(wpub) AS wpub_total FROM titles AS t WINDOW wprice AS (ORDER BY price), wpub AS (PARTITION BY pub_id) ORDER BY title_id;
MySQL
MySQL supports a subset of the SQL standard syntax for the OVER
and window_specification
clauses, as shown here:
FUNCTION_NAME
(expr) OVER {window_name | (window_specification)}
window_specification ::= [window_name] [partitioning]
[ordering] [framing]
partitioning ::= PARTITION BY value[, value...]
ordering ::= ORDER BY rule[, rule...]
rule ::= {value | position | alias} [ASC | DESC] ]
framing ::= {ROWS | RANGE} {start | between} [exclusion]
start ::= {UNBOUNDED PRECEDING | unsigned-integer
PRECEDING | CURRENT ROW}
between ::= BETWEEN bound AND bound
bound ::= {start | UNBOUNDED FOLLOWING | unsigned-integer
FOLLOWING}
It also supports the WINDOW
clause, and has the same options for the window_specification
clause.
Oracle
Oracle fully supports the SQL standard OVER
and window_specification
clauses, except for the COLLATE
clause. Its syntax is as follows:
FUNCTION_NAME
(expr) OVER {window_name | (window_specification)}
window_specification ::= [window_name] [partitioning]
[ordering] [framing]
partitioning ::= PARTITION BY value[, value...]
ordering ::= ORDER [SIBLINGS] BY rule[, rule...]
rule ::= {value | position | alias} [ASC | DESC]
[NULLS {FIRST | LAST}]
framing ::= {ROWS | RANGE | GROUPS} {start | between}
[exclusion]
start ::= {UNBOUNDED PRECEDING | unsigned-integer
PRECEDING | CURRENT ROW}
between ::= BETWEEN bound AND bound
bound ::= {start | UNBOUNDED FOLLOWING |
unsigned-integer FOLLOWING}
exclusion ::= {EXCLUDE CURRENT ROW | EXCLUDE GROUP |
EXCLUDE TIES | EXCLUDE NO OTHERS}
The WINDOW
clause was introduced in Oracle 21c. It allows anything that is also allowed in the window_specification
of the OVER
clause.
PostgreSQL
PostgreSQL supports all the SQL standard syntax for the OVER
and window_specification
clauses, except for the SIBLINGS
keyword and COLLATE
clause. It allows all aggregate functions, including user-defined ones, to be used as window aggregates. Aggregates can be created in nearly any PostgreSQL-supported language, but the PL/pgSQL and SQL built-in languages do not support the creation of window functions (window functions can be created in C, PL/V8, and PL/R).
The PostgreSQL OVER
syntax is as follows:
FUNCTION_NAME
(expr) OVER {window_name | (window_specification)}
window_specification ::= [window_name] [partitioning]
[ordering] [framing]
partitioning ::= PARTITION BY value[, value...]
ordering ::= ORDER BY rule[, rule...]
rule ::= {value | position | alias} [ASC | DESC]
[NULLS {FIRST | LAST}]
framing ::= {ROWS | RANGE | GROUPS} {start | between}
[exclusion]
start ::= {UNBOUNDED PRECEDING | unsigned-integer
PRECEDING | CURRENT ROW}
between ::= BETWEEN bound AND bound
bound ::= {start | UNBOUNDED FOLLOWING |
unsigned-integer FOLLOWING}
exclusion ::= {EXCLUDE CURRENT ROW | EXCLUDE GROUP |
EXCLUDE TIES | EXCLUDE NO OTHERS}
It also supports the WINDOW
clause, with the same options for the window_specification
clause.
SQL Server
SQL Server supports a subset of the SQL standard syntax for the OVER
clause. It does not support named windows or the WINDOW
clause. SQL Server’s OVER
syntax is as follows:
FUNCTION_NAME
(expr) OVER {(window_specification)}
window_specification ::= [partitioning] [ordering] [framing]
partitioning ::= PARTITION BY value[, value...]
[COLLATE collation_name]
ordering ::= ORDER BY rule[, rule...]
rule ::= {value | position | alias} [ASC | DESC]
framing ::= {ROWS | RANGE} {start | between}
start ::= {UNBOUNDED PRECEDING | unsigned-integer
PRECEDING | CURRENT ROW}
between ::= BETWEEN bound AND bound
bound ::= {start | UNBOUNDED FOLLOWING |
unsigned-integer FOLLOWING}
See also
SELECT Statement
The SELECT
statement retrieves rows, columns, and derived values from one or many tables of a database.
Platform | Command |
---|---|
MySQL | Supported, with variations |
Oracle | Supported, with variations |
PostgreSQL | Supported, with variations |
SQL Server | Supported, with variations |
SQL standard syntax
The full syntax of the SELECT
statement is powerful and complex, but it can be broken down into these main clauses:
SELECT [{ALL | DISTINCT}] select_item [AS alias][, ...] FROM [ONLY | OUTER] {table_name [[AS] alias] | view_name [[AS] alias]}[, ...] [ [join_type] JOIN join_condition ] [WHERE search_condition] [ {AND | OR | NOT} search_condition[...] ] group_by_clause order_by_clause
Keywords
[{ALL | DISTINCT}] select_item
Retrieves values that compose the query result set. Each
select_item
may be a literal, an aggregate or scalar function, a mathematical calculation, a parameter or variable, or a subquery, but aselect_item
is most commonly a column from a table or view. A comma must separate each item in a list of such items.The schema or owner name should be prefixed to a column’s name when it’s extracted from a context outside of the current user’s. If another user owns the table, that user must be included in the column reference. For example, if the user jake needed to access data in the schema katie he would use:
SELECT emp_id FROM katie.employee;
You can use the asterisk (* ) shorthand to retrieve all columns in every table or view listed in the
FROM
clause. It’s a good idea to use this shortcut on single-table queries only.ALL
, the default behavior, returns all records that meet the selection criteria.DISTINCT
tells the database to filter out any duplicate records, thus retrieving only one instance of many identical records.AS alias
- Replaces a column heading (when in the
select_item
clause) or a table name or view name (when in theFROM
clause) with a shorter heading or name. This clause is especially useful for replacing cryptic or lengthy names with short, easy-to-understand names or mnemonics, and for when the column contains only derived data, so you don’t end up with a column called something like ORA000189x7/0.02. It is also very useful in self-joins and correlated subqueries where a single query references the same table more than once. When multiple items appear in theselect_item
clause orFROM
clause, make sure to place the commas after theAS
alias
clauses. Also, be careful to always use an alias uniformly once you introduce it into the query. FROM [ONLY | OUTER] {table_name | view_name}[, ... ]
- Lists all of the tables and/or views from which the query retrieves data. Separate table and view names using commas. The
FROM
clause also allows you to assign aliases to long table/or view names or subqueries using theAS
clause. Using shorter aliases instead of longer table or view names simplifies coding. (Of course, this might thwart the DBA’s carefully planned naming conventions, but the alias only lasts for the duration of the query. Refer to “Rules at a glance” for more information on aliases.) AFROM
clause may contain a subquery (refer to “SUBQUERY Substatement” for details). ONLY
- Specifies that only the rows of the named table or view (and no rows in subtables or subviews) will be retrieved in the result set. When using
ONLY
, be sure to enclose thetable_name
orview_name
within parentheses.ONLY
is ignored if the table or view has no subtables or subviews. OUTER
- Specifies that the rows of the named table or view, along with the rows and columns of any and all subtables or subviews, will be retrieved in the result set. Columns of the subtables (or subviews) will be appended to the right, in subtable hierarchy order according to the depth of the subtable. In extensive hierarchies, subtables with common parents are appended in the creation order of their types. When using
OUTER
, be sure to enclose thetable_name
orview_name
within parentheses.OUTER
is ignored if the table or view has no subtables or subviews. [join_type] JOIN join_condition
Joins together the result set of the table shown in the
FROM
clause to another table that shares a meaningful relationship based on a common set of values. These values are usually contained in columns of the same name and data type that appear in both tables being joined. These columns, or possibly a single column from each table, are called the join key or common key. Most—but not all—of the time, the join key is the primary key of one table and a foreign key in the other table. As long as the data in the columns matches, the join can be performed. (Note that joins can also be performed using theWHERE
clause. This technique is sometimes called a theta join.)Refer to “JOIN Subclause” for details of different kinds of joins.
Join conditions are most commonly depicted in the form:
JOIN table_name2 ON table_name1.column1 comparison_operator table_name2.column1 JOIN table_name3 ON table_name1.columnA comparison_operator table_name3.columnA [...]
When the
comparison_operator
is the equals sign (=), a join is said to be an equi-join. However, the comparison operator may be <, >, <=, >=, or even <>.Use the
AND
operator to issue aJOIN
with multiple conditions. You can also use theOR
operator to specify alternative join conditions.If an explicit
join_type
is omitted, anINNER JOIN
is assumed. Note that there are many types of joins, each with its own rules and behaviors. Also be aware that an alternative approach to the join condition, via theUSING
clause, exists:USING (column_name[, ... ])
Acts as an alternative to the
ON
clause. With this clause, instead of describing the conditions of the join, you simply provide one or more column names (separated by commas) that appear in both tables. The database then evaluates the join based on those columns (the column names must be identical in both tables). In the following example, the two queries produce identical results:SELECT emp_id FROM employee LEFT JOIN sales USING (emp_id, region_id); SELECT emp_id FROM employee AS e LEFT JOIN sales AS s ON e.emp_id = s.emp_id AND e.region_id = s.region_id;
WHERE search_condition
Filters unwanted data from the result set of the query, returning only those records that satisfy the search conditions. A poorly written
WHERE
clause can ruin the performance of an otherwise usefulSELECT
statement, so mastering the nuances of theWHERE
clause is of paramount importance. Search conditions are syntactically depicted in the formWHERE [schema.[table_name.]]column operator value
.WHERE
clauses usually compare the values contained in acolumn
of the table. The values of the column are compared using anoperator
of some type (refer to Chapter 2 for more details). For example, a column might equal (=) a givenvalue
, be greater than (>
) a givenvalue
, or beBETWEEN
a range ofvalues
.WHERE
clauses may contain many search conditions concatenated together using theAND
orOR
Boolean operators, and parentheses can be used to impact the order of precedence of the search conditions.WHERE
clauses can also contain subqueries (refer to “WHERE Clause” for details).group_by_clause
- Refer to “GROUP BY Clause” for details.
order_by_clause
- Refer to “ORDER BY Clause” for details.
Rules at a glance
Each clause of the SELECT
statement has a specific use. Thus, it is possible to speak individually of the FROM
clause, the WHERE
clause, the GROUP BY
clause, and so forth. You can get more details and examples of SELECT
statements by looking up the entries for each clause of the statement elsewhere in this chapter. Not every query needs every clause, but at a minimum a query needs a select_item
list.
Because the SELECT
clause is so important and offers so many options, we’ve divided this “Rules at a glance” section into the following detailed subsections:
Aliases and
WHERE
clause joinsThe
JOIN
clauseThe
WHERE
clauseThe
ORDER BY
clause
All of these clauses, as well as the GROUP BY
clause, are discussed at greater length in their respective sections in this chapter.
Aliases and WHERE clause joins
Column names may need to be prefixed with their database, schema, and table names, particularly when the same column name may appear in more than one table in the query. For example, on an Oracle database, both the jobs table and scott’s employee table may contain job_id columns. The following example joins the employee and jobs tables using the WHERE
clause. This is an old-style join syntax:
SELECT scott.employee.emp_id, scott.employee.fname, scott.employee.lname, jobs.job_desc FROM scott.employee, jobs WHERE scott.employee.job_id = jobs.job_id ORDER BY scott.employee.fname, scott.employee.lname
You can also use aliases to write such a query more simply and clearly:
SELECT e.emp_id, e.fname, e.lname, j.job_desc FROM scott.employee AS e, jobs AS j WHERE e.job_id = j.job_id ORDER BY e.fname, e.lname
These two queries illustrate the following important rules about WHERE
clause joins:
Use commas to separate multiple elements in the
select_item
list, tables in theFROM
clause, and items in theorder_expression
.Use the
AS
clause to define aliases.Use aliases consistently throughout the
SELECT
statement once you define them.
In general, you should favor the JOIN
clause (explained next) over the WHERE
clause for describing join expressions. This not only keeps your code cleaner, making it easy to differentiate join conditions from search conditions, but also allows you to avoid the counterintuitive behavior that may result from using the WHERE
clause for outer joins in some implementations.
The JOIN clause
To perform the same query as in the previous example using a SQL standard join, list the first table and the keyword JOIN
, followed by the name of the table to be joined, the keyword ON
, and the join condition that would have been used in the old-style query. The next example shows the preceding query using the SQL standard JOIN
clause:
SELECT e.emp_id, e.fname, e.lname, j.job_desc FROM scott.employee AS e JOIN jobs AS j ON e.job_id = j.job_id ORDER BY e.fname, e.lname;
Alternatively, you could use the USING
clause. Instead of describing the conditions of the join, simply provide one or more column_name
s (separated by commas) that appear in both of the joined tables. The database then evaluates the join based on those columns (the column names must be identical in both tables). In the following example, the two queries (one using the ON
clause and one using the USING
clause) produce identical results:
SELECT emp_id FROM employee LEFT JOIN sales USING (emp_id, region_id); SELECT emp_id FROM employee AS e LEFT JOIN sales AS s ON e.emp_id = s.emp_id AND e.region_id = s.region_id;
Refer to “JOIN Subclause” for details on different types of joins.
The WHERE clause
A poorly written WHERE
clause can ruin an otherwise beautiful SELECT
statement, so it’s important that you master the nuances of the WHERE
clause (discussed in more detail later in this chapter). Here is an example of a typical query with a multipart WHERE
clause:
SELECT a.au_lname, a.au_fname, t2.title, t2.pubdate FROM authors a JOIN titleauthor t1 ON a.au_id = t1.au_id JOIN titles t2 ON t1.title_id = t2.title_id WHERE (t2.type = 'business' OR t2.type = 'popular_comp') AND t2.advance > 5500 ORDER BY t2.title
In examining this query, note that the parentheses impact the order of processing for the search conditions. You can use parentheses to move search conditions up or down in precedence, just like you would in an algebra equation.
On some platforms, the database’s default collation (also known as the sort order) impacts how the WHERE
clause filters results for a query. For example, SQL Server is (by default) dictionary-order and case-insensitive, making no differentiation between “Smith,” “smith,” and “SMITH.” Oracle, however, is dictionary-order and case-sensitive, finding the values “Smith,” “smith,” and “SMITH” to be unequal.
The WHERE
clause offers many more specific capabilities than the preceding example illustrates. The following list references some of the more common capabilities of the WHERE
clause:
NOT
- Inverts a comparison operation using the syntax
WHERE NOT
expression
. Thus, you might useWHERE NOT LIKE ...
orWHERE NOT IN ...
in a query. - Comparison operators
Compares any set of values, using the operations <, >, <>, >=, <=, and =. For example:
WHERE emp_id = '54123'
IS NULL
orIS NOT NULL
conditions- Search for any
NULL
orNOT NULL
values, respectively, using the syntaxWHERE
expression
IS [NOT] NULL
. AND
Merges multiple conditions, returning only those records that meet all conditions. The maximum number of multiple conditions is platform-dependent. For example:
WHERE job_id = '12' AND job_status = 'active'
OR
Merges alternative conditions, returning records that meet any of the conditions. For example:
WHERE job_id = '13' OR job_status = 'active'
LIKE
Tells the query to use a pattern-matching string contained within quotation marks. The wildcard symbols supported by each platform are detailed in their individual sections. All platforms support the percent sign (%) for a wildcard symbol. For example, to find any phone number starting with the 415 area code:
WHERE phone LIKE '415%'
EXISTS
Used only with subqueries,
EXISTS
tests to see whether the subquery data exists. It is typically much faster than aWHERE IN
subquery. For example, the following query finds all authors who are also employees:SELECT au_lname FROM authors WHERE EXISTS (SELECT last_name FROM employees)
BETWEEN
Performs a range check to see whether a value is in between two values (inclusive of those two values). For example:
WHERE ytd_sales BETWEEN 4000 AND 9000.
IN
Performs a test to see whether an expression matches any one value out of a list of values. The list may be literal, as in
WHERE state IN ('or', 'il', 'tn', 'ak')
, or it may be derived using a subquery:WHERE state IN (SELECT state_abbr FROM territories)
SOME | ANY
Functions the same as the
EXISTS
operation, though with slightly different syntax. For example, the following query finds all authors who are also employees:SELECT au_lname FROM authors WHERE au_lname = SOME(SELECT last_name FROM employees)
ALL
Performs a check to see whether all records in the subquery match the evaluation criteria, and returns
TRUE
when the subquery returns zero rows. For example:WHERE city = ALL (SELECT city FROM employees WHERE emp_id = 54123)
Refer to “WHERE Clause” for additional details.
The ORDER BY clause
A result set can be sorted through the ORDER BY
clause, in accordance with the database’s sort order. Each column of the result set may be sorted in either ascending (ASC
) or descending (DESC
) order. (Ascending order is the default.) If no ORDER BY
clause is specified, most implementations return the data either according to the physical order of the data within the table or according to the order of an index utilized by the query. However, when no ORDER BY
clause is specified, there is no guarantee as to the order of the result set. Following is an example of a SELECT
statement with an ORDER BY
clause on SQL Server:
SELECT e.emp_id "Emp ID", e.fname "First", e.lname "Last", j.job_desc "Job Desc" FROM employee e, jobs j WHERE e.job_id = j.job_id AND j.job_desc = 'Acquisitions Manager' ORDER BY e.fname DESC, e.lname ASC
The results are:
Emp ID First Last Job Desc --------- --------------- --------------- -------------------- MIR38834F Margaret Rancé Acquisitions Manager MAS70474F Margaret Smith Acquisitions Manager KJJ92907F Karla Jablonski Acquisitions Manager GHT50241M Gary Thomas Acquisitions Manager
After the result set is pared down to meet the search conditions, it is sorted by the authors’ first names in descending order. Where the authors’ first names are equal, the results are sorted in ascending order by last name. Refer to “ORDER BY Clause” for more details.
You may write an ORDER BY
clause using columns in the table that do not appear in the select_item list. For example, you might query all emp_ids from the employee table, yet ORDER BY
the employees’ first and last names.
Programming tips and gotchas
Once you’ve assigned an alias to a table or view in the FROM
clause, use it exclusively for all other references to that table or view within the query (in the WHERE
clause, for example). Do not mix references to the full table name and the alias within a single query. You should avoid mixed references for a couple of reasons. First, it is simply inconsistent and makes code maintenance more difficult. Second, some database platforms return errors on SELECT
statements containing mixed references. (Refer to “SUBQUERY Substatement” for special instructions on aliasing within a subquery.)
MySQL, PostgreSQL, and SQL Server support certain types of queries that do not need a FROM
clause. Use these types of queries with caution, since the SQL standard requires a FROM
clause. Queries without a FROM
clause must be manually migrated either to the SQL standard form or to a form that also works on the target database. Refer to the entry for each clause to fully investigate the varying degrees of support offered by the different database vendors for the various options of the SELECT
command.
MySQL
MySQL’s implementation of SELECT
includes support for JOIN
(with some variations, as detailed in “JOIN Subclause”), the INTO
clause, the LIMIT
clause, and the PROCEDURE
clause. Its syntax follows:
SELECT [DISTINCT | DISTINCTROW | ALL] [HIGH_PRIORITY] [STRAIGHT_JOIN] [ {SQL_SMALL_RESULT | SQL_BIG_RESULT} ] [SQL_BUFFER_RESULT] [SQL_CALC_FOUND_ROWS] select_item[, ...] [INTO {OUTFILE 'filename' options | DUMPFILE 'filename' | variable[, ...]}] [FROM table_name[, ...] join_clause [WHERE search_condition] group_by_clause order_by_clause [PROCEDURE procedure_name (param[, ...])] [FOR {UPDATE | SHARE} [OF table_name[, ...]] [NOWAIT | SKIP LOCKED] | LOCK IN SHARE MODE]
where:
DISTINCT | DISTINCTROW | ALL
DISTINCTROW
is a synonym forDISTINCT
.ALL
is assumed ifDISTINCT
orDISTINCTROW
is not specified.HIGH_PRIORITY
- Gives the query a higher priority than statements that modify data within the table. This should be used only for special, high-speed queries.
STRAIGHT_JOIN
- Similar to
JOIN
, except that the left table is always read before the right table. This is a join optimization feature. It should be avoided unless you are getting bad query plans with the standardJOIN
syntax. SQL_SMALL_RESULT | SQL_BIG_RESULT
- Tells the optimizer to expect a small or large result set, respectively, for a
GROUP BY
orDISTINCT
clause. MySQL builds a temporary table when a query has aDISTINCT
orGROUP BY
clause, and these optional keywords tell MySQL whether to build a fast temporary table in memory (forSQL_SMALL_RESULT
) or a slower, disk-based temporary table (forSQL_BIG_RESULT
) to process the worktable. SQL_BUFFER_RESULT
- Forces the result set into a temporary table so that MySQL can free table locks earlier and get the result set to the client faster.
SQL_CALC_FOUND_ROWS
- Calculates how many rows are in the result set (regardless of a
LIMIT
clause), which can then be retrieved usingSELECT FOUND_ROWS()
. select_item
- Retrieves the expressions or columns listed. Columns may be listed in the format
[database_name.[table_name.]]column_name
. If the database and/or table names are left out, MySQL assumes the current database and table. INTO {OUTFILE 'filename' options | DUMPFILE 'filename' | variable[, ... ]}
- Writes the result set of the query to a file named
'filename'
on the host filesystem with theOUTFILE
option. The named file must not already exist on the filesystem. TheDUMPFILE
option writes a single continuous line of data without column terminations, line terminations, or escape characters. This option is used mostly forBLOB
files. Specific rules for using this clause are detailed following this list. TheINTO variable
clause allows you to list one or more variables (one for each column returned). If usingINTO variable
, do not also specify a filename. FROM ...
- Indicates the table from which rows will be retrieved. The table may be described as
[database_name.]table_name
. MySQL will treat the query as a join if more than one table appears in theFROM
clause. PROCEDURE procedure_name (param[, ... ])
- Names a procedure that processes the data in the result set. The procedure is an external procedure (usually C++), not an internal database stored procedure.
FOR {UPDATE | SHARE} [OF table_name[, ... ]] [NOWAIT | SKIP LOCKED] | LOCK IN SHARE MODE
- Issues a write lock on the rows returned by the query (
UPDATE
) for its exclusive use (provided the table is of InnoDB or BDB type), or issues read locks on the rows returned by the query (SHARE
andLOCK IN SHARE MODE
), such that other users may see the rows but may not modify them.NOWAIT
andSKIP LOCKED
cause aFOR UPDATE
orFOR SHARE
query to execute immediately.NOWAIT
returns an error if a row lock cannot be obtained, whileSKIP LOCKED
excludes rows from the result set that are locked by another transaction.
Keep a couple of rules in mind when using the INTO
clause. First, the output file cannot already exist, since overwrite functionality is not supported. Second, any file created by the query will be readable by everyone that can connect to the server. (When using SELECT ... INTO OUTFILE
, you can then turn around and use the MySQL command LOAD DATA INFILE
to quickly load the data.)
You can use the following options to better control the content of the output file when using SELECT ... INTO OUTFILE
:
ESCAPED BY
FIELDS TERMINATED BY
LINES TERMINATED BY
OPTIONALLY ENCLOSED BY
The following example illustrates the use of these optional commands via a MySQL query that returns a result set in a comma-delimited output file:
SELECT job_id, emp_id, lname+fname INTO OUTFILE "/tmp/employees.text" FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY "\;n" FROM employee;
MySQL also allows SELECT
statements without a FROM
clause when performing simple arithmetic. For example, the following queries are valid in MySQL:
SELECT 2 + 2; SELECT 565 - 200; SELECT (365 * 2) * 52;
For interoperability with Oracle, MySQL also supports selection from the pseudo-table called dual:
SELECT 565 - 200 FROM DUAL;
MySQL offers an interesting alternative to the SQL standard for querying tables—the HANDLER
statement. The HANDLER
statement works a lot like SELECT
, except that HANDLER
provides very rapid data reads that circumvent the SQL query engine in MySQL. However, since the HANDLER
statement is not a SQL statement, we’ll refer you to the MySQL documentation for more information.
Oracle
Oracle allows a very large number of extensions to the SQL standard SELECT
statement. For example, since both nested tables and partitioned tables are allowed in Oracle (see “CREATE/ALTER TABLE Statement”), the SELECT
statement allows queries to those types of structures. Oracle’s SELECT
syntax is as follows:
SELECT ( {[ALL | DISTINCT]} | [UNIQUE] ) [optimizer_hints] select_item [AS alias][, ...] [INTO {variable[, ...] | record}] FROM {[ONLY] {[schema.][table_name | view_name | materialized_view_name]} [@database_link] [AS [OF] {SCN |TIMESTAMP
} expression] | subquery [WITH {READ ONLY | CHECK OPTION [CONSTRAINT constraint_name]}] | [[VERSIONS BETWEEN {SCN |TIMESTAMP
} {exp | MINVALUE} AND {exp | MAXVALUE}] AS OF {SCN |TIMESTAMP
} expression] | TABLE (nested_table_column) [(+)] {[PARTITION (partition_name) | SUBPARTITION (subpartition_name)]} [SAMPLE [BLOCK] [sample_percentage] [SEED (seed_value)]]} [AS alias][, ...] join_clause [WHERE search_condition [{AND | OR} search_condition[, ...]] [[START WITH value] CONNECT BY [PRIOR] condition]] group_by_clause [MODEL model_clause] order_by_clause [FOR UPDATE [OF [schema.][table.]column][, ...] {[NOWAIT | WAIT (int)]}]
Unless otherwise noted, the clauses shown here follow the SQL standard. Similarly, elements of the clauses are identical to those in the SQL standard unless otherwise noted. For example, Oracle’s GROUP BY
clause is nearly identical to the SQL standard, including its component elements, such as ROLLUP
, CUBE
, GROUPING SETS
, concatenated GROUPING SETS
, and the HAVING
clause.
The parameters are:
{ALL | DISTINCT} | UNIQUE
UNIQUE
is a synonym forDISTINCT
. In Oracle,DISTINCT
andUNIQUE
cannot be used onLOB
columns.optimizer_hints
- Overrides the default behavior of the query optimizer with user-specified behaviors. For example, hints can force Oracle to use an index that it might not otherwise use or to avoid an index that it might otherwise use. Refer to the vendor documentation for more information about optimizer hints.
select_item
- Retrieves the expressions or columns listed. Columns can be from a named query, table, view, or materialized view and can be listed in the format
[schema.[table_name.]]column_name
. If you omit the schema, Oracle assumes the context of the current schema. Oracle also allows for named queries that may be referenced much like nested table subqueries (discussed in “SUBQUERY Substatement”); it refers to using named queries as subquery factoring. In addition to named queries, Oracle supports subqueries and the asterisk (*), shorthand for all columns, in theselect_item
list. INTO {variable[, ... ] | record}
- Retrieves the result set values into PL/SQL variables or into a PL/SQL record.
FROM [ONLY]
- Identifies the table, view, materialized view, partition, or subquery from which the result set is retrieved. The
ONLY
keyword is optional and applies only to views belonging to a hierarchy. UseONLY
when you want to retrieve records from a named view only, and not from any of its subviews. AS [OF] {SCN | TIMESTAMP} expression
- Implements SQL-driven flashback, whereby system change numbers (SCNs) or timestamps are applied to each object in the
select_item
list. Records retrieved by the query are only those that existed at the specified SCN or time. (This feature can also be implemented at the session level using theDBMS_FLASHBACK
built-in package.)SCN expression
must equal a number, whileTIMESTAMP expression
must equal a timestamp value. Flashback queries cannot be used on linked servers. subquery [WITH {READ ONLY | CHECK OPTION [CONSTRAINT constraint_name]}]
- Mentioned separately because Oracle allows you extra ways to control a subquery.
WITH READ ONLY
indicates that the target of the subquery cannot be updated.WITH CHECK OPTION
indicates that any update to the target of the subquery must produce rows that would be included in the subquery.WITH CONSTRAINT
creates aCHECK OPTION
constraint ofconstraint_name
on the table. Note thatWITH CHECK OPTION
andWITH CONSTRAINT
are usually used inINSERT ... SELECT
statements. [VERSIONS BETWEEN {SCN | TIMESTAMP} {exp | MINVALUE} AND {exp | MAXVALUE}] AS OF {SCN | TIMESTAMP} expression
- Specifies a special kind of query to retrieve the history of changes made to data from a table, view, or materialized view. The
VERSIONS_XID
pseudocolumn shows the identifier corresponding to the transaction that made the change. This kind of query is referred to as a flashback query; it requires that you specify anSCN
orTIMESTAMP
value for each object in theselect_item
list. (You can implement SQL-driven session-level flashback using the OracleDBMS_FLASHBACK
package.)
The optional subclause VERSIONS BETWEEN
is used to retrieve multiple versions of the data specified, either using an upper and lower boundary of an SCN
(a number) or TIMESTAMP
(a timestamp value), or using the MINVALUE
and MAXVALUE
keywords. Without this clause, only one past version of the data is returned. (Oracle also provides several version query pseudocolumns for additional versioning information.)
The AS OF
clause, discussed earlier in this list, determines the SCN or moment in time from which the database issues the query when used with the VERSIONS
clause.
You cannot use flashback queries with the VERSIONS
clause against temporary tables, external tables, tables in a cluster, or views.
TABLE
- Required when querying a hierarchically declared nested table.
PARTITION | SUBPARTITION
- Restricts a query to the specified partition or subpartition of the table. Rows are retrieved only from the named partition or subpartition, not from the entire table, reducing I/O.
SAMPLE [BLOCK] [sampling_percentage] [SEED (seed_value)]
- Tells Oracle to select records from a random sampling of rows within the result set, as a percentage of rows or blocks, rather than from the entire table.
BLOCK
tells Oracle to use block sampling rather than row sampling. Thesampling_percentage
, telling Oracle the total block or row percentage to be included in the sample, may be anywhere between .000001 and 99. The optionalSEED
clause is used to provide limited repeatability. If you specify a seed value, Oracle will attempt to return the same sample from one execution of the query to the next. The seed value can be between 0 and 4,294,967,295. WhenSEED
is omitted, the resulting sample will change from one execution of the query to the next. Sampling may be used only on single-table queries. join_clause
- Merges the result sets of two or more tables in a single query. See the description following this list for more information.
WHERE ... [[START WITH value] CONNECT BY [PRIOR] condition]
Filters records returned in the result set. Oracle allows the use of hierarchical information within tables, whose filtering can be controlled with the
START WITH
clause.START WITH
identifies the rows that will serve as the parent rows in the result set.CONNECT BY
identifies the relationshipcondition
between the parent rows and their child rows. ThePRIOR
keyword is used to identify the parent rows instead of the child rows.Hierarchical queries use the
LEVEL
pseudocolumn to identify (1) the root node, (2) the child nodes, (3) the grandchild nodes, and so forth. Other pseudocolumns available in hierarchical queries areCONNECT_BY_ISCYCLE
andCONNECT_BY_ISLEAF
. Hierarchical queries are mutually exclusive of theORDER BY
andGROUP BY
clauses. Do not use those clauses in a query containingSTART WITH
orCONNECT BY
. You can order records from siblings of the same parent table by using theORDER SIBLINGS BY
clause.MODEL model_clause
- Allows you to create a multidimensional array from query results and then apply formulas (called rules) to this array to calculate new values. See “The MODEL clause” for details.
FOR UPDATE [OF [schema.][table.]column[, ... ] {[NOWAIT | WAIT (int)]}
- Locks the rows of the result set so that other users cannot lock or update them until you’re finished with your transaction.
FOR UPDATE
cannot be used in a subquery, in queries usingDISTINCT
orGROUP BY
, or in queries with set operators or aggregate functions. Child rows in a hierarchical table are not locked when this clause is issued against the parent rows. TheOF
keyword is used to lock only the selected table or view. Otherwise, Oracle locks all the tables or views referenced in theFROM
clause. When usingOF
, the columns are not significant, though real column names (not aliases) must be used. TheNOWAIT
andWAIT
keywords tell Oracle either to return control immediately if a lock already exists or to waitint
seconds before returning control to you, respectively. If neitherNOWAIT
norWAIT
is specified, Oracle waits until the rows become available.
Unlike some other database platforms, Oracle does not allow a SELECT
statement without a FROM
clause. The following query, for example, is invalid:
SELECT 2 + 2;
As a workaround, Oracle has provided a special-purpose table called DUAL. Any time you want to write a query that does not retrieve data from a user-created table, such as to perform a calculation, use FROM DUAL
. Both of the following queries are valid:
SELECT 2 + 2 FROM DUAL; SELECT (((52-4) * 5) * 8) FROM DUAL;
Oracle’s implementation of SELECT
is quite straightforward if you want to retrieve data from a table. As mentioned previously, Oracle allows the use of named queries. A named query is, in a sense, an alias to an entire query that can save you time when you’re writing a complex multi-subquery SELECT
statement. For example:
WITH pub_costs AS (SELECT pub_id, SUM(job_lvl) dept_total FROM employees e GROUP BY pub_id), avg_costs AS (SELECT SUM(dept_total)/COUNT(*) avg FROM employee) SELECT * FROM pub_costs WHERE dept_total > (SELECT avg FROM avg_cost) ORDER BY department_name;
Here we create two named subqueries—pub_costs and avg_costs—which are later referenced in the main query. The named queries are effectively the same as subqueries; however, subqueries must be written out in their entirety each time they’re used, while named queries need not be.
Oracle allows you to select rows from a single partition of a partitioned table using the PARTITION
clause, or to retrieve only a statistical sampling of the rows (as a percentage of rows or blocks) of a result set using SAMPLE
. For example:
SELECT * FROM sales PARTITION (sales_2021_q3) sales WHERE sales.qty > 1000; SELECT * FROM sales SAMPLE (12);
Flashback queries are a feature of Oracle that enable retrieval of point-in-time result sets. For example, you could find out what everyone’s salary was yesterday before a big change was applied to the database:
SELECT job_lvl, lname, fname FROM employee AS OFTIMESTAMP
(SYSTIMESTAMP -INTERVAL
'1' DAY);
Another interesting Oracle extension of the standard query format is the hierarchical query. Hierarchical queries return the results of queries against hierarchically designed tables in the order you define. For example, the following query returns the names of the employees and their positions in the hierarchy (represented by the position in the org_char column), employee IDs, manager IDs, and job IDs:
-- Query SELECT LPAD(' ',2*(LEVEL-1)) || lname AS org_chart, emp_id, mgr_id, job_id FROM employee START WITH job_id = 'Chief Executive Officer' CONNECT BY PRIOR emp_id = mgr_id; -- Results ORG_CHART EMPLOYEE_ID MANAGER_ID JOB_ID -------------- ----------- ---------- ------------------------ Cramer 101 100 Chief Executive Officer Devon 108 101 Business Operations Mgr Thomas 109 108 Acquisitions Manager Koskitalo 110 108 Productions Manager Tonini 111 108 Operations Manager Whalen 200 101 Admin Assistant Chang 203 101 Chief Financial Officer Gietz 206 203 Comptroller Buchanan 102 101 VP Sales Callahan 103 102 Marketing Manager
In the previous query, the CONNECT BY
clause defines the hierarchical relationship of the emp_id value as the parent row equal to the mgr_id value in the child row, while the START WITH
clause specifies where in the hierarchy the result set should begin.
Oracle supports the following types of JOIN
syntax (refer to “JOIN Subclause” for more details):
FROM table1 { CROSS JOIN table2 | INNER JOIN table2 [ {ON join_condition | USING (column_list)} ] | NATURAL [LEFT [OUTER]] JOIN table2 | LEFT [OUTER] JOIN table2 [ {ON join_condition | USING (column_list)} ] | RIGHT [OUTER] JOIN table2 [ {ON join_condition | USING (column_list)} ]| NATURAL [RIGHT [OUTER]] JOIN table2 FULL [OUTER] JOIN table2 }
[CROSS] JOIN
- Retrieves all records of both
table1
andtable2
. This is syntactically the same asFROM table1, table2
with no join conditions in theWHERE
clause. INNER JOIN
- Retrieves those records of both
table1
andtable2
where there are matching values in both tables according to the join condition. Note that the syntaxFROM
table1
,table2
with join conditions in theWHERE
clause is semantically equivalent to an inner join. NATURAL
- Shortcuts the need to declare a join condition by assuming a
USING
clause containing all columns that are in common between the two joined tables. (Be careful if columns have the same names but not the same data types or the same sort of values!)LOB
columns cannot be referenced in a natural join. Referencing aLOB
or collection column in aNATURAL JOIN
clause will return an error. LEFT [OUTER] JOIN
Retrieves all records in the leftmost table (i.e.,
table1
) and matching records in the rightmost table (i.e.,table2
). If there isn’t a matching record intable2
, NULL values are substituted for that table’s columns. You can use this type of join to retrieve all the records in a table, even when there are no counterparts in the joined table. For example:SELECT j.job_id, e.lname FROM jobs j LEFT OUTER JOIN employee e ON j.job_id = e.job_id ORDER BY d.job_id
RIGHT [OUTER] JOIN
- Retrieves all records in the rightmost table, regardless of whether there is a matching record in the leftmost table. A right join is the same as a left join, except that the optional table is on the left.
FULL [OUTER] JOIN
- Specifies that all rows from both tables be returned, regardless of whether a row from one table matches a row in the other table. Any columns that have no value in the corresponding joined table are assigned a NULL value.
ON join_condition
- Declares the condition(s) that join the result sets of two tables together. This takes the form of declaring the columns in
table1
andtable2
that must match the join condition. When multiple columns must be compared, use theAND
clause. USING (column_list)
Acts as an alternative to the
ON
clause. Instead of describing the conditions of the join, simply provide a column name (or columns separated by commas) that appears in both tables. The column name(s) must be identical in both tables and cannot be prefixed with a table name or alias.USING
cannot be used onLOB
columns of any type. The following two queries produce identical results. One is written with aUSING
clause and the other specifies join conditions using SQL standard syntax:SELECT column1 FROM foo LEFT JOIN poo USING (column1, column2); SELECT column1 FROM foo LEFT JOIN poo ON foo.column1 = poo.column1 AND foo.column2 = poo.column2;
Partitioned outer joins
Oracle supports the PARTITION BY
subclause of the SQL standard’s JOIN
clause, which defines a special kind of query called a partitioned outer join that extends the conventional outer join syntax by applying a right or left outer join to a partition of one or more rows. This is especially useful for querying sparse data along a particular dimension of data, thereby returning rows that otherwise would be omitted from the result set. The PARTITION BY
clause can be used on either side of an outer join, resulting in a UNION
of the outer joins of each of the partitions in the partitioned result set and the table on the other side of the join. (When this clause is omitted, Oracle treats the entire result set as a single partition.) PARTITION BY
is not allowed with a FULL OUTER JOIN
.
For example, our product table keeps track of all products we produce, while the manufacturing table shows when we produce them. Since we’re not continuously making every product at all times, the joined data between the two tables may be sparse at times. Thus, the following query:
SELECT manufacturing.time_id AS time, product_name AS name, quantity AS qty FROM product PARTITION BY (product_name) RIGHT OUTER JOIN times ON (manufacturing.time_id = product.time_id) WHERE manufacturing.time_id BETWEEN TO_DATE('01/10/05', 'DD/MM/YY') AND TO_DATE('06/10/05', 'DD/MM/YY') ORDER BY 2, 1;
returns this result:
time name qty --------- ---------- ---------- 01-OCT-05 flux capacitor 10 02-OCT-05 flux capacitor 03-OCT-05 flux capacitor 04-OCT-05 flux capacitor 05-OCT-05 flux capacitor 06-OCT-05 flux capacitor 10 06-OCT-05 flux capacitor 8 01-OCT-05 transmogrifier 10 01-OCT-05 transmogrifier 15 02-OCT-05 transmogrifier 03-OCT-05 transmogrifier 04-OCT-05 transmogrifier 10 04-OCT-05 transmogrifier 11 05-OCT-05 transmogrifier 06-OCT-05 transmogrifier
Flashback queries
Oracle also supports flashback queries, which keep track of previous values of the results returned for a SELECT
statement. In the following set of example code, we’ll issue a regular query on a table, change the values in the table with an UPDATE
statement, and then query the flashback version of the data. First, the regular query:
SELECT salary FROM employees WHERE last_name = 'McCreary';
The results are:
SALARY ---------- 3800
Now, we’ll change the value in the employees table and query the table to confirm the current value:
UPDATE employees SET salary = 4000 WHERE last_name = 'McCreary '; SELECT salary FROM employees WHERE last_name = 'McCreary ';
The results are:
SALARY ---------- 4000
Finally, we’ll perform a flashback query to see what the salary value was in the past:
SELECT salary FROM employees AS OFTIMESTAMP
(SYSTIMESTAMP -INTERVAL
'1' DAY) WHERE last_name = 'McCreary';
The results are:
SALARY ---------- 3800
If we wanted to be more elaborate, we could find out all of the values of salary for a given time period, say, the last two days:
SELECT salary FROM employees VERSIONS BETWEENTIMESTAMP
SYSTIMESTAMP -INTERVAL
'1' MINUTE AND SYSTIMESTAMP -INTERVAL
'2' DAY WHERE last_name = 'McCreary';
The results are:
SALARY ---------- 4000 3800
The MODEL clause
Oracle supports a MODEL
clause that enables spreadsheet-like result sets to be returned from a SELECT
statement. The MODEL
clause is designed to alleviate the need for developers to extract data from the database and put it into a spreadsheet, like Microsoft Excel, for further manipulation. It creates a multidimensional array in which cells can be referenced by dimension values. For instance, you might dimension an array on product and time, specifying column values that you wish to access via combinations of those two dimensions. You can then write rules that are similar in concept to spreadsheet formulas, that are executed in order to change values in your model, or that create new values, and perhaps even new rows, in your model.
Syntactically, the MODEL
clause appears after the GROUP BY
clause and before the ORDER BY
clause. The earlier syntax diagram for Oracle’s SELECT
statement shows the position of the clause, and the syntax details are presented here:
MODEL [{IGNORE | KEEP} NAV] [UNIQUE {DIMENSION | SINGLE REFERENCE}] [ RETURN {UPDATED | ALL} ROWS ] [REFERENCE reference_model_name ON (subquery) [PARTITION BY (column [AS alias][, ...])] DIMENSION BY (column [AS alias][, ...]) MEASURES (column [AS alias][, ...]) [{IGNORE | KEEP} NAV] [UNIQUE {DIMENSION | SINGLE REFERENCE}]] [MAIN main_model_name] [PARTITION BY (column [AS alias][, ...])] DIMENSION BY (column [AS alias][, ...]) MEASURES (column [AS alias][, ...]) [{IGNORE | KEEP} NAV] [UNIQUE {DIMENSION | SINGLE REFERENCE}]] model_rules_clause [RULES [UPSERT [ALL] | UPDATE] [{AUTOMATIC | SEQUENTIAL} ORDER]] [ITERATE (int) [UNTIL (ending_condition)]] ( [ {UPSERT [ALL] | UPDATE } ] measure [...] [FOR { dimension | ( dimension[, ...] ) } { [IN ({subquery | literal[, ...]})] | [LIKE pattern] FROM start_literal TO end_literal {INCREMENT | DECREMENT} diff_literal }[, ...] [ORDER [SIBLINGS] BY (order_column [ASC | DESC] [NULLS FIRST | NULLS LAST][, ...])]] = expr[, ...] )
The parameters of the MODEL
clause are as follows:
{IGNORE | KEEP} NAV
- Specifies whether NULL or absent values (
NAV
) are retained as NULLs (KEEP
), or whether they are replaced with suitable defaults (IGNORE
): zero for numeric types, 1-Jan-2000 for date types, an empty string for character types, and NULL for anything else. UNIQUE {DIMENSION | SINGLE REFERENCE}
- Specifies the scope within which the database ensures that a given cell reference points to a unique data value. Use
DIMENSION
to require that each possible cell reference, whether on the left or right side of a rule, represents a single value. UseSINGLE REFERENCE
to perform that check only for those cell references that appear on the righthand side of a rule. RETURN {UPDATED | ALL} ROWS
- Specifies whether all rows are returned from model processing, or whether only updated rows are returned.
reference_model_name ON (subquery)
- Specifies the name and rowsource for a reference model. This is a model on which you cannot perform calculations, but which contains values that you can reference from within your main query.
PARTITION BY (column[, ... ])
- Splits a model into independent partitions based on the columns given. You cannot partition reference models.
DIMENSION BY (column[, ... ])
- Specifies the dimensions for a model. Values from these columns represent the set of index values that are used to identify cells in the multidimensional addressing space.
MEASURES (column[, ... ])
- Specifies the values associated with each unique combination of dimensions (e.g., with each cell of the model).
alias
- Specifies an alias for a column.
MAIN model_name
- Begins the definition of the main model, and also gives that model a name. The main model represents the model on which you perform work. Rows from your containing
SELECT
feed into this model, rules are applied, and the resulting rows are returned. RULES [UPSERT [ALL] | UPDATE]
- Specifies whether rules may both create new cells and update existing cells (
UPSERT
), or whether they must only update existing cells (UPDATE
). If you want your model to be able to create new rows in your result set, specifyUPSERT
. The default isUPSERT
. You can also control this behavior on a rule-by-rule basis; seerule
in the syntax. {AUTOMATIC | SEQUENTIAL} ORDER
- Specifies whether the optimizer determines the order in which rules are evaluated (
AUTOMATIC
), or whether rules are evaluated in the order in which you list them (SEQUENTIAL
). The default isSEQUENTIAL
. ITERATE (int)
- Requests that entire set of rules be evaluated repeatedly,
int
times. The default is to evaluate the set of rules just once. UNTIL(ending_condition)
- Specifies a condition that, when met, causes iteration to end. You must still specify an
int
, which serves as a safeguard against infinite loops. measure[ ... ]
- A reference to one of the measures listed in the
MEASURES
clause. When you reference a measure, the square brackets are part of the syntax. You must specify all dimensions, either via a subquery or by listing them, and the specific value of the measure associated with those dimensions will be returned, or referenced. FOR ...
- A
FOR
loop iterating over one or many dimensions. The multi-iteratingFOR
loop is much like a subquery where each row of the result set represents a specific combination of dimensions. { dimension | ( dimension[, ... ]) }
- A list of values, whether from columns or expressions, that collectively identify a unique cell in the model.
IN ({subquery | literal[, ... ]})
- The source of values for a
FOR
loop may be a subquery, or it may be a specific list of literal values. LIKE pattern
- Allows you to insert dimension values into a pattern. Use a percent-sign to mark the location at which you want dimension values to be inserted. For example, use
FOR x LIKE 'A%B' FROM 1 TO 3 INCREMENT 1
to generate values such as'A1B', 'A2B', 'A3B'
. FROM start_literal TO end_literal {INCREMENT | DECREMENT} diff_literal
- Defines the starting and ending
FOR
loop values, and also the difference between each subsequent value as the loop iterates from start to end. ORDER [SIBLINGS] BY (order_column [ASC | DESC] [NULLS FIRST | NULLS LAST][, ... ])
- Imposes an order of evaluation with respect to the cells referenced from the left side of a rule. Use this clause if you want a rule to be applied to cells in order. Otherwise, you have no guarantee as to the order in which the rule is applied to the cells that it affects. You can order records from siblings of the same parent table by using the
ORDER SIBLINGS BY
clause.
Following is a list of functions that have been designed specifically for use in the MODEL
clause:
CV() or CV(dimension_column)
- Returns the current value of a dimension column. May be used only on the righthand side of an expression in a rule. When the
CV()
form is used, the dimension column is determined implicitly based on the function call’s position in a list of dimension values. PRESENTNNV(measure[dimension[, ... ], not_null, was_null)
- Returns either
not_null
orwas_null
, depending on whether the specified measure was NULL when model processing began. This function may be used only from the righthand side of a rule expression. PRESENTV(measure[dimension[, ... ], did_exist, didnt_exist)
- Returns either
did_exist
ordidnt_exist
, depending on whether the specified measure existed when model processing began. This function may be used only from the righthand side of a rule expression. Be aware that whether a measure existed is a completely separate question from whether that measure was NULL. ITERATION_NUMBER
- Returns 0 on the first iteration through the rules, 1 on the second iteration, and so forth. This is useful when you want to base rule calculations on the number of iterations.
The following example demonstrates that the MODEL
clause gives a normal SELECT
statement the ability to construct a multidimensional array as a result set and calculate inter-row and inter-array values interdependently. The newly calculated values are returned as part of the SELECT
statement’s result set:
SELECT SUBSTR(region,1,20) country, SUBSTR(product,1,15) product, year, sales FROM sales_view WHERE region IN ('USA','UK') MODEL RETURN UPDATED ROWS PARTITION BY (region) DIMENSION BY (product, year) MEASURES (sale sales) RULES ( sales['Bounce',2006] = sales['Bounce',2005] + sales['Bounce',2004], sales['Y Box', 2006] = sales['Y Box', 2005], sales['2_Products',2006] = sales['Bounce',2006] + sales['Y Box',2006] ) ORDER BY region, product, year;
In this example, a query against the sales_view materialized view returns the sum of sales over the course of a few years for the regions 'USA'
and 'UK'
. The MODEL
clause then falls between the WHERE
clause and the ORDER BY
clause. Since sales_view currently holds data for the years 2004 and 2005, we provide it with rules to calculate figures for the year 2006.
The subclause RETURN UPDATED ROWS
limits the result set to the rows that were created or updated by the query. Next, the example defines the logical divisions of the data using data elements from the materialized view and using the PARTITION BY
, DIMENSION BY
, and MEASURES
subclauses. The RULES
subclause then references individual measures of the model by referring to combinations of different dimension values, much like a spreadsheet macro references worksheet cells with specific lookups and references to ranges of values.
Oracle (and SQL Server, using a somewhat different technique) supports a non-SQL standard query type known as a pivot query. Although you should refer to the vendor documentation for exactly how to write a pivot (or unpivot) query, an example here will help you take advantage of this useful technique. A pivot query turns the result set on its side, enabling you to extract more value from the data. In Oracle, you must first create your pivot table. By using a pivot table, you can then turn the result “on its side” so that the order_type column becomes the column headings:
CREATE TABLE pivot_table AS SELECT * FROM (SELECT year, order_type, amt FROM sales) PIVOT SUM(amt) FOR order_type IN ('retail', 'web'); SELECT * FROM pivot_table ORDER BY YEAR;
YEAR RETAIL WEB ---- ----------- ------ 2004 7014.54 2005 9745.12 2006 16717.88 10056.6 2007 28833.34 39334.9 2008 66165.77 127109.4
PostgreSQL
PostgreSQL supports a straightforward implementation of the SELECT
statement. It supports JOIN
and subquery applications. PostgreSQL also allows the creation of new temporary or permanent tables using the SELECT ... INTO
syntax or the CREATE TABLE AS SELECT
construct. Its SELECT
syntax is as follows:
SELECT [ALL | DISTINCT [ON (select_item[, ...])]] [AS alias [(alias_list)]][, ...] [INTO [LOGGED | UNLOGGED] [[TEMP]ORARY] [TABLE] new_table] [FROM [ONLY] table1[.*] [AS alias][, ...]] [[join_type] JOIN table2 {[ON join_condition] | [USING (column_list)]}] [WHERE search_condition] [group_by_clause] [order_by_clause] [for_update_clause] for_update_clause ::= [FOR {UPDATE | NO KEY UPDATE | SHARE | KEY SHARE} [OF table_name[, ...]] [NOWAIT | SKIP LOCKED] [...]]
where:
ALL | DISTINCT [ON (select_item[, ... ]]
- Supports the
ALL
andDISTINCT
keywords of the SQL standard, whereALL
(the default) returns all rows (including duplicates) andDISTINCT
eliminates duplicate rows. In addition,DISTINCT ON
eliminates duplicates on only the specifiedselect_item
s, not on all of theselect_item
s in the query (example follows). select_item
- Includes the standard elements of a
select_item
list supported by the SQL standard. In addition to the asterisk (*
) shorthand to retrieve all rows, you can usetable_name.*
to retrieve all rows from an individual table. AS alias [(alias_list)]
- Creates an alias or a list of aliases for one or more columns (or tables in the
FROM
clause).AS
is required forselect_item
aliases, but not forFROM
table aliases. (Some other database platforms treat theAS
as an option when declaring an alias.) INTO [UNLOGGED | LOGGED] [[TEMP]ORARY] [TABLE] new_table
- Creates a new table from the result set of the query. Both
TEMP
andTEMPORARY
are acceptable usages to create a temporary table that is automatically dropped at the end of the session. Otherwise, the command creates a permanent table. Permanent tables created with this statement must have new, unique names, but temporary tables may have the same name as an existing table. If you create a temporary table with the same name as an existing permanent table, the temporary table is used to resolve all operations against that table name while in the same session as the one that created it. Other sessions will continue to see the existing permanent table. TheUNLOGGED
table creates a table where only the creation of the table structure is written to the transaction logs. Creating unlogged tables is generally faster than logged ones and could be as much as 5 times faster. However, becauseUNLOGGED
table data writing is not logged, data inserted can not be replicated. They are also truncated during database restarts or crashes. That said, you should only use unlogged tables for data you do not need to read on replicas and data that can be easier recreated. When the logging option is not specified, aLOGGED
table is created. FROM [ONLY]table1[, ... ]
Specifies one or more source tables where the data resides. (Be sure to specify a join condition or a theta
WHERE
clause so that you don’t get a Cartesian product of all records in all tables.) PostgreSQL allows inheritance in child tables of declared parent tables. The ONLY keyword is not supported for partitioned tables because the parent never has data. Use theONLY
keyword to suppress rows from the child tables of your source table. (You can turn off this default inheritance globally with the commandSET SQL_Inheritance TO OFF
.) PostgreSQL also supports nested table subqueries (see the section onSUBQUERY
later in this chapter). TheFROM
clause is not needed when used for computation:SELECT 8 * 40;
PostgreSQL will also include an implicit
FROM
onSELECT
statements that include schema-identified columns. For example, the following query is acceptable (though not recommended):SELECT sales.stor_id WHERE sales.stor_id = '6380';
for_update_clause
If FOR UPDATE
, FOR NO KEY UPDATE
, FOR SHARE
, or FOR KEY SHARE
is specified, the SELECT
statement locks the selected rows against concurrent updates. SKIP LOCKS
allows records already locked to not be updated. This is suitable if you are updating records in batches and can always revisit records that haven’t been updated in a secondary batch.
PostgreSQL supports a handy variation of the DISTINCT
clause, DISTINCT ON (select_item[, ... ] )
. This variation allows you to pick and choose the exact columns that are considered for elimination of duplicates. PostgreSQL chooses the result set in a manner much like it does for ORDER BY
. You should include an ORDER BY
clause so that there’s no unpredictability as to which record is returned. For example, this query retrieves the most recent sales report for each store based on the most recent order date:
SELECT DISTINCT ON (stor_id), ord_date, qty FROM sales ORDER BY stor_id, ord_date DESC;
However, there would be no way to predict what single record would have been returned without the ORDER BY
clause.
PostgreSQL also allows retrieving a whole row as a column, as follows:
SELECT DISTINCT ON (stor_id) stor_id, s AS sale_row FROM sales AS s ORDER BY stor_id, ord_date DESC;
This query retrieves the most recent sales report for each store based on the most recent order date, but instead of returning individual columns, it returns the whole row as a column value. This is done simply by specifying the name of the table or table alias. If a table alias is specified, then the table name cannot be used; you must use the alias. The output of the preceding query looks like this:
6380 (6380,6871,"1994-09-14 00:00:00-04",5,"Net 60",BU1032) 7066 (7066,QA7442.3,"1994-09-13 00:00:00-04",75,"ON invoice",PS2091) 7067 (7067,D4482,"1994-09-14 00:00:00-04",10,"Net 60",PS2091) 7131 (7131,N914008,"1994-09-14 00:00:00-04",20,"Net 30",PS2091) 7896 (7896,TQ456,"1993-12-12 00:00:00-05",10,"Net 60",MC2222) 8042 (8042,423LL922,"1994-09-14 00:00:00-04",15,"ON invoice",MC3021)
This feature is particularly useful for outputting data to applications by combining it with a function such as jsonb_agg or json_agg, as follows, and using ORDER BY
aggregation syntax (a feature supported for all PostgreSQL aggregate functions, including user-defined ones):
SELECT json_agg(s ORDER BY stor_id, ord_date) AS sale_rows FROM sales AS s;
Refer to “JOIN Subclause” for information on the supported join types.
SQL Server
SQL Server supports most of the basic elements of the SQL standard SELECT
statement, including all of the various join types. It also offers several variations on the SELECT
statement, including optimizer hints, the INTO
clause, the TOP
clause, GROUP BY
variations, COMPUTE
, and WITH OPTIONS
. The SQL Server SELECT
syntax is:
SELECT {[ALL | DISTINCT] | [TOP number [PERCENT] [WITH TIES]]} select_item [AS alias] [INTO new_table_name] [FROM {[rowset_function | table1[, ...]]} [AS alias]] [[join_type] JOIN table2 [ON join_condition]] [WHERE search_condition] group_by_clause order_by_clause [COMPUTE {aggregation (expression)}[, ...] [BY expression[, ...]]] [FOR {BROWSE | XML | JSON}] [OPTION (hint[, ...])]
where:
TOP number [PERCENT] [WITH TIES]
- Indicates that only the specified
number
of rows should be retrieved in the query result set. IfPERCENT
is specified, only the firstnumber
percent of the rows are retrieved.WITH TIES
is used only for queries with anORDER BY
clause. This variation specifies that additional rows are returned from the base result set using the same value in theORDER BY
clause, appearing as the last of theTOP
rows. INTO new_table_name
- Creates a new table from the result set of the query. You can use this command to create temporary or permanent tables. (Refer to SQL Server’s rules for creating temporary or permanent tables in “CREATE/ALTER TABLE Statement”.) The
SELECT ... INTO
command quickly copies the rows and columns queried from other table(s) into a new table using a non-logged operation. Since it is not logged,COMMIT
andROLLBACK
statements do not affect it. FROM {[rowset_function | table1[, ... ]]}
Supports the standard behavior of the SQL standard
FROM
clause, including nested table subqueries. In addition, SQL Server supports a set of extensions called rowset functions. Rowset functions allow SQL Server to source data from special or external data sources such as XML streams, full-text search file structures (a special structure in SQL Server used to store things like MS Word documents and MS PowerPoint slide shows within the database), or external data sources (like an MS Excel spreadsheet).See the SQL Server documentation for the full description of the available
FROM {[rowset_function | table1[, ... ]]}
options. Among the many possibilities, SQL Server currently supports the following functions:CONTAINSTABLE
- Returns a table derived from a specified table that contains at least one full-text index
TEXT
orNTEXT
column. The records derived are based upon either a precise, fuzzy, weighted-match, or proximity-match search. The derived table is then treated like any otherFROM
data source. FREETEXTTABLE
- Similar to
CONTAINSTABLE
, except that records are derived based upon a meaning search of'freetext_string'
.FREETEXTTABLE
is useful for ad hoc queries against full-text tables, but less accurate thanCONTAINSTABLE
. OPENDATASOURCE
- Provides a means of sourcing data external to SQL Server via OLEDB without declaring a linked server, such as an MS Excel spreadsheet or a Sybase Adaptive Server database table. This is intended for the occasional ad hoc query; if you frequently retrieve result sets from external data sources, you should declare a linked server.
OPENQUERY
- Executes a pass-through query against a linked server. This is an effective means of performing a nested table subquery against a data source that is external to SQL Server. The data source must first be declared as a linked server.
OPENROWSET
- Executes a pass-through query against an external data source. This is similar to
OPENDATASOURCE
, except thatOPENDATASOURCE
only opens the data source; it does not actually pass through aSELECT
statement.OPENROWSET
is intended for occasional, ad hoc usage only. OPENXML
- Provides a queryable, table-like view to an XML document using a file handle. This is covered in “SQL Server XML keywords, functions, procedures, and methods”.
COMPUTE {aggregation (expression)}[, ... ] [BY expression[, ... ]]
Generates additional aggregations—usually totals—that appear at the end of the result set.
BY expression
adds subtotals and control breaks to the result set.COMPUTE
andCOMPUTE BY
can be used simultaneously in the same query.COMPUTE BY
must be coupled with anORDER BY
clause, though theexpression
used byCOMPUTE BY
can be a subset of theorder_by_expression
. The aggregation may be any of these function calls:AVG
,COUNT
,MAX
,MIN
,STDEV
,STDEVP
,VAR
,VARP
, orSUM
. Examples are shown later in this section.COMPUTE
, in any form, does not work with theDISTINCT
keyword or withTEXT
,NTEXT
, orIMAGE
data types.FOR {BROWSE | XML | JSON}
FOR BROWSE
is used to allow updates to data retrieved in a DB-Library browse mode cursor. (DB-Library is the original access methodology for SQL Server and has since been supplanted by OLE DB in most applications.)FOR BROWSE
can only be used against tables with a unique index and a column with theTIMESTAMP
data type; it cannot be used inUNION
statements or when aHOLDLOCK
hint is active.FOR XML
andFOR JSON
are used to extract the result set as an XML document or as JSON, respectively. See Chapter 10 for details on working with these formats.OPTION (hint[, ... ])
- Replaces elements of the default query plan with your own. Because the optimizer usually picks the best query plan for any query, you are strongly discouraged from placing optimizer hints into your queries. Refer to the SQL Server documentation for more information on hints.
Here’s an example of SQL Server’s SELECT ... INTO
capability. This example creates a table called non_mgr_employees using SELECT ... INTO
. The table contains the emp_id, first name, and last name of each non-manager from the employee table, joined with their job descriptions (taken from the jobs table):
SELECT e.emp_id, e.fname, e.lname, SUBSTRING(j.job_desc,1,30) AS job_desc INTO non_mgr_employee FROM employee e JOIN jobs AS j ON e.job_id = j.job_id WHERE j.job_desc NOT LIKE '%MANAG%' ORDER BY 2,3,1
The newly created and loaded table non_mgr_employee now can be queried like any other table.
SELECT ... INTO
is not logged or recoverable and so should only be used for operations that can be restarted from the beginning.
COMPUTE
has a number of permutations that can impact the result set retrieved by the query. The following example shows the sum of book prices broken out by type of book and sorted by type and then price:
-- Query SELECT type, price FROM titles WHERE type IN ('business','psychology') AND price > 10 ORDER BY type, price COMPUTE SUM(price) BY type -- Results type price ------------ --------------------- business 11.9500 business 19.9900 business 19.9900 sum ===================== 51.9300 type price ------------ --------------------- psychology 10.9500 psychology 19.9900 psychology 21.5900 sum ===================== 52.5300
The COMPUTE
clause behaves differently if you do not include BY
. The following query retrieves the grand total of prices and advances for books with prices over $16.00:
-- Query SELECT type, price, advance FROM titles WHERE price > $16 COMPUTE SUM(price), SUM(advance) -- Results type price advance ------------ --------------------- --------------------- business 19.9900 5000.0000 business 19.9900 5000.0000 mod_cook 19.9900 .0000 popular_comp 22.9500 7000.0000 popular_comp 20.0000 8000.0000 psychology 21.5900 7000.0000 psychology 19.9900 2000.0000 trad_cook 20.9500 7000.0000 sum ===================== 165.4500 sum ===================== 41000.0000
You can even use COMPUTE BY
and COMPUTE
in the same query to produce subtotals and grand totals. (For the sake of brevity, we’ll show an example query, but not the result set.) In this example, we find the sum of prices and advances by type for business and psychology books that cost over $16.00:
SELECT type, price, advance FROM titles WHERE price > $16 AND type IN ('business','psychology') ORDER BY type, price COMPUTE SUM(price), SUM(advance) BY type COMPUTE SUM(price), SUM(advance)
Don’t forget that you must include the ORDER BY
clause with a COMPUTE BY
clause! (You do not need an ORDER BY
clause with a simple COMPUTE
clause without the BY
keyword.) There are many permutations that you can perform in a single query—multiple COMPUTE
and COMPUTE BY
clauses, GROUP BY
with a COMPUTE
clause, and even COMPUTE
with an ORDER BY
statement. It’s actually fun to tinker around with the different ways you can build queries using COMPUTE
and COMPUTE BY
. It’s not theme park fun, but whaddya want? This is a programming book!
SQL Server (and Oracle, using a somewhat different technique) also supports a non-SQL standard query known as a pivot query. Although you should refer to the vendor documentation for details on exactly how to write a pivot (or unpivot) query, an example here will help you take advantage of this useful technique. A pivot query turns the result set on its side, enabling you to extract more value from the data. For example, the following query produces a two-column, four-row result set:
-- Query SELECT days_to_make, AVG(manufacturing_cost) AS Avg_Cost FROM manufacturing.products GROUP BY days_to_make; -- Results days_to_make Avg_Cost 0 5 1 225 2 350 4 950
By using a pivot query, you can then turn the result “on its side” so that the days_to_make column values become the column headings and the query returns one row with five columns:
-- Query SELECT 'Avg_Cost' As Cost_by_Days, [0], [1], [2], [3], [4] FROM (SELECT days_to_make, manufacturing_cost FROM manufacturing.products) AS source PIVOT (AVG(manufacturing_cost) FOR days_to_make IN ([0], [1], [2], [3], [4])) AS pivottable; -- Results Cost_by_Days 0 1 2 3 4 Avg_Cost 5 225 350 NULL 950
SUBQUERY Substatement
A subquery is a nested query. Subqueries may appear in various places within a SQL statement.
Platform | Command |
---|---|
MySQL | Supported, with limitations |
Oracle | Supported |
PostgreSQL | Supported |
SQL Server | Supported |
SQL standard syntax
The different types of subqueries the SQL standard supports are described “Rules at a glance”. Scalar, table, and nested table subqueries are represented by the following generalized syntax:
SELECT column1, column2, ... (scalar_subquery) FROM table1, ... (nested_table_subquery) AS subquery_table_name] WHERE foo = (scalar_subquery) OR foo IN (table_subquery)
Correlated subqueries are more complex because the values of such subqueries are dependent on values retrieved in their main queries. For example:
SELECT column1 FROM table1 AS t1 WHERE foo IN (SELECT value1 FROM table2 AS t2 WHERE t2.pk_identifier = t1.fk_identifier)
Note that the IN
clause is for example purposes only. Any comparison operator may be used.
Keywords
scalar_subquery
- Includes a scalar subquery in the
SELECT
item list or in theWHERE
orHAVING
clause of a query. table_subquery
- Includes a table subquery only in the
WHERE
clause, with operators such asIN
,ANY
,SOME
,EXISTS
, orALL
that act upon multiple values. Table subqueries return one or more rows containing a single value each. nested_table_subquery
- Includes a nested table subquery only in the
FROM
clause, in conjunction with theAS
clause.
Rules at a glance
Subqueries allow you to return one or more values and nest them inside a SELECT
, INSERT
, UPDATE
, or DELETE
statement, or inside another subquery. Subqueries can be used wherever expressions are allowed. Subqueries also can often be replaced with a JOIN
statement. Depending on the DBMS, subqueries may perform less quickly than joins.
Subqueries are always enclosed in parentheses.
SQL supports the following types of subquery:
- Scalar subqueries
- Subqueries that retrieve a single value. These are the most widely supported type of subquery among the various database platforms.
- Vector subqueries
- Subqueries that retrieve a single row which has more than one column.
- Table subqueries
- Subqueries that retrieve more than one value or row of values.
Scalar and vector subqueries can, on some platforms, appear as part of the expression in a SELECT
list of items, a WHERE
clause, or a HAVING
clause. Nested table subqueries tend to appear in the FROM
clauses of SELECT
statements.
A correlated subquery is a subquery that is dependent upon a value in an outer query. Consequently, the inner query is executed once for every record retrieved in the outer query. Since subqueries can be nested many layers deep, a correlated subquery may reference any level in the main query higher than its own level.
Different rules govern the behavior of a subquery, depending on the clause in which it appears. The level of support amongst the database platforms also varies: some platforms support subqueries in all clauses mentioned earlier (SELECT
, FROM
, WHERE
, and HAVING
), while others support subqueries in only one or two of the clauses.
Subqueries are usually associated with the SELECT
statement. Since subqueries may appear in the WHERE
clause, they can be used in any SQL statement that supports a WHERE
clause, including SELECT
, INSERT ... SELECT
, DELETE
, and UPDATE
statements.
Certain operators in a WHERE
clause, such as =
, <
, >
, >=
, <=
, and <>
(or !=
), expect only one value. If a subquery returns more than one value but the operator expects a single value, the entire query will fail. Scalar subqueries should be used in these cases because they can return only a single value. On the other hand, table subqueries may return multiple values, but they are usable only with multivalue expressions like [NOT] IN
, ANY
, ALL
, SOME
, or [NOT] EXISTS
.
Table subqueries may appear in the FROM
clause and should be aliased by the AS
clause. The result set returned by a table subquery, sometimes called a derived table, offers similar functionality to a view (see “CREATE/ALTER VIEW Statement” for more on views). Every column returned in the derived table need not be used in the query, though they can all be acted upon by the outer query.
Correlated subqueries typically appear as a component of a WHERE
or HAVING
clause in the outer query (and, less commonly, in the SELECT
item list) and are correlated through the WHERE
clause of the inner query (that is, the subquery). Correlated subqueries can also be used as table subqueries, though this is less common. Be sure to include in such a subquery a WHERE
clause that evaluates based on a correlating value from the outer query; the example for a correlated query in the earlier SQL standard syntax diagram illustrates this requirement.
It is also important to specify a table alias, called a correlation name, using the AS
clause or other alias shortcut for every table referenced in a correlated query, in both the outer and inner queries. Correlation names avoid ambiguity and help the DBMS quickly resolve the tables involved in the query.
All SQL standard–compliant subqueries comply with the following short list of rules:
A subquery cannot include an
ORDER BY
clause.A subquery cannot be enclosed in an aggregate function. For example, the following query is invalid:
SELECT foo FROM table1 WHERE sales >= AVG(SELECT column1 FROM sales_table ...)
You can get around this limitation by performing the aggregation in the subquery rather than in the outer query.
Programming tips and gotchas
For most vendor platforms, subqueries should not reference large object data types (e.g., CLOB
or BLOB
on Oracle and IMAGE
or TEXT
on SQL Server) or array data types (such as TABLE
or CURSOR
on SQL Server).
The platforms all support subqueries, but not every vendor supports every type of subquery. Table 4-4 summarizes vendor support at the time of writing.
Platform | MySQL | Oracle | PostgreSQL | SQL Server |
---|---|---|---|---|
Scalar subquery in SELECT item list |
✓ | ✓ | ✓ | ✓ |
Scalar subquery in WHERE/HAVING clause |
✓ | ✓ | ✓ | ✓ |
Vector subquery in WHERE/HAVING clause |
✓ | ✓ | ✓ | ✓ |
Table subquery in FROM clause |
✓ | ✓ | ✓ | ✓ |
Correlated subquery in WHERE/HAVING clause |
✓ | ✓ | ✓ | ✓ |
Aside from SELECT
statements, subqueries may also be used in INSERT
, UPDATE
, and DELETE
statements that include a WHERE
clause. Subqueries are often used for the following purposes:
To identify the rows inserted into the target table using an
INSERT ... SELECT
statement, aCREATE TABLE ... SELECT
statement, or aSELECT ... INTO
statementTo identify the rows of a view or materialized view in a
CREATE VIEW
statementTo identify value(s) assigned to existing rows using an
UPDATE
statementTo identify values for conditions in the
WHERE
andHAVING
clauses ofSELECT
,UPDATE
, andDELETE
statementsTo build a view of a table(s) on the fly (i.e., nested table subqueries)
Examples
This section shows subquery examples that are equally valid on MySQL, Oracle, PostgreSQL, and SQL Server.
A simple scalar subquery is shown in the SELECT
item list of the following query:
SELECT job, (SELECT AVG(salary) FROM employee) AS "Avg Sal" FROM employee
Table subqueries are functionally equivalent to querying a view. In the following, we query the education level and salary in a table subquery, and then perform aggregations on the values in the derived table in the outer query:
SELECT AVG(edlevel), AVG(salary) FROM (SELECT edlevel, salary FROM employee) AS emprand GROUP BY edlevel
Remember that this query may fail, depending on the platform, without the AS
clause to associate a name with the derived table.
The following query shows a standard table subquery in the WHERE
clause expression. In this case, we want all project numbers for employees in the department 'A00'
:
SELECT projno FROM emp_act WHERE empno IN (SELECT empno FROM employee WHERE workdept ='A00')
The subquery is executed only once for the outer query.
In the next example, we want to know the names of employees and their level of seniority. We get this result set through a correlated subquery:
SELECT firstname, lastname, (SELECT COUNT(*) FROM employee, senior WHERE employee.hiredate > senior.hiredate) as senioritype FROM employee
Unlike the previous subquery, this subquery is executed one time for every row retrieved by the outer query. In a query like this, the total processing time could be very long, since the inner query may potentially execute many times for a single result set.
Correlated subqueries depend on values retrieved by the outer query to complete the processing of the inner query. They are tricky to master, but they offer unique programmatic capabilities. The following example returns information about orders where the quantity sold in each order is less than the average quantity in other sales for that title:
SELECT s1.ord_num, s1.title_id, s1.qty FROM sales AS s1 WHERE s1.qty < (SELECT AVG(s2.qty) FROM sales AS s2 WHERE s2.title_id = s1.title_id)
For this example, you can accomplish the same functionality using a self-join. However, there are situations in which a correlated subquery may be the only easy way to do what you need.
The next example shows how a correlated subquery might be used to update values in a table:
UPDATE course SET ends = (SELECT min(c.begins) FROM course AS c WHERE c.begins BETWEEN course.begins AND course.ends) WHERE EXISTS (SELECT * FROM course AS c WHERE c.begins BETWEEN course.begins AND course.ends)
Similarly, you can use a subquery to determine which rows to delete. This example uses a correlated subquery to delete rows from one table based on related rows in another table:
DELETE FROM course WHERE EXISTS (SELECT * FROM course AS c WHERE course.id > c.id AND (course.begins BETWEEN c.begins AND c.ends OR course.ends BETWEEN c.begins AND c.ends))
MySQL
MySQL supports subqueries in the FROM
clause and the WHERE
clause. It supports scalar subqueries in the SELECT
item list.
Oracle
Oracle supports SQL-standard subqueries, though it uses a different nomenclature. In Oracle, a table subquery that appears in the FROM
clause is called an inline view. That makes sense because table subqueries are basically views built on the fly. Oracle calls a subquery that appears in the WHERE
clause or the HAVING
clause of a query a nested subquery. It allows correlated subqueries in the SELECT
item list and in the WHERE
and HAVING
clauses.
PostgreSQL
PostgreSQL supports SQL-standard subqueries in the FROM
, WHERE
, and HAVING
clauses and also allows them to have ORDER BY
clauses. However, subqueries appearing in a HAVING
clause cannot include ORDER BY
, FOR UPDATE
, or LIMIT
clauses. It supports scalar and vector subqueries in the SELECT
item list.
SQL Server
SQL Server supports SQL-standard subqueries. Scalar subqueries can be used almost anywhere a standard expression is allowed. Subqueries in SQL Server cannot include the COMPUTE
or FOR BROWSE
clauses. They can include the ORDER BY
clause if the TOP
clause is also used.
UNION Set Operator
The UNION
set operator combines the result sets of two or more queries, showing all the rows returned by each of the queries as a single result set.
UNION is in a class of keywords known as set operators. Other set operators include INTERSECT and EXCEPT. All set operators are used to simultaneously manipulate the result sets of two or more queries; hence the term “set operators.”
Platform | Command |
---|---|
MySQL | Supported |
Oracle | Supported, with limitations |
PostgreSQL | Supported, with limitations |
SQL Server | Supported, with limitations |
Keywords
UNION
- Determines which result sets will be combined into a single result set. Duplicate rows are, by default, excluded.
ALL | DISTINCT
- Combines duplicate rows from all result sets (
ALL
) or eliminates duplicate rows from the final result set (DISTINCT
). Columns containing a NULL value are considered duplicates. If neitherALL
norDISTINCT
is used,DISTINCT
behavior is the default.
Rules at a glance
There is only one significant rule to remember when using UNION
: the order, number, and data types of the columns should be the same in all queries.
The data types do not have to be identical, but they should be compatible. For example, CHAR
and VARCHAR
are compatible data types. By default, the result set will default to the largest of two (or more) compatible data types, so a query that unions three CHAR
columns—CHAR(5)
, CHAR(10)
, and CHAR(12)
—will display the results in the CHAR(12)
format with extra space padded onto the smaller column results.
Programming tips and gotchas
Even though the SQL standard calls for INTERSECT
to take precedence over other set operators in a single statement, many platforms evaluate all set operators with equal precedence. You can explicitly control the precedence of set operators using parentheses. Otherwise, the DBMS is likely to evaluate them in order from the leftmost to the rightmost expression.
Depending on the platform, specifying DISTINCT
can incur a significant performance cost, since it often involves a second pass through the results to winnow out duplicate records. ALL
can be specified in any instance where no duplicate records are expected (or where duplicate records are OK) for faster results.
According to the SQL standard, only one ORDER BY
clause is allowed in the entire query. Include it at the end of the last SELECT
statement. To avoid column and table ambiguity, be sure to alias matching columns in each table with the same respective aliases. However, for column-naming purposes, only the aliases in the first query are used for each column in the SELECT ... UNION
query. For example:
SELECT au_lname AS "lastname", au_fname AS "firstname" FROM authors UNION SELECT emp_lname AS "lastname", emp_fname AS "firstname" FROM employees ORDER BY lastname, firstname
Also be aware that even if the queries in your UNION
have compatible data-typed columns, there may be some variation in behavior across the DBMS platforms, especially with regard to the length of the columns. For example, if the au_lname column in the first query is markedly longer than the emp_lname column in the second query, different platforms may apply different rules as to which length is used. In general, though, the platforms will choose the longer (and less restrictive) column size for use in the result set.
Each DBMS may apply its own rules as to which column name is used if the columns across the tables have different names. In general, the column names of the first query are used.
MySQL
MySQL fully supports the SQL standard syntax.
Oracle
Oracle supports the UNION
and UNION ALL
set operators using the basic SQL standard syntax. UNION DISTINCT
is not supported, but UNION
is the functional equivalent.
For example, you could find out all unique store IDs without duplicates using this query:
SELECT stor_id FROM stores UNION SELECT stor_id FROM sales;
Oracle does not support UNION [ALL]
on the following types of queries:
Queries containing columns with
LONG
,BLOB
,CLOB
,BFILE
, orVARRAY
data typesQueries containing a
FOR UPDATE
clause or aTABLE
collection expression
If the first query in the set operation contains any expressions in the SELECT
item list, include the AS
keyword to associate an alias with the column resulting from the expression. Also, only the last query in the set operation may contain an ORDER BY
clause.
PostgreSQL
PostgreSQL supports the UNION
and UNION ALL
set operators using the basic SQL standard syntax, but not on queries with a FOR UPDATE
clause UNION DISTINCT
is not supported, but UNION
is the functional equivalent.
The first query in the set operation may not contain an ORDER BY
clause or a LIMIT
clause. Subsequent queries in the UNION [ALL]
set operation may contain these clauses, but such queries must be enclosed in parentheses. Otherwise, the rightmost occurrence of ORDER BY
or LIMIT
will be assumed to apply to the entire set operation.
For example, we could find all authors and all employees whose last names start with “P” with the following query:
SELECT a.au_lname FROM authors AS a WHERE a.au_lname LIKE 'P%' UNION SELECT e.lname FROM employee AS e WHERE e.lname LIKE 'W%';
SQL Server
SQL Server supports the UNION
and UNION ALL
set operators using the basic SQL standard syntax. UNION DISTINCT
is not supported, but UNION
is the functional equivalent.
You can use SELECT ... INTO
with UNION
or UNION ALL
, but INTO
may appear only in the first query of the union. Special keywords, such as SELECT TOP
and GROUP BY ... WITH CUBE
, are usable with all queries in a union, but if you use them in one query you must use them with all of the queries. If you use SELECT TOP
or GROUP BY ... WITH CUBE
in only one query in a union, the operation will fail.
Each query in a union must contain the same number of columns. The data types of the columns do not have to be identical, but they must implicitly convert. For example, mixing VARCHAR
and CHAR
columns is acceptable. SQL Server uses the larger of the two columns when evaluating the size of the columns returned in the result set. Thus, if a SELECT ... UNION
statement has a CHAR(5)
column and a CHAR(10)
column, it will display the data of both columns as a CHAR(10)
column. Numeric columns are converted to and displayed as the most precise data type in the union.
For example, the following query unions the results of two independent queries that use GROUP BY ... WITH CUBE
:
SELECT ta.au_id, COUNT(ta.au_id) FROM pubs..titleauthor AS ta JOIN pubs..authors AS a ON a.au_id = ta.au_id WHERE ta.au_id >= '722-51-5454' GROUP BY ta.au_id WITH CUBE UNION SELECT ta.au_id, COUNT(ta.au_id) FROM pubs..titleauthor AS ta JOIN pubs..authors AS a ON a.au_id = ta.au_id WHERE ta.au_id < '722-51-5454' GROUP BY ta.au_id WITH CUBE
VALUES Clause
The VALUES
multi-row constructor is a constructor often found in INSERT
statements, but it can also be used in FROM
statements or anywhere you can have a table expression to create an inline table. When used in a FROM
statement, the column names and table name must be aliased.
Platform | Command |
---|---|
MySQL/MariaDB | Supported, with limitations |
Oracle | Not supported |
PostgreSQL | Supported |
SQL Server | Supported, with limitations |
SQL standard syntax
( VALUES [ROW](<row1 columns>), [ROW](<row2 columns>), ... [ROW](<rown columns>) ) [AS <table_alias>(column1, column2[,... columnn])]
Keywords
(VALUES [ROW](<row1 columns>), [ROW](<row2 columns>), ... [ROW](<rown columns>) )
- Defines a set of rows. Each row is enclosed in parentheses, and column values are separated by commas. Each row must have the same number of columns.
ROW
- Optional keyword to denote the beginning of a row. Some databases do not support this keyword and others require it.
AS <table_alias>(column1, column2[, ... columnn])
- Specifies names for the table and columns. When not specified, the default names for columns are column1, column2, … column.
Rules at a glance
VALUES
clauses can stand alone or be included in SELECT
statements, JOIN
clauses, IN
clauses, NOT IN
clauses, DELETE
statements, INSERT ... SELECT
statements, UPDATE
statements, and any statement that might have a query or subquery (such as DECLARE
, CREATE TABLE
, CREATE VIEW
, and so forth). The following example defines a virtual table consisting of two rows with two named columns:
SELECT * FROM (VALUES ('ABC1', 'Title 1'), ('DCF1', 'Title 2') ) AS t(title_id, title);
VALUES
can also be used in a common table expression (WITH
clause) as follows:
WITH t(title_id, title) AS ( VALUES ('ABC1', 'Title 1'), ('DCF1', 'Title 2') ) SELECT * FROM t;
And it can be used standalone without aliasing, as follows:
VALUES ('ABC1', 'Title 1'), ('DCF1', 'Title 2');
The output of the preceding queries is:
ABC1 | Title 1 DCF1 | Title 2
MySQL and MariaDB
MySQL 8 and later support the VALUES
multi-row constructor, which can be used standalone or within a SELECT
, INSERT
, or UPDATE
statement. When used as table output the ROW
keyword is not optional for MySQL. For example:
SELECT * FROM (VALUES ROW('ABC1', 'Title 1'), ROW('DCF1', 'Title 2') ) AS t(title_id, title);
When used in an INSERT
statement, the ROW
keyword can be left out.
MariaDB also supports the VALUES
multi-row constructor, which can be used standalone, in a WITH
clause, or within an INSERT
statement. The ROW
keyword is not supported. MariaDB allows aliasing in a WITH
clause, but not when VALUES
is used standalone.
Oracle
Oracle does not support the VALUES
multi-row constructor.
PostgreSQL
PostgreSQL fully supports the VALUES
constructor, except for the ROW
keyword.
SQL Server
SQL Server supports the multi-row VALUES
constructor and allows its use in SELECT
... FROM
and INSERT
statements. It does not support its use in WITH
or IN
clauses or standalone, and it does not support the optional ROW
keyword. When used in the FROM
clause, renaming of columns is required.
WHERE Clause
The WHERE
clause sets the search criteria for an operation such as SELECT
, UPDATE
, or DELETE
. Any records in the target table(s) that do not meet the search criteria are excluded from the operation. The search conditions may include many variations, such as calculations, Boolean operators, and SQL predicates (for example, LIKE
or BETWEEN
). All the platforms support the SQL standard syntax.
Platform | Command |
---|---|
MySQL | Supported |
Oracle | Supported |
PostgreSQL | Supported |
SQL Server | Supported |
SQL standard syntax
{ WHERE search_criteria | WHERE CURRENT OF cursor_name }
Keywords
WHERE search_criteria
- Defines search criteria for the statement to ensure that only the target rows are affected.
WHERE CURRENT OF cursor_name
- Restricts the operation of the statement to the current row of a defined and opened cursor called
cursor_name
.
Rules at a glance
WHERE
clauses are found in SELECT
statements, DELETE
statements, INSERT ... SELECT
statements, UPDATE
statements, and any statement that might have a query or subquery (such as DECLARE
, CREATE TABLE
, CREATE VIEW
, and so forth).
The search conditions, all of which are described in their own entries elsewhere in this book, can include:
- All records (
=ALL
,>ALL
,<=
ALL
,SOME/ANY
) For example, to see publishers who live in the same city as their authors:
SELECT pub_name FROM publishers WHERE city = SOME (SELECT city FROM authors);
- Combinations(
AND
,OR
, andNOT
) and evaluation hierarchy For example, to see all authors with sales in quantities greater than or equal to 75 units, or coauthors with a royalty of greater than or equal to 60:
SELECT a.au_id FROM authors AS a JOIN titleauthor AS ta ON a.au_id = ta.au_id WHERE ta.title_id IN (SELECT title_id FROM sales WHERE qty >= 75) OR (a.au_id IN (SELECT au_id FROM titleauthor WHERE royaltyper >= 60) AND a.au_id IN (SELECT au_id FROM titleauthor WHERE au_ord = 2));
- Comparison operators (such as
=, < >, <, >, <=,
and>=
) For example, to see the last and first names of authors who don’t have a contract (i.e., authors with contract value of 0):
SELECT au_lname, au_fname FROM authors WHERE contract = 0;
- Lists (
IN
andNOT IN
) For example, to see all authors who do not yet have a title in the titleauthor table:
SELECT au_fname, au_lname FROM authors WHERE au_id NOT IN (SELECT au_id FROM titleauthor);
- NULL comparisons (
IS NULL
andIS NOT NULL
) For example, to see titles that have NULL year-to-date sales:
SELECT title_id, SUBSTRING(title, 1, 25) AS title FROM titles WHERE ytd_sales IS NULL;
Be sure not to specify
= NULL
in a query.NULL
is unknown and can never be equal to anything. Using= NULL
is not the same as specifying theIS NULL
operator.- Pattern matches (
LIKE
andNOT LIKE
) For example, to see authors whose last names start with a “C”:
SELECT au_id FROM authors WHERE au_lname LIKE 'C%';
- Range operations (
BETWEEN
andNOT BETWEEN
) For example, to see authors with last names that fall alphabetically between “Smith” and “White”:
SELECT au_lname, au_fname FROM authors WHERE au_lname BETWEEN 'smith' AND 'white';
Programming tips and gotchas
The WHERE
clause may require special handling when dealing with certain data types, such as LOB
s, or certain character sets, including Unicode.
Parentheses are used to control evaluation hierarchy within a WHERE
clause. Encapsulating a clause within parentheses tells the DBMS to evaluate that clause before others. Parentheses can be nested to create a hierarchy of evaluations. The innermost parenthetical clause will be evaluated first. You should watch parentheses very carefully, for two reasons:
You must always have an equal number of opening and closing parentheses. Any imbalance in the number of opening and closing parentheses will cause an error.
You should be careful where you place parentheses, since misplacing a parenthesis can dramatically change the result set of your query.
For example, consider again the following query, which returns six rows in the pubs database on the SQL Server platform:
SELECT DISTINCT a.au_id FROM authors AS a JOIN titleauthor AS ta ON a.au_id = ta.au_id WHERE ta.title_id IN (SELECT title_id FROM sales WHERE qty >= 75) OR (a.au_id IN (SELECT au_id FROM titleauthor WHERE royaltyper >= 60) AND a.au_id IN (SELECT au_id FROM titleauthor WHERE au_ord = 2))
The output from this query is as follows:
au_id ----------- 213-46-8915 724-80-9391 899-46-2035 998-72-3567
Changing just one set of parentheses produces different results:
SELECT DISTINCT a.au_id FROM authors AS a JOIN titleauthor AS ta ON a.au_id = ta.au_id WHERE (ta.title_id IN (SELECT title_id FROM sales WHERE qty >= 75) OR a.au_id IN (SELECT au_id FROM titleauthor WHERE royaltyper >= 60)) AND a.au_id IN (SELECT au_id FROM titleauthor WHERE au_ord = 2)
This time, the output will look like this:
au_id ----------- 213-46-8915 724-80-9391 899-46-2035
WITH Clause
The WITH
clause defines a short-term view that is instantiated for the duration of a parent query. It may or may not have an alias associated with it in order to ease referencing from the parent query (or subqueries) later. The temporary named result set created by a WITH
clause is called a common table expression (CTE), and this is also known as subquery factoring. CTEs are not stored in the database schema like standard views, but they behave in essentially the same way. In fact, when CTEs were added to the SQL:1999 standard, they were simply referred to as statement-scoped views. All the platforms support the SQL standard syntax for the WITH
clause, some with extensions.
Platform | Command |
---|---|
MySQL | Supported |
Oracle | Supported, with variations |
PostgreSQL | Supported, with variations |
SQL Server | Supported, with variations |
SQL standard syntax
WITH [RECURSIVE] with_query[, ...] SELECT...
Keywords
with_query
- Defines a query with a name that takes the form
some_name AS (query_definition) or some_name(col1,col2,col3 ...) AS (query_definition)
. There can be one or morewith_query
s before the final query. Eachwith_query
must be separated by a comma. RECURSIVE
- Denotes that the CTE batch may contain queries that call themselves, generally using tail recursion. This keyword is syntactic sugar providing a hint to the query planner that a
WITH
clause has recursive elements.
Rules at a glance
WITH
clauses are used to make complex SQL queries easier to read and debug by dividing them into subsets. They are also used to compartmentalize a subquery that is reused in multiple parts of a parent query (each CTE is associated with an anchoring parent query, which may in turn have multiple CTEs), to write recursive queries, and to improve performance.
Some databases allow for writable CTEs, which have elements that update data and return the changed data.
Programming tips and gotchas
The WITH
clause can have implications for query performance. Depending on the database vendor and version, a query written with WITH
may perform better or worse than a similar query utilizing nested SELECT
statements. When in doubt, experiment with writing your query using WITH
and without WITH
.
Here’s an example of a non-recursive CTE:
WITH au AS (SELECT au_state AS state, COUNT(*) AS au_count FROM authors GROUP BY au_state ), pu AS (SELECT pub_state AS state, COUNT(*) AS pub_count FROM publishers GROUP BY pub_state ) SELECT au.state, au.au_count, pu.pub_count FROM au INNER JOIN pu ON au.state = pu.state;
A completely SQL standard–compliant recursive CTE that counts from 1 to 20 might look like this:
WITH RECURSIVE numbers AS ( SELECT 1 AS n UNION ALL SELECT n + 1 FROM numbers WHERE n+1 <= 20 ) SELECT * FROM numbers;
There is no physical numbers table; numbers is a CTE expression that builds on itself. If you did have a numbers table in your database, however, the CTE version would still be used; this is because in the case of name clashes CTEs take precedence.
Recursive CTEs cannot include DISTINCT
. Some databases, like PostgreSQL, will error if you prefix a with_query
with RECURSIVE
and it has no recursive elements. SQL Server, although it supports recursive queries, does not allow the RECURSIVE
keyword.
MySQL and MariaDB
MySQL fully supports the WITH
clause as of version 8.0. MariaDB introduced support in version 10.2.
Oracle
Oracle supports the SQL standard WITH
as well as a MATERIALIZE
Hint that forces a CTE to be materialized for better performance. To force materialization you would write something like:
WITH au AS (SELECT /*+ MATERIALIZE */ au_state, COUNT(*) AS au_count FROM authors GROUP BY au_state ), pu AS (SELECT /*+ MATERIALIZE */ pub_state, COUNT(*) AS pub_count FROM publishers GROUP BY pub_state ) SELECT au.state, au.au_count, pu.pub_count FROM au INNER JOIN pu ON au.state = pu.state;
PostgreSQL
PostgreSQL fully supports the SQL standard WITH
clause and some extensions to it. PostgreSQL allows CTEs to contain one or more INSERT
/UPDATE
/DELETE
statements if they are followed with a RETURNING
clause. The final query can also be an INSERT
/UPDATE
/DELETE
but need not have a RETURNING
clause. This is useful, for example, to move deleted records to another table, as follows:
WITH del AS (DELETE FROM authors WHERE au_state = 'CA' RETURNING *) INSERT INTO deleted_authors(au_id) SELECT del.au_id FROM del;
PostgreSQL also supports a MATERIALIZED
/NOT MATERIALIZED
extension to the standard. To force materialization of a CTE you would prefix it with MATERIALIZED
, and if you wanted to discourage materialization you could similarly prefix it with NOT MATERIALIZED
, as follows:
WITH au AS MATERIALIZED (SELECT au_state AS state, COUNT(*) AS au_count FROM authors GROUP BY au_state ), pu AS NOT MATERIALIZED (SELECT pub_state AS state, COUNT(*) AS pub_count FROM publishers GROUP BY pub_state ) SELECT au.state, au.au_count, pu.pub_count FROM au INNER JOIN pu ON au.state = pu.state;
WITH RECURSIVE
is also supported in the definition of views, which allows for writing recursive views. PostgreSQL will complain if you use WITH RECURSIVE
and have no recursive elements in your view.
SQL Server
SQL Server supports WITH
but does not allow the RECURSIVE
keyword; it internally determines whether a WITH
clause is recursive or not. SQL Server does not allow ORDER BY
clauses in CTEs unless they are used in conjunction with TOP
. It also does not allow INTO
or OPTION
clauses with query hints. Unlike other statements where the ;
is optional, CTEs must start with a ;
if they are part of a set of query statements.
The earlier recursive CTE would be written without the RECURSIVE
keyword as follows:
WITH numbers AS ( SELECT 1 AS n UNION ALL SELECT n + 1 FROM numbers WHERE n+1 <= 20 ) SELECT * FROM numbers;
See also
ALTER/CREATE VIEW
RETURNING
in Chapter 5SELECT
WITH ORDINALITY Clause
The WITH ORDINALITY
clause adds an incrementing integer column to the result of a set-returning function. The name of the column is ordinality unless it is renamed. It is generally used in the FROM
or JOIN
clause and is commonly used in conjunction with the UNNEST
function to expand and number array data.
Platform | Command |
---|---|
MySQL | Not supported |
Oracle | Not supported |
PostgreSQL | Supported |
SQL Server | Not supported |
SQL standard syntax
set_returning_function_call WITH ORDINALITY [AS ..]
Keywords
set_returning_function_call
- Defines a function call, for example
UNNEST(somevalue)
.
Rules at a glance
WITH ORDINALITY
is used to number the results of a set-returning function. For regular SELECT
queries, you would use ROW_NUMBER() OVER()
instead for numbering.
Programming tips and gotchas
Here is a PostgreSQL example of using WITH ORDINALITY
to number an array of values:
SELECT * FROM unnest(ARRAY['PC8888','BU1032', 'PS7777','PS3333','BU1111'] ) WITH ORDINALITY AS title_id;
The output of this query is:
title_id | ordinality ---------------------- PC8888 | 1 BU1032 | 2 PS7777 | 3 PS3333 | 4 BU1111 | 5
You can also rename the output of the columns using aliases, as follows:
SELECT * FROM unnest(ARRAY['PC8888','BU1032', 'PS7777','PS3333','BU1111'] ) WITH ORDINALITY AS title_id(id, ord);
WITH ORDINALITY
is often used in a JOIN
clause or a LATERAL JOIN
clause.
See also
JOIN
SELECT
Get SQL in a Nutshell, 4th Edition now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.