Chapter 4. Reading Your Data

In this chapter, we will explore the key SQL statements and clauses needed to extract information from your database. You will learn about the fundamental SELECT statement and the various subclauses you can use within it to select data and aggregate data into subtotals. By the end of this chapter, you will understand the concepts of JOIN operations, SET operations, aggregation, window operations, common table expressions, and subqueries. You will also know how to combine these effectively to query any data in the core database platforms.

How to Use This Chapter

When researching a command in this chapter:

Read “SQL Platform Support”.
Check Table 4-1.
Look up the specific SQL statement, check the syntax, and read the “Keywords,” “Rules at a glance,” and “Programming tips and gotchas” sections and read the section on the standard for SQL syntax and description. Do this even if you are looking for a specific platform implementation.
Finally, read the platform-specific implementation information.

You will note that the entry for a given platform implementation does not duplicate information on any clauses that do not differ from the standard. So, it is possible that you will need to flip between the descriptions for a vendor variation and the SQL standard to cover all possible details of that command.

In our discussions of MySQL, we will also include MariaDB, a fork of MySQL. For the most part, MySQL and MariaDB provide fully code-compatible syntax. In these cases we will refer to them collectively as MySQL. We will explicitly mention MariaDB only in situations where it deviates from MySQL in an important way.

SQL Platform Support

Table 4-1 provides a listing of the SQL statements covered in this chapter, the platforms that support them, and the degree to which they support them. The following list offers useful tips for reading Table 4-1, as well as an explanation of what each abbreviation stands for:

The first column contains the SQL commands, in alphabetical order.
The SQL statement class for each command is indicated in the second column.
The subsequent columns list the level of support for each vendor:

Supported (S)

The platform supports the SQL standard for the particular command.

Supported, with variations (SWV)

The platform supports the SQL standard for the particular command, using vendor-specific code or syntax.

Supported, with limitations (SWL)

The platform supports some but not all of the functions specified by the SQL standard for the particular command.

Not supported (NS)

The platform does not support the particular command according to the SQL standard.

The sections that follow the table describe the commands in detail.

Remember that even if a specific SQL command is listed in the table as “Not supported,” the platform usually has alternative coding or syntax to enact the same command or function. Therefore, be sure to read the discussion and examples for each command later in this chapter.

Table 4-1. Alphabetical quick SQL command reference
SQL command	SQL class	MySQL/MariaDB	Oracle	PostgreSQL	SQL Server
`ALL/ANY/SOME`	SQL-data	S	SWV	SWV	SWV
`BETWEEN`	SQL-data	S	S	S	S
`EXCEPT`	SQL-data	NS/SWL	SWL	SWL	SWL
`EXISTS`	SQL-data	S	S	S	S
`FILTER`	SQL-data	NS	NS	S	NS
`GROUP BY`	SQL-data	SWV	SWV	SWV	SWV
`IN`	SQL-data	S	S	S	S
`INTERSECT`	SQL-data	NS/SWL	SWL	SWL	SWL
`IS`	SQL-data	S	S	S	S
`JOIN`	SQL-data	SWV	SWV	SWV	SWL
`LIKE`	SQL-data	S	S	SWV	SWV
`ORDER BY`	SQL-data	SWL	SWV	SWV	SWL
`OVER`	SQL-data	SWL	SWV	SWV	SWL
`SELECT`	SQL-data	SWV	SWV	SWV	SWV
`SUBQUERY`	SQL-data	SWL	S	S	S
`UNION`	SQL-data	S	SWL	SWL	SWL
`VALUES`	SQL-data	SWL	NS	S	SWL
`WHERE`	SQL-data	S	S	S	S
`WITH`	SQL-data	S	SWV	SWV	SWV
`WITH ORDINALITY`	SQL-data	NS	NS	S	NS

SQL Command Reference

ALL/ANY/SOME Operators

The ALL operator performs a Boolean test of a subquery for the existence of a value in all rows. The ANY operator and its synonym SOME perform a Boolean test of a subquery for the existence of a value in any of the rows tested.

You will also find the ALL keyword used in conjunction with UNION. This will be covered in “UNION Set Operator”.

Platform	Command
MySQL	Supported
Oracle	Supported with variations
PostgreSQL	Supported with variations
SQL Server	Supported with variations

SQL standard syntax

SELECT ...
WHERE expression comparison {ALL | ANY | SOME} ( subquery )

Keywords

WHERE expression: Tests a scalar expression (such as a column) against every value in the subquery for ALL and against every value until a match is found for ANY and SOME. All rows must match the expression to return a Boolean TRUE value for the ALL operator, while one or more rows must match the expression to return a Boolean TRUE value for the ANY and SOME operators.
comparison: Compares the expression to the subquery. The comparison must be a standard comparison operator like =, <>, !=, >, >=, <, or <=.

Rules at a glance

The ALL operator returns a Boolean TRUE value when one of two things happens: either the subquery returns an empty set (i.e., no records match), or every record in the set meets the comparison. ALL returns FALSE when any record in the set does not match the value comparison. The ANY and SOME operators return a Boolean TRUE when at least one record in the subquery matches the comparison operation, and FALSE when no record matches the comparison operation (or when a subquery returns an empty result set). If even one return value of the subquery is NULL, the operation evaluates as NULL, not as TRUE.

Do not include special clauses like ORDER BY, GROUP BY, CUBE, ROLLUP, WITH, etc. in your subquery.

For example, this query returns authors who currently have no titles:

SELECT au_id
FROM authors
WHERE au_id <> ALL(SELECT titleauthor.au_id FROM titleauthor);

You can use ANY or SOME to perform filtering checks of different kinds. For example, the following query will retrieve from the employee table any records that exist in the jobs table where the employee has the same job_lvl as the minimum required level of a job:

SELECT *
FROM employee
WHERE job_lvl = ANY(SELECT min_lvl FROM jobs);

Programming tips and gotchas

The ALL and ANY/SOME operators are somewhat difficult to get used to. Most developers find it easier to use similar functions like IN and EXISTS.

EXISTS is semantically equivalent to the ANY/SOME construct

MySQL

MySQL supports the SQL standard versions of ALL and ANY/SOME.

Oracle

Oracle supports the SQL standard versions of ALL and ANY/SOME with one minor variation, which is that you can supply a list of values instead of a subquery. For example, to find all employees who have a job_lvl value equal to 9 or 14:

SELECT * FROM employee
WHERE job_lvl = ALL(9, 14);

PostgreSQL

PostgreSQL supports the SQL standard versions of ALL and ANY/SOME. In addition, it supports their use with arrays. For example, to find all employees who have a job_lvl value equal to 9 or 14:

SELECT * FROM employee
WHERE job_lvl = ANY(ARRAY[9, 14]);

PostgreSQL also supports the use of these terms in conjunction with LIKE and the case-insensitive ILIKE. A common use is to provide a shorthand for multiple LIKE/ILIKE clauses. For example, this:

SELECT * FROM employee
WHERE name_last ILIKE ANY(ARRAY['smith', 'paris%', '%chin%']);

is equivalent to:

SELECT * FROM employee
WHERE name_last ILIKE 'smith' 
  OR name_last ILIKE 'paris%'
  OR name_last ILIKE '%chin%';

SQL Server

SQL Server supports the SQL standard versions of ALL and ANY/SOME. It also supports some additional comparison operators: not greater than (!>) and not less than (!<).

BETWEEN Operator

The BETWEEN operator performs a Boolean test of a value against a range of values. It returns TRUE when the value is included in the range and FALSE when the value falls outside of the range. The result is NULL (unknown) if any of the range values are NULL.

Platform	Command
MySQL	Supported
Oracle	Supported
PostgreSQL	Supported
SQL Server	Supported

SQL standard syntax

SELECT ...
WHERE expression [NOT] BETWEEN lower_range AND upper_range

Keywords

WHERE expression: Compares a scalar expression, such as a column, to the range of values bounded by upper_range and lower_range.
[NOT] BETWEEN lower_range AND upper_range: Compares the expression to the lower_range and upper_range. The comparison is inclusive, meaning that it is equivalent to saying “where expression is [not] greater than or equal to lower_range and less than or equal to upper_range.”

Rules at a glance

The BETWEEN operator is used to test an expression against a range of values. It may be used with any data type except BLOB, CLOB, NCLOB, REF, or ARRAY.

For example, this query returns title_ids that have year-to-date sales of between 10,000 and 20,000:

SELECT title_id
FROM titles
WHERE ytd_sales BETWEEN 10000 AND 20000

BETWEEN is inclusive of the range of values listed, so it includes the values 10,000 and 20,000 in the search. If you want an exclusive search, you must use the greater than (>) and less than (<) symbols:

SELECT title_id
FROM titles
WHERE ytd_sales > 10000
  AND ytd_sales < 20000

The NOT operator allows you to search for values outside of the BETWEEN range. For example, you can find all the title_ids that were not published during 2021:

SELECT title_id
FROM titles
WHERE pub_date NOT BETWEEN '01-JAN-2021'
  AND '31-DEC-2021'

Programming tips and gotchas

Some coders are very particular about how the keyword AND is used in WHERE clauses. To prevent a casual reviewer from thinking that the AND used in a BETWEEN operation is a logical AND operator, you might want to use parentheses to encapsulate the entire BETWEEN clause:

SELECT title_id
FROM titles
WHERE (ytd_sales BETWEEN 10000 AND 20000)
  AND pubdate >= '2021-06-12 00:00:00.000'

PostgreSQL also supports @> (the contains operator) and && (the overlaps operator), which serve the same purpose as BETWEEN but for array types, range types, and multi-range types.

EXCEPT Set Operator

The EXCEPT set operator retrieves the result sets of two or more queries, including all the records retrieved by the first query that are not also found in subsequent queries. Whereas JOIN clauses are used to return the rows of two or more queries that are in common, EXCEPT is used to filter out the records that are present in only one of multiple similar tables.

EXCEPT is in a class of keywords called set operators. Other set operators include INTERSECT and UNION. (MINUS is a synonym of the EXCEPT keyword; EXCEPT is the SQL standard.) All set operators are used to simultaneously manipulate the result sets of two or more queries, hence the term “set operators.”

Platform	Command
MySQL	Not supported
MariaDB	Supported, with limitations
Oracle	Supported, with limitations (as `MINUS`)
PostgreSQL	Supported, with limitations
SQL Server	Supported, with limitations

SQL standard syntax

There is technically no limit to the number of queries that you may combine with the EXCEPT operator. The general syntax is:

{SELECT statement1 | VALUES (expr1[, ...])}
EXCEPT [ALL | DISTINCT]
[CORRESPONDING [BY (column1, column2, ...)]]
{SELECT statement2 | VALUES (expr2[, ...])}
EXCEPT [ALL | DISTINCT]
[CORRESPONDING [BY (column1, column2, ...)]]
...

Keywords

VALUES (expr1[, ... ]): Generates a derived result set with explicitly declared values as expr1, expr2, etc.—it’s essentially a SELECT statement result set without the SELECT ... FROM syntax. This is known as a row constructor, since the rows of the result set are manually constructed. According to the SQL standard, multiple handcoded rows in a row constructor must be enclosed in parentheses and separated by commas.
EXCEPT: Determines which rows will be excluded from the final result set.
ALL | DISTINCT: ALL considers duplicate rows from all result sets in the EXCEPT comparison. DISTINCT drops duplicate rows from all result sets prior to the EXCEPT comparison. Any columns containing a NULL value are considered duplicates. (If neither ALL nor DISTINCT is used, the DISTINCT behavior is the default.)
CORRESPONDING: Specifies that only columns with the same name in both queries are returned, even if both queries use the asterisk (*) shortcut.
BY (column1, column2, ... ): Specifies that only the named columns are returned, even if more columns with corresponding names exist in the queries. Must be used with the CORRESPONDING keyword.

Rules at a glance

There is only one significant rule to remember when using EXCEPT: the number and order of the columns should be the same in all queries, and the data types should be of the same category.

The data types do not have to be identical, but they must be compatible. For example, CHAR and VARCHAR are compatible data types. By default, the result set will default to the largest data type size of each column in each ordinal position. For example, a query retrieving rows from VARCHAR(10) and VARCHAR(15) columns will use the VARCHAR(15) data type and size.

Programming tips and gotchas

None of the platforms supports the CORRESPONDING [BY (column1, column2, ... )] clause.

According to the SQL standard, the UNION and EXCEPT set operators evaluate with equal precedence. However, the INTERSECT set operator evaluates before the other set operators. We recommend that you explicitly control the precedence of the set operators using parentheses as a general best practice.

SELECT au_lname AS lastname, au_fname AS firstname
FROM authors
EXCEPT
SELECT emp_lname AS lastname, emp_fname AS firstname
FROM employees
ORDER BY lastname, firstname

While each of the column lists may list columns with correspondingly compatible data types, there may be variation in behavior across the DBMS platforms with regard to the length of the columns. For example, if the au_lname column in the previous example’s first query is markedly longer than the emp_lname column in the second query, the platforms may apply different rules as to which length is used for the final result. In general, though, the platforms will choose the longer (and less restrictive) column size for use in the result set.

Each DBMS may apply its own rules as to which column name is used if the names vary across column lists. In general, the column names of the first query are used.

On platforms that do not support EXCEPT, you can substitute a LEFT JOIN, NOT IN, or NOT EXISTS. The following queries are examples of how you can achieve EXCEPT functionality using NOT EXISTS and NOT IN:

SELECT DISTINCT a.city
FROM authors AS a
WHERE NOT EXISTS
   (SELECT *
    FROM publishers AS p
    WHERE a.city = p.city)

SELECT DISTINCT a.city
FROM authors AS a
WHERE a.city NOT IN
   (SELECT p.city
    FROM pubs.publishers AS p
    WHERE p.city IS NOT NULL)

In general, NOT EXISTS is faster than NOT IN. In addition, there is a subtle issue with NULLs that differentiates the IN and NOT IN operators and the EXISTS and NOT EXISTS set operators. To get around this different handling of NULLs, simply add the IS NOT NULL clause to the WHERE clause, as shown in the preceding example.

The following example illustrates the use of LEFT JOIN:

SELECT DISTINCT a.city
FROM authors AS a 
 LEFT JOIN (SELECT city 
  FROM publishers 
  WHERE city IS NOT NULL) AS p ON a.city = p.city
WHERE p.city IS NULL;

MySQL and MariaDB

MySQL does not support EXCEPT. MariaDB 10.3 and later support the EXCEPT, EXCEPT ALL, and EXCEPT DISTINCT set operators using the SQL standard syntax. For MySQL, you can use the NOT IN or NOT EXISTS operations as alternatives to EXCEPT, as detailed in the previous section.

Oracle

Oracle versions below Oracle 21c do not support the EXCEPT set operator. However, they have an alternative set operator, MINUS, with identical functionality to EXCEPT. Its syntax is as follows:

<SELECT statement1>
MINUS
<SELECT statement2>
MINUS
...

MINUS is the functional equivalent of MINUS DISTINCT; the ALL clause is not implemented.

Oracle does not support MINUS on queries containing:

Columns whose data types are LONG, BLOB, CLOB, BFILE, or VARRAY
A FOR UPDATE clause
TABLE collection expressions

If the first query in a set operation contains any expressions in the SELECTitem list, you must include AS clauses to associate aliases with those expressions. Also, only the last query in the set operation may contain an ORDER BY clause.

For example, you could generate a list of all store IDs that do not have any records in the sales table as follows:

SELECT stor_id FROM stores
MINUS
SELECT stor_id FROM sales

The MINUS command is functionally similar to a NOT IN query. This query retrieves the same results:

SELECT stor_id FROM stores
WHERE stor_id NOT IN
   (SELECT stor_id FROM sales)

PostgreSQL

PostgreSQL supports the EXCEPT and EXCEPT ALL set operators using the basic SQL standard syntax:

<SELECT statement1>
EXCEPT [ALL]
<SELECT statement2>
EXCEPT [ALL]
...

EXCEPT DISTINCT is not supported, but EXCEPT is the functional equivalent. EXCEPT or EXCEPT ALL are not supported on queries with a FOR UPDATE clause.

The first query in the set operation may not contain an ORDER BY clause or a LIMIT clause, although you can define a subquery for the SELECT statement that does include these. Subsequent queries in the EXCEPT or EXCEPT ALL set operation may contain these clauses, but such queries must be enclosed in parentheses. Otherwise, the last occurrence of ORDER BY or LIMIT will be applied to the entire set operation.

PostgreSQL evaluates SELECT statements in a multi-EXCEPT statement from top to bottom, unless you use parentheses to change the evaluation hierarchy of the statements.

Normally, duplicate rows are eliminated from the two result sets, unless you add the ALL keyword. For example, you could find all titles in the authors table that have no records in the sales table using this query:

SELECT title_id
FROM   authors
EXCEPT ALL
SELECT title_id
FROM   sales;

SQL Server

SQL Server supports EXCEPT, but none of its subclauses. For comparison purposes, SQL Server considers NULL values equal when evaluating an EXCEPT result set. If using the SELECT ... INTO statement, only the first query may contain the INTO clause. ORDER BY is only allowed at the end of the statement and is not allowed with each individual query. Conversely, GROUP BY and HAVING clauses can only be used within individual queries and may not be used to affect the final result set. The FOR BROWSE clause may not be used with statements that include EXCEPT.

EXISTS Operator

The EXISTS operator tests a subquery for the existence of rows. All the platforms support the SQL standard syntax.

Platform	Command
MySQL	Supported
Oracle	Supported
PostgreSQL	Supported
SQL Server	Supported

SQL standard syntax

SELECT ...
WHERE [NOT] EXISTS (subquery)

Keywords

WHERE [NOT] EXISTS: Tests the subquery for the existence of one or more rows. If even one row satisfies the subquery clause, it returns a Boolean TRUE value. The optional NOT keyword returns a Boolean TRUE value when the subquery returns no matching rows.
subquery: Retrieves a result set based on a fully formed subquery.

Rules at a glance

The EXISTS operator checks a subquery for the existence of one or more records against the records in the parent query.

For example, if we want to see whether there are any jobs where no employee is filling the position:

SELECT *
FROM jobs
WHERE NOT EXISTS
   (SELECT * FROM employee
    WHERE jobs.job_id = employee.job_id)

This example tests for the absence of records in the subquery using the optional NOT keyword. The next example looks for specific records in the subquery to retrieve the main result set:

SELECT au_lname
FROM authors
WHERE EXISTS
   (SELECT *
    FROM publishers
    WHERE authors.city = publishers.city)

This query returns the last names of authors who live in the same city as their publishers. Note that the asterisk in the subquery is acceptable, since the subquery only needs to return a single record to provide a Boolean TRUE value. Columns are irrelevant in these cases.

Programming tips and gotchas

EXISTS, in many queries, does the same thing as ANY (in fact, it is semantically equivalent to the ANY operator). EXISTS is usually most effective with correlated subqueries.

The EXISTS subquery usually searches for only one of two things. Your first option is to use the asterisk wildcard (e.g., SELECT * FROM ... ) so that you are not retrieving any specific column or value. In this case, the asterisk means “any column.” The second option is to select only a single column in the subquery (e.g., SELECT au_id FROM ... ). The third option is to select a constant, such as (SELECT 1 FROM ... ).

EXISTS can be rewritten using an IN clause. Here is an example written with EXISTS:

SELECT au_id
FROM authors
WHERE EXISTS(SELECT au_id 
  FROM titleauthor AS ta WHERE ta.au_id = authors.au_id)

and its equivalent using IN:

SELECT au_id
FROM authors
WHERE au_id IN(SELECT ta.au_id 
  FROM titleauthor AS ta 
  WHERE ta.au_id = authors.au_id)

FILTER Clause

The FILTER clause is used in conjunction with aggregate functions, except when they are used as window aggregates. PostgreSQL supports the SQL standard syntax for this clause; the other platforms do not support it.

Platform	Command
MySQL	Not supported
Oracle	Not supported
PostgreSQL	Supported
SQL Server	Not supported

SQL standard syntax

The FILTER clause is part of a SELECT statement and qualifies an aggregate function call:

[aggregate_function(input_args) FILTER
   (WHERE search_condition)
input_args := value[,..]

In databases where the FILTER clause is not supported, the functionality can be simulated for aggregates that ignore NULLs with a CASE statement, as follows:

[aggregate_function(input_args)
input_args := CASE WHEN search_condition THEN value 
   ELSE NULL END[,...]

Keywords

aggregate_function: An aggregate function such as AVG, COUNT, COUNT DISTINCT, MAX, MIN, or SUM.
WHERE search_condition: Any condition allowed in a WHERE clause is allowed.

Rules at a glance

The FILTER clause is allowed only in queries that utilize aggregate functions. Here is an example that uses the FILTER clause to count books by price in PostgreSQL:

SELECT SUM(ytd_sales) AS total_sales
   , SUM(ytd_sales) FILTER(WHERE price < '$20.00') 
   AS sales_book_lt_20
FROM titles;

And here is an alternative using CASE:

SELECT SUM(ytd_sales) AS total_sales
   , SUM(CASE WHEN price < '$20.00' THEN ytd_sales 
   ELSE NULL END) AS sales_book_lt_20
FROM titles;

The results are:

total_sales  sales_book_lt_20
-----------  -----------------------
97446        83821

GROUP BY Clause

The GROUP BY clause is used to aggregate or dedupe data. It is often paired with aggregate functions and the HAVING clause.

Platform	Command
MySQL	Supported, with variations
Oracle	Supported, with variations
PostgreSQL	Supported, with variations
SQL Server	Supported, with variations

SQL standard syntax

[GROUP BY group_by_expression
   [HAVING search_condition]]

group_by_expression ::= { (grouping_column[, ...]) | 
   ROLLUP (grouping_column[, ...]) | 
   CUBE (grouping_column[, ...]) | 
   GROUPING SETS (grouping_set_list) | () | 
   grouping_set, grouping_set_list }

Keywords

GROUP BY group_by_expression: Groups result sets into the categories defined in the group_by_expression. Used in queries that utilize aggregate functions such as AVG, COUNT, COUNT DISTINCT, MAX, MIN, and SUM. The group_by_expression of the GROUP BY clause has an elaborate syntax of its own; refer to the following section for examples and more information on ROLLUP, CUBE, and GROUPING SETS.
HAVING search_condition: Adds search conditions on the results of the GROUP BY clause in a manner similar to the WHERE clause. HAVING does not affect the rows used to calculate the aggregates. HAVING clauses may contain subqueries.

Rules at a glance

The GROUP BY clause is needed only in queries that utilize aggregate functions. The HAVING clause is almost always accompanied by a GROUP BY clause, but a GROUP BY clause is often used without a HAVING clause.

The GROUP BY clause

The GROUP BY clause is used to report an aggregated value for one or more rows returned by a SELECT statement based on one or more non-aggregated columns called grouping columns. For example, here is a query that counts up how many people were hired each year during the years 2016 through 2021:

SELECT hire_year, COUNT(emp_id) AS nbr_emps
FROM employee
WHERE status = 'ACTIVE'
  AND hire_year BETWEEN 2016 AND 2021
GROUP BY hire_year;

The results are:

hire_year nbr_emps
--------- --------
2016      27
2017      17
2018      13
2019      19
2020      20
2021      32

Queries using aggregate functions provide many types of summary information. The most common aggregate functions include:

AVG: Returns the average of all non-NULL values in the specified column(s)
AVG DISTINCT: Returns the average of all unique non-NULL values in the specified column(s)
COUNT: Counts the occurrences of all non-NULL values in the specified column(s)
COUNT DISTINCT: Counts the occurrences of all unique non-NULL values in the specified column(s)
COUNT(*): Counts every record in the table
MAX: Returns the highest non-NULL value in the specified column(s)
MIN: Returns the lowest non-NULL value in the specified column(s)
SUM: Totals all non-NULL values in the specified column(s)
SUM DISTINCT: Totals all unique non-NULL values in the specified column(s)

Some queries that use aggregates return a single row and are used to aggregate the whole table. Single-value aggregates are known as scalar aggregates. Scalar aggregates do not need a GROUP BY clause. For example:

-- Query
SELECT AVG(price)
FROM titles

-- Results
14.77

Queries in which all returned columns are aggregates also do not need a GROUP BY clause.

Queries that return both regular column values and aggregate function values are commonly called vector aggregates. Vector aggregates use the GROUP BY clause and return one or many rows.

There are a few rules to follow when using GROUP BY:

Place GROUP BY in the proper clause order—after the WHERE clause and before the ORDER BY clause.
Include all non-aggregate columns in the GROUP BY clause.
Do not use column aliases in the GROUP BY clause (though table aliases are acceptable).

For example, let’s suppose we need to get the total purchase amount of several purchases from an Order_Details table that looks like this:

OrderID      ProductID    UnitPrice            Quantity
-----------  -----------  -------------------  --------
10248        11           14.0000              12
10248        42           9.8000               10
10248        72           34.8000              5
10249        14           18.6000              9
10249        51           42.4000              40
10250        41           7.7000               10
10250        51           42.4000              35
10250        65           16.8000              15
...

We can do this with a query like the following:

SELECT OrderID, SUM(UnitPrice * Quantity) AS Order_Amt
FROM order_details
WHERE orderid IN (10248, 10249, 10250)
GROUP BY OrderID

The results are:

OrderID      Order_Amt
-----------  ----------------
10248        440.0000
10249        1863.4000
10250        1813.0000

We could further refine the aggregations by using more than one grouping column. Consider the following query, which retrieves the average price of our products, grouped first by name and then by size:

SELECT name, size, AVG(unit_price) AS avg
FROM product
GROUP BY name, size

The results are:

name                 size    avg
------------         ------  -----------------------
Flux Capacitor       small   900
P32 Space Modulator  small   1400
Transmogrifier      medium  1400
Acme Rocket          large   600
Land Speeder         large   6500

In addition, the GROUP BY clause supports a few very important subclauses:

{ROLLUP | CUBE} ([grouping_column[, ... ]])[, grouping_set_list]: Groups the aggregate values of the result set by one or more grouping columns. (Without ROLLUP or CUBE, the GROUP BY (grouping_column[, ... ]) clause is the simplest and most common form of the GROUP BY clause.)
ROLLUP: Produces subtotals for each set of grouping columns as a hierarchical result set, adding subtotal and grand total rows into the result set in a hierarchical fashion. ROLLUP operations return one row per grouping column, with NULL appearing in the grouping column to show the subtotaled or totaled aggregate value.
CUBE: Produces subtotals and cross-tabulated totals for all grouping columns. In a sense, the CUBE clause enables you to quickly return multidimensional result sets from standard relational tables without a lot of programmatic work. CUBE is especially useful when working with large amounts of data. Like ROLLUP, CUBE provides subtotals of the grouping columns, but it also includes subtotal rows for all possible combinations of the grouping columns specified in the query.
GROUPING SETS [{ROLLUP | CUBE}] ([grouping_column[, ... ]])[, grouping_set_list]: Enables aggregated groups on several different sets of grouping columns within the same query. This is especially useful when you want to return only a portion of an aggregated result set. The GROUPING SETS clause also lets you select which grouping columns to compare, whereas CUBE returns all of the grouping columns and ROLLUP returns a hierarchical subset of the grouping columns. As the syntax shows, the SQL standard also allows GROUPING SETS to be paired with ROLLUP or CUBE.

Table 4-2 illustrates the differences between the result sets returned by GROUP BY on its own and with each of these subclauses.

GROUP BY syntax Returns the following sets

Table 4-2. `GROUP BY` syntax variations
GROUP BY syntax	Returns the following sets
GROUP BY (col_A, col_B, col_C) GROUP BY ROLLUP (col_A, col_B, col_C) GROUP BY CUBE (col_A, col_B, col_C)	(col_A, col_B, col_C) (col_A, col_B, col_C) (col_A, col_B) (col_A) () (col_A, col_B, col_C) (col_A, col_B) (col_A) (col_B, col_C) (col_B) (col_C) ()

 GROUP BY (col_A, col_B, col_C)
 GROUP BY ROLLUP (col_A, col_B, col_C)



 GROUP BY CUBE (col_A, col_B, col_C)

 (col_A, col_B, col_C)
 (col_A, col_B, col_C)
    (col_A, col_B)
    (col_A)
    ()
 (col_A, col_B, col_C)
    (col_A, col_B)
    (col_A)
    (col_B, col_C)
    (col_B)
    (col_C)
    ()

Each type of GROUP BY clause returns a different set of aggregated values and, in the case of ROLLUP and CUBE, totals and subtotals.

The concepts of ROLLUP, CUBE, and GROUPING SETS are much more intuitive when explained by example. In the following example, we query for data summarizing the number of sales_orders by order_year and by order_quarter:

SELECT order_year AS year, order_quarter AS quarter,
    COUNT (*) AS orders
FROM order_details
WHERE order_year IN (2020, 2021)
GROUP BY ROLLUP (order_year, order_quarter)
ORDER BY order_year, order_quarter;

The results are:

year quarter orders
---- ------- ------
NULL NULL    648     -- grand total
2020 NULL    380     -- total for year 2020
2020 1       87
2020 2       77
2020 3       91
2020 4       125
2021 NULL    268     -- total for year 2021
2021 1       139
2021 2       119
2021 3       10

Adding grouping columns to the query provides more details (and more subtotaling) in the result set. Now let’s modify the previous example by adding a region to the query (but since the number of rows increases, we’ll only look at the first and second quarters):

SELECT order_year AS year, order_quarter AS quarter, region,
   COUNT (*) AS orders
FROM order_details
WHERE order_year IN (2020, 2021)
  AND order_quarter IN (1,2)
  AND region IN ('USA', 'CANADA')
GROUP BY ROLLUP (order_year, order_quarter,region)
ORDER BY order_year, order_quarter, region;

The results are:

year quarter region  orders
---- ------- ------  ------
NULL NULL    NULL    183     -- grand total
2020 NULL    NULL    68      -- subtotal for year 2020
2020 1       NULL    36      -- subtotal for all regions in q1 of 2020
2020 1       CANADA  3
2020 1       USA     33
2020 2       NULL    32      -- subtotal for all regions in q2 of 2021
2020 2       CANADA  3
2020 2       USA     29
2021 NULL    NULL    115     -- subtotal for year 2021
2021 1       NULL    57      -- subtotal for all regions in q1 of 2021
2021 1       CANADA  11
2021 1       USA     46
2021 2       NULL    58      -- subtotal for all regions in q2 of 2021
2021 2       CANADA  4
2021 2       USA     54

The GROUP BY CUBE clause is useful for performing multidimensional analyses on aggregated data. Like GROUP BY ROLLUP, it returns subtotals, but unlike GROUP BY ROLLUP, it returns subtotals combining all of the grouping columns named in the query. (As you will see, it also has the potential to increase the number of rows returned in the result set.)

In the following example, we query for data summarizing the number of sales_orders by order_year and by order_quarter:

SELECT order_year AS year, order_quarter AS quarter,
     COUNT (*) AS orders
FROM order_details
WHERE order_year IN (2020, 2021)
GROUP BY CUBE (order_year, order_quarter)
ORDER BY order_year, order_quarter;

The results are:

year quarter orders
---- ------- ------
NULL NULL    648     -- grand total
NULL 1       226     -- subtotal for q1 of both years
NULL 2       196     -- subtotal for q2 of both years
NULL 3       101     -- subtotal for q3 of both years
NULL 4       125     -- subtotal for q4 of both years
2020 NULL    380     -- total for year 2020
2020 1       87
2020 2       77
2020 3       91
2020 4       125
2021 NULL    268     -- total for year 2021
2021 1       139
2021 2       119
2021 3       10

The GROUP BY GROUPING SETS clause lets you aggregate on more than one group in a single query. For each group set, the query returns subtotals with the grouping column marked as NULL. While the CUBE and ROLLUP clauses place predefined subtotals into the result set, the GROUPING SETS clause allows you to control what subtotals to add to the query. The GROUPING SETS clause will also return a grand total if you include a set with no columns, such as ().

Using a similar example query to the ones shown with ROLLUP and CUBE, this time we’ll subtotal by year and quarter and separately by year:

SELECT order_year AS year, order_quarter AS quarter, COUNT (*)
AS orders
FROM order_details
WHERE order_year IN (2020, 2021)
GROUP BY GROUPING SETS ((order_year, order_quarter), (order_year))
ORDER BY order_year, order_quarter;

The results are:

year quarter orders
---- ------- ------
2020 NULL    380     -- total for year 2020
2020 1       87
2020 2       77
2020 3       91
2020 4       125
2021 NULL    268     -- total for year 2021
2021 1       139
2021 2       119
2021 3       10

Another way to think of GROUPING SETS is to consider it to be like a UNION ALL of more than one GROUP BY query that references different parts of the same data. You can tell the database to add subtotals to a GROUPING SET by simply adding in the ROLLUP or CUBE clause according to how you would like subtotaling to occur.

GROUPING SETS can also be concatenated to concisely generate large combinations of groupings. Concatenated GROUPING SETS yield the cross product of groupings from each of the sets within a GROUPING SETS list. Concatenated GROUPING SETS are compatible with CUBE and ROLLUP, but since they perform a cross product of all GROUPING SETS, they will generate a very large number of final groupings from even a small number of concatenated groupings. This is demonstrated in the example in Table 4-3.

Table 4-3. `GROUP BY GOUPING SETS` syntax
`GROUP BY` syntax	Returns the following sets
GROUP BY (col_A, col_B, col_C)	(col_A, col_B, col_C)
...	...
GROUP BY GROUPING SETS (col_A, col_B) (col_Y, col_Z)	(col_A, col_Y) (col_A, col_Z) (col_B, col_Y) (col_B, col_Z)

You can imagine how large the result set would be if the concatenated GROUPING SETS contained a large number of groupings! However, the information returned can be very valuable and hard to reproduce.

The HAVING clause

The HAVING clause adds search conditions on the result of the GROUP BY clause. The HAVING clause works very much like the WHERE clause, but it applies to the GROUP BY clause. The HAVING clause supports all the same search conditions as the WHERE clause shown earlier. For example, using the same query as at the beginning of the previous section, say we now want to find only those jobs that are performed by more than three people:

-- Query
SELECT   j.job_desc "Job Description",
         COUNT(e.job_id) "Nbr in Job"
FROM     employee e
JOIN     jobs j ON e.job_id = j.job_id
GROUP BY j.job_desc
HAVING   COUNT(e.job_id) > 3

-- Results
Job Description                                    Nbr in Job
-------------------------------------------------- -----------
Acquisitions Manager                               4
Managing Editor                                    4
Marketing Manager                                  4
Operations Manager                                 4
Productions Manager                                4
Public Relations Manager                           4
Publisher                                          7

Note that the SQL standard does not require that an explicit GROUP BY clause appear with a HAVING clause. For example, the following query against the employee table is valid because it has an implied GROUP BY clause:

SELECT COUNT(dept_nbr)
FROM employee
HAVING COUNT(dept_nbr) > 30;

Although it’s valid, this application of the HAVING clause is rather rare.

IN Operator

The IN operator provides a way to delineate a list of values, either explicitly listed or from a subquery, and compare a value against that list in a WHERE or HAVING clause. In other words, it gives you a way to say “Is value A in this list of values?” All the platforms support the SQL standard syntax.

Platform	Command
MySQL	Supported
Oracle	Supported
PostgreSQL	Supported
SQL Server	Supported

SQL standard syntax

{WHERE | HAVING | {AND | OR}}
   value [NOT] IN ({comp_value1, comp_value2[, ...] | subquery})

Keywords

{WHERE | HAVING | {AND | OR}} value: IN is permitted in either the WHERE or the HAVING clause. The IN comparison may also be a part of an AND or OR clause in a multicondition WHERE or HAVING clause. value may be of any data type, but is usually the name of a column of the table referenced by the transaction, or perhaps a host variable when used programmatically.
NOT: Optionally tells the database to look for a result set that contains values that are not in the list.
IN ({comp_value1, comp_value2[, ... ] | subquery}): Defines the list of comparative values (hence, comp_value) to compare against. Each comp_value must be of the same or a compatible data type as the initial value. They are also governed by standard data type rules. For example, string values must be delimited by quotes, while integer values need no delimiters. As an alternative to listing specific values, you may use parentheses to enclose a subquery that returns one or more values of a compatible data type.

Rules at a glance

In the following example, generated on SQL Server, we look for all employees in the employee table of the hr database who have a home state of Georgia, Tennessee, Alabama, or Kentucky:

SELECT *
FROM hr..employee
WHERE home_state IN ('AL','GA','TN','KY')

Similarly, we can look for all employees in the hr database who are authors in the pubs database:

SELECT *
FROM hr..employee
WHERE emp_id IN (SELECT au_id FROM pubs..authors)

We can also use the NOT keyword to return a result set based upon the absence of a value. In the following case, the company headquarters is located in New York, and many workers commute in from neighboring states. We want to see all such workers:

SELECT *
FROM hr..employee
WHERE home_state
   NOT IN ('NY','NJ','MA','CT','RI','DE','NH')

Note that Oracle, while fully supporting the SQL standard functionality, extends the functionality of the IN operator by allowing multiple argument matches. For example, the following SELECT ... WHERE ... IN statement is acceptable on Oracle:

SELECT *
FROM hr..employee e
WHERE (e.emp_id, e.emp_dept)
   IN ( (242, 'sales'), (442, 'mfg'), (747, 'mkt) )

INTERSECT Set Operator

The INTERSECT set operator retrieves the result sets of two or more queries, including only the records retrieved by the first query that are also found in all subsequent queries (that is, it includes only those rows that appear in all the result sets). In some ways, INTERSECT is similar to an INNER JOIN operation (see “JOIN Subclause” for details).

INTERSECT is in a class of keywords called set operators. Other set operators include EXCEPT and UNION. All set operators are used to simultaneously manipulate the result sets of two or more queries; hence the term “set operators.”

Platform	Command
MySQL	Not supported
MariaDB	Supported, with limitations
Oracle	Supported, with limitations
PostgreSQL	Supported, with limitations
SQL Server	Supported, with limitations

SQL standard syntax

There is technically no limit to the number of queries that you may combine with the INTERSECT set operator. The general syntax is:

<SELECT statement1>
INTERSECT [ALL | DISTINCT]
[CORRESPONDING [BY (column1, column2, ...)]]
<SELECT statement2>
INTERSECT [ALL | DISTINCT]
[CORRESPONDING [BY (column1, column2, ...)]]
...

Keywords

INTERSECT: Determines which rows will be included in the final single result set.
ALL | DISTINCT: ALL includes duplicate rows from all result sets in the INTERSECT comparison. DISTINCT drops duplicate rows from all result sets prior to the INTERSECT comparison. Any columns containing a NULL value are considered duplicates. (If neither ALL nor DISTINCT is used, the DISTINCT behavior is the default.)
CORRESPONDING: Specifies that only columns with the same name in both queries are returned, even if both queries use the asterisk (*) shortcut.
BY (column1, column2, ... ): Specifies that only the named columns are returned, even if more columns with corresponding names exist in the queries. Must be used with the CORRESPONDING keyword.

Rules at a glance

There is only one significant rule to remember when using INTERSECT: the order and number of columns must be the same in all of the queries.

Also, while the data types of the corresponding columns do not have to be identical, they must be compatible (for example, CHAR and VARCHAR are compatible data types). By default, the result set will default to the largest of the columns in each ordinal position.

Programming tips and gotchas

None of the platforms support the SQL standard CORRESPONDING [BY (column1, column2, ... )] clause.

The SQL standard evaluates INTERSECT as higher priority than other set operators, but not all platforms evaluate set operator precedence the same way. You can explicitly control the precedence of set operators using parentheses. Otherwise, the DBMS might evaluate the expressions either from leftmost to rightmost or from first to last.

According to the standard, only one ORDER BY clause is allowed in the entire query. It should be included at the end of the last SELECT statement. To avoid column and table ambiguity, be sure to alias each column of each table with the same respective alias. For example:

SELECT a.au_lname AS last_name, a.au_fname AS first_name
FROM authors AS a
INTERSECT
SELECT e.emp_lname AS last_name, e.emp_fname AS last_name
FROM employees AS e
ORDER BY last_name, first_name

Also, be aware that while your column data types may be compatible throughout the queries in the INTERSECT, there may be some variation in behavior across the DBMS platforms with regard to varying length of the columns. For example, if the au_lname column in the first query is markedly longer than the emp_lname column in the second query, different platforms may apply different rules as to which length is used for the final result. In general, though, the platforms will choose the longer (and less restrictive) column size for use in the result set.

Each DBMS may apply its own rules as to which column name is used if the columns across the tables have different names. In general, the column names from the first query are used.

On platforms that do not support INTERSECT, substitute a query using INNER JOIN.

Here is the earlier example rewritten as an INNER JOIN:

SELECT DISTINCT a.au_lname AS last_name, a.au_fname AS first_name
FROM authors AS a 
   INNER JOIN employees AS e 
   ON (a.au_lname = e.emp_lname AND a.au_fname = e.emp_fname)
ORDER BY last_name, first_name

MySQL and MariaDB

MySQL does not support INTERSECT. MariaDB 10.3 and later support the INTERSECT, INTERSECT ALL, and INTERSECT DISTINCT set operators using the SQL standard syntax.

Oracle

Oracle supports the INTERSECT and INTERSECT ALL set operators using the basic SQL standard syntax. INTERSECT DISTINCT is not supported, but INTERSECT is the functional equivalent.

For example, you could find all store IDs that also have sales using this query:

SELECT stor_id FROM stores
INTERSECT
SELECT stor_id FROM sales

Oracle does not support INTERSECT on the following types of queries:

Queries containing columns with LONG, BLOB, CLOB, BFILE, or VARRAY data types
Queries containing a FOR UPDATE clause or a TABLE collection expression

If the first query in the set operation contains any expressions in the SELECT item list, you should include the AS keyword to associate an alias with the column resulting from the expression. Also, only the first query in the set operation may contain an ORDER BY clause.

PostgreSQL

PostgreSQL supports the INTERSECT and INTERSECT ALL set operators using the basic SQL standard syntax, but not on queries with a FOR UPDATE clause. INTERSECT DISTINCT is not supported, but INTERSECT is the functional equivalent.

For example, you can find all authors who are also employees and whose last last names start with “P” as follows:

SELECT a.au_lname
FROM   authors AS a
WHERE  a.au_lname LIKE 'P%'
INTERSECT
SELECT e.lname
FROM   employee AS e
WHERE  e.lname LIKE 'W%';

The first query in the set operation may not contain an ORDER BY clause or a LIMIT clause. Subsequent queries in the INTERSECT [ALL] set operation may contain these clauses, but such queries must be enclosed in parentheses. Otherwise, the rightmost occurrence of ORDER BY or LIMIT will be assumed to apply to the entire set operation.

SQL Server

SQL Server supports INTERSECT, but not its subclauses. The column names of the result set are those returned by the first query. Any column names or aliases referenced in an ORDER BY clause must appear in the first query. When using INTERSECT (or EXCEPT) to compare more than two result sets, each pair of result sets (i.e., each pair of queries) is compared before moving to the next pair in the order of expressions in parentheses first, INTERSECT set operators second, and EXCEPT and UNION last in order of appearance.

Also note that you can use NOT IN or NOT EXISTS operations in conjunction with a correlated subquery, as alternatives. Refer to the sections on IN and EXISTS for examples.

IS Operator

The IS operator determines whether a value is NULL or not. All the platforms support the SQL standard syntax.

Platform	Command
MySQL	Supported
Oracle	Supported
PostgreSQL	Supported
SQL Server	Supported

SQL standard syntax

{WHERE | {AND | OR}} expression IS [NOT] NULL

Keywords

{WHERE | {AND | OR}} expression IS NULL: Returns a Boolean value of TRUE if the expression is NULL, and FALSE if the expression is not NULL. The expression evaluated for NULL can be preceded by the WHERE keyword or the AND or OR keywords.
NOT: Inverses the predicate: the statement will instead return a Boolean TRUE if the value of expression is not NULL, and FALSE if the value of expression is NULL.

Rules at a glance

Because the value of NULL is unknown, you cannot use comparison expressions to determine whether a value is NULL. For example, the expressions X = NULL and X <> NULL cannot be resolved because no value can equal, or not equal, an unknown.

Instead, you must use the IS NULL operator. Be sure that you do not put the word NULL within quotation marks, because if you do that, the DBMS will interpret the value as the word “NULL” and not the special value NULL.

Programming tips and gotchas

Some platforms support the use of a comparison operator to determine whether an expression is NULL. However, all platforms covered by this book now support the SQL standard IS [NOT] NULL syntax.

Sometimes, checking for NULL will make your WHERE clause only slightly more complex. For example, rather than a simple predicate to test the value of stor_id, as shown here:

SELECT stor_id, ord_date
FROM sales
WHERE stor_id IN (6630, 7708)

you can add a second predicate to accommodate the possibility that stor_id might be NULL:

SELECT stor_id, ord_date
FROM sales
WHERE stor_id IN (6630, 7708)
   OR stor_id IS NULL

JOIN Subclause

The JOIN subclause enables you to retrieve rows from two or more logically related tables. You can define many different join conditions and types of joins, though the types of joins supported by the different platforms vary greatly.

Platform	Command
MySQL	Supported, with variations
Oracle	Supported, with variations
PostgreSQL	Supported, with variations
SQL Server	Supported, with limitations

SQL standard syntax

FROM table [AS alias] { [join_type] JOIN [LATERAL] joined_table
[[AS] alias]
   { ON join_condition1 [{AND | OR} join_condition2] [...] |
   USING (column1[, ...]) }} |
 [ PARTITION BY (column1[, ...])]
[...]

Keywords

FROM table

Defines the first table or view in the join.

[join_type] JOIN [LATERAL] joined_table

Specifies the type of JOIN and the second (and any subsequent) table(s) in the join. You may also define an alias on any of the joined_tables. The join types are:

CROSS JOIN: Specifies the complete cross product of two tables. For each record in the first table, all the records in the second table are joined, creating a huge result set. This command has the same effect as leaving off the join condition, and its result set is also known as a Cartesian product.

Cross joins are not advisable or recommended.
[INNER] JOIN: Specifies that unmatched rows in either table of the join should be discarded. If no join type is explicitly defined, this is the default.
LEFT [OUTER] JOIN: Specifies that all records be returned from the table on the left side of the join statement. If a record returned from the left table has no matching record in the table on the right side of the join, it is still returned. Columns from the right table return NULL values when there is no matching row. It is a good idea to configure all your outer joins as left outer joins (rather than mixing left and right outer joins) wherever possible, for consistency.
RIGHT [OUTER] JOIN: Specifies that all records be returned from the table on the right side of the join statement, even if the table on the left has no matching record. Columns from the left table return NULL values when there is no matching row.
FULL [OUTER] JOIN: Specifies that all rows from both tables be returned, regardless of whether a row from one table matches a row in the other table. Any columns that have no value in the corresponding joined table are assigned a NULL value.
NATURAL: Specifies that the join (either inner or outer) should be performed on all columns of identical name shared between the two tables. Consequently, you should not specify join conditions using the ON or USING clauses. The query will fail if you issue it on two tables that do not contain any columns with the same name(s).
LATERAL: The LATERAL keyword can be used with a LEFT JOIN or CROSS JOIN. It denotes a correlated subquery or function call where elements from previously specified tables are used in the subquery or as arguments to the function. The function used can return more than one row.

[AS] alias

Specifies an alias or shorthand for the joined table. The AS keyword is optional when specifying an alias.

ON join_condition

Joins together the rows of the table shown in the FROM clause and the rows of the table declared in the JOIN clause. You may have multiple JOIN statements, all based on a common set of values. These values are usually contained in columns of the same name and data type appearing in both of the tables being joined. These columns, or possibly a single column from each table, are called the join key or common key. Most (but not all) of the time, the join key is the primary key of one table and a foreign key in the other table. As long as the values in the columns match, the join can be performed.

join_conditions are syntactically depicted in the following form (note that join types are intentionally excluded in this example):

FROM table_name1
JOIN table_name2
   ON table_name1.column1 = table_name2.column2
      [{AND|OR} table_name1.column3 = table_name2.column4]
      [...]
JOIN table_name3
   ON table_name1.columnA = table_name3.columnA
      [{AND|OR} table_name1.column3 = table_name2.column4]
      [...]
[JOIN...]

Use the AND operator and the OR operator to issue a JOIN with multiple conditions. It is also a good idea to use brackets around each pair of joined tables if more than two tables are involved, as this makes reading the query much easier.

USING (column[, ... ]): Assumes an equality condition on one or more named columns that appear in both tables. The column (or columns) must exist, as named, in both tables. Writing a USING clause is a little quicker than writing ... ON table1.columnA = table2.columnA, but the results are functionally equivalent.
PARTITION BY (column1[, ... ]): Useful for filling gaps in result sets. Only Oracle supports this clause. Refer to the Oracle section for an example.

Rules at a glance

Joins enable you to retrieve records from two (or more) logically related tables in a single result set. You can use a SQL standard JOIN (detailed here) to perform this operation, or something called a theta join. Theta joins, which use a WHERE clause to establish the filtering criteria, are the “old” way to do join operations.

For example, you might have a table called employee that tracks information about everyone employed in your company. The employee table, however, doesn’t contain extensive information about the job an employee holds; instead, it holds only job_ids. All information about the job, such as its description and title, are stored in a table called job. Using a join, you can easily return columns from both tables in a single set of records. The following sample queries illustrate the difference between a theta join and a SQL standard JOIN:

/* Theta join */
SELECT emp_lname, emp_fname, job_title
FROM employee, jobs
WHERE employee.job_id = jobs.job_id;

/* SQL standard join */
SELECT emp_lname, emp_fname, job_title
FROM employee
JOIN jobs ON employee.job_id = jobs.job_id;

Whenever you reference multiple columns in a single query, the columns must be unambiguous. In other words, the columns must either be unique to each table or be referenced with a table identifier, as is done for the job_id column in the preceding example (any columns in the query that don’t exist in both tables don’t need to be qualified by table identifiers). However, queries like this are often hard to read. The following variation of the previous SQL standard JOIN is in better form, because it uses the short, easy-to-read aliases e and j to refer to the employee and jobs tables:

SELECT e.emp_lname, e.emp_fname, j.job_title
FROM employee AS e
JOIN jobs AS j ON e.job_id = j.job_id;

The previous examples were limited to equi-joins, or joins based on equality using an equals sign (=). However, most other comparison operators are also allowed: you can perform joins with >, <, >=, <=, <>, and so forth.

You cannot join on binary large object data types (e.g., BLOB) or any other large object data types (e.g., CLOB, NLOB, etc.). Other data types are usually allowed in a join comparison.

Following are examples of each type of join:

CROSS JOIN

Here are some cross join examples. The first is a theta join that simply leaves off the join conditions, the second is written using the CROSS JOIN clause, and the final query is similar in concept to the first, with a JOIN clause that omits the join conditions:

SELECT *
FROM employee, jobs;

SELECT *
FROM employee
CROSS JOIN jobs;

SELECT *
FROM employee
JOIN jobs;

As mentioned earlier, cross joins—joins between two or more tables that return all the data for all the rows in all possible variations (i.e., the Cartesian product of the tables—are a really bad idea. Take a careful look at these examples so you know what they look like, and then avoid them!

INNER JOIN

Following is an inner join written using the SQL standard syntax:

SELECT a.au_lname AS 'last name',
   a.au_fname AS 'first name',
   p.pub_name AS 'publisher'
FROM authors AS a
INNER JOIN publishers AS p ON a.city = p.city
ORDER BY a.au_lname DESC

There are lots of authors in the authors table, but very few of them have cities that match their publishers’ cities in the publishers table. For example, the preceding query executed in the pubs database on SQL Server produces results like the following:

last name      first name             publisher
---------------  --------------------  ------------------
Carson          Cheryl                Algodata Infosystems
Bennet          Abraham               Algodata Infosystems

The join is called an inner join because only those records that meet the join condition in both tables are said to be “inside” the join. You could also issue the same query, on platforms that support it, by substituting the USING clause for the ON clause:

SELECT a.au_lname AS 'last name',
   a.au_fname AS 'first name',
   p.pub_name AS 'publisher'
FROM authors AS a
INNER JOIN publishers AS p USING (city)
ORDER BY a.au_lname DESC

The results for this query would be the same.

LEFT [OUTER] JOIN

Following is an example of a left outer join, where we ask for the publisher for each author (we could also substitute the USING clause for the ON clause, as shown in the previous inner join example):

SELECT a.au_lname AS "last name",
   a.au_fname AS "first name",
   p.pub_name AS "publisher"
FROM authors AS a
LEFT OUTER JOIN publishers AS p ON a.city = p.city
ORDER BY a.au_lname DESC

In this example every author from the left (authors) table will be returned, along with the publisher’s name where there is a match, or a NULL value where there is no match. For example, in the SQL Server pubs database, the query returns:

last name     first name            publisher
--------------- -------------------- ----------------
Yokomoto       Akiko                NULL
White          Johnson              NULL
Stringer       Dirk                 NULL
Straight       Dean                 NULL
...

As mentioned earlier, it’s best to avoid mixing left and right outer joins, for greater consistency. Left joins are the more common choice.

RIGHT [OUTER] JOIN

A right outer join is essentially the same as a left outer join, except it returns all the records from the table on the right side of the query. For example, the following query executed in the pubs database on SQL Server:

SELECT a.au_lname AS "last name",
   a.au_fname AS "first name",
   p.pub_name AS "publisher"
FROM authors AS a
RIGHT OUTER JOIN publishers AS p ON a.city = p.city
ORDER BY a.au_lname DESC

returns the following result set:

last name       first name        publisher
---------------- ---------------- ------------------------
Carson           Cheryl           Algodata Infosystems
Bennet           Abraham          Algodata Infosystems
NULL             NULL             New Moon Books
NULL             NULL             Binnet & Hardley
...

Every publisher from the right (publishers) table is returned, along with the authors’ names where there is a match, or a NULL value where there is no match. Again, it’s best to avoid mixing right and left outer joins.

NATURAL [INNER | {LEFT | RIGHT} [OUTER]] JOIN

Natural joins are a substitute for the ON or USING clause, so do not use NATURAL with those clauses. For example:

SELECT a.au_lname AS "first name",
   a.au_fname AS "last name",
   p.pub_name AS "publisher"
FROM authors AS a
NATURAL RIGHT OUTER JOIN publishers AS p
ORDER BY a.au_lname DESC

The preceding query will work the same as the earlier examples, but only if both tables possess a column called city and that is the only column that they hold in common. You could similarly perform any of the other types of joins (INNER, FULL, OUTER) using the NATURAL prefix.

We suggest you avoid natural joins. They save a couple of keystrokes, but at the expense of possible breakage of your code in the future. For example, suppose you have a date_add column in your authors table but not in your publishers table, but you later decide to add that column to the publishers table. Natural join queries you wrote previously will suddenly start giving very unexpected results because they will now also be joining by date_add.

FULL [OUTER] JOIN

If we take our previous example query and render it as a FULL JOIN, it looks like this (note that the OUTER keyword is optional):

SELECT a.au_lname AS "last name",
   a.au_fname AS "first name",
   p.pub_name AS "publisher"
FROM authors AS a
FULL JOIN publishers AS p ON a.city = p.city
ORDER BY a.au_lname DESC;

The result set returned by the query is actually the accumulation of the result sets of issuing separate LEFT and RIGHT join queries (some records have been excluded for brevity):

last name           first name            publisher
-------------------- -------------------- --------------------
Yokomoto             Akiko                NULL
White                Johnson              NULL
Stringer             Dirk                 NULL
...
Dull                 Ann                  NULL
del Castillo         Innes                NULL
DeFrance             Michel               NULL
Carson               Cheryl               Algodata Infosystems
Blotchet-Halls       Reginald             NULL
Bennet               Abraham              Algodata Infosystems
NULL                 NULL                 Binnet & Hardley
NULL                 NULL                 Five Lakes Publishing
NULL                 NULL                 New Moon Books
...
NULL                 NULL                 Scootney Books
NULL                 NULL                 Ramona Publishers
NULL                 NULL                 GGG&G

As you can see, with a FULL JOIN you get some records with all of the data (LEFT and RIGHT JOINs), some with the NULLs on the right and data on the left (LEFT JOIN), and some with NULLs on the left and data on the right (RIGHT JOIN).

LATERAL {query | function}

This example uses SQL standard syntax to return the three top-priced books by each author and will only include authors who have at least one title published. You can achieve the same results in MySQL and older versions of PostgreSQL by replacing the FETCH FIRST 3 ROWS ONLY with LIMIT 3:

SELECT a.au_lname AS "first name",
   a.au_fname AS "last name",
   topt.title, topt.pubdate
FROM authors AS a
  CROSS JOIN LATERAL
  (SELECT t.title, t.pubdate
  FROM titles AS t
       INNER JOIN titleauthor AS ta ON
            t.title_id = ta.title_id
  WHERE ta.au_id = a.au_id
  ORDER BY t.pubdate DESC
  FETCH FIRST 3 ROWS ONLY
  ) AS topt
ORDER BY a.au_lname ASC, topt.pubdate DESC;

If you wanted to list all authors even if they have no published titles, you would use a left join as follows:

SELECT a.au_lname AS "first name",
   a.au_fname AS "last name",
   topt.title, topt.pubdate
FROM authors AS a
  LEFT JOIN LATERAL
  (SELECT t.title, t.pubdate
  FROM titles AS t
       INNER JOIN titleauthor AS ta ON
            t.title_id = ta.title_id
  WHERE ta.au_id = a.au_id
  ORDER BY t.pubdate DESC
  LIMIT 3
  ) AS topt ON (1=1)
ORDER BY a.au_lname ASC, topt.pubdate DESC;

The LATERAL keyword can be used only with CROSS JOIN or LEFT JOIN.

Programming tips and gotchas

As described in the preceding section, there are many types of joins, each with their own rules and behaviors. If an explicit join_type is omitted, an INNER JOIN is assumed.

In general, you should favor the JOIN clause over the WHERE clause for describing join expressions. This not only keeps your code cleaner, making it easy to differentiate join conditions from search conditions, but also avoids the possibility of buggy behavior resulting from some platform-specific implementations of outer joins specified using the WHERE clause.

In general, we do not recommend the use of labor-saving keywords like NATURAL, since the subclause will not automatically update itself when the structures of the underlying tables change. Consequently, statements using these constructs may fail when a table change is introduced without also changing the query.

Not all join types are supported by all platforms, so refer to the following sections for full details on platform-specific join support.

Joins involving more than two tables can be difficult. When joins involve three or more tables, it is a good idea to think of the query as a series of two table joins.

MySQL

MySQL supports most SQL standard syntax, except that natural joins are supported only on outer joins, not on inner joins. MySQL also does not support the PARTITION BY clause. The JOIN syntax is:

FROM table [AS alias]
{[STRAIGHT_JOIN joined_table] |
{ {[INNER] | [CROSS] |
    [NATURAL] [ {LEFT | RIGHT | FULL} [OUTER] ]}
   JOIN [LATERAL] joined_table [AS alias]
   { ON join_condition1 [{AND|OR} join_condition2] [...] } |
     USING (column1[, ...]) }}
[...]

where:

STRAIGHT_JOIN

Forces the optimizer to join tables in the exact order in which they appear in the FROM clause. The STRAIGHT_JOIN keyword is functionally equivalent to JOIN, except that it forces the join order from left to right. This option was supplied because MySQL might, rarely, join the tables in the wrong order.

Refer to “Rules at a glance” for examples.

MySQL is very fluid in the way it supports joins. You can use several different syntaxes to perform a join; for example, you can explicitly declare a join in a query using the JOIN clause, but then show the join condition in the WHERE clause. The other platforms force you to pick one method or the other and do not allow you to mix them in a single query. However, we think it’s bad practice to mix methods, so our examples use SQL standard JOIN syntax.

Oracle

Oracle fully supports the SQL standard JOIN syntax. There’s also an older syntax for outer theta joins that involves adding “(+)” to the column names on the opposite side of the direction of the join (this comes from the fact that the table supplying the NULL value rows in effect has NULL value rows added to it). However, this syntax should be avoided because it doesn’t support some of the features of SQL standard joins, such as FULL JOIN. Oracle also supports CROSS APPLY and OUTER APPLY clauses (also found in SQL Server), which are equivalent to the SQL standard’s LATERAL clause.

For example, the following query does a RIGHT OUTER JOIN on the authors and publishers tables:

SELECT a.au_lname AS 'first name',
   a.au_fname AS 'last name',
   p.pub_name AS 'publisher'
FROM authors AS a
RIGHT OUTER JOIN publishers AS p ON a.city = p.city
ORDER BY a.au_lname DESC

The old Oracle syntax looks like this:

SELECT a.au_lname AS 'first name',
   a.au_fname AS 'last name',
   p.pub_name AS 'publisher'
FROM authors a, publishers p
WHERE a.city(+) = p.city
ORDER BY a.au_lname DESC

Refer to “Rules at a glance” for more JOIN examples.

Oracle is unique in offering partitioned outer joins, which are useful for filling gaps in result sets due to sparse data storage. For example, assume we store production records in a manufacturing table keyed on day and product ID. The table holds a row showing the quantity of each product produced during any day on which it is made, but there are no rows for the days it is not produced. This is considered sparse data, since a list of all rows will not show every day for every product. For calculation and reporting purposes, it’s very useful to be able to create result sets where each product has a row for every day, regardless of whether or not it was manufactured on that day. A partitioned outer join makes it simple to do that, since it lets you define a logical partition and apply an outer join to each partition value. The following example does a partitioned outer join with a times table to make sure each product_id has the full set of dates in a specified time range:

SELECT times.time_id AS time, product_id AS id, quantity AS qty
FROM manufacturing
PARTITION BY (product_id)
RIGHT OUTER JOIN times
ON (manufacturing.time_id = times.time_id)
WHERE manufacturing.time_id
BETWEEN TO_DATE('01/10/05', 'DD/MM/YY')
    AND TO_DATE('06/10/05', 'DD/MM/YY')
ORDER BY 2, 1;

Here is the output from this query:

time          id        qty
--------- ------        ---
01-OCT-05    101         10
02-OCT-05    101
03-OCT-05    101
04-OCT-05    101         17
05-OCT-05    101         23
06-OCT-05    101
01-OCT-05    102
02-OCT-05    102
03-OCT-05    102         43
04-OCT-05    102         99
05-OCT-05    102
06-OCT-05    102         87

Getting these results without using a partitioned outer join would require much more complex and less efficient SQL.

PostgreSQL

PostgreSQL fully supports the SQL standard syntax, except for the PARTITION BY clause. Refer to “Rules at a glance” for examples. When using functions in a LATERAL construct, the LATERAL keyword is optional.

For example, if you wanted to create a set of dates from publication date to present date for each title, you could write your LATERAL query as follows:

SELECT title_id, i AS cal_date
FROM titles CROSS JOIN 
   generate_series(titles.pubdate, CURRENT_DATE, interval '1 day') 
   AS i
ORDER BY title_id, cal_date

Or with the LATERAL keyword:

SELECT title_id, i AS cal_date
FROM titles CROSS JOIN
   LATERAL generate_series(titles.pubdate, CURRENT_DATE, 
      interval '1 day') 
   AS i
ORDER BY title_id, cal_date

SQL Server

SQL Server supports INNER, OUTER, and CROSS joins using the ON clause. It does not support NATURAL join syntax, nor PARTITION BY or the USING clause. SQL Server also does not support the LATERAL clause, though it does support CROSS APPLY, which is equivalent to CROSS JOIN LATERAL, and OUTER APPLY, which is equivalent to LEFT JOIN LATERAL. SQL Server’s JOIN syntax is:

FROM table [AS alias]
{ {[INNER] | [CROSS] | [ {LEFT | RIGHT | FULL} [OUTER] ]}
   [JOIN | APPLY]  joined_table [AS alias]
      { ON join_condition1 [{AND|OR}
       join_condition2] [...] } }
[...]

Refer to “Rules at a glance” for examples.

The equivalent SQL Server query for the LATERAL example is:

SELECT a.au_lname AS "first name",
   a.au_fname AS "last name",
   topt.title, topt.pubdate
FROM authors AS a
  CROSS APPLY
  (SELECT TOP 3 t.title, t.pubdate
  FROM titles AS t
       INNER JOIN titleauthor AS ta ON
            t.title_id = ta.title_id
  WHERE ta.au_id = a.au_id
  ORDER BY t.pubdate DESC
  ) AS topt
ORDER BY a.au_lname ASC, topt.pubdate DESC;

If you wanted to make sure all authors are listed even if they have no published titles, you would use the OUTER APPLY clause, as follows:

SELECT a.au_lname AS "first name",
   a.au_fname AS "last name",
   topt.title, topt.pubdate
FROM authors AS a
  OUTER APPLY
  (SELECT TOP 3 t.title, t.pubdate
  FROM titles AS t
       INNER JOIN titleauthor AS ta ON
            t.title_id = ta.title_id
  WHERE ta.au_id = a.au_id
  ORDER BY t.pubdate DESC
  ) AS topt
ORDER BY a.au_lname ASC, topt.pubdate DESC;

LIKE Operator

The LIKE operator enables specified string patterns in SELECT, INSERT, UPDATE, and DELETE statements to be matched, specifically in the WHERE clause. A specified pattern may include special wildcard characters. The specific wildcards supported vary from platform to platform.

Platform	Command
MySQL	Supported
Oracle	Supported
PostgreSQL	Supported, with variations
SQL Server	Supported, with variations

SQL standard syntax

WHERE expression [NOT] LIKE string_pattern
   [ESCAPE escape_sequence]

Keywords

WHERE expression [NOT] LIKE: Returns a Boolean TRUE when the value of expression matches the string_pat⁠tern. The expression may be a column, a constant, a host variable, a scalar function, or a concatenation of any of these. It should not be a user-defined type, nor should it be certain LOB types.
NOT: Inverses the predicate, so the statement returns a Boolean TRUE if the value of expression does not contain the string_pattern and returns FALSE if the value of expression contains the string_pattern.
ESCAPE escape_sequence: Allows you to search for the presence of characters that would normally be interpreted as wildcards.

Rules at a glance

Matching string patterns is easy with LIKE, but there are a couple of rules to remember:

All characters, including trailing and leading spaces, are important.
Differing data types may be compared using LIKE, but they store string patterns differently. In particular, be aware of the differences between the CHAR, VARCHAR, and DATE data types.
Using LIKE may negate indexes or force the DBMS to use alternative, less optimal indexes than a straight comparison operation.

The SQL standard standard currently supports two wildcard operators that are supported by all of the platforms covered in this book:

%: Matches any string
_ (underscore): Matches any single character

The first query in the following example retrieves any city record with “ville” in its name. The second query returns authors with a first name not like Sheryl or Cheryl (or Aheryl, Bheryl, Dheryl, 2heryl, and so forth):

SELECT * FROM authors
WHERE city LIKE '%ville%';
SELECT * FROM authors
WHERE au_fname NOT LIKE '_heryl';

Some of the platforms support additional wildcard symbols. These are described in the platform-specific sections that follow.

Use of the ESCAPE clause allows you to look for wildcard characters in the strings stored in your database. Using this mechanism, you designate a character—typically a character that does not otherwise appear in the pattern string—as your escape character. For example, you might designate the tilde (~) because you know it never appears in the pattern string. Any wildcard character preceded by the escape sequence is then treated not as a wildcard, but rather as the character itself. For example, we can look through the comments column of the sales_detail table (on SQL Server) to see whether any customers have mentioned a newly introduced discount using this query:

SELECT ord_id, comment
FROM sales_detail
WHERE comment LIKE '%~%%' ESCAPE '~'

In this case, the first and last %s are interpreted as wildcards, but the second % character is interpreted as just that (a % character), because it is preceded by the designated escape sequence.

Programming tips and gotchas

The usefulness of LIKE is based on the wildcard operators that it supports. LIKE returns a Boolean TRUE value when the comparison finds one or more matching values.

The default case sensitivity of the DBMS is very important to the behavior of LIKE. For example, SQL Server is not case-sensitive by default (though it can be configured that way), so it will evaluate the strings 'DAD' and 'dad' to be equal. MySQL is also case-insensitive by default but has a LIKE BINARY operator to force case sensitivity. Oracle and PostgreSQL, on the other hand, are case-sensitive. Thus, on those platforms a comparison of 'DAD' and 'dad' would show them to be unequal. PostgreSQL (but not Oracle) has an ILIKE operator for case-insensitive matching. Here’s an example query to better illustrate this point:

SELECT *
FROM authors
WHERE lname LIKE 'LARS%'

This query on MySQL and SQL Server would find authors whose last names are stored as 'Larson' or 'Lars', even though the search was for the uppercase 'LARS%'. Oracle and PostgreSQL, however, would not find 'Larson' or 'Lars', because they perform case-sensitive comparisons by default.

MySQL

MySQL supports the SQL standard syntax for LIKE. Additionally, it supports the special functions REGEXP, RLIKE, NOT REGEXP, and NOT RLIKE for the evaluation of regular expressions.

Oracle

Oracle supports the SQL standard syntax for LIKE. Its LIKE syntax is as follows:

WHERE expression [NOT] {LIKE | LIKEC | LIKE2 |
   LIKE4} string_pattern
[ESCAPE escape_sequence]

The Oracle-specific syntax elements have the following meanings:

LIKEC: Uses Unicode complete characters
LIKE2: Uses Unicode USC2 code points
LIKE4: Uses Unicode UCS4 code points

Since Oracle is case-sensitive, you should enclose the expression, the string_pat⁠tern, or both with the UPPER function. That way, you are always comparing apples to apples.

PostgreSQL

PostgreSQL supports the SQL standard syntax for LIKE. It’s case-sensitive by default but provides the keyword ILIKE for case-insensitive pattern matching. You can also use the operators ~~ as an equivalent to LIKE, ~~* for ILIKE, and !~~ and !~~* for NOT LIKE and NOT ILIKE, respectively. These are all extensions to the SQL standard syntax.

For example, the following queries are functionally the same:

SELECT * FROM authors
WHERE city LIKE '%ville';
SELECT * FROM authors
WHERE city ~~ '%ville';

Since these queries are in lowercase, you might run into a case-sensitivity problem. That is, the queries are looking for a lowercase '%ville', but the table might contain uppercase (and unequal) values such as 'BROWNSVILLE', 'NASHVILLE', and 'HUNTSVILLE'. You can get around this as follows:

-- Convert the values to uppercase
SELECT * FROM authors
WHERE city LIKE UPPER('%ville');

-- Perform the pattern match using case insensitivity
SELECT * FROM authors
WHERE city ~~* '%ville';
SELECT * FROM authors
WHERE city ILIKE '%ville';

Although beyond the scope of this text, you should be aware that PostgreSQL also supports POSIX regular expressions. See the platform documentation for details.

SQL Server

SQL Server supports the SQL standard syntax for LIKE, and the following additional wildcard operators:

[ ]: Matches any value in the specified set, as in [abc], or range, as in [k–n]
[^ ]: Matches any characters not in the specified set or range

Using SQL Server’s additional wildcard operators, you have some added capabilities. For example, you can retrieve any author with a last name like Carson, Carsen, Karson, or Karsen:

SELECT * FROM authors
WHERE au_lname LIKE '[CK]ars[eo]n'

or you can retrieve any author with a last name that ends in “arson” or “arsen,” but is not Larsen or Larson:

SELECT * FROM authors
WHERE au_lname LIKE '[A-Z^L]ars[eo]n'

Remember that when you’re performing string comparisons with LIKE, all characters in the pattern string are significant, including all leading and trailing blank spaces.

ORDER BY Clause

The ORDER BY clause specifies the sort order of the result set retrieved by a SELECT statement.

Platform	Command
MySQL	Supported, with limitations
Oracle	Supported, with variations
PostgreSQL	Supported, with variations
SQL Server	Supported, with limitations

SQL standard syntax

ORDER BY { sort_expression [COLLATE collation_name]
   [ASC | DESC] [NULLS {FIRST | LAST}] }[, ...]
[ OFFSET int {ROW | ROWS} ]
[ FETCH {FIRST | NEXT} numeric {ROW | ROWS | PERCENT}
   {ONLY | WITH TIES} ]

Keywords

ORDER BY: Specifies the order in which rows should be returned by a query. You should not anticipate a specific ordering if you exclude the ORDER BY clause, even if you specify a GROUP BY clause and it appears that a sort has been done.
sort_expression: Specifies an item in the query that will help determine the order of the result set. You can have multiple sort expressions. They are usually column names or column aliases from the query; however, they may also be expressions like (salary * 1.02).
COLLATE collation_name: Overrides the default collation of the sort_expression and applies the collation_name to the expression for the purposes of the ORDER BY clause.
ASC | DESC: Specifies that the result set should be returned in either ascending order (ASC) or descending order (DESC).
NULLS {FIRST | LAST}: NULLS FIRST and NULLS LAST specify that the records containing NULLs should appear either first or last, respectively. By default, Oracle and PostgreSQL place NULLs last for ascending-order sorts and first for descending-order sorts.
OFFSET int {ROW | ROWS}: Specifies the number of rows to skip from the start of the ORDER BY set.
FETCH {FIRST | NEXT} numeric {ROW | ROWS | PERCENT}: Returns the first or next (from OFFSET) numeric rows or PERCENT of records. If ROW or ROWS is used, the value must be an integer. When using FIRST, there should be no OFFSET clause. PERCENT is any number from 0 to 100 and can include fractional values.
ONLY | WITH TIES: ONLY returns at most the specified number or percentage of rows. WITH TIES returns additional rows if, based on the ORDER BY clause, records within that count are tied. In other words, rather than returning one or more of the tied records at random, up to the numeric specified, all of the tied records that are within the count are returned even if the total count exceeds the numeric specified.

Rules at a glance

The ORDER BY clause should reference columns as they appear in the item list of the SELECT statement, preferably using their aliases (if aliases exist). For example:

SELECT au_fname AS first_name, au_lname AS last_name
FROM authors
ORDER BY first_name, last_name

The ORDER BY clause uses a major-to-minor sort ordering. This means that the result set is ordered by the first column referenced; equal values in the first column are then ordered by the second column, equal values in the second column are ordered by the third column, and so forth.

The individual aspects of a column’s ordering—COLLATE and ASC/DESC—are independent of the other columns in the ORDER BY clause. Thus, you could order a result set in ascending order by one column, and then flip the next column and order it in descending order:

SELECT au_fname AS first_name, au_lname AS last_name
FROM authors
ORDER BY au_lname ASC, au_fname DESC

NULLs are always grouped together (i.e., considered equal) for the purposes of sorting. Depending on your platform, NULLs will be clumped together at the top or at the bottom of the result set. The following query on SQL Server:

SELECT title, price
FROM titles
ORDER BY price, title

provides this result set:

title                                                price
---------------------------------------------------- -------
Net Etiquette                                        NULL
The Psychology of Computer Cooking                   NULL
The Gourmet Microwave                                2.9900
You Can Combat Computer Stress!                      2.9900
Life Without Fear                                    7.0000
Onions, Leeks, and Garlic: Cooking Secrets of the Me 20.9500
Computer Phobic AND Non-Phobic Individuals: Behavior 21.5900
But Is It User Friendly?                             22.9500
...

You can force NULLs to appear at the top or bottom of the result set using ASC or DESC. Of course, all the non-NULL rows of the result set are also ordered in ascending or descending order.

Some platforms support specification of NULL sorting. In Oracle and PostgreSQL by default NULLs are sorted to the end, but you can change the behavior as follows to yield the previous result by doing:

SELECT title, price
FROM titles
ORDER BY price NULLS FIRST, title NULLS LAST

The SQL standard also supports for the sort_expression the use of columns that are not referenced in the SELECT item list. For example, the following query is valid:

SELECT title, price
FROM titles
ORDER BY title_id

Looking at this example, you can see that although the query does not select title_id, that column is the primary sort_expression. The result set is returned in title_id order even though that column is not selected.

You can limit the number of records returned using the [OFFSET] FETCH {FIRST | NEXT} subclauses. For example, the following query will skip the first 10 titles and return the next 10:

SELECT title, price
FROM titles
ORDER BY title
OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY

Programming tips and gotchas

When using set operators (EXCEPT, INTERSECT, UNION), only the last query may have an ORDER BY clause.

A number of behaviors that were supported in SQL92 are deprecated in more recent SQL standards. You should avoid these usages although all the databases covered here still support them:

References to table aliases: For example, ORDER BY e.emp_id should be changed to ORDER BY emp_id. If there is an ambiguous column name, use an alias to compensate.
References to ordinal position: Use explicitly defined column aliases to compensate.

You may sort not only on columns, but also on expressions involving columns, or even literals:

SELECT SUBSTRING(title,1,55) AS title, (price * 1.15) as price
FROM titles
WHERE price BETWEEN 2 and 19
ORDER BY price, title

When sorting on expressions from the SELECT item list, you should use aliases to make the ORDER BY sort_expression column references easier to read.

MySQL

MySQL supports the SQL standard, except for the COLLATE option, FETCH PERCENT, and NULLS {FIRST | LAST}. In addition to the SQL standard OFFSET FETCH, it also supports a LIMIT OFFSET subclause that is equivalent in purpose and predates the SQL standard’s subclause. MySQL’s ORDER BY syntax is:

ORDER BY {sort_expression [ASC | DESC]}[, ...]
[ OFFSET int {ROW | ROWS} ]
[ FETCH {FIRST | NEXT} numeric {ROW | ROWS} {ONLY | WITH TIES} ]
[ LIMIT int OFFSET int ]

where:

LIMIT int: Specifies the maximum number of records to return
OFFSET int: Specifies how many records to skip

You cannot use the LIMIT OFFSET construct in conjunction with the OFFSET FETCH construct.

You should not attempt to ORDER BY columns of the BLOB data type, because only the first bytes, defined by the MAX_SORT_LENGTH setting, will be used in the sort. By default, MySQL sorts NULL values lowest (first) for ASC order and highest (last) for DESC order.

Oracle

Oracle supports the SQL standard, except for the COLLATE option. It also supports a SIBLINGS option. Oracle’s ORDER BY syntax is:

ORDER [SIBLINGS] BY {sort_expression
   [ASC | DESC] [NULLS {FIRST | LAST}]}[, ...]
[ OFFSET int {ROW | ROWS} ]
[ FETCH {FIRST | NEXT} numeric {ROW | ROWS | PERCENT} 
   {ONLY | WITH TIES} ]

where:

ORDER [SIBLINGS] BY sort_expression: Sorts the result set of the query in order of the sort_expression(s). A sort_expression may be a column name, an alias, an integer indicating a column’s ordinal position, or another expression (e.g., salary * 1.02). The ORDER SIBLINGS BY clause tells Oracle to preserve any ordering specified by a hierarchical query clause (CONNECT BY), and to use the sort expression order for ordering of siblings in the hierarchy.

You can emulate the behavior of the COLLATE option for a single session by using the NLSSORT function with the NLS_SORT parameter. You can also emulate the behavior of the COLLATE option for all sessions on the server either explicitly, by using the NLS_SORT initialization parameter, or implicitly, with the NLS_LANGUAGE initialization parameter.

You should not perform an ORDER BY on any LOB column, nested table, or VARRAY.

PostgreSQL

PostgreSQL supports the SQL standard, with the exception of the COLLATE and FETCH PERCENT options. It also supports a LIMIT OFFSET subclause similar to MySQL’s, and a USING subclause. Support for OFFSET FETCH was introduced in PostgreSQL 13. PostgreSQL’s ORDER BY syntax is:

ORDER BY {sort_expression [ASC | DESC] [USING operator]
   [NULLS {FIRST | LAST}]}[, ...]
[ OFFSET int {ROW | ROWS} ]
[ FETCH {FIRST | NEXT} int {ROW | ROWS} {ONLY | WITH TIES} ]
[ LIMIT int OFFSET int ]

where:

USING operator: Specifies a specific comparison operator. Thus, you may sort by >, <, =, >=, <=, and so forth. Ascending order is the same as specifying USING <, while descending order is the same as USING >.
LIMIT int: Specifies the maximum number of records to return.
OFFSET int: Specifies how many records to skip.

ASC and DESC are SQL standards. If not specified, ASC is the default. PostgreSQL sorts NULL values as higher than any other value by default, causing NULL values to appear at the end of ASC sorts and at the beginning of DESC sorts. You can use the NULLS {FIRST | LAST} clause to change this behavior. For example:

SELECT stor_id, ord_date, qty AS quantity
FROM sales
ORDER BY stor_id, ord_date DESC, qty ASC NULLS FIRST;

You cannot use both OFFSET FETCH and LIMIT OFFSET, as they achieve equivalent goals. Although LIMIT OFFSET is faster to type, the OFFSET FETCH construct is SQL standard–compliant and allows for specifying how ties are treated in the result set. For example:

SELECT stor_id, ord_date, qty
FROM sales
ORDER BY qty
FETCH FIRST 3 ROWS WITH TIES;
stor_id |        ord_date       | qty
--------+------------------------+-----
 6380    | 1994-09-13 00:00:00-04 |   3
 6380    | 1994-09-14 00:00:00-04 |   5
 7896    | 1993-12-12 00:00:00-05 |  10
 7067    | 1994-09-14 00:00:00-04 |  10
 8042    | 1994-09-14 00:00:00-04 |  10
(5 rows)

Note that the output has five rows instead of three because the last three are tied for third place.

In SELECT statements that involve single tables or use JOINs, but not set operations (UNION, for example), you may also order by columns of the table that do not appear in the select_item list. For example:

SELECT stor_name
FROM sales
ORDER BY stor_id, qty;

SQL Server

SQL Server supports the SQL standard, except for the FETCH PERCENT and NULLS {FIRST | LAST} options. PERCENT can be used with TOP to simulate FETCH PERCENT. For example, the following query retrieves the authors’ first names from the authors table in the SQL_Latin1 collation:

SELECT au_fname
FROM authors
ORDER BY au_fname
COLLATE SQL_Latin1_general_cp1_ci_as

By default, SQL Server sorts NULL values higher than all other values.

SQL Server allows a variety of collations that can affect how the result set is evaluated. Thus, under certain collations “SMITH” and “smith” might evaluate and sort differently. You should not use TEXT, IMAGE, or NTEXT columns as sort_expressions on SQL Server.

OVER Clause

The OVER clause appears in the SELECT clause as a function qualifier for window function–based columns and when aggregates are used in a window construct. The WINDOW clause often accompanies one or more OVER clauses and appears after the whole SELECT statement. It is used to name a window specification that is then used by name in the OVER clause(s).

Platform	Command
MySQL	Supported, with limitations
Oracle	Supported, with variations
PostgreSQL	Supported, with variations
SQL Server	Supported, with limitations

SQL standard syntax

FUNCTION_NAME(expr) OVER {window_name | (window_specification)}
window_specification ::= [window_name] [partitioning] 
      [ordering] [framing]
   partitioning ::= PARTITION BY value[, value...] 
         [COLLATE collation_name]
   ordering ::= ORDER [SIBLINGS] BY rule[, rule...]
      rule ::= {value | position | alias} [ASC | DESC] 
         [NULLS {FIRST | LAST}]
   framing ::= {ROWS | RANGE | GROUPS} {start | between} 
         [exclusion]
      start ::= {UNBOUNDED PRECEDING | unsigned-integer 
            PRECEDING | CURRENT ROW}
      between ::= BETWEEN bound AND bound
         bound ::= {start | UNBOUNDED FOLLOWING | unsigned-integer 
            FOLLOWING}
      exclusion ::= {EXCLUDE CURRENT ROW | EXCLUDE GROUP | 
         EXCLUDE TIES | EXCLUDE NO OTHERS}

In addition to the OVER clause, you can have a WINDOW clause before the ORDER BY clause and after the WHERE clause in a SELECT statement. It is followed by one or more definitions of named windows. The syntax is as follows:

WINDOW {window_name AS (window_specification)[, ...]}

Keywords

OVER {window_name | (window_specification)}

The OVER clause may reference a predefined window (defined in a WINDOW clause) that can be used across multiple columns or can provide a window specification consisting of ORDER BY and PARTITION BY clauses. A window of data consists of a subset of rows relative to the current row.

partitioning

Defines the group of rows that a row is a member of. The values are column names or expressions that include column names. All records with these expressions in common belong to the same partition. If there is no partitioning clause, then all rows belong to the same partition.

ordering

Specifies the order in which rows should be sorted in the window.

framing

Denotes which rows are considered to be in the window (a subset of rows relative to the current row). The syntax of the framing subclause is as follows:

{ROWS | RANGE} {start | between} [exclusion]
start ::= {UNBOUNDED PRECEDING | unsigned-integer PRECEDING | 
   CURRENT ROW}
between ::= BETWEEN bound AND bound
bound ::= {start | UNBOUNDED FOLLOWING | unsigned-integer 
   FOLLOWING}
exclusion ::= {EXCLUDE CURRENT ROW | EXCLUDE GROUP |                
   EXCLUDE TIES | EXCLUDE NO OTHERS}

ASC | DESC

Specifies that the result set should be returned in either ascending order (ASC) or descending order (DESC) based on the value | position.

NULLS {FIRST | LAST}

Specifies that the records containing NULLs should appear either first or last in the result set, respectively.

WINDOW {window_name AS (window_specification)[, ... ]}

Defines one or more named windows, providing a specification for each.

Rules at a glance

Each SELECT statement may have zero or more OVER clauses. Here are a few examples. The following query ranks each title across all titles (dr) and then within the group that has the same publisher:

SELECT   t.title_id,
   DENSE_RANK() OVER(ORDER BY price) AS dr,
   DENSE_RANK() OVER(PARTITION BY pub_id ORDER BY price) AS dr_pub
FROM     titles AS t
ORDER BY title_id;

The next query uses the SUM aggregate function to produce a running total and an overall total for price. Note that the ORDER BY clause used in an OVER clause does not need to be the same as the SELECT ... ORDER BY, but it does make it easier to debug:

SELECT   t.title_id,
   SUM(price) OVER() AS overall_total,
   SUM(price) OVER(ORDER BY price) AS running_total
FROM     titles AS t
ORDER BY title_id;

Finally, here’s an example of a query that uses named windows:

SELECT   t.title_id,
   DENSE_RANK() OVER wprice AS dr,
   SUM(price) OVER(wpub) AS wpub_total
FROM     titles AS t
WINDOW wprice AS (ORDER BY price), 
   wpub AS (PARTITION BY pub_id)
ORDER BY title_id;

MySQL

MySQL supports a subset of the SQL standard syntax for the OVER and window_spec⁠ification clauses, as shown here:

FUNCTION_NAME(expr) OVER {window_name | (window_specification)}
window_specification ::= [window_name] [partitioning] 
      [ordering] [framing]
   partitioning ::= PARTITION BY value[, value...] 
   ordering ::= ORDER BY rule[, rule...]
      rule ::= {value | position | alias} [ASC | DESC] ]
   framing ::= {ROWS | RANGE} {start | between} [exclusion]
      start ::= {UNBOUNDED PRECEDING | unsigned-integer 
            PRECEDING | CURRENT ROW}
      between ::= BETWEEN bound AND bound
         bound ::= {start | UNBOUNDED FOLLOWING | unsigned-integer
                             FOLLOWING}

It also supports the WINDOW clause, and has the same options for the window_specifi⁠cation clause.

Oracle

Oracle fully supports the SQL standard OVER and window_specification clauses, except for the COLLATE clause. Its syntax is as follows:

FUNCTION_NAME(expr) OVER {window_name | (window_specification)}
window_specification ::= [window_name] [partitioning]
      [ordering] [framing]
   partitioning ::= PARTITION BY value[, value...] 
   ordering ::= ORDER [SIBLINGS] BY rule[, rule...]
      rule ::= {value | position | alias} [ASC | DESC] 
            [NULLS {FIRST | LAST}]
   framing ::= {ROWS | RANGE | GROUPS} {start | between} 
         [exclusion]
      start ::= {UNBOUNDED PRECEDING | unsigned-integer 
            PRECEDING | CURRENT ROW}
      between ::= BETWEEN bound AND bound
         bound ::= {start | UNBOUNDED FOLLOWING | 
               unsigned-integer FOLLOWING}
      exclusion ::= {EXCLUDE CURRENT ROW | EXCLUDE GROUP | 
            EXCLUDE TIES | EXCLUDE NO OTHERS}

The WINDOW clause was introduced in Oracle 21c. It allows anything that is also allowed in the window_specification of the OVER clause.

PostgreSQL

PostgreSQL supports all the SQL standard syntax for the OVER and window_specifi⁠cation clauses, except for the SIBLINGS keyword and COLLATE clause. It allows all aggregate functions, including user-defined ones, to be used as window aggregates. Aggregates can be created in nearly any PostgreSQL-supported language, but the PL/pgSQL and SQL built-in languages do not support the creation of window functions (window functions can be created in C, PL/V8, and PL/R).

The PostgreSQL OVER syntax is as follows:

FUNCTION_NAME(expr) OVER {window_name | (window_specification)}
window_specification ::= [window_name] [partitioning] 
      [ordering] [framing]
   partitioning ::= PARTITION BY value[, value...] 
   ordering ::= ORDER BY rule[, rule...]
      rule ::= {value | position | alias} [ASC | DESC] 
            [NULLS {FIRST | LAST}]
   framing ::= {ROWS | RANGE | GROUPS} {start | between} 
         [exclusion]
      start ::= {UNBOUNDED PRECEDING | unsigned-integer 
            PRECEDING | CURRENT ROW}
      between ::= BETWEEN bound AND bound
         bound ::= {start | UNBOUNDED FOLLOWING | 
               unsigned-integer FOLLOWING}
      exclusion ::= {EXCLUDE CURRENT ROW | EXCLUDE GROUP | 
            EXCLUDE TIES | EXCLUDE NO OTHERS}

It also supports the WINDOW clause, with the same options for the window_specifica⁠tion clause.

SQL Server

SQL Server supports a subset of the SQL standard syntax for the OVER clause. It does not support named windows or the WINDOW clause. SQL Server’s OVER syntax is as follows:

FUNCTION_NAME(expr) OVER {(window_specification)}
window_specification ::= [partitioning] [ordering] [framing]
   partitioning ::= PARTITION BY value[, value...] 
         [COLLATE collation_name]
   ordering ::= ORDER BY rule[, rule...]
      rule ::= {value | position | alias} [ASC | DESC]
   framing ::= {ROWS | RANGE} {start | between}
      start ::= {UNBOUNDED PRECEDING | unsigned-integer 
            PRECEDING | CURRENT ROW}
      between ::= BETWEEN bound AND bound
         bound ::= {start | UNBOUNDED FOLLOWING | 
               unsigned-integer FOLLOWING}

SELECT Statement

The SELECT statement retrieves rows, columns, and derived values from one or many tables of a database.

Platform	Command
MySQL	Supported, with variations
Oracle	Supported, with variations
PostgreSQL	Supported, with variations
SQL Server	Supported, with variations

SQL standard syntax

The full syntax of the SELECT statement is powerful and complex, but it can be broken down into these main clauses:

SELECT [{ALL | DISTINCT}] select_item [AS alias][, ...]
FROM [ONLY | OUTER]
   {table_name [[AS] alias] | view_name [[AS] alias]}[, ...]
[ [join_type] JOIN join_condition ]
[WHERE search_condition] [ {AND | OR | NOT} search_condition[...] ]

group_by_clause

order_by_clause

Keywords

[{ALL | DISTINCT}] select_item

Retrieves values that compose the query result set. Each select_item may be a literal, an aggregate or scalar function, a mathematical calculation, a parameter or variable, or a subquery, but a select_item is most commonly a column from a table or view. A comma must separate each item in a list of such items.

The schema or owner name should be prefixed to a column’s name when it’s extracted from a context outside of the current user’s. If another user owns the table, that user must be included in the column reference. For example, if the user jake needed to access data in the schema katie he would use:

SELECT emp_id
FROM katie.employee;

You can use the asterisk (* ) shorthand to retrieve all columns in every table or view listed in the FROM clause. It’s a good idea to use this shortcut on single-table queries only.

ALL, the default behavior, returns all records that meet the selection criteria. DISTINCT tells the database to filter out any duplicate records, thus retrieving only one instance of many identical records.

AS alias

Replaces a column heading (when in the select_item clause) or a table name or view name (when in the FROM clause) with a shorter heading or name. This clause is especially useful for replacing cryptic or lengthy names with short, easy-to-understand names or mnemonics, and for when the column contains only derived data, so you don’t end up with a column called something like ORA000189x7/0.02. It is also very useful in self-joins and correlated subqueries where a single query references the same table more than once. When multiple items appear in the select_item clause or FROM clause, make sure to place the commas after the AS alias clauses. Also, be careful to always use an alias uniformly once you introduce it into the query.

FROM [ONLY | OUTER] {table_name | view_name}[, ... ]

Lists all of the tables and/or views from which the query retrieves data. Separate table and view names using commas. The FROM clause also allows you to assign aliases to long table/or view names or subqueries using the AS clause. Using shorter aliases instead of longer table or view names simplifies coding. (Of course, this might thwart the DBA’s carefully planned naming conventions, but the alias only lasts for the duration of the query. Refer to “Rules at a glance” for more information on aliases.) A FROM clause may contain a subquery (refer to “SUBQUERY Substatement” for details).

ONLY

Specifies that only the rows of the named table or view (and no rows in subtables or subviews) will be retrieved in the result set. When using ONLY, be sure to enclose the table_name or view_name within parentheses. ONLY is ignored if the table or view has no subtables or subviews.

OUTER

Specifies that the rows of the named table or view, along with the rows and columns of any and all subtables or subviews, will be retrieved in the result set. Columns of the subtables (or subviews) will be appended to the right, in subtable hierarchy order according to the depth of the subtable. In extensive hierarchies, subtables with common parents are appended in the creation order of their types. When using OUTER, be sure to enclose the table_name or view_name within parentheses. OUTER is ignored if the table or view has no subtables or subviews.

[join_type] JOIN join_condition

Joins together the result set of the table shown in the FROM clause to another table that shares a meaningful relationship based on a common set of values. These values are usually contained in columns of the same name and data type that appear in both tables being joined. These columns, or possibly a single column from each table, are called the join key or common key. Most—but not all—of the time, the join key is the primary key of one table and a foreign key in the other table. As long as the data in the columns matches, the join can be performed. (Note that joins can also be performed using the WHERE clause. This technique is sometimes called a theta join.)

Refer to “JOIN Subclause” for details of different kinds of joins.

Join conditions are most commonly depicted in the form:

JOIN table_name2 ON table_name1.column1 comparison_operator
   table_name2.column1
JOIN table_name3 ON table_name1.columnA comparison_operator
   table_name3.columnA
[...]

When the comparison_operator is the equals sign (=), a join is said to be an equi-join. However, the comparison operator may be <, >, <=, >=, or even <>.

Use the AND operator to issue a JOIN with multiple conditions. You can also use the OR operator to specify alternative join conditions.

If an explicit join_type is omitted, an INNER JOIN is assumed. Note that there are many types of joins, each with its own rules and behaviors. Also be aware that an alternative approach to the join condition, via the USING clause, exists:

USING (column_name[, ... ])

Acts as an alternative to the ON clause. With this clause, instead of describing the conditions of the join, you simply provide one or more column names (separated by commas) that appear in both tables. The database then evaluates the join based on those columns (the column names must be identical in both tables). In the following example, the two queries produce identical results:

   SELECT emp_id
   FROM employee
   LEFT JOIN sales USING (emp_id, region_id);

   SELECT emp_id
   FROM employee AS e
   LEFT JOIN sales AS s
      ON e.emp_id     = s.emp_id
      AND e.region_id = s.region_id;

WHERE search_condition

Filters unwanted data from the result set of the query, returning only those records that satisfy the search conditions. A poorly written WHERE clause can ruin the performance of an otherwise useful SELECT statement, so mastering the nuances of the WHERE clause is of paramount importance. Search conditions are syntactically depicted in the form WHERE [schema.[table_name.]]column operator value.

WHERE clauses usually compare the values contained in a column of the table. The values of the column are compared using an operator of some type (refer to Chapter 2 for more details). For example, a column might equal (=) a given value, be greater than (>) a given value, or be BETWEEN a range of values.

WHERE clauses may contain many search conditions concatenated together using the AND or OR Boolean operators, and parentheses can be used to impact the order of precedence of the search conditions. WHERE clauses can also contain subqueries (refer to “WHERE Clause” for details).

group_by_clause

Refer to “GROUP BY Clause” for details.

order_by_clause

Refer to “ORDER BY Clause” for details.

Rules at a glance

Each clause of the SELECT statement has a specific use. Thus, it is possible to speak individually of the FROM clause, the WHERE clause, the GROUP BY clause, and so forth. You can get more details and examples of SELECT statements by looking up the entries for each clause of the statement elsewhere in this chapter. Not every query needs every clause, but at a minimum a query needs a select_item list.

Because the SELECT clause is so important and offers so many options, we’ve divided this “Rules at a glance” section into the following detailed subsections:

Aliases and WHERE clause joins
The JOIN clause
The WHERE clause
The ORDER BY clause

All of these clauses, as well as the GROUP BY clause, are discussed at greater length in their respective sections in this chapter.

Aliases and WHERE clause joins

Column names may need to be prefixed with their database, schema, and table names, particularly when the same column name may appear in more than one table in the query. For example, on an Oracle database, both the jobs table and scott’s employee table may contain job_id columns. The following example joins the employee and jobs tables using the WHERE clause. This is an old-style join syntax:

SELECT   scott.employee.emp_id,
         scott.employee.fname,
         scott.employee.lname,
         jobs.job_desc

FROM     scott.employee,
         jobs
WHERE    scott.employee.job_id = jobs.job_id
ORDER BY scott.employee.fname,
         scott.employee.lname

You can also use aliases to write such a query more simply and clearly:

SELECT   e.emp_id,
         e.fname,
         e.lname,
         j.job_desc
FROM     scott.employee AS e,
         jobs AS j
WHERE    e.job_id = j.job_id
ORDER BY e.fname,
         e.lname

These two queries illustrate the following important rules about WHERE clause joins:

Use commas to separate multiple elements in the select_item list, tables in the FROM clause, and items in the order_expression.
Use the AS clause to define aliases.
Use aliases consistently throughout the SELECT statement once you define them.

In general, you should favor the JOIN clause (explained next) over the WHERE clause for describing join expressions. This not only keeps your code cleaner, making it easy to differentiate join conditions from search conditions, but also allows you to avoid the counterintuitive behavior that may result from using the WHERE clause for outer joins in some implementations.

The JOIN clause

To perform the same query as in the previous example using a SQL standard join, list the first table and the keyword JOIN, followed by the name of the table to be joined, the keyword ON, and the join condition that would have been used in the old-style query. The next example shows the preceding query using the SQL standard JOIN clause:

SELECT   e.emp_id, e.fname, e.lname, j.job_desc
FROM     scott.employee AS e
JOIN     jobs AS j ON e.job_id = j.job_id
ORDER BY e.fname, e.lname;

Alternatively, you could use the USING clause. Instead of describing the conditions of the join, simply provide one or more column_names (separated by commas) that appear in both of the joined tables. The database then evaluates the join based on those columns (the column names must be identical in both tables). In the following example, the two queries (one using the ON clause and one using the USING clause) produce identical results:

SELECT emp_id
FROM employee LEFT JOIN sales USING (emp_id, region_id);

SELECT emp_id
FROM employee   AS e
LEFT JOIN sales AS s
   ON  e.emp_id    = s.emp_id
   AND e.region_id = s.region_id;

Refer to “JOIN Subclause” for details on different types of joins.

The WHERE clause

A poorly written WHERE clause can ruin an otherwise beautiful SELECT statement, so it’s important that you master the nuances of the WHERE clause (discussed in more detail later in this chapter). Here is an example of a typical query with a multipart WHERE clause:

SELECT   a.au_lname,
         a.au_fname,
         t2.title,
         t2.pubdate
FROM     authors a
JOIN     titleauthor t1 ON a.au_id = t1.au_id
JOIN     titles t2 ON t1.title_id = t2.title_id
WHERE    (t2.type = 'business' OR t2.type = 'popular_comp')
  AND    t2.advance > 5500
ORDER BY t2.title

In examining this query, note that the parentheses impact the order of processing for the search conditions. You can use parentheses to move search conditions up or down in precedence, just like you would in an algebra equation.

On some platforms, the database’s default collation (also known as the sort order) impacts how the WHERE clause filters results for a query. For example, SQL Server is (by default) dictionary-order and case-insensitive, making no differentiation between “Smith,” “smith,” and “SMITH.” Oracle, however, is dictionary-order and case-sensitive, finding the values “Smith,” “smith,” and “SMITH” to be unequal.

The WHERE clause offers many more specific capabilities than the preceding example illustrates. The following list references some of the more common capabilities of the WHERE clause:

NOT

Inverts a comparison operation using the syntax WHERE NOT expression. Thus, you might use WHERE NOT LIKE ... or WHERE NOT IN ... in a query.

Comparison operators

Compares any set of values, using the operations <, >, <>, >=, <=, and =. For example:

WHERE emp_id = '54123'

IS NULL or IS NOT NULL conditions

Search for any NULL or NOT NULL values, respectively, using the syntax WHERE expression IS [NOT] NULL.

AND

Merges multiple conditions, returning only those records that meet all conditions. The maximum number of multiple conditions is platform-dependent. For example:

WHERE job_id = '12' AND job_status = 'active'

OR

Merges alternative conditions, returning records that meet any of the conditions. For example:

WHERE job_id = '13' OR job_status = 'active'

LIKE

Tells the query to use a pattern-matching string contained within quotation marks. The wildcard symbols supported by each platform are detailed in their individual sections. All platforms support the percent sign (%) for a wildcard symbol. For example, to find any phone number starting with the 415 area code:

WHERE phone LIKE '415%'

EXISTS

Used only with subqueries, EXISTS tests to see whether the subquery data exists. It is typically much faster than a WHERE IN subquery. For example, the following query finds all authors who are also employees:

SELECT au_lname FROM authors WHERE EXISTS
   (SELECT last_name FROM employees)

BETWEEN

Performs a range check to see whether a value is in between two values (inclusive of those two values). For example:

WHERE ytd_sales BETWEEN 4000 AND 9000.

IN

Performs a test to see whether an expression matches any one value out of a list of values. The list may be literal, as in WHERE state IN ('or', 'il', 'tn', 'ak'), or it may be derived using a subquery:

WHERE state IN (SELECT state_abbr FROM territories)

SOME | ANY

Functions the same as the EXISTS operation, though with slightly different syntax. For example, the following query finds all authors who are also employees:

SELECT au_lname FROM authors WHERE
   au_lname = SOME(SELECT last_name FROM employees)

ALL

Performs a check to see whether all records in the subquery match the evaluation criteria, and returns TRUE when the subquery returns zero rows. For example:

WHERE city = ALL
   (SELECT city FROM employees WHERE emp_id = 54123)

Refer to “WHERE Clause” for additional details.

The ORDER BY clause

A result set can be sorted through the ORDER BY clause, in accordance with the database’s sort order. Each column of the result set may be sorted in either ascending (ASC) or descending (DESC) order. (Ascending order is the default.) If no ORDER BY clause is specified, most implementations return the data either according to the physical order of the data within the table or according to the order of an index utilized by the query. However, when no ORDER BY clause is specified, there is no guarantee as to the order of the result set. Following is an example of a SELECT statement with an ORDER BY clause on SQL Server:

SELECT   e.emp_id "Emp ID",
         e.fname "First",
         e.lname "Last",
         j.job_desc "Job Desc"
FROM     employee e,
         jobs j

WHERE    e.job_id = j.job_id
  AND    j.job_desc = 'Acquisitions Manager'
ORDER BY e.fname DESC,
         e.lname ASC

The results are:

Emp ID    First           Last            Job Desc
--------- --------------- --------------- --------------------
MIR38834F Margaret        Rancé          Acquisitions Manager
MAS70474F Margaret        Smith          Acquisitions Manager
KJJ92907F Karla           Jablonski      Acquisitions Manager
GHT50241M Gary            Thomas         Acquisitions Manager

After the result set is pared down to meet the search conditions, it is sorted by the authors’ first names in descending order. Where the authors’ first names are equal, the results are sorted in ascending order by last name. Refer to “ORDER BY Clause” for more details.

You may write an ORDER BY clause using columns in the table that do not appear in the select_item list. For example, you might query all emp_ids from the employee table, yet ORDER BY the employees’ first and last names.

Programming tips and gotchas

Once you’ve assigned an alias to a table or view in the FROM clause, use it exclusively for all other references to that table or view within the query (in the WHERE clause, for example). Do not mix references to the full table name and the alias within a single query. You should avoid mixed references for a couple of reasons. First, it is simply inconsistent and makes code maintenance more difficult. Second, some database platforms return errors on SELECT statements containing mixed references. (Refer to “SUBQUERY Substatement” for special instructions on aliasing within a subquery.)

MySQL, PostgreSQL, and SQL Server support certain types of queries that do not need a FROM clause. Use these types of queries with caution, since the SQL standard requires a FROM clause. Queries without a FROM clause must be manually migrated either to the SQL standard form or to a form that also works on the target database. Refer to the entry for each clause to fully investigate the varying degrees of support offered by the different database vendors for the various options of the SELECT command.

MySQL

MySQL’s implementation of SELECT includes support for JOIN (with some variations, as detailed in “JOIN Subclause”), the INTO clause, the LIMIT clause, and the PROCEDURE clause. Its syntax follows:

SELECT [DISTINCT | DISTINCTROW | ALL] [HIGH_PRIORITY] [STRAIGHT_JOIN]
   [ {SQL_SMALL_RESULT | SQL_BIG_RESULT} ] [SQL_BUFFER_RESULT] 
   [SQL_CALC_FOUND_ROWS] select_item[, ...]
[INTO {OUTFILE 'filename' options | DUMPFILE 'filename' | 
   variable[, ...]}]
[FROM table_name[, ...]

join_clause

[WHERE search_condition]

group_by_clause

order_by_clause

[PROCEDURE procedure_name (param[, ...])]
[FOR {UPDATE | SHARE} [OF table_name[, ...]] [NOWAIT | SKIP LOCKED] | 
   LOCK IN SHARE MODE]

where:

DISTINCT | DISTINCTROW | ALL: DISTINCTROW is a synonym for DISTINCT. ALL is assumed if DISTINCT or DISTINCTROW is not specified.
HIGH_PRIORITY: Gives the query a higher priority than statements that modify data within the table. This should be used only for special, high-speed queries.
STRAIGHT_JOIN: Similar to JOIN, except that the left table is always read before the right table. This is a join optimization feature. It should be avoided unless you are getting bad query plans with the standard JOIN syntax.
SQL_SMALL_RESULT | SQL_BIG_RESULT: Tells the optimizer to expect a small or large result set, respectively, for a GROUP BY or DISTINCT clause. MySQL builds a temporary table when a query has a DISTINCT or GROUP BY clause, and these optional keywords tell MySQL whether to build a fast temporary table in memory (for SQL_SMALL_RESULT) or a slower, disk-based temporary table (for SQL_BIG_RESULT) to process the worktable.
SQL_BUFFER_RESULT: Forces the result set into a temporary table so that MySQL can free table locks earlier and get the result set to the client faster.
SQL_CALC_FOUND_ROWS: Calculates how many rows are in the result set (regardless of a LIMIT clause), which can then be retrieved using SELECT FOUND_ROWS().
select_item: Retrieves the expressions or columns listed. Columns may be listed in the format [database_name.[table_name.]]column_name. If the database and/or table names are left out, MySQL assumes the current database and table.
INTO {OUTFILE 'filename' options | DUMPFILE 'filename' | variable[, ... ]}: Writes the result set of the query to a file named 'filename' on the host filesystem with the OUTFILE option. The named file must not already exist on the filesystem. The DUMPFILE option writes a single continuous line of data without column terminations, line terminations, or escape characters. This option is used mostly for BLOB files. Specific rules for using this clause are detailed following this list. The INTO variable clause allows you to list one or more variables (one for each column returned). If using INTO variable, do not also specify a filename.
FROM ...: Indicates the table from which rows will be retrieved. The table may be described as [database_name.]table_name. MySQL will treat the query as a join if more than one table appears in the FROM clause.
PROCEDURE procedure_name (param[, ... ]): Names a procedure that processes the data in the result set. The procedure is an external procedure (usually C++), not an internal database stored procedure.
FOR {UPDATE | SHARE} [OF table_name[, ... ]] [NOWAIT | SKIP LOCKED] | LOCK IN SHARE MODE: Issues a write lock on the rows returned by the query (UPDATE) for its exclusive use (provided the table is of InnoDB or BDB type), or issues read locks on the rows returned by the query (SHARE and LOCK IN SHARE MODE), such that other users may see the rows but may not modify them. NOWAIT and SKIP LOCKED cause a FOR UPDATE or FOR SHARE query to execute immediately. NOWAIT returns an error if a row lock cannot be obtained, while SKIP LOCKED excludes rows from the result set that are locked by another transaction.

Keep a couple of rules in mind when using the INTO clause. First, the output file cannot already exist, since overwrite functionality is not supported. Second, any file created by the query will be readable by everyone that can connect to the server. (When using SELECT ... INTO OUTFILE, you can then turn around and use the MySQL command LOAD DATA INFILE to quickly load the data.)

You can use the following options to better control the content of the output file when using SELECT ... INTO OUTFILE:

ESCAPED BY
FIELDS TERMINATED BY
LINES TERMINATED BY
OPTIONALLY ENCLOSED BY

The following example illustrates the use of these optional commands via a MySQL query that returns a result set in a comma-delimited output file:

SELECT job_id, emp_id, lname+fname
INTO OUTFILE "/tmp/employees.text"
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY "\;n"
FROM employee;

MySQL also allows SELECT statements without a FROM clause when performing simple arithmetic. For example, the following queries are valid in MySQL:

SELECT 2 + 2;
SELECT 565 - 200;
SELECT (365 * 2) * 52;

For interoperability with Oracle, MySQL also supports selection from the pseudo-table called dual:

SELECT 565 - 200 FROM DUAL;

MySQL offers an interesting alternative to the SQL standard for querying tables—the HANDLER statement. The HANDLER statement works a lot like SELECT, except that HANDLER provides very rapid data reads that circumvent the SQL query engine in MySQL. However, since the HANDLER statement is not a SQL statement, we’ll refer you to the MySQL documentation for more information.

Oracle

Oracle allows a very large number of extensions to the SQL standard SELECT statement. For example, since both nested tables and partitioned tables are allowed in Oracle (see “CREATE/ALTER TABLE Statement”), the SELECT statement allows queries to those types of structures. Oracle’s SELECT syntax is as follows:

SELECT ( {[ALL | DISTINCT]} | [UNIQUE] ) [optimizer_hints]
select_item [AS alias][, ...]
[INTO {variable[, ...] | record}]
FROM {[ONLY] {[schema.][table_name | view_name |
   materialized_view_name]} [@database_link] 
      [AS [OF] {SCN | TIMESTAMP} expression] |
   subquery [WITH {READ ONLY | CHECK OPTION [CONSTRAINT 
      constraint_name]}] |
   [[VERSIONS BETWEEN {SCN | TIMESTAMP} {exp | MINVALUE} AND
      {exp | MAXVALUE}] AS OF {SCN | TIMESTAMP} expression] |
   TABLE (nested_table_column) [(+)]
      {[PARTITION (partition_name) | 
        SUBPARTITION (subpartition_name)]}
   [SAMPLE [BLOCK] [sample_percentage] [SEED (seed_value)]]} 
   [AS alias][, ...]

join_clause
[WHERE search_condition [{AND | OR} search_condition[, ...]]
   [[START WITH value] CONNECT BY [PRIOR] condition]]
group_by_clause

[MODEL model_clause]
order_by_clause

[FOR UPDATE [OF [schema.][table.]column][, ...]
   {[NOWAIT | WAIT (int)]}]

Unless otherwise noted, the clauses shown here follow the SQL standard. Similarly, elements of the clauses are identical to those in the SQL standard unless otherwise noted. For example, Oracle’s GROUP BY clause is nearly identical to the SQL standard, including its component elements, such as ROLLUP, CUBE, GROUPING SETS, concatenated GROUPING SETS, and the HAVING clause.

The parameters are:

{ALL | DISTINCT} | UNIQUE: UNIQUE is a synonym for DISTINCT. In Oracle, DISTINCT and UNIQUE cannot be used on LOB columns.
optimizer_hints: Overrides the default behavior of the query optimizer with user-specified behaviors. For example, hints can force Oracle to use an index that it might not otherwise use or to avoid an index that it might otherwise use. Refer to the vendor documentation for more information about optimizer hints.
select_item: Retrieves the expressions or columns listed. Columns can be from a named query, table, view, or materialized view and can be listed in the format [schema.[table_name.]]column_name. If you omit the schema, Oracle assumes the context of the current schema. Oracle also allows for named queries that may be referenced much like nested table subqueries (discussed in “SUBQUERY Substatement”); it refers to using named queries as subquery factoring. In addition to named queries, Oracle supports subqueries and the asterisk (*), shorthand for all columns, in the select_item list.
INTO {variable[, ... ] | record}: Retrieves the result set values into PL/SQL variables or into a PL/SQL record.
FROM [ONLY]: Identifies the table, view, materialized view, partition, or subquery from which the result set is retrieved. The ONLY keyword is optional and applies only to views belonging to a hierarchy. Use ONLY when you want to retrieve records from a named view only, and not from any of its subviews.
AS [OF] {SCN | TIMESTAMP} expression: Implements SQL-driven flashback, whereby system change numbers (SCNs) or timestamps are applied to each object in the select_item list. Records retrieved by the query are only those that existed at the specified SCN or time. (This feature can also be implemented at the session level using the DBMS_FLASH⁠BACK built-in package.) SCN expression must equal a number, while TIMESTAMP expression must equal a timestamp value. Flashback queries cannot be used on linked servers.
subquery [WITH {READ ONLY | CHECK OPTION [CONSTRAINT constraint_name]}]: Mentioned separately because Oracle allows you extra ways to control a subquery. WITH READ ONLY indicates that the target of the subquery cannot be updated. WITH CHECK OPTION indicates that any update to the target of the subquery must produce rows that would be included in the subquery. WITH CONSTRAINT creates a CHECK OPTION constraint of constraint_name on the table. Note that WITH CHECK OPTION and WITH CONSTRAINT are usually used in INSERT ... SELECT statements.
[VERSIONS BETWEEN {SCN | TIMESTAMP} {exp | MINVALUE} AND {exp | MAXVALUE}] AS OF {SCN | TIMESTAMP} expression: Specifies a special kind of query to retrieve the history of changes made to data from a table, view, or materialized view. The VERSIONS_XID pseudocolumn shows the identifier corresponding to the transaction that made the change. This kind of query is referred to as a flashback query; it requires that you specify an SCN or TIMESTAMP value for each object in the select_item list. (You can implement SQL-driven session-level flashback using the Oracle DBMS_FLASHBACK package.)

The optional subclause VERSIONS BETWEEN is used to retrieve multiple versions of the data specified, either using an upper and lower boundary of an SCN (a number) or TIMESTAMP (a timestamp value), or using the MINVALUE and MAXVALUE keywords. Without this clause, only one past version of the data is returned. (Oracle also provides several version query pseudocolumns for additional versioning information.)

The AS OF clause, discussed earlier in this list, determines the SCN or moment in time from which the database issues the query when used with the VERSIONS clause.

You cannot use flashback queries with the VERSIONS clause against temporary tables, external tables, tables in a cluster, or views.

TABLE

Required when querying a hierarchically declared nested table.

PARTITION | SUBPARTITION

Restricts a query to the specified partition or subpartition of the table. Rows are retrieved only from the named partition or subpartition, not from the entire table, reducing I/O.

SAMPLE [BLOCK] [sampling_percentage] [SEED (seed_value)]

Tells Oracle to select records from a random sampling of rows within the result set, as a percentage of rows or blocks, rather than from the entire table. BLOCK tells Oracle to use block sampling rather than row sampling. The sampling_per⁠centage, telling Oracle the total block or row percentage to be included in the sample, may be anywhere between .000001 and 99. The optional SEED clause is used to provide limited repeatability. If you specify a seed value, Oracle will attempt to return the same sample from one execution of the query to the next. The seed value can be between 0 and 4,294,967,295. When SEED is omitted, the resulting sample will change from one execution of the query to the next. Sampling may be used only on single-table queries.

join_clause

Merges the result sets of two or more tables in a single query. See the description following this list for more information.

WHERE ... [[START WITH value] CONNECT BY [PRIOR] condition]

Filters records returned in the result set. Oracle allows the use of hierarchical information within tables, whose filtering can be controlled with the START WITH clause. START WITH identifies the rows that will serve as the parent rows in the result set. CONNECT BY identifies the relationship condition between the parent rows and their child rows. The PRIOR keyword is used to identify the parent rows instead of the child rows.

Hierarchical queries use the LEVEL pseudocolumn to identify (1) the root node, (2) the child nodes, (3) the grandchild nodes, and so forth. Other pseudocolumns available in hierarchical queries are CONNECT_BY_ISCYCLE and CONNECT_BY_ISLEAF. Hierarchical queries are mutually exclusive of the ORDER BY and GROUP BY clauses. Do not use those clauses in a query containing START WITH or CONNECT BY. You can order records from siblings of the same parent table by using the ORDER SIBLINGS BY clause.

MODEL model_clause

Allows you to create a multidimensional array from query results and then apply formulas (called rules) to this array to calculate new values. See “The MODEL clause” for details.

FOR UPDATE [OF [schema.][table.]column[, ... ] {[NOWAIT | WAIT (int)]}

Locks the rows of the result set so that other users cannot lock or update them until you’re finished with your transaction. FOR UPDATE cannot be used in a subquery, in queries using DISTINCT or GROUP BY, or in queries with set operators or aggregate functions. Child rows in a hierarchical table are not locked when this clause is issued against the parent rows. The OF keyword is used to lock only the selected table or view. Otherwise, Oracle locks all the tables or views referenced in the FROM clause. When using OF, the columns are not significant, though real column names (not aliases) must be used. The NOWAIT and WAIT keywords tell Oracle either to return control immediately if a lock already exists or to wait int seconds before returning control to you, respectively. If neither NOWAIT nor WAIT is specified, Oracle waits until the rows become available.

Unlike some other database platforms, Oracle does not allow a SELECT statement without a FROM clause. The following query, for example, is invalid:

SELECT 2 + 2;

As a workaround, Oracle has provided a special-purpose table called DUAL. Any time you want to write a query that does not retrieve data from a user-created table, such as to perform a calculation, use FROM DUAL. Both of the following queries are valid:

SELECT 2 + 2
FROM DUAL;
SELECT (((52-4) * 5) * 8)
FROM DUAL;

Oracle’s implementation of SELECT is quite straightforward if you want to retrieve data from a table. As mentioned previously, Oracle allows the use of named queries. A named query is, in a sense, an alias to an entire query that can save you time when you’re writing a complex multi-subquery SELECT statement. For example:

WITH pub_costs AS
   (SELECT pub_id, SUM(job_lvl) dept_total
    FROM employees e
    GROUP BY pub_id),
avg_costs AS
   (SELECT SUM(dept_total)/COUNT(*) avg
    FROM employee)
SELECT * FROM pub_costs
WHERE dept_total > (SELECT avg FROM avg_cost)
ORDER BY department_name;

Here we create two named subqueries—pub_costs and avg_costs—which are later referenced in the main query. The named queries are effectively the same as subqueries; however, subqueries must be written out in their entirety each time they’re used, while named queries need not be.

Oracle allows you to select rows from a single partition of a partitioned table using the PARTITION clause, or to retrieve only a statistical sampling of the rows (as a percentage of rows or blocks) of a result set using SAMPLE. For example:

SELECT *
FROM sales PARTITION (sales_2021_q3) sales
WHERE sales.qty > 1000;
SELECT *
FROM sales SAMPLE (12);

Flashback queries are a feature of Oracle that enable retrieval of point-in-time result sets. For example, you could find out what everyone’s salary was yesterday before a big change was applied to the database:

SELECT job_lvl, lname, fname
FROM employee
   AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '1' DAY);

Another interesting Oracle extension of the standard query format is the hierarchical query. Hierarchical queries return the results of queries against hierarchically designed tables in the order you define. For example, the following query returns the names of the employees and their positions in the hierarchy (represented by the position in the org_char column), employee IDs, manager IDs, and job IDs:

-- Query
SELECT LPAD(' ',2*(LEVEL-1)) || lname AS org_chart,
   emp_id, mgr_id, job_id
FROM employee
START WITH job_id = 'Chief Executive Officer'
CONNECT BY PRIOR emp_id = mgr_id;

-- Results
ORG_CHART      EMPLOYEE_ID MANAGER_ID JOB_ID
-------------- ----------- ---------- ------------------------
Cramer             101        100 Chief Executive Officer
Devon              108        101 Business Operations Mgr
Thomas             109        108 Acquisitions Manager
Koskitalo          110        108 Productions Manager
Tonini             111        108 Operations Manager
Whalen             200        101 Admin Assistant
Chang              203        101 Chief Financial Officer
Gietz              206        203 Comptroller
Buchanan           102        101 VP Sales
Callahan           103        102 Marketing Manager

In the previous query, the CONNECT BY clause defines the hierarchical relationship of the emp_id value as the parent row equal to the mgr_id value in the child row, while the START WITH clause specifies where in the hierarchy the result set should begin.

Oracle supports the following types of JOIN syntax (refer to “JOIN Subclause” for more details):

FROM table1 {
   CROSS JOIN table2 |
   INNER JOIN table2 [ {ON join_condition |
      USING (column_list)} ] |
   NATURAL [LEFT [OUTER]] JOIN table2 |
   LEFT [OUTER] JOIN table2 [ {ON join_condition
    | USING (column_list)} ] |
   RIGHT [OUTER] JOIN table2 [ {ON join_condition
    | USING (column_list)} ]|
   NATURAL [RIGHT [OUTER]] JOIN table2
   FULL [OUTER] JOIN table2 }

[CROSS] JOIN

Retrieves all records of both table1 and table2. This is syntactically the same as FROM table1, table2 with no join conditions in the WHERE clause.

INNER JOIN

Retrieves those records of both table1 and table2 where there are matching values in both tables according to the join condition. Note that the syntax FROM table1, table2 with join conditions in the WHERE clause is semantically equivalent to an inner join.

NATURAL

Shortcuts the need to declare a join condition by assuming a USING clause containing all columns that are in common between the two joined tables. (Be careful if columns have the same names but not the same data types or the same sort of values!) LOB columns cannot be referenced in a natural join. Referencing a LOB or collection column in a NATURAL JOIN clause will return an error.

LEFT [OUTER] JOIN

Retrieves all records in the leftmost table (i.e., table1) and matching records in the rightmost table (i.e., table2). If there isn’t a matching record in table2, NULL values are substituted for that table’s columns. You can use this type of join to retrieve all the records in a table, even when there are no counterparts in the joined table. For example:

SELECT j.job_id, e.lname
FROM jobs j
LEFT OUTER JOIN employee e ON j.job_id = e.job_id
ORDER BY d.job_id

RIGHT [OUTER] JOIN

Retrieves all records in the rightmost table, regardless of whether there is a matching record in the leftmost table. A right join is the same as a left join, except that the optional table is on the left.

FULL [OUTER] JOIN

Specifies that all rows from both tables be returned, regardless of whether a row from one table matches a row in the other table. Any columns that have no value in the corresponding joined table are assigned a NULL value.

ON join_condition

Declares the condition(s) that join the result sets of two tables together. This takes the form of declaring the columns in table1 and table2 that must match the join condition. When multiple columns must be compared, use the AND clause.

USING (column_list)

Acts as an alternative to the ON clause. Instead of describing the conditions of the join, simply provide a column name (or columns separated by commas) that appears in both tables. The column name(s) must be identical in both tables and cannot be prefixed with a table name or alias. USING cannot be used on LOB columns of any type. The following two queries produce identical results. One is written with a USING clause and the other specifies join conditions using SQL standard syntax:

SELECT column1
FROM foo
LEFT JOIN poo USING (column1, column2);

SELECT column1
FROM foo
LEFT JOIN poo ON foo.column1 = poo.column1
AND foo.column2 = poo.column2;

Partitioned outer joins

Oracle supports the PARTITION BY subclause of the SQL standard’s JOIN clause, which defines a special kind of query called a partitioned outer join that extends the conventional outer join syntax by applying a right or left outer join to a partition of one or more rows. This is especially useful for querying sparse data along a particular dimension of data, thereby returning rows that otherwise would be omitted from the result set. The PARTITION BY clause can be used on either side of an outer join, resulting in a UNION of the outer joins of each of the partitions in the partitioned result set and the table on the other side of the join. (When this clause is omitted, Oracle treats the entire result set as a single partition.) PARTITION BY is not allowed with a FULL OUTER JOIN.

For example, our product table keeps track of all products we produce, while the manufacturing table shows when we produce them. Since we’re not continuously making every product at all times, the joined data between the two tables may be sparse at times. Thus, the following query:

SELECT manufacturing.time_id AS time, product_name AS name,
     quantity AS qty
FROM product
PARTITION BY (product_name)
RIGHT OUTER JOIN times ON (manufacturing.time_id =
     product.time_id)
WHERE manufacturing.time_id
   BETWEEN TO_DATE('01/10/05', 'DD/MM/YY')
       AND TO_DATE('06/10/05', 'DD/MM/YY')
ORDER BY 2, 1;

returns this result:

time      name            qty

--------- ----------      ----------

01-OCT-05 flux capacitor  10
02-OCT-05 flux capacitor

03-OCT-05 flux capacitor
04-OCT-05 flux capacitor
05-OCT-05 flux capacitor
06-OCT-05 flux capacitor  10
06-OCT-05 flux capacitor  8
01-OCT-05 transmogrifier 10
01-OCT-05 transmogrifier 15
02-OCT-05 transmogrifier
03-OCT-05 transmogrifier
04-OCT-05 transmogrifier 10
04-OCT-05 transmogrifier 11
05-OCT-05 transmogrifier
06-OCT-05 transmogrifier

Flashback queries

Oracle also supports flashback queries, which keep track of previous values of the results returned for a SELECT statement. In the following set of example code, we’ll issue a regular query on a table, change the values in the table with an UPDATE statement, and then query the flashback version of the data. First, the regular query:

SELECT salary FROM employees
WHERE last_name = 'McCreary';

The results are:

SALARY
----------
3800

Now, we’ll change the value in the employees table and query the table to confirm the current value:

UPDATE employees SET salary = 4000
WHERE last_name = 'McCreary ';
SELECT salary FROM employees
WHERE last_name = 'McCreary ';

The results are:

SALARY
----------
4000

Finally, we’ll perform a flashback query to see what the salary value was in the past:

SELECT salary FROM employees
AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '1' DAY)
WHERE last_name = 'McCreary';

The results are:

SALARY
----------
3800

If we wanted to be more elaborate, we could find out all of the values of salary for a given time period, say, the last two days:

SELECT salary FROM employees
VERSIONS BETWEEN TIMESTAMP
   SYSTIMESTAMP - INTERVAL '1' MINUTE AND
   SYSTIMESTAMP - INTERVAL '2' DAY
WHERE last_name = 'McCreary';

The results are:

SALARY
----------
4000
3800

The MODEL clause

Oracle supports a MODEL clause that enables spreadsheet-like result sets to be returned from a SELECT statement. The MODEL clause is designed to alleviate the need for developers to extract data from the database and put it into a spreadsheet, like Microsoft Excel, for further manipulation. It creates a multidimensional array in which cells can be referenced by dimension values. For instance, you might dimension an array on product and time, specifying column values that you wish to access via combinations of those two dimensions. You can then write rules that are similar in concept to spreadsheet formulas, that are executed in order to change values in your model, or that create new values, and perhaps even new rows, in your model.

Syntactically, the MODEL clause appears after the GROUP BY clause and before the ORDER BY clause. The earlier syntax diagram for Oracle’s SELECT statement shows the position of the clause, and the syntax details are presented here:

MODEL
   [{IGNORE | KEEP} NAV] 
      [UNIQUE {DIMENSION | SINGLE REFERENCE}]
   [ RETURN {UPDATED | ALL} ROWS ]
   [REFERENCE reference_model_name ON (subquery)
      [PARTITION BY (column [AS alias][, ...])]
         DIMENSION BY (column [AS alias][, ...])
         MEASURES (column [AS alias][, ...])
      [{IGNORE | KEEP} NAV] 
      [UNIQUE {DIMENSION | SINGLE REFERENCE}]]
[MAIN main_model_name]
   [PARTITION BY (column [AS alias][, ...])]
      DIMENSION BY (column [AS alias][, ...])
      MEASURES (column [AS alias][, ...])
   [{IGNORE | KEEP} NAV] 
      [UNIQUE {DIMENSION | SINGLE REFERENCE}]]
   model_rules_clause
[RULES [UPSERT [ALL] | UPDATE]
      [{AUTOMATIC | SEQUENTIAL} ORDER]]
   [ITERATE (int) [UNTIL (ending_condition)]]
   ( [ {UPSERT [ALL] | UPDATE } ]   measure [...]
   [FOR { dimension | ( dimension[, ...] ) }
   { [IN ({subquery | literal[, ...]})] |
      [LIKE pattern] FROM start_literal TO end_literal
         {INCREMENT | DECREMENT} diff_literal }[, ...]
   [ORDER [SIBLINGS] BY (order_column [ASC | DESC]
      [NULLS FIRST | NULLS LAST][, ...])]]
   = expr[, ...] )

The parameters of the MODEL clause are as follows:

{IGNORE | KEEP} NAV: Specifies whether NULL or absent values (NAV) are retained as NULLs (KEEP), or whether they are replaced with suitable defaults (IGNORE): zero for numeric types, 1-Jan-2000 for date types, an empty string for character types, and NULL for anything else.
UNIQUE {DIMENSION | SINGLE REFERENCE}: Specifies the scope within which the database ensures that a given cell reference points to a unique data value. Use DIMENSION to require that each possible cell reference, whether on the left or right side of a rule, represents a single value. Use SINGLE REFERENCE to perform that check only for those cell references that appear on the righthand side of a rule.
RETURN {UPDATED | ALL} ROWS: Specifies whether all rows are returned from model processing, or whether only updated rows are returned.
reference_model_name ON (subquery): Specifies the name and rowsource for a reference model. This is a model on which you cannot perform calculations, but which contains values that you can reference from within your main query.
PARTITION BY (column[, ... ]): Splits a model into independent partitions based on the columns given. You cannot partition reference models.
DIMENSION BY (column[, ... ]): Specifies the dimensions for a model. Values from these columns represent the set of index values that are used to identify cells in the multidimensional addressing space.
MEASURES (column[, ... ]): Specifies the values associated with each unique combination of dimensions (e.g., with each cell of the model).
alias: Specifies an alias for a column.
MAIN model_name: Begins the definition of the main model, and also gives that model a name. The main model represents the model on which you perform work. Rows from your containing SELECT feed into this model, rules are applied, and the resulting rows are returned.
RULES [UPSERT [ALL] | UPDATE]: Specifies whether rules may both create new cells and update existing cells (UPSERT), or whether they must only update existing cells (UPDATE). If you want your model to be able to create new rows in your result set, specify UPSERT. The default is UPSERT. You can also control this behavior on a rule-by-rule basis; see rule in the syntax.
{AUTOMATIC | SEQUENTIAL} ORDER: Specifies whether the optimizer determines the order in which rules are evaluated (AUTOMATIC), or whether rules are evaluated in the order in which you list them (SEQUENTIAL). The default is SEQUENTIAL.
ITERATE (int): Requests that entire set of rules be evaluated repeatedly, int times. The default is to evaluate the set of rules just once.
UNTIL(ending_condition): Specifies a condition that, when met, causes iteration to end. You must still specify an int, which serves as a safeguard against infinite loops.
measure[ ... ]: A reference to one of the measures listed in the MEASURES clause. When you reference a measure, the square brackets are part of the syntax. You must specify all dimensions, either via a subquery or by listing them, and the specific value of the measure associated with those dimensions will be returned, or referenced.
FOR ...: A FOR loop iterating over one or many dimensions. The multi-iterating FOR loop is much like a subquery where each row of the result set represents a specific combination of dimensions.
{ dimension | ( dimension[, ... ]) }: A list of values, whether from columns or expressions, that collectively identify a unique cell in the model.
IN ({subquery | literal[, ... ]}): The source of values for a FOR loop may be a subquery, or it may be a specific list of literal values.
LIKE pattern: Allows you to insert dimension values into a pattern. Use a percent-sign to mark the location at which you want dimension values to be inserted. For example, use FOR x LIKE 'A%B' FROM 1 TO 3 INCREMENT 1 to generate values such as 'A1B', 'A2B', 'A3B'.
FROM start_literal TO end_literal {INCREMENT | DECREMENT} diff_literal: Defines the starting and ending FOR loop values, and also the difference between each subsequent value as the loop iterates from start to end.
ORDER [SIBLINGS] BY (order_column [ASC | DESC] [NULLS FIRST | NULLS LAST][, ... ]): Imposes an order of evaluation with respect to the cells referenced from the left side of a rule. Use this clause if you want a rule to be applied to cells in order. Otherwise, you have no guarantee as to the order in which the rule is applied to the cells that it affects. You can order records from siblings of the same parent table by using the ORDER SIBLINGS BY clause.

Following is a list of functions that have been designed specifically for use in the MODEL clause:

CV() or CV(dimension_column): Returns the current value of a dimension column. May be used only on the righthand side of an expression in a rule. When the CV() form is used, the dimension column is determined implicitly based on the function call’s position in a list of dimension values.
PRESENTNNV(measure[dimension[, ... ], not_null, was_null): Returns either not_null or was_null, depending on whether the specified measure was NULL when model processing began. This function may be used only from the righthand side of a rule expression.
PRESENTV(measure[dimension[, ... ], did_exist, didnt_exist): Returns either did_exist or didnt_exist, depending on whether the specified measure existed when model processing began. This function may be used only from the righthand side of a rule expression. Be aware that whether a measure existed is a completely separate question from whether that measure was NULL.
ITERATION_NUMBER: Returns 0 on the first iteration through the rules, 1 on the second iteration, and so forth. This is useful when you want to base rule calculations on the number of iterations.

The following example demonstrates that the MODEL clause gives a normal SELECT statement the ability to construct a multidimensional array as a result set and calculate inter-row and inter-array values interdependently. The newly calculated values are returned as part of the SELECT statement’s result set:

SELECT SUBSTR(region,1,20) country, SUBSTR(product,1,15)
   product, year, sales
FROM sales_view
WHERE region IN ('USA','UK')
MODEL RETURN UPDATED ROWS
   PARTITION BY (region)
   DIMENSION BY (product, year)
   MEASURES (sale sales)
   RULES (
      sales['Bounce',2006] = sales['Bounce',2005] 
         + sales['Bounce',2004],
      sales['Y Box', 2006] = sales['Y Box', 2005],
      sales['2_Products',2006] = sales['Bounce',2006]
         + sales['Y Box',2006] )
ORDER BY region, product, year;

In this example, a query against the sales_view materialized view returns the sum of sales over the course of a few years for the regions 'USA' and 'UK'. The MODEL clause then falls between the WHERE clause and the ORDER BY clause. Since sales_view currently holds data for the years 2004 and 2005, we provide it with rules to calculate figures for the year 2006.

The subclause RETURN UPDATED ROWS limits the result set to the rows that were created or updated by the query. Next, the example defines the logical divisions of the data using data elements from the materialized view and using the PARTITION BY, DIMENSION BY, and MEASURES subclauses. The RULES subclause then references individual measures of the model by referring to combinations of different dimension values, much like a spreadsheet macro references worksheet cells with specific lookups and references to ranges of values.

Oracle (and SQL Server, using a somewhat different technique) supports a non-SQL standard query type known as a pivot query. Although you should refer to the vendor documentation for exactly how to write a pivot (or unpivot) query, an example here will help you take advantage of this useful technique. A pivot query turns the result set on its side, enabling you to extract more value from the data. In Oracle, you must first create your pivot table. By using a pivot table, you can then turn the result “on its side” so that the order_type column becomes the column headings:

CREATE TABLE pivot_table AS
SELECT * FROM (SELECT year, order_type, amt FROM sales)
PIVOT SUM(amt) FOR order_type IN ('retail', 'web');

SELECT * FROM pivot_table ORDER BY YEAR;

The results are:

YEAR     RETAIL    WEB
---- ----------- ------
2004     7014.54
2005     9745.12
2006    16717.88 10056.6
2007    28833.34 39334.9
2008    66165.77 127109.4

PostgreSQL

PostgreSQL supports a straightforward implementation of the SELECT statement. It supports JOIN and subquery applications. PostgreSQL also allows the creation of new temporary or permanent tables using the SELECT ... INTO syntax or the CREATE TABLE AS SELECT construct. Its SELECT syntax is as follows:

SELECT [ALL | DISTINCT [ON (select_item[, ...])]]
[AS alias [(alias_list)]][, ...]
[INTO [LOGGED | UNLOGGED] [[TEMP]ORARY] [TABLE] new_table]
[FROM [ONLY] table1[.*] [AS alias][, ...]]
[[join_type] JOIN table2 {[ON join_condition] | 
   [USING (column_list)]}]
[WHERE search_condition]
[group_by_clause]
[order_by_clause]
[for_update_clause]

for_update_clause ::= [FOR {UPDATE | NO KEY UPDATE | SHARE | KEY SHARE}
   [OF table_name[, ...]] [NOWAIT | SKIP LOCKED] [...]]

where:

ALL | DISTINCT [ON (select_item[, ... ]]

Supports the ALL and DISTINCT keywords of the SQL standard, where ALL (the default) returns all rows (including duplicates) and DISTINCT eliminates duplicate rows. In addition, DISTINCT ON eliminates duplicates on only the specified select_items, not on all of the select_items in the query (example follows).

select_item

Includes the standard elements of a select_item list supported by the SQL standard. In addition to the asterisk (*) shorthand to retrieve all rows, you can use table_name.* to retrieve all rows from an individual table.

AS alias [(alias_list)]

Creates an alias or a list of aliases for one or more columns (or tables in the FROM clause). AS is required for select_item aliases, but not for FROM table aliases. (Some other database platforms treat the AS as an option when declaring an alias.)

INTO [UNLOGGED | LOGGED] [[TEMP]ORARY] [TABLE] new_table

Creates a new table from the result set of the query. Both TEMP and TEMPORARY are acceptable usages to create a temporary table that is automatically dropped at the end of the session. Otherwise, the command creates a permanent table. Permanent tables created with this statement must have new, unique names, but temporary tables may have the same name as an existing table. If you create a temporary table with the same name as an existing permanent table, the temporary table is used to resolve all operations against that table name while in the same session as the one that created it. Other sessions will continue to see the existing permanent table. The UNLOGGED table creates a table where only the creation of the table structure is written to the transaction logs. Creating unlogged tables is generally faster than logged ones and could be as much as 5 times faster. However, because UNLOGGED table data writing is not logged, data inserted can not be replicated. They are also truncated during database restarts or crashes. That said, you should only use unlogged tables for data you do not need to read on replicas and data that can be easier recreated. When the logging option is not specified, a LOGGED table is created.

FROM [ONLY]table1[, ... ]

Specifies one or more source tables where the data resides. (Be sure to specify a join condition or a theta WHERE clause so that you don’t get a Cartesian product of all records in all tables.) PostgreSQL allows inheritance in child tables of declared parent tables. The ONLY keyword is not supported for partitioned tables because the parent never has data. Use the ONLY keyword to suppress rows from the child tables of your source table. (You can turn off this default inheritance globally with the command SET SQL_Inheritance TO OFF.) PostgreSQL also supports nested table subqueries (see the section on SUBQUERY later in this chapter). The FROM clause is not needed when used for computation:

SELECT 8 * 40;

PostgreSQL will also include an implicit FROM on SELECT statements that include schema-identified columns. For example, the following query is acceptable (though not recommended):

SELECT sales.stor_id WHERE sales.stor_id = '6380';

for_update_clause

If FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE, or FOR KEY SHARE is specified, the SELECT statement locks the selected rows against concurrent updates. SKIP LOCKS allows records already locked to not be updated. This is suitable if you are updating records in batches and can always revisit records that haven’t been updated in a secondary batch.

PostgreSQL supports a handy variation of the DISTINCT clause, DISTINCT ON (select_item[, ... ] ). This variation allows you to pick and choose the exact columns that are considered for elimination of duplicates. PostgreSQL chooses the result set in a manner much like it does for ORDER BY. You should include an ORDER BY clause so that there’s no unpredictability as to which record is returned. For example, this query retrieves the most recent sales report for each store based on the most recent order date:

SELECT DISTINCT ON (stor_id), ord_date, qty
FROM sales
ORDER BY stor_id, ord_date DESC;

However, there would be no way to predict what single record would have been returned without the ORDER BY clause.

PostgreSQL also allows retrieving a whole row as a column, as follows:

SELECT DISTINCT ON (stor_id) stor_id, s AS sale_row
FROM sales AS s
ORDER BY stor_id, ord_date DESC;

This query retrieves the most recent sales report for each store based on the most recent order date, but instead of returning individual columns, it returns the whole row as a column value. This is done simply by specifying the name of the table or table alias. If a table alias is specified, then the table name cannot be used; you must use the alias. The output of the preceding query looks like this:

6380  (6380,6871,"1994-09-14 00:00:00-04",5,"Net 60",BU1032)
7066  (7066,QA7442.3,"1994-09-13 00:00:00-04",75,"ON invoice",PS2091)
7067  (7067,D4482,"1994-09-14 00:00:00-04",10,"Net 60",PS2091)
7131  (7131,N914008,"1994-09-14 00:00:00-04",20,"Net 30",PS2091)
7896  (7896,TQ456,"1993-12-12 00:00:00-05",10,"Net 60",MC2222)
8042  (8042,423LL922,"1994-09-14 00:00:00-04",15,"ON invoice",MC3021)

This feature is particularly useful for outputting data to applications by combining it with a function such as jsonb_agg or json_agg, as follows, and using ORDER BY aggregation syntax (a feature supported for all PostgreSQL aggregate functions, including user-defined ones):

SELECT json_agg(s ORDER BY stor_id, ord_date) AS sale_rows
FROM sales AS s;

Refer to “JOIN Subclause” for information on the supported join types.

SQL Server

SQL Server supports most of the basic elements of the SQL standard SELECT statement, including all of the various join types. It also offers several variations on the SELECT statement, including optimizer hints, the INTO clause, the TOP clause, GROUP BY variations, COMPUTE, and WITH OPTIONS. The SQL Server SELECT syntax is:

SELECT {[ALL | DISTINCT] | [TOP number [PERCENT] [WITH TIES]]}
   select_item [AS alias]
[INTO new_table_name]
[FROM {[rowset_function | table1[, ...]]} [AS alias]]
[[join_type] JOIN table2 [ON join_condition]]
[WHERE search_condition]
group_by_clause
order_by_clause

[COMPUTE {aggregation (expression)}[, ...]
   [BY expression[, ...]]]
[FOR {BROWSE | XML | JSON}]
[OPTION (hint[, ...])]

where:

TOP number [PERCENT] [WITH TIES]

Indicates that only the specified number of rows should be retrieved in the query result set. If PERCENT is specified, only the first number percent of the rows are retrieved. WITH TIES is used only for queries with an ORDER BY clause. This variation specifies that additional rows are returned from the base result set using the same value in the ORDER BY clause, appearing as the last of the TOP rows.

INTO new_table_name

Creates a new table from the result set of the query. You can use this command to create temporary or permanent tables. (Refer to SQL Server’s rules for creating temporary or permanent tables in “CREATE/ALTER TABLE Statement”.) The SELECT ... INTO command quickly copies the rows and columns queried from other table(s) into a new table using a non-logged operation. Since it is not logged, COMMIT and ROLLBACK statements do not affect it.

FROM {[rowset_function | table1[, ... ]]}

Supports the standard behavior of the SQL standard FROM clause, including nested table subqueries. In addition, SQL Server supports a set of extensions called rowset functions. Rowset functions allow SQL Server to source data from special or external data sources such as XML streams, full-text search file structures (a special structure in SQL Server used to store things like MS Word documents and MS PowerPoint slide shows within the database), or external data sources (like an MS Excel spreadsheet).

See the SQL Server documentation for the full description of the available FROM {[rowset_function | table1[, ... ]]} options. Among the many possibilities, SQL Server currently supports the following functions:

CONTAINSTABLE: Returns a table derived from a specified table that contains at least one full-text index TEXT or NTEXT column. The records derived are based upon either a precise, fuzzy, weighted-match, or proximity-match search. The derived table is then treated like any other FROM data source.
FREETEXTTABLE: Similar to CONTAINSTABLE, except that records are derived based upon a meaning search of 'freetext_string'. FREETEXTTABLE is useful for ad hoc queries against full-text tables, but less accurate than CONTAINSTABLE.
OPENDATASOURCE: Provides a means of sourcing data external to SQL Server via OLEDB without declaring a linked server, such as an MS Excel spreadsheet or a Sybase Adaptive Server database table. This is intended for the occasional ad hoc query; if you frequently retrieve result sets from external data sources, you should declare a linked server.
OPENQUERY: Executes a pass-through query against a linked server. This is an effective means of performing a nested table subquery against a data source that is external to SQL Server. The data source must first be declared as a linked server.
OPENROWSET: Executes a pass-through query against an external data source. This is similar to OPENDATASOURCE, except that OPENDATASOURCE only opens the data source; it does not actually pass through a SELECT statement. OPENROWSET is intended for occasional, ad hoc usage only.
OPENXML: Provides a queryable, table-like view to an XML document using a file handle. This is covered in “SQL Server XML keywords, functions, procedures, and methods”.

COMPUTE {aggregation (expression)}[, ... ] [BY expression[, ... ]]

Generates additional aggregations—usually totals—that appear at the end of the result set. BY expression adds subtotals and control breaks to the result set. COMPUTE and COMPUTE BY can be used simultaneously in the same query. COMPUTE BY must be coupled with an ORDER BY clause, though the expression used by COMPUTE BY can be a subset of the order_by_expression. The aggregation may be any of these function calls: AVG, COUNT, MAX, MIN, STDEV, STDEVP, VAR, VARP, or SUM. Examples are shown later in this section.

COMPUTE, in any form, does not work with the DISTINCT keyword or with TEXT, NTEXT, or IMAGE data types.

FOR {BROWSE | XML | JSON}

FOR BROWSE is used to allow updates to data retrieved in a DB-Library browse mode cursor. (DB-Library is the original access methodology for SQL Server and has since been supplanted by OLE DB in most applications.) FOR BROWSE can only be used against tables with a unique index and a column with the TIMESTAMP data type; it cannot be used in UNION statements or when a HOLDLOCK hint is active.

FOR XML and FOR JSON are used to extract the result set as an XML document or as JSON, respectively. See Chapter 10 for details on working with these formats.

OPTION (hint[, ... ])

Replaces elements of the default query plan with your own. Because the optimizer usually picks the best query plan for any query, you are strongly discouraged from placing optimizer hints into your queries. Refer to the SQL Server documentation for more information on hints.

Here’s an example of SQL Server’s SELECT ... INTO capability. This example creates a table called non_mgr_employees using SELECT ... INTO. The table contains the emp_id, first name, and last name of each non-manager from the employee table, joined with their job descriptions (taken from the jobs table):

SELECT   e.emp_id, e.fname, e.lname,
   SUBSTRING(j.job_desc,1,30) AS job_desc
INTO non_mgr_employee
FROM     employee e
JOIN     jobs AS j ON e.job_id = j.job_id
WHERE    j.job_desc NOT LIKE '%MANAG%'
ORDER BY 2,3,1

The newly created and loaded table non_mgr_employee now can be queried like any other table.

SELECT ... INTO is not logged or recoverable and so should only be used for operations that can be restarted from the beginning.

COMPUTE has a number of permutations that can impact the result set retrieved by the query. The following example shows the sum of book prices broken out by type of book and sorted by type and then price:

-- Query
SELECT type, price
FROM titles
WHERE type IN ('business','psychology')
  AND price > 10
ORDER BY type, price
COMPUTE SUM(price) BY type

-- Results
type         price
------------ ---------------------
business     11.9500
business     19.9900
business     19.9900
             sum
             =====================
             51.9300
type         price
------------ ---------------------
psychology   10.9500
psychology   19.9900
psychology   21.5900
             sum
             =====================
             52.5300

The COMPUTE clause behaves differently if you do not include BY. The following query retrieves the grand total of prices and advances for books with prices over $16.00:

-- Query
SELECT type, price, advance
FROM titles
WHERE price > $16
COMPUTE SUM(price), SUM(advance)

-- Results
type         price                 advance
------------ --------------------- ---------------------
business     19.9900               5000.0000
business     19.9900               5000.0000
mod_cook     19.9900               .0000
popular_comp 22.9500               7000.0000
popular_comp 20.0000               8000.0000
psychology   21.5900               7000.0000
psychology   19.9900               2000.0000
trad_cook    20.9500               7000.0000
             sum
             =====================
             165.4500
                                   sum
                                   =====================
                                   41000.0000

You can even use COMPUTE BY and COMPUTE in the same query to produce subtotals and grand totals. (For the sake of brevity, we’ll show an example query, but not the result set.) In this example, we find the sum of prices and advances by type for business and psychology books that cost over $16.00:

SELECT type, price, advance
FROM titles
WHERE price > $16
  AND type IN ('business','psychology')
ORDER BY type, price
COMPUTE SUM(price), SUM(advance) BY type
COMPUTE SUM(price), SUM(advance)

Don’t forget that you must include the ORDER BY clause with a COMPUTE BY clause! (You do not need an ORDER BY clause with a simple COMPUTE clause without the BY keyword.) There are many permutations that you can perform in a single query—multiple COMPUTE and COMPUTE BY clauses, GROUP BY with a COMPUTE clause, and even COMPUTE with an ORDER BY statement. It’s actually fun to tinker around with the different ways you can build queries using COMPUTE and COMPUTE BY. It’s not theme park fun, but whaddya want? This is a programming book!

SQL Server (and Oracle, using a somewhat different technique) also supports a non-SQL standard query known as a pivot query. Although you should refer to the vendor documentation for details on exactly how to write a pivot (or unpivot) query, an example here will help you take advantage of this useful technique. A pivot query turns the result set on its side, enabling you to extract more value from the data. For example, the following query produces a two-column, four-row result set:

-- Query
SELECT days_to_make, AVG(manufacturing_cost) AS Avg_Cost
FROM manufacturing.products
GROUP BY days_to_make;

-- Results
days_to_make  Avg_Cost
0             5
1             225
2             350
4             950

By using a pivot query, you can then turn the result “on its side” so that the days_to_make column values become the column headings and the query returns one row with five columns:

-- Query
SELECT 'Avg_Cost' As Cost_by_Days, [0], [1], [2], [3], [4]
FROM 
   (SELECT days_to_make, manufacturing_cost 
    FROM manufacturing.products)
AS source
PIVOT
   (AVG(manufacturing_cost) 
    FOR days_to_make IN ([0], [1], [2], [3], [4]))
 AS pivottable;

-- Results
Cost_by_Days  0    1    2    3    4
Avg_Cost      5    225  350  NULL 950

SUBQUERY Substatement

A subquery is a nested query. Subqueries may appear in various places within a SQL statement.

Platform	Command
MySQL	Supported, with limitations
Oracle	Supported
PostgreSQL	Supported
SQL Server	Supported

SQL standard syntax

The different types of subqueries the SQL standard supports are described “Rules at a glance”. Scalar, table, and nested table subqueries are represented by the following generalized syntax:

SELECT column1, column2, ... (scalar_subquery)
FROM table1, ... (nested_table_subquery)
   AS subquery_table_name]
WHERE foo = (scalar_subquery)
   OR foo IN (table_subquery)

Correlated subqueries are more complex because the values of such subqueries are dependent on values retrieved in their main queries. For example:

SELECT column1
FROM   table1 AS t1
WHERE  foo IN
   (SELECT value1
    FROM table2 AS t2
    WHERE t2.pk_identifier = t1.fk_identifier)

Note that the IN clause is for example purposes only. Any comparison operator may be used.

Keywords

scalar_subquery: Includes a scalar subquery in the SELECT item list or in the WHERE or HAVING clause of a query.
table_subquery: Includes a table subquery only in the WHERE clause, with operators such as IN, ANY, SOME, EXISTS, or ALL that act upon multiple values. Table subqueries return one or more rows containing a single value each.
nested_table_subquery: Includes a nested table subquery only in the FROM clause, in conjunction with the AS clause.

Rules at a glance

Subqueries allow you to return one or more values and nest them inside a SELECT, INSERT, UPDATE, or DELETE statement, or inside another subquery. Subqueries can be used wherever expressions are allowed. Subqueries also can often be replaced with a JOIN statement. Depending on the DBMS, subqueries may perform less quickly than joins.

Subqueries are always enclosed in parentheses.

SQL supports the following types of subquery:

Scalar subqueries: Subqueries that retrieve a single value. These are the most widely supported type of subquery among the various database platforms.
Vector subqueries: Subqueries that retrieve a single row which has more than one column.
Table subqueries: Subqueries that retrieve more than one value or row of values.

Scalar and vector subqueries can, on some platforms, appear as part of the expression in a SELECT list of items, a WHERE clause, or a HAVING clause. Nested table subqueries tend to appear in the FROM clauses of SELECT statements.

A correlated subquery is a subquery that is dependent upon a value in an outer query. Consequently, the inner query is executed once for every record retrieved in the outer query. Since subqueries can be nested many layers deep, a correlated subquery may reference any level in the main query higher than its own level.

Different rules govern the behavior of a subquery, depending on the clause in which it appears. The level of support amongst the database platforms also varies: some platforms support subqueries in all clauses mentioned earlier (SELECT, FROM, WHERE, and HAVING), while others support subqueries in only one or two of the clauses.

Subqueries are usually associated with the SELECT statement. Since subqueries may appear in the WHERE clause, they can be used in any SQL statement that supports a WHERE clause, including SELECT, INSERT ... SELECT, DELETE, and UPDATE statements.

Certain operators in a WHERE clause, such as =, <, >, >=, <=, and <> (or !=), expect only one value. If a subquery returns more than one value but the operator expects a single value, the entire query will fail. Scalar subqueries should be used in these cases because they can return only a single value. On the other hand, table subqueries may return multiple values, but they are usable only with multivalue expressions like [NOT] IN, ANY, ALL, SOME, or [NOT] EXISTS.

Table subqueries may appear in the FROM clause and should be aliased by the AS clause. The result set returned by a table subquery, sometimes called a derived table, offers similar functionality to a view (see “CREATE/ALTER VIEW Statement” for more on views). Every column returned in the derived table need not be used in the query, though they can all be acted upon by the outer query.

Correlated subqueries typically appear as a component of a WHERE or HAVING clause in the outer query (and, less commonly, in the SELECT item list) and are correlated through the WHERE clause of the inner query (that is, the subquery). Correlated subqueries can also be used as table subqueries, though this is less common. Be sure to include in such a subquery a WHERE clause that evaluates based on a correlating value from the outer query; the example for a correlated query in the earlier SQL standard syntax diagram illustrates this requirement.

It is also important to specify a table alias, called a correlation name, using the AS clause or other alias shortcut for every table referenced in a correlated query, in both the outer and inner queries. Correlation names avoid ambiguity and help the DBMS quickly resolve the tables involved in the query.

All SQL standard–compliant subqueries comply with the following short list of rules:

A subquery cannot include an ORDER BY clause.
A subquery cannot be enclosed in an aggregate function. For example, the following query is invalid:

SELECT foo FROM table1
WHERE sales >= AVG(SELECT column1 FROM sales_table ...)

You can get around this limitation by performing the aggregation in the subquery rather than in the outer query.

Programming tips and gotchas

For most vendor platforms, subqueries should not reference large object data types (e.g., CLOB or BLOB on Oracle and IMAGE or TEXT on SQL Server) or array data types (such as TABLE or CURSOR on SQL Server).

The platforms all support subqueries, but not every vendor supports every type of subquery. Table 4-4 summarizes vendor support at the time of writing.

Table 4-4. *Platform-specific subquery support*
Platform	MySQL	Oracle	PostgreSQL	SQL Server
Scalar subquery in `SELECT` item list	✓	✓	✓	✓
Scalar subquery in `WHERE/HAVING` clause	✓	✓	✓	✓
Vector subquery in `WHERE/HAVING` clause	✓	✓	✓	✓
Table subquery in `FROM` clause	✓	✓	✓	✓
Correlated subquery in `WHERE/HAVING` clause	✓	✓	✓	✓

Aside from SELECT statements, subqueries may also be used in INSERT, UPDATE, and DELETE statements that include a WHERE clause. Subqueries are often used for the following purposes:

To identify the rows inserted into the target table using an INSERT ... SELECT statement, a CREATE TABLE ... SELECT statement, or a SELECT ... INTO statement
To identify the rows of a view or materialized view in a CREATE VIEW statement
To identify value(s) assigned to existing rows using an UPDATE statement
To identify values for conditions in the WHERE and HAVING clauses of SELECT, UPDATE, and DELETE statements
To build a view of a table(s) on the fly (i.e., nested table subqueries)

Examples

This section shows subquery examples that are equally valid on MySQL, Oracle, PostgreSQL, and SQL Server.

A simple scalar subquery is shown in the SELECT item list of the following query:

SELECT job, (SELECT AVG(salary) FROM employee) AS "Avg Sal"
FROM   employee

Table subqueries are functionally equivalent to querying a view. In the following, we query the education level and salary in a table subquery, and then perform aggregations on the values in the derived table in the outer query:

SELECT AVG(edlevel), AVG(salary)
FROM 
   (SELECT edlevel, salary
    FROM employee) AS emprand
GROUP BY edlevel

Remember that this query may fail, depending on the platform, without the AS clause to associate a name with the derived table.

The following query shows a standard table subquery in the WHERE clause expression. In this case, we want all project numbers for employees in the department 'A00':

SELECT projno
FROM   emp_act
WHERE  empno IN
   (SELECT empno
    FROM   employee
    WHERE  workdept ='A00')

The subquery is executed only once for the outer query.

In the next example, we want to know the names of employees and their level of seniority. We get this result set through a correlated subquery:

SELECT firstname, lastname,
   (SELECT COUNT(*)
    FROM employee, senior
    WHERE employee.hiredate > senior.hiredate) as senioritype
FROM employee

Unlike the previous subquery, this subquery is executed one time for every row retrieved by the outer query. In a query like this, the total processing time could be very long, since the inner query may potentially execute many times for a single result set.

Correlated subqueries depend on values retrieved by the outer query to complete the processing of the inner query. They are tricky to master, but they offer unique programmatic capabilities. The following example returns information about orders where the quantity sold in each order is less than the average quantity in other sales for that title:

SELECT s1.ord_num, s1.title_id, s1.qty
FROM sales AS s1
WHERE s1.qty <
   (SELECT AVG(s2.qty)
    FROM sales AS s2
    WHERE s2.title_id = s1.title_id)

For this example, you can accomplish the same functionality using a self-join. However, there are situations in which a correlated subquery may be the only easy way to do what you need.

The next example shows how a correlated subquery might be used to update values in a table:

UPDATE course SET ends =
   (SELECT min(c.begins) FROM course AS c
    WHERE c.begins BETWEEN course.begins AND course.ends)
WHERE EXISTS
   (SELECT * FROM course AS c
    WHERE c.begins BETWEEN course.begins AND course.ends)

Similarly, you can use a subquery to determine which rows to delete. This example uses a correlated subquery to delete rows from one table based on related rows in another table:

DELETE FROM course
WHERE EXISTS
   (SELECT * FROM course AS c
    WHERE course.id > c.id
    AND (course.begins BETWEEN c.begins
      AND c.ends OR course.ends BETWEEN c.begins AND c.ends))

MySQL

MySQL supports subqueries in the FROM clause and the WHERE clause. It supports scalar subqueries in the SELECT item list.

Oracle

Oracle supports SQL-standard subqueries, though it uses a different nomenclature. In Oracle, a table subquery that appears in the FROM clause is called an inline view. That makes sense because table subqueries are basically views built on the fly. Oracle calls a subquery that appears in the WHERE clause or the HAVING clause of a query a nested subquery. It allows correlated subqueries in the SELECT item list and in the WHERE and HAVING clauses.

PostgreSQL

PostgreSQL supports SQL-standard subqueries in the FROM, WHERE, and HAVING clauses and also allows them to have ORDER BY clauses. However, subqueries appearing in a HAVING clause cannot include ORDER BY, FOR UPDATE, or LIMIT clauses. It supports scalar and vector subqueries in the SELECT item list.

SQL Server

SQL Server supports SQL-standard subqueries. Scalar subqueries can be used almost anywhere a standard expression is allowed. Subqueries in SQL Server cannot include the COMPUTE or FOR BROWSE clauses. They can include the ORDER BY clause if the TOP clause is also used.

UNION Set Operator

The UNION set operator combines the result sets of two or more queries, showing all the rows returned by each of the queries as a single result set.

UNION is in a class of keywords known as set operators. Other set operators include INTERSECT and EXCEPT. All set operators are used to simultaneously manipulate the result sets of two or more queries; hence the term “set operators.”

Platform	Command
MySQL	Supported
Oracle	Supported, with limitations
PostgreSQL	Supported, with limitations
SQL Server	Supported, with limitations

SQL standard syntax

There are technically no limits to the number of queries that you may combine with the UNION statement. The general syntax is:

<SELECT statement1>
UNION [ALL | DISTINCT]
<SELECT statement2>
UNION [ALL | DISTINCT]
...

Keywords

UNION: Determines which result sets will be combined into a single result set. Duplicate rows are, by default, excluded.
ALL | DISTINCT: Combines duplicate rows from all result sets (ALL) or eliminates duplicate rows from the final result set (DISTINCT). Columns containing a NULL value are considered duplicates. If neither ALL nor DISTINCT is used, DISTINCT behavior is the default.

Rules at a glance

There is only one significant rule to remember when using UNION: the order, number, and data types of the columns should be the same in all queries.

The data types do not have to be identical, but they should be compatible. For example, CHAR and VARCHAR are compatible data types. By default, the result set will default to the largest of two (or more) compatible data types, so a query that unions three CHAR columns—CHAR(5), CHAR(10), and CHAR(12)—will display the results in the CHAR(12) format with extra space padded onto the smaller column results.

Programming tips and gotchas

Even though the SQL standard calls for INTERSECT to take precedence over other set operators in a single statement, many platforms evaluate all set operators with equal precedence. You can explicitly control the precedence of set operators using parentheses. Otherwise, the DBMS is likely to evaluate them in order from the leftmost to the rightmost expression.

Depending on the platform, specifying DISTINCT can incur a significant performance cost, since it often involves a second pass through the results to winnow out duplicate records. ALL can be specified in any instance where no duplicate records are expected (or where duplicate records are OK) for faster results.

According to the SQL standard, only one ORDER BY clause is allowed in the entire query. Include it at the end of the last SELECT statement. To avoid column and table ambiguity, be sure to alias matching columns in each table with the same respective aliases. However, for column-naming purposes, only the aliases in the first query are used for each column in the SELECT ... UNION query. For example:

SELECT au_lname AS "lastname", au_fname AS "firstname"
FROM authors
UNION
SELECT emp_lname AS "lastname", emp_fname AS "firstname"
FROM employees
ORDER BY lastname, firstname

Also be aware that even if the queries in your UNION have compatible data-typed columns, there may be some variation in behavior across the DBMS platforms, especially with regard to the length of the columns. For example, if the au_lname column in the first query is markedly longer than the emp_lname column in the second query, different platforms may apply different rules as to which length is used. In general, though, the platforms will choose the longer (and less restrictive) column size for use in the result set.

Each DBMS may apply its own rules as to which column name is used if the columns across the tables have different names. In general, the column names of the first query are used.

MySQL

MySQL fully supports the SQL standard syntax.

Oracle

Oracle supports the UNION and UNION ALL set operators using the basic SQL standard syntax. UNION DISTINCT is not supported, but UNION is the functional equivalent.

For example, you could find out all unique store IDs without duplicates using this query:

SELECT stor_id FROM stores
UNION
SELECT stor_id FROM sales;

Oracle does not support UNION [ALL] on the following types of queries:

Queries containing columns with LONG, BLOB, CLOB, BFILE, or VARRAY data types
Queries containing a FOR UPDATE clause or a TABLE collection expression

If the first query in the set operation contains any expressions in the SELECT item list, include the AS keyword to associate an alias with the column resulting from the expression. Also, only the last query in the set operation may contain an ORDER BY clause.

PostgreSQL

PostgreSQL supports the UNION and UNION ALL set operators using the basic SQL standard syntax, but not on queries with a FOR UPDATE clause UNION DISTINCT is not supported, but UNION is the functional equivalent.

The first query in the set operation may not contain an ORDER BY clause or a LIMIT clause. Subsequent queries in the UNION [ALL] set operation may contain these clauses, but such queries must be enclosed in parentheses. Otherwise, the rightmost occurrence of ORDER BY or LIMIT will be assumed to apply to the entire set operation.

For example, we could find all authors and all employees whose last names start with “P” with the following query:

SELECT a.au_lname
FROM   authors AS a
WHERE  a.au_lname LIKE 'P%'
UNION
SELECT e.lname
FROM   employee AS e
WHERE  e.lname LIKE 'W%';

SQL Server

SQL Server supports the UNION and UNION ALL set operators using the basic SQL standard syntax. UNION DISTINCT is not supported, but UNION is the functional equivalent.

You can use SELECT ... INTO with UNION or UNION ALL, but INTO may appear only in the first query of the union. Special keywords, such as SELECT TOP and GROUP BY ... WITH CUBE, are usable with all queries in a union, but if you use them in one query you must use them with all of the queries. If you use SELECT TOP or GROUP BY ... WITH CUBE in only one query in a union, the operation will fail.

Each query in a union must contain the same number of columns. The data types of the columns do not have to be identical, but they must implicitly convert. For example, mixing VARCHAR and CHAR columns is acceptable. SQL Server uses the larger of the two columns when evaluating the size of the columns returned in the result set. Thus, if a SELECT ... UNION statement has a CHAR(5) column and a CHAR(10) column, it will display the data of both columns as a CHAR(10) column. Numeric columns are converted to and displayed as the most precise data type in the union.

For example, the following query unions the results of two independent queries that use GROUP BY ... WITH CUBE:

SELECT ta.au_id, COUNT(ta.au_id)
FROM   pubs..titleauthor AS ta
JOIN   pubs..authors     AS a ON a.au_id = ta.au_id
WHERE  ta.au_id >= '722-51-5454'
GROUP  BY ta.au_id WITH CUBE
UNION
SELECT ta.au_id, COUNT(ta.au_id)
FROM   pubs..titleauthor AS ta
JOIN   pubs..authors     AS a ON a.au_id = ta.au_id
WHERE  ta.au_id < '722-51-5454'
GROUP  BY ta.au_id WITH CUBE

VALUES Clause

The VALUES multi-row constructor is a constructor often found in INSERT statements, but it can also be used in FROM statements or anywhere you can have a table expression to create an inline table. When used in a FROM statement, the column names and table name must be aliased.

Platform	Command
MySQL/MariaDB	Supported, with limitations
Oracle	Not supported
PostgreSQL	Supported
SQL Server	Supported, with limitations

SQL standard syntax

( VALUES [ROW](<row1 columns>), [ROW](<row2 columns>), ...
   [ROW](<rown columns>) ) 
[AS <table_alias>(column1, column2[,... columnn])]

Keywords

(VALUES [ROW](<row1 columns>), [ROW](<row2 columns>), ... [ROW](<rown columns>) ): Defines a set of rows. Each row is enclosed in parentheses, and column values are separated by commas. Each row must have the same number of columns.
ROW: Optional keyword to denote the beginning of a row. Some databases do not support this keyword and others require it.
AS <table_alias>(column1, column2[, ... columnn]): Specifies names for the table and columns. When not specified, the default names for columns are column1, column2, … column.

Rules at a glance

VALUES clauses can stand alone or be included in SELECT statements, JOIN clauses, IN clauses, NOT IN clauses, DELETE statements, INSERT ... SELECT statements, UPDATE statements, and any statement that might have a query or subquery (such as DECLARE, CREATE TABLE, CREATE VIEW, and so forth). The following example defines a virtual table consisting of two rows with two named columns:

SELECT *
FROM
(VALUES ('ABC1', 'Title 1'),
   ('DCF1', 'Title 2')
) AS t(title_id, title);

VALUES can also be used in a common table expression (WITH clause) as follows:

WITH t(title_id, title) AS (
   VALUES 
      ('ABC1', 'Title 1'),
      ('DCF1', 'Title 2')
)
SELECT * FROM t;

And it can be used standalone without aliasing, as follows:

VALUES ('ABC1', 'Title 1'),
   ('DCF1', 'Title 2');

The output of the preceding queries is:

ABC1    | Title 1
DCF1    | Title 2

MySQL and MariaDB

MySQL 8 and later support the VALUES multi-row constructor, which can be used standalone or within a SELECT, INSERT, or UPDATE statement. When used as table output the ROW keyword is not optional for MySQL. For example:

SELECT *
FROM
(VALUES ROW('ABC1', 'Title 1'),
  ROW('DCF1', 'Title 2')
) AS t(title_id, title);

When used in an INSERT statement, the ROW keyword can be left out.

MariaDB also supports the VALUES multi-row constructor, which can be used standalone, in a WITH clause, or within an INSERT statement. The ROW keyword is not supported. MariaDB allows aliasing in a WITH clause, but not when VALUES is used standalone.

Oracle

Oracle does not support the VALUES multi-row constructor.

PostgreSQL

PostgreSQL fully supports the VALUES constructor, except for the ROW keyword.

SQL Server

SQL Server supports the multi-row VALUES constructor and allows its use in SELECT ... FROM and INSERT statements. It does not support its use in WITH or IN clauses or standalone, and it does not support the optional ROW keyword. When used in the FROM clause, renaming of columns is required.

WHERE Clause

The WHERE clause sets the search criteria for an operation such as SELECT, UPDATE, or DELETE. Any records in the target table(s) that do not meet the search criteria are excluded from the operation. The search conditions may include many variations, such as calculations, Boolean operators, and SQL predicates (for example, LIKE or BETWEEN). All the platforms support the SQL standard syntax.

Platform	Command
MySQL	Supported
Oracle	Supported
PostgreSQL	Supported
SQL Server	Supported

SQL standard syntax

{ WHERE search_criteria | WHERE CURRENT OF cursor_name }

Keywords

WHERE search_criteria: Defines search criteria for the statement to ensure that only the target rows are affected.
WHERE CURRENT OF cursor_name: Restricts the operation of the statement to the current row of a defined and opened cursor called cursor_name.

Rules at a glance

WHERE clauses are found in SELECT statements, DELETE statements, INSERT ... SELECT statements, UPDATE statements, and any statement that might have a query or subquery (such as DECLARE, CREATE TABLE, CREATE VIEW, and so forth).

The search conditions, all of which are described in their own entries elsewhere in this book, can include:

All records (=ALL, >ALL, <= ALL, SOME/ANY)

For example, to see publishers who live in the same city as their authors:

SELECT pub_name
FROM   publishers
WHERE city = SOME (SELECT city FROM authors);

Combinations( AND, OR, and NOT) and evaluation hierarchy

For example, to see all authors with sales in quantities greater than or equal to 75 units, or coauthors with a royalty of greater than or equal to 60:

SELECT a.au_id
FROM   authors AS a
JOIN titleauthor AS ta ON a.au_id = ta.au_id
WHERE  ta.title_id IN (SELECT title_id FROM sales
                       WHERE qty >= 75)
   OR  (a.au_id IN (SELECT au_id FROM titleauthor
                    WHERE royaltyper >= 60)
  AND  a.au_id IN (SELECT au_id FROM titleauthor
                   WHERE au_ord = 2));

Comparison operators (such as =, < >, <, >, <=, and >=)

For example, to see the last and first names of authors who don’t have a contract (i.e., authors with contract value of 0):

SELECT au_lname, au_fname
FROM   authors
WHERE  contract = 0;

Lists (IN and NOT IN)

For example, to see all authors who do not yet have a title in the titleauthor table:

SELECT au_fname, au_lname
FROM   authors
WHERE au_id NOT IN (SELECT au_id FROM titleauthor);

NULL comparisons (IS NULL and IS NOT NULL)

For example, to see titles that have NULL year-to-date sales:

SELECT title_id, SUBSTRING(title, 1, 25) AS title
FROM   titles
WHERE ytd_sales IS NULL;

Be sure not to specify = NULL in a query. NULL is unknown and can never be equal to anything. Using = NULL is not the same as specifying the IS NULL operator.

Pattern matches (LIKE and NOT LIKE)

For example, to see authors whose last names start with a “C”:

SELECT au_id
FROM   authors
WHERE au_lname LIKE 'C%';

Range operations (BETWEEN and NOT BETWEEN)

For example, to see authors with last names that fall alphabetically between “Smith” and “White”:

SELECT au_lname, au_fname
FROM   authors
WHERE  au_lname BETWEEN 'smith' AND 'white';

Programming tips and gotchas

The WHERE clause may require special handling when dealing with certain data types, such as LOBs, or certain character sets, including Unicode.

Parentheses are used to control evaluation hierarchy within a WHERE clause. Encapsulating a clause within parentheses tells the DBMS to evaluate that clause before others. Parentheses can be nested to create a hierarchy of evaluations. The innermost parenthetical clause will be evaluated first. You should watch parentheses very carefully, for two reasons:

You must always have an equal number of opening and closing parentheses. Any imbalance in the number of opening and closing parentheses will cause an error.
You should be careful where you place parentheses, since misplacing a parenthesis can dramatically change the result set of your query.

For example, consider again the following query, which returns six rows in the pubs database on the SQL Server platform:

SELECT DISTINCT a.au_id
FROM   authors AS a
JOIN   titleauthor AS ta ON a.au_id = ta.au_id
WHERE  ta.title_id IN (SELECT title_id FROM sales
                       WHERE qty >= 75)
   OR  (a.au_id  IN (SELECT au_id FROM titleauthor
                     WHERE royaltyper >= 60)
  AND   a.au_id  IN (SELECT au_id FROM titleauthor
                     WHERE au_ord = 2))

The output from this query is as follows:

au_id
-----------
213-46-8915
724-80-9391
899-46-2035
998-72-3567

Changing just one set of parentheses produces different results:

SELECT DISTINCT a.au_id
FROM   authors AS a
JOIN titleauthor AS ta ON a.au_id = ta.au_id
WHERE  (ta.title_id IN (SELECT title_id FROM sales
                      WHERE qty >= 75)
   OR  a.au_id  IN (SELECT au_id FROM titleauthor
                    WHERE royaltyper >= 60))
 AND  a.au_id  IN (SELECT au_id FROM titleauthor
                   WHERE au_ord = 2)

This time, the output will look like this:

au_id
-----------
213-46-8915
724-80-9391
899-46-2035

WITH Clause

The WITH clause defines a short-term view that is instantiated for the duration of a parent query. It may or may not have an alias associated with it in order to ease referencing from the parent query (or subqueries) later. The temporary named result set created by a WITH clause is called a common table expression (CTE), and this is also known as subquery factoring. CTEs are not stored in the database schema like standard views, but they behave in essentially the same way. In fact, when CTEs were added to the SQL:1999 standard, they were simply referred to as statement-scoped views. All the platforms support the SQL standard syntax for the WITH clause, some with extensions.

Platform	Command
MySQL	Supported
Oracle	Supported, with variations
PostgreSQL	Supported, with variations
SQL Server	Supported, with variations

SQL standard syntax

WITH [RECURSIVE] with_query[, ...]
SELECT...

Keywords

with_query: Defines a query with a name that takes the form some_name AS (query_defini⁠tion) or some_name(col1,col2,col3 ...) AS (query_definition). There can be one or more with_querys before the final query. Each with_query must be separated by a comma.
RECURSIVE: Denotes that the CTE batch may contain queries that call themselves, generally using tail recursion. This keyword is syntactic sugar providing a hint to the query planner that a WITH clause has recursive elements.

Rules at a glance

WITH clauses are used to make complex SQL queries easier to read and debug by dividing them into subsets. They are also used to compartmentalize a subquery that is reused in multiple parts of a parent query (each CTE is associated with an anchoring parent query, which may in turn have multiple CTEs), to write recursive queries, and to improve performance.

Some databases allow for writable CTEs, which have elements that update data and return the changed data.

Programming tips and gotchas

The WITH clause can have implications for query performance. Depending on the database vendor and version, a query written with WITH may perform better or worse than a similar query utilizing nested SELECT statements. When in doubt, experiment with writing your query using WITH and without WITH.

Here’s an example of a non-recursive CTE:

WITH au AS (SELECT au_state AS state, COUNT(*) AS au_count
   FROM authors 
   GROUP BY au_state
   ),
   pu AS (SELECT pub_state AS state, COUNT(*) AS pub_count
      FROM publishers 
   GROUP BY pub_state
   )
SELECT au.state, au.au_count, pu.pub_count
FROM au INNER JOIN pu ON au.state = pu.state;

A completely SQL standard–compliant recursive CTE that counts from 1 to 20 might look like this:

WITH RECURSIVE numbers AS (
   SELECT 1 AS n
   UNION ALL 
   SELECT n + 1
      FROM numbers
   WHERE n+1 <= 20 
)
SELECT *
FROM numbers;

There is no physical numbers table; numbers is a CTE expression that builds on itself. If you did have a numbers table in your database, however, the CTE version would still be used; this is because in the case of name clashes CTEs take precedence.

Recursive CTEs cannot include DISTINCT. Some databases, like PostgreSQL, will error if you prefix a with_query with RECURSIVE and it has no recursive elements. SQL Server, although it supports recursive queries, does not allow the RECURSIVE keyword.

MySQL and MariaDB

MySQL fully supports the WITH clause as of version 8.0. MariaDB introduced support in version 10.2.

Oracle

Oracle supports the SQL standard WITH as well as a MATERIALIZE Hint that forces a CTE to be materialized for better performance. To force materialization you would write something like:

WITH au AS (SELECT /*+ MATERIALIZE */ au_state, COUNT(*) AS au_count
   FROM authors 
   GROUP BY au_state
   ),
   pu AS (SELECT /*+ MATERIALIZE */ pub_state, COUNT(*) AS pub_count
      FROM publishers 
   GROUP BY pub_state
   )
SELECT au.state, au.au_count, pu.pub_count
FROM au INNER JOIN pu ON au.state = pu.state;

PostgreSQL

PostgreSQL fully supports the SQL standard WITH clause and some extensions to it. PostgreSQL allows CTEs to contain one or more INSERT/UPDATE/DELETE statements if they are followed with a RETURNING clause. The final query can also be an INSERT/UPDATE/DELETE but need not have a RETURNING clause. This is useful, for example, to move deleted records to another table, as follows:

WITH del AS (DELETE
   FROM authors 
   WHERE au_state = 'CA' RETURNING *)
INSERT INTO deleted_authors(au_id)
SELECT del.au_id
FROM del;

PostgreSQL also supports a MATERIALIZED/NOT MATERIALIZED extension to the standard. To force materialization of a CTE you would prefix it with MATERIALIZED, and if you wanted to discourage materialization you could similarly prefix it with NOT MATERIALIZED, as follows:

WITH au AS MATERIALIZED
(SELECT au_state AS state, COUNT(*) AS au_count
   FROM authors 
   GROUP BY au_state
),
pu AS NOT MATERIALIZED 
(SELECT pub_state AS state, COUNT(*) AS pub_count
   FROM publishers 
   GROUP BY pub_state
)
SELECT au.state, au.au_count, pu.pub_count
FROM au INNER JOIN pu ON au.state = pu.state;

WITH RECURSIVE is also supported in the definition of views, which allows for writing recursive views. PostgreSQL will complain if you use WITH RECURSIVE and have no recursive elements in your view.

SQL Server

SQL Server supports WITH but does not allow the RECURSIVE keyword; it internally determines whether a WITH clause is recursive or not. SQL Server does not allow ORDER BY clauses in CTEs unless they are used in conjunction with TOP. It also does not allow INTO or OPTION clauses with query hints. Unlike other statements where the ; is optional, CTEs must start with a ; if they are part of a set of query statements.

The earlier recursive CTE would be written without the RECURSIVE keyword as follows:

WITH numbers AS (
   SELECT 1 AS n
   UNION ALL 
   SELECT n + 1
      FROM numbers
   WHERE n+1 <= 20 
)
SELECT *
FROM numbers;

WITH ORDINALITY Clause

The WITH ORDINALITY clause adds an incrementing integer column to the result of a set-returning function. The name of the column is ordinality unless it is renamed. It is generally used in the FROM or JOIN clause and is commonly used in conjunction with the UNNEST function to expand and number array data.

Platform	Command
MySQL	Not supported
Oracle	Not supported
PostgreSQL	Supported
SQL Server	Not supported

SQL standard syntax

set_returning_function_call WITH ORDINALITY [AS ..]

Keywords

set_returning_function_call: Defines a function call, for example UNNEST(somevalue).

Rules at a glance

WITH ORDINALITY is used to number the results of a set-returning function. For regular SELECT queries, you would use ROW_NUMBER() OVER() instead for numbering.

Programming tips and gotchas

Here is a PostgreSQL example of using WITH ORDINALITY to number an array of values:

SELECT *
FROM unnest(ARRAY['PC8888','BU1032',
                 'PS7777','PS3333','BU1111']
            ) WITH ORDINALITY AS title_id;

The output of this query is:

title_id | ordinality
----------------------
PC8888   |  1
BU1032   |  2
PS7777   |  3
PS3333   |  4
BU1111   |  5

You can also rename the output of the columns using aliases, as follows:

SELECT *
FROM unnest(ARRAY['PC8888','BU1032',
   'PS7777','PS3333','BU1111']
) WITH ORDINALITY AS title_id(id, ord);

WITH ORDINALITY is often used in a JOIN clause or a LATERAL JOIN clause.

Chapter 4. Reading Your Data

How to Use This Chapter

SQL Platform Support

SQL Command Reference

ALL/ANY/SOME Operators

SQL standard syntax

Keywords

Rules at a glance

Programming tips and gotchas

MySQL

Oracle

PostgreSQL

SQL Server

See also

BETWEEN Operator

SQL standard syntax

Keywords

Rules at a glance

Programming tips and gotchas

See also

EXCEPT Set Operator

SQL standard syntax

Keywords

Rules at a glance

Programming tips and gotchas

MySQL and MariaDB

Oracle

PostgreSQL

SQL Server

See also

EXISTS Operator

SQL standard syntax

Keywords

Rules at a glance

Programming tips and gotchas

See also

FILTER Clause

SQL standard syntax

Keywords

Rules at a glance

See also

GROUP BY Clause

SQL standard syntax

Keywords

Rules at a glance

The GROUP BY clause

The HAVING clause

IN Operator

SQL standard syntax

Keywords

Rules at a glance

See also

INTERSECT Set Operator

SQL standard syntax

Keywords

Rules at a glance

Programming tips and gotchas

MySQL and MariaDB

Oracle

PostgreSQL

SQL Server

See also

IS Operator

SQL standard syntax

Keywords

Rules at a glance

Programming tips and gotchas

See also

JOIN Subclause

SQL standard syntax

Keywords

Rules at a glance

Programming tips and gotchas

MySQL

Oracle

PostgreSQL

SQL Server

See also

LIKE Operator

SQL standard syntax