Chapter 1. SQL History and Implementations
In the early 1970s, the seminal work of IBM research fellow Dr. E. F. Codd on the relational data model led to the development of a query language called SEQUEL, or Structured English Query Language. SEQUEL ultimately became SQL, the Structured Query Language. Ironically, the SQL standard treats “SQL” as the name of this language and not as an acronym. Jim Melton, longtime editor of the SQL standard, claims that if SQL is an acronym, it stands for “SQL Query Language.”
IBM, along with other relational database vendors, wanted a standardized method for accessing and manipulating data in a relational database. Although IBM was the first to develop relational database theory, Oracle was first to market the technology. Over time, SQL proved popular enough in the marketplace to attract the attention of the American National Standards Institute (ANSI), which, in cooperation with the International Organization for Standardization (ISO), released standards for SQL in 1986, 1989, 1992, 1999, 2003, 2006, 2008, 2011, 2016, and 2019.
Since 1986, various competing languages have allowed developers to access and manipulate relational data. However, few were as easy to learn or as universally accepted as SQL. Developers and administrators now have the benefit of being able to learn a single language that, with minor adjustments, is applicable to a wide variety of database platforms, applications, and products.
SQL in a Nutshell, 4th edition, provides the syntax for five common implementations of SQL:
- The ANSI/ISO SQL standard (SQL:2016)
- MySQL 8 and MariaDB 10.5
- Oracle Database 19c
- PostgreSQL 14
- Microsoft SQL Server 2019
The Relational Model and ANSI SQL
Relational database management systems (RDBMSs) such as those covered in this book are the primary engines of information systems worldwide, and particularly of web applications and distributed client/server computing systems. They enable a multitude of users to quickly and simultaneously access, create, edit, and manipulate data without impacting other users. They also allow developers to write useful applications to access their resources and provide administrators with the capabilities they need to maintain, secure, and optimize organizational data resources.
An RDBMS is defined as a system whose users view data as a collection of tables related to each other through common data values. Data is stored in tables, which are composed of rows and columns. Tables of independent data can be linked (or related) to one another if they each have unique, identifying columns of data (called keys) that represent data values held in common. E. F. Codd first described relational database theory in his landmark paper “A Relational Model of Data for Large Shared Data Banks,” published in the Communications of the ACM (Association for Computing Machinery) in June 1970. Under Codd’s new relational data model, data was structured (into tables of rows and columns); manageable using operations such as selections, projections, and joins; and consistent as the result of integrity rules such as keys and referential integrity. These terms and their definitions were based upon earlier mathematical concepts—in particular, relational algebra—and as such are fully provable theorems. Codd also articulated rules that governed how a relational database should be designed. The process for applying these rules is now known as normalization.
Codd’s Rules for Relational Database Systems
Codd applied rigorous mathematical theories (primarily relational algebra and set theory) to the management of data, from which he compiled a list of criteria a database must meet to be considered relational. Codd, at least in part, set forth his 12 principles to constrain the overenthusiastic marketing verbiage of the many vendors of every remotely DBMS-like product who wanted to claim to be “relational.” In fact, Codd was so emphatic about these principles of relational databases as a mechanism of anti-hype that he further laid out an important foundational principle he called Rule 0:
Rule 0: For any system that is advertised as, or claimed to be, a relational database management system, that system must be able to manage databases entirely through its relational capabilities.
At its core, the relational database concept centers around storing data in tables. This concept is now so common as to seem trivial; however, it’s only since the 1980s that the idea of designing a system capable of sustaining the relational model has not been considered a long shot with limited usefulness.
Following are Codd’s Twelve Principles of Relational Databases:
1. Information is represented logically in tables.
2. Data must be logically accessible by table, primary key, and column.
3. Null values must be uniformly treated as “missing information,” not as empty strings, blanks, or zeros.
4. Metadata (data about the database) must be stored in the database just as regular data is.
5. A single language must be able to define data, views, integrity constraints, authorization, transactions, and data manipulation.
6. Views must show the updates of their base tables and vice versa.
7. A single operation must be available to do each of the following: retrieve data, insert data, update data, and delete data.
8. Batch and end-user operations are logically separate from physical storage and access methods.
9. Batch and end-user operations can change the database schema without having to re-create it or the applications built upon it.
10. Integrity constraints must be available and stored in the metadata, not in an application program.
11. The data manipulation language of the relational system should not care where or how the physical data is distributed and should not require alteration if the physical data is centralized or distributed.
12. Any row processing done in the system must obey the same integrity rules and constraints that set-processing operations do.
While these rules do not apply to application development, they continue to be the litmus test used to validate the “relational” characteristics of a database platform; a database that does not meet all of these rules is not fully relational. Currently, most commercial RDBMS products pass Codd’s test, and all the platforms discussed in the reference material of SQL in a Nutshell, 4th edition, satisfy these requirements. The standard also specifies support for handling non-relational formats like JSON and XML; we’ll discuss this in Chapter 10.
Understanding Codd’s principles assists developers in the proper development and design of relational databases (RDBs). The following sections detail how some of these requirements are met within SQL using RDBs.
Data structures (rules 1, 2, 8, and 9)
Codd’s rules 1 and 2 state that “information is represented logically in tables” and that “data must be logically accessible by table, primary key, and column.” So, the process of defining a table for a relational database does not require that programs instruct the database how to interact with the underlying physical data structures. Furthermore, SQL logically isolates the processes of accessing data and physically maintaining that data, as required by rules 8 and 9, which state that batch and end-user operations “are logically separate from physical storage and access methods” and “can change the database schema without having to re-create it or the applications built upon it.”
In the relational model, data is shown logically as a two-dimensional table that describes a single entity (for example, business expenses). Academics refer to tables as relations, to rows as tuples, and to columns as attributes (since each column of a table describes a specific attribute of the entity); practitioners usually speak of tables, rows (or records), and columns. The intersection of a record and a column provides a single value. However, it is quite common to hear this referred to as a field, from spreadsheet parlance. The column or columns whose values uniquely identify each record can act as a primary key. These days this representation seems elementary, but it was actually quite innovative when it was first proposed.
The SQL standard defines a whole data structure hierarchy beyond simple tables, though tables are the core data structure. Relational design handles data on a table-by-table basis, not on a record-by-record basis. This table-centric orientation is the heart of set programming. Consequently, almost all SQL commands operate much more efficiently against sets of data within or across tables than against individual records. Said another way, effective SQL programming requires that you think in terms of sets of data, rather than of individual rows.
Figure 1-1 illustrates the terminology SQL uses to describe the hierarchical data structures of a relational database: catalogs contain sets of schemas; schemas contain sets of objects, such as tables and views; and tables are composed of sets of columns and records.
For example, in a Business_Expense table, a column called Expense_Date might show when an expense was incurred. Each record in the table describes a specific entity; in this case, everything that makes up a business expense (when it happened, how much it cost, who incurred the expense, what it was for, and so on).
Each attribute of an expense—in other words, each column—is supposed to be atomic; that is, each column is supposed to contain one, and only one, value. If a table is constructed in which the intersection of a row and column can contain more than one distinct value, one of SQL’s primary design guidelines has been violated. (That said, some of the database platforms discussed in this book do allow you to place more than one value into a column, via the VARRAY or TABLE data types or, more commonly in the last several years, the XML or JSON data types.)
Rules of behavior are specified for column values. Foremost is that column values must share a common domain, better known as a data type. For example, if the Expense_Date field is defined as having a DATE data type, the value ELMER cannot be placed into that field because it is a string, not a date, and the Expense_Date field can contain only dates. In addition, the SQL standard allows further control of column values through the application of constraints (discussed in detail in Chapter 2) and assertions. A SQL constraint might, for instance, check to ensure when inserting a new expense record that its Expense_Date value is no more than 90 days old, matching the company policy for expense reporting.
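As a sketch of how such a constraint might look, here is a hypothetical definition of the Business_Expense table; the column list and constraint name are illustrative assumptions, and note that some platforms restrict non-deterministic expressions such as CURRENT_DATE inside CHECK constraints, so the exact syntax varies:

```sql
-- Hypothetical sketch of the Business_Expense table described above.
CREATE TABLE Business_Expense (
    Expense_ID    INTEGER      NOT NULL PRIMARY KEY,
    Expense_Date  DATE         NOT NULL,
    Amount        DECIMAL(9,2) NOT NULL,
    Incurred_By   VARCHAR(40)  NOT NULL,
    Description   VARCHAR(200),
    -- Enforce the 90-day expense-reporting policy at insert time.
    -- INTERVAL arithmetic is shown in ANSI style; spellings differ
    -- by platform (e.g., PostgreSQL uses INTERVAL '90 days').
    CONSTRAINT ck_expense_age
        CHECK (Expense_Date >= CURRENT_DATE - INTERVAL '90' DAY)
);
```

An INSERT whose Expense_Date is older than 90 days would then be rejected with a constraint violation rather than silently accepted.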
Additionally, data access for all individuals and computer processes is controlled at the schema level by an AuthorizationID or user. Permissions to access or modify specific sets of data may be granted or restricted on a per-user basis.
SQL databases also employ character sets and collations. Character sets are the “symbols” or “alphabets” used by the “language” of the data. For example, the American English character set does not contain the special character for ñ in the Spanish character set. Collations are sets of sorting rules that operate on a character set. A collation defines how a given data manipulation operation sorts data. For example, an American English character set might be sorted either by character-order, case-insensitive, or by character-order, case-sensitive.
The SQL standard does not say how data should be sorted, only that platforms must provide common collations found in a given language.
It is important to know what collation you are using when writing SQL code against a database platform, as it can have a direct impact on how queries behave, and particularly on the behavior of the clauses of a SELECT statement, such as WHERE, ORDER BY, GROUP BY, and PARTITION BY. For example, a query that sorts data using a binary collation will return data in a very different order than one that sorts data using, say, a Swedish collation. This is also very important when migrating SQL code between database platforms, since their default sorting behavior may vary widely. For example, Oracle is normally case-sensitive, while Microsoft SQL Server is not. Moving an unmodified query from Oracle to SQL Server might therefore produce a wildly different result set because Oracle will evaluate values like “Halloween” and “HALLOWEEN” as unequal, whereas SQL Server will see them as equal by default.
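To make the migration pitfall concrete, here is a minimal sketch that reuses the authors table from later in this chapter; the default-collation behavior described is the assumption, and the named collation is a SQL Server example:

```sql
-- On Oracle (case-sensitive by default), this matches only rows
-- stored exactly as 'GREEN'; on SQL Server with a typical
-- case-insensitive collation, it also matches 'Green' and 'green'.
SELECT au_fname, au_lname
FROM   authors
WHERE  au_lname = 'GREEN';

-- Many platforms let you pin a comparison to a specific collation.
-- The syntax varies; this is SQL Server style, forcing a
-- case-sensitive (CS), accent-sensitive (AS) comparison:
SELECT au_fname, au_lname
FROM   authors
WHERE  au_lname COLLATE Latin1_General_CS_AS = 'GREEN';
```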
NULLs (rule 3)
Most databases allow any of their supported data types to store NULL values. Inexperienced SQL developers tend to think of NULL values as zero-length strings, but in SQL NULL literally means that the value is unknown or indeterminate. (This question alone—whether NULL should be considered unknown or indeterminate—is the subject of much academic debate.) This differentiation enables a database designer to distinguish between those entries that represent a deliberately placed zero, for example, and those where either the data is not recorded in the system or a NULL has been explicitly entered. As an illustration of this semantic difference, consider a system that tracks payments. If a product has a NULL price, that does not mean the product is free; instead, a NULL price indicates that the amount is not known or perhaps has not yet been determined.
There is a good deal of variation among the database platforms in how they handle NULL values, which leads to some major porting issues. For example, Oracle treats an empty (zero-length) string as a NULL value when it is inserted. All the other databases covered in this book permit the insertion of an empty string into VARCHAR and CHAR columns.
One side effect of the indeterminate nature of a NULL value is that it cannot be used in a calculation or a comparison. Here are a few brief but very important rules, from the ANSI/ISO standard, to remember about the behavior of NULL values when dealing with NULLs in SQL statements:

- A NULL value cannot be inserted into a column defined with the NOT NULL constraint.
- NULL values are not equal to each other. It is a frequent mistake to compare two columns that contain NULL and expect the NULL values to match. (The proper way to identify a NULL value in a WHERE clause or in a Boolean expression is to use phrases such as “value IS NULL” and “value IS NOT NULL”.)
- A column containing a NULL value is ignored in the calculation of aggregate values such as AVG, SUM, or MAX, though COUNT(*) counts all rows regardless of NULLs.
- When columns that contain NULL values are listed in the GROUP BY clause of a query, the query output contains a single row for NULL values. In essence, the ANSI/ISO standard considers all NULLs found to be in a single group.
- DISTINCT and ORDER BY clauses, like GROUP BY, also see NULL values as indistinguishable from each other. With the ORDER BY clause, the vendor is free to choose whether NULL values sort high (first in the result set) or sort low (last in the result set) by default. Some database vendors allow you to define how NULL values are sorted on a per-query basis using the NULLS FIRST or NULLS LAST keywords.
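The rules above can be sketched with a tiny example; the payments table and its values are hypothetical:

```sql
CREATE TABLE payments (
    payment_id INTEGER,
    amount     DECIMAL(9,2)   -- NULL means "amount not yet known"
);

INSERT INTO payments VALUES (1, 100.00);
INSERT INTO payments VALUES (2, NULL);
INSERT INTO payments VALUES (3, 50.00);

-- Returns only row 2. NULL must be tested with IS NULL,
-- because the comparison "amount = NULL" is never true.
SELECT payment_id FROM payments WHERE amount IS NULL;

-- AVG ignores the NULL row: (100 + 50) / 2 = 75, not 50.
-- COUNT(amount) counts the two non-NULL values, while
-- COUNT(*) still counts all three rows.
SELECT AVG(amount), COUNT(amount), COUNT(*) FROM payments;
```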
Metadata (rules 4 and 10)
Codd’s fourth rule for relational databases states that data about the database must be stored in standard tables, just as all other data is. Data that describes the database itself is called metadata. For example, every time you create a new table or view in a database, records are created and stored that describe the new table. Additional records are needed to store any columns, keys, or constraints on the table. This technique is implemented in most commercial and open source SQL database products. For example, SQL Server uses what it calls “system tables” to track all the information about the databases, tables, and database objects in any given database. It also has “system databases” that keep track of information about the server on which the database is installed and configured.
In addition to system tables, the SQL standard defines a set of basic metadata available through a widely adopted set of views stored in a special schema called INFORMATION_SCHEMA. Notably, Oracle and IBM Db2 do not support this schema. Operationally, this should not pose a problem for you, since there are open source scripts that map the Oracle database catalog views to the SQL standard INFORMATION_SCHEMA format.
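A quick sketch of metadata-as-data in action, using the standard INFORMATION_SCHEMA views on a platform that supports them (e.g., MySQL, PostgreSQL, or SQL Server); the authors table is reused from later examples:

```sql
-- List the columns of the authors table, their data types, and
-- whether they accept NULLs, straight from the metadata views.
SELECT table_name, column_name, data_type, is_nullable
FROM   INFORMATION_SCHEMA.COLUMNS
WHERE  table_name = 'authors'
ORDER BY ordinal_position;
```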
The language (rules 5 and 11)
Codd’s rules do not require SQL to be used with a relational database. His rules, particularly rules 5 and 11, only specify how the language should behave when coupled with a relational database. At one time SQL competed with other languages (such as Digital’s RDO and FoxPro) that might have fit the relational bill, but SQL won out for three reasons. First, SQL is a relatively simple, intuitive, English-like language that handles most aspects of data manipulation. If you can read and speak English, SQL simply makes sense. Second, SQL is satisfyingly high-level. A developer or database administrator (DBA) does not have to spend time ensuring that data is stored in the proper memory registers or that data is cached from disk to memory; the database management system (DBMS) handles that task automatically. Finally, because no single vendor owns SQL, it was adopted across a number of platforms, ensuring broad support and wide popularity.
Views (rule 6)
A view is a virtual table that does not exist as a physical repository of data, but is instead constructed on the fly from a SELECT statement whenever that view is queried. Views enable you to construct different representations of the same source data for a variety of audiences without having to alter the way in which the data is stored.
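For example, a view over the authors table used later in this chapter might hide California authors from one audience without changing the stored data (the view name is a hypothetical choice):

```sql
-- Define a virtual table; no data is copied or duplicated.
CREATE VIEW non_ca_authors AS
    SELECT au_fname, au_lname, state
    FROM   authors
    WHERE  state <> 'CA';

-- The view is then queried exactly like a base table:
SELECT * FROM non_ca_authors ORDER BY au_lname;
```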
Set operations (rules 7 and 12)
Other data access technologies, such as Ruby on Rails’s ActiveRecord, Django’s ORM, and LINQ for .NET, perform their data operations quite differently from SQL. These technologies require you to tell the program exactly how to treat the data, one record at a time. Because the program iterates down through a list of records, performing its logic on one record after another, this style of programming is frequently called row processing or procedural programming.
In contrast, SQL programs operate on logical sets of data. Set theory is applied in almost all SQL statements, including SELECT, INSERT, UPDATE, and DELETE statements. In effect, data is selected from a set called a “table.” Unlike the row-processing style, set processing allows a programmer to tell the database simply what is required, not how each individual piece of data should be handled. Sometimes set processing is referred to as declarative processing, since a developer declares only what data is wanted (as in the declaration, “Return all employees in the southern region who earn more than $70,000 per year”) rather than describing the exact steps used to retrieve or manipulate the data.
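That sample declaration maps almost word for word onto a single declarative statement; the employees table and its column names are hypothetical:

```sql
-- "Return all employees in the southern region who earn more
-- than $70,000 per year." No loops, no step-by-step logic:
-- the database decides how to retrieve the rows.
SELECT employee_id, last_name, salary
FROM   employees
WHERE  region = 'South'
  AND  salary > 70000.00;
```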
Set theory was the brainchild of mathematician Georg Cantor, who developed it at the end of the 19th century. At the time, set theory (and Cantor’s theory of the infinite) was quite controversial. Today, set theory is such a common part of life that it is learned in elementary school. Things like the selection catalogs for your favorite movie streaming service, online console gaming services, and popular music applications are all simple and common examples of applied set theory.
Relational databases use relational algebra and tuple relational calculus to mathematically model the data in a given database and queries acting upon that data. These theories were also introduced by E. F. Codd, along with his 12 rules for relational databases.
Examples of set theory in conjunction with relational databases are detailed in the following section.
Codd’s Rules in Action: Simple SELECT Examples
Up to this point, this chapter has focused on the individual aspects of a relational database platform as defined by Codd and implemented under the SQL standard. This section presents a high-level overview of the most important SQL statement, SELECT, and some of its most salient points—namely, the following three relational operations:
- Projections
- Retrieve specific columns of data.
- Selections
- Retrieve specific rows of data.
- Joins
- Return columns and rows from two or more tables in a single result set.
Although at first glance it might appear as though the SELECT statement deals only with the relational selection operation, in actuality, SELECT deals with all three operations.
The following statement embodies the projection operation by selecting the first and last names of an author, plus their home state, from the authors table:
SELECT au_fname, au_lname, state FROM authors;
The results from any such SELECT statement are presented as another table of data:
au_fname         au_lname          state
---------------- ----------------- -----
Johnson          White             CA
Marjorie         Green             CA
Cheryl           Carson            CA
Michael          O’Leary           CA
Meander          Smith             KS
Morningstar      Greene            TN
Reginald         Blotchet-Halls    OR
Innes            del Castillo      MI
The resulting data is sometimes called a result set, work table, or derived table, differentiating it from the base table in the database that is the target of the SELECT statement.
It is important to note that the relational operation of projection, not selection, is specified using the SELECT clause (that is, the keyword SELECT followed by a list of expressions to be retrieved) of a SELECT statement. Selection—the operation of retrieving specific rows of data—is specified using the WHERE clause in a SELECT statement. WHERE filters out unwanted rows of data and retrieves only the requested rows. Continuing with the previous example, the following statement selects authors from states other than California:
SELECT au_fname, au_lname, state FROM authors WHERE state <> 'CA';
Whereas the first query retrieved all authors, the result of this second query is a much smaller set of records:
au_fname         au_lname          state
---------------- ----------------- -----
Meander          Smith             KS
Morningstar      Greene            TN
Reginald         Blotchet-Halls    OR
Innes            del Castillo      MI
Different vendors allow you to join varying numbers of tables in a single query. Whereas older database platforms allowed no more than 256 joins within a given query, today most database platforms are limited only by available system resources.
However, keep in mind that your database engine will consume more system resources and incur more latency the more tables you join in a single query. For example, a single SELECT statement joining 12 tables will have to consider up to 28,158,588,057,600 possible join orders.
Today’s database query optimizers are very sophisticated and employ cost-based optimizations and heuristics to make performing queries involving a dozen or more joins a manageable task. However, many experienced SQL developers try to limit their SELECT statements to 16 or fewer joins in order to keep the query’s logic easy to understand. In other situations, experienced SQL developers might use alternative coding practices, such as common table expressions (CTEs), temporary tables, or subqueries, to achieve good performance.
By combining the capabilities of projection and selection in a single query, you can use SQL to retrieve only the columns and records that you need at any given time.
Joins are the next, and last, relational operation covered in this section. A join relates one table to another in order to return a result set consisting of related data from both tables.
The ANSI/ISO standard method of performing joins is to use the JOIN clause in a SELECT statement. An older method, sometimes called a theta join, places the join search argument in the WHERE clause. The following example shows both approaches. Each statement retrieves author names from the authors base table as well as the IDs of their books from the titleauthor base table. The first SELECT uses the newer ANSI/ISO SQL JOIN clause, while the second SELECT uses a theta join:
-- ANSI/ISO style
SELECT a.au_fname, a.au_lname, t.title_id
FROM   authors AS a
JOIN   titleauthor AS t
  ON   a.au_id = t.au_id
WHERE  a.state <> 'CA';

-- Theta style
SELECT a.au_fname, a.au_lname, t.title_id
FROM   authors AS a, titleauthor AS t
WHERE  a.au_id = t.au_id
  AND  a.state <> 'CA';
Although theta joins are universally supported across the various platforms and incur no performance penalty, they are considered an inferior coding pattern because anyone reading or maintaining the query cannot immediately distinguish the arguments that define the join condition from those that filter the results.
There are more issues with theta joins than readability. For example, Oracle’s legacy outer-join syntax allows only one outer-join comparison using (+), which can be quite a problem when constructing an outer join on a multicolumn key. In a situation like this, it can be difficult to avoid syntax errors or even an incorrect result set.
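For comparison, here is a sketch of the same kind of query written as an ANSI/ISO outer join alongside Oracle’s legacy (+) form; the tables are the authors and titleauthor tables used above:

```sql
-- ANSI/ISO form: every author is returned, with NULL title_id for
-- authors who have no matching row in titleauthor. This form works
-- on all of the platforms covered in this book.
SELECT a.au_fname, a.au_lname, t.title_id
FROM   authors AS a
LEFT OUTER JOIN titleauthor AS t
  ON   a.au_id = t.au_id;

-- Oracle-only legacy equivalent, shown for comparison
-- (note that Oracle does not allow AS for table aliases):
-- SELECT a.au_fname, a.au_lname, t.title_id
-- FROM   authors a, titleauthor t
-- WHERE  a.au_id = t.au_id(+);
```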
For more information about joins, see the “JOIN Subclause”.
History of the SQL Standard
In response to the proliferation of SQL dialects, ANSI published its first SQL standard in 1986 to bring about greater conformity among vendors. This was approved by the ISO in 1987 and followed by a second, widely adopted standard in 1989 (also approved by the ISO). ANSI/ISO released their first joint update in 1992, known as SQL2, SQL-92, or SQL:1992, and a second in 1999, termed SQL3, SQL-99, or SQL:1999. The next update, made in 2003, is referred to as SQL:2003 and so on.
Between the release of SQL:1992 and the development of SQL:1999, the SQL draft specifications were divided into SQL3 and SQL4. However, the SQL4 draft was pruned, and the SQL3 draft was adopted as SQL:1999. After this point, the names of the SQL standards transitioned permanently to the SQL:yyyy designation.
Each revision of the SQL standard, officially known as ISO/IEC 9075 Database Language SQL, adds new features and incorporates new commands and capabilities into the language. Here is a brief list of the releases and some of their major contributions (features or specifications):
- SQL-87
- Standard first formalized by ANSI; support for transactions and CREATE, READ, UPDATE, and DELETE operations
- SQL-89
- Minor revision, added referential integrity constraints
- SQL-92
- Major revision (ISO 9075), added support for internationalization, etc.
- SQL:1999
- Added support for user-defined types, regular expression matching, triggers, procedural and control-flow statements, and more
- SQL:2003
- Added support for XML and OLAP (window functions), sampling, and enhanced numeric functions
- SQL:2006
- Clarified how SQL and XML interact and added support for XQuery
- SQL:2008
- Incorporated various improvements and enhancements that had been made in several of the most prominent RDBMS platforms (INSTEAD OF triggers, the TRUNCATE statement, the FETCH clause, etc.) and expanded the XML specification
- SQL:2011
- Introduced new features for managing temporal data
- SQL:2016
- Described how SQL interacts with JavaScript Object Notation (JSON) and added support for polymorphic table functions and row pattern matching
- SQL:2019
- Described how SQL interacts with multidimensional arrays (MDAs)
The capabilities added in SQL:2019 and the planned SQL/PGQ (Property Graph Queries) extension are not yet supported in the products considered in this edition of SQL in a Nutshell, so our coverage in this book extends only to SQL:2016. SQL:2019 coverage will come in future editions.
Levels of Conformance
SQL-92 introduced three levels of conformance indicating degrees of compliance with the standard: Entry, Intermediate, and Full (the U.S. National Institute of Standards and Technology later added a Transitional level between the first two). Vendors had to achieve at least Entry-level conformance to claim ANSI/ISO SQL compliance. Each higher level of the standard was a superset of the subordinate level, meaning that each higher level included all the features of the lower levels of conformance.
Later, SQL:1999 altered the base levels of conformance, doing away with the original three categories. Instead, it required vendors to implement a minimum set of mandatory features collectively called “Core SQL” in order to claim (and advertise) that they are SQL:1999 compliant. This included the old Entry Level SQL-92 feature set, features from other SQL-92 levels, and some new features. Vendors could also opt to implement additional parts of the standard, described in the next section.
Parts of the SQL Standard
Since SQL:1999, the SQL standard has been divided into a set of parts, numbered 1 to 14 (as of SQL:2016). Not all of these were publicly released, and not all achieved widespread adoption. The following list describes the different parts of the standard:
- Part 1, SQL/Framework
- Includes common definitions and concepts used throughout the standard. Defines the way the standard is structured and how the various parts relate to one another, and describes the conformance requirements set out by the standards committee.
- Part 2, SQL/Foundation
- Includes the Core, the central elements of the language, including both mandatory and optional features. This is the largest and most important part of the standard.
- Part 3, SQL/CLI (Call-Level Interface)
- Defines the call-level interface for dynamically invoking SQL statements from external application programs. Also includes more than 60 routine specifications to facilitate the development of truly portable shrink-wrapped software.
- Part 4, SQL/PSM (Persistent Stored Modules)
- Standardizes procedural language constructs similar to those found in platform-specific SQL dialects such as PL/SQL and Transact-SQL.
- Part 9, SQL/MED (Management of External Data)
- Defines the management of data located outside of the database platform using datalinks and a wrapper interface.
- Part 10, SQL/OLB (Object Language Binding)
- Describes how to embed SQL statements in Java programs. It is closely related to JDBC, but offers a few advantages. It is also very different from the traditional host language binding possible in early versions of the standard.
- Part 11, SQL/Schemata
- Defines over 85 views used to describe the metadata of each database, stored in a special schema called INFORMATION_SCHEMA.
- Part 13, SQL/JRT (Java Routines and Types)
- Defines a number of SQL routines and types using the Java programming language. Several features of Java, such as Java static methods and classes, are supported.
- Part 14, SQL/XML
- Adds a new type called XML, four new operators (XMLPARSE, XMLSERIALIZE, XMLROOT, and XMLCONCAT), several new functions (described in Chapter 7), and the new IS DOCUMENT predicate. Also includes rules for mapping SQL-related elements (like identifiers, schemas, and objects) to XML-related elements.
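As a brief sketch of two of the SQL/XML operators named above (exact syntax and platform support vary, and some platforms require a FROM clause):

```sql
-- Parse character data into an XML value, then serialize it
-- back to character data.
SELECT XMLSERIALIZE(
           DOCUMENT
           XMLPARSE(DOCUMENT '<expense><amount>42.50</amount></expense>')
           AS VARCHAR(200)
       );

-- The IS DOCUMENT predicate tests whether an XML value is a
-- well-formed document, e.g.:
--   ... WHERE xml_col IS DOCUMENT
```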
There is also a Part 15, SQL/MDA, which, as mentioned earlier, first appeared in 2019 and isn’t included in the SQL:2016 standard. Note that Parts 5–8 and Part 12 were not released to the public by design. Parts 5 and 8 were originally defined within Part 2, SQL/Foundation, then split out, but were later merged back into that part. Parts 7 and 12 were canceled due to lack of progress, and the key requirement for Part 7, temporal support, was ultimately added to Part 2 in SQL:2011.
The SQL standard, which covers retrieval, manipulation, and management of data, formalizes many SQL behaviors and syntax structures across a variety of platforms. This standard has become even more important as open source database products, such as MySQL and PostgreSQL, have grown in popularity and are developed by virtual teams rather than large corporations.
Be aware that an RDBMS platform may claim SQL compliance by meeting the basic Core SQL:1999 standards, so read the vendor’s fine print for a full description of its ANSI/ISO conformity. An understanding of the various parts of the standard will help you get a clearer idea of the capabilities of a particular RDBMS and of how different features might behave when SQL code is transported to other database products.
This book details the SQL implementations of four popular RDBMSs. These vendors do not fully support everything defined in the SQL standard; all RDBMS platforms play a constant game of tag with the standards bodies, which often update, refine, or otherwise change the benchmark as soon as vendors start to close in on the standard. Conversely, the vendors often implement new features that are not yet a part of the standard but that boost the effectiveness of their users.
SQL Statement Classes
SQL-92 grouped statements into three broad categories:
- Data Manipulation Language (DML)
- Provides specific data-manipulation commands such as SELECT, INSERT, UPDATE, and DELETE
- Data Definition Language (DDL)
- Includes commands that handle the accessibility and manipulation of database objects, such as CREATE and DROP
- Data Control Language (DCL)
- Includes permission-related commands such as GRANT and REVOKE
In contrast, SQL:1999 and later define seven core categories, or classes, that provide a general framework for the types of commands available in SQL. This new framework attempts to identify the statements within each class more accurately and logically, and provides for the development of new features and statement classes. It also allowed some “orphaned” statements that did not fit well into any of the old categories to be properly classified.
Table 1-1 identifies the SQL statement classes and lists some of the commands in each class, each of which is fully discussed later. At this point, the key is to remember the statement class titles.
| Class | Description | Example commands |
|---|---|---|
| Connection statements | Start and end a client connection. | CONNECT, DISCONNECT |
| Control statements | Control the execution of a set of SQL statements. | CALL, RETURN |
| Data statements | May have a persistent and enduring effect upon data. | SELECT, INSERT, UPDATE, DELETE |
| Diagnostic statements | Provide diagnostic information and raise exceptions and errors. | GET DIAGNOSTICS |
| Schema statements | May have a persistent and enduring effect on a database schema and objects within that schema. | ALTER, CREATE, DROP |
| Session statements | Control default behavior and other parameters for a session. | SET statements like SET CONSTRAINT |
| Transaction statements | Set the starting and ending points of a transaction. | COMMIT, ROLLBACK |
Those who work with SQL regularly should become familiar with both the old (SQL-92) and the new (SQL:1999 and later) statement classes since both nomenclatures are still used to refer to SQL features and statements.
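As an illustration of the statement classes in an ordinary session (not an example from the standard itself), the sketch below uses Python's built-in sqlite3 module as a stand-in engine; the table and column names are invented for the demonstration. It issues a schema statement (DDL in SQL-92 terms), several data statements (DML), and a transaction statement.

```python
import sqlite3

# In-memory database; "employees" and its columns are hypothetical names.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Schema statement (SQL-92: DDL) -- persistent effect on the schema.
cur.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")

# Data statements (SQL-92: DML) -- persistent effect on the data.
cur.execute("INSERT INTO employees (name) VALUES (?)", ("Ada",))
cur.execute("UPDATE employees SET name = 'Ada Lovelace' WHERE id = 1")

# Transaction statement -- COMMIT makes the changes above durable.
conn.commit()

rows = cur.execute("SELECT id, name FROM employees").fetchall()
print(rows)  # [(1, 'Ada Lovelace')]
conn.close()
```

Note that the driver, not the SQL text, handles the connection statements here; most client libraries expose CONNECT/DISCONNECT through API calls such as `connect()` and `close()` rather than as literal SQL.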
SQL Dialects
The constantly evolving nature of the SQL standard has given rise to a number of SQL dialects among the various vendors and platforms. These dialects commonly evolved because a given database vendor’s users requested that new capabilities be added to the database before (sometimes many years before) the ANSI/ISO committees drafted the applicable standards. Occasionally, the academic or research communities also introduced new features in response to pressures from competing technologies. For example, many database vendors have augmented their current programmatic offerings with JSON and XML, or found other ways to offer features found in non-relational database platforms (Chapter 10 discusses how to use XML and JSON to store and query non-relational data in a relational database).
Many of these dialects include conditional processing capabilities (such as those that control processing through IF ... THEN statements), control-of-flow functions (such as WHILE loops), variables, and error-handling capabilities. Because the ANSI/ISO committees had not yet developed standards for these important features at the time users began to demand them, RDBMS vendors created their own commands and syntax. In fact, some of the earliest vendors from the 1980s have variances in the most elementary commands, such as SELECT, because their implementations predate the standards. When attempting to create SQL code that is interoperable across database platforms, keep in mind that your mileage may vary.
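A concrete example of such variance is row limiting. The sketch below (not from the book's examples) uses Python's sqlite3 module as a stand-in engine to run the LIMIT form; the other common dialect forms are noted in comments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (n INTEGER)")  # "t" is a throwaway table for the demo
cur.executemany("INSERT INTO t (n) VALUES (?)", [(i,) for i in range(10)])

# The same query -- "the three largest values of n" -- in different dialects:
#   ANSI/ISO (SQL:2008), Oracle 12c+, PostgreSQL, Db2:
#       SELECT n FROM t ORDER BY n DESC FETCH FIRST 3 ROWS ONLY
#   MySQL/MariaDB, PostgreSQL, SQLite:
#       SELECT n FROM t ORDER BY n DESC LIMIT 3
#   Transact-SQL (SQL Server, Sybase):
#       SELECT TOP 3 n FROM t ORDER BY n DESC
top3 = cur.execute("SELECT n FROM t ORDER BY n DESC LIMIT 3").fetchall()
print(top3)  # [(9,), (8,), (7,)]
conn.close()
```

Each form returns the same rows, but only some engines accept each spelling, which is exactly the portability hazard described above.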
Part 4 of the standard provides many features associated with programming stored procedures and incorporates many of the extensions offered by these dialects.
Some popular dialects of SQL include:
- PL/pgSQL
- The SQL dialect and extensions implemented in PostgreSQL. The acronym stands for Procedural Language/PostgreSQL.
- PL/SQL
- Oracle’s procedural extension to SQL. PL/SQL stands for Procedural Language/SQL; this dialect bears many similarities to the language Ada.
- SQL/PSM
- An extension of SQL with a procedural language for use in stored procedures. MySQL, MariaDB, and PostgreSQL implement the SQL/Persistent Stored Module of the Core SQL standard. MariaDB also supports PL/SQL.
- Transact-SQL
- Used by both Microsoft SQL Server and Sybase Adaptive Server, now owned by SAP. As Microsoft and SAP/Sybase have moved away from the common platform they shared in the early 1990s, their implementations of Transact-SQL have also diverged widely, but the most basic commands are still very similar.
Users who plan to work extensively with a single database system should learn the intricacies of their preferred SQL dialect or platform.
NoSQL
There are many non-relational database platforms that have eschewed SQL, commonly known as NoSQL databases. These are non-tabular databases designed to satisfy very different requirements than relational databases; they use different data structures, such as key/value pairs, graphs, and documents, facilitating certain kinds of operations. Ironically (given the name), many of the most popular NoSQL offerings do provide some level of support for SQL. For example, users of Apache Hadoop typically solve their query language challenges by using Apache Hive, developed by Facebook, or Apache Pig, developed by Yahoo; Hive supports a significant subset of the SQL standard, and Pig’s query language uses SQL-like syntax. Similarly, Cassandra Query Language (CQL), used by the Cassandra NoSQL platform, includes familiar syntax for queries, data manipulation, and data definition.
It may seem counterintuitive, but the reality is that both open source and commercial NoSQL database implementations quickly came to appreciate the market value of SQL. Those who wanted the most rapid path to widespread adoption and the least friction for users getting started with the database learned that adopting SQL gave them access to a wide world of data professionals. So, while NoSQL first entered the lexicon meaning “a non-relational database platform that does not use SQL,” today it’s understood as “a non-relational database platform that uses not only SQL, but other programming languages as well.” SQL has impressive staying power, and the skills you learn with this book are likely to outlast your career.