MySQL Extensions to the SQL Language: Appendix B - Learning SQL

by Alan Beaulieu

Since this book uses the MySQL server for all the examples, I thought it would be useful for readers who are planning to continue using MySQL to include an appendix on MySQL’s extensions to the SQL language. This appendix explores some of MySQL’s extensions to the select, insert, update, and delete statements that can be very useful in certain situations.

Learning SQL book cover

This excerpt is from Learning SQL, Second Edition. Updated for the latest database management systems, this introductory guide will get you up and running with SQL quickly. Whether you need to write database applications, perform administrative tasks, or generate reports, Learning SQL, Second Edition, will help you easily master all the SQL fundamentals. Each chapter presents a self-contained lesson on a key SQL concept or technique, with numerous illustrations, annotated examples, and exercises to let you practice the skills you learn.

buy button

MySQL’s implementation of the select statement includes two additional clauses, which are discussed in the following subsections.

The limit Clause

In some situations, you may not be interested in all of the rows returned by a query. For example, you might construct a query that returns all of the bank tellers along with the number of accounts opened by each teller. If your reason for executing the query is to determine the top three tellers so that they can receive an award from the bank, then you don’t necessarily need to know who came in fourth, fifth, and so on. To help with these types of situations, MySQL’s select statement includes the limit clause, which allows you to restrict the number of rows returned by a query.

To demonstrate the utility of the limit clause, I will begin by constructing a query to show the number of accounts opened by each bank teller:

mysql> SELECT open_emp_id, COUNT(*) how_many

    -> FROM account
    -> GROUP BY open_emp_id;
+-------------+----------+
| open_emp_id | how_many |
+-------------+----------+
|           1 |        8 |
|          10 |        7 |
|          13 |        3 |
|          16 |        6 |
+-------------+----------+
4 rows in set (0.31 sec)

The results show that four different tellers opened accounts; if you want to limit the result set to only three records, you can add a limit clause specifying that only three records be returned:

mysql> SELECT open_emp_id, COUNT(*) how_many

    -> FROM account
    -> GROUP BY open_emp_id
    -> LIMIT 3;
+-------------+----------+
| open_emp_id | how_many |
+-------------+----------+
|           1 |        8 |
|          10 |        7 |
|          13 |        3 |
+-------------+----------+
3 rows in set (0.06 sec)

Thanks to the limit clause (the fourth line of the query), the result set now includes exactly three records, and the fourth teller (employee ID 16) has been discarded from the result set.

Combining the limit clause with the order by clause

While the previous query returns three records, there’s one small problem; you haven’t described which three of the four records you are interested in. If you are looking for three specific records, such as the three tellers who opened the most accounts, you will need to use the limit clause in concert with an order by clause, as in:

mysql> SELECT open_emp_id, COUNT(*) how_many

    -> FROM account
    -> GROUP BY open_emp_id
    -> ORDER BY how_many DESC
    -> LIMIT 3;

+-------------+----------+
| open_emp_id | how_many |
+-------------+----------+
|           1 |        8 |
|          10 |        7 |
|          16 |        6 |
+-------------+----------+
3 rows in set (0.03 sec)

The difference between this query and the previous query is that the limit clause is now being applied to an ordered set, resulting in the three tellers with the most opened accounts being included in the final result set. Unless you are interested in seeing only an arbitrary sample of records, you will generally want to use an order by clause along with a limit clause.

Note

The limit clause is applied after all filtering, grouping, and ordering have occurred, so it will never change the outcome of your select statement other than restricting the number of records returned by the statement.

The limit clause’s optional second parameter

Instead of finding the top three tellers, let’s say your goal is to identify all but the top two tellers (instead of giving awards to top performers, the bank will be sending some of the less-productive tellers to assertiveness training). For these types of situations, the limit clause allows for an optional second parameter; when two parameters are used, the first designates at which record to begin adding records to the final result set, and the second designates how many records to include. When specifying a record by number, remember that MySQL designates the first record as record 0. Therefore, if your goal is to find the third-best performer, you can do the following:

mysql> SELECT open_emp_id, COUNT(*) how_many

    -> FROM account
    -> GROUP BY open_emp_id
    -> ORDER BY how_many DESC
    -> LIMIT 2, 1;

+-------------+----------+
| open_emp_id | how_many |
+-------------+----------+
|          16 |        6 |
+-------------+----------+
1 row in set (0.00 sec)

In this example, the zeroth and first records are discarded, and records are included starting at the second record. Since the second parameter in the limit clause is 1, only a single record is included.

If you want to start at the second position and include all the remaining records, you can make the second argument to the limit clause large enough to guarantee that all remaining records are included. If you do not know how many tellers opened new accounts, therefore, you might do something like the following to find all but the top two performers:

mysql> SELECT open_emp_id, COUNT(*) how_many

    -> FROM account
    -> GROUP BY open_emp_id
    -> ORDER BY how_many DESC
    -> LIMIT 2, 999999999;

+-------------+----------+
| open_emp_id | how_many |
+-------------+----------+
|          16 |        6 |
|          13 |        3 |
+-------------+----------+
2 rows in set (0.00 sec)

In this version of the query, the zeroth and first records are discarded, and up to 999,999,999 records are included starting at the second record (in this case, there are only two more, but it’s better to go a bit overboard rather than taking a chance on excluding valid records from your final result set because you underestimated).

Ranking queries

When used in conjunction with an order by clause, queries that include a limit clause can be called ranking queries because they allow you to rank your data. While I have demonstrated how to rank bank tellers by the number of opened accounts, ranking queries are used to answer many different types of business questions, such as:

  • Who are our top five salespeople for 2005?

  • Who has the third-most home runs in the history of baseball?

  • Other than The Holy Bible and Quotations from Chairman Mao, what are the next 98 best-selling books of all time?

  • What are our two worst-selling flavors of ice cream?

So far, I have shown how to find the top three tellers, the third-best teller, and all but the top two tellers. If I want to do something analogous to the fourth example (i.e., find the worst performers), I need only reverse the sort order so that the results proceed from lowest number of accounts opened to highest number of accounts opened, as in:

mysql> SELECT open_emp_id, COUNT(*) how_many

    -> FROM account
    -> GROUP BY open_emp_id
    -> ORDER BY how_many ASC
    -> LIMIT 2;

+-------------+----------+
| open_emp_id | how_many |
+-------------+----------+
|          13 |        3 |
|          16 |        6 |
+-------------+----------+
2 rows in set (0.24 sec)

By simply changing the sort order (from ORDER BY how_many DESC to ORDER BY how_many ASC), the query now returns the two worst-performing tellers. Therefore, by using a limit clause with either an ascending or descending sort order, you can produce ranking queries to answer most types of business questions.

The into outfile Clause

If you want the output from your query to be written to a file, you could highlight the query results, copy them to the buffer, and paste them into your favorite editor. However, if the query’s result set is sufficiently large, or if the query is being executed from within a script, you will need a way to write the results to a file without your intervention. To aid in such situations, MySQL includes the into outfile clause to allow you to provide the name of a file into which the results will be written. Here’s an example that writes the query results to a file in my c:\temp directory:

mysql> SELECT emp_id, fname, lname, start_date

    -> INTO OUTFILE 'C:\\TEMP\\emp_list.txt'
    -> FROM employee; Query OK, 18 rows affected (0.20 sec)

Note

If you remember from Chapter 7, Data Generation, Conversion, and Manipulation, the backslash is used to escape another character within a string. If you’re a Windows user, therefore, you will need to enter two backslashes in a row when building pathnames.

Rather than showing the query results on the screen, the result set has been written to the emp_list.txt file, which looks as follows:

1    Michael    Smith    2001-06-22
2    Susan    Barker    2002-09-12
3    Robert    Tyler    2000-02-09
4    Susan    Hawthorne    2002-04-24
...
16    Theresa    Markham    2001-03-15
17    Beth    Fowler    2002-06-29
18    Rick    Tulman    2002-12-12

The default format uses tabs ('\t') between columns and newlines ('\n') after each record. If you want more control over the format of the data, several additional subclauses are available with the into outfile clause. For example, if you want the data to be in what is referred to as pipe-delimited format, you can use the fields subclause to ask that the '|' character be placed between each column, as in:

mysql> SELECT emp_id, fname, lname, start_date
    -> INTO OUTFILE 'C:\\TEMP\\emp_list_delim.txt'

    ->   FIELDS TERMINATED BY '|'
    -> FROM employee; Query OK, 18 rows affected (0.02 sec)

Note

MySQL does not allow you to overwrite an existing file when using into outfile, so you will need to remove an existing file first if you run the same query more than once.

The contents of the emp_list_delim.txt file look as follows:

1|Michael|Smith|2001-06-22
2|Susan|Barker|2002-09-12
3|Robert|Tyler|2000-02-09
4|Susan|Hawthorne|2002-04-24
...
16|Theresa|Markham|2001-03-15
17|Beth|Fowler|2002-06-29
18|Rick|Tulman|2002-12-12

Along with pipe-delimited format, you may need your data in comma-delimited format, in which case you would use fields terminated by ','. If the data being written to a file includes strings, however, using commas as field separators can prove problematic, as commas are much more likely to appear within strings than the pipe character. Consider the following query, which writes a number and two strings delimited by commas to the comma1.txt file:

mysql> SELECT data.num, data.str1, data.str2

    -> INTO OUTFILE 'C:\\TEMP\\comma1.txt'
    ->   FIELDS TERMINATED BY ','
    -> FROM
    ->  (SELECT 1 num, 'This string has no commas' str1,

    ->     'This string, however, has two commas' str2) data;
Query OK, 1 row affected (0.04 sec)

Since the third column in the output file (str2) is a string containing commas, you might think that an application attempting to read the comma1.txt file will encounter problems when parsing each line into columns, but the MySQL server has made provisions for such situations. Here are the contents of comma1.txt:

1,This string has no commas,This string\, however\, has two commas

As you can see, the commas within the third column have been escaped by putting a backslash before the two commas embedded in the str2 column. If you run the same query but use pipe-delimited format, the commas will not be escaped, since they don’t need to be. If you want to use a different escape character, such as using another comma, you can use the fields escaped by subclause to specify the escape character to use for your output file.

Along with specifying column separators, you can also specify the character used to separate the different records in your datafile. If you would like each record in the output file to be separated by something other than the newline character, you can use the lines subclause, as in:

mysql> SELECT emp_id, fname, lname, start_date
    -> INTO OUTFILE 'C:\\TEMP\\emp_list_atsign.txt'
    ->   FIELDS TERMINATED BY '|'

    ->   LINES TERMINATED BY '@'
    -> FROM employee;
Query OK, 18 rows affected (0.03 sec)

Because I am not using a newline character between records, the emp_list_atsign.txt file looks like a single long line of text when viewed, with each record separated by the '@' character:

1|Michael|Smith|2001-06-22@2|Susan|Barker|2002-09-12@3|Robert|Tyler|2000-02-
09@4|Susan|Hawthorne|2002-04-24@5|John|Gooding|2003-11-14@6|Helen|Fleming|2004-03-
17@7|Chris|Tucker|2004-09-15@8|Sarah|Parker|2002-12-02@9|Jane|Grossman|2002-05-
03@10|Paula|Roberts|2002-07-27@11|Thomas|Ziegler|2000-10-23@12|Samantha|Jameson|2003-
01-08@13|John|Blake|2000-05-11@14|Cindy|Mason|2002-08-09@15|Frank|Portman|2003-04-
01@16|Theresa|Markham|2001-03-15@17|Beth|Fowler|2002-06-29@18|Rick|Tulman|2002-12-12@

If you need to generate a datafile to be loaded into a spreadsheet application or sent within or outside your organization, the into outfile clause should provide enough flexibility for whatever file format you need.

Combination Insert/Update Statements

Let’s say that you have been asked to create a table to capture information about which of the bank’s branches are visited by which customers. The table needs to contain the customer’s ID, the branch’s ID, and a datetime column indicating the last time the customer visited the branch. Rows are added to the table whenever a customer visits a certain branch, but if the customer has already visited the branch, then the existing row should simply have its datetime column updated. Here’s the table definition:

CREATE TABLE branch_usage
 (branch_id SMALLINT UNSIGNED NOT NULL,
  cust_id INTEGER UNSIGNED NOT NULL,
  last_visited_on DATETIME,
  CONSTRAINT pk_branch_usage PRIMARY KEY (branch_id, cust_id)
 );

Along with the three column definitions, the branch_usage table defines a primary key constraint on the branch_id and cust_id columns. Therefore, the server will reject any row added to the table whose branch/customer pair already exists in the table.

Let’s say that, after the table is in place, customer ID 5 visits the main branch (branch ID 1) three times in the first week. After the first visit, you can insert a record into the branch_usage table, since no record exists yet for customer ID 5 and branch ID 1:

mysql> INSERT INTO branch_usage (branch_id, cust_id, last_visited_on)

    -> VALUES (1, 5, CURRENT_TIMESTAMP());
Query OK, 1 row affected (0.02 sec)

The next time the customer visits the same branch, however, you will need to update the existing record rather than inserting a new record; otherwise, you will receive the following error:

ERROR 1062 (23000): Duplicate entry '1-5' for key 1

To avoid this error, you can query the branch_usage table to see whether a given customer/branch pair exists and then either insert a record if no record is found or update the existing row if it already exists. To save you the trouble, however, the MySQL designers have extended the insert statement to allow you to specify that one or more columns be modified if an insert statement fails due to a duplicate key. The following statement instructs the server to modify the last_visited_on column if the given customer and branch already exist in the branch_usage table:

mysql> INSERT INTO branch_usage (branch_id, cust_id, last_visited_on)

    -> VALUES (1, 5, CURRENT_TIMESTAMP())
    -> ON DUPLICATE KEY UPDATE last_visited_on = CURRENT_TIMESTAMP();
Query OK, 2 rows affected (0.02 sec)

The on duplicate key clause allows this same statement to be executed every time customer ID 5 conducts business in branch ID 1. If run 100 times, the first execution results in a single row being added to the table, and the next 99 executions result in the last_visited_on column being changed to the current time. This type of operation is often referred to as an upsert, since it is a combination of an update and an insert statement.

Ordered Updates and Deletes

Earlier in the appendix, I showed you how to write queries using the limit clause in conjunction with an order by clause to generate rankings, such as the top three tellers in terms of accounts opened. MySQL also allows the limit and order by clauses to be used in both update and delete statements, thereby allowing you to modify or remove specific rows in a table based on a ranking. For example, imagine that you are asked to remove records from a table used to track customer logins to the bank’s online banking system. The table, which tracks the customer ID and date/time of login, looks as follows:

CREATE TABLE login_history
 (cust_id INTEGER UNSIGNED NOT NULL,
  login_date DATETIME,
  CONSTRAINT pk_login_history PRIMARY KEY (cust_id, login_date)
 );

The following statement populates the login_history table with some data by generating a cross join between the account and customer tables and using the account’s open_date column as a basis for generating login dates:

mysql> INSERT INTO login_history (cust_id, login_date)

    -> SELECT c.cust_id,
    ->   ADDDATE(a.open_date, INTERVAL a.account_id * c.cust_id HOUR)
    -> FROM customer c CROSS JOIN account a;
Query OK, 312 rows affected (0.03 sec)
Records: 312  Duplicates: 0  Warnings: 0

The table is now populated with 312 rows of relatively random data. Your task is to look at the data in the login_history table once a month, generate a report for your manager showing who is using the online banking system, and then delete all but the 50 most-recent records from the table. One approach would be to write a query using order by and limit to find the 50th most recent login, such as:

mysql> SELECT login_date

    -> FROM login_history
    -> ORDER BY login_date DESC
    -> LIMIT 49,1;
+---------------------+
| login_date          |
+---------------------+
| 2004-07-02 09:00:00 |
+---------------------+
1 row in set (0.00 sec)

Armed with this information, you can then construct a delete statement that removes all rows whose login_date column is less than the date returned by the query:

mysql> DELETE FROM login_history

    -> WHERE login_date < '2004-07-02 09:00:00';
Query OK, 262 rows affected (0.02 sec)

The table now contains the 50 most-recent logins. Using MySQL’s extensions, however, you can achieve the same result with a single delete statement using limit and order by clauses. After returning the original 312 rows to the login_history table, you can run the following:

mysql> DELETE FROM login_history

    -> ORDER BY login_date ASC
    -> LIMIT 262;
Query OK, 262 rows affected (0.05 sec)

With this statement, the rows are sorted by login_date in ascending order, and then the first 262 rows are deleted, leaving the 50 most recent rows.

Note

In this example, I had to know the number of rows in the table to construct the limit clause (312 original rows − 50 remaining rows = 262 deletions). It would be better if you could sort the rows in descending order and tell the server to skip the first 50 rows and then delete the remaining rows, as in:

DELETE FROM login_history
ORDER BY login_date DESC
LIMIT 49, 9999999;

However, MySQL does not allow the optional second parameter when using the limit clause in delete or update statements.

Along with deleting data, you can use the limit and order by clauses when modifying data as well. For example, if the bank decides to add $100 to each of the 10 oldest accounts to help retain loyal customers, you can do the following:

mysql> UPDATE account

    -> SET avail_balance = avail_balance + 100
    -> WHERE product_cd IN ('CHK', 'SAV', 'MM')
    -> ORDER BY open_date ASC
    -> LIMIT 10;

Query OK, 10 rows affected (0.06 sec)
Rows matched: 10  Changed: 10  Warnings: 0

This statement sorts accounts by the open date in ascending order and then modifies the first 10 records, which, in this case, are the 10 oldest accounts.

Multitable Updates and Deletes

In certain situations, you might need to modify or delete data from several different tables to perform a given task. If you discover that the bank’s database contains a dummy customer left over from system testing, for example, you might need to remove data from the account, customer, and individual tables.

Note

For this section, I will create a set of clones for the account, customer, and individual tables, called account2, customer2, and individual2. I am doing so both to protect the sample data from being altered and to avoid any problems with foreign key constraints between the tables (more on this later in the section). Here are the create table statements used to generate the three clone tables:

CREATE TABLE individual2 AS
SELECT * FROM individual;
CREATE TABLE customer2 AS
SELECT * FROM customer;
CREATE TABLE account2 AS
SELECT * FROM account;

If the customer ID of the dummy customer is 1, you could generate three individual delete statements against each of the three tables, as in:

DELETE FROM account2
WHERE cust_id = 1;
DELETE FROM customer2
WHERE cust_id = 1;
DELETE FROM individual2
WHERE cust_id = 1;

Instead of writing individual delete statements, however, MySQL allows you to write a single multitable delete statement, which, in this case, looks as follows:

mysql> DELETE account2, customer2, individual2

    -> FROM account2 INNER JOIN customer2
    ->   ON account2.cust_id = customer2.cust_id
    ->   INNER JOIN individual2
    ->   ON customer2.cust_id = individual2.cust_id

    -> WHERE individual2.cust_id = 1;
Query OK, 5 rows affected (0.02 sec)

This statement removes a total of five rows, one from each of the individual2 and customer2 tables, and three from the account2 table (customer ID 1 has three accounts). The statement comprises three separate clauses:

delete

Specifies the tables targeted for deletion.

from

Specifies the tables used to identify the rows to be deleted. This clause is identical in form and function to the from clause in a select statement, and not all tables named herein need to be included in the delete clause.

where

Contains filter conditions used to identify the rows to be deleted.

The multitable delete statement looks a lot like a select statement, except that a delete clause is used instead of a select clause. If you are deleting rows from a single table using a multitable delete format, the difference becomes even less noticeable. For example, here’s a select statement that finds the account IDs of all accounts owned by John Hayward:

mysql> SELECT account2.account_id

    -> FROM account2 INNER JOIN customer2
    ->   ON account2.cust_id = customer2.cust_id
    ->   INNER JOIN individual2
    ->   ON individual2.cust_id = customer2.cust_id

    -> WHERE individual2.fname = 'John'
    ->   AND individual2.lname = 'Hayward';
+------------+
| account_id |
+------------+
|          8 |
|          9 |
|         10 |
+------------+
3 rows in set (0.01 sec)

If, after viewing the results, you decide to delete all three of John’s accounts from the account2 table, you need only replace the select clause in the previous query with a delete clause naming the account2 table, as in:

mysql> DELETE account2
    -> FROM account2 INNER JOIN customer2

    ->   ON account2.cust_id = customer2.cust_id
    ->   INNER JOIN individual2
    ->   ON customer2.cust_id = individual2.cust_id
    -> WHERE individual2.fname = 'John'

    ->   AND individual2.lname = 'Hayward';
Query OK, 3 rows affected (0.01 sec)

Hopefully, this gives you a better idea of what the delete and from clauses are used for in a multitable delete statement. This statement is functionally identical to the following single-table delete statement, which uses a subquery to identify the customer ID of John Hayward:

DELETE FROM account2
WHERE cust_id =
 (SELECT cust_id
  FROM individual2
  WHERE fname = 'John' AND lname = 'Hayward';

When using a multitable delete statement to delete rows from a single table, you are simply choosing to use a querylike format involving table joins rather than a traditional delete statement using subqueries. The real power of multitable delete statements lies in the ability to delete from multiple tables in a single statement, as I demonstrated in the first statement in this section.

Along with the ability to delete rows from multiple tables, MySQL also gives you the ability to modify rows in multiple tables using a multitable update. Let’s say that your bank is merging with another bank, and the databases from both banks have overlapping customer IDs. Your management decides to fix the problem by incrementing each customer ID in your database by 10,000 so that the second bank’s data can be safely imported. The following statement shows how to modify the ID of customer ID 3 across the individual2, customer2, and account2 tables using a single statement:

mysql> UPDATE individual2 INNER JOIN customer2

    ->   ON individual2.cust_id = customer2.cust_id
    ->   INNER JOIN account2
    ->   ON customer2.cust_id = account2.cust_id
    -> SET individual2.cust_id = individual2.cust_id + 10000,

    ->   customer2.cust_id = customer2.cust_id + 10000,
    ->   account2.cust_id = account2.cust_id + 10000
    -> WHERE individual2.cust_id = 3;
Query OK, 4 rows affected (0.01 sec)
Rows matched: 5  Changed: 4  Warnings: 0

This statement modifies four rows: one in each of the individual2 and customer2 tables, and two in the account2 table. The multitable update syntax is very similar to that of the single-table update, except that the update clause contains multiple tables and their corresponding join conditions rather than just naming a single table. Just like the single-table update, the multitable version includes a set clause, the difference being that any tables referenced in the update clause may be modified via the set clause.

Note

If you are using the InnoDB storage engine, you will most likely not be able to use multitable delete and update statements if the tables involved have foreign key constraints. This is because the engine does not guarantee that the changes will be applied in an order that won’t violate the constraints. Instead, you should use multiple single-table statements in the proper order so that foreign key constraints are not violated.

If you enjoyed this excerpt, buy a copy of Learning SQL, Second Edition.