MySQL Extensions to the SQL Language: Appendix B - Learning SQL
by Alan BeaulieuSince this book uses the MySQL server for all the examples, I thought
it would be useful for readers who are planning to continue using MySQL to
include an appendix on MySQL’s extensions to the SQL language. This appendix
explores some of MySQL’s extensions to the select, insert,
update, and delete statements that can be very useful in
certain situations.
This excerpt is from Learning SQL, Second Edition. Updated for the latest database management systems, this introductory guide will get you up and running with SQL quickly. Whether you need to write database applications, perform administrative tasks, or generate reports, Learning SQL, Second Edition, will help you easily master all the SQL fundamentals. Each chapter presents a self-contained lesson on a key SQL concept or technique, with numerous illustrations, annotated examples, and exercises to let you practice the skills you learn.
MySQL’s implementation of the select statement includes two additional
clauses, which are discussed in the following subsections.
In some situations, you may not be interested in
all of the rows returned by a query. For example,
you might construct a query that returns all of the bank tellers along
with the number of accounts opened by each teller. If your reason for
executing the query is to determine the top three tellers so that they
can receive an award from the bank, then you don’t necessarily need to
know who came in fourth, fifth, and so on. To help with these types of
situations, MySQL’s select statement
includes the limit clause, which
allows you to restrict the number of rows returned by a query.
To demonstrate the utility of the limit clause, I will begin by constructing a
query to show the number of accounts opened by each bank teller:
mysql>SELECT open_emp_id, COUNT(*) how_many->FROM account->GROUP BY open_emp_id;+-------------+----------+ | open_emp_id | how_many | +-------------+----------+ | 1 | 8 | | 10 | 7 | | 13 | 3 | | 16 | 6 | +-------------+----------+ 4 rows in set (0.31 sec)
The results show that four different tellers opened accounts; if
you want to limit the result set to only three records, you can add a
limit clause specifying that only
three records be returned:
mysql>SELECT open_emp_id, COUNT(*) how_many->FROM account->GROUP BY open_emp_id->LIMIT 3;+-------------+----------+ | open_emp_id | how_many | +-------------+----------+ | 1 | 8 | | 10 | 7 | | 13 | 3 | +-------------+----------+ 3 rows in set (0.06 sec)
Thanks to the limit clause (the
fourth line of the query), the result set now includes exactly three
records, and the fourth teller (employee ID 16) has been discarded from
the result set.
While the previous query returns three records, there’s one
small problem; you haven’t described which three
of the four records you are interested in. If you are looking for
three specific records, such as the three tellers
who opened the most accounts, you will need to use the limit clause in concert with an order by clause, as in:
mysql>SELECT open_emp_id, COUNT(*) how_many->FROM account->GROUP BY open_emp_id->ORDER BY how_many DESC->LIMIT 3;+-------------+----------+ | open_emp_id | how_many | +-------------+----------+ | 1 | 8 | | 10 | 7 | | 16 | 6 | +-------------+----------+ 3 rows in set (0.03 sec)
The difference between this query and the previous query is that
the limit clause is now being
applied to an ordered set, resulting in the three tellers with the
most opened accounts being included in the final result set. Unless
you are interested in seeing only an arbitrary sample of records, you
will generally want to use an order
by clause along with a limit clause.
Note
The limit clause is applied
after all filtering, grouping, and ordering have occurred, so it
will never change the outcome of your select statement other than restricting
the number of records returned by the statement.
Instead of finding the top three tellers, let’s say your goal is
to identify all but the top two tellers (instead of giving awards to
top performers, the bank will be sending some of the less-productive
tellers to assertiveness training). For these types of situations, the
limit clause allows for an optional
second parameter; when two parameters are used, the first designates
at which record to begin adding records to the final result set, and
the second designates how many records to include. When specifying a
record by number, remember that MySQL designates the first record as
record 0. Therefore, if your goal is to find the third-best performer,
you can do the following:
mysql>SELECT open_emp_id, COUNT(*) how_many->FROM account->GROUP BY open_emp_id->ORDER BY how_many DESC->LIMIT 2, 1;+-------------+----------+ | open_emp_id | how_many | +-------------+----------+ | 16 | 6 | +-------------+----------+ 1 row in set (0.00 sec)
In this example, the zeroth and first records are discarded, and
records are included starting at the second record. Since the second
parameter in the limit clause is
1, only a single record is
included.
If you want to start at the second position and include
all the remaining records, you can make the
second argument to the limit clause
large enough to guarantee that all remaining records are included. If
you do not know how many tellers opened new accounts, therefore, you
might do something like the following to find all but the top two
performers:
mysql>SELECT open_emp_id, COUNT(*) how_many->FROM account->GROUP BY open_emp_id->ORDER BY how_many DESC->LIMIT 2, 999999999;+-------------+----------+ | open_emp_id | how_many | +-------------+----------+ | 16 | 6 | | 13 | 3 | +-------------+----------+ 2 rows in set (0.00 sec)
In this version of the query, the zeroth and first records are discarded, and up to 999,999,999 records are included starting at the second record (in this case, there are only two more, but it’s better to go a bit overboard rather than taking a chance on excluding valid records from your final result set because you underestimated).
When used in conjunction with an order
by clause, queries that include a limit clause can be called ranking
queries because they allow you to rank your data. While I
have demonstrated how to rank bank tellers by the number of opened
accounts, ranking queries are used to answer many different types of
business questions, such as:
Who are our top five salespeople for 2005?
Who has the third-most home runs in the history of baseball?
Other than The Holy Bible and Quotations from Chairman Mao, what are the next 98 best-selling books of all time?
What are our two worst-selling flavors of ice cream?
So far, I have shown how to find the top three tellers, the third-best teller, and all but the top two tellers. If I want to do something analogous to the fourth example (i.e., find the worst performers), I need only reverse the sort order so that the results proceed from lowest number of accounts opened to highest number of accounts opened, as in:
mysql>SELECT open_emp_id, COUNT(*) how_many->FROM account->GROUP BY open_emp_id->ORDER BY how_many ASC->LIMIT 2;+-------------+----------+ | open_emp_id | how_many | +-------------+----------+ | 13 | 3 | | 16 | 6 | +-------------+----------+ 2 rows in set (0.24 sec)
By simply changing the sort order (from ORDER BY how_many DESC to ORDER BY how_many ASC), the query now
returns the two worst-performing tellers. Therefore, by using a
limit clause with either an
ascending or descending sort order, you can produce ranking queries to
answer most types of business questions.
If you want the output from your query to be written to a file,
you could highlight the query results, copy them to the buffer, and
paste them into your favorite editor. However, if the query’s result set
is sufficiently large, or if the query is being executed from within a
script, you will need a way to write the results to a file without your
intervention. To aid in such situations, MySQL includes the into outfile clause to allow you to provide
the name of a file into which the results will be written. Here’s an
example that writes the query results to a file in my c:\temp directory:
mysql>SELECT emp_id, fname, lname, start_date->INTO OUTFILE 'C:\\TEMP\\emp_list.txt'->FROM employee;Query OK, 18 rows affected (0.20 sec)
Note
If you remember from Chapter 7, Data Generation, Conversion, and Manipulation, the backslash is used to escape another character within a string. If you’re a Windows user, therefore, you will need to enter two backslashes in a row when building pathnames.
Rather than showing the query results on the screen, the result
set has been written to the emp_list.txt file, which looks as
follows:
1 Michael Smith 2001-06-22 2 Susan Barker 2002-09-12 3 Robert Tyler 2000-02-09 4 Susan Hawthorne 2002-04-24 ... 16 Theresa Markham 2001-03-15 17 Beth Fowler 2002-06-29 18 Rick Tulman 2002-12-12
The default format uses tabs ('\t') between columns and newlines ('\n') after each record. If you want more
control over the format of the data, several additional subclauses are
available with the into outfile
clause. For example, if you want the data to be in what is referred to
as pipe-delimited format, you can use the fields subclause to ask that the '|' character be placed between each column,
as in:
mysql>SELECT emp_id, fname, lname, start_date->INTO OUTFILE 'C:\\TEMP\\emp_list_delim.txt'->FIELDS TERMINATED BY '|'->FROM employee;Query OK, 18 rows affected (0.02 sec)
Note
MySQL does not allow you to overwrite an existing file when
using into outfile, so you will
need to remove an existing file first if you run the same query more
than once.
The contents of the emp_list_delim.txt file look as
follows:
1|Michael|Smith|2001-06-22 2|Susan|Barker|2002-09-12 3|Robert|Tyler|2000-02-09 4|Susan|Hawthorne|2002-04-24 ... 16|Theresa|Markham|2001-03-15 17|Beth|Fowler|2002-06-29 18|Rick|Tulman|2002-12-12
Along with pipe-delimited format, you may need your data in
comma-delimited format, in which case you would use
fields terminated by ','. If the data
being written to a file includes strings, however, using commas as field
separators can prove problematic, as commas are much more likely to
appear within strings than the pipe character. Consider the following
query, which writes a number and two strings delimited by commas to the
comma1.txt file:
mysql>SELECT data.num, data.str1, data.str2->INTO OUTFILE 'C:\\TEMP\\comma1.txt'->FIELDS TERMINATED BY ','->FROM->(SELECT 1 num, 'This string has no commas' str1,->'This string, however, has two commas' str2) data;Query OK, 1 row affected (0.04 sec)
Since the third column in the output file (str2) is a string containing commas, you might
think that an application attempting to read the comma1.txt file will encounter problems when
parsing each line into columns, but the MySQL server has made provisions
for such situations. Here are the contents of comma1.txt:
1,This string has no commas,This string\, however\, has two commas
As you can see, the commas within the third column have been
escaped by putting a backslash before the two commas embedded in the
str2 column. If you run the same
query but use pipe-delimited format, the commas will
not be escaped, since they don’t need to be. If you
want to use a different escape character, such as using another comma,
you can use the fields escaped by
subclause to specify the escape character to use for your output
file.
Along with specifying column separators, you can also specify the
character used to separate the different records in your datafile. If
you would like each record in the output file to be separated by
something other than the newline character, you can use the lines subclause, as in:
mysql>SELECT emp_id, fname, lname, start_date->INTO OUTFILE 'C:\\TEMP\\emp_list_atsign.txt'->FIELDS TERMINATED BY '|'->LINES TERMINATED BY '@'->FROM employee;Query OK, 18 rows affected (0.03 sec)
Because I am not using a newline character between records, the
emp_list_atsign.txt file looks like
a single long line of text when viewed, with each record separated by
the '@' character:
1|Michael|Smith|2001-06-22@2|Susan|Barker|2002-09-12@3|Robert|Tyler|2000-02- 09@4|Susan|Hawthorne|2002-04-24@5|John|Gooding|2003-11-14@6|Helen|Fleming|2004-03- 17@7|Chris|Tucker|2004-09-15@8|Sarah|Parker|2002-12-02@9|Jane|Grossman|2002-05- 03@10|Paula|Roberts|2002-07-27@11|Thomas|Ziegler|2000-10-23@12|Samantha|Jameson|2003- 01-08@13|John|Blake|2000-05-11@14|Cindy|Mason|2002-08-09@15|Frank|Portman|2003-04- 01@16|Theresa|Markham|2001-03-15@17|Beth|Fowler|2002-06-29@18|Rick|Tulman|2002-12-12@
If you need to generate a datafile to be loaded into a spreadsheet
application or sent within or outside your organization, the into outfile clause should provide enough
flexibility for whatever file format you need.
Let’s say that you have been asked to create a table to capture
information about which of the bank’s branches are visited by which
customers. The table needs to contain the customer’s ID, the branch’s ID,
and a datetime column indicating the
last time the customer visited the branch. Rows are added to the table
whenever a customer visits a certain branch, but if the customer has
already visited the branch, then the existing row should simply have its
datetime column updated. Here’s the
table definition:
CREATE TABLE branch_usage (branch_id SMALLINT UNSIGNED NOT NULL, cust_id INTEGER UNSIGNED NOT NULL, last_visited_on DATETIME, CONSTRAINT pk_branch_usage PRIMARY KEY (branch_id, cust_id) );
Along with the three column definitions, the branch_usage table defines a primary key
constraint on the branch_id and
cust_id columns. Therefore, the server
will reject any row added to the table whose branch/customer pair already
exists in the table.
Let’s say that, after the table is in place, customer ID 5 visits
the main branch (branch ID 1) three times in the first week. After the
first visit, you can insert a record into the branch_usage table, since no record exists yet
for customer ID 5 and branch ID 1:
mysql>INSERT INTO branch_usage (branch_id, cust_id, last_visited_on)->VALUES (1, 5, CURRENT_TIMESTAMP());Query OK, 1 row affected (0.02 sec)
The next time the customer visits the same branch, however, you will need to update the existing record rather than inserting a new record; otherwise, you will receive the following error:
ERROR 1062 (23000): Duplicate entry '1-5' for key 1
To avoid this error, you can query the branch_usage table to see whether a given
customer/branch pair exists and then
either insert a record if no record is found or update the existing row if
it already exists. To save you the trouble, however, the MySQL designers
have extended the insert statement to
allow you to specify that one or more columns be modified if an insert statement fails due to a duplicate key.
The following statement instructs the server to modify the last_visited_on column if the given customer and
branch already exist in the branch_usage table:
mysql>INSERT INTO branch_usage (branch_id, cust_id, last_visited_on)->VALUES (1, 5, CURRENT_TIMESTAMP())->ON DUPLICATE KEY UPDATE last_visited_on = CURRENT_TIMESTAMP();Query OK, 2 rows affected (0.02 sec)
The on duplicate key clause
allows this same statement to be executed every time customer ID 5
conducts business in branch ID 1. If run 100 times, the first execution
results in a single row being added to the table, and the next 99
executions result in the last_visited_on column being changed to the
current time. This type of operation is often referred to as an
upsert, since it is a combination of an update and an insert statement.
Earlier in the appendix, I showed you how to write queries using the
limit clause in conjunction with an
order by clause to generate rankings,
such as the top three tellers in terms of accounts opened. MySQL also
allows the limit and order by clauses to be used in both update and delete statements, thereby allowing you to
modify or remove specific rows in a table based on a ranking. For example,
imagine that you are asked to remove records from a table used to track
customer logins to the bank’s online banking system. The table, which
tracks the customer ID and date/time of login, looks as follows:
CREATE TABLE login_history (cust_id INTEGER UNSIGNED NOT NULL, login_date DATETIME, CONSTRAINT pk_login_history PRIMARY KEY (cust_id, login_date) );
The following statement populates the login_history table with some data by generating
a cross join between the account and
customer tables and using the account’s
open_date column as a basis for
generating login dates:
mysql>INSERT INTO login_history (cust_id, login_date)->SELECT c.cust_id,->ADDDATE(a.open_date, INTERVAL a.account_id * c.cust_id HOUR)->FROM customer c CROSS JOIN account a;Query OK, 312 rows affected (0.03 sec) Records: 312 Duplicates: 0 Warnings: 0
The table is now populated with 312 rows of relatively random data.
Your task is to look at the data in the login_history table once a month, generate a
report for your manager showing who is using the online banking system,
and then delete all but the 50 most-recent records from the table. One
approach would be to write a query using order
by and limit to find the
50th most recent login, such as:
mysql>SELECT login_date->FROM login_history->ORDER BY login_date DESC->LIMIT 49,1;+---------------------+ | login_date | +---------------------+ | 2004-07-02 09:00:00 | +---------------------+ 1 row in set (0.00 sec)
Armed with this information, you can then construct a delete statement that removes all rows whose
login_date column is less than the date
returned by the query:
mysql>DELETE FROM login_history->WHERE login_date < '2004-07-02 09:00:00';Query OK, 262 rows affected (0.02 sec)
The table now contains the 50 most-recent logins. Using MySQL’s
extensions, however, you can achieve the same result with a single
delete statement using limit and order
by clauses. After returning the original 312 rows to the
login_history table, you can run the
following:
mysql>DELETE FROM login_history->ORDER BY login_date ASC->LIMIT 262;Query OK, 262 rows affected (0.05 sec)
With this statement, the rows are sorted by login_date in ascending order, and then the
first 262 rows are deleted, leaving the 50 most recent rows.
Note
In this example, I had to know the number of rows in the table to construct the limit clause (312 original rows − 50 remaining rows = 262 deletions). It would be better if you could sort the rows in descending order and tell the server to skip the first 50 rows and then delete the remaining rows, as in:
DELETE FROM login_history ORDER BY login_date DESC LIMIT 49, 9999999;
However, MySQL does not allow the optional second parameter when
using the limit clause in delete or update statements.
Along with deleting data, you can use the limit and order
by clauses when modifying data as well. For example, if the bank
decides to add $100 to each of the 10 oldest accounts to help retain loyal
customers, you can do the following:
mysql>UPDATE account->SET avail_balance = avail_balance + 100->WHERE product_cd IN ('CHK', 'SAV', 'MM')->ORDER BY open_date ASC->LIMIT 10;Query OK, 10 rows affected (0.06 sec) Rows matched: 10 Changed: 10 Warnings: 0
This statement sorts accounts by the open date in ascending order and then modifies the first 10 records, which, in this case, are the 10 oldest accounts.
In certain situations, you might need to modify or delete data from
several different tables to perform a given task. If you discover that the
bank’s database contains a dummy customer left over from system testing,
for example, you might need to remove data from the account, customer, and individual tables.
Note
For this section, I will create a set of clones for the account, customer, and individual tables, called account2, customer2, and individual2. I am doing so both to protect the
sample data from being altered and to avoid any problems with foreign
key constraints between the tables (more on this later in the section).
Here are the create table statements
used to generate the three clone tables:
CREATE TABLE individual2 AS SELECT * FROM individual; CREATE TABLE customer2 AS SELECT * FROM customer; CREATE TABLE account2 AS SELECT * FROM account;
If the customer ID of the dummy customer is 1, you could generate
three individual delete statements
against each of the three tables, as in:
DELETE FROM account2 WHERE cust_id = 1; DELETE FROM customer2 WHERE cust_id = 1; DELETE FROM individual2 WHERE cust_id = 1;
Instead of writing individual delete statements, however, MySQL allows you to
write a single multitable delete statement, which, in this case, looks as
follows:
mysql>DELETE account2, customer2, individual2->FROM account2 INNER JOIN customer2->ON account2.cust_id = customer2.cust_id->INNER JOIN individual2->ON customer2.cust_id = individual2.cust_id->WHERE individual2.cust_id = 1;Query OK, 5 rows affected (0.02 sec)
This statement removes a total of five rows, one from each of the
individual2 and customer2 tables, and three from the account2 table (customer ID 1 has three
accounts). The statement comprises three separate clauses:
deleteSpecifies the tables targeted for deletion.
fromSpecifies the tables used to identify the rows to be deleted. This clause is identical in form and function to the
fromclause in aselectstatement, and not all tables named herein need to be included in thedeleteclause.whereContains filter conditions used to identify the rows to be deleted.
The multitable delete statement
looks a lot like a select statement,
except that a delete clause is used
instead of a select clause. If you are
deleting rows from a single table using a multitable delete format, the difference becomes even less
noticeable. For example, here’s a select statement that finds the account IDs of
all accounts owned by John Hayward:
mysql>SELECT account2.account_id->FROM account2 INNER JOIN customer2->ON account2.cust_id = customer2.cust_id->INNER JOIN individual2->ON individual2.cust_id = customer2.cust_id->WHERE individual2.fname = 'John'->AND individual2.lname = 'Hayward';+------------+ | account_id | +------------+ | 8 | | 9 | | 10 | +------------+ 3 rows in set (0.01 sec)
If, after viewing the results, you decide to delete all three of
John’s accounts from the account2
table, you need only replace the select
clause in the previous query with a delete clause naming the account2 table, as in:
mysql>DELETE account2->FROM account2 INNER JOIN customer2->ON account2.cust_id = customer2.cust_id->INNER JOIN individual2->ON customer2.cust_id = individual2.cust_id->WHERE individual2.fname = 'John'->AND individual2.lname = 'Hayward';Query OK, 3 rows affected (0.01 sec)
Hopefully, this gives you a better idea of what the delete and from clauses are used for in a multitable
delete statement. This statement is
functionally identical to the following single-table delete statement, which uses a subquery to
identify the customer ID of John Hayward:
DELETE FROM account2 WHERE cust_id = (SELECT cust_id FROM individual2 WHERE fname = 'John' AND lname = 'Hayward';
When using a multitable delete
statement to delete rows from a single table, you are simply choosing to
use a querylike format involving table joins rather than a traditional
delete statement using subqueries. The
real power of multitable delete
statements lies in the ability to delete from multiple tables in a single
statement, as I demonstrated in the first statement in this
section.
Along with the ability to delete rows from multiple tables, MySQL
also gives you the ability to modify rows in multiple
tables using a multitable update. Let’s say that your
bank is merging with another bank, and the databases from both banks have
overlapping customer IDs. Your management decides to fix the problem by
incrementing each customer ID in your database by 10,000 so that the
second bank’s data can be safely imported. The following statement shows
how to modify the ID of customer ID 3 across the individual2, customer2, and account2 tables using a single statement:
mysql>UPDATE individual2 INNER JOIN customer2->ON individual2.cust_id = customer2.cust_id->INNER JOIN account2->ON customer2.cust_id = account2.cust_id->SET individual2.cust_id = individual2.cust_id + 10000,->customer2.cust_id = customer2.cust_id + 10000,->account2.cust_id = account2.cust_id + 10000->WHERE individual2.cust_id = 3;Query OK, 4 rows affected (0.01 sec) Rows matched: 5 Changed: 4 Warnings: 0
This statement modifies four rows: one in each of the individual2 and customer2 tables, and two in the account2 table. The multitable update syntax is very similar to that of the
single-table update, except that the
update clause contains multiple tables
and their corresponding join conditions rather than just naming a single
table. Just like the single-table update, the multitable version includes a
set clause, the difference being that
any tables referenced in the update
clause may be modified via the set
clause.
Note
If you are using the InnoDB storage engine, you will most likely
not be able to use multitable delete
and update statements if the tables
involved have foreign key constraints. This is because the engine does
not guarantee that the changes will be applied in an order that won’t
violate the constraints. Instead, you should use multiple single-table
statements in the proper order so that foreign key constraints are not
violated.
If you enjoyed this excerpt, buy a copy of Learning SQL, Second Edition.
