BUY THIS BOOK
Add to Cart

Print Book $29.99


Add to Cart

Print+PDF $38.99

Add to Cart

PDF $23.99

Safari Books Online

What is this?

Add to UK Cart

Print Book £20.99

What is this?

Looking to Reprint or License this content?


SQL Hacks
SQL Hacks By Andrew Cumming, Gordon Russell

Cover | Table of Contents


Table of Contents

Chapter 1: SQL Fundamentals
In this chapter, you will see some familiar SQL commands with some surprising variations. SQL includes many subtleties that the discerning programmer can exploit. With better SQL, you can do more processing at the database and less processing in your application. By and large, this redistribution of labor will be better for the application and better for the database; it should also reduce the traffic between these components. In addition, by improving your SQL, you will make your queries easier to read.
Each main SQL engine has a command-line interface. Although such interfaces appear ancient, they are still essential utilities for all SQL hackers. Each interface has its own peculiarities, but they all do essentially the same job. You can find details of the command-line interfaces for SQL Server, Oracle, MySQL, Access, DB2, and PostgreSQL in “Run SQL from the Command Line” [Hack #1].
The command-line processor is the lowest common denominator when it comes to running SQL, but you’ll find plenty of times when it comes in handy.
All of the popular SQL engines (except Access) have serviceable command prompt utilities that are installed by default, and all provide roughly the same benefits:
  • You can type in SQL and see the results or error messages displayed immediately.
  • You can start them up from an operating system prompt.
  • You can specify the username and password that you want to use.
  • You can pipe SQL statements in from another process.
This provides a flexible mechanism that is ideal for executing ad hoc SQL statements or developing queries that will eventually be used in applications.
The examples in this hack connect to a database on localhost called dbname with user scott and password tiger.
One of the useful features of a command-line interface is the
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Run SQL from the Command Line
The command-line processor is the lowest common denominator when it comes to running SQL, but you’ll find plenty of times when it comes in handy.
All of the popular SQL engines (except Access) have serviceable command prompt utilities that are installed by default, and all provide roughly the same benefits:
  • You can type in SQL and see the results or error messages displayed immediately.
  • You can start them up from an operating system prompt.
  • You can specify the username and password that you want to use.
  • You can pipe SQL statements in from another process.
This provides a flexible mechanism that is ideal for executing ad hoc SQL statements or developing queries that will eventually be used in applications.
The examples in this hack connect to a database on localhost called dbname with user scott and password tiger.
One of the useful features of a command-line interface is the pipe, which chains a sequence of commands so that the output of one is the input for the next. You can use a pipe on Windows under the command prompt or on Linux/Unix using a shell. If you put your SQL command-line utility at the end of a pipe, the result is processed as SQL. For example, a common operation is to use a pipe to send a sequence of INSERT statements to your SQL command-line utility. Here’s an example that runs on the Windows command prompt, but could also work on a Unix or Linux system if you used the appropriate SQL command-line utility (these are described later in this hack).
You must type this entire command on one line. On Unix or Linux, you could put a \ character (the line-continuation character) before the line break:
C:>perl -pe "s/DATE //g; " < cmnd.sql | sqlcmd -U 
               
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Connect to SQL from a Program
You can access an SQL database from most programming languages, including Perl, PHP, Ruby, Java, and C#.
Working with a database from a programming language commonly involves a database connection and a statement cursor. In each language demonstrated here, you do the following:
Connect to the server
You specify the location of the server and name of the database. You also supply a username and password. In return, you obtain a connection handle that represents the connection. If you have several SQL commands to send you can reuse this connection. This process can fail if the server is not available or if your credentials are not accepted.
Execute an SQL SELECT command
This involves sending the SQL statement to the server via the connection handle. In return, you obtain a cursor. This process can fail if the SELECT statement includes a syntax error or your permissions are inadequate.
Retrieve the data
Typically you will loop until the cursor indicates that it is exhausted. At each iteration, your cursor points to a single row of data. You can get individual fields of the row from the cursor and then move on to the next row. Failure at this stage is uncommon but not unheard of (for example, your network may go down while you are in the middle of processing a result set).
Close the cursor and close the connection
Do this when you have finished issuing all your queries and are ready to disconnect from the database.
This pattern is a reasonable compromise between efficiency and utility, and there are many variations. If the data set is of a reasonable size, you might prefer to get the entire data set into a suitable data structure in one go. Each language given here will support that.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Perform Conditional INSERTs
The humble INSERT statement is a masterpiece of declarative language design. With only two main variations, it can handle a host of different behaviors.
You can use INSERT INTO table ( list ) VALUES ( list ) to add a single row to a table. You can also use INSERT INTO table ( list ) SELECT stmt to insert several rows.
You can include expressions and literal expressions in the VALUES list.
Suppose you want to record the fact that member jim01 has borrowed the book bk002 from your library. This book is due back in 14 days. Add the number 14 to today’s date to get the due date:
INSERT INTO libraryLoan(member,book,dueDate)
  VALUES ('jim01', 'bk002', CURRENT_DATE + 14);
In SQL Server, you must use the function GetDate(  ) in place of CURRENT_DATE. For Access, you can use Date(  ).
You might prefer to use the ANSI standard method in your database. With the ANSI method you use the phrase CURRENT_DATE + INTERVAL '14' DAY in place of CURRENT_DATE+14. Oracle, PostgreSQL, and MySQL will allow that.
The VALUES list can include more complex calculations, and these calculations may involve subqueries. Let’s say that when the book is returned you must impose a fine of 20 cents if the book is overdue. You can use a single INSERT statement to apply this fine:
INSERT INTO libraryReturn(member,book,returnDate,fine)
  VALUES ('jim01','bk002',CURRENT_DATE,
          (SELECT 0.20 fine
             FROM libraryLoan
           WHERE member='jim01' AND book='bk002'
           GROUP BY member, book
           HAVING MAX(dueDate)<CURRENT_DATE))
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
UPDATE the Database
The behavior of UPDATE can seem confusing to people accustomed to procedural programming languages such as Perl and Java. Learn how UPDATE works, and why.
In most programming languages, you need a temporary variable if you want to swap the values of two variables. Suppose you want to move the players around in your netball team. Let the wing attack have a go as goal shooter and put the goal shooter on wing attack:
/* The original lineup */
goalShooter = 'Camelia';
wingAttack  = 'Rosie';

/* Swap goalShooter with wingAttack */
tmp         = goalShooter;
goalShooter = wingAttack;
wingAttack  = tmp;
In an SQL UPDATE statement, you don’t need the temporary variable. The values on the right of the = are consistent throughout the whole UPDATE statement; it is as though all of the updates happened simultaneously rather than one after another. Here is the result of swapping the two positions in Oracle; you will get the same result if you try it on SQL Server or on PostgreSQL (read on for MySQL):
SQL> SELECT goalShooter,goalAttack,wingAttack FROM offenceTeam;

GOALSHOOTER           GOALATTACK            WINGATTACK
--------------------- --------------------- ---------------------
Camelia               Demi                  Rosie

SQL> UPDATE offenceTeam
  2    SET goalShooter = wingAttack,
  3        wingAttack  = goalShooter;

1 row updated.

SQL> SELECT goalShooter,goalAttack,wingAttack FROM offenceTeam;

GOALSHOOTER           GOALATTACK            WINGATTACK
--------------------- --------------------- ---------------------
Rosie                 Demi                  Camelia
This is rather like the Perl construct that allows you to assign a list of variables in a single statement:
($goalShooter,$wingAttack) = ($wingAttack,$goalShooter);
When a relational database performs an update it has to maintain a copy of all of the original values in some place to ensure isolated transactions. A single UPDATE statement might involve thousands of rows and might take several minutes to complete. If there were a failure during the update (if someone switched off the computer, for example), the system is guaranteed to roll back and none of the changes will be committed.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Solve a Crossword Puzzle Using SQL
You can use SQL to solve the kinds of pattern-matching riddles that are typical of crossword puzzles. But first you have to load in a dictionary.
Suppose you have a table called words that contains a few thousand words. If you know some characters in some positions, you can use the underscore (_) wildcard. For example, say you are looking for an eight-letter word with the following pattern:
Second letter: a
Fourth letter: l
Seventh letter: o
An underscore means “any character” in LIKE:
mysql> SELECT * FROM words WHERE word LIKE '_a_l_  _o_';
+------+----------+
| id   | word     |
+------+----------+
| 3823 | ballroom |
| 3826 | ballyhoo |
| 7255 | Carleton |
| 7480 | cauldron |
+------+----------+
4 rows in set (0.04 sec)
The ANSI standard allows % and _ as the two wildcards. % is used to represent a string of any length and _ represents any single character. In Access, you use * and ?, respectively.
Here’s how to find words in your dictionary that have the same three letters at the beginning and at the end:
mysql> SELECT word FROM words
    -> WHERE word LIKE CONCAT('%',SUBSTR(word,1,3))
    -> AND LENGTH(word) > 3;
+---------------+
| word          |
+---------------+
| Ababa         |
| antiformant   |
| booboo        |
| Einstein      |
| entertainment |
| Giorgio       |
| Ionicization  |
| murmur        |
| Oshkosh       |
| redeclared    |
| restores      |
| restructures  |
| Tsunematsu    |
| underground   |
+---------------+
14 rows in set (0.09 sec)
SELECT word FROM words
  WHERE (word LIKE '%' + SUBSTRING(word,1,3))
  AND LEN(word) > 3
SELECT word FROM words
  WHERE (word LIKE '*' + LEFT(word,3))
  AND LEN(word) > 3
PostgreSQL will accept the ANSI standard syntax:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Don’t Perform the Same Calculation Over and Over
The FROM clause of a SELECT statement may include other SELECT statements. This feature can simplify a complex statement.
Sometimes using a derived table statement is the only way to get the results that you want. But you can also use a derived table to make a query shorter and easier to read. When you have the same complicated expression cropping up in several places in your output you can use a derived table to provide a kind of local variable.
The contract table contains two columns: income and overhead. You want to produce five more columns calculated from these two values. The output would look like Table 1-2.
Table : Sharing the residual
IncomeOverheadResidual:grant minus overheadsEst:20% of residualAdmin:10% of residualRsrv:5% of residual
$1,00020%8001608040
$2,00010%1,80036018090
$1,00050%5001005025
The SQL to generate this table is not complicated, but it is rather lengthy:
mysql> SELECT income,
    ->        overhead,
    ->        (income-income*overhead/100) AS residual,
    ->        0.20*(income-income*overhead/100) AS Est,
    ->        0.10*(income-income*overhead/100) AS Admin,
    ->        0.05*(income-income*overhead/100) AS Rsrv
    ->   FROM contract;
+--------+----------+----------+------+-------+------+
| income | overhead | residual | Est  | Admin | Rsrv |
+--------+----------+----------+------+-------+------+
|   1000 |       20 |      800 |  160 |    80 |   40 |
|   2000 |       10 |     1800 |  360 |   180 |   90 |
|   1000 |       20 |      500 |  100 |    50 |   25 |
+--------+----------+----------+------+-------+------+
It would be neater if you didn’t have to keep repeating that residual calculation (income-income*overhead/100) over and over again.
You can calculate the residual in a derived table and then refer to it in the outer query. With indentation and a consistent method for naming columns, a derived table can improve the appearance of the SQL:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 2: Joins, Unions, and Views
You can use a join to associate the rows of one table with the rows of another. Often you do this to follow a foreign key reference. For example, consider an employee table that contains a column with the id of the department for each employee. If you need to see the name of the department for each employee, you can use a JOIN, as in:
SELECT employee.name, department.name
  FROM employee JOIN department ON (employee.department=department.id)
The default JOIN is an INNER JOIN, as shown in the preceding code. There are other kinds of JOINs, such as the LEFT OUTER JOIN, the FULL OUTER JOIN, and the CROSS JOIN. You can find examples of each in this chapter.
You also can use a UNION to combine two tables, but unlike with a JOIN, a UNION appends the rows of two tables into one result. In a UNION, the two tables must have the same number of columns, and the corresponding columns must have compatible types.
You can use a VIEW to name a query. If you have a SELECT statement (possibly using a JOIN or a UNION) you can save it as a named VIEW. As much as possible the system will treat the view as though it were a base table; you can SELECT from it, or JOIN it to other tables or views. It is generally possible to UPDATE, DELETE from, and INSERT into a view (with some restrictions).
When your software requirements change and you require a different database design, you don’t have to throw out all your code. You can ensure that existing queries still work by using views to stand in for tables that no longer exist.
At some point, you will need to make some breaking changes to your database design. With the right trickery, you can keep even legacy code happy. For example, suppose that your company has a register of office equipment recorded in a table, as shown in Table 2-1.
Table : The equipment table
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Modify a Schema Without Breaking Existing Queries
When your software requirements change and you require a different database design, you don’t have to throw out all your code. You can ensure that existing queries still work by using views to stand in for tables that no longer exist.
At some point, you will need to make some breaking changes to your database design. With the right trickery, you can keep even legacy code happy. For example, suppose that your company has a register of office equipment recorded in a table, as shown in Table 2-1.
Table : The equipment table
assetTagDescriptionDateAcquired
50430Desktop PC2004-07-02
5043119-inch monitor2004-07-02
Now, suppose your company opens a new office in another location, and you need to keep track of equipment for two different offices. Should you make a copy of the applications and database or change the database design?
It is tempting to simply copy the application and the database and have separate instances running in each location. However, this type of quick fix solves the immediate problem but causes more problems in the long run. You will have two applications to maintain, two sets of hardware to buy and look after, and two sets of data that you cannot merge easily. This approach does not scale, and problems will only get worse as more new offices are opened.
SQL gives you the command you need to add a column while preserving existing data (you can also use it to change field names and remove redundant fields):
ALTER TABLE equipment ADD COLUMN office VARCHAR(20);
UPDATE equipment SET office = 'Headquarters'
In the preceding code, you’ve added a new column and assigned every row to the existing office (this may not reflect the real world; you may have moved some furniture from the old office to the new office), which has been promoted to “Headquarters.” You can now insert the rows that belong to the new office as appropriate. However, the problem with this is that all of the queries that rely on this table need to be reexamined.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Filter Rows and Columns
Don’t just download the whole table. Use filters on row and column information and minimize database traffic.
Programmers of a certain disposition try to avoid the database as much as possible. They learn a single, simple SQL statement and use it in all circumstances. The one statement they need is SELECT * FROM t. Grab the whole table and treat it as a giant array. No need to learn much SQL, right? The problem is that this approach is inefficient.
Let’s say you have a web site that keeps all of its pages in the database. It’s great for content management and version control, but each page request means getting data from the database. The table itself has two fields: pagename and content. How can you do this efficiently in, say, Perl? The name of the page you want to display is stored in $p:
my $sql = "SELECT pagename,content FROM page";
my $sth = $dbh->prepare($sql);
my $rsh = $sth->execute(  );
while (my $row = $sth->fetchrow_hashref(  ) ) {
   print $row->{content} if ($row->{pagename} eq $p);
}
The preceding code suffers from linear performance degradation. As more pages are added, more information is sent between the database server and the program. The code has to filter out all of this.
You should really be filtering in SQL so that you get just the results you need. The following example is much better, but it still has some problems:
my $sql = "SELECT pagename,content FROM page where pagename = '".$p."'";
my $sth = $dbh->prepare($sql);
my $rsh = $sth->execute(  );
my $row = $sth->fetchrow_hashref(  );
print $row->{content} if $row;
It is possible that $p could be set to something unexpected. For instance, rather than index.html it could be index'html. This would cause the query to fail with a syntax error.
If you don’t fix this issue, not only do you have a potential syntax error, but also you may be leaving yourself open to an SQL injection attack [Hack #48].
A placeholder is a way to put a program variable into an SQL statement so that SQL injection cannot work. These are also known as
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Filter on Indexed Columns
Query filtering will improve performance. You can gain even better performance if your filtering criteria can use indexed columns.
Queries can return all the rows and all the columns from tables. But what if you want only a few columns? It would be a waste of system resources to send you columns you don’t want. Similarly, if you want to see only one row of a table, you should be able to ask only for that row. By asking only for what you really want, you are filtering away rows and columns that you don’t want. To filter columns, make sure that you explicitly ask for only the columns you want (e.g., don’t just use * on the SELECT line). You can filter rows using WHERE rules, but you can also use other clauses, such as HAVING.
Defining a primary key creates an index on the columns involved. This allows the database to find data much faster than it could without the index. The database server uses this index to make sure the key is unique in the table, which is a requirement for a primary key to be valid. Joins that use primary key columns also benefit from this index.
Filtering on something that does not have an index can cause a significant performance problem. Not only is it faster to search with an index, but also query optimizers can use the index first to perform initial filtering, instead of using the actual table data being queried. It may even be possible for the optimizer to use the index for the entire operation, depending on the query being executed.
If the index is all that is required to retrieve the result set, and the database never needs to be looked at, the index is called a covering index—it “covers” everything in the query.
Now let’s consider a database of pages. Say you have page contents stored against a pagename, with previous versions of your pages also recorded so that you can implement version control. Table 2-2 shows an example.
Table : The page table
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Convert Subqueries to JOINs
Sometimes you want to query one table, use that result to query another table, and then use that result to query yet another table. It’s tempting to do this as three separate queries, but the right solution is to chain them yourself into one SQL statement.
Consider a database where employees have job titles, and job titles have ranks, and a rank has a salary, as shown in Table 2-3, Table 2-4, and Table 2-5.
Table : The jobs table
EmployeeTitle
Gordon RussellLecturer
Andrew CummingTeaching fellow
Jim SmithTechnician
Table : The ranks table
TitleRank
LecturerLECT1
Teaching fellowLECT2
TechnicianTECH1
Table : The salary table
RankPayment
LECT12000.00
LECT23000.00
TECH15000.00
TECH26000.00
Determining how much to pay Andrew Cumming would require three steps. First, you’d need to determine Andrew’s title:
mysql> SELECT title FROM jobs WHERE employee = 'Andrew Cumming';
+-----------------+
| title           |
+-----------------+
| Teaching Fellow |
+-----------------+
Next, you’d need to determine the pay rank for a teaching fellow:
mysql> SELECT rank FROM ranks WHERE title = 'Teaching Fellow';
+-------+
| rank  |
+-------+
| LECT2 |
+-------+
Finally, you’d need to look up the salary for someone at the LECT2 pay grade:
mysql> SELECT payment FROM salary WHERE rank = 'LECT2';
+---------+
| payment |
+---------+
| 3000.00 |
+---------+
That’s not efficient, because you’d need to pass three different queries to the database and process the results in between. If a table is updated during this process the answer might be wrong, or the query might even return an error. Combining queries can make people nervous. Nervous programmers often use subqueries:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Convert Aggregate Subqueries to JOINs
You can avoid subqueries using JOIN or OUTER JOIN if they don’t use aggregate functions. But what about subqueries that do use aggregation?
Some subqueries are easy to eliminate [Hack #10], but others are a bit trickier. Suppose you have the orders table shown in Table 2-6.
Table : The orders table
customerwhntotalitems
Jim2006-10-105
Jim2006-10-113
Jim2006-10-121
Brian2006-10-107
Now suppose you need to show the date on which each customer purchased the most totalitems:
SELECT customer,whn,totalitems
FROM orders o1
WHERE o1.whn = (
  SELECT MAX(whn)
  FROM orders o2
  WHERE o1.customer = o2.customer
);
To do this you need to execute the subquery for every row of orders, so the preceding code may be slow to execute. In addition, older versions of MySQL cannot handle subqueries. To avoid using a subquery, you can use a HAVING clause with a self-join:
SELECT o1.customer,o1.whn,o1.totalitems
  FROM orders o1 JOIN orders o2 on (o1.customer = o2.customer)
  GROUP BY o1.customer,o1.whn,o1.totalitems
  HAVING o1.whn = max(o2.whn)
Here’s what you’ll get as a result:
+----------+------------+------------+
| customer | whn        | totalitems |
+----------+------------+------------+
| Brian    | 2006-10-10 |          7 |
| Jim      | 2006-10-12 |          1 |
+----------+------------+------------+
2 rows in set (0.00 sec)
This approach works well for all aggregate functions.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Simplify Complicated Updates
You can perform complex calculations with an UPDATE, which can save you from having to use a cursor or from doing the calculations outside the database.
The UPDATE example shown in many introductory textbooks is a simple operation whereby you raise everyone’s salary by $100:
UPDATE employee
  SET salary = salary + 100
That’s certainly a simple statement, but it is likely to be too simple to be of any use. Let’s suppose that annual wage negotiations resulted in a more complex deal that requires you to access other tables in the database.
Employees who have a clean disciplinary record will get the $100 raise; those with a single offense will keep the same salary; those with two or more recorded transgressions will get a $100 cut in salary. The employee details and the disciplinary records are held in the employee and disciplinary tables:
mysql> SELECT * FROM employee;
+----+----------+---------+
| id | name     | salary  |
+----+----------+---------+
|  1 | Reginald | 5000.00 |
|  2 | C J      | 5000.00 |
|  3 | Joan     | 5000.00 |
+----+----------+---------+

mysql> SELECT * FROM disciplinary;
+------------+-----+
| whn        | emp |
+------------+-----+
| 2006-05-20 |   1 |
| 2006-05-21 |   1 |
| 2006-05-22 |   3 |
+------------+-----+
You could write a complex UPDATE statement that updates the employee table while referencing the disciplinary table, but it is easier to do this in two stages. First, prepare a view that calculates the new values and then apply those changes with an UPDATE.
The newSalary view includes two columns: the primary key of the table to be updated (employee) and the new value of the salary. You can preview the result of this view before executing a simple UPDATE to transfer the new values into place.
You can define the view that contains the new salaries for every employee as follows:
mysql> CREATE VIEW newSalary AS
    ->   SELECT id, CASE WHEN COUNT(emp) = 0 THEN salary+100
    ->                   WHEN COUNT(emp) > 1 THEN salary-100
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Choose the Right Join Style for Your Relationships
When a relationship between tables is optional, you need an OUTER JOIN. When querying over many changes, if you require an OUTER JOIN you sometimes have to change all the other INNER JOINs into OUTER JOINs.
There are two common patterns of JOINs: the chain and the star, as shown in Figure 2-1 and Figure 2-2. Each is described in the following sections.
Figure 2-1: A JOIN chain
Figure 2-2: A JOIN star
There are two references in this JOIN chain example. The reference from trip, shown in Table 2-7, to budget, shown in Table 2-8, is optional—users may put a NULL value in the budget field of a trip row. The link from budget to staff (shown in Table 2-9) is mandatory; every row of the budget table must have a value in the budgetHolder field. Therefore, you use an OUTER JOIN when querying across trip and budget and an INNER JOIN when querying across budget and staff.
Table : The trip table
tripIDdescriptionbudget
TR01SicilyNULL
TR02EgyptCTH22
Table : The budget table
budgetIddescriptionbudgetHolder(NOT NULL)
CT22Officer’s messST02
Table : The staff table
staffIdnamerank
ST01YossarianCaptain
ST02MiloLieutenant
If you want to list all of the trips with associated budget details, you must use a
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Generate Combinations
A JOIN with no join conditions results in every row in one table being connected to every row in another table, forming all possible row combinations. Often this is done by mistake, but it can be useful.
CROSS JOIN queries occur rarely, but when you need them you need to know how to handle them. A table that is used more than once is known as a self-join. If there are no join conditions between the two instances of the same table, your query will produce every combination of rows possible. So, a table with a row containing 'A' and a row containing 'B', when joined with itself, will produce ('A','A'), ('A','B'), ('B','A'), and ('B','B'). The effect is to produce all combinations of rows.
To demonstrate this further, say you have four soccer teams in your soccer league. Each one will play the other twice—once at home and once away—as shown in Table 2-10 and Table 2-11.
Table : The teams table
teamname
Lions
Tigers
Wildcats
Toads
Table : The tscores table
hometeamawayteamhomescoreawayscore
LionsWildcats14
ToadsTigers35
WildcatsTigers00
You need to write a query that shows the current scores for all possible games. To get all the possible combinations use a CROSS JOIN:
mysql> SELECT home.teamname Home, away.teamname Away
    -> FROM teams home CROSS JOIN teams away
    -> ;
+----------+----------+
| Home     | Away     |
+----------+----------+
| Lions    | Lions    |
| Tigers   | Lions    |
| Wildcats | Lions    |
| Toads    | Lions    |
| Lions    | Tigers   |
| Tigers   | Tigers   |
| Wildcats | Tigers   |
| Toads    | Tigers   |
| Lions    | Wildcats |
| Tigers   | Wildcats |
| Wildcats | Wildcats |
| Toads    | Wildcats |
| Lions    | Toads    |
| Tigers   | Toads    |
| Wildcats | Toads    |
| Toads    | Toads    |
+----------+----------+
16 rows in set (0.00 sec)
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 3: Text Handling
SQL has extensive text-handling capabilities built in. You can extract parts of a string using the standard functions. The basic operators, such as LIKE and || or CONCAT (concatenation), are all that you need for everyday queries. But there are some more exotic facilities, such as full-text indexing and string hashing, that can make your code faster and smarter.
You can do a simple keyword search using the LIKE operator. Unfortunately, this can be slow. Fortunately, an efficient keyword search is available in many systems.
Often you must store large chunks of text in a table. For example, suppose you have a table called story, which contains the author of a story and the story itself:
CREATE TABLE story (
  author varchar(100),
  body   varchar(1000)
);
INSERT INTO story (author,body) VALUES('Atzeni'
   ,'Many database systems, through the use of SQL,↵
    are wonderful at collating...');
INSERT INTO story (author,body) VALUES('Adams'
   ,'The definitions involved in understanding SQL databases are big.↵
    You may have thought the distance from your chair to the fridge↵
    was big, but that''s peanuts compared to standard definitions.');
INSERT INTO story (author,body) VALUES('Russell and Cumming'↵
   ,'Often you must store large chunks of text in a table.');
If you wanted to find out which body has the phrase “database system” in it, you could do the following:
SELECT author FROM story
 WHERE body LIKE '%database system%'
This accurately returns matches where the exact match is found. However, for some text searches, partial matches would also be useful, as well as common roots (such as “system” and “systems”) and result weighting (a higher score for “database system” than for “the database used a system”).
In Oracle, the LIKE operator is case sensitive. If you want to do a case-insensitive search you can force the value into lowercase:
SELECT author FROM story WHERE LOWER(body) LIKE '%database system%'
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Search for Keywords Without LIKE
You can do a simple keyword search using the LIKE operator. Unfortunately, this can be slow. Fortunately, an efficient keyword search is available in many systems.
Often you must store large chunks of text in a table. For example, suppose you have a table called story, which contains the author of a story and the story itself:
CREATE TABLE story (
  author varchar(100),
  body   varchar(1000)
);
INSERT INTO story (author,body) VALUES('Atzeni'
   ,'Many database systems, through the use of SQL,↵
    are wonderful at collating...');
INSERT INTO story (author,body) VALUES('Adams'
   ,'The definitions involved in understanding SQL databases are big.↵
    You may have thought the distance from your chair to the fridge↵
    was big, but that''s peanuts compared to standard definitions.');
INSERT INTO story (author,body) VALUES('Russell and Cumming'↵
   ,'Often you must store large chunks of text in a table.');
If you wanted to find out which body has the phrase “database system” in it, you could do the following:
SELECT author FROM story
 WHERE body LIKE '%database system%'
This accurately returns matches where the exact match is found. However, for some text searches, partial matches would also be useful, as well as common roots (such as “system” and “systems”) and result weighting (a higher score for “database system” than for “the database used a system”).
In Oracle, the LIKE operator is case sensitive. If you want to do a case-insensitive search you can force the value into lowercase:
SELECT author FROM story WHERE LOWER(body) LIKE '%database system%'
The LIKE clause forces the database system to do a linear scan of the text fields in order to find the words of interest, and therefore performance will be slow. What you really need is an index on the words in these text strings. The FULLTEXT construct supports this type of indexing. It has other advantages as well: it can use a natural language engine to aid the matching algorithm, and it can return the quality of the match (rather than just
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Search for a String Across Columns
A string that you are looking for might be in any one of several columns. You can search them all at once rather than individually.
Say you have a table of people’s bedroom colors, as shown in Table 3-1. Does anyone have yellow anywhere in their room?
Table : The bedroom table
namefloorcolorceilingcolorwallcolor
JimREDGREENYELLOW
BobYELLOWBLUEBLACK
AllanBLUEPINKBLACK
GeorgeBLUEGREENOAK
To determine which people have yellow in their rooms, you could say:
SELECT name FROM bedroom
WHERE floorcolor = 'YELLOW'
OR    ceilingcolor = 'YELLOW'
OR    wallcolor = 'YELLOW'
However, in that case the search string is repeated for each column. Using OR increases the chances of creating careless errors in your queries. Instead, you could use CONCAT to do this in one line:
SELECT name FROM bedroom
WHERE CONCAT(floorcolor,ceilingcolor,wallcolor) like '%YELLOW%'
A downside of this style of query is that the database system will not necessarily run the query efficiently. It is hard for the system to use indexes when using concatenated strings, and using a wildcard in a LIKE expression will not usually employ an index (if the wildcard is not near the start of the pattern, an index might be used). However, the performance hit will be noticeable only for large data sets.
With this CONCAT approach, the colors related to George would become BLUEGREENOAK. If there actually was a color called GREENOAK, you could not be sure whether GREEN and OAK are in different columns or GREENOAK is in one column. To help avoid confusion, you can add a separator:
SELECT name FROM bedroom
WHERE CONCAT(':',floorcolor,':',ceilingcolor,':',wallcolor,':') 
  like '%:YELLOW:%'
If a color can be null, it must be wrapped in COALESCE or NVL; for example, COALESCE(floorcolor,'').
COALESCE is the SQL92 standard way of doing this. In Oracle, you can also use
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Solve Anagrams
You can use SQL to solve an anagram if you load a dictionary and calculate some hashes.
You can use a hash function to find solutions to certain kinds of word puzzles. In this hack, you will load a dictionary into SQL, tidy it up, and then attach a hash function to every word. With the right hash function you will find that all anagrams hash to the same value. For example, if rat hashes to the number 327, tar will give the same hash value. You can find all of the anagrams of rat by looking in the hash bucket numbered 327.
You can create a table to hold both the words (in a column named w) and the hash value (h). You’ll need a type with a large number of bits for h: MySQL has BIGINT which, at 64 bits, is just big enough. Having an index on the hash value makes a big difference; an index on w is handy:
CREATE TABLE dict
(w   VARCHAR(50)
,h   BIGINT
,INDEX(w)
,INDEX(h)
);
MySQL, PostgreSQL, and SQL Server support a 64-bit BIGINT data type. In Oracle, the ROWID data type has 64 bits.
To load a dictionary into your database, you can use the technique shown in “Solve a Crossword Puzzle Using SQL” [Hack #5]. Another method is to use LOAD DATA in MySQL. You can load the file into a temporary table for a little processing before putting the data into dict:
mysql> CREATE TEMPORARY TABLE tmp(w VARCHAR(50), INDEX(w));
Query OK, 0 rows affected (0.00 sec)

mysql> LOAD DATA LOCAL INFILE '/usr/share/dict/words' INTO TABLE tmp(w);
Query OK, 483523 rows affected (3.87 sec)
Records: 483523  Deleted: 0  Skipped: 0  Warnings: 0
If you don’t have the words file on your system (Mac OS X and many Unix and Linux systems do have it), you can obtain word lists from sources such as Moby Word Lists by Grady Ward, available at http://www.gutenberg.org/etext/3201.
This list includes hyphens and apostrophes as well as some uppercase letters. You can remove these characters and force everything into lowercase:
mysql> UPDATE tmp SET w = REPLACE(REPLACE(LOWER(w),'''',''),'-','');
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Sort Your Email
An email address breaks down into a number of components of interest, such as name and domain. You can sort a list of email contacts using these components.
It can be useful to sort a list of email addresses in more than one way. For example, if you sort on domain name and then by account name, you see your contacts grouped by their organization, as shown in Table 3-2.
Table : Email ordering
Email by account nameEmail by domain name
Alan.K.Buccannan@rbs.co.uknapier.ac.uk; i.rankin
complaints@sirius-cybernetics.comnapier.ac.uk; P.Bhardwaj
i.rankin@napier.ac.ukrbs.co.uk; Alan.K.Buccannan
P.Bhardwaj@napier.ac.ukrbs.co.uk; Scott.Kemmer
Scott.Kemmer@rbs.co.uksirius-cybernetics.com; complaints
You can extract the domain name with some string functions:
mysql> SELECT SUBSTRING(e FROM POSITION('@' IN e)+1)       AS domain
    ->      , SUBSTRING(e FROM 1 FOR POSITION('@' IN e)-1) AS account
    ->   FROM email
    ->  ORDER BY domain, account;
+------------------------+------------------+
| domain                 | account          |
+------------------------+------------------+
| napier.ac.uk           | i.rankin         |
| napier.ac.uk           | P.Bhardwaj       |
| rbs.co.uk              | Alan.K.Buccannan |
| rbs.co.uk              | Scott.Kemmer     |
| sirius-cybernetics.com | complaints       |
+------------------------+------------------+
For the domain name you want to take the substring starting just past the position of the @ character: POSITION('@' IN e)+1. For the account name you take characters starting at 1 until just before the @ character; the number of characters required is POSITION('@' IN e)-1.
This works for MySQL and for PostgreSQL, both of which implement the standard functions SUBSTRING and POSITION, including the words FROM, IN, and FOR, which are used to separate the parameters. In Oracle and SQL Server, these functions have different names.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 4: Date Handling
Hacks 19–23
SQL is capable of handling just about any date calculation. The hacks in this chapter show how to get dates into your database, and how to get weekly, monthly, and quarterly reports out with a minimum of human intervention.
In many of the hacks described here, the reports are generated using the current date; however, it is usually a simple matter to use a user-specified parameter instead [Hack #58].
There are inconsistencies among the main database vendors regarding dates. For most of the hacks in this chapter, we used MySQL as the base example and we’ve shown the variations for SQL Server, Access, Oracle, and PostgreSQL.
You should be aware that the database system might be running on a system having a different time zone than the system your applications run on (perhaps your web server is in New York and your database server is in Chicago). To minimize clock and time zone discrepancies, you should use CURRENT_TIMESTAMP to generate times whenever possible.
The SQL standard includes a complete set of rules which govern how dates should be represented and manipulated. Each vendor implementation of SQL has a variation of these rules.
The SQL standard has a DATE type for days and a TIMESTAMP type to represent a date and time. Examples of literals are DATE '2006-05-20' and TIMESTAMP '2006-06-18 10:09:05'. The ISO format used in both examples (the year followed by the month followed by the day) has the advantage of sorting correctly even when it’s represented as a string data type. It is also visibly different from both the American convention that puts the month first, and the European style that puts the day first.
Oracle, PostgreSQL, and MySQL adhere to the SQL standard for representing dates and timestamps, but Microsoft’s SQL Server and Access use a slightly different approach. SQL Server and Access will accept a date literal such as
Additional content appearing in this sect