BUY THIS BOOK
Add to Cart

Print Book $34.95


Add to Cart

Print+PDF $45.44

Add to Cart

PDF $27.99

Safari Books Online

What is this?

Add to UK Cart

Print Book £24.95

What is this?

Looking to Reprint or License this content?


Transact-SQL Cookbook
Transact-SQL Cookbook By Ales Spetic, Jonathan Gennick
March 2002
Pages: 302

Cover | Table of Contents | Colophon


Table of Contents

Chapter 1: Pivot Tables
Support for a sequence of elements is often needed to solve various SQL problems. For example, given a range of dates, you may wish to generate one row for each date in the range. Or, you may wish to translate a series of values returned in separate rows into a series of values in separate columns of the same row. To implement such functionality, you can use a permanent table that stores a series of sequential numbers. Such a table is referred to as a Pivot table.
Many of the recipes in our book use a Pivot table, and, in all cases, the table's name is Pivot. This recipe shows you how to create that table.
First, create the Pivot table. Next, create a table named Foo that will help you populate the Pivot table:
CREATE TABLE Pivot (
   i INT,
   PRIMARY KEY(i)
)

CREATE TABLE Foo(
   i CHAR(1)
)
The Foo table is a simple support table into which you should insert the following 10 rows:
INSERT INTO Foo VALUES('0')
INSERT INTO Foo VALUES('1')
INSERT INTO Foo VALUES('2')
INSERT INTO Foo VALUES('3')
INSERT INTO Foo VALUES('4')
INSERT INTO Foo VALUES('5')
INSERT INTO Foo VALUES('6')
INSERT INTO Foo VALUES('7')
INSERT INTO Foo VALUES('8')
INSERT INTO Foo VALUES('9')
Using the 10 rows in the Foo table, you can easily populate the Pivot table with 1,000 rows. To get 1,000 rows from 10 rows, join Foo to itself three times to create a Cartesian product:
INSERT INTO Pivot
   SELECT f1.i+f2.i+f3.i
   FROM Foo f1, Foo F2, Foo f3
If you list the rows of Pivot table, you'll see that it has the desired number of elements and that they will be numbered from 0 through 999.
You can generate more rows by increasing the number of joins. Join Foo four times, and you'll end up with 10,000 rows (10 * 10 * 10 * 10).
As you'll see in recipes that follow in this book, the Pivot table is often used to add a sequencing property to a query. Some form of Pivot table is found in many SQL-based systems, though it is often hidden from the user and used primarily within predefined queries and procedures.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Using a Pivot Table
Support for a sequence of elements is often needed to solve various SQL problems. For example, given a range of dates, you may wish to generate one row for each date in the range. Or, you may wish to translate a series of values returned in separate rows into a series of values in separate columns of the same row. To implement such functionality, you can use a permanent table that stores a series of sequential numbers. Such a table is referred to as a Pivot table.
Many of the recipes in our book use a Pivot table, and, in all cases, the table's name is Pivot. This recipe shows you how to create that table.
First, create the Pivot table. Next, create a table named Foo that will help you populate the Pivot table:
CREATE TABLE Pivot (
   i INT,
   PRIMARY KEY(i)
)

CREATE TABLE Foo(
   i CHAR(1)
)
The Foo table is a simple support table into which you should insert the following 10 rows:
INSERT INTO Foo VALUES('0')
INSERT INTO Foo VALUES('1')
INSERT INTO Foo VALUES('2')
INSERT INTO Foo VALUES('3')
INSERT INTO Foo VALUES('4')
INSERT INTO Foo VALUES('5')
INSERT INTO Foo VALUES('6')
INSERT INTO Foo VALUES('7')
INSERT INTO Foo VALUES('8')
INSERT INTO Foo VALUES('9')
Using the 10 rows in the Foo table, you can easily populate the Pivot table with 1,000 rows. To get 1,000 rows from 10 rows, join Foo to itself three times to create a Cartesian product:
INSERT INTO Pivot
   SELECT f1.i+f2.i+f3.i
   FROM Foo f1, Foo F2, Foo f3
If you list the rows of Pivot table, you'll see that it has the desired number of elements and that they will be numbered from 0 through 999.
You can generate more rows by increasing the number of joins. Join Foo four times, and you'll end up with 10,000 rows (10 * 10 * 10 * 10).
As you'll see in recipes that follow in this book, the Pivot table is often used to add a sequencing property to a query. Some form of Pivot table is found in many SQL-based systems, though it is often hidden from the user and used primarily within predefined queries and procedures.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 2: Sets
SQL, as a language, was developed around the concept of a set. You may remember studying sets in elementary school, or perhaps you studied set algebra in high school or college. While SQL statements such as SELECT, UPDATE, and DELETE can be used to work on one row of data at a time, the statements were designed to operate on sets of data, and you gain the best advantage when using them that way. In spite of all this, we commonly see programs that use SQL to manipulate data one row at a time rather than take advantage of the SQL's powerful set-processing capabilities. We hope that, with this chapter, we can open your eyes to the power of set manipulation.
When you write SQL statements, try not to think in terms of procedures such as selecting a record, updating it, and then selecting another. Instead, think in terms of operating on a set of records all at once. If you're used to procedural thinking, set thinking can take some getting used to. To help you along, this chapter presents some recipes that demonstrate the power of a set-oriented approach to programming with SQL.
The recipes in this chapter are organized to demonstrate different types of operations that can be performed on sets. You'll see how to find common elements, summarize the data in a set, and find the element in a set that represents an extreme. The operations don't necessarily conform to the mathematical definition of set operations. Rather, we extend those definitions and use algebraic terminology to solve real-world problems. In the real world, some deviations from tight mathematical definitions are necessary. For example, it's often necessary to order the elements in a set, an operation that is not possible with mathematically defined sets.
Before diving into the recipes, we would like to step briefly through some basic set concepts and define the terminology used in this chapter. Although we are sure you are familiar with the mathematical concepts of sets, intersections, and unions, we would like to put each of these set-algebra terms into the context of a real-world example.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Introduction
Before diving into the recipes, we would like to step briefly through some basic set concepts and define the terminology used in this chapter. Although we are sure you are familiar with the mathematical concepts of sets, intersections, and unions, we would like to put each of these set-algebra terms into the context of a real-world example.
There are three types of components to be aware of when working with sets. First is the set itself. A set is a collection of elements, and, for our purposes, an element is a row in a database table or a row returned by a query. Lastly, we have the universe, which is a term we use to refer to the set of all possible elements for a given set.

Section 2.1.1.1: Sets

A set is a collection of elements. By definition, the elements must not be duplicated, and they are not ordered. Here, the mathematical definition of a set differs from its practical use in SQL. In the real world, it's often useful to sort the elements of a set into a specified order. Doing so allows you to find extremes such as the top five, or bottom five, records. Figure 2-1 shows an example of two sets. We'll be referring to these examples as we discuss various aspects of set terminology.
Figure 2-1: Two sets
For our purposes, we will consider a set to be a collection of rows from a table identified by one common element. Consider, for example, the following table of order items. This table is a collection of sets, where each set is identified by a unique order-identification number.
CREATE TABLE OrderItems(
   OrderId INTEGER,
   ItemId INTEGER,
   ProductId CHAR(10),
   Qty INTEGER,
   PRIMARY KEY(OrderId,ItemId)
)
Each set in this case represents an order and will have a number of elements that are not duplicated. The elements will be rows defining the products and the quantity of those products being ordered. The common element is the OrderId column.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
The Students Example
All the examples in this chapter make use of a set of tables in a system that tracks students, courses, and grades. You are a database administrator at the local university, which keeps students' records in a relational database. The university requires that each student prepare several term papers for each course that they take. A professor grades each paper that is submitted, and the Score is stored in a table.
The table in which term paper Scores are recorded is named Students. Each row contains a course ID, the student's name (used as a unique identifier for students), and the Score earned for a term paper. Each term paper can have a maximum Score of 25, so you do not have to normalize the Score to a unique base. Term papers are identified by a sequential integer that starts over at 1 for each new course that a student takes. Here is the table structure:
CREATE TABLE Students (
   CourseId CHAR(20),
   StudentName CHAR(40),
   Score DECIMAL(4,2),
   TermPaper INTEGER
)
If you execute the ch01.ImplementingSetDifference.objects.sql script, all the tables needed for the recipes in this chapter will be created and populated with data. The data in the Students table will then look like this:
CourseId             StudentName     Score  TermPaper  
-------------------- --------------- ------ ----------- 
ACCN101              Andrew          15.60  4
ACCN101              Andrew          10.40  2
ACCN101              Andrew          11.00  3
ACCN101              Bert            13.40  1
ACCN101              Bert            11.20  2
ACCN101              Bert            13.00  3
ACCN101              Cindy           12.10  1
ACCN101              Cindy           16.20  2
MGMT120              Andrew          20.20  1
MGMT120              Andrew          21.70  2
MGMT120              Andrew          23.10  3
MGMT120              Cindy           12.10  1
MGMT120              Cindy           14.40  2
MGMT120              Cindy           16.00  3
Each student needs to write three term papers for each of two courses. Currently, Andrew has submitted three papers for the accounting and management class, Cindy has submitted two for accounting and three for management, and Bert has submitted three for accounting and none for the management class.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Implementing Set Difference
You want to compute the difference between two sets. For example, you want to find all the courses Andrew has taken that Cindy has not yet taken. As a variation on this problem, you also want to subtract one set from the combination of all other sets. In other words, Cindy wants to find out how her friends are doing by listing all term papers that they have completed that she has not.
There are two problems in this recipe. The first is a simple subtraction of one set from another. The second is a subtraction of one set from the union of all other sets. The solution to both is really one and the same because the union of more than one set is simply a larger set.

Section 2.3.2.1: Subtracting one set from another

Consider the problem of finding out which term papers Andrew has completed that Cindy has not completed. There are two sets involved: the set of papers that Andrew has completed and the set of papers that Cindy has completed. The following query returns the difference between these two sets:
SELECT s.CourseId, s.TermPaper
FROM Students s
WHERE s.StudentName='Andrew' AND 
   NOT EXISTS(
      SELECT * FROM Students s1 
      WHERE s1.CourseId=s.CourseId AND 
         s1.TermPaper=s.TermPaper AND 
         s1.StudentName='Cindy')
The results returned by this query will look like this:
CourseId  TermPaper   
--------- ----------- 
ACCN101   4
ACCN101   3

Section 2.3.2.2: Subtracting one set from all others

A slight variation on this problem is to subtract one specific set from the union of all other sets. This leads to the second part of our problem — that of finding which term papers Cindy's friends have taken, but not Cindy. The following query will do this:
SELECT s.StudentName, s.CourseId, s.TermPaper
FROM Students s
WHERE s.StudentName<>'Cindy' AND
   NOT EXISTS(
      SELECT * FROM Students s1 
      WHERE s.CourseId=s1.CourseId AND 
         s.TermPaper=s1.TermPaper AND 
         s1.StudentName='Cindy')
This query returns the following result:
StudentName   CourseId  TermPaper   
------------- --------- ----------- 
Andrew        ACCN101   4
Andrew        ACCN101   3
Bert          ACCN101   3
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Comparing Two Sets for Equality
You want to compare two sets of rows for equality. For example, you took a snapshot of the Students table in October and another in November. Now, you want to compare those two copies.
The snapshot for October:
CourseId             StudentName    Score  TermPaper   
-------------------- -------------- ------ ----------- 
ACCN101              Andrew         11.00  3
And for November:
CourseId             StudentName   Score  TermPaper   
-------------------- ------------- ------ ----------- 
ACCN101              Andrew        11.00  3
ACCN101              Andrew        11.00  3
ACCN101              Bert          13.40  1
One solution is to consider the two tables as two different sets. You can then adapt the "Implementing Set Difference" recipe to this problem by using it to report rows in one table that are not also contained in the other. The following query will do this for two tables named StudentsOct and StudentsNov:
SELECT so.*, COUNT(*) DupeCount, 'StudentsOct' TableName
FROM StudentsOct so
GROUP BY so.CourseId, so.StudentName, so.Score, so.TermPaper
HAVING NOT EXISTS (
      SELECT sn.*, COUNT(*)
      FROM StudentsNov sn
      GROUP BY sn.CourseId, sn.StudentName, sn.Score, sn.TermPaper
      HAVING sn.CourseId=so.CourseId AND 
         sn.TermPaper=so.TermPaper AND 
         sn.StudentName=so.StudentName AND
         COUNT(*) = COUNT(ALL so.CourseId))
UNION
SELECT sn.*, COUNT(*) DupeCount, 'StudentsNov' TableName
FROM StudentsNov sn
GROUP BY sn.CourseId, sn.StudentName, sn.Score, sn.TermPaper
HAVING NOT EXISTS (
      SELECT so.*, COUNT(*)
      FROM StudentsOct so
      GROUP BY so.CourseId, so.StudentName, so.Score, so.TermPaper
      HAVING so.CourseId=sn.CourseId AND 
         so.TermPaper=sn.TermPaper AND 
         so.StudentName=sn.StudentName AND
         COUNT(*) = COUNT(ALL sn.CourseId))
This is a somewhat complex query. However, it's really the union of two very similar SELECT statements. Once you understand the first, you'll understand the second. The first SELECT statement in the UNION returns a list of rows in StudentsOct that do not also exist in StudentsNov. The second SELECT statement does the reverse — it returns a list of rows from StudentsNov that are not also in StudentsOct. Both of these SELECT statements represent difference operations. If the two tables are equal, neither SELECT will return any rows.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Implementing Partial Intersection
You have a set of sets. You want to find the elements that represent intersections between those sets, and then you want to count the number of sets to which each of those elements belongs. The intersections can be partial. In other words, it is not necessary for an element to be present in all sets. However, you do want to specify a threshold, in terms of the number of sets, so the query results will exlude an element that falls below it. As an example of this type of problem, you want to list all term papers and show how many students have submitted each one of them.
The solution to this problem is fairly trivial. You have to count the number of term papers that occurs in the table, so you can use a GROUP BY query as shown in the following example:
SELECT CourseId, TermPaper, COUNT(*) NumStudents
FROM Students
GROUP BY TermPaper, CourseId
ORDER BY  COUNT(*) DESC
This query will return the submission count for each term paper. If you want to impose a threshold — say that you only care about term papers that have been turned in by at least two students — you can add a HAVING clause as follows:
SELECT CourseId, TermPaper, COUNT(*) NumStudents
FROM Students
GROUP BY TermPaper, CourseId
HAVING COUNT(*) >= 2
ORDER BY COUNT(*) DESC
Run against the sample data provided with this chapter, this query will produce the following result:
CourseId             TermPaper   NumStudents 
-------------------- ----------- ----------- 
ACCN101              2           3
ACCN101              1           2
ACCN101              3           2
MGMT120              1           2
MGMT120              2           2
MGMT120              3           2
The GROUP BY query takes the rows from the Students table and sorts them into groups based on term papers. Term papers are uniquely identified by a combination of course ID and term paper number. The COUNT(*) in the SELECT list causes the rows in each group to be counted. The result is the number of times each paper has been submitted.
Strictly speaking, if a count is less than 2, then it doesn't represent an intersection at all. Either nobody has written the term paper yet or only one person has written it. If you need a mathematically correct intersection, specify a minimum count of 2 in the HAVING clause. That way you will only see term papers that fall into at least two sets. The following query, for example, returns the intersection between the set of term papers written by Andrew and the set written by Cindy:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Implementing Full Intersection
You have a set of sets, and you want to find the full intersection between them. Continuing with the students example, you want to list the term papers that have been handed in by all students.
One solution is to count the students, count the number of times each term paper has been submitted, and return only those term papers where the two counts are equal. For example:
SELECT CourseId, TermPaper 
FROM Students
GROUP BY TermPaper, CourseId
HAVING COUNT(*)=(SELECT COUNT(DISTINCT StudentName) FROM Students)
When run against the sample data for this chapter, this query will return the following result:
CourseId             TermPaper   
-------------------- ----------- 
ACCN101              2
The query might look a bit strange at first. However, it follows the logic introduced in the previous recipe. The sets in the problem are groups of term papers identified by student and course IDs. In other words, we are dealing with the different sets of term papers turned in by each student. Contrary to what your initial instincts might be, we do not group the term papers into sets by students; rather, we group them into sets by paper. The GROUP BY CourseId, TermPaper clause does this for us.
The idea is that for each term paper, we use COUNT(*) in the HAVING clause to count the number of submissions. Then we compare this number to the total count of students in the table. If there are as many students in the table as there are term papers of one kind, all students have handed in the term paper and can be included in the query's result set.
Please note the DISTINCT clause in the second SELECT. It's a common mistake to forget that. That would be a mistake because then the inner SELECT would count all rows in the table instead of counting the number of distinct students.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Classifying Subsets
You want to classify aggregated results from subsets into classes of common properties. For example, you want to give each student a grade for each course they are taking. Grades are based on the average Score calculated from term papers in each course. The query has to implement the following grading rules shown in Table 2-1.
Table 2-1: Average scores required for a given grade
Grade
Average score
A
22
B
19
C
16
D
13
E
10
You have to calculate an average term paper Score per student and then classify that average according to the grading rules laid down in Table 2-1. The following query does this:
SELECT CourseId, StudentName, AVG(Score) Score,(
   CASE WHEN AVG(Score)>=22 THEN 'A' 
      WHEN AVG(Score)>=19 THEN 'B'
      WHEN AVG(Score)>=16 THEN 'C'
      WHEN AVG(Score)>=13 THEN 'D'
      WHEN AVG(Score)>=10 THEN 'E'
      ELSE 'F' END) Grade
FROM Students s
GROUP BY CourseId, StudentName
This query will return the following results:
CourseId  StudentName  Score grade 
--------- ------------ ----- ----- 
ACCN101   Andrew       12.33   E
MGMT120   Andrew       21.66   B
ACCN101   Bert         12.53   E
ACCN101   Cindy        14.15   D
MGMT120   Cindy        14.16   D
The solution uses the CASE function to classify the results. It groups term papers together into sets based on student and course. It then calculates an average Score for the term papers present in a particular set. Then, in the CASE statement, the calculated average is compared to the grading rules, and appropriate grades are returned.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Summarizing Classes of Sets
You want to calculate the number of times subsets fall into different classes, and you want to measure the sizes of those classes when classification is performed on nonaggregated data. As an example of this type of problem, let's say you want to count the number of A papers, B papers, and so forth for each student.
An alternative way of stating the problem is to say that you need to count the number of times each student is given each grade. The following query does this:
SELECT s.StudentName,(
   CASE WHEN s.Score>=22 THEN 'A' 
      WHEN s.Score>=19 THEN 'B'
      WHEN s.Score>=16 THEN 'C'
      WHEN s.Score>=13 THEN 'D'
      WHEN s.Score>=10 THEN 'E'
      ELSE 'F' END) Grade,
   COUNT(*) NoOfPapers
FROM Students s
GROUP BY s.StudentName,
   CASE WHEN s.Score>=22 THEN 'A' 
      WHEN s.Score>=19 THEN 'B'
      WHEN s.Score>=16 THEN 'C'
      WHEN s.Score>=13 THEN 'D'
      WHEN s.Score>=10 THEN 'E'
      ELSE 'F' END
ORDER BY s.StudentName
The results returned by this query will resemble these:
StudentName Grade NoOfPapers  
----------- ----- ----------- 
Andrew      A     1
Andrew      B     2
Andrew      D     1
Andrew      E     2
Bert        D     2
Bert        E     1
Cindy       C     2
Cindy       D     1
Cindy       E     2
The code demonstrates an interesting feature of SQL that is not used very often — the use of a CASE statement within a GROUP BY clause. This not often seen, though it is a fairly powerful construct. The first CASE statement, the one in the main part of the query, assigns a grade to each term paper. Instead of aggregating the results by student and paper, the results are then aggregated by student and grade. By counting the number of records in each group, we find out how many A grades a student has, how many B grades, and so forth. The following example illustrates this grouping and counting process:
StudentName Grade NoOfPapers  
----------- ----- ----------- 
Andrew      B
Andrew      B
COUNT(*)          2

Bert        D  
Bert        D
COUNT(*)          2
If you wanted to, you could extend the query so that it also calculated the percentage that each grade represented out of the total number of term papers taken. The following SELECT list shows the extra column that you would need to add to do this:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Aggregating Aggregates
You want to select some data, aggregate it, and then aggregate it again. Here's an example of this type of problem: the administration at the university is preparing an internal report for the dean, who wants to compare grading habits of professors. One of the measures the dean wants to look at is the spread between the average term paper Score for each student in a given course. The spread is the difference between the best and the worst student Score per course. Your job is to find the best and the worst average Score in each course and calculate the difference.
Use a nested SELECT statement in the FROM clause of your query to generate the first aggregation. Then write your enclosing SQL statement so that it takes those aggregated results and aggregates them again. For example:
SELECT CourseId, MAX(l.s) Best ,MIN(l.s) Worst, 
   MAX(l.s)-MIN(l.s) Spread 
FROM (
   SELECT CourseId, AVG(Score) AS s 
   FROM Students 
   GROUP BY CourseId, StudentName) AS l
GROUP BY CourseId
The query will return the following result:
CourseId  Best   Worst  Spread                                   
--------- ------ ------ --------- 
ACCN101   14.15  12.33  1.81
MGMT120   21.66  14.16  7.50
SQL does not allow you to directly enclose one aggregate function within another. In other words, a query written in the following manner would not execute:
SELECT CourseId, MAX(AVG(stock)), MIN(AVG(stock))
FROM Students
GROUP BY CourseId, studentsName
However, you can work around this by using a result from one query as the source for a second query. This trick can be used as a general solution for problems where you have to aggregate already aggregated data. In the case of the solution shown in this recipe, the innermost query takes all the term paper Scores for each student in each course and returns the average of those Scores. The intermediate results look like this:
CourseId             s        
-------------------- ---------
ACCN101              12.333333
MGMT120              21.666666
ACCN101              12.533333
ACCN101              14.150000
MGMT120              14.166666
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Summarizing Aggregated Classes
You want to calculate the number of times subsets fall into different classes, and you want to measure the sizes of those classes when classification is performed on already aggregated data. For example, say you want to count the number of course grades per student. A course grade is calculated by averaging the Score of all papers for a given course and then classifying that average according to Table 2-1. This is similar to the earlier recipe titled "Summarizing Classes of Sets," but this time we must aggregate the data twice.
The following SQL query combines elements of the previous two recipes to produce the desired result:
SELECT s.StudentName,(
   CASE WHEN s.Score>=22 THEN 'A' 
      WHEN s.Score>=19 THEN 'B'
      WHEN s.Score>=16 THEN 'C'
      WHEN s.Score>=13 THEN 'D'
      WHEN s.Score>=10 THEN 'E'
      ELSE 'F' END) Grade,
   COUNT(*) NoOfCourses
FROM (
   SELECT CourseId, StudentName, AVG(Score) AS Score 
   FROM Students 
   GROUP BY CourseId, StudentName) AS s
GROUP BY s.StudentName,
   CASE WHEN s.Score>=22 THEN 'A' 
      WHEN s.Score>=19 THEN 'B'
      WHEN s.Score>=16 THEN 'C'
      WHEN s.Score>=13 THEN 'D'
      WHEN s.Score>=10 THEN 'E'
      ELSE 'F' END
ORDER BY s.StudentName
The results from executing this query will look as follows:
StudentName Grade NoOfCourses 
----------- ----- ----------- 
Andrew      B     1
Andrew      E     1
Bert        E     1
Cindy       D     2
At first glance, this query appears a bit complex and intimidating. To understand it, it's best to look at the query as a two-step process. The inline SELECT in the FROM clause calculates an average Score for each course and student combination. This average Score is computed from the individual Scores of all the term papers.
The results from the inline SELECT are fed into the outer query that translates the average Scores into letter grades and then counts up the number of times each grade occurs. The CASE statement in the SELECT list does the classification. The case statement in the GROUP BY clause aggregates the results by grade, allowing the count to be computed.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Including Nonaggregated Columns
You want to write a GROUP BY query that returns summarized data, and you want to also include nonaggregated columns in the result. These nonaggregated columns do not appear in the GROUP BY clause. With respect to the students example, let's say that each course is graded according to the best term paper that each student has submitted for that course. For administrative reasons, you must find out which term paper has the best Score for each student/course combination.
The following query shows each student's highest term paper Score in each class that they took. An inline query is also used to return the specific term paper number that corresponds to that Score.
SELECT StudentName,CourseId,
   (SELECT  MAX(TermPaper) 
      FROM Students 
      WHERE Score=MAX(s.Score)and 
         StudentName=s.StudentName and 
         CourseId=s.CourseId) TermPaper, 
   MAX(s.Score) Score
FROM Students s
GROUP BY CourseId, StudentName
The output from this query will be as follows:
StudentName  CourseId  TermPaper  Score  
------------ --------- ---------- ------ 
Andrew       ACCN101   4          15.60
Andrew       MGMT120   3          23.10
Bert         ACCN101   1          13.40
Cindy        ACCN101   2          16.20
Cindy        MGMT120   3          16.00
This recipe highlights an interesting problem that has been often debated by SQL programmers. The problem is that when an extreme has been identified using an aggregation such as MAX, SQL insists that the SELECT list contain only the result of the aggregation and the columns from the GROUP BY clause. In real life, many programmers have wished for the ability to include nonaggregated columns as well. They not only want to know the maximum term paper Score for a course, but they want to know the number of that term paper. In other words, they want to write a SQL statement such as this:
SELECT StudentName, CourseId, TermPaper, MAX(Score)
FROM Students s
GROUP BY CourseId, StudentName
This kind of query is invalid. The problem is that it can be executed only if the TermPaper column is added to the GROUP BY clause. We potentially face the same problem when writing our query, but we've avoided it by writing an inline SELECT statement to retrieve the term paper number corresponding to the high Score.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Finding the Top N Values in a Set
You want to find the first N elements of an ordered set. In most cases, this means that you want to find the top N records. Assume that the grading rules of the school require professors to use only the best two Scores from the term papers each student submitted. You need to write a query that returns that information. You don't want all Scores for each student, only the top two.
The most straightforward solution to this problem is to use the TOP keyword. TOP is a MS SQL Server extension to SQL that allows you to limit a query so that it returns only the first N records. The following query returns the top two Scores for each student in each course:
SELECT  s1.StudentName, s1.CourseId, s1.TermPaper, MAX(s1.Score) Score
FROM Students s1
GROUP BY s1.CourseId, s1.StudentName, s1.TermPaper
HAVING MAX(s1.Score) IN 
   (SELECT TOP 2 s2.Score 
       FROM Students s2
       WHERE s1.CourseId=s2.CourseId AND
          s1.StudentName=s2.StudentName
    ORDER BY s2.Score DESC)
ORDER BY s1.StudentName, s1.CourseId, s1.Score DESC
An alternative solution is a bit less Transact-SQL-specific and a bit less intuitive. It is, however, more general, and it conforms with the SQL standard:
SELECT  s1.StudentName,s1.CourseId, s1.TermPaper, MAX(s1.Score) Score
FROM Students s1 INNER JOIN Students s2
   ON s1.CourseId=s2.CourseId AND
      s1.StudentName=s2.StudentName
GROUP BY s1.CourseId, s1.StudentName, s1.TermPaper
HAVING SUM(CASE WHEN s1.Score <= s2.Score THEN 1 END) <= 2
ORDER BY s1.StudentName, s1.CourseId, s1.Score DESC
Both queries list the two highest-scoring term papers for each student in each course and order the results in descending order by Score. The results will resemble the following output:
StudentName  CourseId  TermPaper  Score  
------------ --------- ---------- ------ 
Andrew       ACCN101   4          15.60
Andrew       ACCN101   3          11.00
Andrew       MGMT120   3          23.10
Andrew       MGMT120   2          21.70
Bert         ACCN101   1          13.40
Bert         ACCN101   3          13.00
Cindy        ACCN101   2          16.20
Cindy        ACCN101   1          12.10
Cindy        MGMT120   3          16.00
Cindy        MGMT120   2          14.40
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Reporting the Size of a Set's Complement
You want to report the number of missing values for a set. As an example, assume that each student must submit four term papers for each course. Not all students have submitted all required papers, and you want to generate a report showing the number that each student has yet to submit for each course.
SELECT  s.StudentName, s.CourseId, 4-COUNT(TermPaper) Missing
FROM Students s
GROUP BY s.StudentName, s.CourseId
ORDER BY s.StudentName
Using the sample data provided for this chapter, the query in this solution should return the following result:
StudentName  CourseId  Missing     
------------ --------- ----------- 
Andrew       ACCN101   1
Andrew       MGMT120   1
Bert         ACCN101   1
Cindy        ACCN101   2
Cindy        MGMT120   1
This query is very straightforward. It's an aggregate query that summarizes results by course and student combination. The aggregate function COUNT(TermPaper) is used to count the number of papers a student submits for a course. That result is then subtracted from the required number — four in this case — and the result is the number of missing term papers.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Finding the Complement of a Set
You want to find the complement of a set. Given the students example being used in this chapter, you want to list the missing term papers for each student.
To list rows that are missing, you have to know the available set of values. For this solution, therefore, it's necessary to create a support table that we can then use to generate the universe of possible term paper values for each student in each course.

Section 2.14.2.1: Step 1: Create the Pivot table

Since we are dealing with term paper numbers, we need a Pivot table with one numeric column. We'll use the standard Pivot table for that, as explained in the Section 1.1 recipe in Chapter 1.
It's worth mentioning here that this Pivot table is still useful even if the number of term papers required by each course is different. The key is for the number of rows in the Pivot table to match the largest number of term papers required by any course.

Section 2.14.2.2: Step 2: Run the query

With the Pivot table in place, the following query will return the list of missing term papers:
SELECT  s.StudentName, s.CourseId, f.i TermPaper
FROM Students s, Pivot f
WHERE f.i BETWEEN 1 AND 4 
GROUP BY s.StudentName, s.CourseId, f.i
HAVING NOT EXISTS(
   SELECT * FROM Students 
   WHERE CourseId=s.CourseId AND 
      StudentName=s.StudentName AND 
      TermPaper=f.i)
ORDER BY s.StudentName
The results returned by this query should resemble the following:
StudentName  CourseId  TermPaper   
------------ --------- ----------- 
Andrew       ACCN101   1
Andrew       MGMT120   4
Bert         ACCN101   4
Cindy        ACCN101   3
Cindy        ACCN101   4
Cindy        MGMT120   4
The queries in this recipe use a Pivot table in addition to the tables that contain the actual data. We need the Pivot table, because we need to know which term paper numbers are possible and we need to generate rows for term papers that don't exist. If a term paper hasn't been handed in, it won't have a corresponding row in the Students table. The Pivot table allows us to generate a row for that missing paper, which will be included in the query's result.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Finding the Complement of a Missing Set
The query in the previous recipe has one significant drawback: it does not report missing term papers for students who have not yet completed at least one term paper. That's because the driving table is the Students table. If a student hasn't handed in at least one term paper for a course, then there won't be any records in the Students table for that student/course combination. How then, do you report missing term papers for such students?
The key to reporting missing term papers for students who have not yet turned in any term papers is to find a reliable way to identify those students. There are two possibilities:
  • Define an empty row in the Students table to identify each student.
  • Create a master table that contains one row for each student.
From a database-design perspective, the second solution is probably the best, because you end up with a typical many-to-one relationship between two tables. The first solution is something of a kludge, because it uses the Students table as both a master table and as a detail table.

Section 2.15.2.1: Solution 1: Define empty rows in the Students table

Our first solution calls for empty records to be inserted into the Students table for each student/course combination. For example, the following record would be inserted to show that David was enrolled in the ACCN101 course:
INSERT INTO Students(CourseId, StudentName, Score, TermPaper)
 VALUES('ACCN101','David',0,0)
Notice that the Score and term paper number have both been set to 0. With these records in place, the query presented in the previous recipe can be used to display the list of missing term papers. This list will now include cases where a student has missed all term papers in a given course. The zero-records that we've inserted won't show in the final result because our query excludes the Pivot table row for zero.

Section 2.15.2.2: Solution 2: Create a student master table

A cleaner option than creating special zero-records in the Students table is to create a completely separate student master table. This table would then track each students's enrollment in the various courses offered by the university. The following shows one possible implementation:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Finding Complements of Sets with Different Universes
You want to write a query that returns the complement of several sets, but each of those sets has a different universe. For example, consider that different courses each require a different number of term papers. You want to list the papers that are missing for each student in each course. Unlike the queries in the previous recipes, this query must correctly handle the different term paper requirements (the universe) of each course.
You first need a master table to record the number of term papers required for each course:
CREATE TABLE CourseMaster(
   CourseId CHAR(20),
   numTermPapers INTEGER
)
After creating the CourseMaster table, you need to populate it with data. The following two INSERTs specify that the ACCN101 course requires four term papers and that the MGMT120 course requires three term papers:
INSERT INTO CourseMaster VALUES('ACCN101',4)
INSERT INTO CourseMaster VALUES('MGMT120',3)
With the CourseMaster table created and populated, you can use the following query to report on term papers that have not yet been submitted:
SELECT  s.StudentName, s.CourseId, f.i TermPaper
FROM Students s, Pivot f, CourseMaster c
WHERE f.i BETWEEN 1 AND c.numTermPapers 
   AND c.CourseId=s.CourseId 
GROUP BY s.StudentName, s.CourseId, f.i
HAVING NOT EXISTS(
   SELECT * FROM Students 
   WHERE CourseId=s.CourseId AND 
      StudentName=s.StudentName AND 
      TermPaper=f.i)
ORDER BY s.StudentName
The result is a list of term papers that each student still needs to submit to pass a given course:
StudentName  CourseId  TermPaper   
------------ --------- ---------- 
Andrew       ACCN101   1
Bert         ACCN101   4
Cindy        ACCN101   3
Cindy        ACCN101   4
This query is almost identical to the query presented in the earlier recipe titled "Finding the Complement of a Set." The major difference is how term paper requirements are specified. In the previous recipe, the number of term papers for each course was the same, and that value was represented by a constant:
WHERE f.i BETWEEN 1 AND 4 
For this recipe, each course has a different requirement. That requirement has to be recorded somewhere, and that's where the CourseMaster table comes into play. The first part of the query is then modified as follows:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Comparing a Set with Its Universe
You want to find out if a set contains all the elements of its universe. Then you want to report it as either a full or partial match. For example, assume that you want to generate a list showing which students have fulfilled all requirements with respect to the term papers they are required to turn in and which students have completed only a partial number of their term paper requirements.
Use the CourseMaster table defined in the previous recipe to determine the universe of required papers for each student and course, and then compare that to the actual papers submitted. Here's one approach to doing this:
SELECT s.StudentName, s.CourseId,
    CASE WHEN COUNT(*)=MAX(c.numTermPapers) 
       THEN 'All submitted'
       ELSE CONVERT(VARCHAR(8),MAX(c.numTermPapers)-COUNT(*))+' missing'
    END status
FROM Students s JOIN CourseMaster c
   ON s.CourseId=c.CourseId
GROUP BY s.StudentName, s.CourseId
ORDER BY s.StudentName, s.CourseId
This query will return results similar to the following:
StudentName  CourseId  status           
------------ --------- ---------------- 
Andrew       ACCN101   1 missing
Andrew       MGMT120   All submitted
Bert         ACCN101   1 missing
Cindy        ACCN101   2 missing
Cindy        MGMT120   All submitted
The query joins the Students table and the CourseMaster table to find the required number of term papers for each course. The join is done using the CourseId column to retrieve the appropriate numTermPapers value. The results are then grouped so that there is one row for each student and course combination.
The core of the query is the CASE statement. In the set of rows for each course and student, each row represents a term paper. To find the number of term papers the student has handed in, we just need to count the rows. Then the CASE statement compares that count with the required number of papers for the course. The result from the case statement will be a message indicating whether or not all requirements have been fulfilled.
The MAX function in this query is here only for syntactic purposes. The number of term papers is dependent on the CourseId value, so there will only be one number for a given group. MAX is only necessary because numTermPapers is not a GROUP BY column. You could add numTermPapers to the column list in the GROUP BY clause — allowing you to dispense with MAX — but the extra GROUP BY column will hurt the efficiency of the query.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Dynamic Classification System
Define a query-processing framework that will classify sets according to rules that you can define and change dynamically. The number of rules is unlimited. As an example, let's say that a special credit system has been introduced at our university as part of a new cross-disciplinary program introduced by a new dean. Students do not get credit points directly for courses they have finished, but instead get credit for different combinations of term papers. A given term paper combination does not need to represent one course and, indeed, is most likely to represent several courses. There are different categories of credits, and the categories are marked by type codes such as A, B, and C. To finish the program, a student must earn at least one credit point in each category. Table 2-2 shows the requirement matrix that controls whether or not credit can be granted in a given category.
Table 2-2: Category credit requirement matrix
Course
Paper 1
Paper 2
Paper 3
ACCN101
A,C2
A,C2
A,B
MGMT120
C1
C1
C1,B
To earn an A credit point, a student must complete the first, second, and third term paper in the ACCN101 course. There are two ways to earn a C credit point. One is to submit the first, second, and third term papers for MGMT120. The other way to earn a C credit point is to submit the first and second term papers for ACCN101.
This problem can not be solved directly by just applying one query. You first need to create an additional table where you store the rules. Then you can write a query that looks at both the rules and at the actual term papers that students have submitted. That query can apply the rules to the actual results and determine the proper credits to grant in each category.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 3: Data Structures
Data structures are often considered the domain of other programming languages, and not of SQL. In this chapter, we show that data structures can also be very useful for modeling problems and implementing algorithms in SQL. To solve a real-world problem, it is often helpful to have a number of abstract data models available. If you can adapt the problem to be solved to an abstract model, the implementation of a solution is usually much easier.
Abstract data structures help you solve problems that otherwise seem complex. In this chapter, we show you how to perform operations on linear structures such as lists, stacks, and queues in SQL. We also show several recipes for working with multidimensional data structures, such as matrices and arrays. Please note that by "multidimensional," we are not referring to OLAP data structures, but to arrays and matrices as programmatic data structures.
In this chapter, we plan to discuss the following three major types of data structures:
  • Lists
  • Stacks and queues
  • Arrays and matrices
Lists, stacks, and queues are all linear data structures; the term linear referring to the fact that, conceptually, you're dealing with a set of elements arranged in the form of a line. The use of SQL is well-suited to such structures because they map easily onto the relational table paradigm. Arrays and matrices, on the other hand, are multidimensional data structures. You can use SQL to work with arrays and matrices, but it's better-suited for linear structures.
Lists are the most common linear data structure in computing. In contrast to other programming languages, lists are the easiest type of data structure to implement in SQL. A list is a sequence of elements with a known order. A list is unlimited in size and can shrink or grow on demand. Elements can be added to a list, removed from a list, and you can query to see whether a given element is in a list.
As you can see, the properties of a list are very similar to the properties of a SQL table, with the exception that a list has an order to it. That is why lists are so easy to implement. SQL tables are also unlimited in size, can be queried, and allow for entries to be added (inserted) and removed (deleted).
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Types of Data Structures
In this chapter, we plan to discuss the following three major types of data structures:
  • Lists
  • Stacks and queues
  • Arrays and matrices
Lists, stacks, and queues are all linear data structures; the term linear referring to the fact that, conceptually, you're dealing with a set of elements arranged in the form of a line. The use of SQL is well-suited to such structures because they map easily onto the relational table paradigm. Arrays and matrices, on the other hand, are multidimensional data structures. You can use SQL to work with arrays and matrices, but it's better-suited for linear structures.
Lists are the most common linear data structure in computing. In contrast to other programming languages, lists are the easiest type of data structure to implement in SQL. A list is a sequence of elements with a known order. A list is unlimited in size and can shrink or grow on demand. Elements can be added to a list, removed from a list, and you can query to see whether a given element is in a list.
As you can see, the properties of a list are very similar to the properties of a SQL table, with the exception that a list has an order to it. That is why lists are so easy to implement. SQL tables are also unlimited in size, can be queried, and allow for entries to be added (inserted) and removed (deleted).
In this chapter we use a SQL table with an index to define a list:
CREATE TABLE List (
   Id INTEGER, 
   ...
)
A list in SQL is, therefore, a table with at least two attributes. One attribute is an index identifying the order of each element in the list. Any other attributes are data-carrying attributes, meaning that they hold the actual data in the list.
When you implement a list in SQL, it's very helpful if your list index is arithmetically increasing. By arithmetically, we mean that your list index should increase by 1 as new elements are appended to the list. We are going to assume in this chapter's recipes that our lists have arithmetically increasing indices.
All basic list operations are directly implementable in SQL. INSERT is used to add a new element to a list, DELETE is used to remove an element from a list, and SELECT is used to query for elements by value or by index. These operations are so straightforward that we won't show recipes for them in this chapter.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Working Example
Because of the nature of the material in this chapter, we haven't been able to apply one common example across all the recipes. We have a linear-structure example that we use in our recipes on lists, stacks, and queues. We extend that example to cover arrays. However, when it comes to matrices, we decided to use a simple, textbook-style example for clarity.
You are building a quality-control system for a biotech company. One of the production lines produces a special liquid for laboratory research. A specific characteristic you need to measure is the liquid's purity, which is indicated by a purity index. The normal level of purity is represented by a purity index of 100. Everything over 100 is assumed to be abnormal and requires some action to bring the product back into spec.
The liquid comes off the production line in containers. The table that stores quality control data contains the ID of each container along with the measured purity of the liquid within the container. The following CREATE TABLE statement shows the structure of this table. The ContainerId column contains the container ID numbers, and the Purity column contains the purity index values.
CREATE TABLE ProductionLine (
   ContainerId INTEGER,
   Purity INTEGER
)
Currently, the table contains purity information on 12 different containers:
ContainerId Purity      
----------- ----------- 
1           100
2           100
3           101
4           102
5           103
6           100
7           103
8           108
9           109
10          100
11          100
12          100
Your job, our job actually, is to develop some tools to help in evaluating the quality of the product and the functioning of the production machines.
To demonstrate the use of arrays, we'll extend our production-line example. If you need to control several production lines, you can represent their purity data as an array. For example:
CREATE TABLE ProductionFacility(
   Line INTEGER,
   ContainerId INTEGER,
   Purity INTEGER
)
At first glance, this table might not look like an array, but it is. Each row represents one purity-level reading, which is stored in the Purity column. The Line and ContainerId columns are the indices to the array. Given a line number and a container ID number, you can easily retrieve the associated purity value.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Finding Regions
Content preview·Buy PDF of this chapter|