Chapter 4. Date Handling
SQL is capable of handling just about any date calculation. The hacks in this chapter show how to get dates into your database, and how to get weekly, monthly, and quarterly reports out with a minimum of human intervention.
In many of the hacks described here, the reports are generated using the current date; however, it is usually a simple matter to use a user-specified parameter instead [Hack #58].
There are inconsistencies among the main database vendors regarding dates. For most of the hacks in this chapter, we used MySQL as the base example and we’ve shown the variations for SQL Server, Access, Oracle, and PostgreSQL.
Tip
You should be aware that the database system might be running
on a system having a different time zone than the system your
applications run on (perhaps your web server is in New York and your
database server is in Chicago). To minimize clock and time zone
discrepancies, you should use CURRENT_TIMESTAMP
to generate times whenever
possible.
Convert Strings to Dates
The SQL standard includes a complete set of rules which govern how dates should be represented and manipulated. Each vendor implementation of SQL has a variation of these rules.
The SQL standard has a DATE
type for days and a TIMESTAMP
type to represent a date and time.
Examples of literals are DATE
'2006-05-20'
and TIMESTAMP
'2006-06-18
10:09:05'
. The ISO format used in both
examples (the year followed by the month followed by the day) has the
advantage of sorting correctly even when it’s represented as a string
data type. It is also visibly different from both the American
convention that puts the month first, and the European style that puts
the day first.
Oracle, PostgreSQL, and MySQL adhere to the SQL standard
for representing dates and timestamps, but Microsoft’s SQL Server and
Access use a slightly different approach. SQL Server and Access will accept a date literal such as
'2006-06-08'
, but they cannot handle
the DATE
prefix.
The DATE
type does not exist in
SQL Server; you should use the DATETIME
type to represent both a date and a
moment in time. SQL Server uses the term TIMESTAMP
for an entirely different purpose.
Convert Your Dates
Suppose you have dates in user-supplied input in this
format—6/18/2006
—and you need to
create date literals for an INSERT
statement such as this one: DATE
'2006-06-18'
. Here’s how you can
accomplish this in Perl:
foreach ('6/18/2006', '12/13/2006'){ if (/(\d+)\/(\d+)\/(\d\d\d\d)/){ # Capture date parts into $1, $2, $3 my $m = substr("0$1", -2); # Left-pad with zeros if needed my $d = substr("0$2", -2); my $y = $3; $sql = "INSERT INTO d VALUES (DATE '$y-$m-$d')"; print "$sql\n"; } else { warn "Could not parse date: $!"; } }
Warning
Note that we in-lined the user-supplied values directly into the INSERT
statement. In theory, this would
have opened us up to an SQL injection attack [Hack #48]. However, the
input is fully sanitized in
that the regular expression guarantees that $y
, $m
,
and $d
contain only digits
(\d
matches any one character
between 0 and 9).
The output from this code is ready for use in MySQL, Oracle, PostgreSQL, or another engine that uses the SQL standard:
INSERT INTO d VALUES (DATE '2006-06-18'); INSERT INTO d VALUES (DATE '2006-12-13');
For Microsoft SQL Server, you need only drop the word DATE.
Table 4-1 shows some common variations of date formats.
Engine | DATE ’2006-06-01’ | ’2006-6-1’ | ’1 JUN 2006’ |
MySQL | OK | OK | Error |
SQL Server | Error | OK | OK |
Oracle | OK | Error | OK |
PostgreSQL | OK | OK | OK |
DB2 | Error | OK | Error |
Mimer | OK | Error | Error |
Standard | OK | Error | Error |
No single format works with every engine; you can’t do better than satisfy any two of the three most popular platforms (SQL Server, MySQL, and Oracle).
You also cannot publish even the simplest SQL data in a format
that everyone can read. For a nasty solution you can publish dates,
but you must capitalize the word date in an odd
way—for example, DaTe
'2006-06-01'
. SQL Server and DB2 users must
do a case-sensitive search and replace to remove the string DaTe
, but users of other engines can just
slurp the file into their engines directly. The advantage of using an
unusual form of capitalization is that the SQL engines don’t care, but
the string DaTe
is unlikely to
occur in any other part of the file, so it’s really easy to pick up
with a conversion script (you also could pipe or redirect your SQL to
a one-liner, such as perl
-pe
's/DaTe//g'
).
Warning
If MySQL comes across a date format it doesn’t recognize
(such as '1
JUN
2006'
), it accepts it without raising an
error and puts in the value DATE
'0000-00-00'
. However, if you
check your warnings, you’ll see that something went wrong:
mysql>insert into d values ('1 JUN 2006');
Query OK, 1 row affected, 1 warning (0.13 sec)mysql>show warnings\G
*************** 1. row *************** Level: Warning Code: 1265Message: Data truncated for column 'd' at row 11 row in set (0.00 sec)
If you are reading data in from another system you may be able to pass date strings in their original format and do the parsing in SQL. The Oracle example shows the general technique, but you’ll need to use different SQL functions for your database. We’ll show you those after the Oracle example.
Parse Dates with Oracle
Oracle has a neat function called TO_DATE
, which allows you to specify the
pattern used in your input string:
INSERT INTO d VALUES (TO_DATE('1 Jun 2006', 'dd Mon yyyy'))
You can specify a wide range of formats that include “filler” characters other than a space.
Using this technique, you could write a simple Perl script, for
example. If Perl has read a string such as '1
Jun
2006'
into the variable $v
, you could generate the SQL as:
my $sql = "INSERT INTO d VALUES (TO_DATE('$v', 'dd Mon yyyy'))";
If your dates are coming from an untrusted source, you should still check the pattern to guard against SQL injection attacks:
if ($v !~ /^\d+ \w\w\w \d\d\d\d$/) { warn "Injection attack."; }
If you were using XSLT, you might want to use code such as this:
<stylesheet xmlns="http://www.w3.org/1999/XSL/Transform"> <template match="foo"> INSERT INTO dd VALUES ( TO_DATE('<value-of select='@bar'/>', ,'dd Mon yyyy')) </template> </stylesheet>
That sheet would take care of input such as <foo
bar='1
Jun
2006'/>
.
Parse Dates with MySQL
MySQL has a similar function, called STR_TO_DATE
. This works with the format
strings in MySQL format:
INSERT INTO d VALUES (STR_TO_DATE('1 Jun 2006', '%d %b %Y'));
%b
represents the abbreviated
month name, %d
is the day of the
month, and %Y
is a four-digit
year.
Parse Dates with SQL Server
If your input format is a fixed size (with leading zeros),
combine the SUBSTRING
function to build the string.
Convert a string such as '06/18/2006'
into a date:
INSERT INTO d SELECT SUBSTRING(x,7,4)+'-'+ SUBSTRING(x,1,2)+'-'+ SUBSTRING(x,4,2) FROM (SELECT '06/18/2006' AS x) y;
Uncover Trends in Your Data
Statistics gathered daily could contain both daily cycles and weekly trends. This can lead to chaotic-looking graphs when activity is plotted day by day. You can improve your graphs easily using SQL.
Look at Figure 4-1, which shows a chart of the raw figures for the number of page views for a web site per day, over the course of one year. These figures come from Webalizer, the web log analysis program (http://www.mrunix.net/webalizer). It is difficult to see the trends because the weekly cycle overwhelms the daily detail, and obscures the long-term trend.
To understand the data, you need to separate the effect of the weekly cycle from the table. You can see the weekly cycle by taking the average for Monday, the average for Tuesday, and so forth. In Figure 4-2, Monday to Sunday are numbered 0 to 6.
Notice that the value of the Sunday column (column 6) is less than half the value of the midweek columns. This is helping to cause the zigzag pattern in the original graph. If you view the data averaged per week (see Figure 4-3) rather than per day, it is easier to see the long-term trend.
Tip
The graphs and charts shown here come from Excel. Many spreadsheet applications, including Excel, have tools for importing directly from databases and producing a variety of graphical reports.
Before you can isolate these trends, you must turn the dates into integers to more easily put them into the appropriate buckets. You can pick an arbitrary date and start counting from there. Table 4-2 shows some source data.
In Table 4-3,
I’ve chosen the first day of the millennium, Monday, January 1, 2001, as
day zero. Every date must be converted into the number of days since
then. The mechanism for converting to integers is different on different
engines. In MySQL, you can create this view using the TO_DAYS
function:
CREATE VIEW webalizer2 AS SELECT TO_DAYS(whn)-TO_DAYS(DATE '2001-01-01') whn, pages FROM webalizer;
With dates now represented by integers, you can perform arithmetic on them. Taking the modulus 7 value gives you the day of the week. Because 2001-01-01 was a Monday, you will get 0 on every seventh day from then. Tuesday will give you 1, Wednesday 2, and so on, with Sunday having the value 6.
Modular Arithmetic
Look at the values for whn%7
and
FLOOR(whn/7)
. You can see that day number
1,622 (counting from 2001-01-01) is day number 5 of week number
231:
mysql>SELECT whn, whn%7, whn/7, FLOOR(whn/7)
->FROM webalizer2;
+------+-------+----------+--------------+ | whn | whn%7 | whn/7 | FLOOR(whn/7) | +------+-------+----------+--------------+ | 1622 | 5 | 231.7143 | 231 | | 1623 | 6 | 231.8571 | 231 | | 1624 | 0 | 232.0000 | 232 | | 1625 | 1 | 232.1429 | 232 | | 1626 | 2 | 232.2857 | 232 | | 1627 | 3 | 232.4286 | 232 | | 1628 | 4 | 232.5714 | 232 | | 1629 | 5 | 232.7143 | 232 | | 1630 | 6 | 232.8571 | 232 | | 1631 | 0 | 233.0000 | 233 | | 1632 | 1 | 233.1429 | 233 | ...
You need to GROUP
BY
the whn%7
column to see the weekly cycle and
GROUP
BY
the FLOOR(whn/7)
column to see the trend.
To look at the intra-week pattern shown back in Figure 4-2, you take
the average with GROUP
BY
whn%7
:
mysql>SELECT whn%7, AVG(pages)
->FROM webalizer2 GROUP BY whn%7;
+-------+------------+ | whn%7 | AVG(pages) | +-------+------------+ | 0 | 21391.6731 | | 1 | 23695.1538 | | 2 | 23026.2308 | | 3 | 24002.8077 | | 4 | 19773.9808 | | 5 | 10353.5472 | | 6 | 10173.9423 | +-------+------------+
To smooth out the data over the whole year, as shown in Figure 4-3, you can
divide by 7 and take the integer value using the FLOOR
function:
mysql>SELECT FLOOR(whn/7), AVG(pages)
->FROM webalizer2 GROUP BY FLOOR(whn/7);
+--------------+------------+ | FLOOR(whn/7) | AVG(pages) | +--------------+------------+ | 231 | 10748.5000 | | 232 | 23987.8571 | | 233 | 19321.1429 | | 234 | 15347.0000 | ...
The value for the first week is artificially low—by chance, it
includes two on only two days, and they are on weekends. Something
similar might happen at the end of the interval, so it is safest to
exclude any week that does not have seven entries. The HAVING
clause will take care of that:
mysql>SELECT FLOOR(whn/7), AVG(pages)
->FROM webalizer2 GROUP BY FLOOR(whn/7)
->HAVING COUNT(*)=7;
+--------------+------------+ | FLOOR(whn/7) | AVG(pages) | +--------------+------------+ | 232 | 23987.8571 | | 233 | 19321.1429 | | 234 | 15347.0000 | ...
This will work fine with MySQL and PostgreSQL, but you need to make a few alterations for SQL Server, Access, and Oracle.
SQL Server
Here’s how to create the view that represents dates as integers:
CREATE VIEW webalizer2 AS SELECT CONVERT(INT,whn-'2001-01-01') whn, pages FROM webalizer
The SELECT
statements shown
earlier will run unmodified.
Access
In Access, you can use Int(whn
-
#2001-01-01#)
to extract the number
of days since January 1, 2001:
SELECT Int(whn - #2001-01-01#), pages FROM webalizer
Also, MOD
is an infix operator used in place of
%
:
SELECT whn MOD 7, AVG(pages) FROM webalizer2 GROUP BY whn MOD 7;
Oracle
Here’s how to create the view that represents dates as integers:
CREATE VIEW webalizer2 AS SELECT whn-DATE '2001-01-01' whn, pages FROM webalizer;
In Oracle, the module function is MOD
, so you’d need to use that rather than
whn%7
:
SELECT MOD(whn,7), AVG(pages) FROM webalizer2 GROUP BY MOD(whn,7);
Report on Any Date Criteria
A report may depend on ranges of dates that can be tricky to calculate. Monthly totals are pretty straightforward; but how about current month, last month, and year to date?
To report performance indicators you need to generate values for specific time periods. Business analysts commonly are interested in the current month compared to the preceding month, or the corresponding period in the preceding year. You can do all of this in SQL.
In the examples that follow, the original data is in a table,
t
. This table records individual
incidents of paperclip usage. Every row contains the date (whn
) and the number of paperclips used
(v
):
mysql> SELECT * FROM t; +------------+------+ | whn | v | +------------+------+ | 2006-01-07 | 53 | | 2006-01-13 | 46 | | 2006-01-18 | 99 | | 2006-01-19 | 15 | | 2006-01-26 | 9 | ...
Monthly Totals
If you want to see monthly totals, you must include the year and the month
in the GROUP
BY
expression:
mysql>SELECT YEAR(whn), MONTH(whn), COUNT(v), SUM(v)
->FROM t
->GROUP BY YEAR(whn),MONTH(whn);
+-----------+------------+----------+--------+ | YEAR(whn) | MONTH(whn) | COUNT(v) | SUM(v) | +-----------+------------+----------+--------+ | 2006 | 1 | 7 | 348 | | 2006 | 2 | 5 | 329 | | 2006 | 3 | 10 | 585 | | 2006 | 4 | 8 | 293 | | 2006 | 5 | 7 | 413 | | 2006 | 6 | 8 | 465 | | 2006 | 7 | 6 | 206 | | 2006 | 8 | 9 | 456 | | 2006 | 9 | 4 | 217 | | 2006 | 10 | 10 | 401 | | 2006 | 11 | 9 | 540 | | 2006 | 12 | 7 | 402 | | 2007 | 1 | 2 | 139 | | 2007 | 2 | 13 | 800 | | 2007 | 3 | 14 | 674 | | 2007 | 4 | 6 | 456 | | 2007 | 5 | 4 | 171 | +-----------+------------+----------+--------+
Tip
In MySQL and PostgreSQL, you can implicitly cast a date to a string and you can use that to extract the year and month. For example:
SELECT SUBSTRING(whn,1,7), COUNT(v), SUM(v) GROUP BY SUBSTRING(whn,1,7)
You can combine the year and month into a single number if you want. If you multiply the year by 100 and add the month you can be certain that each month will be distinct and sortable. Also, the resulting number is human readable and is suitable for processing as a string; you can easily turn it back into a date [Hack #19]:
mysql>SELECT 100*YEAR(whn)+MONTH(whn), COUNT(v), SUM(v)
->FROM t
->GROUP BY 100*YEAR(whn)+MONTH(whn);
+--------------------------+----------+--------+ | 100*YEAR(whn)+MONTH(whn) | COUNT(v) | SUM(v) | +--------------------------+----------+--------+ | 200601 | 7 | 348 | | 200602 | 5 | 329 | | 200603 | 10 | 585 | | 200604 | 8 | 293 | | 200605 | 7 | 413 | | 200606 | 8 | 465 | | 200607 | 6 | 206 | | 200608 | 9 | 456 | | 200609 | 4 | 217 | | 200610 | 10 | 401 | | 200611 | 9 | 540 | | 200612 | 7 | 402 | | 200701 | 2 | 139 | | 200702 | 13 | 800 | | 200703 | 14 | 674 | | 200704 | 6 | 456 | | 200705 | 4 | 171 | +--------------------------+----------+--------+
Current Month
If you want to see the data for the current month you can
test both month and year in the WHERE
clause:
mysql>SELECT * FROM t
->WHERE MONTH(whn)=MONTH(CURRENT_DATE)
->AND YEAR(whn)=YEAR(CURRENT_DATE)
->ORDER BY whn;
+------------+------+ | whn | v | +------------+------+ | 2006-06-07 | 96 | | 2006-06-11 | 4 | | 2006-06-12 | 78 | | 2006-06-12 | 36 | | 2006-06-17 | 57 | | 2006-06-29 | 74 | | 2006-06-29 | 94 | | 2006-06-30 | 26 | +------------+------+
If you want to see the data for the preceding month do not
change MONTH(whn)=MONTH(CURRENT_DATE)
to MONTH(whn)=MONTH(CURRENT_DATE)-1
. If you do
that you will get data from the wrong year when you run this query in
January. Instead, you need to subtract one month from CURRENT_DATE
. It is a little neater if you
do the date calculation in a nested SELECT
:
mysql>SELECT * FROM t,
->(SELECT CURRENT_DATE - INTERVAL 1 MONTH lastMnth) p
->WHERE MONTH(whn)=MONTH(lastMnth)
->AND YEAR(whn)=YEAR(lastMnth);
+------------+------+------------+ | whn | v | lastMnth | +------------+------+------------+ | 2006-05-04 | 43 | 2006-05-23 | | 2006-05-06 | 55 | 2006-05-23 | | 2006-05-08 | 89 | 2006-05-23 | | 2006-05-15 | 87 | 2006-05-23 | | 2006-05-22 | 90 | 2006-05-23 | | 2006-05-29 | 22 | 2006-05-23 | | 2006-05-30 | 27 | 2006-05-23 | +------------+------+------------+
Year-to-Date Totals
To calculate year-to-date totals you must make sure that the year matches the current date and that the records occur on or before the current date:
mysql>SELECT COUNT(v), SUM(v) FROM t
->WHERE whn <= CURRENT_DATE
->AND YEAR(whn)=YEAR(CURRENT_DATE);
+----------+--------+ | COUNT(v) | SUM(v) | +----------+--------+ | 42 | 2239 | +----------+--------+
Fiscal year to date
Suppose you are reporting over a 365-day period, but your year does not start on January 1. This is the case with reports over a fiscal year or tax year.
For instance, say that your fiscal year starts on April 6. Calculating which dates are in the current fiscal year is rather complicated; the easiest thing to do is to work with the number of days between January 1 and April 6. You can get SQL to do the calculation as follows:
mysql> select DATEDIFF(DATE '2006-04-06',DATE '2006-01-01');
+-----------------------------------------------+
| DATEDIFF(DATE '2006-04-06',DATE '2006-01-01') |
+-----------------------------------------------+
| 95 |
+-----------------------------------------------+
In SQL Server, the DATEDIFF
function needs another parameter.
You use 'd'
to indicate that you
want the result as the number of days: DATEDIFF('d',
'2006-04-06','2006-01-01')
.
In Oracle, you can simply subtract dates to get the
number of days between them as an integer: DATE
'2006-04-06'
-
DATE
'2006-01-01'
.
Once you have this offset you can determine the relevant fiscal year by subtracting this from both the date to be tested and the current date. This means that you don’t have to worry about the different cases. In this example, March 29, 2006 is in fiscal year 2005, but April 20, 2006 is in fiscal year 2006:
mysql>SELECT whn,
->YEAR(whn - INTERVAL '95' DAY) whnFiscalYear,
->YEAR(CURRENT_DATE - INTERVAL '95' DAY) currentFiscalYear
->FROM t
->WHERE whn IN (DATE '2006-03-29', DATE '2006-04-20');
+------------+---------------+-------------------+ | whn | whnFiscalYear | currentFiscalYear | +------------+---------------+-------------------+ | 2006-03-29 | 2005 | 2006 | | 2006-04-20 | 2006 | 2006 | +------------+---------------+-------------------+
You can then use this as a condition to ensure that you are reporting on only the current fiscal year:
mysql>SELECT MIN(whn),MAX(whn), COUNT(v), SUM(v) FROM t
->WHERE whn <= CURRENT_DATE
->AND YEAR(whn - INTERVAL '95' DAY)=
->YEAR(CURRENT_DATE - INTERVAL '95' DAY);
+------------+------------+----------+--------+ | MIN(whn) | MAX(whn) | COUNT(v) | SUM(v) | +------------+------------+----------+--------+ | 2006-04-09 | 2006-06-17 | 28 | 1443 | +------------+------------+----------+--------+
The minimum and maximum relevant dates are included in the
output. This is a complicated expression and you might want to check
by hand that the MIN(whn)
value
shown matches the first record following 2006-04-06 and that the
MAX(whn)
value is the last record
to the current date.
In SQL Server, you can invoke the
DATEADD
function: DATEADD('d',
whn,
-95)
.
Perhaps your fiscal year is not a fixed number of days relative to January 1. In that case, you really have no alternative than to record the start-of-year dates in a table.
Suppose the taxYear
table
was created with the following format:
mysql> SELECT * FROM taxYear; +------------+ | strt | +------------+ | 2005-04-06 | | 2006-04-06 | | 2007-04-07 | +------------+
You can perform the same calculation as performed earlier:
mysql>SELECT MIN(whn), MAX(whn), COUNT(v), SUM(v)
->FROM t,
->(SELECT MAX(strt) txStrt FROM taxYear
->WHERE strt < CURRENT_DATE) tx
->WHERE whn >= txStrt AND whn <= CURRENT_DATE;
+------------+------------+----------+--------+ | MIN(whn) | MAX(whn) | COUNT(v) | SUM(v) | +------------+------------+----------+--------+ | 2006-04-09 | 2006-06-17 | 28 | 1443 | +------------+------------+----------+--------+
Generate Quarterly Reports
A quarterly report aggregates three months’ worth of figures. SQL has all the functions you need to get this aggregation.
Suppose you have figures that you need to report on by quarter. The source of your data is just a list of dates and values, as shown in Table 4-4.
In a quarterly report, you need to SUM
all the figures relating to January,
February, and March into Q1. So the first two rows of Table 4-4 contribute to
the Q1 total for 2005. The 2005-04-02 row occurred in April, so you
should add it to the Q2 total for 2005.
You can use the MONTH
function
to extract the month as a number, with January, February, and March
appearing as 1, 2, and 3.
If you also group on the year, each quarter of your input will correspond to exactly one cell in the output grid:
mysql>SELECT YEAR(whn) AS yr
->,SUM(CASE WHEN MONTH(whn) IN (1,2,3) THEN amount END) AS Q1
->,SUM(CASE WHEN MONTH(whn) IN (4,5,6) THEN amount END) AS Q2
->,SUM(CASE WHEN MONTH(whn) IN (7,8,9) THEN amount END) AS Q3
->,SUM(CASE WHEN MONTH(whn) IN (10,11,12) THEN amount END) AS Q4
->FROM sale
->GROUP BY YEAR(whn);
+------+------+------+------+------+ | yr | Q1 | Q2 | Q3 | Q4 | +------+------+------+------+------+ | 2005 | 10 | 40 | 80 | 660 | | 2006 | 30 | 20 | NULL | NULL | +------+------+------+------+------+
Unfortunately, the YEAR
and
MONTH
functions are not implemented
in Oracle. However, the SQL standard EXTRACT
function works just as well:
SQL>SELECT EXTRACT(YEAR FROM whn) AS yr
2,SUM(CASE WHEN EXTRACT(MONTH FROM whn) IN (1,2,3)
3THEN amount END) AS Q1
4,SUM(CASE WHEN EXTRACT(MONTH FROM whn) IN (4,5,6)
5THEN amount END) AS Q2
6,SUM(CASE WHEN EXTRACT(MONTH FROM whn) IN (7,8,9)
7THEN amount END) AS Q3
8,SUM(CASE WHEN EXTRACT(MONTH FROM whn) IN (10,11,12)
9THEN amount END) AS Q4
10FROM sale
11GROUP BY EXTRACT(YEAR FROM whn);
YR Q1 Q2 Q3 Q4 ---------- ---------- ---------- ---------- ---------- 2005 10 40 80 660 2006 30 20
Hacking the Hack
You may want to pivot the rows and columns of the report. In standard SQL, you have to apply some math:
mysql>SELECT FLOOR((EXTRACT(MONTH FROM whn)-1)/3)+1 AS Quarter
->,SUM(CASE WHEN EXTRACT(YEAR,whn)=2005 THEN amount END) AS Y2005
->,SUM(CASE WHEN EXTRACT(YEAR,whn)=2006 THEN amount END) AS Y2006
->FROM sale
->GROUP BY FLOOR((EXTRACT(MONTH FROM whn)-1)/3)+1;
+---------+-------+-------+ | Quarter | Y2005 | Y2006 | +---------+-------+-------+ | 1 | 10 | 30 | | 2 | 40 | 20 | | 3 | 80 | NULL | | 4 | 660 | NULL | +---------+-------+-------+
The expression FLOOR((MONTH(whn)-1)/3)+1
calculates the
quarter for the input date whn
. You
can see how it works if you look at the calculation one step at a
time:
mysql>SELECT whn, EXTRACT(MONTH FROM whn) 'Month',
->EXTRACT(MONTH FROM whn)-1 'Subtract 1',
->(EXTRACT(MONTH FROM whn)-1)/3 'Divide by 3',
->FLOOR((EXTRACT(MONTH FROM whn)-1)/3) 'Ignore Fraction',
->FLOOR((EXTRACT(MONTH FROM whn)-1)/3)+1 'Add 1'
->FROM sale
->WHERE YEAR(whn)=2006;
+------------+-------+------------+-------------+-----------------+-------+ | whn | Month | Subtract 1 | Divide by 3 | Ignore Fraction | Add 1 | +------------+-------+------------+-------------+-----------------+-------+ | 2006-01-01 | 1 | 0 | 0.0000 | 0 | 1 | | 2006-02-01 | 2 | 1 | 0.3333 | 0 | 1 | | 2006-03-01 | 3 | 2 | 0.6667 | 0 | 1 | | 2006-04-01 | 4 | 3 | 1.0000 | 1 | 2 | | 2006-05-01 | 5 | 4 | 1.3333 | 1 | 2 | +------------+-------+------------+-------------+-----------------+-------+
Each vendor has a function to extract the QUARTER
and the YEAR
from a date. In MySQL, these functions are QUARTER
and YEAR
:
mysql>SELECT QUARTER(whn)
->,SUM(CASE WHEN YEAR(whn)=2005 THEN amount END) AS Y2005
->,SUM(CASE WHEN YEAR(whn)=2006 THEN amount END) AS Y2006
->FROM sale
->GROUP BY QUARTER(whn);
+--------------+-------+-------+ | QUARTER(whn) | Y2005 | Y2006 | +--------------+-------+-------+ | 1 | 10 | 30 | | 2 | 40 | 20 | | 3 | 80 | NULL | | 4 | 660 | NULL | +--------------+-------+-------+
The trick is to GROUP
BY
the quarter and use the CASE
statement to extract only one year in
each column. There are some database-specific variations to keep in
mind:
Second Tuesday of the Month
You can find “floating” calendar dates, such as the second Tuesday of the month, with modular arithmetic and brute force reasoning.
The formula to calculate the second Tuesday of the month depends on the day of the week of the first day of the month. But which month? Obviously, all you need to know are the year and month, but it’s easier if you start with a date: the date of the first day of that month. For testing purposes, use the following table:
CREATE TABLE monthdates(monthdate DATE NOT NULL PRIMARY KEY); INSERT INTO monthdates(monthdate) VALUES (DATE '2007-04-01'); INSERT INTO monthdates(monthdate) VALUES (DATE '2007-05-01'); INSERT INTO monthdates(monthdate) VALUES (DATE '2007-06-01');
So, given a date, the first day of some month, what date is the second Tuesday of that month? The process for obtaining the solution begins by calculating the day of the week for the first day of the month.
Day-of-Week Function
Although standard SQL does not provide a function to give the day of the week for any date, most database systems do. Table 4-5 shows some of the functions that can accomplish this.
The essence of these functions is that they will return a number between 0 and 6, or between 1 and 7. Sometimes 0 (or 1) is Sunday and 6 (or 7) is Saturday, and sometimes 0 (or 1) is Monday and 6 (or 7) is Sunday.
Warning
Make sure you know how your database system is set up, because
some database systems have local settings that affect the weekday
number returned, such as NLS_TERRITORY
in Oracle and DATEFIRST
in SQL Server. MySQL offers
WEEKDAY(date)
, which returns 1
(Monday) through 7 (Sunday), as well as DAYOFWEEK(date)
, which returns 1 (Sunday)
through 7 (Saturday).
The following formula uses the range 1 (Sunday) through 7 (Saturday). If your database system has no easy way to produce this range, but you can produce some other similar range, then you can alter the formula easily once you understand how it works.
The Formula
Converting the first day of the month into the second Tuesday of the month simply involves manipulating the day of the week of the first day with an arithmetic formula. Before you see the formula, you should review what happens when the first day of the month falls on each day of the week, from Sunday through Saturday.
If the first day of the month is:
A Sunday, the third is a Tuesday, so the tenth is the second Tuesday.
A Monday, the second is a Tuesday, so the ninth is the second Tuesday.
A Tuesday, the eighth is the second Tuesday.
A Wednesday, the seventh is the next Tuesday, so the fourteenth is the second Tuesday.
A Thursday, the sixth is the next Tuesday, so the thirteenth is the second Tuesday.
A Friday, the fifth is the next Tuesday, so the twelfth is the second Tuesday.
A Saturday, the fourth is the next Tuesday, so the eleventh is the second Tuesday.
This exhausts all possibilities. So the challenge now is simply to reduce these facts into a formula. With the aid of an underappreciated technological methodology called brute force, you can verify the correctness of the following manipulation of the day of the week of the first day of the month, as shown in Table 4-6.
A1st | Bwkday | C10–B | DC mod 7 | ED+7 |
sun | 1 | 9 | 2 | 9 |
mon | 2 | 8 | 1 | 8 |
tue | 3 | 7 | 0 | 7 |
wed | 4 | 6 | 6 | 13 |
thu | 5 | 5 | 5 | 12 |
fri | 6 | 4 | 4 | 11 |
sat | 7 | 3 | 3 | 10 |
The first column (A
) is the
day of the week of the first day of the month, and the second column
is the numerical equivalent of this, using the range 1 (Sunday)
through 7 (Saturday).
The important data in Table 4-6 is in the last column, which is the number of days to add to the date of the first day of the month.
So in a nutshell, the formula is:
Find B, the day of the week of the first day of the month, using:
1=Sunday ... 7=Saturday. Subtract this number from 10 to get C:
With Sunday=1 ... Saturday=7, Tuesday would be 3. The number 3 – B is the offset (relative to the first of the month) for a Tuesday, but it might be in the current month or the previous month; 10 – B is also a Tuesday, and so are 17 – B and 24 – B.
You should choose to subtract from 10 because you want C to be positive for all inputs. This is because you need D to be positive in the next step, but a negative value for C would result in a negative value for D. This is because –1 % 7 gives –1 on most systems.
Divide by 7 and keep the remainder to get D.
D is also the offset for a Tuesday, and D is in the range 0 to 6. Every day in the first week has an offset between 0 and 6. So D is the first Tuesday of the month.
Add 7 to get E.
That takes the range of E from 7 to 13. Every day in the second week has an offset in the range 7–13.
Take the result and add that number of days to the date of the first day of the month.
In practical terms, to implement this formula you will need to use the specific date and arithmetic functions of your database system. Here are some examples.
MySQL
SELECT monthdate AS first_day_of_month , DATE_ADD(monthdate , INTERVAL ( ( 10 - DAYOFWEEK(monthdate) ) % 7 ) + 7 DAY ) AS second_tuesday_of_month FROM monthdates
Oracle
SELECT monthdate AS first_day_of_month , monthdate + MOD( ( 10 – TO_CHAR(monthdate,'d') ), 7 ) + 7 AS second_tuesday_of_month FROM monthdates
SQL Server
SELECT monthdate AS first_day_of_month , DATEADD(day , ( ( 10 - DATEPART(dw,monthdate) ) % 7 ) + 7 , monthdate ) AS second_tuesday_of_month FROM monthdates
PostgreSQL
PostgreSQL gives 0 for Sunday, so you must add 1. Also, the
output from EXTRACT
is a
floating-point number, so you must CAST
it before you attempt modular
arithmetic:
SELECT monthdate AS first_day_of_month , monthdate + ((10 - CAST(EXTRACT(dow FROM monthdate) + 1 AS INT)) % 7) + 7 AS second_tuesday_of_month FROM monthdates
Here are the results:
first_day_of_month second_tuesday_of_month 2007-04-01 2007-04-10 2007-05-01 2007-05-08 2007-06-01 2007-06-12
Hacking the Hack: The Last Thursday of the Month
You can use a similar technique to calculate the last Thursday of the month. Just find the first Thursday of next month and subtract seven days.
The formula for the offset for the first Thursday of the month
beginning with monthdate
is:
(12-DAYOFWEEK(monthdate) ) % 7
Subtract from 12 because Thursday is represented by 5 and 5 + 7 = 12.
The query to get the first day of next month is:
mysql>SELECT monthdate AS first_day_of_month
->,DATE_ADD(monthdate,INTERVAL 1 MONTH)
->AS first_day_of_next_month
->FROM monthdates;
+--------------------+-------------------------+ | first_day_of_month | first_day_of_next_month | +--------------------+-------------------------+ | 2007-04-01 | 2007-05-01 | | 2007-05-01 | 2007-06-01 | | 2007-06-01 | 2007-07-01 | +--------------------+-------------------------+
You can use this result to find the first Thursday of next month and subtract 7 to get the last Thursday of this month:
mysql>SELECT first_day_of_month
->,DATE_ADD(first_day_of_next_month
->,INTERVAL
->((12
- DAYOFWEEK(first_day_of_next_month)) % 7)
->- 7 DAY) AS last_thursday_of_month
->FROM
->(SELECT monthdate AS first_day_of_month
->,DATE_ADD(monthdate,INTERVAL 1 MONTH)
->AS first_day_of_next_month
->FROM monthdates) t;
+--------------------+------------------------+ | first_day_of_month | last_thursday_of_month | +--------------------+------------------------+ | 2007-04-01 | 2007-04-26 | | 2007-05-01 | 2007-05-31 | | 2007-06-01 | 2007-06-28 | +--------------------+------------------------+
Get SQL Hacks now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.