Print History: March 2006 printing
The discussion starts off by asking the question, "Instead of storing the effective date from which the price applies, why not store the "end date" (e.g. the last date on which the current price prevails), identifying the time intervals by their upper bound or by their lower bound?"
This question is awkwardly structured, long and convoluted. It seems the author is suggesting we overhaul the example from the previous page, where the table is (article_id, effective_from_date, price). In other words, it seems the author is suggesting that "Instead of" the previous table, we use the following table: (article_id, end_date, price).
However, the final clause in this question introduces unnecessary convolutions. No matter what scheme we use, we need to identify time intervals by their upper bound AND (not "or") by their lower bound; that is the definition of a time interval. The real question the author wants to ask the reader, it seems, is whether an explicit end_date field and implicit effective_from_date field is superior to an explicit effective_from_date field and implicit end_date field.
To make matters more confusing: In the code example on the page, the author ends up not using an end_date field at all. Instead, the author copies and pastes the query from Page 19, and adds one extra predicate to the where clause. However, paragraph 5 on page 20 DOES mention an end_date field explicitly, and the formatting as well as underscores indicate it is a column name.
Furthermore, it is not clear the author's example code does what the text alleges. In fact, based on the line of questioning, it's incorrect, because the column name effective_from_date is being used. An end_date column name should be used.
Additionally, the query on page 19 is wrong, too. The effective_from_date column does not guarantee it is possible to have more than one effective_from_date greater than the sysdate. Without this uniqueness constraint on the column, the results of this query will vary due to run-time changes to the table that will not be guarded by the DBMS. Therefore, the query suggested on page 19 is not guaranteed to return the correct price for the author's question: "How are we going to identify the current value?"
Finally, the last paragraph on Page 20 misuses the word "denormalization"! The author suggests storing both effective_from_date and end_date is denormalization, but it is not. For instance, what happens if there is a period of time when the product was not sold? This is actually an important piece of information when data mining. Moreover, denormalization in the technically correct sense is not even a sensible option. What the DBA more than likely want to do in the scenario the author posed is use "precomputing" instead, caching the "current price" from the weak entity and storing it in the strong entity. This also eliminates a join every time somebody wants to know the current price, it eliminates the need for nested queries. However, this does not remove the requirement to store historical data and strategies for fast queries of time periods.
In my humble opinion, what the author should really be discussing is how to define a uniqueness constraint over a range of records in the table. However, the author does not do that, and ends up teaching the wrong approach pedagogically, misleading the reader with the wrong train of thought.
Overall, this is really confusing and I'm surprised it has not been reported as errata yet. This is a huge mistake that basically blots out the technical merit of the discussion in an important section of the book.