BUY THIS BOOK
Add to Cart

Print Book $29.95


Add to Cart

Print+PDF $38.94

Add to Cart

PDF $23.99

Safari Books Online

What is this?

Add to UK Cart

Print Book £20.95

What is this?

Looking to Reprint or License this content?


Database in Depth
Database in Depth Relational Theory for Practitioners

By C.J. Date
Book Price: $29.95 USD
£20.95 GBP
PDF Price: $23.99

Cover | Table of Contents | Colophon


Table of Contents

Chapter One: Introduction
Professionals in any discipline need to know the foundations of their field. So if you're a database professional, you need to know the relational model, because that model is the foundation (or a huge part of the foundation, anyway) of the database field in particular. Now, every course in database management, be it academic or commercial, does at least pay lip service to the idea of teaching the relational model—but most of that teaching seems to be done very badly, if results are anything to go by. The relational model certainly isn't very well or very widely understood in the database community at large. Here are some possible reasons for this state of affairs:
  • The model is taught in a vacuum. That is, for beginners at least, it's hard to see the relevance of the material, or it's hard to understand the problems it's meant to solve, or both.
  • The instructors themselves don't fully understand or appreciate the significance of the material.
  • (Most likely in practice.) The model as such isn't taught at all—the SQL language or some specific dialect of that language, such as the Oracle dialect, is taught instead.
So this book is aimed at database professionals, especially commercial database practitioners, who have had some exposure to the relational model but don't know as much about it as they ought to. It's definitely not meant for beginners; however, it isn't a refresher course, either. To be more specific: I'm sure you know something about SQL, but—and I apologize if my tone is somewhat offensive here—if your knowledge of the relational model derives only from your knowledge of SQL, then I'm afraid you won't know the relational model as well as you should, and you'll probably know "some things that ain't so."
SQL≠ the relational model!
Here by way of illustration are some relational issues that SQL isn't too clear on (to put it mildly):
  • What databases, relations, and tuples really are
  • The difference between relations and types
  • The difference between relation values and relation variables
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
A Remark on Terminology
In that list of relational issues in the foregoing section, you probably noticed right away that I used the formal terms relation, tuple, and attribute. SQL doesn't use these terms, of course—it uses the more "user-friendly" terms table, row, and column instead. And I'm generally sympathetic to the idea of more user-friendly terms if they can help make the ideas more palatable. In the case at hand, however, it seems to me that, regrettably, they don't make the ideas more palatable; instead, they distort them, and in fact do the cause of genuine understanding a grave disservice. The truth is, a relation is not a table, a tuple is not a row, and an attribute is not a column. And while it might be acceptable to pretend otherwise in informal contexts—indeed, I've done so myself, in many of my books and other writings—I would argue that it's acceptable only if we all understand that the more user-friendly terms are just an approximation to the truth and fail overall to capture the essence of what's really going on. To put it another way: if you do understand the true state of affairs, then judicious use of the user-friendly terms can be a good idea; but in order to learn and appreciate that true state of affairs in the first place, you really do need to come to grips with the more formal terms. In this book, therefore, I'll use those more formal terms most of the time—and of course I'll give precise definitions for them when we need them (mostly not in this first chapter, though, where I'm just trying to lay a certain amount of elementary groundwork).
And another point on terminology: having said that SQL tries to simplify one set of terms, I must now add that it does its best to complicate another. I refer to its use of the terms operator, function, procedure, routine, and method, all of which refer to essentially the same thing (with, perhaps, very minor differences). In this book I'll use the term
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Principles, Not Products
It's worth taking a few moments to examine the question of why, as I claimed earlier, you as a database professional need to know the relational model. The reason is that the relational model isn't product-specific; rather, it is concerned with principles. What do I mean by principles? Well, here's a definition (from Chambers Twentieth Century Dictionary):
principle: a source, root, origin: that which is fundamental: essential nature: theoretical basis: a fundamental truth on which others are founded or from which they spring
The point about principles is this: they endure. By contrast, products and technologies (and the SQL language, come to that) change all the time—but principles don't. For example, suppose you know Oracle; in fact, suppose you're an expert on Oracle. But if Oracle is all you know, then your knowledge is not necessarily transferable to, say, a DB2 or SQL Server environment (it might even get in the way of your making progress in that new environment). But if you know the underlying principles—in other words, if you know the relational model—then you have knowledge and skills that will be transferable: knowledge and skills that you'll be able to apply in every environment and that will never be obsolete.
In this book, therefore, we'll be concerned with principles, not products, and foundations, not fads. Of course, I realize that sometimes you do have to make compromises and trade-offs in the real world. For one example, sometimes you might have good pragmatic reasons for not designing the database in the theoretically optimal way (an issue I discuss in Chapter 7). For another, consider SQL once again. Although it's certainly possible to use SQL relationally (for the most part, at any rate), sometimes you'll find—because existing implementations are so far from perfect—that there are severe performance penalties for doing so . . . in which case you might more or less be forced into doing something not "truly relational" (like writing a query in some weird and unnatural way in order to get the implementation to use an index). However, I believe very firmly that you should always make such compromises and trade-offs from
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
A Review of the Original Model
You're a database professional, so you already have some familiarity with the relational model. The purpose of this section is to serve as a kickoff point for our subsequent discussions; it reviews some of the most basic aspects of that model as originally defined. Note the qualifier "as originally defined"! One widespread misconception about the relational model is that it's a totally static thing. It's not. It's like mathematics in that respect: mathematics too is not a static thing but changes over time. In fact, the relational model can itself be seen as a small branch of mathematics; as such, it evolves over time as new theorems are proved and new results discovered. What's more, those new contributions can be made by anyone who's competent to do so. Like mathematics again, the relational model, though originally invented by one man, has become a community effort and now belongs to the world.
By the way, in case you don't know, that one man was E. F. Codd, at the time a researcher at IBM. It was late in 1968 that Codd, a mathematician by training, first realized that the discipline of mathematics could be used to inject some solid principles and rigor into the field of database management, which was all too deficient in such qualities prior to that time. His original definition of the relational model appeared in an IBM Research Report in 1969, and I'll have a little more to say about that paper in Appendix B.
The original model had three major components—structure, integrity, and manipulation—and I'll briefly describe each in turn. Please note right away, however, that all of the "definitions" I'll be giving are very loose; I'll make them more precise as and when appropriate in later chapters.
First of all, then, structure. The principal structural feature is, of course, the relation itself, and as everybody knows it's common to picture relations as tables on paper (see Figure 1-1 for a self-explanatory example). Relations are defined over
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Model Versus Implementation
Before going any further, there's one very important point I need to explain, because it underpins everything else in this book. The relational model is, of course, a data model. Unfortunately, however, this latter term has two quite distinct meanings in the database world. The first and more fundamental meaning is this:
Definition: A data model (first sense) is an abstract, self-contained, logical definition of the data structures, data operators, and so forth, that together make up the abstract machine with which users interact.
This is the meaning we have in mind when we talk about the relational model in particular. And, armed with this definition, we can usefully (and importantly) go on to distinguish a data model in this first sense from its implementation , which can be defined as follows:
Definition: An implementation of a given data model is a physical realization on a real machine of the components of the abstract machine that together constitute that model.
I'll illustrate these definitions in terms of the relational model specifically. First, and obviously enough, the concept of relation is itself part of the model: users have to know what relations are, they have to know they're made up of tuples and attributes, they have to know how to interpret them, and so on. All that is part of the model. But they don't have to know how relations are physically stored on the disk, or how individual data values are physically encoded, or what indexes or other access paths exist; all that is part of the implementation, not part of the model.
Or consider the concept join: users have to know what a join is, they have to know how to invoke a join, they have to know what the result of a join looks like, and so on. Again, all that is part of the model. But they don't have to know how joins are physically implemented, or what expression transformations take place under the covers, or what indexes or other access paths are used, or what physical I/O's occur; all that is part of the implementation, not the model.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Properties of Relations
Now let's get back to an examination of basic relational concepts. In this section I want to focus on some specific properties of relations themselves. First of all, every relation has a heading and a body: the heading is a set of attributes (where an attribute is an attribute-name:type-name pair), and the body is a set of tuples that conform to that heading. In the case of the suppliers relation of Figure 1-3, for example, there are four attributes in the heading and five tuples in the body. Note, therefore, that a relation doesn't really contain tuples—it contains a body, and that body in turn contains the tuples—but we do usually talk as if relations contained tuples directly, for the sake of simplicity.
By the way, although it's strictly correct to say that the heading consists of attribute-name:type-name pairs, it's usual to omit the type names in pictures like Figure 1-3 and thereby pretend that the heading is a set of attribute names only. For example, the STATUS attribute does have a type (INTEGER, let's say), but I didn't show it in Figure 1-3. But you should never forget it's there!
Next, the number of attributes in the heading is the degree (sometimes the arity ), and the number of tuples in the body is the cardinality. For example, relations S, P, and SP in Figure 1-3 have degree 4, 5, and 3, respectively, and cardinality 5, 6, and 12, respectively.
Next, relations never contain duplicate tuples. This property follows because a body is a set of tuples, and sets in mathematics do not contain duplicate elements. By the way, SQL fails here: SQL tables are allowed to contain duplicate rows, as I'm sure you know, and SQL tables are thus not relations, in general. Please understand, therefore, that in this book I always use the term "relation" to mean a relation—without duplicate tuples, by definition—and not an SQL table. Please understand also that relational operations
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Relations Versus Relvars
Now, it's entirely possible that you already knew everything I've discussed in this chapter so far; in fact, I rather hope you did (though I also hope that doesn't mean you found the discussions boring). Anyway, now I come to something you might not know already. The fact is, historically there's been a lot of confusion between relations—I mean relations as such—on the one hand, and relation variables on the other.
Forget about databases and relations for a moment; instead, consider the following simple programming language example. Suppose I say in some arbitrary programming language:
    DECLARE N INTEGER ... ;
Then N here is not an integer. Rather, it's a variable whose values are integers as such (different integers at different times). Right? I'm sure we can agree on that. Well, in exactly the same way, if we say in SQL:
    CREATE TABLE T ... ;
then T is not a table; it's a table variable or (as I would prefer to say, ignoring various SQL quirks such as nulls and duplicate rows) a relation variable, whose values are relations as such (different relations at different times).
Consider Figure 1-3 once again. That figure shows three relation values: namely, those that happen to appear in the database at some particular time. But if we were to look at the database at some different time, we would probably see three different relation values appearing in their place. In other words, S, P, and SP in that database are really variables: relation variables, to be precise. For example, suppose the relation variable S currently has the value—the relation value—shown in Figure 1-3, and suppose we delete the tuples (actually there's only one) for suppliers in Athens:
    DELETE S WHERE CITY = 'Athens' ;
Here's the result:
Figure :
Conceptually, what's happened is that the old value of S has been replaced in its entirety by a new value. Of course, the old value (with five tuples) and the new one (with four) are very similar, but they certainly are different values. In fact, the DELETE just shown is logically equivalent to, and indeed shorthand for, the following relational assignment:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Values Versus Variables
The difference between relations and relvars is actually a special case of the difference between values and variables in general, and I'd like to take a few moments to look at the more general case. (It's a bit of a digression, but I think it's worth taking the time because clear thinking here can be a tremendous help in so many areas.) Here then are some definitions:
Definition: A value is an "individual constant": for example, the individual integer 3. A value has no location in time or space. However, values can be represented in memory by means of some encoding, and those representations do have locations in time and space. Indeed, distinct representations of the same value can appear at any number of distinct locations in time and space, meaning, loosely, that any number of different variables—see the definition below—can have the same value, at the same time or different times. Observe in particular that, by definition, a value can't be updated ; for if it could, then after such an update it would no longer be that value.
Definition: A variable is a holder for a representation of a value. A variable does have location in time and space. Also, of course, variables, unlike values, can be updated; that is, the current value of the variable can be replaced by another value.
Please note very carefully that it isn't just simple things like the integer 3 that are legitimate values. On the contrary, values can be arbitrarily complex; for example, a value might be a geometric point, or a polygon, or an X ray, or an XML document, or a fingerprint, or an array, or a stack, or a list, or a relation (and on and on). Analogous remarks apply to variables too, of course. I'll have more to say on these matters in the next two chapters.
Now, it might be hard to imagine people getting confused over a distinction as obvious as that between values and variables, but in fact it's all too easy to fall into traps in this area. By way of illustration, consider the following extract from a tutorial on object-oriented databases (the italicized portions in brackets are comments by myself):
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Summary
For the most part, the aim of this preliminary chapter has been to tell you what I rather hope you knew already (and you might therefore have felt it was rather light on technical substance). Anyway, just to review briefly:
  • I explained why we'd be concerned in this book with principles, not products, and why I'd be using formal terminology such as relations, tuples, and attributes in place of their more "user-friendly" SQL counterparts.
  • I claimed that SQL and the relational model aren't the same thing. We've seen a few differences already—for example, the fact that SQL permits duplicate rows—and we'll see many more in later chapters.
  • I gave an overview of the original model, touching in particular on the following concepts: type, n-ary relation, tuple, attribute, candidate key, primary key, foreign key, entity integrity, referential integrity, relational assignment, and the relational algebra. With regard to the algebra, I mentioned closure and very briefly described the operators restrict, project, product, intersection, union, difference, join, and divide.
  • I discussed various properties of relations, introducing the terms heading, body, cardinality, and degree. Relations have no duplicate tuples, no tuple ordering top to bottom, and no attribute ordering left to right. I also discussed views.
  • I discussed the differences between model and implementation, relations and relvars, and values and variables in general. The model versus implementation discussion in particular led to a discussion of data independence.
One last point (I didn't mention this explicitly before, but I hope it's obvious from everything I did say): overall, the relational model is declarative, not procedural, in nature; that is, we favor declarative solutions over procedural ones, wherever such solutions are feasible. The reason is obvious:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Exercises
As noted in the preface, you certainly don't have to do any of the exercises, but I think it's a good idea to try at least some of them. Answers, often giving more information about the subject at hand, can be found online at http://oreilly.com/catalog/databaseid.

Section 1.9.1.1: Exercise 1-1

(Repeated from the body of the chapter, but slightly reworded here.) If you haven't done so already, go through the chapter again and identify all of the places where I used the term relation when I should by rights have used the term relvar instead (or as well).

Section 1.9.1.2: Exercise 1-2

Who was E. F. Codd?

Section 1.9.1.3: Exercise 1-3

What's a domain?

Section 1.9.1.4: Exercise 1-4

What do you understand by the term referential integrity?

Section 1.9.1.5: Exercise 1-5

The terms heading, body, attribute, tuple, cardinality, and degree, defined in the body of the chapter for relation values, can all be interpreted in the obvious way to apply to relvars as well. Make sure you understand this remark.

Section 1.9.1.6: Exercise 1-6

Distinguish between the two meanings of the term data model.

Section 1.9.1.7: Exercise 1-7

Explain the difference between model and implementation in your own words.

Section 1.9.1.8: Exercise 1-8

In the body of the chapter, I said that tables like those in Figures Figures 1-1 and 1-3 weren't relations as such but, rather, pictures of relations. What are some of the specific points of difference between such pictures and the corresponding relations?

Section 1.9.1.9: Exercise 1-9

Explain data independence in your own words.

Section 1.9.1.10: Exercise 1-10

(Try this exercise without looking back at the body of the chapter.) What relvars are contained in the suppliers-and-parts database? What attributes do they involve? What candidate and foreign keys exist? (The point of this exercise is that it's worth making yourself as familiar as possible with the structure, at least, of the running example. Of course, it's not so important to remember the actual data values in detail—though it wouldn't hurt if you did.)
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter Two: Relations Versus Types
The title of this chapter is Relations Versus Types, but most of it has to do with types. The point is, the relational model certainly requires a supporting type system, but it has very little to say about the nature of that system. Why does it require it? Because relations (and relvars) are defined in terms of types; that is, every attribute of every relation (and of every relvar) is defined to be of some type. For example, attribute STATUS of the suppliers relvar S might be of type INTEGER. If it is, then every relation s that's a possible value for relvar S must also have a STATUS attribute of type INTEGER—which means in turn that every tuple in such a relation s must have a STATUS value that's an integer.
I'll be discussing such matters in more detail later in this chapter. For now, let me just say that—with certain important exceptions, which I'll also be discussing later—relational attributes can be defined on any types whatsoever (implying among other things that those types can be as complex as we like, as we'll see later as well). In particular, such attributes can be defined on either system-defined (that is, built-in) or user-defined types. For our running example, I'll assume the attributes have types as follows (note that some of the attributes have the same name as the types they're defined on and others don't):
    Suppliers              Parts                 Shipments

    SNO    : SNO           PNO    : PNO          SNO : SNO
    SNAME  : NAME          PNAME  : NAME         PNO : PNO
    STATUS : INTEGER       COLOR  : COLOR        QTY : QTY
    CITY   : CHAR          WEIGHT : WEIGHT
                           CITY   : CHAR
I'll also assume, where it makes any difference, that types INTEGER (integers) and CHAR (character strings of arbitrary length) are system-defined and the others are user-defined.
By the way, SQL in particular does have a built-in type called INTEGER, as I'm sure you know. It also has a built-in type called CHAR, but (a) that type denotes fixed-length strings, not arbitrary-length ones, and (b) the length in question,
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Domain-Constrained Comparisons
Everyone knows (or should know!) that, in the relational model, two values can be tested for equality only if they come from the same domain. In the case of suppliers and parts, for example, the following comparison—which might be part of a WHERE clause—is obviously valid:
    SP.SNO = S.SNO         /* OK     */
By contrast, the following one is not:
    SP.PNO = S.SNO         /* not OK */
The reason is that part numbers and supplier numbers are different kinds of things, and they correspond to different domains. So the general idea is that the database management system (DBMS) should reject any attempt to perform any relational operation—join, union, divide, or whatever—that calls, explicitly or implicitly, for a comparison between values from different domains. For example, here's an SQL query where the user is trying to find suppliers who supply no parts:
    SELECT S.SNO, S.SNAME, S.STATUS, S.CITY
    FROM   S
    WHERE  NOT EXISTS
         ( SELECT SP.PNO
           FROM   SP
           WHERE  SP.PNO = S.SNO )      /* not OK */
(There's no terminating semicolon because this is an expression, not a statement. See Exercise 2-24 at the end of the chapter.)
As the comment says, this query is not OK. The reason is that, in the last line, the user presumably meant to say WHERE SP.SNO = S.SNO, but by mistake—probably just a slip of the typing fingers—he or she said WHERE SP.PNO = S.SNO instead. And, given that we're indeed talking about a simple typo (probably), it would be a friendly act on the part of the DBMS to interrupt at this point, highlight the error, and ask if the user would like to correct it before proceeding.
Now, I don't know any commercial product that actually behaves in the way I've just suggested; in today's products, depending on how you've set up the database, either the query will simply fail or it'll give the wrong answer. Well . . . not exactly the wrong answer, perhaps, but the right answer to the wrong question. (Does that make you feel any better?)
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Data Value Atomicity
I hope the previous section succeeded in convincing you that domains are indeed types, no more and no less. Now I want to turn to the issue of data value atomicity and the related notion of first normal form (1NF for short). In Chapter 1, I said that 1NF meant that every tuple in every relation contains just a single value (of the appropriate type, of course) in every attribute position—and it's usual to add that those "single values" are supposed to be atomic. But this latter requirement raises the obvious question: what does it mean for data to be atomic?
Well, on page 6 of the book mentioned earlier, Codd defines atomic data as data that "cannot be decomposed into smaller pieces by the DBMS (excluding certain special functions)." But even if we ignore that parenthetical exclusion, this definition is a trifle puzzling, and not very precise. For example, what about character strings? Are character strings atomic? Every product I know provides several operators on such strings—LIKE, SUBSTR (substring), "||" (concatenate), and so on—that clearly rely on the fact that character strings in general can be decomposed by the DBMS. So are those strings atomic? What do you think?
Here are some other examples of values whose atomicity is at least open to question and yet we would certainly want to be able to include as attribute values in tuples in relations:
  • Integers, which might be regarded as being decomposable into their prime factors (I know this isn't the kind of decomposability we usually consider in this context—I'm just trying to show that the notion of decomposability itself is open to a variety of interpretations)
  • Fixed-point numbers, which might be regarded as being decomposable into integer and fractional parts
  • Dates and times, which might be regarded as being decomposable into year/month/day and hour/minute second components, respectively
Now I'd like to move on to a potentially more startling example. Refer to Figure 2-1. Relation R1 in that figure is a reduced version of the shipments relation from our running example; it shows that certain suppliers supply certain parts, and it contains one tuple for each legitimate SNO-PNO combination. For the sake of the example, let's agree that supplier numbers and part numbers are indeed "atomic"; then we can presumably agree that R1, at least, is in 1NF.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
So What's a Type?
From this point forward I'll switch to the term type in preference to domain. What is a type, exactly? In essence, it's a named, finite set of value s—all possible values of some specific kind: for example, all possible integers, or all possible character strings, or all possible supplier numbers, or all possible XML documents, or all possible relations with a certain heading (and so on). Moreover:
  • Every value is of some type—in fact, of exactly one type, except possibly if type inheritance is supported, a concept that's beyond the scope of this book. Note, therefore, that types are disjoint or nonoverlapping. (To elaborate briefly: as one reviewer said, surely types WarmBloodedAnimal and FourLeggedAnimal overlap? Indeed they do; but what I'm saying is that if types overlap, then for a variety of reasons we're getting into the realm of type inheritance—in fact, into multiple inheritance. Since those reasons, and indeed the whole topic of inheritance, are independent of the context we're in, be it relational or something else, I'm not going to discuss them in this book.)
  • Every variable, every attribute, every operator that returns a result, and every parameter of every operator is declared to be of some type. And to say that (for example) variable V is declared to be of type T means, precisely, that every value v that can legally be assigned to V is itself of type T.
  • Every expression denotes some value and is therefore of some type: namely, the type of the value in question, which is to say the type of the value returned by the outermost operator in the expression (where by "outermost" I mean the operator that's executed last). For example, the type of the expression (A+B)*(X-Y) is the declared type of the operator "*", whatever that happens to be.
The fact that parameters in particular are declared to be of some type touches on an issue that I've mentioned but haven't properly discussed yet: namely, the fact that every type has an associated set of operators for operating on values and variables of the type in question. For example, integers have the usual arithmetic operators; dates and times have special calendar arithmetic operators; XML documents have what are called "XPath" operators; relations have the operators of the relational algebra; and
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Scalar Versus Nonscalar Types
It's usual to think of types as being either scalar or nonscalar. Loosely, a type is scalar if it has no user-visible components and nonscalar otherwise—and the values, variables, attributes, operators, parameters, and expressions of some type T are scalar or nonscalar according as type T itself is scalar or nonscalar. For example:
  • Type INTEGER is a scalar type; hence, values, variables, and so on of type INTEGER are also all scalar, meaning they have no user-visible components.
  • Tuple and relation types are nonscalar—the pertinent user-visible components being, of course, the corresponding attributes—and hence tuple and relation values, variables, and so on are also all nonscalar.
That said, I must now stress the point that these notions are quite informal. Indeed, we've already seen that the concept of atomicity has no absolute meaning, and "scalarness" is just the concept of atomicity by another name. Thus, the relational model nowhere formally relies on the scalar versus nonscalar distinction. In this book, however, I do rely on it informally; to be specific, I use the term scalar in connection with types that are neither tuple nor relation types, and the term nonscalar in connection with types that are either tuple or relation types.
Let's look at an example. Here's a Tutorial D definition for the base relvar S (suppliers):
    1 VAR S BASE
    2     RELATION { SNO SNO, SNAME NAME, STATUS INTEGER, CITY CHAR }
    3     KEY { SNO } ;
Explanation:
  • The keyword VAR in line 1 means this is a variable definition; the keyword BASE means the variable is a base relvar specifically.
  • Line 2 specifies the type of this variable. The keyword RELATION shows it's a relation type; the rest of the line specifies the set of attributes that make up the corresponding heading (where, as you'll recall from Chapter 1, an attribute is an attribute-name:type-name pair). The type is, of course, a nonscalar type. No significance attaches to the order in which the attributes are specified.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Summary
It's a very common misconception that the relational model deals only with rather simple types: numbers, strings, perhaps dates and times, and not much else. In this chapter, I've tried to show that this is indeed a misconception. Rather, relations can have attributes of any type whatsoever—the relational model nowhere prescribes what those types must be, and in fact they can be as complex as we like (except as noted in just a moment). In other words, the question as to what types are supported is orthogonal to the question of support for the relational model itself. Or (less precisely but more catchily): types are orthogonal to tables.
I also remind you that the foregoing state of affairs in no way violates the requirements of first normal form. First normal form just means that every tuple in every relation contains a single value, of the appropriate type, in every attribute position. Now that we know those types can be anything, we also know that all relations are in first normal form by definition.
Finally, I mentioned in the introduction to this chapter that there are certain important exceptions to the rule that relational attributes can be of any type whatsoever. In fact, there are two. The first—which I'll simplify just slightly for present purposes—is that if relation r is of type T, then no attribute of r can itself be of type T (think about it!). The second is that no relation in the database can have an attribute of any pointer type. As you probably know, prerelational databases were full of pointers, and access to such databases involved a lot of pointer-chasing: a fact that made application programming error-prone and direct end-user access impossible. Codd wanted to get away from such problems in his relational model, and of course he succeeded.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Exercises

Section 2.6.1.1: Exercise 2-1

What's a type? What's the difference between a domain and a type?

Section 2.6.1.2: Exercise 2-2

What do you understand by the term selector?

Section 2.6.1.3: Exercise 2-3

What's a THE_ operator?

Section 2.6.1.4: Exercise 2-4

Physical representations are always hidden from the user: true or false?

Section 2.6.1.5: Exercise 2-5

Elaborate on the following: argument versus parameter; database versus DBMS; generated type versus nongenerated type; scalar versus nonscalar; type versus representation; user-defined type versus system-defined type; user-defined operator versus system-defined operator.

Section 2.6.1.6: Exercise 2-6

What do you understand by the term coercion? Why is coercion a bad idea?

Section 2.6.1.7: Exercise 2-7

Why doesn't "domain check override" make sense?

Section 2.6.1.8: Exercise 2-8

What's a type generator?

Section 2.6.1.9: Exercise 2-9

Define first normal form.

Section 2.6.1.10: Exercise 2-10

Let X be an expression. What's the type of X? What's the significance of the fact that X is of some type?

Section 2.6.1.11: Exercise 2-11

Using the definition of the REFLECT operator in the body of the chapter as a pattern, define a Tutorial D operator that, given an integer, returns the cube of that integer.

Section 2.6.1.12: Exercise 2-12

Use Tutorial D to define an operator that, given a point with cartesian coordinates x and y, returns the point with cartesian coordinates f(x) and g(y), where f and g are predefined operators.

Section 2.6.1.13: Exercise 2-13

Give an example of a relation type. Distinguish between relation types, relation values, and relation variables.

Section 2.6.1.14: Exercise 2-14

Use SQL or Tutorial D (or both) to define relvars P and SP from the suppliers-and-parts database. If you give both SQL and Tutorial D definitions, identify as many differences between them as you can. What's the significance of the fact that relvar P (for example) is of a certain relation type?
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter Three: Tuples and Relations
From the first two chapters you should have gained a pretty good understanding of what tuples and relations are, at least from an informal point of view. Now I want to define those concepts more precisely, and I want to explore some of the consequences of those precise definitions. Perhaps I should warn you that the definitions can look a little daunting, but that's not unusual with formal definitions; the tuple and relation concepts themselves are quite straightforward, once you've struggled through the formalism, and you should be ready to do that by now because the terminology, at least, should be reasonably familiar to you.
Is this a tuple?
Figure :
Well, no, of course it isn't—it's a picture of a tuple, not a tuple as such (and note that for once I've included the type names as well as the attribute names in that picture). As we saw in Chapter 1, there's a difference between a thing and a picture of a thing, and that difference can be very important. For example, tuples have no left-to-right ordering to their attributes, and so the following is an equally good (or bad?) picture of the very same tuple :
Figure :
Thus, while I'll certainly be making use of pictures like these in the sections to follow, please keep in mind that they're only pictures, and they can sometimes suggest some things that aren't true.
With that caveat out of the way, I can now say exactly what a tuple is:
Definition: Let T1, T2, . . . ,Tn (n ≥ 0) be type names, not necessarily all distinct. Associate with each Ti a distinct attribute name, Ai; each of the n attribute-name:type-name combinations that results is an attribute. Associate with each attribute a value vi of type Ti; each of the n attribute:value combinations that results is a component. The set of all n components thus defined, t say, is a tuple value (or just a tuple for short) over the attributes
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
What's a Tuple?
Is this a tuple?
Figure :
Well, no, of course it isn't—it's a picture of a tuple, not a tuple as such (and note that for once I've included the type names as well as the attribute names in that picture). As we saw in Chapter 1, there's a difference between a thing and a picture of a thing, and that difference can be very important. For example, tuples have no left-to-right ordering to their attributes, and so the following is an equally good (or bad?) picture of the very same tuple :
Figure :
Thus, while I'll certainly be making use of pictures like these in the sections to follow, please keep in mind that they're only pictures, and they can sometimes suggest some things that aren't true.
With that caveat out of the way, I can now say exactly what a tuple is:
Definition: Let T1, T2, . . . ,Tn (n ≥ 0) be type names, not necessarily all distinct. Associate with each Ti a distinct attribute name, Ai; each of the n attribute-name:type-name combinations that results is an attribute. Associate with each attribute a value vi of type Ti; each of the n attribute:value combinations that results is a component. The set of all n components thus defined, t say, is a tuple value (or just a tuple for short) over the attributes A1, A2,..., An. The value n is the degree of t; a tuple of degree one is unary, a tuple of degree two is binary, a tuple of degree three is ternary,..., and more generally a tuple of degree n is n-ary. The set of all n attributes is the heading of t.
For example, with reference to either of the earlier pictures (of our usual tuple for supplier S1), we have:
Degree
Four. The heading is also said to have degree four.
Type names
SNO, NAME, INTEGER, and CHAR.
Corresponding attribute names
SNO, SNAME, STATUS, and CITY.
Corresponding attribute values
SNO('S1'), NAME('Smith'), 20, and 'London'.
I showed most of these attribute values in simplified form in the pictures (and I'll continue to make use of similar simplifications in such pictures throughout this book). However, it's strictly incorrect to say that, for example, the SNO value in the tuple we're talking about is just 'S1' or (even sloppier) just S1. A value of type SNO is a value of type SNO, not a value of type CHAR!—and the expression SNO('S1') is a
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Some Important Consequences
Now I want to highlight some important consequences of the foregoing definitions. The first is: no tuple ever contains any nulls. The reason is that, by definition, every tuple contains a value (of the appropriate type) for each of its attributes, and we saw in Chapter 1 that nulls aren't values. Of course, if no tuple ever contains any nulls, then no relation does either, a fortiori; so right away we have at least a formal reason for rejecting the concept of nulls—but I'll give some more pragmatic reasons as well, in the section "Why Nulls Are Prohibited," later in this chapter.
The next consequence is:every subset of a tuple is a tuple and every subset of a heading is a heading. For example, given our usual tuple for supplier S1, what we might call "the {SNO,CITY} value" within that tuple is itself another tuple:
Figure :
Its heading is as indicated, and its type is:
    TUPLE { SNO SNO, CITY CHAR }
In the same way, the following is a tuple too (its type is TUPLE {SNO SNO}):
Figure :
So if we want to access the actual attribute value—SNO('S1') in the example—we have to extract it somehow from its containing tuple. Tutorial D uses syntax of the form SNO FROM t for this purpose (where t is any expression that denotes a tuple with an SNO attribute). SQL uses dot qualification: t.SNO.
We saw in Chapter 2 that a tuple t and a relation r that contains just that tuple t aren't the same thing. Analogously, a value v and a tuple t that contains just that value v aren't the same thing, either; in particular, they're of different types.
Now, I'm sure you know that the empty set is a subset of every set. It follows that a tuple with an empty heading, and therefore an empty set of components, is a valid tuple!—though it's a little hard to draw a picture of such a tuple on paper, and I'm not even going to try. A tuple with an empty heading has type TUPLE{} (it has no components); indeed, we sometimes refer to it explicitly as a
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
What's a Relation?
I'll use our usual suppliers relation as a basis for examples in this section. Here's a picture:
Figure :
And here's a definition:
Definition: Let {H} be a tuple heading and let t1, t2, . . . , tm (m ≥ 0) be distinct tuples with heading {H}. The combination, r say, of {H} and the set of tuples {t1, t2, . . . , tm} is a relation value (or just a relation for short) over the attributes A1, A2, . . . , An, where A1, A2, . . . , An are the attributes in {H}. The heading of r is {H}; r has the same attributes (and hence the same attribute names and types) and the same degree as that heading does. The body of r is the set of tuples {t1, t2, . . . , tm}. The value m is the cardinality of r.
I'll leave it as an exercise to interpret the suppliers relation in terms of the foregoing definition. However, I will at least explain why we call such things relations. Basically, each tuple in a relation represents an n-ary relationship, in the ordinary natural-language sense, among a set of n values (one value for each tuple attribute), and the full set of tuples in a given relation represents the full set of such relationships that happen to exist at some given time—and, mathematically speaking, that's a relation. Thus, the "explanation" often heard, to the effect that the relational model is so called because it lets us "relate one table to another," though accurate in a kind of secondary sense, really misses the basic point. The relational model is so called because it deals with certain abstractions that we can think of as "tables" but are known, formally, as relations in mathematics.
Now, a relation, like a tuple, is itself a value and has a type, and that type has a name. In Tutorial D, such names take the form RELATION{H}, where {H} is the heading. Here's an example:
    RELATION { SNO SNO, SNAME NAME, STATUS INTEGER, CITY CHAR }
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Further Important Consequences
Most of the properties of relations I talked about in Chapter 1 are direct consequences of the definitions in the previous section, but there are a couple I didn't call out explicitly before, and I want to elaborate on some of the others.
The first point I didn't mention before is that every subset of a body is a body (loosely, every subset of a relation is a relation). In particular, therefore, since the empty set is a subset of every set, a relation can have a body that consists of an empty set of tuples (and we call such a relation an empty relation ). For example, suppose there are no shipments right now. In this case, relvar SP will have as its current value the empty shipments relation, which we might draw like this:
Figure :
Note that, given any particular relation type, there's exactly one empty relation of that type—but empty relations of different types aren't the same thing, precisely because they're of different types. For example, the empty suppliers relation isn't equal to the empty parts relation; their bodies are equal but their headings aren't.
The second point I deliberately didn't mention before is this. Recall from Chapter 1 that, while a relation can be pictured as a table, a relation isn't a table. (To say it one more time, a picture of a thing is not the same as the thing.) Of course, it can be convenient to think of a relation as a table; after all, tables are "user-friendly"; indeed, it's the fact that we can think of relations, informally, as tables—sometimes more explicitly as flat or two-dimensional tables—that makes relational systems intuitively easy to understand and use, and makes it intuitively easy to reason about the way such systems behave. In other words, it's a very nice property of the relational model that its basic data structure, the relation, has such an intuitively attractive pictorial representation.
Unfortunately, many people seem to have been blinded by that attractive pictorial representation into thinking that
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Why Duplicate Tuples Are Prohibited
There are numerous practical arguments in support of the position that duplicate tuples ("duplicates" for short) should be prohibited. Here I have room for just one—but I think it's a powerful one. However, it does rely on certain notions I haven't discussed yet in this book, so I need to make a couple of preliminary assumptions:
  • I assume you know that relational DBMSs include a component called the optimizer , whose job is to try to figure out the best way to implement user queries and the like (where "best" basically means best-performing).
  • I assume you also know that one of the things optimizers do is what's sometimes called query rewrite . Query rewrite is the process of transforming some relational expression