XPath and XPointer: Locating Content in XML Documents

By John E. Simpson
Price: $29.99 USD
£17.50 GBP


Table of Contents

Chapter 1: Introducing XPath and XPointer
The XPath and XPointer specifications promulgated by the World Wide Web Consortium (W3C) aim to simplify the location of XML-based content. With software based on those two specs, you're freed of much of the tedium of finding out if something useful is in a document, so you can simply enjoy the excitement of doing something with it.
Before getting specifically into the details of XPath or XPointer, though, you should have a handle on some concepts and other background the two specs have in common. Don't worry, the details — and there are enough, it seems, to fill a phone directory (or this book, at least) — are coming.
Why XPath and XPointer?
Detailed answers to the following questions are implicit throughout this book and explicit in a couple of spots:
Why should I care about XPath and XPointer? What do they even do?
To answer them briefly for now, consider even a simple XML document, such as this:
<house_pet_hazards>
   <hazard type="cleanup">
      <name>hairballs</name>
      <guilty_party species="cat">Dilly</guilty_party>
      <guilty_party species="cat">Nameless</guilty_party>
      <guilty_party species="cat">Katie</guilty_party>
   </hazard>
   <hazard type="cleanup">
      <name>miscellaneous post-ingestion surprises</name>
      <guilty_party species="cat">Dilly</guilty_party>
      <guilty_party species="cat">Katie</guilty_party>
      <guilty_party species="dog">Kianu</guilty_party>
      <guilty_party species="snake">Mephisto</guilty_party>
   </hazard>
   <hazard type="phys_jeopardy">
      <name>underfoot instability</name>
      <guilty_party species="cat">Dilly</guilty_party>
      <guilty_party species="snake">Mephisto</guilty_party>
   </hazard>
</house_pet_hazards>
Even so simple a document as this opens the door to dozens of potential questions, from the obvious ("Which pets have been guilty of tripping me up as I walked across the room?") to the non-obvious, even baroque ("Which species is most likely to cause a problem for me on a given day?" and "For hazards requiring cleanup, is there a correlation between the species and the number of letters in a given pet's name?"). For real-world XML applications — the ones inspiring you to research XPath/XPointer in the first place — the number of such practical questions might be in the thousands.
XPath provides you with a standard tool for locating the answers to real-world questions — answers contained in an XML document's content or hidden in its structure. For its part, XPointer (which in part is built on an XPath foundation) provides you with standard mechanisms for creating references to parts of XML documents and using them as addresses.
On a practical level, if you know and become comfortable with XPath, you'll have prepared yourself for easy use not only of XPointer but also of numerous other XML-related specifications, notably Extensible Stylesheet Language Transformations (XSLT) and XQuery. Knowing XPointer provides you with a key to a smaller castle (the XLink standard for advanced hyperlinking capabilities within or among portions of documents) but without that key the door is barred.
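As a rough illustration of the first "obvious" question above, here is a minimal sketch using Python's standard xml.etree.ElementTree module, which implements only a limited subset of XPath. The choice of Python and the variable names (HAZARDS, trippers, cats) are assumptions made for this example; nothing here is prescribed by the XPath spec itself.

```python
import xml.etree.ElementTree as ET

# The sample document from the text, unchanged.
HAZARDS = """<house_pet_hazards>
   <hazard type="cleanup">
      <name>hairballs</name>
      <guilty_party species="cat">Dilly</guilty_party>
      <guilty_party species="cat">Nameless</guilty_party>
      <guilty_party species="cat">Katie</guilty_party>
   </hazard>
   <hazard type="cleanup">
      <name>miscellaneous post-ingestion surprises</name>
      <guilty_party species="cat">Dilly</guilty_party>
      <guilty_party species="cat">Katie</guilty_party>
      <guilty_party species="dog">Kianu</guilty_party>
      <guilty_party species="snake">Mephisto</guilty_party>
   </hazard>
   <hazard type="phys_jeopardy">
      <name>underfoot instability</name>
      <guilty_party species="cat">Dilly</guilty_party>
      <guilty_party species="snake">Mephisto</guilty_party>
   </hazard>
</house_pet_hazards>"""

root = ET.fromstring(HAZARDS)

# "Which pets have been guilty of tripping me up?"
# Paths are written relative to the root element (house_pet_hazards).
trippers = [p.text for p in
            root.findall("hazard[@type='phys_jeopardy']/guilty_party")]

# "Which cats appear anywhere in the document?" (duplicates removed)
cats = sorted({p.text for p in root.findall(".//guilty_party[@species='cat']")})

print(trippers)  # ['Dilly', 'Mephisto']
print(cats)      # ['Dilly', 'Katie', 'Nameless']
```

The more baroque questions (correlations, counts per species) would need additional logic in the host language; the location paths only find the raw material.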
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Antecedents/History
An interesting portion of many W3C specs is the list of non-normative (or simply "other") references at the end. After wading through all the dry prose whose overarching purpose is the removal of ambiguity (sometimes at the expense of clarity and terseness), in this section you get to peek into the minds and personalities of the specs' authors. (The "non-normative" says, in effect, that the resources listed here aren't required reading — although they may have profoundly affected the authors' own thinking about the subject.)
The XPath specification's "other references," without exception, are other formally published standards from the W3C or other (quasi-)official institutions. But XPath, as you will see, is a full-blown standard (the W3C refers to these as "recommendations"). XPointer is still a bit ragged around the edges at the time of this writing, and its non-normative references (Appendix A.2 of the XPointer xpointer( ) Scheme) are consequently more revealing of the background. This is especially useful, because there is some overlap in the membership of the W3C Working Groups (WGs) that produced XPointer and XPath.
Following is a brief look at a few of the most influential historical antecedents for XPath and XPointer.
The Document Style Semantics and Specification Language (DSSSL) was developed as a means of defining the presentation characteristics of SGML documents. Based syntactically on a programming language called Scheme, DSSSL does for SGML roughly what XSLT does for XML: it identifies for a DSSSL processor portions of the structure of an input document and how to behave once those portions are located.
Of particular interest in relation to this book's subject matter is DSSSL's core query language. This is the portion of a DSSSL instruction that locates content of a particular kind in an SGML document. For instance:
(element bottle
   [...instructions...])
tells the processor to follow the steps outlined in [...instructions...] for each occurrence of a bottle element in the source document. You can also navigate to various portions of the source document based on
Additional content appearing in this section has been removed.
XPath, XPointer, and Other XML-Related Specs
It's highly unlikely, if you're at the point of wanting to learn about XPath and XPointer, that you'll be surprised by one ugly reality: everything in XML seems to hinge on everything else. For a brief period of a year or two, you could pick up a couple of general-purpose books on XML and learn everything you needed to know about the subject; that time is long gone.
So let's pretend that XML as a whole is represented graphically by one of Intermedia's global maps. It's a mess, isn't it? There's no way to figure it all out, even if by "it" you just mean that part of it relating to XPath and XPointer — or so it seems. But let's narrow the focus a bit, following the Intermedia Web view's local-map approach.
Let's start with XPath. Successfully getting your mind around XPath currently requires that you have some knowledge of XML itself (including such occasionally overlooked little dark corners as ID-type attributes and whitespace handling). It also requires that you "get" at least the rudiments of XML namespaces.
XPointer is a bit more complicated. First, it's built principally on an XPath foundation. While it's possible to use XPointer with no knowledge at all of XPath, the range of applications in which you can do so is quite limited.
Second, XPointers themselves are used by the XLink standard for linking from one XML resource to another (in the same or a different document). You can come to understand how to use XPointers quite completely without ever actually using them, and hence without any working knowledge of XLink; nonetheless, an elementary grasp of at least basic XLink terminology and syntax is necessary for true understanding.
Third, a couple of XML-related standards — XML Base and the XML Infoset — are referenced by the XPointer spec but don't require that you understand much about them to effectively use XPointer.
Finally, as you will see, an ability to use XPointer depends to a certain extent on a number of non-XML standards (particularly, Internet media types, URIs, and character encodings).
Don't panic; I'll cover what you need to know of these more-obscure standards when the need arises.
Additional content appearing in this section has been removed.
XPath and XPointer Versus XQuery
To get one other important question out of the way immediately: XPath and XPointer are not XQuery. The latter is a recent addition to the (rather crowded) gallery of the W3C's XML-related standards. Its purpose is to provide to XML-based data stores some (ideally all) of the advantages of Structured Query Language (SQL) in the relational-database world. In SQL, a very simple query might look something like this:
SELECT emp_id, emp_name
FROM emp_table
WHERE emp_id = '73519'
As you can see, this comprises a straightforward mix of SQL keywords (shown here in uppercase), the names of relational tables and fields, operators (such as the equals sign), and literal values (such as 73519). The result of "running" such a query is a list, in table form (that is, rows and columns), of data values.
The XQuery form of the above SQL query might look as follows (note in particular the relationship between the above WHERE clause and the boldfaced portion of the XQuery query):
{for $b in document("emp_table.xml")//employee[emp_id = "73519"]
   return
      <p>{ emp_id }{ emp_name }</p>
}
The result of "running" this query is a well-formed XML document or document fragment, such as:
<p>
   <emp_id>73519</emp_id>
   <emp_name>DeGaulle,Charles</emp_name>
</p>
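The XPath-flavored part of the query, the bracketed test on emp_id, can be tried on its own. The following is a sketch only, using Python's standard xml.etree.ElementTree module rather than an XQuery processor. The contents of emp_table.xml are not shown in the text, so the EMP_TABLE document here (including its second employee) is a hypothetical stand-in.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for emp_table.xml; only the first employee
# corresponds to the result shown in the text.
EMP_TABLE = """<emp_table>
   <employee>
      <emp_id>73519</emp_id>
      <emp_name>DeGaulle,Charles</emp_name>
   </employee>
   <employee>
      <emp_id>10001</emp_id>
      <emp_name>Hypothetical,Person</emp_name>
   </employee>
</emp_table>"""

root = ET.fromstring(EMP_TABLE)

# Plays the role of the WHERE clause: keep only employee elements
# that have an emp_id child whose text is "73519".
matches = root.findall(".//employee[emp_id='73519']")

# Plays the role of the SELECT list: pull out the two fields.
rows = [(e.findtext("emp_id"), e.findtext("emp_name")) for e in matches]
print(rows)  # [('73519', 'DeGaulle,Charles')]
```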
XQuery is still wending its way through the sometimes-tortuous route prescribed for all W3C specifications; at the time of this writing, it's still a Working Draft, dated April 2002. A number of controversies swirl about it. First is that, while its equivalent of the SQL WHERE clause is based on XPath, it's not quite XPath as you will come to understand it. (The XPath-based portion of the above XQuery statement is in boldface.) Second, XQuery's approach to returning an XML result from an XML source conflicts with the approach taken by the XSLT spec for the same purpose. And third is the XQuery syntax itself, which though vaguely resembling XML, is not exactly XML. The "meaning" of an XQuery query is bound up not in elements and attributes but in special element text content delimited by curly braces (the
Additional content appearing in this section has been removed.
Chapter 2: XPath Basics
Chapter 1 provided sketchy information about using XPath. For the remainder of the book, you'll get details aplenty. In particular, this chapter covers the most fundamental building blocks of XPath. These are the "things" XPath syntax (covered in succeeding chapters) enables you to manipulate. Chief among these "things" are XPath expressions, the nodes and node-sets returned by those expressions, the context in which an expression is evaluated, and the so-called string-values returned by each type of node.
The Node Tree: An Introduction
You'll learn much more about nodes in this chapter and the rest of the book. But before proceeding into even the most elementary details about using XPath, it's essential that you understand what, exactly, an XPath processor deals with.
Consider this fairly simple document:
<?xml-stylesheet type="text/xsl" href="battleinfo.xsl"?>
<battleinfo conflict="WW2">
   <name>Guadalcanal</name>
   <!-- Note: Add dates, units, key personnel -->
   <geog general="Pacific Theater">
      <islands>
         <name>Guadalcanal</name>
         <name>Savo Island</name>
         <name>Florida Islands</name>
      </islands>
   </geog>
</battleinfo>
As the knowledgeable human eye — or an XML parser — scans this document from start to finish, it encounters signals that what follows is an element, an attribute, a comment, a processing instruction (PI), whatever. These signals are of course the markup in the document, such as the start and end tags delimiting the elements.
XPath functions at a higher level of abstraction than this simple kind of lexical analysis, though. It doesn't know anything about a document's tags and thus can't communicate anything about them to a downstream application. What it knows about, and knows about intimately, are the nodes that make up the document: the discrete chunks of information encapsulated within and among the markup. Furthermore, it recognizes that these chunks of information bear a relationship to one another, a relationship imposed on them by their physical arrangement within the document (such as the successively deeper nesting of elements within one another). Figure 2-1 illustrates this node-tree view of the above document as seen by XPath.
Figure 2-1: Above XML document represented as a tree of nodes
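The node-tree view can also be approximated in code. The sketch below walks the sample document with Python's standard xml.dom.minidom module and lists each node with its depth. Two assumptions to keep in mind: the DOM is not identical to the XPath data model, and whitespace-only text nodes (which XPath does count as nodes) are skipped here purely to keep the listing short. The walk function and KIND table are names invented for this example.

```python
from collections import Counter
from xml.dom import Node, minidom

DOC = """<?xml-stylesheet type="text/xsl" href="battleinfo.xsl"?>
<battleinfo conflict="WW2">
   <name>Guadalcanal</name>
   <!-- Note: Add dates, units, key personnel -->
   <geog general="Pacific Theater">
      <islands>
         <name>Guadalcanal</name>
         <name>Savo Island</name>
         <name>Florida Islands</name>
      </islands>
   </geog>
</battleinfo>"""

KIND = {
    Node.DOCUMENT_NODE: "root",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing-instruction",
}

def walk(node, depth=0):
    """Yield (depth, kind, label) for each node, in document order."""
    if node.nodeType == Node.TEXT_NODE and not node.data.strip():
        return  # assumption: hide whitespace-only text nodes for brevity
    if node.nodeType in (Node.TEXT_NODE, Node.COMMENT_NODE):
        label = node.data.strip()
    else:
        label = node.nodeName
    yield depth, KIND[node.nodeType], label
    if node.nodeType == Node.ELEMENT_NODE:
        # Attributes hang off their element but are not its children.
        for i in range(node.attributes.length):
            attr = node.attributes.item(i)
            yield depth + 1, "attribute", f"{attr.name}={attr.value}"
    for child in node.childNodes:
        yield from walk(child, depth + 1)

nodes = list(walk(minidom.parseString(DOC)))
for depth, kind, label in nodes:
    print("   " * depth + f"{kind}: {label}")

counts = Counter(kind for _, kind, _ in nodes)
```

With whitespace-only text left out, the listing has one root, one processing instruction, one comment, seven elements, two attributes, and four text nodes.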
There are a few things to note about the node tree depicted in Figure 2-1:
  • First, there's a hierarchical relationship among the different "things" that make up the tree. Of course, all the nodes are contained by the document itself (represented by the overall figure). Furthermore, many of the nodes have "offshoot" nodes. The
Additional content appearing in this section has been removed.
XPath Expressions
If you've never worked with XPath before, you may be expecting its syntax to be XML-based. That's not the case, though. XPath is not an XML vocabulary in its own right. You can't submit "an XPath" to an XML parser — even a simple well-formedness checker — and expect it to pass muster. That's because "an XPath" is meant to be used as an attribute value.
Chapter 1 discussed why using XML syntax for general-purpose languages, such as XPath and XPointer, is impractical. As mentioned there, the chief reason might be summed up as: such languages are needed in the context of special-purpose languages, such as XSLT and XLink. Expressing the general-purpose languages as XML would both make them extremely verbose and require the use of namespaces, complicating inordinately what is already complicated enough.
"An XPath" consists of one or more chunks of text, delimited by any of a number of special characters, assembled in any of various formal ways. Each chunk, as well as the assemblage as a whole, is called an XPath expression.
Here's a handful of examples, by no means comprehensive. (Don't fret; there are more detailed examples aplenty throughout the rest of the book.)
taxcut
Locates an element, in some relative context, whose name is "taxcut"
/
Locates the document root of some XML instance document
/taxcuts
Locates the root element of an XML instance document, only if that element's name is "taxcuts"
/taxcuts/taxcut
Locates all child elements of the root taxcuts element whose names are "taxcut"
2001
The number 2001
"2001"
The string "2001"
/taxcuts/taxcut[attribute::year="2001"]
Locates all child elements of the root taxcuts element, as long as those child elements are named "taxcut" and have a year attribute whose value is the string "2001"
/taxcuts/taxcut[@year="2001"]
Abbreviated form of the preceding
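A few of these expressions can be exercised with Python's standard xml.etree.ElementTree module. This is a sketch under two assumptions. First, the taxcuts document is hypothetical, invented to fit the element and attribute names in the examples. Second, ElementTree supports only a subset of XPath: it has no attribute:: axis syntax and expects paths relative to an element, so /taxcuts/taxcut becomes taxcut evaluated from the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document matching the names used above.
TAXCUTS = """<taxcuts>
   <taxcut year="2001">rate reduction</taxcut>
   <taxcut year="2001">rebate</taxcut>
   <taxcut year="2003">dividend cut</taxcut>
</taxcuts>"""

root = ET.fromstring(TAXCUTS)

# /taxcuts : the root element, only if it is named "taxcuts"
root_is_taxcuts = root.tag == "taxcuts"

# /taxcuts/taxcut : all taxcut children of the root element
all_cuts = root.findall("taxcut")

# /taxcuts/taxcut[@year="2001"] : the same children, filtered on the
# string value "2001" of their year attribute
cuts_2001 = root.findall("taxcut[@year='2001']")

print(len(all_cuts), [c.text for c in cuts_2001])
```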
Additional content appearing in this section has been removed.
XPath Data Types
A careful reading of the previous material about XPath expressions should reveal that XPath is capable of processing four data types: string, numeric, Boolean, and nodes (or node-sets).
The first three data types I'll address in this section. Nodes and node-sets are easily the most important single XPath data type, so I've relegated them to a complete section in their own right, following this one.
You can find two kinds of strings, explicit and implicit, in nearly any XPath expression. Explicit (or literal) strings, of course, are strings of characters delimited by quotation marks. Now, don't get confused here. As I've said, XPath expressions themselves appear as attribute values in XML documents. Therefore, an expression as a whole will be contained in quotation marks. Within that expression, any literal strings must be contained in embedded quotation marks. If the expression as a whole is contained in double quotation marks, ", then a string within it must be enclosed in single quotation marks or apostrophes: '. If you prefer to enclose attribute values in single quotes, the embedded string(s) must appear in double quotes.
This nesting of literal quotation marks and apostrophes — or vice versa — is unnecessary, strictly speaking. If you prefer, you can escape the literals using their entity representations. That is, the expressions "a string" and &quot;a string&quot; are functionally identical. The former is simply more convenient and legible.
For example, in XSLT stylesheets, one of the most common attributes is select, applied to the xsl:value-of element (which is empty) and others. The value of this attribute is an XPath expression. So you might see code such as the following:
<xsl:value-of select="fallacy[type='pathetic']"/>
If the string "pathetic" were not enclosed in quotation marks, of course, it would be considered a node name rather than a string. (This might make sense in some contexts, but even in those contexts, it would almost certainly produce quite different results from the quoted form.) Note that the kind of quotation marks used in this example alternates between single and double as the quoted matter is nested successively deeper.
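To see that the nested-quote and entity-escaped forms end up as equivalent expressions, the following sketch parses both spellings of the select attribute with Python's standard xml.etree.ElementTree module and applies each to a small source document. The fallacies document is hypothetical, and ElementTree's limited XPath support stands in for a real XSLT processor here.

```python
import xml.etree.ElementTree as ET

XSL_NS = 'xmlns:xsl="http://www.w3.org/1999/XSL/Transform"'

# Same instruction, two ways of quoting the literal string.
nested = ET.fromstring(
    f"""<xsl:value-of {XSL_NS} select="fallacy[type='pathetic']"/>""")
escaped = ET.fromstring(
    f"""<xsl:value-of {XSL_NS} select="fallacy[type=&quot;pathetic&quot;]"/>""")

# After XML parsing, the entity references are gone; only the kind of
# quotation mark around the literal differs.
expr_nested = nested.get("select")    # fallacy[type='pathetic']
expr_escaped = escaped.get("select")  # fallacy[type="pathetic"]

# Hypothetical source document for the expressions to run against.
source = ET.fromstring(
    "<fallacies>"
    "<fallacy><type>pathetic</type></fallacy>"
    "<fallacy><type>logical</type></fallacy>"
    "</fallacies>")

hits_nested = source.findall(expr_nested)
hits_escaped = source.findall(expr_escaped)
print(hits_nested == hits_escaped)  # True: both select the same element
```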
Additional content appearing in this section has been removed.
Nodes and Node-Sets
The fourth and most important data type handled by XPath is the node-set data type.
Let's look first at nodes themselves. A node is any discrete logical something able to be located by an XPath location step. Every element in a document constitutes a node, as does every attribute, PI, and so on.
Each node in a document has various properties. I've discussed one of these properties briefly already — the string-value — and will provide more information about it at the end of this chapter. The others are its name, its sequence within the document, and its "family relationships" with other nodes.

Section 2.4.1.1: Node names

Most (but not all) nodes have names. To understand node names, you need to understand three terms:
qualified name
This term, almost always contracted to "QName," is taken straight from the W3C "Namespaces in XML" spec, at http://www.w3.org/TR/REC-xml-names. The QName of a node, in general, is the identifier for the node as it actually appears in an instance document, including any namespace prefix. For example, an element whose start tag is <concerto> has a QName of "concerto"; if the start tag were <mml:concerto>, the QName would be "mml:concerto."
local-name
The local-name of a node is its QName, sans any namespace prefix. If an element's QName is "mml:concerto," its local-name is "concerto." If there's no namespace in effect for a given node, its QName and local-name are identical.
expanded-name
If the node is associated with a particular namespace, its expanded-name is a pair, consisting of the URI associated with that namespace and the local-name. Because the expanded-name doesn't consider the namespace prefix at all, two elements, for example, can have the same expanded-name even if their QNames are different, as long as both their associated namespace URIs (possibly null) and their local-names are identical. For more information, see Expanded but Elusive later in this chapter.
These three kinds of name conform to common sense in most cases, for most nodes, but can be surprising in others. When covering node types, below, I'll tell you how to determine the name of a node of a given type.
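The distinction between QNames and expanded-names can be observed in Python's standard xml.etree.ElementTree module, which discards prefixes and stores each element's name in the form {uri}local-name. In this sketch the namespace URI, the m2 prefix, and the works wrapper element are all hypothetical; expanded_name is a helper written for the example.

```python
import xml.etree.ElementTree as ET

# Two prefixes bound to the same (hypothetical) namespace URI,
# plus one element in no namespace.
DOC = """<works xmlns:mml="http://example.org/music"
       xmlns:m2="http://example.org/music">
   <mml:concerto/>
   <m2:concerto/>
   <concerto/>
</works>"""

def expanded_name(element):
    """Split ElementTree's '{uri}local' tag into a (uri, local-name) pair."""
    tag = element.tag
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag  # no namespace in effect

first, second, third = (expanded_name(child) for child in ET.fromstring(DOC))

print(first == second)  # True: different QNames, same expanded-name
print(first == third)   # False: same local-name, different namespace
```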
Additional content appearing in this section has been removed.
Node-Set Context
It's hard to imagine a node in an XML document that exists in isolation, devoid of any context. First, as I've already mentioned, nodes have relationships with other nodes in the document — both document-order and "family" relationships. Maybe more importantly, but also more subtly, nodes in the node-set returned by a location path also have properties relative to the other nodes in that node-set — even when document order is irrelevant and family relationships, nonexistent. These are the properties of context size, context position, and namespace bindings.
Consider the following XML document:
<ChangeInMyPocket>
   <Quarters quantity="1"/>
   <Dimes quantity="1"/>
   <Nickels quantity="1"/>
   <Pennies quantity="3"/>
   <!-- No vending-machine purchase in my immediate future -->
</ChangeInMyPocket>
It's possible, in a single location path, to locate only the four quantity attributes (or any three, two, or one of them) and the comment; or just the root node and the comment; or just the Quarters element, the Pennies element, and the quantity attribute of the Nickels element. The nodes in the resulting node-set need not share any significant formal relationship in the context of the document itself. But in all cases, these nodes suddenly acquire relationships to others in a given node-set, simply by virtue of their membership in that node-set.
The context size is simply the number of nodes in the node-set, irrespective of the type of nodes. A location path that returned a node-set of all element nodes in the above document would have a context size of 5 (one for each of the ChangeInMyPocket, Quarters, Nickels, Dimes, and Pennies elements). A location path returning all the quantity attributes and the comment would also have a context size of 5.
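The two context sizes just described can be checked with a short sketch in Python's standard xml.etree.ElementTree module. Because ElementTree has no attribute or comment node objects that a path can return, the node-sets are assembled by hand as lists of labels; that representation is an assumption of this example, not how an XPath processor works internally. Keeping comments in the tree requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

DOC = """<ChangeInMyPocket>
   <Quarters quantity="1"/>
   <Dimes quantity="1"/>
   <Nickels quantity="1"/>
   <Pennies quantity="3"/>
   <!-- No vending-machine purchase in my immediate future -->
</ChangeInMyPocket>"""

# Ask the tree builder to keep comments (dropped by default).
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(DOC, parser=parser)

# Node-set 1: every element node in the document.
elements = [n for n in root.iter() if isinstance(n.tag, str)]

# Node-set 2: the quantity attributes plus the comment, in document order.
attributes = [f"{el.tag}/@quantity" for el in elements if "quantity" in el.attrib]
comments = ["comment()" for n in root.iter() if n.tag is ET.Comment]
node_set = attributes + comments

context_size = len(node_set)
# Context position is 1-based: "node X of Y".
for position, label in enumerate(node_set, start=1):
    print(f"{position} of {context_size}: {label}")
```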
The context position is different for every member of the node-set: it's the integer representing the ordinal position that a given node holds in the node-set, relative to all other nodes in it, in document order. If a node-set consists of all child elements of the ChangeInMyPocket element, the
Additional content appearing in this section has been removed.
String-Values
By definition, a well-formed XML document is a text document, incapable of containing such "binary" content as multimedia files and images. Thus, it stands to reason that in navigating XML documents via XPath the strings of text that make up the bulk of the document (aside from the element names themselves) would be of supreme importance. This notion is codified in the concept of string-values. And the importance of string-values lies in the fact that most of the time, when you locate a node in a document via XPath, what you're after is not the node per se but rather its string-value.
Each node returned by a location path has its own string-value. The string-value of a node depends on the node type, as summarized in Table 2-1. Note that the word "normalized" used to describe the string-value for the attribute node type is the same as elsewhere in the markup world: it means stripped of extraneous whitespace, by trimming leading and trailing whitespace and collapsing multiple consecutive occurrences of whitespace into a single space. For example, given an attribute such as region="   NW    SE" (note leading blank spaces and multiple spaces between the "NW" and "SE"), its normalized value would be "NW SE". Also note, though, that this normalization depends on the attribute's type, as declared in a DTD or schema. If the attribute type is CDATA, those interior blank spaces would be assumed to be significant and not normalized. Therefore, if the region attribute is (say) of type NMTOKENS, the interior whitespace is collapsed; if it's CDATA, the whitespace remains.
Table 2-1: String-values, by node type
Node type | String-value
Root | Concatenated value of all text nodes in the document
Element | Concatenated value of all text nodes within the scope of the element's start and end tags, including the text nodes contained by any descendant elements
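The element row of the table can be sketched in Python's standard xml.etree.ElementTree module, where joining the results of itertext() concatenates every text node inside an element, descendants included. The string_value helper is a name invented for this example, and the document is a condensed version of the battleinfo sample written without indentation so that no whitespace-only text nodes enter the result.

```python
import xml.etree.ElementTree as ET

# Condensed form of the geog element from the battleinfo sample.
GEOG = ET.fromstring(
    '<geog general="Pacific Theater">'
    "<islands>"
    "<name>Guadalcanal</name>"
    "<name>Savo Island</name>"
    "<name>Florida Islands</name>"
    "</islands>"
    "</geog>")

def string_value(element):
    """Concatenate all text nodes within the element, in document order."""
    return "".join(element.itertext())

geog_value = string_value(GEOG)
first_name_value = string_value(GEOG.find("islands/name"))

print(geog_value)        # GuadalcanalSavo IslandFlorida Islands
print(first_name_value)  # Guadalcanal
```

Note that the value of the general attribute does not appear in the result: an element's string-value is built from text nodes only.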
Additional content appearing in this section has been removed.
Chapter 3: Location Steps and Paths
In Chapter 2, I covered the kinds of content XPath is capable of locating: essentially, any content at all in an XML document. Now it's time to take a look at how exactly you locate it — a look, in short, at XPath syntax.
XPath Expressions
As earlier chapters (notably Chapter 1) have explained, knowing XML's own syntax does not prepare you for knowing XPath syntax. Unlike the languages that make use of XPath, XPath itself is not an XML vocabulary. A given "XPath" doesn't contain all the characteristic left and right angle brackets, ampersands, and other hallmarks of XML syntax dear (or not) to your heart from your other XML work.
Instead, units of XPath meaning, called expressions, are typically used in attribute values. Thus you'll be creating and using XML code that uses these expressions in ways such as:
<xsl:value-of select="expression"/>
and:
<a xlink:href="xpointer(expression)">Table of Contents</a>
Sometimes, when you see the term XPath expression, what's being referred to is simply a speck of meaning — a subatomic particle, as it were, that has a sort of abstract academic interest but little practical value by itself. This sort of expression is a string or numeric value. For instance, both of the following are valid XPath expressions in this limited sense:
"I should have been a pair of ragged claws"
119.725
In the real world of XPath, though, such literal expressions are pretty pointless. If you locate the literal string "I should have been a pair of ragged claws," you simply locate that string — outside the context of an XML document or, for that matter, devoid of any context at all. XPath expressions are meant primarily to locate content in context. The most familiar real-world analogy for the syntax to accomplish this is a computer's filesystem or a web server's directory structure.
Although I probably sounded scornful just now of literal-valued XPath expressions, don't write them off. The ability to "find" a literal value (instead of a chunk of content in the source document) is actually quite useful. You'll see many examples later in this chapter, particularly in the section on the predicate portion of an XPath expression. There, you'll learn how to locate a particular node (represented by a location path) when its value equals, say, some particular literal value. There's no way to represent the righthand side of this equation other than with a literal XPath expression. The point is merely that locating the literal value itself is absurd.
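To make the point about literals in predicates a little more concrete, here is a minimal sketch in Python. It assumes the standard library's xml.etree.ElementTree module, which understands only a small subset of XPath, and it uses a cut-down copy of the Chapter 1 pet-hazards document; neither choice comes from the XPath spec itself. The string 'cat' is the literal on the right-hand side of the comparison.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the Chapter 1 sample document (one hazard only).
DOC = """\
<house_pet_hazards>
   <hazard type="cleanup">
      <name>miscellaneous post-ingestion surprises</name>
      <guilty_party species="cat">Dilly</guilty_party>
      <guilty_party species="cat">Katie</guilty_party>
      <guilty_party species="dog">Kianu</guilty_party>
      <guilty_party species="snake">Mephisto</guilty_party>
   </hazard>
</house_pet_hazards>
"""

root = ET.fromstring(DOC)

# The location path lives in an ordinary string, much as XSLT keeps it in an
# attribute value. The predicate compares an attribute with the literal 'cat'.
expression = "hazard/guilty_party[@species='cat']"
cats = [node.text for node in root.findall(expression)]
print(cats)
```

The literal is useless as something to "locate," but the predicate could not be written without it.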
Additional content appearing in this section has been removed.
Location Paths
"Understanding" a location path and how to code it requires no great intellectual leap. If you know how to walk a filesystem directory tree, separating each level in the navigation from the others with slashes, you already grasp the rudiments of location paths. Still, you need to keep a few points in mind.
Chapter 2 discussed context, particularly the notion that each node in a given node-set shares with all the others a context size, and has its own context position within that size — the "Node X of Y" notion.
More subtly, using a multilevel location path imposes a successively finer sieve of context on each level in the path. Consider:
/customers/customer/invoice/item/quantity
As you move to the right in this location path, you're not only "walking down" into the document's nether regions, you're also almost certainly excluding from consideration various portions of the document not of interest at the moment. That is, each level in the location path implicitly changes the context node in terms of which levels to the right will be evaluated. Figure 3-1 illustrates this process.
Figure 3-1: Filtering content via successive steps in a location path
The full location path can be decomposed into five location steps, each separated from the others by slashes; each step narrows the view already established by those that preceded it. Step 1 limits the selection to the root customers element, and step 2, to the customer elements that are children of that root element.
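The step-by-step narrowing can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the way a real XPath processor is built: it uses the standard library's xml.etree.ElementTree, and the sample data is invented (the figure's actual document isn't reproduced here). The only detail borrowed from the text is that the three customers have two, one, and no invoices.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the first customer has two invoices, the second
# has one, and the third has none.
DOC = """\
<customers>
   <customer>
      <cust_info>First</cust_info>
      <invoice>
         <item><quantity>2</quantity></item>
         <item><quantity>1</quantity></item>
      </invoice>
      <invoice>
         <item><quantity>5</quantity></item>
      </invoice>
   </customer>
   <customer>
      <cust_info>Second</cust_info>
      <invoice>
         <item><quantity>3</quantity></item>
      </invoice>
   </customer>
   <customer>
      <cust_info>Third</cust_info>
   </customer>
</customers>
"""

# ElementTree has no node for the document root, so wrap the root element in
# a stand-in parent; the first step can then select it as a child.
document = ET.Element("document-root")
document.append(ET.fromstring(DOC))

context = [document]
counts = []
for step in "customers/customer/invoice/item/quantity".split("/"):
    # Each step is evaluated once for every node the previous step selected.
    context = [child for node in context for child in node.findall(step)]
    counts.append((step, len(context)))

for step, count in counts:
    print(step, count)
```

Each pass through the loop replaces the list of context nodes with whatever the next step selects from them, so a customer with no invoice children simply contributes nothing from that point on.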
So far, there's been no filtering at all occurring; every element down to this level in this sample document is still visible. In step 3, though, something interesting happens: the location path selects the invoice children of each customer element. The first and second customer elements have two and one such children, respectively; the third customer element has no invoice children, and as a result this customer effectively drops out of consideration as a match for further location steps. For purposes of content retrieved by this location path, in other words,
Additional content appearing in this section has been removed.
Location Steps
Location paths are interesting on a grand, macroscopic level. But (at least to my way of thinking) they're essentially unsophisticated, blunt instruments for extracting content from an XML document. All the real action in XPath is found between the slashes in a full location path — in the location steps.
The location steps you've seen so far in this chapter have been extremely simple. They've walked you down into the given XML source document by way of the tree of element nodes, and element nodes only, and only those element nodes with specific names. Easy enough to understand, perhaps (not to deny the value of understandability!), and also arguably the most common sort of location step, but not particularly eye opening. In fact, these elementary location steps have simply taken advantage of various default values and shortcuts for parts of the full location step syntax: the axis, the node test, and the predicate. This syntax is:
axis::nodetest[predicate]
As you will see later in this chapter, it's possible for a location step to include multiple predicates — one after the other or even nested.
Of the three components that a location step may contain, only the node test is required. If you omit the axis, you also omit the double colon (::) that delimits it from the node test. If you omit the predicate, you also omit the square brackets ([ and ]) that enclose it.
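If it helps to see the three pieces pulled apart, the following Python sketch splits a single location step into its axis, node test, and predicates. The parse_step function is hypothetical, written only for this illustration. It supplies child as the axis when none is given (the default in XPath 1.0), and it deliberately ignores complications such as the @ abbreviation and square brackets inside quoted strings.

```python
def parse_step(step):
    """Split one location step into (axis, node_test, predicates)."""
    # Everything before the first '[' is the optional axis plus the node test.
    head, bracket, tail = step.partition("[")
    if "::" in head:
        axis, node_test = head.split("::", 1)
    else:
        axis, node_test = "child", head  # no axis given: child is the default

    # Collect each top-level [...] group; nested brackets stay inside it.
    predicates, current, depth = [], "", 0
    for char in bracket + tail:
        if char == "[":
            if depth > 0:
                current += char
            depth += 1
        elif char == "]":
            depth -= 1
            if depth == 0:
                predicates.append(current)
                current = ""
            else:
                current += char
        elif depth > 0:
            current += char
    return axis, node_test, predicates


for step in ("customer", "child::invoice[item][1]", "sign[name[@type='alt']]"):
    print(parse_step(step))
```

The second step carries two predicates one after the other; the third carries a single predicate with another nested inside it.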
Before getting into the details of these three pieces of a location step, let's take a look at their general purposes.
A common misconception about microscopes, magnifying glasses, telescopes, and binoculars is that they enlarge the image presented to our eyes from some object or other in the real world. Actually, they narrow the field of vision (assuming you're looking into them the right way); the image presented to our eyes always stays the same size. Armed with this information, take a look at Figure 3-2.
Figure 3-2: Narrowing the field of vision: "seeing" just boats with sails in a particular direction
Here, you're standing on the rock at the end of a jetty projecting out into a bay, binoculars held to your eyes. Everything outside the field of vision doesn't "exist" for you as long as you're looking through the lenses: the boats on the water behind you and to either side, the colony of seals on the rocks. You have, in effect, no peripheral vision.
Additional content appearing in this section has been removed.
Compound Location Paths Revisited
Early in this chapter, I mentioned that you could join together multiple location paths into a single one, using the "pipe" character, |. In that section, you saw this example:
/customers/customer/invoice | /customers/customer/cust_info
For any given location path, you could say that any given location step shifts the context in which succeeding location steps are evaluated. Thus, for the first location path in this compound location path, the context is narrowed first to the root customers element (thereby excluding from consideration any content in the document that precedes or follows the root element), then to customer children of the root element, and finally to invoice children of those customer children.
What, you might reasonably wonder, happens to the context in succeeding location paths of a compound location path? Every constituent location path is considered independently — just as if it were the only location path. Obviously, if the location path is absolute (as in the example from early in the chapter just repeated here), its context node is immaterial. If the location path is relative, it is evaluated relative to whatever the context node is at that point, disregarding any shifts in context effected by preceding portions of the compound location path. Consider this example:
invoice | cust_info
The first location path selects all invoice children of the context node for the compound location path as a whole. Likewise, the second selects all cust_info children of the context node for the compound location path as a whole — not all cust_info children of the invoice elements selected by the first location path. The results of the two selections are simply unioned together into a single node-set.
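Here is a rough Python sketch of that behavior. The standard library's xml.etree.ElementTree doesn't support the | operator, so the hypothetical union function below imitates it: each relative path is evaluated against the same context node, and the results are merged with no node appearing twice. The sample document is invented, and returning the merged nodes in document order is a convenience of this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical context node: a customer with one cust_info child, plus a
# second cust_info tucked inside an invoice.
DOC = """\
<customer>
   <cust_info>Acme Pets</cust_info>
   <invoice><cust_info>billing copy</cust_info></invoice>
   <invoice/>
</customer>
"""


def union(context, *paths):
    """Evaluate each relative path against the same context node and merge."""
    selected = []
    for path in paths:
        for node in context.findall(path):
            # A node-set holds each node at most once.
            if not any(node is kept for kept in selected):
                selected.append(node)
    # Report the merged nodes in document order.
    order = {id(node): index for index, node in enumerate(context.iter())}
    return sorted(selected, key=lambda node: order[id(node)])


customer = ET.fromstring(DOC)
result = union(customer, "invoice", "cust_info")
print([node.tag for node in result])
print([node.text for node in result if node.tag == "cust_info"])
```

The cust_info nested inside the first invoice is not selected, because the second path starts over from the customer element rather than from the invoices found by the first path.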
Along these lines, also note that each constituent location path may employ "stacked" predicates (as discussed earlier in the chapter), compound predicates, or any other location path variations. Predicates used in location path A have no effect on those in location path B and vice versa.
Additional content appearing in this section has been removed.
Chapter 4: XPath Functions and Numeric Operators
The XPath 1.0 Recommendation specifies a number of functions and numeric operations that can be used to refine the results returned by an XPath expression.
Before getting into the details of these features' uses, let's take a look at a fundamental question: what are functions in the first place? (If you're already familiar with the use of functions in programming languages, such as Java, C++, and Visual Basic, feel free to skip this section.)
Introduction to Functions
When I was a kid, I loved watching my father work on cars. He'd been a mechanic all his life, and the automotive toolkit he'd acquired over the course of the years was exotic (to my eyes, anyhow).
One of the smaller items in Dad's toolkit was something he called a "spark-plug gapper." It was something like a Swiss-Army knife, with a half-dozen or so stiff steel prongs that you could swivel out from the tool's main body. Each L-shaped prong was of a slightly different thickness; depending on the model of car you were working on and the specific spark plug's specifications, you'd tap the end of the spark plug on the pavement and, using the gapper, ensure that the distance across which the spark was to jump was just right. There was also a small, stiff plane of sheet metal attached to the gapper, which you could use to spread the gap if you'd already closed it up too much. The objective was to get the gap just right, to ensure that the spark plug fired in just exactly the right way.
A function in computer-language terms is like a spark-plug gapper. It's a tool provided by a software developer. You use the tool in the same general way for a given task, whenever you need to obtain some result you can't obtain (or obtain easily) without the tool.
Almost without exception, regardless of the computer language in question, functions are represented syntactically the same way:
function_name(arg1, ...)
Each function (like each tool in a mechanic's toolbox) has a distinct name. Depending on the function, you may pass one or more arguments to it, which change its behavior in various ways. The arguments are enclosed in parentheses. Thus, the spark-plug gapper might be represented like this:
gapper(prong)
where gapper is the function name and prong, a single argument provided (or "passed") to the function. Under many circumstances, you wouldn't pass a function like this the literal token p, r, o, n, g; rather, this is just a placeholder, a reminder to you of what you do pass to it. In this form, the function syntax is called a prototype. When you actually use (or "call" or "invoke") a function, you typically substitute a literal value for each argument. So an actual call to our hypothetical
Additional content appearing in this section has been removed.
XPath Function Types
The functions available for use in XPath expressions are categorized by the type of value they return and/or by the type of values they operate on as arguments. These categories are node-set, string, Boolean, and numeric functions.
In each of the function prototypes in this section, I'll use the following scheme to denote the kind of arguments passed:
string
Argument is a string value, to be enclosed in quotation marks in the function call. If a function call takes more than one string argument, I'll append a number to each, as in string1, string2, and so on.
nodeset
Argument is a node-set, represented by an XPath location path. Note that if you're using XPath in an XSLT stylesheet, this location path will (if it's a relative path) be sensitive to the context established by the stylesheet at that point. Whether you're using XPath in XSLT or an XPointer, earlier portions of a complete location path can of course establish a context for node-set references in later portions.
boolean
Argument has a Boolean value of true or false.
number
Argument has a numeric value. If a function call takes more than one numeric argument, I'll append a number to each, as in number1, number2, and so on.
anytype
Argument can be any of several types. For instance, you can pass certain functions a string or a numeric argument, and the function will handle any necessary data-type conversion.
?
A question mark appended to one of the above data types means the argument is optional. For instance, a call to a hypothetical my_func( ) function might come with a prototype such as my_func(string?). This would mean that when you call
Additional content appearing in this section has been removed.
XPath Numeric Operators
XPath includes a set of numeric operators for performing basic arithmetic operations. Don't go looking for net-present-value or square-root operators; they don't exist. But if you simply need to add, subtract, multiply, divide, or find a remainder of two numeric values, here's your answer. Table 4-5 summarizes these numeric operators.
Table 4-5: XPath numeric operators
Operator  Description                                                Example
+         Adds two values                                            (//weight)[1] + (//weight)[2]
-         Subtracts one value from another                           (//weight)[3] - (//weight)[1]
*         Multiplies one value times another                         (//weight)[3] * 5
div       Divides one value by another                               (//weight)[3] div 1016.0469
mod       Returns the remainder after dividing one value by another  (//weight)[3] mod 1016.0469
Most of these are straightforward and need no further comment; the div and mod operators, however, could use a bit more explanation.
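As a rough check on what these operators compute, the Python sketch below mirrors the table's examples. It rests on several assumptions: the shipment document and its weights are invented, the standard library's xml.etree.ElementTree does the selecting (it can't evaluate XPath arithmetic itself), and Python's own arithmetic stands in for XPath's. Ordinary floating-point division plays the part of div, and math.fmod plays the part of mod, since XPath 1.0 defines mod as the remainder of a truncating division.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical sample data with three weight elements.
DOC = """\
<shipment>
   <crate><weight>10</weight></crate>
   <crate><weight>20</weight></crate>
   <crate><weight>2500</weight></crate>
</shipment>
"""

root = ET.fromstring(DOC)

# (//weight)[n] is the nth weight element in document order; XPath counts
# from 1, Python lists from 0.
weights = [float(node.text) for node in root.findall(".//weight")]

total = weights[0] + weights[1]                # (//weight)[1] + (//weight)[2]
difference = weights[2] - weights[0]           # (//weight)[3] - (//weight)[1]
product = weights[2] * 5                       # (//weight)[3] * 5
quotient = weights[2] / 1016.0469              # (//weight)[3] div 1016.0469
remainder = math.fmod(weights[2], 1016.0469)   # (//weight)[3] mod 1016.0469

print(total, difference, product)
print(round(quotient, 4), round(remainder, 4))

# With a negative left operand, math.fmod keeps that operand's sign (as
# XPath's mod does), whereas Python's % operator follows the right operand.
print(math.fmod(-5, 2), -5 % 2)
```

The last line is there because the two remainder conventions part ways only when the operands have different signs, which is easy to overlook when testing with positive numbers.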
Why use a special div operator at all? Why not just use the more familiar forward slash character, /, to divide one value by another?
The answer is that a slash in an XPath expression is already freighted with meaning: it operates as a delimiter between location steps. (A good analogy, in XML terms, might be the required use of entity references, such as
Additional content appearing in this section has been removed.
Chapter 5: XPath in Action
Taken on its own terms, as a teaching tool, XPath might not seem to meet the test for a practical standard: it's useful only in the context of some other standard. How do you demonstrate something like XPath without requiring the novice to learn that other standard as well? Luckily, several tools have emerged to simplify this task. These tools allow you to enter and modify an XPath expression, typically a full location path, and return to you in some highlighted form a selected portion of a target document. (The portion in question might or might not be contiguous, of course, depending on how exotic the location path is.) In this chapter, I'll demonstrate XPath using a tool called XPath Visualiser, developed by Dimitre Novatchev.
XPath Visualiser can be downloaded from the VBXML site, at http://www.vbxml.com.
XPath Visualiser: Some Background
XPath Visualiser runs under Microsoft Windows, from Windows 95 on up, and is built on top of the Microsoft MSXML XML/XSLT processor included with the Internet Explorer browser. This operating environment for the tool implies some advantages and disadvantages to its use.
An important practical advantage of this tool is that the results are visual. As we go through the examples in this chapter, you'll be able to see instantly the effects, subtle or grand, of changes in XPath expressions. (You don't even need to use Windows, let alone XPath Visualiser itself, because all these effects are captured in screen shots for you.) Trying to explain verbally what an XPath expression "does" is a convenient way to extend a book's length, but it's not simple, and it's prone to misinterpretation. (A picture of an XPath expression is worth a thousand words of description.)
Next, because XPath Visualiser uses a current version of the MSXML processor, its "understanding" of the XPath Recommendation is complete. If an expression is legal under the terms of that standard, you can illustrate it with XPath Visualiser.
Interestingly, though, a significant disadvantage of using XPath Visualiser is also that it's based on MSXML. That's because MSXML supports not only the current versions of XPath and XSLT, but also an early version of XSLT (called plain-old XSL). I described this early version in Chapter 1 and Chapter 2. Among the differences in this "backward-compatible" XSL processor is that it included numerous Microsoft-only capabilities; for example, you could use their version of what became XPath to select a valid document's document type declaration. (Note that this isn't a problem with XPath Visualiser itself, which deals only with true-blue XPath; it may be something to consider if you're planning to use MSXML for other purposes of your own.)
XPath Visualiser is not a "program" per se. It's a plain-old frames-based set of HTML documents and a customized version of Microsoft's default XSL(T) stylesheet, which work only when viewed through Internet Explorer Versions 5 and up. (More precisely, it works only with MSXML Versions 3 and up. Internet Explorer 5 and 5.5 do not come with MSXML 3, although you could download and install MSXML 3 to run under them. Internet Explorer 6 comes with the next version of MSXML, Version 4.) Figure 5-1 shows a portion of how the browser window appears when you first open this frameset.
Additional content appearing in this section has been removed.
Sample XML Document
To keep a consistent base for all the example location paths in this chapter, I'll refer to the same XML source document. This document is short but contains at least one of every XPath node type:
<!-- Basic astrological data for T's and J's signs -->
<?xml-stylesheet type="text/xsl" href="astro.xsl"?>
<astro xmlns:xlink="http://www.w3.org/1999/xlink">
   <sign start-date="12-22" end-date="01-20">
      <name type="main">Capricorn</name>
      <name type="alt">The Sea-Goat</name>
      <!-- capricorn.gif corresponds to Unicode 3.0 #x2651 -->
      <symbol xlink:type="simple" xlink:href="capricorn.gif"/>
      <ruling_planet>Saturn</ruling_planet>
      <ruling_planet>Earth</ruling_planet>
      <energy>Feminine</energy>
      <quality>Cardinal</quality>
      <anatomy>
         <part>Bones</part>
         <part>Knees</part>
      </anatomy>
   </sign>
   <sign