Though controversial, XML namespaces are a necessity if you want to manage XML documents in the wild. This hack gets into some of the nitty-gritty of namespaces so you can more easily untangle them.
In January 1999, the W3C published its Namespaces in XML recommendation (http://www.w3.org/TR/REC-xml-names/), about a year after the XML recommendation arrived. There were hints of namespaces in the original XML spec, evidenced by suggestions about the use of colons, but that was about it. On the surface, namespaces appear reasonable enough, but their implications have been the subject of confusion and criticism for over five years.
Namespaces were mentioned briefly in [Hack #7]. In this hack, we’ll talk about how namespaces work in more detail.
Look at the following document, namespace.xml, in Example 4-1.
Example 4-1. namespace.xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- a time instant -->
<time timezone="PST" xmlns="http://www.wyeast.net/time">
<hour>11</hour>
<minute>59</minute>
<second>59</second>
<meridiem>p.m.</meridiem>
<atomic signal="true"/>
</time>
This document isn’t very different from time.xml except for the special xmlns attribute on the time element. The xmlns attribute and its value, http://www.wyeast.net/time, are considered a default namespace declaration. A default namespace declaration associates a namespace name, which is always a Uniform Resource Identifier or URI (http://www.ietf.org/rfc/rfc2396.txt), with one or more elements. A local name together with its namespace name is called an expanded name, and is often given as {http://www.wyeast.net/time}time in descriptive text.
The default declaration in namespace.xml associates the namespace name http://www.wyeast.net/time with the element time and its child elements hour, minute, second, meridiem, and atomic. A default namespace declaration applies only to the element where it is declared and any of its child or descendant elements. A default declaration on the document element therefore applies to elements in the entire document. It does not apply to attributes, however. You must use a prefix in order to apply a namespace to an attribute.
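To see expanded names in practice, here is a minimal sketch that parses the content of namespace.xml with Python’s standard xml.etree.ElementTree module. Python is not part of the original hack; it is used here only because ElementTree happens to report names in the same {namespace name}local-name form described above. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The content of namespace.xml (Example 4-1), embedded so the sketch is self-contained.
NAMESPACE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<!-- a time instant -->
<time timezone="PST" xmlns="http://www.wyeast.net/time">
 <hour>11</hour>
 <minute>59</minute>
 <second>59</second>
 <meridiem>p.m.</meridiem>
 <atomic signal="true"/>
</time>
"""

root = ET.fromstring(NAMESPACE_XML.encode("utf-8"))

# The document element and every child pick up the default namespace.
print(root.tag)  # {http://www.wyeast.net/time}time
child_tags = [child.tag for child in root]
print(child_tags)

# Attributes do not: the default declaration never applies to them.
print(root.attrib)     # {'timezone': 'PST'}
print(root[4].attrib)  # {'signal': 'true'}
```

Every element name comes back qualified with the namespace name, while timezone and signal stay bare local names.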
Instead of a default declaration, you can also get more specific by using a prefix with a namespace. This is shown in prefix.xml (Example 4-2).
Example 4-2. prefix.xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- a time instant -->
<tz:time timezone="PST" xmlns:tz="http://www.wyeast.net/time">
<tz:hour>11</tz:hour>
<tz:minute>59</tz:minute>
<tz:second>59</tz:second>
<tz:meridiem>p.m.</tz:meridiem>
<tz:atomic tz:signal="true"/>
</tz:time>
In this declaration (again on the time element), the prefix tz is associated with the namespace name or URI. So any element or attribute in the document that is prefixed with tz will be associated with the namespace http://www.wyeast.net/time.
The timezone attribute on time does not have a prefix, so it is not associated with the namespace. In fact, the only way you can associate an attribute with a namespace is with a prefix. Default namespace declarations never apply to attributes.
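The difference between prefixed and unprefixed attributes is easy to check. The following sketch, again using Python’s standard xml.etree.ElementTree as an assumed tool rather than anything from the original hack, parses the content of prefix.xml and lists the attribute names the parser reports.

```python
import xml.etree.ElementTree as ET

# The content of prefix.xml (Example 4-2), embedded so the sketch is self-contained.
PREFIX_XML = """<?xml version="1.0" encoding="UTF-8"?>
<!-- a time instant -->
<tz:time timezone="PST" xmlns:tz="http://www.wyeast.net/time">
 <tz:hour>11</tz:hour>
 <tz:minute>59</tz:minute>
 <tz:second>59</tz:second>
 <tz:meridiem>p.m.</tz:meridiem>
 <tz:atomic tz:signal="true"/>
</tz:time>
"""

TIME_NS = "http://www.wyeast.net/time"
root = ET.fromstring(PREFIX_XML.encode("utf-8"))

# The prefix itself is gone; only the expanded name remains.
print(root.tag)  # {http://www.wyeast.net/time}time

# timezone has no prefix, so it is in no namespace.
time_attrs = list(root.attrib)
print(time_attrs)  # ['timezone']

# tz:signal is prefixed, so it is in the tz namespace.
atomic = root.find(f"{{{TIME_NS}}}atomic")
atomic_attrs = list(atomic.attrib)
print(atomic_attrs)  # ['{http://www.wyeast.net/time}signal']
```

Note that the tz prefix never shows up in the results. A namespace-aware processor cares about the namespace name, not the prefix you happened to pick.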
The special namespace prefix xml is bound to the namespace URI http://www.w3.org/XML/1998/namespace, and is used with attributes such as xml:lang and xml:space. Because it is built in, it doesn’t have to be declared, but you can declare it if you want. However, you are not allowed to bind xml to any other namespace name, and you can’t bind any other prefix to http://www.w3.org/XML/1998/namespace.
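As a quick check that the xml prefix really is built in, this small sketch parses an element that uses xml:lang without declaring the prefix anywhere. It relies on Python’s standard xml.etree.ElementTree, which is an assumption of this example and not part of the original hack; the one-element document is made up for illustration.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

# No xmlns:xml declaration appears here, yet the document is namespace well-formed.
elem = ET.fromstring('<meridiem xml:lang="en">p.m.</meridiem>')

# The attribute is reported under the reserved namespace name.
attr_names = list(elem.attrib)
print(attr_names)  # ['{http://www.w3.org/XML/1998/namespace}lang']

lang = elem.get(f"{{{XML_NS}}}lang")
print(lang)  # en
```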
xmlns is a special attribute and can be used as a prefix (http://www.w3.org/TR/REC-xml-names/#ns-decl). As the result of an erratum in http://www.w3.org/XML/xml-names-19990114-errata, the prefix xmlns was bound to the namespace name http://www.w3.org/2000/xmlns/. Unlike the prefix xml, xmlns cannot be declared, and no other prefix may be bound to http://www.w3.org/2000/xmlns/.
The intent of namespaces was to allow different vocabularies to be mixed together in a single document and to avoid the collision of names in environments where the names might run into each other. The unfortunate and confusing part about namespaces is their use of any URI as a namespace name. The scheme or protocol name http:// suggests that the URI identifies a resource that can be retrieved using Hypertext Transfer Protocol (http://www.ietf.org/rfc/rfc2616.txt), just as any web resource would be retrieved. But this is not the case. The URI is just considered a name, not a guarantee of the location or existence of a resource. Fortunately, a technique that uses RDDL [Hack #60] can help overcome this annoyance. The nice thing about URIs, on the other hand, is that they are allocated locally; that is, you don’t have to deal with a global registry to use them. The downside, however, is that you can’t really police people who might use a domain name that you own as part of a URI.
A new namespace spec was created for use only with XML 1.1 (http://www.w3.org/TR/xml-names11). Notably, this spec allows you to undeclare a previously declared namespace; that is, with xmlns="" you can undeclare a default namespace declaration, and with xmlns:tz="" you can undeclare a namespace associated with the prefix tz. In Version 1.0 of the namespaces spec, a default namespace may be empty (as in xmlns=""), but you cannot undeclare a prefixed namespace as you can in Version 1.1.
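The following sketch shows how a Version 1.0 processor treats the two cases. It assumes Python’s standard xml.etree.ElementTree, whose underlying Expat parser implements Namespaces in XML 1.0 rather than 1.1. The note element and the parses helper are hypothetical, added only for illustration.

```python
import xml.etree.ElementTree as ET

# An empty default declaration on a child puts that child in no namespace.
DEFAULT_EMPTY = (
    '<time xmlns="http://www.wyeast.net/time">'
    '<hour>11</hour>'
    '<note xmlns="">local</note>'
    '</time>'
)
root = ET.fromstring(DEFAULT_EMPTY)
tags = [child.tag for child in root]
print(tags)  # ['{http://www.wyeast.net/time}hour', 'note']

# An empty prefixed declaration is only legal under Namespaces in XML 1.1.
PREFIX_EMPTY = (
    '<tz:time xmlns:tz="http://www.wyeast.net/time">'
    '<hour xmlns:tz="">11</hour>'
    '</tz:time>'
)

def parses(text):
    """Return True if the parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(parses(DEFAULT_EMPTY))  # True
print(parses(PREFIX_EMPTY))   # False with a Namespaces 1.0 parser
```

A processor that supports Namespaces in XML 1.1 would be expected to accept the second document as well, provided the document itself is XML 1.1.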
Section 2.4 of Simon St.Laurent’s Common XML spec offers some useful tips on namespaces: http://www.simonstl.com/articles/cxmlspec.txt
Joe English’s reasoned “Plea for Sanity” in the use of namespaces: http://lists.xml.org/archives/xml-dev/200204/msg00170.html