In XML, character and entity references are formed by surrounding a
numerical value or a name with & and ;. For example, &#169; is a
decimal character reference and &copy; is an entity reference. This
hack shows you how to use both.
Character References
According to the third edition of the XML 1.0 specification
(http://www.w3.org/TR/REC-xml/), XML processors must accept more than
1,000,000 characters, which the specification lists as ranges of
hexadecimal code points (http://www.w3.org/TR/REC-xml/#charsets).
You probably won't find all of those characters on your keyboard!
Don't worry. You can use character references instead.
TIP
You can look up the semantics of individual Unicode characters at
http://www.unicode.org/charts/.
You can reference characters using either decimal or hexadecimal
numbers. Which one you use is a matter of style. The document
Namen.xml uses both (Example 1); it contains some German names
enclosed in elements with German tag names.
Example 1. Namen.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="Namen.css" type="text/css"?>

<Namen xml:lang="de">
  <Name>
    <Vorname>Marie</Vorname>
    <Nachname>M&#252;ller</Nachname>
    <Geschlecht>&#9792;</Geschlecht>
  </Name>
  <Name>
    <Vorname>Klaus</Vorname>
    <Nachname>M&#xFC;ller</Nachname>
    <Geschlecht>&#x2642;</Geschlecht>
  </Name>
</Namen>
Lines 7 and 8 (the first Nachname and Geschlecht elements) contain the
decimal character references &#252; and &#9792;, respectively. The
first refers to the letter u with an umlaut (ü) and the second to the
female sign (♀). Lines 12 and 13 (the second Nachname and Geschlecht
elements) use the hexadecimal character references &#xFC; (ü again)
and &#x2642; (the male sign, ♂), respectively. You can see how these
character references are rendered in Opera in Figure 1.
Figure 1. Namen.xml in Opera, styled by Namen.css
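If you want to confirm that the decimal and hexadecimal forms name the
same characters without opening a browser, a short script can do it.
The following is a minimal sketch, not part of the original hack: it
assumes Python 3 and its standard xml.etree.ElementTree parser, and it
uses a trimmed copy of Namen.xml embedded as a string.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of Namen.xml: the first name uses decimal references,
# the second uses hexadecimal references.
NAMEN = """<Namen xml:lang="de">
  <Name>
    <Nachname>M&#252;ller</Nachname>
    <Geschlecht>&#9792;</Geschlecht>
  </Name>
  <Name>
    <Nachname>M&#xFC;ller</Nachname>
    <Geschlecht>&#x2642;</Geschlecht>
  </Name>
</Namen>"""

root = ET.fromstring(NAMEN)
surnames = [e.text for e in root.iter("Nachname")]
signs = [e.text for e in root.iter("Geschlecht")]

# Decimal 252 and hexadecimal FC are the same code point (U+00FC),
# so both surnames parse to the same string.
print(surnames[0] == surnames[1])            # True
# Show the code points behind the two gender signs.
print([f"U+{ord(s):04X}" for s in signs])    # ['U+2640', 'U+2642']
```

The parser resolves each reference before your code sees the text, so
the choice between decimal and hexadecimal leaves no trace in the
parsed document.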
Entity References
XML has five predefined entities, listed in Table 1. These predefined
entities can be used where the equivalent literal character is
forbidden. For example, an attribute value cannot contain a less-than
sign (<), because it looks too much like the beginning of a tag to an
XML parser. No problem: you can use &lt; instead. Likewise, you cannot
use a literal ampersand in parsed character data, the text content of
an element. Why? It looks like the beginning of a character or entity
reference to an XML parser. Again, no problem: you can use &amp;
instead.
Table 1. XML predefined entities
| Entity reference | Description |
|---|---|
| &lt; | Less-than sign or open angle bracket (<) |
| &gt; | Greater-than sign or close angle bracket (>) |
| &amp; | Ampersand (&) |
| &apos; | Apostrophe or single quote (') |
| &quot; | Quote or double quote (") |
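When text is generated by a program, the escaping described above is
usually left to a library rather than done by hand. The sketch below
is an illustration only, assuming Python 3 and its standard
xml.sax.saxutils and xml.etree.ElementTree modules; the note element
and its rule attribute are hypothetical names made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

raw_text = "O'Reilly & Associates"
raw_attr = "a < b"

# escape() replaces &, < and > with predefined entity references.
print(escape(raw_text))      # O'Reilly &amp; Associates
# quoteattr() escapes the value and wraps it in quotes for an attribute.
print(quoteattr(raw_attr))   # "a &lt; b"

# Build a small document (hypothetical element and attribute names).
doc = f"<note rule={quoteattr(raw_attr)}>{escape(raw_text)}</note>"
note = ET.fromstring(doc)

# The parser turns the references back into the literal characters.
print(note.get("rule"))      # a < b
print(note.text)             # O'Reilly & Associates

# All five predefined entities are available without any declaration.
five = ET.fromstring("<t>&lt;&gt;&amp;&apos;&quot;</t>").text
print(five)                  # <>&'"
```

The round trip shows that the references are purely a serialization
detail: the parsed attribute value and text content contain the
literal characters.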
The following document, copy.xml in Example 2, uses a predefined
entity and also declares and references a new entity.
Example 2. copy.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="copy.css" type="text/css"?>
<!DOCTYPE time [<!ENTITY copy "&#169;">]>

<!-- a time instant -->
<time timezone="PST">
  <hour>11</hour>
  <minute>59</minute>
  <second>59</second>
  <meridiem>p.m.</meridiem>
  <atomic signal="true"/>
  <copyright>&copy; O'Reilly &amp; Associates</copyright>
</time>
The entity copy is declared in the document type declaration on
line 3. The keyword is ENTITY; it is followed by the entity name
copy, and then by the value or content of the entity in quotes,
"&#169;". (This entity comes standard in HTML and XHTML.) Line 12 of
this document references the entity declared on line 3 (&copy;) and
also references the XML 1.0 predefined entity for an ampersand
(&amp;). Open this document in Firefox (it is styled by the CSS
stylesheet copy.css) and it will appear as in Figure 2.
Figure 2. copy.xml in Firefox
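A parser that reads the internal DTD subset expands a declared entity
the same way a browser does. The following is a minimal sketch,
assuming Python 3 and its standard xml.etree.ElementTree module (which
expands entities declared in the internal subset); it uses a trimmed
copy of copy.xml, and the notice element is a hypothetical addition
included only to show the same entity referenced a second time.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of copy.xml. The <notice> element is hypothetical and
# is not in the original document; it reuses the copy entity.
COPY = """<!DOCTYPE time [<!ENTITY copy "&#169;">]>
<time timezone="PST">
  <hour>11</hour>
  <copyright>&copy; O'Reilly &amp; Associates</copyright>
  <notice>&copy; all rights reserved</notice>
</time>"""

root = ET.fromstring(COPY)

# Both the declared entity (&copy;) and the predefined one (&amp;)
# are replaced by their values in the parsed text.
print(root.findtext("copyright"))   # © O'Reilly & Associates
print(root.findtext("notice"))      # © all rights reserved
```

Note that an entity reference with no matching declaration is an error
in this parser, so the DOCTYPE line is what makes &copy; usable here.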
Character references provide a convenient means to access a very
large number of characters. Entities are also a convenient means
to store information and access it elsewhere, even multiple times if
necessary.