
Create Well-Formed XML with JavaScript
Use JavaScript to ensure that you write
correct, well-formed XML in web pages
Sometimes you need to create
some XML from within a browser. It is easy to write bad XML without
realizing it. Writing correct XML with all its bells and whistles is
not easy, but in this type of scenario you usually only need to write
basic XML.
There is a kind of hierarchy of
XML:
- Basic: Elements only; no attributes, entities, character references,
  escaped characters, or encoding issues
- Plain: Basic plus attributes
- Plain/escaped: Plain with special XML characters escaped
- Plain/advanced: Plain/escaped with CDATA sections and processing
  instructions
The list continues with increasing levels of sophistication (and
difficulty).
This hack covers the basic and plain styles (with some enhancements),
and you can adapt the techniques to move several more steps up the
ladder if you like.
The main issues with writing basic XML are getting the elements closed
properly and keeping the code simple. Here is how.
The Element Function
Here is a JavaScript function for
writing elements:
// Bare bones XML writer - no attributes
function element(name,content){
    var xml
    if (!content){
        xml='<' + name + '/>'
    }
    else {
        xml='<'+ name + '>' + content + '</' + name + '>'
    }
    return xml
}
This basic function even writes the empty-element form when there is no
element content. What is especially nice about this hack is that you
can nest calls to it, like this:
var xml = element('p', 'This is ' +
    element('strong','Bold Text') + 'inline')
Both inner and outer elements are guaranteed to be closed properly.
You can display the result for testing like this:
alert(xml)
You can build up your entire XML document by combining bits like
these, and all the elements will be properly nested and closed.
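As a rough illustration of building a larger fragment this way, the sketch below nests several calls. The element names (order, customer, items, item, notes) are invented for the example, and console.log stands in for alert so the snippet also runs outside a browser.

```javascript
// Same bare-bones writer as in the text above
function element(name, content) {
    if (!content) {
        return '<' + name + '/>';
    }
    return '<' + name + '>' + content + '</' + name + '>';
}

// Hypothetical document: the element names are made up for illustration
var doc = element('order',
    element('customer', 'Ada') +
    element('items',
        element('item', 'Widget') +
        element('item', 'Gadget')) +
    element('notes'));   // no content, so the empty-element form is used

console.log(doc);
// <order><customer>Ada</customer><items><item>Widget</item><item>Gadget</item></items><notes/></order>
```

One thing to watch: because the test is !content, a numeric 0 passed as content is treated as empty, so convert numbers to strings before passing them in.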
The element() function does not do any
pretty-printing, because it has no way to know where line breaks
should go. If that is important to you, just create a variant
function:
function elementNL(name, content) {
    return element(name,content) + '\n'
}
More sophisticated variations are possible but rarely needed.
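If you do want a slightly fancier variation, one possibility is to let the caller say how deep an element sits. The following is only a sketch: elementAt() and its depth parameter are hypothetical names, not part of the hack itself.

```javascript
// Same bare-bones writer as in the text above
function element(name, content) {
    if (!content) {
        return '<' + name + '/>';
    }
    return '<' + name + '>' + content + '</' + name + '>';
}

// Hypothetical variant: indent by two spaces per level and end with a newline
function elementAt(depth, name, content) {
    var pad = new Array(depth + 1).join('  ');   // depth copies of two spaces
    return pad + element(name, content) + '\n';
}

var body = elementAt(1, 'item', 'Widget') +
           elementAt(1, 'item', 'Gadget');
var doc = element('items', '\n' + body);

console.log(doc);
// <items>
//   <item>Widget</item>
//   <item>Gadget</item>
// </items>
```

The caller still decides where the line breaks go, which is consistent with the point made above: element() itself cannot know.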
Adding Attributes
At the next level up, the most
pressing problems are to format the attribute string properly, to
escape single and double quotes embedded in the attribute values, and
to do the least amount of quote escaping so that the result will be
as readable as possible.
We modify the element() function to optionally accept an
associative array containing the attribute names and values. In other
languages, an associative array may be called a dictionary or a hash.
// XML writer with attributes and smart attribute quote escaping
function element(name,content,attributes){
    var att_str = ''
    if (attributes) { // tests false if this arg is missing!
        att_str = formatAttributes(attributes)
    }
    var xml
    if (!content){
        xml='<' + name + att_str + '/>'
    }
    else {
        xml='<' + name + att_str + '>' + content + '</'+name+'>'
    }
    return xml
}
The function formatAttributes() handles formatting and escaping
the attributes.
To fix up the quotes, we use the following algorithm if there are
embedded quotes (single or double):
- Whichever type of quote occurs first in the string, use the other
  kind to enclose the attribute value.
- Only escape occurrences of the kind of quote used to enclose the
  attribute value. We don't need to escape the other kind.
Here is the code:
var APOS = "'", QUOTE = '"'
var ESCAPED_QUOTE = { }
ESCAPED_QUOTE[QUOTE] = '&quot;'
ESCAPED_QUOTE[APOS] = '&apos;'
/*
   Format a dictionary of attributes into a string suitable
   for inserting into the start tag of an element. Be smart
   about escaping embedded quotes in the attribute values.
*/
function formatAttributes(attributes) {
    var att_value
    var apos_pos, quot_pos
    var use_quote, escape
    var att_str
    var re
    var result = ''
    for (var att in attributes) {
        att_value = attributes[att]
        // Find first quote marks if any
        apos_pos = att_value.indexOf(APOS)
        quot_pos = att_value.indexOf(QUOTE)
        // Determine which quote type to use around
        // the attribute value
        if (apos_pos == -1 && quot_pos == -1) {
            att_str = ' ' + att + "='" + att_value + "'"
            result += att_str
            continue
        }
        // Enclose in single quotes when a double quote comes first
        // (or is the only kind present); otherwise use double quotes
        if (quot_pos != -1 && (apos_pos == -1 || quot_pos < apos_pos)) {
            use_quote = APOS
        }
        else {
            use_quote = QUOTE
        }
        // Figure out which kind of quote to escape
        // Use nice dictionary instead of yucky if-else nests
        escape = ESCAPED_QUOTE[use_quote]
        // Escape only the right kind of quote
        re = new RegExp(use_quote,'g')
        att_str = ' ' + att + '=' + use_quote +
            att_value.replace(re, escape) + use_quote
        result += att_str
    }
    return result
}
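To see the quoting rule from the list above in isolation, here is a compact sketch that applies it to a single value. The name quoteValue() is hypothetical and the helper is not part of the hack's code; it only illustrates which enclosing quote is chosen and which character gets escaped.

```javascript
// Hypothetical helper: apply the quoting rule to one attribute value
function quoteValue(value) {
    var aposPos = value.indexOf("'");
    var quotPos = value.indexOf('"');
    // No embedded quotes: nothing to escape, single quotes will do
    if (aposPos === -1 && quotPos === -1) {
        return "'" + value + "'";
    }
    // A double quote comes first (or is the only kind present):
    // enclose in single quotes and escape only the apostrophes
    var doubleFirst = quotPos !== -1 &&
        (aposPos === -1 || quotPos < aposPos);
    if (doubleFirst) {
        return "'" + value.replace(/'/g, '&apos;') + "'";
    }
    // Otherwise enclose in double quotes and escape only the double quotes
    return '"' + value.replace(/"/g, '&quot;') + '"';
}

console.log(quoteValue('plain'));             // 'plain'
console.log(quoteValue('say "hi"'));          // 'say "hi"'
console.log(quoteValue("it's"));              // "it's"
console.log(quoteValue("it's \"quoted\""));   // "it's &quot;quoted&quot;"
```

In each case the quote character that encloses the value never appears unescaped inside it, which is all that well-formedness requires.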
Here is code to test everything we've seen so far:
function test() {
    var atts = {att1:"a1",
        att2:"This is in \"double quotes\" and this is " +
            "in 'single quotes'",
        att3:"This is in 'single quotes' and this is in " +
            "\"double quotes\""}
    // Basic XML example
    alert(element('elem','This is a test'))
    // Nested elements
    var xml = element('p', 'This is ' +
        element('strong','Bold Text') + 'inline')
    alert(xml)
    // Attributes with all kinds of embedded quotes
    alert(element('elem','This is a test', atts))
    // Empty element version
    alert(element('elem','', atts))
}
Open the file jswriter.html
(Example 1) in a browser that supports JavaScript
(the script is also stored in jswriter.js so you
can easily include it in any HTML or XHTML document).
Example 1. jswriter.html
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Testing the Well-formed XML Hack</title>
<script type='text/javascript'>
// XML writer with attributes and smart attribute quote escaping
function element(name,content,attributes){
    var att_str = ''
    if (attributes) { // tests false if this arg is missing!
        att_str = formatAttributes(attributes)
    }
    var xml
    if (!content){
        xml='<' + name + att_str + '/>'
    }
    else {
        xml='<' + name + att_str + '>' + content + '</'+name+'>'
    }
    return xml
}
var APOS = "'", QUOTE = '"'
var ESCAPED_QUOTE = { }
ESCAPED_QUOTE[QUOTE] = '&quot;'
ESCAPED_QUOTE[APOS] = '&apos;'
/*
   Format a dictionary of attributes into a string suitable
   for inserting into the start tag of an element. Be smart
   about escaping embedded quotes in the attribute values.
*/
function formatAttributes(attributes) {
    var att_value
    var apos_pos, quot_pos
    var use_quote, escape
    var att_str
    var re
    var result = ''
    for (var att in attributes) {
        att_value = attributes[att]
        // Find first quote marks if any
        apos_pos = att_value.indexOf(APOS)
        quot_pos = att_value.indexOf(QUOTE)
        // Determine which quote type to use around
        // the attribute value
        if (apos_pos == -1 && quot_pos == -1) {
            att_str = ' ' + att + "='" + att_value + "'"
            result += att_str
            continue
        }
        // Enclose in single quotes when a double quote comes first
        // (or is the only kind present); otherwise use double quotes
        if (quot_pos != -1 && (apos_pos == -1 || quot_pos < apos_pos)) {
            use_quote = APOS
        }
        else {
            use_quote = QUOTE
        }
        // Figure out which kind of quote to escape
        // Use nice dictionary instead of yucky if-else nests
        escape = ESCAPED_QUOTE[use_quote]
        // Escape only the right kind of quote
        re = new RegExp(use_quote,'g')
        att_str = ' ' + att + '=' + use_quote +
            att_value.replace(re, escape) + use_quote
        result += att_str
    }
    return result
}
function test() {
    var atts = {att1:"a1",
        att2:"This is in \"double quotes\" and this is " +
            "in 'single quotes'",
        att3:"This is in 'single quotes' and this is in " +
            "\"double quotes\""}
    // Basic XML example
    alert(element('elem','This is a test'))
    // Nested elements
    var xml = element('p', 'This is ' +
        element('strong','Bold Text') + 'inline')
    alert(xml)
    // Attributes with all kinds of embedded quotes
    alert(element('elem','This is a test', atts))
    // Empty element version
    alert(element('elem','', atts))
}
</script>
</head>
<body onload='test()'>
</body>
</html>
When the page loads, you will see the following in four successive
alert boxes, as shown in Figure 1. The lines have
been wrapped for readability.
- First alert:
-
<elem>This is a test</elem>
- Second alert:
-
<p>This is <strong>Bold
Text</strong>inline</p>
- Third alert:
-
<elem att1='a1'
att2='This is in "double quotes" and this is
in &apos;single quotes&apos;'
att3="This is in 'single quotes' and this is in
&quot;double quotes&quot;">This is a
test</elem>
- Fourth alert:
-
<elem att1='a1'
att2='This is in "double quotes" and this is in
&apos;single quotes&apos;'
att3="This is in 'single quotes' and this is in
&quot;double quotes&quot;"/>
Figure 1. jswriter.html in Firefox
Extending the Hack
You may want to escape the other
special XML characters. You can do this by adding calls such as:
content = content.replace(/</g, '&lt;')
Take care not to replace the quotes in attribute values, since
formatAttributes() already handles them.
Because the parameters to element() and
formatAttributes() are strings, they are easy to
manipulate as you like.
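One way to package such replacements is a small helper that you apply to text before handing it to element(). The sketch below assumes a hypothetical escapeText() function; it is not part of the original hack. Two details matter: the ampersand has to be replaced first, or the ampersands in the other replacements would be escaped a second time, and the helper should only be applied to plain text, never to markup that nested element() calls have already produced.

```javascript
// Hypothetical helper: escape the characters that are special in element content
function escapeText(text) {
    return String(text)
        .replace(/&/g, '&amp;')   // must come first
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

// Same bare-bones writer as in the text above
function element(name, content) {
    if (!content) {
        return '<' + name + '/>';
    }
    return '<' + name + '>' + content + '</' + name + '>';
}

// Escape each piece of text, not the assembled markup
var xml = element('p',
    escapeText('if a < b && b > c') +
    element('strong', escapeText('AT&T')));

console.log(xml);
// <p>if a &lt; b &amp;&amp; b &gt; c<strong>AT&amp;T</strong></p>
```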
Creating Large Chunks of XML
If you create long strings of XML, say with
more than a few hundred string fragments, you may find the
performance to be slow. That's normal, and happens
because JavaScript, like most other languages, has to allocate memory
for each new string every time you concatenate more fragments.
The standard way around this is to accumulate the fragments in a
list, then join the list back to a string at the end. This process is
generally very fast, even for very large results.
Here is how you can do it:
var results = [ ]
results.push(element("p","This is some content"))
results.push(element('p', 'This is ' +
    element('strong','Bold Text') + 'inline'))
// ... Append more bits
var end_result = results.join(' ')
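The same pattern scales to loops. As a sketch, the following builds a thousand hypothetical row elements (the names table, row, and id are invented for the example) and joins them once at the end. It joins on the empty string so that nothing is added between elements; join on '\n' or ' ' instead if you want separators.

```javascript
// Same bare-bones writer as in the text above
function element(name, content) {
    if (!content) {
        return '<' + name + '/>';
    }
    return '<' + name + '>' + content + '</' + name + '>';
}

// Collect the fragments in an array instead of growing one big string
var rows = [];
for (var i = 1; i <= 1000; i++) {
    rows.push(element('row', element('id', String(i))));
}

// One join at the end produces the content of the outer element
var table = element('table', rows.join(''));

console.log(rows.length);          // 1000
console.log(table.slice(0, 28));   // <table><row><id>1</id></row>
```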
© 2007 O'Reilly Media, Inc.