Credit: David Ascher
Suppose that you want to convert element attributes into child elements. A simple subclass of the XMLGenerator class gives you complete freedom in such XML-to-XML transformation tasks:
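from xml.sax import saxutils, make_parser
import sys

class Tweak(saxutils.XMLGenerator):
    def startElement(self, name, attrs):
        # Let the base class write the start tag, but with no attributes
        saxutils.XMLGenerator.startElement(self, name, {})
        # Write each attribute as a child element, in sorted order;
        # escape the value so that & and < stay well-formed as element text
        for attribute in sorted(attrs.keys()):
            self._out.write("<%s>%s</%s>" % (attribute,
                saxutils.escape(attrs[attribute]), attribute))

# Parse the file named on the command line and write the result to stdout
parser = make_parser()
dh = Tweak(sys.stdout)
parser.setContentHandler(dh)
parser.parse(sys.argv[1])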
This particular recipe defines a Tweak subclass of the XMLGenerator class provided by the xml.sax.saxutils module. The only purpose of the subclass is to perform special handling of element starts while relying on its base class to do everything else. SAX is a nice and simple (after all, that’s what the S stands for) API for processing XML documents. It defines various kinds of events that occur when an XML document is being processed, such as startElement and endElement.
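To make the event model concrete, here is a minimal sketch of a handler that only records the events it receives. The EventLogger class and the log_events function are hypothetical names introduced for illustration; they are not part of the recipe or of the standard library.

```python
import xml.sax
from xml.sax.handler import ContentHandler

class EventLogger(ContentHandler):
    """Record each SAX event as a tuple (illustrative helper only)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # attrs behaves like a read-only mapping of attribute names to values
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Skip whitespace-only text to keep the log short
        if content.strip():
            self.events.append(("text", content))

def log_events(xml_text):
    """Parse xml_text and return the list of recorded events."""
    handler = EventLogger()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events

if __name__ == "__main__":
    for event in log_events('<a x="1"><b>hi</b></a>'):
        print(event)
```

Running it on a small document shows a start event for each opening tag (with its attributes), a matching end event for each closing tag, and text events in between.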
The key to understanding this recipe is to understand that Python’s XML library provides a base class, XMLGenerator, which performs an identity transform. If you feed it an XML document, it will output an equivalent XML document. Using standard Python object-oriented techniques of subclassing and method override, you are free to specialize how the generated XML document differs from the source. The code above simply takes each element (attributes and their values are passed in as a dictionary on startElement calls), relies on the base class to output the proper XML for the element (but omitting the attributes), and then writes an element for each attribute.
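The identity transform and the effect of the override can be checked on in-memory strings. The following is a sketch rather than the recipe's own code: the identity and transform functions and the AttrsToChildren class are hypothetical names, and the subclass writes the child elements through the base class's startElement, characters, and endElement methods instead of writing to the output stream directly, an assumed variation that leaves text escaping to XMLGenerator.

```python
import io
import xml.sax
from xml.sax import saxutils

def identity(xml_text):
    """Feed xml_text through a plain XMLGenerator and return its output."""
    out = io.StringIO()
    generator = saxutils.XMLGenerator(out, encoding="utf-8")
    xml.sax.parseString(xml_text.encode("utf-8"), generator)
    return out.getvalue()

class AttrsToChildren(saxutils.XMLGenerator):
    """Variation on the recipe's Tweak class (illustrative only)."""

    def startElement(self, name, attrs):
        base = saxutils.XMLGenerator
        # Start tag without attributes
        base.startElement(self, name, {})
        # One child element per attribute, in sorted order
        for key in sorted(attrs.keys()):
            base.startElement(self, key, {})
            self.characters(attrs[key])  # characters() escapes the text
            base.endElement(self, key)

def transform(xml_text):
    """Return xml_text with every attribute turned into a child element."""
    out = io.StringIO()
    handler = AttrsToChildren(out, encoding="utf-8")
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return out.getvalue()

if __name__ == "__main__":
    print(identity('<a x="1"><b>hi</b></a>'))
    print(transform('<a y="2" x="1"><b/></a>'))
```

Both functions return the XML declaration written by XMLGenerator followed by the document itself; only the second one changes the element content.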
Subclassing the XMLGenerator class is a nice place to start when you need to tweak some XML, especially if your tweaks don’t require you to change the existing parent-child relationships. For more complex jobs, you may want to explore some other ways of processing XML, such as minidom or pulldom. Or, if you’re really into that sort of thing, you could use XSLT (see Recipe 12.5).
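For comparison, the same attribute-to-child transformation can be sketched with minidom, which builds the whole document tree in memory and so makes structural changes easier at the cost of memory. The attrs_to_children function is a hypothetical name, and this is one possible approach under that assumption rather than a method the recipe prescribes.

```python
from xml.dom import minidom

def attrs_to_children(xml_text):
    """Return xml_text with every attribute turned into a leading child element."""
    doc = minidom.parseString(xml_text)
    # Snapshot the element list first, since the loop adds new elements
    for element in list(doc.getElementsByTagName("*")):
        first = element.firstChild  # None if the element is empty
        for name in sorted(element.attributes.keys()):
            child = doc.createElement(name)
            child.appendChild(doc.createTextNode(element.getAttribute(name)))
            # Insert before the original first child (appends if there is none)
            element.insertBefore(child, first)
            element.removeAttribute(name)
    return doc.documentElement.toxml()

if __name__ == "__main__":
    print(attrs_to_children('<a y="2" x="1"><b/></a>'))
```

Unlike the SAX version, this one serializes empty elements in the short form and returns only the document element, without an XML declaration.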
See also: Recipe 12.5 for various ways of driving XSLT from Python; Recipe 12.2, Recipe 12.3, and Recipe 12.4 for other uses of the SAX API.