Credit: Jürgen Hermann
You need to process XML documents and access external documents (e.g., stylesheets), but you can’t use filesystem paths (to keep documents portable) or Internet-accessible URLs (for performance and security).
4Suite’s xml.xslt package (http://www.4suite.org/) gives you all the power you need to handle XML stylesheets, including the hooks for sophisticated needs such as those met by this recipe:
# uses 4Suite Version 0.10.2 or later
from xml.xslt.Processor import Processor
from xml.xslt.StylesheetReader import StylesheetReader

class StylesheetFromDict(StylesheetReader):
    "A stylesheet reader that loads XSLT stylesheets from a Python dictionary"

    def __init__(self, styles, *args):
        "Remember the dict we want to load the stylesheets from"
        StylesheetReader.__init__(self, *args)
        self.styles = styles
        self.__myargs = args

    def __getinitargs__(self):
        "Return init args for clone()"
        return (self.styles,) + self.__myargs

    def fromUri(self, uri, baseUri='', ownerDoc=None, stripElements=None):
        "Load stylesheet from a dict"
        parts = uri.split(':', 1)
        # The length check guards against a URI with no ':' separator
        if (len(parts) == 2 and parts[0] == 'internal'
                and self.styles.has_key(parts[1])):
            # Load the stylesheet from the internal repository (your dictionary)
            return StylesheetReader.fromString(self, self.styles[parts[1]],
                baseUri, ownerDoc, stripElements)
        else:
            # Revert to normal behavior
            return StylesheetReader.fromUri(self, uri,
                baseUri, ownerDoc, stripElements)

if __name__ == "__main__":
    # test and example of this stylesheet's loading approach

    # the sample stylesheet repository
    internal_stylesheets = {
        'second-author.xsl': """
            <person xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    xsl:version="1.0">
                <xsl:value-of select="books/book/author[2]"/>
            </person>
        """
    }

    # the sample document, referring to an "internal" stylesheet
    xmldoc = """
        <?xml-stylesheet href="internal:second-author.xsl" type="text/xml"?>
        <books>
            <book title="Python Essential Reference">
                <author>David M. Beazley</author>
                <author>Guido van Rossum</author>
            </book>
        </books>
    """

    # Create XSLT processor and run it
    processor = Processor()
    processor.setStylesheetReader(StylesheetFromDict(internal_stylesheets))
    print processor.runString(xmldoc)
If you get a lot of XML documents from third parties (via FTP, HTTP, or other means), problems could arise because the documents were created in their environments, and now you must process them in your environment. If a document refers to external files (such as stylesheets) in the filesystem of the remote host, these paths often do not make sense on your local host. One common solution is to refer to external documents through public URLs accessible via the Internet, but this, of course, incurs substantial overhead (you need to fetch the stylesheet from the remote server) and poses some risks. (What if the remote server is down? What about privacy and security?)
Another approach is to use private URL schemes, such as stylesheet:layout.xsl. These need to be resolved to real, existing URLs, which this recipe’s code does for XSLT processing. We show how to use a hook offered by 4Suite, a Python XSLT engine, to refer to stylesheets in an XML-Stylesheet processing instruction (see http://www.w3.org/TR/xml-stylesheet/).
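To make the private-scheme idea concrete, here is a minimal sketch that uses only the standard library (not 4Suite) to pull the href out of an XML-Stylesheet processing instruction and split it into scheme and name. The helper names stylesheet_hrefs and split_scheme and the regular expression are illustrative assumptions, not part of the recipe or of any 4Suite API.

```python
import re
from xml.dom import minidom

# Pseudo-attributes inside the processing instruction, e.g. href="..." type="..."
PSEUDO_ATTR = re.compile(r'''([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')''')

def stylesheet_hrefs(xml_text):
    """Return the href of every top-level xml-stylesheet processing instruction."""
    doc = minidom.parseString(xml_text)
    hrefs = []
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == 'xml-stylesheet'):
            attrs = {}
            for name, double_quoted, single_quoted in PSEUDO_ATTR.findall(node.data):
                attrs[name] = double_quoted or single_quoted
            if 'href' in attrs:
                hrefs.append(attrs['href'])
    return hrefs

def split_scheme(uri):
    """Split 'scheme:rest' into (scheme, rest); the scheme is '' if there is no ':'."""
    scheme, sep, rest = uri.partition(':')
    if sep:
        return scheme, rest
    return '', uri

SAMPLE = (
    '<?xml-stylesheet href="internal:second-author.xsl" type="text/xml"?>'
    '<books><book title="Python Essential Reference"/></books>'
)

hrefs = stylesheet_hrefs(SAMPLE)
scheme, name = split_scheme(hrefs[0])
```

With the sample document, the single href splits into the scheme internal and the name second-author.xsl, which is the pair a resolver would use to decide where the stylesheet comes from.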
A completely analogous approach can be used to load the stylesheet from a database or return a locally cached stylesheet previously fetched from a remote URL. The essence of this recipe is that you can subclass StylesheetReader and customize the fromUri method to perform whatever resolution of private URL schemes you require. The recipe specifically looks at the URL’s protocol. If it’s internal: followed by a name that is a known key in an internal dictionary that maps names to stylesheets, it returns the stylesheet by delegating the parsing of the dictionary entry’s value to the fromString method of StylesheetReader. In all other cases, it leaves the URI alone and delegates to the parent class’s method.
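The same dispatch logic can be sketched without 4Suite, which may help when adapting the idea to a database or a cache. In the sketch below, SchemeResolver, make_cached, and fake_fetch are hypothetical names introduced for illustration only. The resolver accepts any mapping for the internal names and any callable as the fallback, and the fallback is wrapped so each URI is fetched at most once.

```python
class SchemeResolver(object):
    """Resolve 'internal:<name>' URIs from a mapping; defer everything else."""

    def __init__(self, styles, fallback):
        self.styles = styles      # any mapping of name -> stylesheet text
        self.fallback = fallback  # callable used for all other URIs

    def resolve(self, uri):
        scheme, sep, name = uri.partition(':')
        if sep and scheme == 'internal' and name in self.styles:
            # Known internal name: serve it from the mapping
            return self.styles[name]
        # Unknown scheme or unknown name: revert to normal behavior
        return self.fallback(uri)


def make_cached(fetch):
    """Wrap a fetch callable so that each URI is fetched at most once."""
    cache = {}

    def cached_fetch(uri):
        if uri not in cache:
            cache[uri] = fetch(uri)
        return cache[uri]

    return cached_fetch


calls = []

def fake_fetch(uri):
    # Stand-in for a real network or database lookup (hypothetical)
    calls.append(uri)
    return '<stylesheet from %s/>' % uri


resolver = SchemeResolver({'second-author.xsl': '<person/>'},
                          make_cached(fake_fetch))
```

As in the recipe, an internal: URI whose name is not in the mapping is passed to the fallback unchanged instead of raising an error.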
The output of the test code is:
<?xml version='1.0' encoding='UTF-8'?>
<person>Guido van Rossum</person>
This recipe requires at least Python 2.0 and 4Suite Version 0.10.2.
The XML-Stylesheet processing instruction is described in a W3C recommendation (http://www.w3.org/TR/xml-stylesheet/); the 4Suite tools from FourThought are available at http://www.4suite.org/.