Transforming an XML Document Using XSLT

Credit: David Ascher

Problem

You have an XSLT transform that you wish to programmatically run through XML documents.

Solution

The solution depends on the XSLT processor you’re using. If you’re using Microsoft’s XSLT engine (part of its XML Core Services package), you can drive the XSLT engine through its COM interface:

def process_with_msxslt(xsltfname, xmlfname, targetfname):
    import win32com.client.dynamic
    xslt = win32com.client.dynamic.Dispatch("Msxml2.DOMDocument.4.0")
    xslt.async = 0
    xslt.load(xsltfname)
    xml = win32com.client.dynamic.Dispatch("Msxml2.DOMDocument.4.0")
    xml.async = 0
    xml.load(xmlfname)
    output = xml.transformNode(xslt)
    open(targetfname, 'wb').write(output)

If you’d rather use Xalan’s XSLT engine, it’s as simple as using the right module:

import Pyana
output = Pyana.transform2String(source=open(xmlfname).read(  ),
                                style=open(xsltfname).read(  ))
open(targetfname, 'wb').write(output)

Discussion

There are many different ways that XML documents need to be processed. Extensible Stylesheet Language Transformations (XSLT) is a language that was developed specifically for transforming XML documents. Using XSLT, you define a stylesheet, which is a set of templates and rules that defines how to transform specific parts of an XML document into arbitrary outputs.

The XSLT specification is a World Wide Web Consortium Recommendation (http://www.w3.org/TR/xslt) that has been implemented by many different organizations. The two most commonly used XSLT processors are Microsoft’s XLST engine, part of its XML Core Services, and the Apache group’s Xalan engines (C++ and Java versions are available).

If you have an existing XSLT transform, running it from Python is easy with this recipe. The first variant uses the COM interface (provided automatically by the win32com package, part of win32all) to Microsoft’s engine, while the second uses Brian Quinlan’s convenient wrapper around Xalan, Pyana (http://pyana.sourceforge.net/). While this recipe shows only the easiest way of using Xalan through Pyana, there’s a lot more to the package. You can easily extend XSLT and XPath with Python code, something which can save you a lot of time if you know Python well.

XSLT is definitely trendy, partially because it seems at first well-suited to processing XML documents. If you’re comfortable with XSLT and want to use Python to help you work with your existing stylesheets, these recipes will start you on your way. If, however, you’re quite comfortable with Python and are just starting with XSLT, you may find that it’s easier to forget about these newfangled technologies and use good old Python code to do the job. See Recipe 12.6 for a very different approach.

See Also

Recipe 12.6 for a pure Python approach to the same problem; Recipe 12.10; Pyana is available and documented at http://pyana.sourceforge.net; Apache’s Xalan is available and documented at http://xml.apache.org; Microsoft’s XML technologies are available from Microsoft’s developer site (http://msdn.microsoft.com).

Get Python Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.