By Christopher A. Jones, Fred L. Drake, Jr.
Price: $39.95 USD
£28.50 GBP
http://www.w3.org/TR/REC-xml. (The second
edition differs from the first only in that some editorial
corrections and clarifications have been made; the specification is
stable.)
<?xml version="1.0"?>
<PurchaseOrder>
  <account refnum="2390094"/>
  <item sku="33-993933" qty="4">
    <name>Potato Smasher</name>
    <description>Smash Potatoes like never before.</description>
  </item>
</PurchaseOrder>

The first line, which starts with the <? characters, is the XML
declaration. It states which version of XML is being used
and can also include information about the character encoding of the
document. The text starting with
<PurchaseOrder> and ending with
</PurchaseOrder> is an XML
element. An element must have an opening and
closing tag, or the opening tag must end with
the characters /> if it is to be empty. The
account element shown here is an example of an
empty element that ends with a
/>. The item element opens,
contains two other elements, and then closes. The
sku="33-993933" expression is an
attribute named sku with its
value 33-993933 in quotes. An element can have as
many attributes as needed. Both the name and
description elements are followed by character
data or text. Finally, the elements are closed and the document
terminates.
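
The parts of the document described above (elements, an empty element, attributes, and character data) can be inspected programmatically. The following is a minimal sketch, not a listing from the book: it assumes Python 3 and the standard xml.etree.ElementTree module, and the summarize function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# The purchase order from the text, reproduced as a string.
PURCHASE_ORDER = """<?xml version="1.0"?>
<PurchaseOrder>
  <account refnum="2390094"/>
  <item sku="33-993933" qty="4">
    <name>Potato Smasher</name>
    <description>Smash Potatoes like never before.</description>
  </item>
</PurchaseOrder>"""


def summarize(xml_text):
    """Return the account refnum and a list of item dictionaries.

    Hypothetical helper, written only to illustrate the vocabulary
    (element, attribute, character data) used in the text.
    """
    root = ET.fromstring(xml_text)

    # <account/> is an empty element: it carries only an attribute.
    refnum = root.find("account").get("refnum")

    items = []
    for item in root.findall("item"):
        items.append({
            "sku": item.get("sku"),                 # attribute value
            "qty": int(item.get("qty")),            # attribute value
            "name": item.findtext("name"),          # character data
            "description": item.findtext("description"),
        })
    return refnum, items


refnum, items = summarize(PURCHASE_ORDER)
print(refnum, items)
```

Attribute values always arrive as strings, which is why the sketch converts qty explicitly.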
<?xml version="1.0"?>
<book>
  <title>Python and XML</title>
</book>
The book and title elements are opened and closed so that elements
nest within each other in a strictly hierarchical way. You
can't open a book and close a magazine.
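
A parser enforces this nesting rule for you. As a rough illustration (assuming Python 3 and the standard xml.etree.ElementTree module; the is_well_formed name is hypothetical), a document whose tags do not match is rejected with a parse error:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Properly nested: title opens and closes inside book.
GOOD = "<book><title>Python and XML</title></book>"

# Opens a book but closes a magazine.
MISMATCHED = "<book><title>Python and XML</title></magazine>"

# Elements overlap instead of nesting.
OVERLAPPING = "<book><title>Python and XML</book></title>"

for sample in (GOOD, MISMATCHED, OVERLAPPING):
    print(is_well_formed(sample), sample)
```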
<?xml version="1.0" encoding="UTF-8"?>
<?xml encoding="UTF-8"?>
myEntity and a parameter entity of
the same name, and the names do not clash.
xmlns attribute with a URI. Namespaces are
communicated in an XML document using the reserved colon character:
an element name carries a prefix, and that prefix is bound to a URI
by an attribute whose name starts with xmlns:.
For example:
<sumc:purchaseOrder refnum="389473984-38844"
    xmlns:sumc="http://www.superultramegacorp.com">
  <sumc:product name="Magical Widget" sku="398-4993833">
    <sumc:qty value="24">One Case Order</sumc:qty>
    <sumc:amount value="34.56">34.56</sumc:amount>
    <sumc:shipping value="overnight">Next-day</sumc:shipping>
  </sumc:product>
</sumc:purchaseOrder>

The root element declares a namespace, and the prefix
sumc has been associated with it in the
xmlns:sumc attribute. Elements prefixed with
sumc: are within this namespace. This
purchaseOrder now has a context that can set it
apart from a similarly structured purchase order intended for a
different business domain.
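
To see how a namespace-aware parser reports these names, here is a small sketch. It assumes Python 3 and the standard xml.etree.ElementTree module, which is not the toolkit used in the book's listings; the po prefix in the lookup table is an arbitrary local choice and does not need to match the sumc prefix in the document.

```python
import xml.etree.ElementTree as ET

SUMC_NS = "http://www.superultramegacorp.com"

ORDER = """<sumc:purchaseOrder refnum="389473984-38844"
    xmlns:sumc="http://www.superultramegacorp.com">
  <sumc:product name="Magical Widget" sku="398-4993833">
    <sumc:qty value="24">One Case Order</sumc:qty>
    <sumc:amount value="34.56">34.56</sumc:amount>
    <sumc:shipping value="overnight">Next-day</sumc:shipping>
  </sumc:product>
</sumc:purchaseOrder>"""

root = ET.fromstring(ORDER)

# ElementTree replaces the prefix with the namespace URI in braces.
print(root.tag)

# Searches use a prefix-to-URI mapping; only the URI has to match.
ns = {"po": SUMC_NS}
product = root.find("po:product", ns)
qty = product.find("po:qty", ns)

# Unprefixed attributes such as refnum are not placed in the namespace.
print(root.get("refnum"), product.get("name"), qty.get("value"))
```

What identifies the namespace is the URI, not the prefix, so two documents can use different prefixes for the same namespace.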
<?xml version="1.0"?>
<webArticle category="news" subcategory="technical">
  <header title="NASA Builds Warp Drive"
          length="3k"
          author="Joe Reporter"
          distribution="all"/>
  <body>Seattle, WA - Today an anonymous individual
  announced that NASA has completed building a
  Warp Drive and has parked a ship that uses
  the drive in his back yard. This individual
  claims that although he hasn't been contacted by
  NASA concerning the parked space vessel, he assumes
  that he will be launching it later this week to
  mount an exhibition to the Andromeda Galaxy.
  </body>
</webArticle>

Add the ArticleHandler class to a new file,
handlers.py; we'll keep adding new
handlers to this file throughout the chapter. Keep it simple at
first, just to see how SAX works:
# - ArticleHandler (add to handlers.py file)
from xml.sax.handler import ContentHandler

class ArticleHandler(ContentHandler):
    """
    A handler to deal with articles in XML
    """
    def startElement(self, name, attrs):
        # called by the parser each time an element opens
        print "Start element:", name

popen call to the file command
for each file. (While this could be made more efficient by calling
file less often and requiring it to operate on
more than one file at a time, that isn't the topic of this
book.) One of the key methods of this class is
indexDirectoryFiles:
def indexDirectoryFiles(self, dir):
    """Index a directory structure and create an XML output file."""
    # prepare output XML file
    self.__fd = open(self.outputFile, "w")
    self.__fd.write('<?xml version="1.0" encoding="' +
                    XML_ENC + '"?>\n')
    self.__fd.write("<IndexedFiles>\n")

    # do actual indexing
    self.__indexDir(dir)

    # close out XML file
    self.__fd.write("</IndexedFiles>\n")
    self.__fd.close()

The method opens the file named by outputFile, and an XML declaration and root element
are added. The indexDirectoryFiles method calls
its internal __indexDir method, which is the
real worker method. It is a recursive method that descends the file
hierarchy, indexing files along the way.
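
The body of __indexDir is not shown in this excerpt. The following is only a sketch of how such a recursive worker might look, written for Python 3 as free functions rather than methods. The dir and file element names, their attributes, the function names, and the XML_ENC value are all assumptions made for illustration.

```python
import os
from xml.sax.saxutils import quoteattr

XML_ENC = "UTF-8"  # assumed value; the excerpt does not show the constant


def index_dir(path, out, depth=1):
    """Recursively write one element per directory entry under path.

    Hypothetical stand-in for the book's __indexDir method.
    """
    indent = "  " * depth
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isdir(full) and not os.path.islink(full):
            # Descend into subdirectories, nesting their contents.
            out.write("%s<dir name=%s>\n" % (indent, quoteattr(name)))
            index_dir(full, out, depth + 1)
            out.write("%s</dir>\n" % indent)
        elif os.path.isfile(full):
            size = str(os.path.getsize(full))
            out.write("%s<file name=%s size=%s/>\n"
                      % (indent, quoteattr(name), quoteattr(size)))


def index_directory_files(top, output_file):
    """Write the declaration and root element around the recursive walk."""
    with open(output_file, "w", encoding=XML_ENC) as out:
        out.write('<?xml version="1.0" encoding="%s"?>\n' % XML_ENC)
        out.write("<IndexedFiles>\n")
        index_dir(top, out)
        out.write("</IndexedFiles>\n")
```

quoteattr both quotes and escapes the value, so a filename containing & or < still yields well-formed output.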
file elements within the XML document that have a
corresponding <imagename>.jpg file that is
the entire image, and a t-<imagename>.jpg
file that is a thumbnail-size image.
$> ls -l *newimage*
-rw-rw-r-- 1 shm00 shm00 98197 Jan 18 11:08 newimage.jpg
-rw-rw-r-- 1 shm00 shm00 5272 Jan 18 11:42 t-newimage.jpg
convert command. This command is part of the
ImageMagick package, and is installed by default by most modern Linux
distributions. For other Unix systems, the package is available at
http://www.imagemagick.org/.
$> convert image.jpg -geometry 192x128 t-image.jpg
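
If you want to drive the same command from a script, one possible approach is sketched below. It assumes Python 3, assumes ImageMagick's convert is on your PATH, and follows the t-<imagename>.jpg naming described earlier; the function names are hypothetical.

```python
import os
import subprocess


def thumbnail_path(image_path):
    """Return the thumbnail name for an image: t-<imagename> in the same directory."""
    directory, name = os.path.split(image_path)
    return os.path.join(directory, "t-" + name)


def thumbnail_command(image_path, geometry="192x128"):
    """Build the argument list for the convert command shown above."""
    return ["convert", image_path, "-geometry", geometry,
            thumbnail_path(image_path)]


def make_thumbnail(image_path, geometry="192x128"):
    """Run convert; raises an exception if the command is missing or fails."""
    subprocess.run(thumbnail_command(image_path, geometry), check=True)


print(" ".join(thumbnail_command("image.jpg")))
```

Passing the arguments as a list avoids shell quoting problems with unusual filenames.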
os.path.walk
function.)
"""
genxml.py

Descends PyXML tree, indexing source files and creating
XML tags for use in navigating the source.
"""
import os
import sys

from xml.sax.saxutils import escape

def process(filename, fp):
    print "* Processing:", filename,

    # parse the file
    pyFile = open(filename)
    fp.write("<file name=\"" + filename + "\">\n")
    inClass = 0
    line = pyFile.readline()
    while line:
        line = line.strip()
        # a class statement closes any open class and starts a new one
        if line.startswith("class") and line[-1] == ":":
            if inClass:
                fp.write(" </class>\n")
            inClass = 1
            fp.write(" <class name='" + line[:-1] + "'>\n")
        # after strip(), a method definition begins at the start of the line
        elif line.startswith("def") and line[-1] == ":" and inClass:
            fp.write(" <method name='" + escape(line[:-1]) + "'/>\n")
        line = pyFile.readline()
    pyFile.close()
    if inClass:
        fp.write(" </class>\n")
        inClass = 0
    fp.write("</file>\n")

def finder(fp, dirname, names):
    """Process each Python source file in the directory dirname."""
    for name in names:
        if name.endswith(".py"):
            path = os.path.join(dirname, name)
            if os.path.isfile(path):
                process(path, fp)

def main():
    print "[genxml.py started]"
    xmlFd = open("pyxml.xml", "w")
    xmlFd.write("<?xml version=\"1.0\"?>\n")
    xmlFd.write("<pyxml>\n")
    # os.path.walk calls finder(xmlFd, dirname, names) for each directory
    os.path.walk(sys.argv[1], finder, xmlFd)
    xmlFd.write("</pyxml>")
    xmlFd.close()
    print "[genxml.py finished]"

if __name__ == "__main__":
    main()

A ParserFactory class is provided that
supplies a SAX-ready parser guaranteed to be available in your runtime
environment. Additionally, you can explicitly create a parser (or SAX
driver) by dipping into any specific package, such as PyExpat. We
show an example of each, but normally you should rely on the
parser factory to instantiate a parser.
The make_parser function (imported from
xml.sax) returns a SAX driver for the first
available parser in the list that you supply, or returns an available
parser if no list is specified or if the list contains parsers that
are not found or cannot be loaded. The make_parser
function has its roots in the
xml.sax.saxexts.ParserFactory class, but it is
better to import the function from xml.sax (more on
this in a bit). For example:
from xml.sax import make_parser
parser = make_parser()
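
Once make_parser has returned a driver, it needs a content handler and some input. The sketch below assumes Python 3 and only the standard library's xml.sax package (not the PyXML modules discussed in the text); the ElementLogger class and log_elements function are hypothetical names.

```python
import io
from xml.sax import make_parser
from xml.sax.handler import ContentHandler


class ElementLogger(ContentHandler):
    """Record every element as it opens and closes."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


def log_elements(xml_bytes):
    """Parse xml_bytes with whatever driver make_parser supplies."""
    parser = make_parser()
    handler = ElementLogger()
    parser.setContentHandler(handler)
    # parse() accepts a filename, a file-like object, or an InputSource.
    parser.parse(io.BytesIO(xml_bytes))
    return handler.events


events = log_elements(b"<book><title>Python and XML</title></book>")
print(events)
```

The handler never touches the parser directly, so the same class works with any driver the factory returns.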
make_parser without an argument is sure to return
either a PyExpat or xmlproc driver. If you dig
into the source of the xml.sax module, you will
see this list supplied to the ParserFactory class.
If you instantiate a parser factory directly out of
xml.sax.saxexts, you need to be sure to supply a
list containing the name of at least one valid parser, or it
won't be able to create a parser:
>>> from xml.sax.saxexts import ParserFactory
>>> p = ParserFactory( )
>>> parser = p.make_parser( )
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python2.0/site-packages/_xmlplus/sax/saxexts.py",
    line 77, in make_parser
    raise SAXReaderNotAvailable("No parsers found", None)
xml.sax._exceptions.SAXReaderNotAvailable: No parsers found

>>> from xml.sax.saxexts import ParserFactory
>>> p = ParserFactory(["xml.sax.drivers.drv_pyexpat"])
>>> parser = p.make_parser( )
ContentHandler, you may be
wondering how much of that ease comes from using SAX and how much is
a matter of convenience functions in the Python libraries. While we
won't delve deeply into the native interfaces of the individual
parsers, this is a good question, and can lead to some interesting
observations.
xml.parsers.expat module.
If we want to modify our last example to use PyExpat directly, we
don't have a lot of work to do, but there are a few changes.
Since the PyExpat handler methods closely match the SAX handlers, at
least for the basic use we demonstrate here, we can use the same
handler class we've already written. The imports won't
need to change much:
#!/usr/bin/env python
import sys
from xml.parsers import expat
from handlers import PyXMLConversionHandler
parser = expat.ParserCreate( )
>>> from xml.parsers import expat
>>> parser = expat.ParserCreate( )
>>> dir(parser)
['CharacterDataHandler', 'CommentHandler', 'DefaultHandler',
'DefaultHandlerExpand', 'EndCdataSectionHandler', 'EndElementHandler',
'EndNamespaceDeclHandler', 'ErrorByteIndex', 'ErrorCode',
'ErrorColumnNumber', 'ErrorLineNumber', 'ExternalEntityParserCreate',
'ExternalEntityRefHandler', 'GetBase', 'NotStandaloneHandler',
'NotationDeclHandler', 'Parse', 'ParseFile',
'ProcessingInstructionHandler', 'SetBase', 'StartCdataSectionHandler',
'StartElementHandler', 'StartNamespaceDeclHandler',
'UnparsedEntityDeclHandler', 'ordered_attributes', 'returns_unicode',
'specified_attributes']
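
The handler names in that listing are plain attributes: you assign a callable to each one you care about and then feed the parser some data. Here is a minimal sketch, assuming Python 3 (the book's session is Python 2, so the attribute list on your system may differ); the collect_events function is a hypothetical name.

```python
from xml.parsers import expat


def collect_events(xml_text):
    """Parse xml_text with expat directly and return the events seen."""
    events = []

    def on_start(name, attrs):
        # attrs is a dictionary of attribute names to values
        events.append(("start", name, attrs))

    def on_end(name):
        events.append(("end", name))

    def on_text(data):
        # skip whitespace-only character data
        if data.strip():
            events.append(("text", data))

    parser = expat.ParserCreate()
    parser.buffer_text = True          # deliver adjacent text in one call
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text
    parser.Parse(xml_text, True)       # True marks the final chunk of input
    return events


events = collect_events(
    '<item sku="33-993933"><name>Potato Smasher</name></item>')
for event in events:
    print(event)
```

The callbacks take the same arguments as the SAX startElement, endElement, and characters methods, which is why a SAX-style handler class can often be reused by assigning its bound methods to these attributes.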