.NET & XML

By Niel Bornstein
Price: $39.95 USD
£28.50 GBP



Table of Contents

Chapter 1: Introduction to .NET and XML
The .NET Framework, formally introduced to the public in July 2000, is the key to Microsoft's next-generation software strategy. It consists of several sets of products, which fulfill several goals Microsoft has targeted as being critical to its success over the next decade.
The Extensible Markup Language (XML), introduced in 1996 by the World Wide Web Consortium (W3C), provides a common syntax for data transfer between dissimilar systems. XML's use is not limited to heterogeneous systems, however; it can be, and often is, used for an application's internal configuration and data files.
In this chapter, I introduce the .NET Framework and XML, and give you the basic information you need to start using XML in the .NET Framework.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
The .NET Framework
Unlike Windows (and operating systems generally), .NET is a software platform that enables developers to create software applications that are network-native. A network-native application is one whose natural environment is a standards-based network, such as the Internet or a corporate intranet. Rather than merely coexisting with the network, the network-native application is designed from the ground up to use the network as its playground. The alphabet soup of network standards includes such players as Internet Protocol (IP), Hypertext Transfer Protocol (HTTP), and others.
.NET enables componentization of software; that is, it allows developers to create small units of functionality, called assemblies in .NET, that can later be reused by other developers. These components can reside locally, on a standalone machine, or they can reside elsewhere on a network. Componentization is not new; previous attempts at building component software environments have included Common Object Request Broker Architecture (CORBA) and the Component Object Model (COM).
An important factor in the componentization of software is language integration. You may already be familiar with the concept of language independence, which means that you can develop software components in any of the languages that .NET supports and use the components you develop in any of those languages. However, language integration goes a step further, meaning that those languages support .NET natively. Using the .NET Framework from any of the .NET languages is as natural as using the language's native syntax.
Building on top of these basic goals, .NET also allows developers to use enterprise services in their applications. The .NET Framework handles common tasks such as messaging, transaction monitoring, and security, so that you don't have to. Enterprise services that .NET takes advantage of can include those provided by Microsoft SQL Server, Microsoft Message Queuing (MSMQ), and Windows Authentication.
The XML Family of Standards
XML was specifically designed to combine the flexibility of SGML with the simplicity of Hypertext Markup Language (HTML). HTML, the markup language upon which the World Wide Web is based, is an application of an older and more complex language known as Standard Generalized Markup Language (SGML). SGML was created to provide a standardized language for complex documents, such as airplane repair manuals and parts lists. HTML, on the other hand, was designed for the specific purpose of creating documents that could be displayed by a variety of different web browsers. As such, HTML provides only a subset of SGML's functionality and is limited to features that make sense in a web browser. XML takes a broader view.
There are several types of tasks you'll typically want to perform with XML documents. XML documents can be read into arbitrary data structures, manipulated in memory, and written back out as XML. Existing objects can be written (or serialized, to use the technical term) to a number of different XML formats, including ones that you define, as well as standard serialization formats. The technologies most commonly used to perform these operations are the following:
Input
To read an XML document into memory, you need to parse it. A variety of XML parsers can be used to read XML, and I discuss the .NET implementation in Chapter 2.
Output
After either reading XML in or creating an XML representation in memory, you'll most likely need to write it out to an XML file. This is the flip side of parsing, and it's covered in Chapter 3.
Extension
You can use the same APIs you use to read and write XML to read and write other formats. I explore how this works in Chapter 4.
DOM
Once it has been read into memory, you can manipulate an XML document's tree structure through the Document Object Model (DOM). The DOM specification was developed to introduce a platform-independent model for XML documents. The DOM is discussed in Chapter 5.
Introduction to XML in .NET
Although many programming languages and environments have provided XML support as an add-on, .NET's support is integrated into the framework more tightly than most. The .NET development team decided to use XML extensively within the framework in order to meet its design goals. Accordingly, they built in XML support from the beginning.
The .NET Framework contains five main assemblies that implement the core XML standards. Table 1-1 lists the five assemblies, along with a description of the functionality contained in each. Each of these assemblies is documented in detail in Chapter 16 through Chapter 20.
Table 1-1: .NET XML assemblies
Assembly                  Description
System.Xml                Basic XML input and output with XmlReader and XmlWriter, DOM with XmlNode and its subclasses, many XML utility classes
System.Xml.Schema         Constraint of XML via XML Schema with XmlSchemaObject and its subclasses
System.Xml.Serialization  Serialization to plain XML and SOAP with XmlSerializer
System.Xml.XPath          Navigation of XML via XPath with XPathDocument, XPathExpression, and XPathNavigator
System.Xml.Xsl
Key Concepts
Before you can learn to work with XML in the .NET Framework, I have to introduce some of the key types you'll be using.
When using the DOM, as shown in Chapter 5, each node in an XML document is represented by an appropriately named class, starting with the abstract base class, XmlNode. Derived from XmlNode are XmlAttribute, XmlDocument, XmlDocumentFragment, XmlEntity, XmlLinkedNode, and XmlNotation. In turn, XmlLinkedNode has a number of subclasses that serve specific purposes (XmlCharacterData, XmlDeclaration, XmlDocumentType, XmlElement, XmlEntityReference, and XmlProcessingInstruction). Several of these key types also have further subclasses. In each case, the final subclass of each inheritance branch has a name that is meaningful to one familiar with XML.
Figure 1-3 shows the XmlNode inheritance hierarchy.
Figure 1-3: XmlNode inheritance hierarchy
Each of the concrete XmlNode subclasses is also represented by a member of the XmlNodeType enumeration: Element, Attribute, Text, CDATA, EntityReference, Entity, ProcessingInstruction, Comment, Document, DocumentType, DocumentFragment, Notation, Whitespace, and SignificantWhitespace, plus the special pseudo-node types None, EndElement, EndEntity, and XmlDeclaration. Each XmlNode instance has a NodeType property, which returns an XmlNodeType that represents the type of the instance. An XmlNodeType value is also returned by the NodeType property of XmlReader, as discussed in Chapter 2, Chapter 3, and Chapter 4.
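To make the relationship between XmlNode and XmlNodeType concrete, here is a minimal sketch that loads a small document into an XmlDocument and lists the NodeType and Name of every node. The class name NodeTypeSketch, the Describe( ) helper, and the sample po document are assumptions made for illustration; they do not come from the book.

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical helper, not from the book: walks a DOM tree and
// reports the XmlNodeType of each node it finds.
public static class NodeTypeSketch {

  // Appends "NodeType: Name" for this node, its attributes, and its children.
  public static void Describe(XmlNode node, int depth, StringBuilder output) {
    output.Append(new string(' ', depth * 2));
    output.Append(node.NodeType);   // an XmlNodeType value
    output.Append(": ");
    output.Append(node.Name);
    output.Append('\n');

    // Attributes are not child nodes, so they are visited separately.
    // The Attributes property is null for everything but elements.
    if (node.Attributes != null) {
      foreach (XmlAttribute attribute in node.Attributes) {
        Describe(attribute, depth + 1, output);
      }
    }
    foreach (XmlNode child in node.ChildNodes) {
      Describe(child, depth + 1, output);
    }
  }

  // Parses the XML text and returns the indented listing.
  public static string Describe(string xml) {
    XmlDocument document = new XmlDocument();
    document.LoadXml(xml);
    StringBuilder output = new StringBuilder();
    Describe(document, 0, output);
    return output.ToString();
  }

  public static void Main() {
    // Sample data invented for this sketch.
    Console.Write(Describe(
      "<?xml version=\"1.0\"?><po id=\"1\"><!-- note --><item>hammer</item></po>"));
  }
}
```

The listing starts with the Document node itself, which shows that the document is an XmlNode like any other.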
Moving On
In this chapter, I introduce the .NET Framework and the XML specification, and give you a flavor of how they work together. In the next chapter I show you how to read XML documents in .NET.
Chapter 2: Reading XML
Perhaps the simplest thing you can do with an existing XML document is to read it into memory. The .NET Framework provides a set of tools in the System.Xml namespace to help you read XML, whether you wish to deal with it as a stream of events or to load the data into your own data structures. In this chapter we take a look at XmlReader, its subclasses, and the associated .NET types and interfaces. I also discuss when it is appropriate to use the XmlReader instead of other methods of reading XML, and describe the differences between pull parsers and push parsers.
You can read XML from a local file or from a remote source over a network. You'll see how to deal with various local and remote inputs, including reading through a network proxy. And you'll learn how to validate an XML document regardless of which sort of input source is used.
Throughout this chapter, I make use of a hypothetical Angus Hardware purchase order in XML and do some simple processing of its contents.
Reading Data
Before you learn about reading XML, you must learn how to read a file. In this section, I'll cover basic filesystem and network input in .NET. If you're already familiar with basic I/O types and methods in .NET, feel free to skip to the next section.
I/O classes in .NET are located in the System.IO namespace. The basic object used for reading and writing data, regardless of the source, is the Stream object. Stream is an abstract base class, which represents a sequence of bytes; the Stream has a Read( ) method to read the bytes from the Stream, a Write( ) method to write bytes to the Stream, and a Seek( ) method to set the current location within the Stream. Not all instances or subclasses of Stream support all these operations; for example, you cannot write to a FileStream representing a read-only file, and you cannot Seek( ) to a position in a NetworkStream. The properties CanRead, CanWrite, and CanSeek can be interrogated to determine whether the respective operations are supported by the instance of Stream you're dealing with.
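As a quick illustration of interrogating those properties, the following sketch checks a read-only MemoryStream before seeking in it. The class name StreamCapabilitiesSketch and its Describe( ) method are hypothetical names chosen for this example.

```csharp
using System;
using System.IO;

// Hypothetical helper, not from the book.
public static class StreamCapabilitiesSketch {

  // Summarizes which operations a Stream instance supports.
  public static string Describe(Stream stream) {
    return string.Format("CanRead={0}, CanWrite={1}, CanSeek={2}",
      stream.CanRead, stream.CanWrite, stream.CanSeek);
  }

  public static void Main() {
    byte[] data = new byte[] { 46, 78, 69, 84 };

    // Passing false as the second argument makes the MemoryStream read-only.
    using (Stream readOnly = new MemoryStream(data, false)) {
      Console.WriteLine(Describe(readOnly));

      // Check before seeking rather than catching an exception afterward.
      if (readOnly.CanSeek) {
        readOnly.Seek(1, SeekOrigin.Begin);
      }
      Console.WriteLine(readOnly.ReadByte());
    }
  }
}
```

Checking CanSeek first matters most for code that accepts an arbitrary Stream, because the same method may be handed a file one moment and a network connection the next.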
Table 2-1 shows the Stream type's subclasses and the methods each type supports.
Table 2-1: Stream subclasses and their supported members
Type                      Length  Position  Flush( )  Read( )  Seek( )  Write( )
System.IO.BufferedStream
XmlReader
XmlReader is an abstract base class that provides an event-based, read-only, forward-only XML pull parser (I'll discuss each of these terms shortly). XmlReader has three concrete subclasses, XmlTextReader, XmlValidatingReader, and XmlNodeReader, which enable you to read XML from a file, a Stream, or an XmlNode. You can also extend XmlReader to read other, non-XML data formats, and deal with them as if they were XML (you'll learn how to do this in Chapter 4).
The base XmlReader provides only the most essential functionality for reading XML documents. It does not, for example, validate XML (that's what XmlValidatingReader does) or expand XML entities into their respective character data (though XmlTextReader does). This does not mean that XML read from a text file cannot be validated at all; you can validate XML from any source by using the XmlValidatingReader constructor that takes an XmlReader object as a parameter, as I'll demonstrate.
Here are those four terms I used to describe XmlReader again, with a little explanation.
Event-based
An event in a stream-based XML reader indicates the start or end of an XML node as it is read from the data stream. The event's information is delivered to your application, and your application takes some action based on that information. In XmlReader, events are delivered by querying XmlReader's properties after calling its Read( ) method.
Read-only
XmlReader, as its name implies, can only read XML. For writing XML, there is an XmlWriter class, which I will discuss in Chapter 3.
Forward-only
Once a node has been read from an XML document, you cannot back up and read it again. For random access to an XML document, you should use XmlDocument (which I'll discuss in Chapter 5) or XPathDocument (which I'll discuss in Chapter 6).
Pull parser
Pull parsing is a more complex concept, which I'll describe in detail in the next section.
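The four terms can be seen together in a short sketch. The loop below pulls one node at a time by calling Read( ), then queries the reader's properties to find out what it just read. PullReaderSketch, ListEvents( ), and the sample XML are names and data assumed for this illustration only.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper, not from the book.
public static class PullReaderSketch {

  // Records each node the reader delivers as "NodeType Name".
  public static string ListEvents(string xml) {
    StringBuilder events = new StringBuilder();
    XmlTextReader reader = new XmlTextReader(new StringReader(xml));
    try {
      // Read( ) advances to the next node and returns false at the end
      // of the input; there is no way to move backward.
      while (reader.Read()) {
        events.Append(reader.NodeType);
        events.Append(' ');
        events.Append(reader.Name);
        events.Append('\n');
      }
    } finally {
      reader.Close();
    }
    return events.ToString();
  }

  public static void Main() {
    // Sample data invented for this sketch.
    Console.Write(ListEvents("<po><item>hammer</item></po>"));
  }
}
```

The application decides when to ask for the next node, which is what distinguishes this pull style from a parser that pushes events at your code.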
Moving On
You've now seen how to access files on a local filesystem and a network. You have learned how to use the various XmlReader implementations. And I've discussed the pull parser pattern used by the .NET XML parser and how it differs from a push parser.
You should now be able to read any arbitrary XML file using XmlReader. In the next chapter, I'll show you the other side of the XML I/O picture by introducing XmlWriter.
Chapter 3: Writing XML
In the previous chapter, you saw that reading XML in .NET is a fairly simple proposition. In some ways, writing XML is even simpler. In this chapter, I'll start by covering common file and network output in .NET. Then I'll show you how to write XML to a local or remote file, including various ways to customize the appearance of the generated XML.
Writing Data
As with XmlReader, I'll start by taking a general look at how data is written in .NET. I've already covered input, and output is very similar in that most operations involve the Stream class. After a general introduction to how the writing process works, I'll show you a quick and simple way of writing text to a file.
I covered the basics of opening and reading a file through the File and FileInfo objects in Chapter 2. In this section, I'll focus on writing to a file using the same objects.
To begin with, File has a Create( ) method. This method takes a filename as a parameter and returns a FileStream, so the most basic creation and writing to a file is fairly intuitive. Stream and its subclasses implement a variety of Write( ) methods, including one that writes an array of bytes to the Stream. The following code snippet creates a file named myfile.txt and writes the text .NET & XML to it:
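// the ASCII codes for the characters in ".NET & XML"
byte [ ] buffer = new byte [ ] {46,78,69,84,32,38,32,88,77,76};
string filename = "myfile.txt";

// File.Create( ) creates the file, or truncates it if it already exists
FileStream stream;
stream = File.Create(filename);
stream.Write(buffer,0,buffer.Length);
stream.Close( );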
That byte array is an awkward way to write a string to a Stream; ordinarily, you wouldn't hardcode an array of bytes like that. I'll show you a more typical way of encoding a string as a byte array in a moment.
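One common way to avoid the hardcoded array is to let an Encoding object produce the bytes. The sketch below is only one possibility and is not necessarily the approach the book goes on to show; the class name EncodingSketch, the WriteText( ) method, and the choice of UTF-8 are assumptions.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not from the book.
public static class EncodingSketch {

  // Creates (or overwrites) the file and writes the text to it as UTF-8 bytes.
  public static void WriteText(string filename, string text) {
    // GetBytes( ) does the work the hardcoded array did by hand.
    byte[] buffer = Encoding.UTF8.GetBytes(text);

    // The using block closes the FileStream even if Write( ) throws.
    using (FileStream stream = File.Create(filename)) {
      stream.Write(buffer, 0, buffer.Length);
    }
  }

  public static void Main() {
    WriteText("myfile.txt", ".NET & XML");
  }
}
```

For this particular string, every character is plain ASCII, so the bytes produced are the same ten values listed in the original array.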
If the file already exists, the previous code overwrites the file's current contents. You may not want to do that in practice; you may prefer to append to the file if it already exists. You can handle this very easily in .NET in several different ways. This snippet shows one way, with the changes highlighted:
byte [ ] buffer = new byte [ ] {46,78,69,84,32,38,32,88,77,76};
string filename = "myfile.txt";

FileStream stream;
if (File.Exists(filename)) {
  // it already exists, let's append to it
  stream = File.OpenWrite(filename);
  stream.Seek(0,SeekOrigin.End);
} else {
  // it does not exist, let's create it
XmlWriter and Its Subclasses
XmlWriter is an abstract base class that defines the interface for creating XML output programmatically. It contains methods such as WriteStartElement( ) and WriteEndElement( ) to write data. XmlWriter maintains the state of the XML document as it writes, so it knows which start element or attribute to close when you call WriteEndElement( ) or WriteEndAttribute( ).
XmlTextWriter is a subclass of XmlWriter that supports writing XML to any Stream, filename, or TextWriter. In addition to all the required features of an XmlWriter, XmlTextWriter allows you to set the formatting of the output, using the Formatting, Indentation, IndentChar, Namespaces, and QuoteChar properties.
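Here is a minimal sketch of those properties at work, writing the same small document with and without indentation. The FormattingSketch class, the WriteOrder( ) method, and the po document are invented for this example; only the XmlTextWriter members come from the framework.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper, not from the book.
public static class FormattingSketch {

  // Writes a tiny document to a string using the given Formatting value.
  public static string WriteOrder(Formatting formatting) {
    StringWriter output = new StringWriter();
    XmlTextWriter writer = new XmlTextWriter(output);

    // Formatting properties must be set before any output is written.
    writer.Formatting = formatting;
    writer.Indentation = 2;      // two IndentChars per level
    writer.IndentChar = ' ';
    writer.QuoteChar = '\'';     // quote attribute values with apostrophes

    writer.WriteStartElement("po");
    writer.WriteAttributeString("id", "1");
    writer.WriteElementString("item", "hammer");
    writer.WriteEndElement();    // the writer knows this closes po
    writer.Close();

    return output.ToString();
  }

  public static void Main() {
    Console.WriteLine(WriteOrder(Formatting.None));
    Console.WriteLine(WriteOrder(Formatting.Indented));
  }
}
```

Note that WriteEndElement( ) takes no element name; the writer tracks the open elements itself.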
The XmlTextWriter formatting properties are described in Table 3-6.
Table 3-6: XmlTextWriter formatting properties
Property     Type                   Description
Formatting   System.Xml.Formatting  Specify Formatting.None if the XML is to be produced without indentation, or Formatting.Indented to produce indented XML. Formatting.Indented makes for more readable output, but the canonical XML produced is identical.
Indentation  int                    If Formatting is set to Formatting.Indented, Indentation specifies the number of characters by which to indent each successive level of markup.
IndentChar   char
Moving On
You've now seen how to read and write XML using the XmlReader and XmlWriter types. The next chapter will show you how to read and write non-XML data as though it were XML.
Chapter 4: Reading and Writing Non-XML Formats
While more and more information is stored in XML, there are still lots of systems out there that use other formats. Both legacy integration and new non-XML formats are constant challenges for XML developers. Now that you've seen how to use the implementations of XmlReader and XmlWriter provided in the .NET class libraries, you're ready to learn how to implement your own custom types to handle some more complex scenarios. By combining XmlReader and XmlWriter, you can work with information stored in other formats as if it was XML, mixing and matching formats as you find appropriate for your projects.
For example, although the XmlReader class allows you to read standard XML syntax, there are alternative XML syntaxes that serve specialized purposes. There are XML syntaxes that do not use slashes and angle brackets, and some of these are considered to be more human-readable and less verbose than standard XML. Most of these alternative XML formats still retain all the functionality of standard XML. Other common non-XML formats contain structures you can treat as XML structures when convenient.
Reading Non-XML Documents with XmlReader
To read any sort of document using a non-XML format as though it were XML, you can extend XmlReader by writing a custom XmlReader subclass. Among the advantages of writing your own XmlReader subclass is that you can use your custom XmlReader wherever you would use any of the built-in XmlReaders. For example, even if the underlying data isn't formatted using standard XML syntax, you can pass any instance of a custom XmlReader to XmlDocument.Load( ) to load the XML document into a DOM (more on XmlDocument in Chapter 5). You could load a DOM tree from the data, use XPath to query the data, even transform the data with XSLT, all this even though the original data does not look anything like XML.
As long as an alternative syntax provides a hierarchical structure similar to XML, you can create an XmlReader for it that presents its content in a way that looks like XML. In this chapter you'll learn how to write a custom XmlReader implementation which will enable you to read data formatted in PYX, a line-oriented XML format, as if it were XML.
Before you can write an XmlPyxReader, you first need to understand PYX syntax. PYX is a line-oriented XML syntax, developed by Sean McGrath, which reflects XML's SGML heritage. PYX is based on Element Structure Information Set (ESIS), a popular alternative syntax for SGML.
Unlike many of the terms in this book, PYX is not an acronym for anything. A pyx is a container used in certain religious rites, and the PYX notation was developed mostly using the Python programming language.
In a line-oriented format, each XML node occurs on a new line. The XML nodes that PYX can represent include start element, end element, attribute, character data, and processing instruction. The first character of each line indicates what sort of node the line represents. Table 4-1 shows the prefix characters and what node type each represents.
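To give a feel for the notation, the sketch below converts ordinary XML to PYX lines by pulling nodes from an XmlTextReader. It assumes the prefix characters used by the XmlPyxWriter listing in this chapter: ( for a start element, ) for an end element, A for an attribute, - for character data, and ? for a processing instruction. The PyxSketch class, the ToPyx( ) method, and the sample document are hypothetical, and the sketch does not escape line breaks inside character data, which a complete converter would have to handle.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper, not from the book.
public static class PyxSketch {

  // Returns the PYX form of the XML text, one node per line.
  public static string ToPyx(string xml) {
    StringBuilder pyx = new StringBuilder();
    XmlTextReader reader = new XmlTextReader(new StringReader(xml));
    try {
      while (reader.Read()) {
        switch (reader.NodeType) {
          case XmlNodeType.Element:
            // Remember these before moving onto the attributes.
            string name = reader.Name;
            bool isEmpty = reader.IsEmptyElement;
            pyx.Append('(').Append(name).Append('\n');
            while (reader.MoveToNextAttribute()) {
              pyx.Append('A').Append(reader.Name).Append(' ')
                 .Append(reader.Value).Append('\n');
            }
            // An empty element such as <item/> has no EndElement node.
            if (isEmpty) {
              pyx.Append(')').Append(name).Append('\n');
            }
            break;
          case XmlNodeType.EndElement:
            pyx.Append(')').Append(reader.Name).Append('\n');
            break;
          case XmlNodeType.Text:
            pyx.Append('-').Append(reader.Value).Append('\n');
            break;
          case XmlNodeType.ProcessingInstruction:
            pyx.Append('?').Append(reader.Name).Append(' ')
               .Append(reader.Value).Append('\n');
            break;
        }
      }
    } finally {
      reader.Close();
    }
    return pyx.ToString();
  }

  public static void Main() {
    // Sample data invented for this sketch.
    Console.Write(ToPyx("<po id=\"1\"><item>hammer</item><?audit checked?></po>"));
  }
}
```

Node types that PYX cannot represent, such as comments, simply fall through the switch and are dropped.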
Writing an XmlPyxWriter
Using XmlTextWriter is very simple if you want to write your XML in standard angle-bracket syntax. But since you learned how to read PYX earlier in this chapter, you should learn how to write PYX here.
Because PYX does not handle many XML structural features, the implementation is quite simple. Example 4-6 shows a possible implementation of an XmlPyxWriter. After you look over the code, I'll highlight some important bits.
Example 4-6. XmlPyxWriter implementation
using System;
using System.Collections;
using System.Globalization;
using System.IO;
using System.Xml;

public class XmlPyxWriter : XmlWriter {

  // constructors

  public XmlPyxWriter(TextWriter writer) {
    this.writer = writer;
  }

  public XmlPyxWriter(Stream stream) {
    this.writer = new StreamWriter(stream);
  }

  public XmlPyxWriter(string filename) {
    this.writer = new StreamWriter(filename);
  }

  // private instance variables

  private TextWriter writer;
  private WriteState writeState = WriteState.Start;
  private XmlSpace xmlSpace = XmlSpace.Default;
  private string xmlLang = CultureInfo.CurrentCulture.ThreeLetterISOLanguageName;
  private Stack elementNames = new Stack( );

  // private instance methods

  private void Write(string text) {
    writer.WriteLine("-{0}", text);
    if (writeState == WriteState.Element) {
      writeState = WriteState.Content;
    }
  }

  private void Write(char ch) {
    writer.WriteLine("-{0}", ch);
    if (writeState == WriteState.Element) {
      writeState = WriteState.Content;
    }
  }

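  private void Write(char [ ] buffer, int index, int count) {
    // build a string from the requested slice; passing the array itself
    // to WriteLine( ) would format its type name, not its contents
    writer.WriteLine("-{0}", new string(buffer, index, count));
    if (writeState == WriteState.Element) {
      writeState = WriteState.Content;
    }
  }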

  // properties from XmlWriter

  public override WriteState WriteState { 
    get { return writeState; }
  }

  public override XmlSpace XmlSpace { 
    get { return xmlSpace; } 
  }

  public override string XmlLang { 
    get { return xmlLang; } 
  }

  // methods from XmlWriter

  public override void WriteEndDocument( ) { 
    // no-op
  }

  public override void WriteComment(string text) { 
    // no-op
  }

  public override void WriteStartDocument( ) { 
    writeState = WriteState.Prolog;
  }

  public override void WriteStartDocument(bool standalone) { 
    writeState = WriteState.Prolog;
  }

  public override void WriteDocType(string name, string pubid, string sysid, string subset){ 
    writeState = WriteState.Prolog;
  }

  public override void WriteStartElement(string prefix, string localName, string ns) { 
    writer.WriteLine("({0}", localName);
    elementNames.Push(localName);
    writeState = WriteState.Element;
  }

  public override void WriteEndElement( ) { 
    writer.WriteLine("){0}", elementNames.Pop( ));
  }

  public override void WriteFullEndElement( ) { 
    WriteEndElement( );
  }

  public override void WriteStartAttribute(string prefix, string localName, string ns) { 
    writer.Write("A{0} ",localName);
    writeState = WriteState.Attribute;
  }

  public override void WriteEndAttribute( ) { 
    writer.WriteLine( );
    writeState = WriteState.Element;
  }

  public override void WriteProcessingInstruction(string name, string text) { 
    writer.WriteLine("?{0} {1}", name, text);
  }

  public override void WriteEntityRef(string name) { 
    char ch = ' ';
    switch (name) {
      case "amp":
        ch = '&';
        break;
      case "lt":
        ch = '<';
        break;
      case "gt":
        ch = '>';
        break;
      case "quot":
        ch = '"';
        break;
      case "apos":
        ch = '\'';
        break;
    }
    Write(ch);
  }

  public override void WriteCData(string text) { 
    Write(text);
  }

  public override void WriteCharEntity(char ch) { 
    Write(ch);
  }

  public override void WriteWhitespace(string ws) { 
    Write(ws);
  }

  public override void WriteString(string text) {
    if (writeState == WriteState.Attribute) {
      writer.Write("{0}", text);
    } else {
      Write(text);
    }
  }

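  public override void WriteSurrogateCharEntity(char lowChar, char highChar) {
    // a surrogate pair is a single character, so write both halves on
    // one line, high surrogate first
    Write(new string(new char [ ] {highChar, lowChar}));
  }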

  public override void WriteChars(char [ ] buffer, int index, int count) { 
    Write(buffer, index, count);
  }

  public override void WriteRaw(char [ ] buffer, int index, int count) { 
    Write(buffer, index, count);
  }

  public override void WriteRaw(string data) { 
    Write(data);
  }

  public override void WriteBase64(byte [ ] buffer, int index, int count) { 
    // encode the requested bytes as Base64 text
    Write(Convert.ToBase64String(buffer, index, count));
  }

  public override void WriteBinHex(byte [ ] buffer, int index, int count) { 
    // encode the requested bytes as hexadecimal digits
    Write(BitConverter.ToString(buffer, index, count).Replace("-", ""));
  }

  public override void Close( ) { 
    writer.Close( );
    writeState = WriteState.Closed;
  }

  public override void Flush( ) { 
    writer.Flush( );
  }

  public override string LookupPrefix(string ns) { 
    return string.Empty;
  }

  public override void WriteNmToken(string name) { 
    writer.Write(name);
  }

  public override void WriteName(string name) { 
    writer.Write(name);
  }

  public override void WriteQualifiedName(string localName, string ns) { 
    writer.Write(localName);
  }
}
Moving On
I've now shown you how to create XmlReader and XmlWriter types to read and write one particular alternative XML syntax, and how to use them in programs that think they're reading and writing XML. You can think of other applications; besides other alternative XML syntaxes, such as YAML (Yet Another Markup Language) and James Clark's Compact Syntax for RELAX NG, you could read data from other formats completely unrelated to XML, such as CSV files, DBF files, and even databases and filesystems.
The knowledge of how to read and write XML to and from a variety of physical and logical formats forms a good basis for what's to follow. You'll see the real power of XmlReader and XmlWriter as they are combined with higher-level XML functionality, starting with the Document Object Model.
Chapter 5: Manipulating XML with DOM
The first section of this book laid the groundwork for your XML education by showing you how to read and write XML and other data using the .NET Framework. In between reading and writing, however, you'll often need to work with the data in other ways. This section will introduce various W3C standards and the implementations of those standards in the .NET Framework.
The XmlReader allows you to access XML data in a read-only, forward-only manner, but sometimes you need to read XML in a non-sequential manner. For example, you may want to change the order of a couple of elements somewhere in the middle of the document tree. For this purpose, the World Wide Web Consortium developed the Document Object Model (DOM).
In this chapter, I'll discuss what the DOM is, how .NET implements it, and when it is appropriate to use the DOM in your own code. Finally, we'll look at some examples using the DOM in C#.
What Is the DOM?
The DOM is an interface for manipulating XML content, structure, and style in an object-oriented fashion. It provides a standardized way of manipulating XML documents, including accessing elements and other nodes, taking actions on an object tree based on events, applying styles to documents, loading documents into object trees and saving object trees to documents, and more.
The DOM is language- and platform-neutral, meaning that it can be applied to any programming language on any hardware platform or operating system. Since its start in 1997, the DOM Working Group has made it a specific goal to ensure the DOM's language- and platform-neutrality. They've been successful; you can easily find a DOM implementation in just about any modern programming language, on any modern hardware platform.
The DOM represents an XML document as a tree of objects. Each object in the tree is called a node. The types of nodes that the DOM specifies are Document, DocumentFragment, DocumentType, EntityReference, Element, Attr, ProcessingInstruction, Comment, Text, CDATASection, Entity, and Notation. Some of these node types can have subnodes, and the types of subnodes that a particular node type can have are specified. To handle collections of nodes, the DOM also specifies a NodeList object and, for dictionaries of nodes (keyed by their names), the NamedNodeMap object. Figure 5-1 shows the DOM inheritance hierarchy.
Figure 5-1: The DOM inheritance hierarchy
The DOM specifies a group of interfaces, not actual objects. This means that the implementation of the objects is not mandated, only the methods that must be accessible from a client of the DOM. Because the objects are specified by their interfaces, they cannot be created with traditional constructors; instead, factory methods are commonly used.
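In the .NET implementation, the factory methods live on XmlDocument. The following is a minimal sketch of that pattern, not a listing from the original text; the class name FactoryMethodSketch, the BuildDocument( ) helper, and the inventory, item, and sku names are all invented for illustration.

```csharp
using System;
using System.Xml;

public class FactoryMethodSketch {

  // Builds a small, made-up inventory document in memory.
  public static XmlDocument BuildDocument( ) {
    XmlDocument document = new XmlDocument( );

    // Nodes are created by factory methods on the owning document,
    // not by calling constructors on the node types directly.
    XmlElement root = document.CreateElement("inventory");
    XmlElement item = document.CreateElement("item");
    XmlAttribute sku = document.CreateAttribute("sku");
    sku.Value = "A100";
    XmlText name = document.CreateTextNode("Widget");

    // A newly created node belongs to the document but is not part of
    // the tree until it is explicitly attached.
    item.Attributes.Append(sku);
    item.AppendChild(name);
    root.AppendChild(item);
    document.AppendChild(root);

    return document;
  }

  public static void Main(string [ ] args) {
    Console.WriteLine(BuildDocument( ).OuterXml);
  }
}
```

Creating a node and attaching it are separate steps, which is why each Create call above is followed by an Append call before the node shows up in the serialized document.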
The .NET DOM Implementation
.NET implements only Levels 1 and 2 of the Core module of DOM. As such, the core DOM functionality is provided: standard node types and the object tree view of a document. .NET also provides some features specified in other modules that are not yet part of an official DOM level (such as loading and saving of a document, and document traversal via XPath). If these modules become official W3C Recommendations, it is expected that future .NET implementations will support them.
Example 5-1 lists a program you can run to demonstrate which features the .NET Framework's DOM implementation supports.
Example 5-1. A program to report DOM module support
using System;

using System.Xml;

class DomFeatureChecker {

  private static readonly string [ ] versions = new string [ ] {
    "1.0", "2.0" };
  
  private static readonly string [ ] features = new string [ ] { 
    "Core", "XML", "HTML", "Views", "Stylesheets", "CSS", 
    "CSS2", "Events", "UIEvents", "MouseEvents", "MutationEvents", 
    "HTMLEvents", "Range", "Traversal" };

  public static void Main(string[ ] args) {
    XmlImplementation impl = new XmlImplementation( );
        
    foreach (string version in versions) {
      foreach (string feature in features) {
        Console.WriteLine("{0} {1}={2}", feature, version, 
          impl.HasFeature(feature, version));
      }
    }
  }
}
The HasFeature( ) method of the XmlImplementation class returns true if the given feature is implemented. If you run this program with the .NET Framework version 1.0 or 1.1, you'll see the following output:
Core 1.0=False
XML 1.0=True
HTML 1.0=False
Views 1.0=False
Stylesheets 1.0=False
CSS 1.0=False
CSS2 1.0=False
Events 1.0=False
UIEvents 1.0=False
MouseEvents 1.0=False
MutationEvents 1.0=False
HTMLEvents 1.0=False
Range 1.0=False
Traversal 1.0=False
Core 2.0=False
XML 2.0=True
HTML 2.0=False
Views 2.0=False
Stylesheets 2.0=False
CSS 2.0=False
CSS2 2.0=False
Events 2.0=False
UIEvents 2.0=False
MouseEvents 2.0=False
MutationEvents 2.0=False
HTMLEvents 2.0=False
Range 2.0=False
Traversal 2.0=False
Moving On
Now you've seen how to create a DOM document in memory, how to read one from disk, and how to manipulate one. You've looked at some different ways to manipulate a document once it's in memory, and you've used two XmlDocument instances simultaneously to manage an inventory system.
I also introduced XPath in this chapter. There's a lot more to say on that subject, so in Chapter 6 you'll learn about the System.Xml.XPath namespace.
Chapter 6: Navigating XML with XPath
Once you have an XmlDocument in memory, you could choose to navigate through its nodes by using XmlNodeReader to read each node and take some action if it is of the desired type. Or, you could recursively iterate through its child nodes, interrogating each child node's node type and name, until you reach the one you're interested in. Or, you could use XPath.
In this chapter, I'll introduce the XPath specification, the syntax of XPath expressions, and some of its typical uses. Then I'll show you the System.Xml.XPath namespace, and how it allows you to use XPath in your .NET applications. Finally, I'll go through some examples using XPath.
What Is XPath?
XPath is a specification that allows you to address individual parts of an XML document, originally intended for use in the XSLT transformation language and the XPointer syntax for XML fragment identifiers. However, XPath is quite useful on its own, and is available for standalone use in .NET.
Although XSLT is covered in Chapter 7, XPointer is not implemented in the .NET Framework. Thus, XPointer falls outside of the range of this book. For more information on XPath, XPointer, and their relationship, see John Simpson's XPath & XPointer (O'Reilly).
XPath 1.0 became a formal W3C Recommendation in November 1999; XPath 2.0 is currently a working draft, still evolving as of this writing. The official XPath recommendation is located on the web at http://www.w3.org/TR/xpath.
The essence of XPath is that you can select certain nodes from within an XML document through a simple XPath expression. In addition, XPath allows you to do some simple string, numeric, and Boolean data transformation on selected nodes. XPath expressions take the form of strings with a certain well-known syntax. This syntax is not explicitly XML itself; it is similar to filesystem pathnames and URLs, and this is where XPath gets its name.
In addition to addressing nodes by name, XPath syntax enables pattern matching, so that you can select individual nodes by their attribute or content values.
In this section, I'll discuss the structure and syntax of XPath expressions, and some of the functions built in to the specification.
Just like DOM, XPath operates on a tree-based view of an XML document. The XPath tree is built of the same node types used in DOM, except that CDATA sections, entity references, and document type declarations are not directly addressable. Their content is, however; the net result is that you can navigate to a text node's content, but you cannot tell whether that content contains plain text, CDATA, expanded entity references, or some combination thereof. You cannot access document type declarations at all with XPath.
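The points above about selecting by attribute or content value, and about CDATA being folded into ordinary text, can be sketched with a short program. This is an illustrative sketch only: the XPathTreeSketch class, its EvaluateString( ) helper, and the inventory document are hypothetical, and the XPathDocument and XPathNavigator classes it relies on belong to System.Xml.XPath.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public class XPathTreeSketch {

  // A made-up document: the second item mixes plain text with a
  // CDATA section.
  private const string Xml =
    "<inventory>" +
    "<item sku=\"A100\">Widget</item>" +
    "<item sku=\"B200\">Nuts <![CDATA[& bolts]]></item>" +
    "</inventory>";

  // Evaluates an XPath expression whose result is a string.
  public static string EvaluateString(string xpath) {
    XPathDocument document = new XPathDocument(new StringReader(Xml));
    XPathNavigator navigator = document.CreateNavigator( );
    return (string)navigator.Evaluate(xpath);
  }

  public static void Main(string [ ] args) {
    // Select an element by the value of one of its attributes.
    Console.WriteLine(
      EvaluateString("string(/inventory/item[@sku='A100'])"));

    // Select an element by its content, then read its attribute.
    Console.WriteLine(
      EvaluateString("string(/inventory/item[.='Widget']/@sku)"));

    // The string value combines the plain text and the CDATA content;
    // nothing in the result says which part came from where.
    Console.WriteLine(
      EvaluateString("string(/inventory/item[@sku='B200'])"));
  }
}
```

The third query returns the text "Nuts & bolts" as a single string, which is the practical consequence of CDATA sections not being addressable on their own.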
Using XPath
The System.Xml.XPath namespace is relatively small, containing only five classes, six enumerations, and one interface. There are two ways to select nodes from an XML document with XPath. The first, which was introduced in Chapter 5, uses the SelectNodes( ) and SelectSingleNode( ) methods of XmlNode. The second way uses the XPathNavigator class, obtained by calling XmlNode.CreateNavigator( ) or XPathDocument.CreateNavigator( ).
In this section, I'll discuss these methods of using XPath in .NET.
XmlNode defines two methods, with two overloads each, to allow navigation via XPath. SelectSingleNode( ) returns a single XmlNode that matches the given XPath, and SelectNodes( ) returns an XmlNodeList.
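As a rough side-by-side of the two approaches, the sketch below runs the same XPath expression through XmlNode.SelectNodes( ) and through an XPathNavigator obtained with CreateNavigator( ). The TwoWaysSketch class, its two helper methods, and the sample document are assumptions made for illustration.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

public class TwoWaysSketch {

  // A made-up document used by both methods below.
  private const string Xml =
    "<inventory><item>Widget</item><item>Gadget</item></inventory>";

  // First way: XmlNode.SelectNodes( ) returns an XmlNodeList.
  public static int CountWithSelectNodes(string xpath) {
    XmlDocument document = new XmlDocument( );
    document.LoadXml(Xml);
    XmlNodeList nodes = document.SelectNodes(xpath);
    foreach (XmlNode node in nodes) {
      Console.WriteLine(node.InnerText);
    }
    return nodes.Count;
  }

  // Second way: XPathNavigator.Select( ) returns an XPathNodeIterator.
  public static int CountWithNavigator(string xpath) {
    XmlDocument document = new XmlDocument( );
    document.LoadXml(Xml);
    XPathNavigator navigator = document.CreateNavigator( );
    XPathNodeIterator iterator = navigator.Select(xpath);
    int count = 0;
    // Current is only meaningful after MoveNext( ) has returned true.
    while (iterator.MoveNext( )) {
      Console.WriteLine(iterator.Current.Value);
      count++;
    }
    return count;
  }

  public static void Main(string [ ] args) {
    CountWithSelectNodes("/inventory/item");
    CountWithNavigator("/inventory/item");
  }
}
```

Both helpers visit the same two item elements; the difference is that the first hands back DOM nodes you can edit, while the second hands back a cursor over the XPath data model.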

Section 6.2.1.1: Selecting a single node

SelectSingleNode( ) returns a single XmlNode that matches the given XPath expression. If more than one node matches the expression, the first one is returned; the definition of "first" depends on the order of the axis used. The context node of the XPath query is set to the XmlNode instance on which the method is invoked.
One overload of SelectSingleNode( ) takes just the XPath expression. The other one takes the XPath expression and an XmlNamespaceManager. The XmlNamespaceManager is used to resolve any prefixes in the XPath expression.
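The overload that takes an XmlNamespaceManager matters as soon as a document uses namespaces. Here is a minimal sketch, assuming a made-up document in the hypothetical namespace urn:example:inventory; the NamespaceQuerySketch class, the Find( ) helper, and the inv prefix are likewise invented for illustration.

```csharp
using System;
using System.Xml;

public class NamespaceQuerySketch {

  // A made-up document whose elements are in a default namespace.
  private const string Xml =
    "<inventory xmlns=\"urn:example:inventory\">" +
    "<item>Widget</item></inventory>";

  public static XmlNode Find(string xpath) {
    XmlDocument document = new XmlDocument( );
    document.LoadXml(Xml);

    // An unprefixed name in XPath 1.0 means "no namespace", so the
    // query needs a prefix bound to the namespace URI. The prefix
    // chosen here is arbitrary and need not appear in the document.
    XmlNamespaceManager manager =
      new XmlNamespaceManager(document.NameTable);
    manager.AddNamespace("inv", "urn:example:inventory");

    return document.SelectSingleNode(xpath, manager);
  }

  public static void Main(string [ ] args) {
    // Unprefixed names do not match the namespaced elements.
    Console.WriteLine(Find("/inventory/item") == null);

    // Prefixed names are resolved through the manager.
    Console.WriteLine(Find("/inv:inventory/inv:item").InnerText);
  }
}
```

The first query finds nothing even though the document visibly contains inventory and item elements, which is a common surprise when querying documents that declare a default namespace.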
Example 6-2 shows a simple program that selects a single node from an XML document and writes it to the console, with human-readable formatting.
Example 6-2. Program to execute an XPath query on a document
using System;
using System.Xml;
using System.Xml.XPath;

public class XPathQuery {

  public static void Main(string [ ] args) {

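    // Both the filename and the XPath expression are required.
    if (args.Length < 2) {
      Console.Error.WriteLine("Usage: XPathQuery filename xpath");
      return;
    }

    string filename = args[0];
    string xpathExpression = args[1];

    XmlDocument document = new XmlDocument( );
    document.Load(filename);

    XmlTextWriter writer = new XmlTextWriter(Console.Out);
    writer.Formatting = Formatting.Indented;

    // SelectSingleNode( ) returns null when nothing matches.
    XmlNode node = document.SelectSingleNode(xpathExpression);
    if (node != null) {
      node.WriteTo(writer);
    }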

    writer.Close( );
  }
}
Moving On