Perl & XML by Erik T. Ray, Jason McIntosh

The unconfirmed error reports are from readers. They have not yet been approved or disproved by the author or editor and represent solely the opinion of the reader.

Here's a key to the markup:
[page-number]: serious technical mistake
{page-number}: minor technical mistake
<page-number>: important language/formatting problem
(page-number): language change or minor formatting problem
?page-number?: reader question or request for clarification

This page was updated November 16, 2004.

UNCONFIRMED errors and comments from readers:

[22] 4th paragraph;
(1) "... known as UTF-8. This encoding uses between one and six bytes ..." In Nov. 2003, RFC 3629 changed this to between one and four bytes.
(2) "... if that address is greater than 255." Should be "if that address is greater than 127."
(3) "It's possible to write an entire document in 1-byte characters and have it be indistinguishable from ISO Latin-1 ..." This holds only if every character is at or below 0x7F (127 decimal). For more details, see http://en.wikipedia.org/wiki/UTF-8 and http://en.wikipedia.org/wiki/Latin-1

{26} 1st ELEMENT example; Unbalanced parenthesis, probably want: tags in this example incorrectly end with

[64-65] last paragraph;
(1) "... up to a maximum of six bytes." As of Nov. 2003, this was reduced to a maximum of four bytes.
(2) "Several centuries from now, [[page 65]] after Earth begrudgingly joins the Galactic Friendship Union ... bytes four through six will come in quite handy." This no longer applies, since 5- and 6-byte sequences are no longer legal in UTF-8.

{92} Bottom code example, second line; I believe SUPER::start_document(); works.
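The UTF-8 byte counts discussed in the page 22 and page 64-65 errata above can be checked directly. The following is a minimal sketch using the core Encode module; the helper names hex_bytes, utf8_bytes, and utf8_length are hypothetical, chosen only for this illustration. It prints how many bytes UTF-8 uses for a few sample code points and contrasts U+00E9 in UTF-8 with its single-byte ISO Latin-1 form.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render a byte string as space-separated hex, e.g. "C3 A9".
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
}

# Hypothetical helper: the UTF-8 byte sequence for one code point.
sub utf8_bytes {
    my ($code_point) = @_;
    return encode('UTF-8', chr $code_point);
}

# Hypothetical helper: how many bytes UTF-8 needs for one code point.
sub utf8_length {
    my ($code_point) = @_;
    return length utf8_bytes($code_point);
}

# Code points at or below 0x7F take one byte; anything above needs more.
for my $cp (0x41, 0x7F, 0x80, 0xE9, 0x20AC, 0x1F600) {
    printf "U+%04X -> %d byte(s): %s\n",
        $cp, utf8_length($cp), hex_bytes(utf8_bytes($cp));
}

# ISO Latin-1 stores U+00E9 as the single byte E9; UTF-8 uses two bytes.
printf "Latin-1 U+00E9: %s\n", hex_bytes(encode('iso-8859-1', chr 0xE9));
printf "UTF-8   U+00E9: %s\n", hex_bytes(utf8_bytes(0xE9));
```

Under RFC 3629 the largest permitted code point, U+10FFFF, still fits in four bytes, which is why the five- and six-byte forms described in the book no longer apply.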
{97} top; the line "use XML::Parser::PerlSax" duplicates the one at the bottom of page 96.

{97} sub characters; $self->in_element( 'h1' ) should read $self->within_element( 'h1' ), since in_element tests only the current innermost element, which in the critical case of "big" (in the example) is "em", whereas what is required is a test of whether we are at any level within an h1. Incidentally, despite an existing erratum, I find that the line also works without the quotes around h1: print $data if( $self->within_element( h1 ));

[111] Example 5-9; In sub end_element, print "<", $data->{Name}, ">\n"; should be changed to print "</", $data->{Name}, ">\n"; (This explains the erroneous output shown at the top of page 109, previously submitted as an erratum by another reader.)

(143) middle of page; "acestor's" is misspelled (should be "ancestor's").

{144} sub is_element; should the return value == 1 be ELEMENT_NODE? There is no discussion of the constants included with the XML packages, so the actual name is a guess.

{144} Example 8-2; The example will not work without including the XML::DOMIterator package. If the reader uses the downloadable examples, adding a "use ex01_iter;" line at the top of the program will work. This also requires that "1;" be added to the very end of the ex01_iter.pm package.

{145} Example 8-3; Running Example 8-3 returns the following message: "this function is obsolete! It was disabled in version 1.54". My version of XML::LibXML is 1.58, the most current at the time of this submission. Further investigation found that iterator() is the offending function and has been deprecated.

(150) last line on the page, the result of the example:
> grabber.pl data.xml "//*[@id='104']/parent::*/child::*[2]/name[not(@style='latin')]/node()"
Poison Sumac
The result of this example should be Speckled Alder, not Poison Sumac. This error also occurs on page 151, 1st paragraph, last line. Specifying @id='222' will produce a result of Poison Sumac.
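The page 97 erratum above turns on the difference between checking only the innermost open element and checking every open ancestor. The sketch below illustrates that difference in pure Perl, without XML::Parser. The ElementStack class is hypothetical: its method names mirror the ones in the erratum, but it is not the XML::Parser API, only a simulation of the parser sitting inside <h1> ... <em>big</em> ... </h1>.

```perl
use strict;
use warnings;

package ElementStack;    # hypothetical helper, not part of XML::Parser

sub new { my ($class) = @_; return bless { open => [] }, $class }

# Record an element being opened or closed.
sub start_element { my ($self, $name) = @_; push @{ $self->{open} }, $name; return }
sub end_element   { my ($self) = @_; pop @{ $self->{open} }; return }

# True (1) only if $name is the innermost currently open element.
sub in_element {
    my ($self, $name) = @_;
    my $open = $self->{open};
    return @$open && $open->[-1] eq $name ? 1 : 0;
}

# Number of open elements named $name at any depth.
sub within_element {
    my ($self, $name) = @_;
    return scalar grep { $_ eq $name } @{ $self->{open} };
}

package main;

# Simulate character data arriving inside <h1> ... <em>big</em>.
my $stack = ElementStack->new;
$stack->start_element($_) for qw(html body h1 em);

printf "in_element('h1'):     %d\n", $stack->in_element('h1');
printf "within_element('h1'): %d\n", $stack->within_element('h1');
```

The in_element check fails for "big" because "em" is on top of the stack, while within_element still finds the enclosing h1. XML::Parser::Expat describes its own in_element and within_element methods along the same lines.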
(153) Section: The Origin of XSLT; "XSLT stands for XML Style Language: Transformations. The name means that it's a component of the XML Style Language (XSL), assigned to handle the task of converting input XML into a special format called XSL-FO..." should be changed to "XSLT stands for Extensible Stylesheet Language Transformations. The name means that it's a component of the XML Stylesheet Language (XSL), assigned the task of expressing style sheets."

{155} last code segment; The recurring s/amp/amp error: wouldn't 'tr' be a better way of doing this?

(164) first line; You shouldn't modify "unique": RSS either is unique or it isn't. "Not very unique" could become "not uncommon", "not unusual", "not surprising", etc.

(165) third code comment line; "printed", not "pronted".

(169) First paragraph; From the text: "This example isn't very interesting, but it looks good in print.... (modulo the whitespace that YAWriter inserted to make things more human-readable)." A test of this example script yielded XML without added whitespace. To get the desired effect, I had to add the following line after the YAWriter constructor:
$ya->{'Pretty'} = { PrettyWhiteIndent => 1, PrettyWhiteNewline => 1 };
This works with v0.23 of XML::Handler::YAWriter.

(169) Third paragraph; From the text: "CPAN has many of these modules to choose from, DBD::MySQL,..." could be changed to "CPAN has many of these modules to choose from, including DBD::MySQL, ..." The current wording could be read as stating that only three DBD modules are available. Thankfully, this is not the case.

{175} XML Example; should be Also, Should probably be As originally written, the XML causes the following error when processed using the example at the bottom of page 183:
Undefined namespace prefix
xmlXPathCompiledEval: evaluation failed

{179} 1st code example; my $parser = XML::ComicsML::Parser->new; should it not be my $parser = XML::ComicsML->new; ?

(180) sub rebless method; The comment should read # Define a hash ...
{184} Code example; The line print "\n"; should precede the final line of code: print end_html;

{184} code at top of page; I think the line my $parser = XML::XPath; ought to be my $parser = new XML::LibXML; or my $parser = XML::LibXML->new; It doesn't look like XML::XPath is used anywhere, but if it were, it would still need a "new".

***SAFARI ONLINE***

{1} 2.6 Unicode, Character Sets, and Encodings; Sorry, I'm reading it online, so I don't know the page number, but here's the URL: http://safari.oreilly.com/main.asp?bookname=9780596002053&snode=24
"It's possible to write an entire document in 1-byte characters and have it be indistinguishable from ISO Latin-1 (a humble address block with addresses ranging from 0 to 255)" I don't think that's true. While the lower (ASCII) range, 0-127, is identical in UTF-8, the upper range, 128-255, is not. Just to let you know ...

{snode 56} Example 5-7; Missing semicolon on line 2 of the script.
Currently: use XML::Handler::Subs
Should be: use XML::Handler::Subs;

[snode 56] Example 5-7; The start_document subroutine dies with the error "Undefined subroutine &SUPER::start_document called at safari.pl line 26."
Currently:
sub start_document { SUPER::start_document( ); print "Summary of file:\n"; }
Should read:
sub start_document { my $self = shift; $self->SUPER::start_document( ); print "Summary of file:\n"; }
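The Example 5-7 SUPER:: erratum above comes down to how Perl resolves SUPER: it only works as a method call, where the lookup starts in the parent classes of the package the calling sub was compiled in. Called as a plain function, SUPER::start_document() is treated as a sub in a package literally named "SUPER", which does not exist. The sketch below demonstrates both forms; BaseHandler and SummaryHandler are hypothetical stand-ins for XML::Handler::Subs and the example's subclass, not the real module.

```perl
use strict;
use warnings;

package BaseHandler;    # hypothetical stand-in for XML::Handler::Subs

sub new { my ($class) = @_; return bless { started => 0 }, $class }

sub start_document {
    my ($self) = @_;
    $self->{started} = 1;    # record that the parent method ran
    return;
}

package SummaryHandler;     # hypothetical subclass, modeled on Example 5-7
our @ISA = ('BaseHandler');

sub start_document {
    my $self = shift;
    # Method-call form: Perl searches the parents of SummaryHandler.
    $self->SUPER::start_document();
    print "Summary of file:\n";
    return;
}

package main;

my $handler = SummaryHandler->new;
$handler->start_document;
print "parent start_document ran: $handler->{started}\n";

# Function-call form: looks for &SUPER::start_document, which is undefined,
# so it dies at run time; eval traps the error so we can show it.
my $ok = eval { SUPER::start_document(); 1 };
print "bare call failed: $@" unless $ok;
```

The extra my $self = shift; in the corrected version is what makes the method-call form possible, since the bare function call has no object to dispatch on.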