Mastering Regular Expressions, 2nd Edition
by Jeffrey E. F. Friedl

The following errata were *corrected* in the 5/03 reprint:

Here's a key to the markup:
  [page-number]: serious technical mistake
  {page-number}: minor technical mistake
  <page-number>: important language/formatting problem
  (page-number): language change or minor formatting problem

p 21 AUTHOR: footnote;
  Apparently, there are a number of other egreps that exhibit this bug, including those on Solaris and SGI's Irix. (Perhaps they are derived from GNU's?)
  Due to this situation, the following footnote HAS BEEN ADDED to page 21:
  "Be aware that some version of egrep, including the popular GNU version, have a bug with the -i option such that it doesn't apply to backreferences. Thus it finds 'the the' but not 'The the.'"

p70, IN PRINT: second para from the bottom;
  "...is quote complex, so do to it exactly is... "
NOW READS:
  "...is quite complex, so to do it exactly is... "

p73, 2nd para
  At ``/x turns whitespace into an "ignore me" metacharacter, and # into an "ignore me, and everything else up to the next newline" metacharacter'', a cross-reference to page 110 HAS BEEN ADDED.

p75, IN PRINT: 3rd line;
  "... include an ending .,?! as part... "
NOW READS:
  "... include an ending [.,?!] as part... "

p76, IN PRINT: middle of 2nd-to-last paragraph;
  "or even as subexpression of some other regex "
NOW READS:
  "or even as a subexpression of some other regex "

p76, IN PRINT: last paragraph;
  "...with Java packages an .NET discussed... "
NOW READS:
  "...with Java packages and .NET discussed... "

p87, IN PRINT: second line from the bottom;
  "Thy are not regex-specific concept, ... "
NOW READS:
  "They are not a regex-specific concept, ... "

p89, IN PRINT: 2nd-to-last para;
  "Perl 4 was release half a year later, ... "
NOW READS:
  "Perl 4 was release a year and a half later, ... "

p108, IN PRINT: end of 3rd para;
  (U+0048 U+U007A).
NOW READS:
  (U+0048 U+007A).

p110, IN PRINT: mid 2nd para;
  "... combining sequence (=> 107), U+006A and U+030C. "
NOW READS:
  "... combining sequence (=> 107), U+004A and U+030C. "

p129, IN PRINT: third line;
  "..."match all" function, The failure that... "
NOW READS:
  "..."match all" function, the failure that... "

p129, IN PRINT: second-to-last line;
  "bumps along to the next character when it... "
NOW READS:
  "bumps along to the next character (xref to p148) when it... "

p130, IN PRINT: last line with a regex;
  "my ($badstuff) = $html =~ m/\G(.{1,12})/g; "
NOW READS:
  "my ($badstuff) = $html =~ m/\G(.{1,12})/; "

p136, Table 3-14
  The table should note: GNU Emacs supports, within the replacement string, \& to represent the text of the entire match, and \1, \2, etc. for parenthesized submatches. GNU awk supports & to represent the text of the entire match in replacement strings. Additionally, the gensub function supports \0 (as well as &) in the replacement string, along with \1, \2, etc, for parenthesized submatches. For GNU sed (and any sed), the entire match is &, not \& (as was written). And for & and \1, etc., they also work within the regex itself (and not only the replacement string).

  table, 2nd row
  The table should note that GNU Emacs supports, within the replacement string, \& to represent the text of the entire match, and \1, \2, etc. for parenthesized submatches.

  To indicate the above-mentioned information, the following notes HAVE BEEN ADDED to Table 3-14, as indicated:
    GNU Emacs: Column 1-"(\& within replacement string)"; Column 2-"(\1 within replacement string)"
    GNU awk: Column 1-"(\& within replacement string)"; Column 2-"(within gensub replacement)"
    GNU sed: Column 1-"(in replacement string only)"; Column 2-"(in replacement and regex only)"

p197, mid-page
  "You need a 2\"3\" photo."
NOW READS:
  "You need a 2\"x3\" photo."
  The same change occurs in the boxed text one paragraph below, as well.
p212 mid-page
  while (m/(\d\d\d\d\d)/) {
NOW READS:
  while (m/(\d\d\d\d\d)/g) {

p215, end of 3rd para
  That's why there's one empty match between each valid match (and although not shown, there's an empty match at the end).
NOW READS:
  That's why there's one empty match between each valid match, and one more empty match before each quoted field (and although not shown, there's an empty match at the end).

p217, last para
  Unfortunately, as the section in the previous chapter (=> 132) explains
NOW READS:
  Unfortunately, as the section in Chapter 3 (=> 132) explains

p223, Figure 6-1
  The grayed doublequote in the lower-right quadrant of the figure NO LONGER APPEARS with a grayed background.

p238, Ruby example
  takes .3f seconds
NOW READS:
  takes %.3f seconds
  (This correction appears on each of the two print statements in the example.)

p240, Tcl example
  takes .3f seconds
NOW READS:
  takes %.3f seconds
  (This mistake appears on each of the two puts statements in the example.)

p247, 2nd-to-last para
  Uses of start, plus, and friends...
NOW READS:
  Uses of star, plus, and friends...

p256, 1st para:
  ``^abc|^123 and ^(?:abc|123) are logically the same expression,''
NOW READS:
  ``^(?:abc|123) and ^abc|^123 are logically the same expression,''

p258, end of second-to-last paragraph
  (Tcl, of course, can...
NOW READS:
  (PCRE can do it if the optional pcre_study function is called, and Tcl, of course, can...

p290, first para of last bullet
  ...such as in \F or \&.
NOW READS:
  ...such as in \F or \H.

p292, first line
  ... is parsed like to a "regex-aware"
NOW READS:
  ... is parsed like a "regex-aware"

p302, para about $^R
  ... constructs (=> 327), and has no value outside of a regex. It is the...
NOW READS:
  ... constructs (=> 327). It is the...

p311, first large paragraph
  ... in the scalar context provided by the while conditional, ...
NOW READS:
  ... in the scalar context provided by the if conditional, ...
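
For readers who want to check the page 256 statement for themselves, the following is a minimal sketch in Python (not the book's own code). It confirms only that the two expressions match the same text on a handful of sample strings, which are assumptions chosen for illustration. It says nothing about how any particular engine optimizes either form.

```python
import re

# The two forms quoted in the page 256 entry.
FACTORED = re.compile(r"^(?:abc|123)")
EXPANDED = re.compile(r"^abc|^123")

# Sample strings are assumptions chosen for illustration.
samples = ["abcdef", "123456", "xabc", "a123", "", "ab", "123abc"]

def same_result(text):
    """Return True if both patterns agree on whether and where they match."""
    a = FACTORED.search(text)
    b = EXPANDED.search(text)
    return (a.span() if a else None) == (b.span() if b else None)

for s in samples:
    print(repr(s), same_result(s))
```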
p311, 2nd-to-last code snippet
  my @nums = $text =~ m/\d+(?:.\d+)?|\.\d+/g;
NOW READS:
  my @nums = $text =~ m/\d+(?:\.\d+)?|\.\d+/g;

p326, code snippets
  In the first element of the array (in each code snippet), the space after NOW APPEARS shown with the light gray "visual space" dot.

p326, footnote
  if split is used in a scalar context,
NOW READS:
  if split is used in a scalar or void context,

p344, 3rd line
  in the sidebar on page => 130).
NOW READS:
  in the sidebar on page 130).

p344, 2nd para
  m/($MyStuff)*+//
NOW READS:
  m/($MyStuff)*+/

p356, last para
  ... rather than the 50,000 or so reasonbly sized lines.
NOW READS:
  ... rather than the 130,000 or so reasonbly sized lines.
  I'd updated the data about the test from the first edition, but neglected to update this reference. You can see how much Perl's source has grown in the last five years.

p356, last line
  you to tweak your code for better efficiently.
NOW READS:
  you to tweak your code for better efficiency.

p358, first para
  The easiest is perhaps to use...
NOW READS:
  The easiest, if your perl binary has been compiled with the -DDEBUGGING option, is to use...

p361, 3rd para
  ... turn it back off with no re 'debug';, but it turns of automatically at the end of the block or file in which the use is placed.
NOW READS:
  ... turn it back off with no re 'debug';.

p364, third paragraph
  Finally, I as I mentioned...
NOW READS:
  Finally, as I mentioned...

p364, fourth paragraph
  ..., but it very close, ...
NOW READS:
  ..., but it is very close, ...

p382, 3rd line of last table
  ^ matches at the beginning of line only
NOW READS:
  ^ matches at the beginning of string only

p386, bottom code snippet
  All instances of the variable "first" HAVE BEEN CHANGED to the variable "second"

p398, bottom code snippet
  All instances of the variable "first" HAVE BEEN CHANGED to the variable "second"

p394, just above second "substitute" header
  s/\\b([A-Z])([A-Z]+)/...
NOW READS:
  s/\\b([A-Z])([A-Z]+)\\b/...
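
To show why the escaped dot in the page 311 correction matters, here is a small sketch that runs both versions of the regex in Python rather than the book's Perl. The sample text is an assumption for illustration. With the unescaped dot, any character can stand in for the decimal point.

```python
import re

text = "3x5 costs 1.25 or .5"  # sample text, an assumption for illustration

# As originally printed: the unescaped dot matches any character.
as_printed = re.findall(r"\d+(?:.\d+)?|\.\d+", text)

# As corrected: the escaped dot matches only a literal period.
corrected = re.findall(r"\d+(?:\.\d+)?|\.\d+", text)

print(as_printed)  # ['3x5', '1.25', '.5']
print(corrected)   # ['3', '5', '1.25', '.5']
```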
p422, result output
  (Is Right-To-Left: False) (twice)
NOW READS:
  Is Right-To-Left: False

p427, 2nd para
  Given a string, Match.Escape(...) returns...
NOW READS:
  Given a string, Regex.Escape(...) returns...

p432, 3rd para
  The two references to Group(0) NOW READ Groups(0).
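
The page 427 entry concerns .NET's Regex.Escape. As a rough analogue only, the sketch below uses Python's re.escape to show the general idea of turning a string into a regex that matches it literally. The find_literal helper and the sample strings are hypothetical, and the exact set of characters escaped may differ between Python and .NET.

```python
import re

def find_literal(needle, haystack):
    """Hypothetical helper: search for `needle` as literal text."""
    # re.escape backslash-escapes regex metacharacters in `needle`,
    # so characters such as + and . lose their special meaning.
    pattern = re.escape(needle)
    m = re.search(pattern, haystack)
    return m.span() if m else None

print(find_literal("1+1", "so 1+1=2"))  # (3, 6)
print(find_literal("a.c", "abc"))       # None
```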