The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction will be displayed in the column titled "Date Corrected".
The following errata were submitted by our customers and approved as valid errors by the author or editor.
Version |
Location |
Description |
Submitted By |
Date submitted |
Date corrected |
PDF |
Page various
various (see description) |
The following regular expressions all include the character class [\s\S] and are listed as JavaScript-only, although they in fact work with all regex flavors covered by the book. The intention was to suggest that they should only be used in JavaScript, because more appropriate alternatives are already listed for other regex flavors. Thus, whether what's currently printed is actually wrong is debatable, but to avoid confusion it is better to expand the regex flavor lists referenced below. Some related text changes are also listed.
Page 34:
----------
Printed:
Regex flavor: JavaScript
Corrected:
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Page 35:
----------
Printed (3rd paragraph):
'.' thus matches any single character except a newline character.
Corrected:
. thus matches any single character except a newline character.
Page 245:
----------
Printed:
Regex flavor: JavaScript
Corrected:
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Page 246 (two fixes):
----------
Printed (1st paragraph):
In this case, the pattern .* (or [\S\s]* in the JavaScript version) is used to simply match the entire subject text with no added constraints.
Corrected:
In this case, the pattern .* (or [\S\s]* in the version that adds JavaScript support) is used to simply match the entire subject text with no added constraints.
Printed (2nd paragraph):
This regex uses the dot matches line breaks option to allow the dots to match all characters, including line breaks. See Recipe 3.4 for details about how to apply this modifier with your programming language. The JavaScript regex is different, since JavaScript does not have a dot matches line breaks option. See Any character including line breaks on page 35 in Recipe 2.4 for more information.
Corrected:
The first regex uses the dot matches line breaks option so that it will work correctly when your subject string contains line breaks. See Recipe 3.4 for details about how to apply this modifier with your programming language. JavaScript doesn't have a dot matches line breaks option, so the second regex uses a character class that matches any character. See Any character including line breaks on page 35 in Recipe 2.4 for more information.
Page 306:
----------
Printed:
Regex flavor: JavaScript
Corrected:
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Page 309 x 2 (two occurrences, both with the same corrected replacement):
----------
Printed:
Regex options: ^ and $ match at line breaks
Regex flavor: JavaScript
Corrected:
Regex options: ^ and $ match at line breaks ("dot matches line breaks" must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Page 429:
----------
Printed:
Regex flavors: JavaScript
Corrected:
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Page 430:
----------
Printed:
Regex flavors: JavaScript
Corrected:
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Page 458:
----------
Printed:
Regex flavor: JavaScript
Corrected:
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Page 460:
----------
Printed:
Regex flavor: JavaScript
Corrected:
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Page 463 x 2 (two occurrences, both with the same corrected replacement):
----------
Printed:
Regex flavor: JavaScript
Corrected:
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
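As a quick check of the claim that [\s\S] is not JavaScript-specific, here is a minimal Python sketch (Python being one of the listed flavors). The subject string is made up for illustration. It compares the character class with a plain dot, with and without the "dot matches line breaks" option.

```python
import re

# Hypothetical subject text containing a line break.
subject = "first line\nsecond line"

# [\s\S] matches any character, line breaks included, with no options set.
class_match = re.match(r"[\s\S]*", subject)

# A plain dot stops at the line break unless re.DOTALL is set.
dot_match = re.match(r".*", subject)
dotall_match = re.match(r".*", subject, re.DOTALL)

print(repr(class_match.group()))
print(repr(dot_match.group()))
print(repr(dotall_match.group()))
```

In flavors that have a "dot matches line breaks" option, the dot with that option set remains the more readable choice, which is the reason the book lists it first.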
|
Steven Levithan |
Jul 10, 2009 |
|
Printed |
Page ix
3rd paragraph, 1st sentence |
as published:
"... in situations where people with limited with regular expressions experience ..."
corrected:
"... in situations where people with limited regular expressions experience ..."
|
Jeff Roberson |
Jun 28, 2009 |
Aug 01, 2009 |
PDF |
Page 28
3rd paragraph, I think? |
Solution
\a\e\f\n\r\t\v
Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
\x07\x1B\f\n\r\t\v
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Perl does not support the \v vertical tab escape sequence. It must be represented in Perl using a hexadecimal (\x0B) or octal (\013) escape sequence instead.
Note from the Author or Editor: On page 28, in the Solution section, remove Perl from the list of regex flavors for both given solutions. Add a 3rd solution:
\a\e\f\n\r\t\x0B
Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
On page 29, append this sentence to the first paragraph (which starts with "The ECMA-262 standard..."):
Perl does not support \v, so we have to use a different syntax for the vertical tab in Perl.
In this sentence, \v should be formatted as a regular expression, just as \a and \e are in that sentence.
|
John W. Krahn |
Jun 12, 2009 |
Aug 01, 2009 |
Printed |
Page 45-46
List of Unicode Properties |
Some Unicode properties are missing from the list; these may include <\p{Lu}>, <\p{L&}>, <\p{Lm}>, and <\p{Mc}>.
Note from the Author or Editor: In the "Unicode property or category" list beginning on page 45, the following 3 items should be added:
Insert between \p{Ll} and \p{Lt}:
\p{Lu}
An uppercase letter that has a lowercase variant
Insert between \p{Lt} and \p{Lo}:
\p{Lm}
A special character that is used as a letter
Insert between \p{Mn} and \p{Me}:
\p{Mc}
A character intended to be combined with another character that takes up extra space (vowel signs in many Eastern scripts)
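Python's standard re module does not support \p{...} tokens, but the three categories added above can still be inspected with the standard unicodedata module. The sketch below is only an illustration; the sample characters are our own choice and do not come from the book.

```python
import unicodedata

# Sample characters chosen for illustration (an assumption, not from the book).
samples = {
    "A": "Lu",        # LATIN CAPITAL LETTER A
    "\u02b0": "Lm",   # MODIFIER LETTER SMALL H
    "\u093e": "Mc",   # DEVANAGARI VOWEL SIGN AA
}

for char, expected in samples.items():
    actual = unicodedata.category(char)
    print(f"U+{ord(char):04X} {unicodedata.name(char)}: {actual} (expected {expected})")
```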
|
Yao G |
Aug 25, 2009 |
|
Printed |
Page 65
2nd and 3rd example |
On page 65 your 2nd and 3rd examples would incorrectly match input
that was not in the form of a hexadecimal number.
For example, the input 9g01h would, incorrectly, succeed in making a
match.
The regex should be \b[a-fA-F0-9]{1,8}h?\b
Note from the Author or Editor: In the Solution section, change both instances of [a-z0-9] to [a-f0-9]
Also, change the second "Hexadecimal number" subheading to "Hexadecimal number with optional suffix" to differentiate it from the first.
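The effect of the character class change can be sketched in Python. The first pattern below is an assumed reconstruction of the printed regex (the submitter's suggested regex with [a-z0-9] substituted back in and case insensitivity turned on); the exact regexes in the book may differ.

```python
import re

# Assumed reconstruction of the printed pattern, matched case-insensitively.
printed = re.compile(r"\b[a-z0-9]{1,8}h?\b", re.IGNORECASE)
# Same pattern with the corrected character class.
corrected = re.compile(r"\b[a-f0-9]{1,8}h?\b", re.IGNORECASE)

for text in ("9g01h", "1F4h", "DEADBEEF"):
    print(text, bool(printed.search(text)), bool(corrected.search(text)))
```

Only the corrected pattern rejects 9g01h, the input cited in the report, while both accept the genuine hexadecimal values.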
|
Anonymous |
Oct 12, 2009 |
|
PDF |
Page 65
Floating-point number section |
The \b at the start of the regular expression under "floating point number" should be deleted. It prevents the regular expression from matching floating point numbers without an integer part, as is required by the problem statement for this recipe.
Note from the Author or Editor: Delete \b at the start of the regular expression under "floating point number"
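Why the leading \b blocks numbers without an integer part can be seen with a simplified stand-in for the recipe's regex. The patterns below are assumptions made for illustration and are much shorter than the regex in the book; only the leading \b differs between them.

```python
import re

# Simplified stand-ins (assumed, not the book's regex).
with_boundary = re.compile(r"\b[0-9]*\.[0-9]+")
without_boundary = re.compile(r"[0-9]*\.[0-9]+")

for text in ("0.5", ".5"):
    print(text,
          bool(with_boundary.fullmatch(text)),
          bool(without_boundary.fullmatch(text)))
```

In ".5" there is no word boundary before the dot, because neither the start of the string nor the dot is a word character, so the version with \b cannot match there.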
|
Jan Goyvaerts |
Jul 09, 2009 |
Aug 01, 2009 |
Printed |
Page 67
1st paragraph, 3rd sentence |
as published:
"<(\d\d){3}> matches a string of two, four or six digits."
corrected:
"<(\d\d){3}> matches a string of six digits."
Note from the Author or Editor: In the first paragraph on page 67, change {3} into {1,3}
In the second paragraph, do NOT change the first occurrence of {3} at the start of the paragraph. Change the 2nd and 3rd occurrences of {3} in the second paragraph into {1,3}
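The difference between the two quantifiers is easy to confirm. This short Python sketch tests whole strings of digits against both forms; the test strings are arbitrary.

```python
import re

exactly_three = re.compile(r"(\d\d){3}")    # three pairs: six digits only
one_to_three = re.compile(r"(\d\d){1,3}")   # one to three pairs: 2, 4 or 6 digits

for digits in ("12", "123", "1234", "123456"):
    print(digits,
          bool(exactly_three.fullmatch(digits)),
          bool(one_to_three.fullmatch(digits)))
```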
|
Jeff Roberson |
Jun 28, 2009 |
Aug 01, 2009 |
Printed |
Page 76
5th paragraph (first paragraph under "Negative lookaround"), first sentence |
As printed:
"<(?!regex)>, with an explanation point instead of..."
Should be:
"<(?!regex)>, with an exclamation point instead of..."
|
Nick Aldwin |
Jul 20, 2009 |
|
Printed |
Page 78
2nd paragraph, 1st sentence |
as published:
"... character class subtra ction to match ..."
corrected:
"... character class subtraction to match ..."
|
Jeff Roberson |
Jun 28, 2009 |
Aug 01, 2009 |
PDF |
Page 81
Solution and Discussion |
JavaScript does not support conditionals.
In the Solution section, remove JavaScript from the first list of regex flavors (but leave it in the second list). Change "Java and Ruby" and "Java or Ruby" to "Java, JavaScript, and Ruby" and "Java, JavaScript, or Ruby".
In the Discussion section, remove JavaScript from the first paragraph.
|
Jan Goyvaerts |
Apr 05, 2010 |
|
Printed |
Page 96
2nd paragraph |
As printed:
"This chapter covers seven programming languages. Each recipe has separate solutions for all seven programming languages, and many recipes also have separate discussions for all seven languages."
Change to:
"This chapter covers eight programming languages. Each recipe has separate solutions for all eight programming languages, and many recipes also have separate discussions for all eight languages."
|
Yao G. |
Aug 31, 2009 |
|
Printed |
Page 130
1st sentence |
"The Regex() class" should be "The Regex class".
|
Yao G. |
Aug 28, 2009 |
|
Printed |
Page 132
last sentence before 3.7 |
"Follow Recipe 3.7 when partial matches are acceptable."
Should it be changed to "Recipe 3.5"?
Note from the Author or Editor: The last sentence in recipe 3.6 should be changed to:
"Follow Recipe 3.5 when partial matches are acceptable."
|
Yao G. |
Sep 01, 2009 |
|
PDF |
Page 143
2nd paragraph of "Ruby", 3rd line at the end |
"=~ variable" should be replaced with "$~ variable"
|
Jan Goyvaerts |
Oct 10, 2009 |
|
Printed |
Page 147
Java Section, last sentence |
"Group(n) returns null,..." should be "Group() returns null,...".
Note from the Author or Editor: Change this at the end of the Java section on page 147:
"group(n) returns null, whereas start() and end() both return -1."
into this:
"group(n) returns null, whereas start(n) and end(n) both return -1."
|
Yao G. |
Aug 28, 2009 |
|
Printed |
Page 156
Java section |
In the comment: "Here you can process the match stored in regexMacher" should be "... regexMatcher"
Note from the Author or Editor: Change regexMacher into regexMatcher
|
Hunter Johnson |
Jul 28, 2009 |
|
PDF |
Page 160
1st paragraph |
In JavaScript, string.match(/regexp/) works identically to /regexp/.exec(string). string.match only differs when provided a regex that uses /g.
Printed:
This problem does not exist with string.match() (Recipe 3.10) or string.replace() (Recipe 3.14).
Corrected:
This problem does not exist with string.replace() (Recipe 3.14) or when finding all matches with string.match() (Recipe 3.10).
|
Steven Levithan |
Jul 12, 2009 |
Aug 01, 2009 |
Printed |
Page 168
regex following 2nd paragraph |
as published:
"\d+(?=(?:.(?!<b>))*</b>)"
corrected:
"\d+(?=(?:(?!<b>).)*</b>)"
The as published version works properly for the given test subject ("1 <b>2</b> 3 4 <b>5 6 7</b>"), but does not handle the case where a number is immediately followed by an opening bold tag. If you remove all the spaces from the test subject string ("1<b>2</b>34<b>567</b>"), the as published regex erroneously matches all the numbers both inside and outside the bold tags. In the corrected regex, the dot must follow the negative lookahead, otherwise it will consume the first char of the opening bold tag.
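The behavior described above can be reproduced with Python's re module, using the two test subjects from the report. This is a sketch for verification only; the variable names are ours.

```python
import re

published = re.compile(r"\d+(?=(?:.(?!<b>))*</b>)")
corrected = re.compile(r"\d+(?=(?:(?!<b>).)*</b>)")

spaced = "1 <b>2</b> 3 4 <b>5 6 7</b>"
compact = "1<b>2</b>34<b>567</b>"

for subject in (spaced, compact):
    print(subject)
    print("  published:", published.findall(subject))
    print("  corrected:", corrected.findall(subject))
```

On the compact subject, the published regex lets the dot consume the "<" of an opening tag before the lookahead is ever tested at that position, which is why it also returns the numbers outside the bold tags.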
|
Jeff Roberson |
Jun 28, 2009 |
Aug 01, 2009 |
Printed |
Page 168
First footnote |
Because there are two authors, "...they only end up proving my point that..." should be "...they only end up proving our point that..." or "...they only end up proving the authors' point that...".
Note from the Author or Editor: Change "my point" into "our point" in the footnote.
|
Jim.Monty |
Jul 04, 2009 |
Aug 01, 2009 |
Printed |
Page 171
last sentence above code |
As printed:
"..., you should use the Regex object with full exception handling:"
Change to:
"..., you should use the Matcher object with full exception handling:"
|
Yao G. |
Aug 30, 2009 |
|
Printed |
Page 175
Line 4 |
As printed:
"When searching for an array or regular"
Corrected:
"When searching for an array of regular"
|
Yao G. |
Aug 31, 2009 |
|
Printed |
Page 175
last sentence above the "Perl" section |
As printed:
"... to preg_replace."
Corrected:
"... to preg_replace()."
|
Yao G. |
Aug 31, 2009 |
|
Printed |
Page 206-207
1st paragraph on 206, in 3 paragraphs on 207 |
The input string is "I like <b>bold</b> and <i>italic</i> fonts", but in a number of the discussion sections it says:
Simply put, you'll get an array with: I like , <b>, bold, </b>, and , <italic>, and italic</italic> fonts.
It should read:
Simply put, you'll get an array with: I like , <b>, bold, </b>, and , <i>, and italic</i> fonts.
Note from the Author or Editor: In recipe 3.20, which runs from page 203 to page 207, <italic> and </italic> (incorrect HTML tags) should be replaced with <i> and </i> (correct HTML tags for italic).
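For reference, the expected array can be reproduced in Python, where re.split() keeps the text matched by a capturing group. The tag pattern below is a simplified assumption and is not necessarily the regex used in the recipe.

```python
import re

subject = "I like <b>bold</b> and <i>italic</i> fonts"

# The capturing group makes re.split() include the matched tags in the result.
parts = re.split(r"(<[^<>]*>)", subject)
print(parts)
```

The output contains <i> and </i>, never <italic>, which matches the corrected wording.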
|
Jared Crookston |
May 05, 2010 |
|
PDF |
Page 210
2nd paragraph? |
Perl
If you have a multiline string, split it into an array of strings first, with each string in
the array holding one line of text:
$lines = split(m/\r?\n/, $subject)
Then, iterate over the $lines array:
foreach $line ($lines) {
    if ($line =~ m/regex pattern/) {
        # The regex matches $line
    } else {
        # The regex does not match $line
    }
}
In Perl $lines is a scalar variable that can only hold one value. In the case of the split function above $lines will be assigned the number of the items that split returns and the actual data will be assigned to the @_ array. You need to change $lines to @lines for that to work properly.
Note from the Author or Editor: On page 210 in the Perl section, the line:
$lines = split(m/\r?\n/, $subject)
must be changed into:
@lines = split(m/\r?\n/, $subject)
Similarly, the line:
foreach $line ($lines) {
must be changed into:
foreach $line (@lines) {
|
John W. Krahn |
Jun 12, 2009 |
Aug 01, 2009 |
Printed |
Page 210
Python code |
lines = re.split("\r?\n", subject)
reobj = re.compile("regex pattern")
for line in lines:
    if re.search(line):
        # the regex matches line
    else:
        # the regex does not match line
This is the corrected version.
The object returned from re.compile() should be used to call search()
lines = re.split("\r?\n", subject)
reobj = re.compile("regex pattern")
for line in lines:
    if reobj.search(line):
        # the regex matches line
    else:
        # the regex does not match line
Note from the Author or Editor: In the Python section on page 210, this line:
if re.search(line):
must be changed into:
if reobj.search(line):
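A runnable version of the corrected listing might look like the sketch below. The subject string and the pattern \d are placeholders chosen for illustration, and the loop collects matching lines instead of leaving the branches empty. The last few lines show why the printed call fails: re.search() needs both a pattern and a string.

```python
import re

# Placeholder subject and pattern, for illustration only.
subject = "alpha 1\r\nbeta\ngamma 3"
lines = re.split("\r?\n", subject)
reobj = re.compile(r"\d")

matching = []
for line in lines:
    if reobj.search(line):
        # The regex matches line.
        matching.append(line)
    # Otherwise the regex does not match line.

print(matching)

# The call as printed passes only one argument and raises TypeError.
try:
    re.search(lines[0])
except TypeError as exc:
    print("TypeError:", exc)
```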
|
Tony Cappellini |
Jun 14, 2009 |
Aug 01, 2009 |
Printed |
Page 215
1st paragraph, 1st sentence |
As printed:
"...the part of the domain name after the dot can only consist of letters."
Change to:
"...the part of the domain name after the last (rightmost?) dot can only consist of letters."
Note from the Author or Editor: Change to:
and that the part of the domain name after the last dot can only consist of letters.
|
Yao G. |
Nov 29, 2009 |
|
Printed |
Page 215
All four regexes on this page |
as published:
"... [!#$%&'*+/=?`{|}~^-]+ ..."
corrected:
"... [\w!#$%&'*+/=?`{|}~^-]+ ..."
If an email has a username that has a dot in it, the as-published regex will fail to match the part of the username following the dot if that portion has a word character in it (i.e. "\w"). In other words, the '\w' was erroneously dropped from the character class component of the regex which matches the portion of the username which follows a dot. This same error occurs in all four regexes on this page.
Note from the Author or Editor: In all 4 regular expressions on page 215, the characters [! appear once as a pair. In all 4 regexes [! should be changed into [\w!
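The effect of the missing \w can be sketched in Python. The two character classes are taken from this report, but the surrounding structure and the domain part are simplified assumptions; the four regexes on page 215 differ in detail.

```python
import re

# Character class for one run of username characters, with and without \w.
WITH_W = r"[\w!#$%&'*+/=?`{|}~^-]+"
WITHOUT_W = r"[!#$%&'*+/=?`{|}~^-]+"
# Hypothetical, simplified domain part (an assumption, not the book's regex).
DOMAIN = r"@(?:[A-Z0-9-]+\.)+[A-Z]{2,6}"

printed = re.compile("^" + WITH_W + r"(?:\." + WITHOUT_W + ")*" + DOMAIN + "$",
                     re.IGNORECASE)
corrected = re.compile("^" + WITH_W + r"(?:\." + WITH_W + ")*" + DOMAIN + "$",
                       re.IGNORECASE)

for address in ("john@example.com", "john.smith@example.com"):
    print(address, bool(printed.match(address)), bool(corrected.match(address)))
```

Without \w in the second class, the part of the username after a dot can consist only of punctuation, so an ordinary address such as john.smith@example.com is rejected.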
|
Jeff Roberson |
Jun 28, 2009 |
Aug 01, 2009 |
Printed |
Page 236
1st paragraph under "Variations", last line |
As printed:
"... the date cannot be ..."
Change to:
"... the time cannot be ..."
|
Yao G. |
Sep 02, 2009 |
|
Printed |
Page 239
last paragraph (p.239), also 1st paragraph (p.240) |
As printed:
"Time, with optional microseconds...".
"microseconds" here sounds like a misinterpretation. I guess any number of digits could be added after a decimal dot or comma to represent a fraction of a second in ISO 8601. So, technically it cannot be referred to as either "microseconds" or "milliseconds".
Note from the Author or Editor: Change "microseconds" into "fractional seconds" in two places: at the bottom of page 239 and at the top of page 240.
|
Yao G. |
Sep 02, 2009 |
|
Printed |
Page 239
last sentence above the regexes |
As printed:
"...XML Schema dateTime type:"
Change to:
"...XML Schema time type:"
|
Yao G. |
Sep 02, 2009 |
|
PDF |
Page 244
3rd paragraph; 1st paragraph under "Solution" heading |
The final sentence of the Solution section's first paragraph should use "regular expression" instead of "regular expressions".
Printed:
You can modify the regular expressions to allow any minimum or maximum text length, or allow characters other than A-Z.
Corrected:
You can modify the regular expression to allow any minimum or maximum text length, or allow characters other than A-Z.
|
Steven Levithan |
Jul 09, 2009 |
|
PDF |
Page 249
Code listing under the heading "PHP (PCRE)" |
The PHP source code example uses ^ as the start-of-string anchor along with \z as the end-of-string anchor. Although this works perfectly fine (since the /m modifier is not used), it would be better to use \A as the start-of-string anchor for consistency with the prior regex listings.
Printed:
if (preg_match('/^(?>(?>\r\n?|\n)?[^\r\n]*){0,5}\z/', $_POST['subject'])) {
Corrected:
if (preg_match('/\A(?>(?>\r\n?|\n)?[^\r\n]*){0,5}\z/', $_POST['subject'])) {
|
Steven Levithan |
Jul 09, 2009 |
Aug 01, 2009 |
PDF |
Page 275
Definition list under the heading "Validate the number" |
There is a mistake in the description of the Discover card format. However, the included regexes are correct.
----------
Printed:
Discover
16 digits, starting with 6011, or 15 digits starting with 5.
Corrected:
Discover
16 digits, starting with 6011 or 65.
----------
The numbers 6011 and 65 should use a fixed-width font. The number 16 at the beginning of the corrected sentence should not.
-- Reported by Vikas Shukla at http://referencedesigner.com/blog/?p=328.
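The corrected description (16 digits, starting with 6011 or 65) can be expressed as a short pattern. The regex below is a sketch written to match that description and is not quoted from the book; the digit strings are made-up test values, not real card numbers.

```python
import re

# Sketch of a pattern for the corrected Discover description (assumed form).
discover = re.compile(r"^6(?:011|5[0-9]{2})[0-9]{12}$")

candidates = (
    "6011000000000000",  # 16 digits starting with 6011
    "6500000000000000",  # 16 digits starting with 65
    "500000000000000",   # 15 digits starting with 5 (the printed description)
)
for number in candidates:
    print(number, bool(discover.match(number)))
```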
|
Steven Levithan |
Jul 09, 2009 |
Aug 01, 2009 |
Printed |
Page 339
2nd paragraph, 2nd line |
As printed:
"... 16 hexadecimal decimal digits."
Change to:
"... 16 hexadecimal digits."
|
Yao G. |
Oct 01, 2009 |
|
Printed |
Page 366
1st paragraph |
as published:
"you want to extract jan from http://jan@www.regexcookbook.com"
According to RFC1738, usernames are not allowed in the http scheme. A better example would be to use the ftp scheme, which does allow the username:password component in a URL.
Note from the Author or Editor: Change http://jan@www.regexcookbook.com into ftp://jan@www.regexcookbook.com
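To illustrate the corrected example, the user name can be extracted from the ftp URL with a small regex. The pattern is a simplified assumption, not the one printed in the recipe, and the result is cross-checked against urlsplit() from Python's standard library.

```python
import re
from urllib.parse import urlsplit

url = "ftp://jan@www.regexcookbook.com"

# Simplified sketch: scheme, "://", then a user part that ends at "@".
user_regex = re.compile(r"^[a-z][a-z0-9+.-]*://([^@/?#]+)@", re.IGNORECASE)

match = user_regex.match(url)
user = match.group(1) if match else None
print(user)

# Cross-check with the standard library's URL parser.
print(urlsplit(url).username)
```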
|
Jeff Roberson |
Jun 28, 2009 |
Aug 01, 2009 |
PDF |
Page 369
Paragraph before "see also" |
Change this:
"including those that don't specify the user"
to:
"including those that don't specify the host"
|
Jan Goyvaerts |
Dec 08, 2009 |
|
PDF |
Page 370
Second subheading under Solution |
Change this:
"Extract the host while validating the URL"
into:
"Extract the port while validating the URL"
|
Jan Goyvaerts |
Dec 08, 2009 |
|
Printed |
Page 371
4th paragraph |
As printed:
"Since we want to extract the host, we can exclude URLs that don't specify an authority."
Change to:
"Since we want to extract the port number, we can exclude URLs that don't specify a port number."
|
Yao G. |
Nov 29, 2009 |
|
PDF |
Page 454
Multiple paragraphs |
Page 454, first regex:
Change:
^(?:[^>"']|"[^"]*"|'[^']*')+?\sclass\s*=\s*("[^"]*"|'[^']*')
To:
^(?:[^>"']|"[^"]*"|'[^']*')+?\sclass\s*=\s*(?:"([^"]*)"|'([^']*)')
Page 454, second paragraph:
Change:
This captures the entire class value and its surrounding quote marks to backreference 1.
To:
This captures the entire class value to backreference 1 or 2, depending on the type of quote marks surrounding the value.
Page 454, fourth paragraph:
Change:
Finally, if both of the previous regexes matched successfully, you'll want to search within backreference 1 of the second regex's matches using the following pattern:
To:
Finally, if both of the previous regexes matched successfully, you'll want to search within backreferences 1 and 2 of the second regex's matches using the following pattern:
Note from the Author or Editor: I got this submission backwards--I included the fix in the description, and have the details here. The error is that the previous version of the regex in question included the surrounding quote marks in the backreference, but the followup regex to search within the backreference for a class did not account for the quote marks.
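The corrected regex stores the class value without its quote marks, in group 1 for double quotes and group 2 for single quotes. The Python sketch below applies it to two made-up tags; the sample tags and the helper name class_value are illustrative assumptions.

```python
import re

# The corrected regex from this erratum, written as a triple-quoted raw string
# so that both quote characters can appear unescaped.
class_attr = re.compile(
    r"""^(?:[^>"']|"[^"]*"|'[^']*')+?\sclass\s*=\s*(?:"([^"]*)"|'([^']*)')"""
)

def class_value(tag):
    """Return the unquoted class value of a tag, or None if there is none."""
    match = class_attr.search(tag)
    if match is None:
        return None
    # Exactly one of the two groups participates in a successful match.
    return match.group(1) if match.group(1) is not None else match.group(2)

print(class_value('<div id="x" class="a b">'))
print(class_value("<div class='c'>"))
print(class_value('<div id="x">'))
```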
|
Steven Levithan |
Oct 27, 2009 |
|