Errata for Natural Language Processing with Python

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Version Location Description Submitted by Date Submitted
Printed Page 41
output from code fragment at top of page

When I run the code fragment, I get different results in the second column of the output (avg. number of words per sentence) for each corpus. The other columns match the book. Has the actual data in the corpora changed, or has the code in NLTK changed? I tried running this under both Python 2.7 and Python 3.6 and got the same results. I also checked the PDF version of the 2nd edition, and it seems to have the same error (if it is an error) as the print version. Here are my results:

4 24 26 austen-emma.txt
4 26 16 austen-persuasion.txt
4 28 22 austen-sense.txt
4 33 79 bible-kjv.txt
4 19 5 blake-poems.txt
4 19 14 bryant-stories.txt
4 17 12 burgess-busterbrown.txt
4 20 12 carroll-alice.txt
4 20 11 chesterton-ball.txt
4 22 11 chesterton-brown.txt
4 18 10 chesterton-thursday.txt
4 20 24 edgeworth-parents.txt
4 25 15 melville-moby_dick.txt
4 52 10 milton-paradise.txt
4 11 8 shakespeare-caesar.txt
4 12 7 shakespeare-hamlet.txt
4 12 6 shakespeare-macbeth.txt
4 36 12 whitman-leaves.txt

Anonymous  Mar 04, 2018 
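
Regarding the page 41 entry above: assuming the fragment reports characters per word, words per sentence, and words per distinct word, only the middle figure depends on the sentence count. A change in how sentences are segmented would then move that column and leave the other two alone. The sketch below shows only this arithmetic; `corpus_stats` and the sample data are hypothetical, and it works on plain Python lists rather than calling NLTK, so it says nothing about which NLTK or corpus release changed.

```python
def corpus_stats(raw, words, sents):
    """Return the three rounded ratios (hypothetical helper, not NLTK code)."""
    num_chars = len(raw)
    num_words = len(words)
    num_sents = len(sents)
    num_vocab = len(set(w.lower() for w in words))
    return (round(num_chars / num_words),   # average word length
            round(num_words / num_sents),   # average sentence length
            round(num_words / num_vocab))   # average uses per distinct word


# Made-up sample data standing in for a corpus
raw = "The cat sat. The dog sat too. The cat ran."
sents = [["The", "cat", "sat", "."],
         ["The", "dog", "sat", "too", "."],
         ["The", "cat", "ran", "."]]
words = [w for s in sents for w in s]

as_segmented = corpus_stats(raw, words, sents)

# Same text and words, but treated as a single sentence:
# only the middle value changes.
as_one_sentence = corpus_stats(raw, words, [words])

print(as_segmented)
print(as_one_sentence)
```
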
Printed Page 108
errors produced in both Stemmers example and Lemmatization example

From the stemmers example, I receive this error. It may not be related to the code in the book, but I am not sure how to work around it. I am working with the 1st edition, but have modified the ldisplay and rdisplay lines according to how they appear at http://www.nltk.org/book/ch03.html

Traceback (most recent call last):
# ... line 1, in <module>
# ... line 13, in concordance
# "File ... util.py", line 242, in __getitem__
# return self._cache[2][start-offset:stop-offset]
# TypeError: slice indices must be integers or None or have an __index__ method


From the lemmatization example, I receive this error:
NameError: name 'tokens' is not defined
# same code online. tokens = word_tokenize(wnl) does not work.

The online resource I cited earlier shows the same code. Indeed, tokens is never defined in the two lines of code. I assume maybe in an earlier version it was defined as part of WordNetLemmatizer() but it may not be now.

Anonymous  Jan 08, 2017 
Printed Page 309
1st paragraph of body text.

"we know that we can enter NP in cell (0,2)" should be "we know that we can enter NP in cell (2,4)".

Anonymous  Oct 12, 2015 
PDF Page 20
2nd example

Output has changed

>>> bigrams(['more', 'is', 'said', 'than', 'done'])
<generator object bigrams at 0x7f131dc96960>


>>> nltk.__version__
'3.0.0'

Anonymous  Oct 18, 2014 
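
Regarding the PDF page 20 entry above: a generator object is produced when a function yields its pairs lazily, and wrapping the call in `list()` materializes them. The sketch below uses a stand-in `bigrams` function defined here for illustration (an assumption, not NLTK's own implementation) so that it runs without NLTK installed.

```python
def bigrams(sequence):
    """Stand-in that lazily yields adjacent pairs (hypothetical, not NLTK's code)."""
    items = list(sequence)
    for left, right in zip(items, items[1:]):
        yield (left, right)


words = ['more', 'is', 'said', 'than', 'done']

lazy = bigrams(words)          # a generator object; nothing computed yet
pairs = list(bigrams(words))   # wrapping in list() produces the pairs

print(lazy)
print(pairs)
```
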
PDF Page 7
1st example, top of page

>>> text3.generate()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'Text' object has no attribute 'generate'

This is a serious mistake on the author's part; it is probably due to an upgraded nltk library.

>>> nltk.__version__
'3.0.0'

Anonymous  Oct 18, 2014 
PDF Page Cover, 3, 6, 18, 31, 46, 49, 50, 52, 60, 61, 62, 69, 86, 91, 94, 98, 99,
Various locations

In the PDF version there are major graphical artifacts on the cover pages (page 1 and 2 of the PDF). This also occurs in Figures 1-1, 1-2, 1-4, 1-5, 2-1, 2-2, 2-3, 2-4, 2-5, 2-6, 2-7, 2-8, 3-1, 3-2, 3-3, 3-4, 3-5 and many more. Note that these don't appear in the ePub file, so it seems to be something in how the PDF was processed/created. In some cases, it makes the figures hard to read properly.

Anonymous  May 03, 2013 
Printed, Other Digital Version Page 140
second code block

The procedural sort code raises an IndexError in this line:

if j == 0 or tokens[i] != word_list[j]:

It should be changed like so:

if j == 0 or tokens[i] != word_list[j-1]:

Alex Bendig  Oct 18, 2012 
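
Regarding the page 140 entry above: assuming the loop scans a sorted `word_list` to find the insertion point `j` for each token (the surrounding code is reconstructed here, so treat it as an assumption), an IndexError arises when `j` reaches the end of the list. After the scan, every entry before position `j` is smaller than the token, so a comparison against `word_list[j-1]` may not filter duplicates. The sketch below is one possible alternative, not the submitter's or the book's code: it guards the end of the list and compares against `word_list[j]`. The name `build_word_list` is hypothetical.

```python
def build_word_list(tokens):
    """Build a sorted list of distinct tokens by linear scan and insert."""
    word_list = []
    for token in tokens:
        j = 0
        # Advance to the first entry that is not smaller than the token
        while j < len(word_list) and word_list[j] < token:
            j += 1
        # Insert at the end, or in front of a different (larger) word;
        # skip the token when it is already present at position j
        if j == len(word_list) or word_list[j] != token:
            word_list.insert(j, token)
    return word_list


tokens = ['the', 'cat', 'sat', 'on', 'the', 'mat']
result = build_word_list(tokens)
print(result)
```
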
PDF Page 6
Top of Page

Using the Python 2.7 stack (with all the appropriate libraries), entering the code:

>>> text3.generate()

Produces the following error:

Building ngram index...

Traceback (most recent call last):
File "<pyshell#5>", line 1, in <module>
text3.generate()
File "C:\Python27\lib\site-packages\nltk\text.py", line 382, in generate
self._trigram_model = NgramModel(3, self, estimator)
File "C:\Python27\lib\site-packages\nltk\model\ngram.py", line 81, in __init__
assert(isinstance(pad_left, bool))
AssertionError

This has been reproduced on several Windows 7 machines running Python 2.7.

John Ellenberger  Sep 15, 2012 
Printed Page 49
top of page

Figure 2.2 on page 49 cannot be generated with the source displayed on page 48. The generated graph actually shows frequency, not percentage.

This is currently wrong in the online version as well: http://nltk.googlecode.com/svn/trunk/doc/book/ch02.html#fig-word-len-dist

Alex Bendig  Jul 06, 2012 
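
Regarding the page 49 entry above: the difference between a frequency plot and a percentage plot is a single normalization step, dividing each count by the total and scaling by 100. The sketch below illustrates that step with the standard library only; `length_percentages` and the sample words are hypothetical, and this is not the book's plotting code.

```python
from collections import Counter


def length_percentages(words):
    """Map each word length to its share of all words, in percent."""
    counts = Counter(len(w) for w in words)
    total = sum(counts.values())
    return {length: 100.0 * n / total for length, n in sorted(counts.items())}


sample = ["a", "to", "be", "the", "cat", "sat", "door", "tree"]
shares = length_percentages(sample)
print(shares)
```
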
Printed Page 30
1st paragraph

The book says the first translation from English to German is correct. That is wrong. A correct translation would be: "wie lange vor dem naechsten Flug nach Alice Strings?" (folgenden -> naechsten and zu -> nach).

I'm a native German speaker.

Robert Schadek  Apr 16, 2012 
PDF Page 21
after "Counting Other Things" Example

The text says "But there are only 20 distinct items being counted" when in fact there are only 19, since no words of length 19 occur.

Clarence Huang  Feb 09, 2012 
Printed Page 41
8th line from bottom

macbeth_sentences[1037]
is shown as returning the sentence
"Double, double, toile, and trouble;...."

but this sentence is actually at
macbeth_sentences[1117]

Richard Westmore  Jan 08, 2012 
PDF Page 125
4th and 5th lines

>>> from test import msg
>>> msg

should be changed to:

>>> from test import monty
>>> monty

mg6t  Sep 03, 2011 
PDF Page 241
in the confusion matrix

The code in this example does not print the confusion matrix.
To print the confusion matrix in percentages, as shown in the book,
the following line should be added:

>>> print cm.pp(show_percents=True, sort_by_count=True)

Also, in the printout,
NS | 1.5%
should be
NNS | 1.5%

mg6t  Aug 29, 2011 
Printed Page 79
last line but one

IDLE 2.6.5
>>> from _future_ import division

Traceback (most recent call last):
File "<pyshell#0>", line 1, in <module>
from _future_ import division
ImportError: No module named _future_

Philip KEEVIL  Apr 25, 2011 
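
Regarding the page 79 entry above: the module name is spelled with two underscores on each side (`__future__`), and the statement form is `from __future__ import division`, which has to come before other statements in a module. The sketch below uses `importlib` only so the two spellings can be compared from any point in a script; the variable names are hypothetical.

```python
import importlib

# Two underscores on each side: this module exists and exposes `division`
future = importlib.import_module("__future__")
has_division = hasattr(future, "division")

# One underscore on each side: no such module, so the import fails
try:
    importlib.import_module("_future_")
    single_underscore_ok = True
except ImportError:
    single_underscore_ok = False

print(has_division, single_underscore_ok)
```
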
Printed Page 6
Graph

Calling the dispersion_plot function works the first time. The second time, nothing is redrawn: the code executes but nothing is displayed.

Using Python 2.6.6 on Ubuntu 10.10

Marco Riggi  Jan 26, 2011 
PDF Page 29
Generating Language Output 2nd paragraph

"ils if the thieves are sold, and elle if the paintings are sold."

This is not exactly wrong, still:
"sold" should be replaced by "found" as "found" is used in the subsequent example and "selling" thieves is well...a little odd ;)

Maximilian Scherr  Jan 15, 2011 
PDF Page 338-340, 342
Diagrams (21), (22), (23), (24), (27b, c)

In the PDF version, Diagrams (21), (22), (23), (24), (27b, c) (in Section 9.2) are all absent when displayed in Preview.app (Mac OS X Snow Leopard). (But they do appear in Acrobat Reader, so perhaps this is a bug report that should be submitted to Apple?)

Sandy Nicholson  Dec 17, 2010