Errata

Errata for Natural Language Processing with Python


The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.


Version	Location	Description	Submitted by	Date Submitted
Printed	Page 41 output from code fragment at top of page	When I run the code fragment, I get different results in the second column of the output (average number of words per sentence) for each corpus. The other columns match the book. Has the data in the corpora changed, or has the code in NLTK changed? I tried running this under both Python 2.7 and Python 3.6 and got the same results. I also checked the PDF version of the 2nd edition, and it seems to have the same error (if it is an error) as the print version. Here are my results: 4 24 26 austen-emma.txt; 4 26 16 austen-persuasion.txt; 4 28 22 austen-sense.txt; 4 33 79 bible-kjv.txt; 4 19 5 blake-poems.txt; 4 19 14 bryant-stories.txt; 4 17 12 burgess-busterbrown.txt; 4 20 12 carroll-alice.txt; 4 20 11 chesterton-ball.txt; 4 22 11 chesterton-brown.txt; 4 18 10 chesterton-thursday.txt; 4 20 24 edgeworth-parents.txt; 4 25 15 melville-moby_dick.txt; 4 52 10 milton-paradise.txt; 4 11 8 shakespeare-caesar.txt; 4 12 7 shakespeare-hamlet.txt; 4 12 6 shakespeare-macbeth.txt; 4 36 12 whitman-leaves.txt	Anonymous	Mar 04, 2018
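In the code fragment this entry refers to, the three numbers per corpus are ratios of four counts: characters, words, sentences, and distinct lowercased words. Only the second column (words per sentence) depends on the sentence count, so a mismatch confined to that column is consistent with a change in how the corpus text is split into sentences rather than a change in the words themselves. That is an inference, not a confirmed cause. The helper below is a hypothetical sketch (`corpus_stats` is not an NLTK function) that computes the same three ratios from plain Python lists and rounds them with `round()`, which may not match the rounding in the printed code.

```python
def corpus_stats(raw_text, words, sents):
    """Return (avg word length, avg sentence length, lexical diversity).

    raw_text: the corpus as one string (all characters, including spaces)
    words:    list of word tokens
    sents:    list of sentences, each a list of word tokens
    """
    num_chars = len(raw_text)
    num_words = len(words)
    num_sents = len(sents)                          # only column 2 depends on this count
    num_vocab = len(set(w.lower() for w in words))  # case-folded vocabulary size
    return (round(num_chars / num_words),
            round(num_words / num_sents),
            round(num_words / num_vocab))


raw = "The cat sat. The dog ran."
words = ['The', 'cat', 'sat', '.', 'The', 'dog', 'ran', '.']
two_sents = [words[:4], words[4:]]   # the words segmented into two sentences
one_sent = [words]                   # the same words segmented as one sentence
print(corpus_stats(raw, words, two_sents))  # (3, 4, 1)
print(corpus_stats(raw, words, one_sent))   # (3, 8, 1): only the middle value changes
```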
Printed	Page 108 errors produced in both the Stemmers example and the Lemmatization example	The Stemmers example produces the error below. It may not be related to the code in the book, but I am not sure how to work around it. I am working with the 1st edition, but have modified the ldisplay and rdisplay lines to match how they appear at http://www.nltk.org/book/ch03.html. Traceback (most recent call last): ... line 1, in <module> ... line 13, in concordance ... File "... util.py", line 242, in __getitem__: return self._cache[2][start-offset:stop-offset] TypeError: slice indices must be integers or None or have an __index__ method. The Lemmatization example produces this error: NameError: name 'tokens' is not defined. The online resource cited above shows the same code, and trying tokens = word_tokenize(wnl) does not work either. Indeed, tokens is never defined in the example's two lines of code. I assume it was defined as part of WordNetLemmatizer() in an earlier version but no longer is.	Anonymous	Jan 08, 2017
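The `TypeError` in the Stemmers traceback is what Python 3 raises when a slice bound is a float. One likely cause, stated here as an assumption about the 1st-edition code, is a words-of-context width computed with `/`, which returns a float in Python 3; integer division with `//` (or wrapping the result in `int()`) keeps the slice bounds integral. The sketch below is a simplified, NLTK-free version of the stemmed concordance idea: `stem` is any callable the caller passes in (the `toy_stem` function is hypothetical, not a real stemmer), and a plain `defaultdict` stands in for `nltk.Index`. Separately, in the online version of Chapter 3, `tokens` appears to be created earlier in the same section with `word_tokenize(raw)`, so the lemmatization lines depend on that earlier step rather than on anything inside `WordNetLemmatizer()`.

```python
from collections import defaultdict


class IndexedText:
    """Concordance that matches words by stem (simplified, NLTK-free sketch)."""

    def __init__(self, stem, text):
        self._text = text
        self._stem = lambda word: stem(word).lower()
        self._index = defaultdict(list)          # stem -> list of token positions
        for i, word in enumerate(text):
            self._index[self._stem(word)].append(i)

    def concordance(self, word, width=40):
        key = self._stem(word)
        wc = width // 4                          # words of context; must be an int to slice
        lines = []
        for i in self._index.get(key, []):
            lcontext = ' '.join(self._text[max(0, i - wc):i])  # clamp so early hits keep context
            rcontext = ' '.join(self._text[i:i + wc])
            ldisplay = '{:>{width}}'.format(lcontext[-width:], width=width)
            rdisplay = '{:{width}}'.format(rcontext[:width], width=width)
            lines.append(ldisplay + ' ' + rdisplay)
        return lines


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strip one trailing 's'."""
    return word[:-1] if word.endswith('s') else word


text = 'the cat lies and the cats lie down'.split()
for line in IndexedText(toy_stem, text).concordance('cats', width=30):
    print(line)
```

With `width=30`, `width / 4` would be `7.5` and the slice would fail; `width // 4` gives `7`.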
Printed	Page 309 1st paragraph of body text.	"we know that we can enter NP in cell (0,2)" should be "we know that we can enter NP in cell (2,4)".	Anonymous	Oct 12, 2015
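The correction follows from how a well-formed substring table is indexed: cell (i, j) holds the category spanning tokens i through j-1, so with a determiner in (2,3) and a noun in (3,4), a rule NP → Det N fills (2,4). The sketch below assumes the passage refers to the "I shot an elephant in my pajamas" example and uses a hypothetical toy grammar modeled on it, with one category per word and binary rules only. It is a plain-Python illustration, not NLTK's implementation.

```python
# Hypothetical toy grammar: each binary rule maps a (left, right) pair of
# categories to the parent category; the lexicon gives one category per word.
BINARY = {
    ('NP', 'VP'): 'S',
    ('P', 'NP'): 'PP',
    ('Det', 'N'): 'NP',
    ('NP', 'PP'): 'NP',
    ('V', 'NP'): 'VP',
    ('VP', 'PP'): 'VP',
}
LEXICON = {'I': 'NP', 'shot': 'V', 'an': 'Det', 'elephant': 'N',
           'in': 'P', 'my': 'Det', 'pajamas': 'N'}


def wfst_parse(tokens):
    """Fill an (n+1) x (n+1) table; cell [i][j] covers tokens[i:j]."""
    n = len(tokens)
    wfst = [[None] * (n + 1) for _ in range(n + 1)]
    for i, tok in enumerate(tokens):
        wfst[i][i + 1] = LEXICON[tok]            # length-1 spans come from the lexicon
    for span in range(2, n + 1):                 # shorter spans first
        for start in range(n + 1 - span):
            end = start + span
            for mid in range(start + 1, end):    # try every split point
                pair = (wfst[start][mid], wfst[mid][end])
                if pair in BINARY:
                    wfst[start][end] = BINARY[pair]
    return wfst


table = wfst_parse('I shot an elephant in my pajamas'.split())
print(table[2][3], table[3][4], table[2][4])  # Det N NP
print(table[0][2])                            # None: "I shot" is not a constituent
```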
PDF	Page 20 2nd example	Output has changed: with nltk.__version__ '3.0.0', >>> bigrams(['more', 'is', 'said', 'than', 'done']) returns <generator object bigrams at 0x7f131dc96960> instead of a list.	Anonymous	Oct 18, 2014
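Because `bigrams()` in NLTK 3 returns a generator rather than a list, wrapping the call in `list()`, as in `list(bigrams(['more', 'is', 'said', 'than', 'done']))`, should reproduce the list shown in the book. To keep the example runnable without NLTK, the sketch below uses a hypothetical pure-Python `bigrams` generator that behaves the same way.

```python
def bigrams(sequence):
    """Yield adjacent pairs lazily (a stand-in for a generator-returning bigrams())."""
    items = list(sequence)
    for i in range(len(items) - 1):
        yield items[i], items[i + 1]


pairs = bigrams(['more', 'is', 'said', 'than', 'done'])
print(pairs)        # a generator object, like the output reported above
print(list(pairs))  # [('more', 'is'), ('is', 'said'), ('said', 'than'), ('than', 'done')]
```

A generator can be consumed only once, so a second `list(pairs)` would return `[]`.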
PDF	Page 7 1st example, top of page	>>> text3.generate() Traceback (most recent call last): File "<stdin>", line 1, in <module> AttributeError: 'Text' object has no attribute 'generate' This is a serious mistake by the author; it is probably due to an upgraded NLTK library: >>> nltk.__version__ '3.0.0'	Anonymous	Oct 18, 2014
PDF	Page Cover, 3, 6, 18, 31, 46, 49, 50, 52, 60, 61, 62, 69, 86, 91, 94, 98, 99, Various locations	In the PDF version there are major graphical artifacts on the cover pages (pages 1 and 2 of the PDF). This also occurs in Figures 1-1, 1-2, 1-4, 1-5, 2-1, 2-2, 2-3, 2-4, 2-5, 2-6, 2-7, 2-8, 3-1, 3-2, 3-3, 3-4, 3-5 and many more. These artifacts do not appear in the ePub file, so the problem seems to lie in how the PDF was processed or created. In some cases, they make the figures hard to read.	Anonymous	May 03, 2013
Printed, Other Digital Version	Page 140 second code block	The procedural sort code raises an IndexError in this line: if j == 0 or tokens[i] != word_list[j]: It should be changed like so: if j == 0 or tokens[i] != word_list[j-1]:	Alex Bendig	Oct 18, 2012
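Whether the fix is needed depends on the comparison in the inner loop. Assuming the loop advances `j` past every entry that is less than or equal to the current token, any existing copy of that token ends up at index `j-1`, while `word_list[j]` can point one past the end of the list, which would explain the `IndexError`. The sketch below wraps the loop, with the reported fix, in a hypothetical `unique_sorted` helper so the result can be checked against the declarative `sorted(set(tokens))`.

```python
def unique_sorted(tokens):
    """Procedural equivalent of sorted(set(tokens)), using the reported fix."""
    word_list = []
    i = 0
    while i < len(tokens):
        j = 0
        # Advance past every entry that is <= the current token.
        while j < len(word_list) and word_list[j] <= tokens[i]:
            j += 1
        # An existing copy of tokens[i], if any, is now at j-1;
        # word_list[j] may be past the end of the list.
        if j == 0 or tokens[i] != word_list[j - 1]:
            word_list.insert(j, tokens[i])
        i += 1
    return word_list


tokens = ['the', 'cat', 'the', 'a', 'cat', 'zebra']
print(unique_sorted(tokens))  # ['a', 'cat', 'the', 'zebra']
```

The procedural loop does quadratic work; `sorted(set(tokens))` gives the same result more directly.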
PDF	Page 6 Top of Page	Using the Python 2.7 stack (with all the appropriate libraries), entering the code >>> text3.generate() produces the following error: Building ngram index... Traceback (most recent call last): File "<pyshell#5>", line 1, in <module> text3.generate() File "C:\Python27\lib\site-packages\nltk\text.py", line 382, in generate self._trigram_model = NgramModel(3, self, estimator) File "C:\Python27\lib\site-packages\nltk\model\ngram.py", line 81, in __init__ assert(isinstance(pad_left, bool)) AssertionError This has been reproduced on several Windows 7 machines running Python 2.7.	John Ellenberger	Sep 15, 2012
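This traceback fails inside the trigram `NgramModel` that `generate()` built, and the report for page 7 shows that `Text.generate()` was not available at all in NLTK 3.0.0. Where neither works, a rough stand-in is to sample each next word from the words that followed the previous two words in the text. The sketch below is a hypothetical, unsmoothed trigram sampler in plain Python; it is not NLTK's implementation, and its output will not match the book's sample text.

```python
import random
from collections import defaultdict


def build_trigram_table(words):
    """Map each (w1, w2) context to the list of words that follow it in the text."""
    table = defaultdict(list)
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        table[(w1, w2)].append(w3)   # duplicates kept, so frequent followers are likelier
    return table


def generate(words, length=20, seed=None):
    """Return `length` words of random text in the style of `words`."""
    rng = random.Random(seed)
    table = build_trigram_table(words)
    if not table:                     # fewer than three words: nothing to sample from
        return list(words[:length])
    context = rng.choice(list(table))  # random starting bigram
    output = list(context)
    while len(output) < length:
        followers = table.get(context)
        if not followers:             # dead end: restart from a random context
            context = rng.choice(list(table))
            followers = table[context]
        nxt = rng.choice(followers)
        output.append(nxt)
        context = (context[1], nxt)
    return output[:length]


sample = 'the cat sat on the mat and the cat ran off the mat'.split()
print(' '.join(generate(sample, length=12, seed=1)))
```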
Printed	Page 49 top of page	Figure 2.2 on page 49 cannot be generated with the source code shown on page 48: the graph that code generates shows frequencies, not percentages. The online version currently has the same problem: http://nltk.googlecode.com/svn/trunk/doc/book/ch02.html#fig-word-len-dist	Alex Bendig	Jul 06, 2012
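Turning cumulative counts into the percentages this report expects takes one extra step: divide each running total by the number of words. The helper below is a hypothetical sketch, not the book's code; for each word length from 1 to `max_len` it returns the percentage of words of that length or shorter.

```python
from collections import Counter
from itertools import accumulate


def cumulative_length_percentages(words, max_len):
    """Percent of words with length <= n, for n = 1 .. max_len."""
    counts = Counter(len(w) for w in words)
    total = len(words)
    per_length = [counts[n] for n in range(1, max_len + 1)]  # Counter gives 0 for missing lengths
    running = accumulate(per_length)                         # cumulative counts
    return [100 * c / total for c in running]                # counts -> percentages


print(cumulative_length_percentages(['a', 'bb', 'cc', 'ddd'], 3))  # [25.0, 75.0, 100.0]
```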
Printed	Page 30 1st paragraph	The book says the first translation from English to German is correct. That is wrong. A correct translation would be: "wie lange vor dem naechsten Flug nach Alice Strings?" (folgenden -> naechsten and zu -> nach). I'm a native German speaker.	Robert Schadek	Apr 16, 2012
PDF	Page 21 after "Counting Other Things" Example	The text says "But there are only 20 distinct items being counted" when in fact there are only 19, since no words of length 19 occur.	Clarence Huang	Feb 09, 2012
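The number of distinct items in a frequency distribution of word lengths is the number of different lengths that actually occur. That can be smaller than the longest length whenever some length in between never appears. A minimal sketch with `collections.Counter` on a hypothetical word list:

```python
from collections import Counter

words = ['a', 'to', 'the', 'cats', 'sprawled']   # hypothetical sample; lengths 1, 2, 3, 4, 8
length_counts = Counter(len(w) for w in words)

print(len(length_counts))   # 5 distinct lengths are counted
print(max(length_counts))   # 8 is the longest length
missing = [n for n in range(1, max(length_counts) + 1) if n not in length_counts]
print(missing)              # [5, 6, 7] never occur
```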
Printed	Page 41 8th line from bottom	macbeth_sentences[1037] is shown as returning the sentence "Double, double, toile, and trouble;...." but this sentence is actually at macbeth_sentences[1117]	Richard Westmore	Jan 08, 2012
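One way to check which index a sentence has is to search the sentence list for its opening tokens instead of relying on a hard-coded position. The `find_sentence` helper below is hypothetical. With NLTK and its Gutenberg data installed, it could be called on `nltk.corpus.gutenberg.sents('shakespeare-macbeth.txt')` with a prefix such as `['Double', ',', 'double']`, though the exact tokenization of that line is an assumption. Here it runs on a toy list.

```python
def find_sentence(sents, prefix):
    """Return the index of the first sentence that starts with the tokens in `prefix`, or -1."""
    n = len(prefix)
    for i, sent in enumerate(sents):
        if list(sent[:n]) == list(prefix):
            return i
    return -1


toy_sents = [['Enter', 'three', 'Witches'],
             ['Double', ',', 'double', ',', 'toile', 'and', 'trouble', ';'],
             ['Fire', 'burne']]
print(find_sentence(toy_sents, ['Double', ',', 'double']))  # 1
```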
PDF	Page 125 4th and 5th lines	>>> from test import msg >>> msg should be changed to: >>> from test import monty >>> monty	mg6t	Sep 03, 2011
PDF	Page 241 in the confusion matrix	The code in this example does not print the confusion matrix. To print the confusion matrix in percentages as shown in the book, the following line should be added: >>> print cm.pp(show_percents=True, sort_by_count=True) Also, in the printout, NS | 1.5% should be NNS | 1.5%	mg6t	Aug 29, 2011
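With `show_percents=True`, each cell appears to be reported as a percentage of all tokens compared, not of its row. The sketch below builds that kind of table from two parallel tag lists without NLTK. `confusion_percentages` is a hypothetical name, and in NLTK 3 the pretty-printing method on `ConfusionMatrix` appears to be named `pretty_format` rather than `pp`.

```python
from collections import Counter


def confusion_percentages(reference, predicted):
    """Map each (reference tag, predicted tag) pair to its share of all tokens, in percent."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    counts = Counter(zip(reference, predicted))
    total = len(reference)
    return {pair: 100 * c / total for pair, c in counts.items()}


ref = ['NN', 'NN', 'NNS', 'JJ']
pred = ['NN', 'NNS', 'NNS', 'NN']
for (gold, guess), pct in sorted(confusion_percentages(ref, pred).items()):
    print(f"{gold:>4} -> {guess:<4} {pct:.1f}%")
```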
Printed	Page 79 last line but one	IDLE 2.6.5 >>> from _future_ import division Traceback (most recent call last): File "<pyshell#0>", line 1, in <module> from _future_ import division ImportError: No module named _future_	Philip KEEVIL	Apr 25, 2011
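The `__future__` module name has two underscores on each side. No module named `_future_` exists, which would produce exactly this `ImportError`, although it is also possible the double underscores were lost when this report was rendered. A minimal check of the correctly spelled import, which is a no-op on Python 3, where `/` already performs true division:

```python
from __future__ import division  # two underscores on each side; must precede other statements

print(7 / 2)   # 3.5: true division
print(7 // 2)  # 3: floor division, available in both Python 2 and 3
```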
Printed	Page 6 Graph	The first call to the dispersion_plot function works, but on the second call nothing is redrawn: the code executes, but nothing is displayed. Using Python 2.6.6 on Ubuntu 10.10.	Marco Riggi	Jan 26, 2011
PDF	Page 29 Generating Language Output 2nd paragraph	"ils if the thieves are sold, and elle if the paintings are sold." This is not exactly wrong, but "sold" should be replaced by "found", since "found" is used in the subsequent example, and "selling" thieves is, well... a little odd ;)	Maximilian Scherr	Jan 15, 2011
PDF	Page 338-340, 342 Diagrams (21), (22), (23), (24), (27b, c)	In the PDF version, Diagrams (21), (22), (23), (24), (27b, c) (in Section 9.2) are all absent when displayed in Preview.app (Mac OS X Snow Leopard). (But they do appear in Acrobat Reader, so perhaps this is a bug report that should be submitted to Apple?)	Sandy Nicholson	Dec 17, 2010