Errata
The errata list is a list of errors and their corrections that were found after the product was released.
The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.
| Version | Location | Description | Submitted By |
|---|---|---|---|
| Printed | Page xix 3rd code snippet |
print [v*10 for v in l1 if if v1>4] |
Anonymous |
| Printed | Page xix 2nd to last paragraph |
move -> movie |
Anonymous |
| Printed | Page xviii Under List Comprehensions |
print [v*10 for v in l1 if v1 > 4] |
Anonymous |
| Printed | Page xvii Third example under the List and dictionary constructors heading |
The example list is given as: |
Anonymous |
| Printed | Page xx line 13 |
v1 should be v |
Anonymous |
| Printed | Page xvii 9th Paragraph |
Under the Heading "Overview of the Chapters", sub-heading "Chapter 2..." The final clause of the paragraph contains the word "move" which should be "movie". |
Anonymous |
| Printed | Page xviii List comprehensions |
[xviii] Python Tips: List comprehensions; |
Ryan |
| Safari Books Online | NA Section 6.4.1 |
c1 = classifier.classifier() |
Anonymous |
| Safari Books Online | NA Section 6.6.1 |
I quote: |
Anonymous |
| Safari Books Online | NA Section 6.7.1 |
Persisting the classifier using SQLite is a good idea, but its implementation is terribly naive for the simple reason that primary keys for the tables are not specified. This leads to simply atrocious performance for anything but the most trivial of applications. Makes me think that the author wrote pretty much untested Python code for the book. |
Anonymous |
| Safari Books Online | PCI_code.zip addlinkref function |
In the addlinkref function, the call to the separatewords function appears as separateWords rather than in all lowercase, as it is defined within the text of the book. Mixing the two spellings causes an error. |
Marisano James |
| Safari Books Online | PCI_code.zip chapter11, gridgame function, about halfway through |
# Board wraps |
Marisano James |
| Printed | Page 6 "National security" section (3rd from last paragraph) |
"... and the analysis of this data requires ..." |
Marisano James |
| Safari Books Online | 8.1 Building a Sample Dataset |
Text says: |
Anonymous |
| Safari Books Online | 8.1 Building a Sample Dataset |
The wineprice function in text does not match online source. |
Anonymous |
| Safari Books Online | 8.1 Building a Sample Dataset |
The noise calculation in the wineset1 function in the text does not match the online source. |
Anonymous |
| Safari Books Online | 8.3.3 Gaussian Function definition |
The default value for sigma is printed as 10.0, but this should be 1.0 in order to match the graph and the printed results. |
Anonymous |
| Safari Books Online | 8.4 Cross-Validation |
Typo in text and difference between crossvalidate function source in text and online source. |
Anonymous |
| Safari Books Online | 8.5 Heterogeneous Variables |
Difference in bottle size between source code and text: |
Anonymous |
| Safari Books Online | 8.6 Optimizing the Scale |
Two differences between text and online source: |
Anonymous |
| Safari Books Online | 8.6 Optimizing the Scale |
The text refers to the geneticoptimize function, but this function is not defined in optimization.py (online). |
Anonymous |
| Safari Books Online | 8.7 Uneven Distributions |
There is no createhiddendataset function but there is a wineset3 function: |
Anonymous |
| Safari Books Online | 8.7.2 Graphing the Probabilities |
The input vector in the example below is wacky: |
Anonymous |
| Safari Books Online | 8.7.2 Graphing the Probabilities |
The input vector and the high value are wacky: |
Anonymous |
| Safari Books Online | 8.9 When to Use k-Nearest Neighbors |
Typo: s/observation/observations/ |
Anonymous |
| Printed | Page 10 after 3rd and 4th paragraph |
The examples read: |
paulo |
| Printed | Page 11 2nd code block |
reload(recommendations) |
Anonymous |
| Printed | Page 11 last line of code example |
The last line of the sim_distance function: |
Anonymous |
| Printed | Page 11 1st example function |
The 'confirmed' errata for this mistake was incorrect. |
Anonymous |
| Other Digital Version | 11 refer to the online error report 'Changes made in the 3/08 printing' |
In the error fix code on your page called: |
Anonymous |
| Printed | Page 11 Result of execution of code example (sim_distance between 'Lisa Rose' and 'Gene Seymour') |
Book gives sim_distance between Lisa and Gene of 0.148148148148. My result was 0.29429805508554946; verified with calculator. |
Justin Middleton |
| Printed | Page 11 last line of code AND python output |
last line of code reads: |
Ian Ford |
| Printed | Page 11 In the 4th paragraph from the bottom on page 11 |
It seems that the Euclidean distance-based similarity score between 'Lisa Rose' and 'Gene Seymour' should be 0.294298 |
Daqing Chen |
| Printed | Page 11 numerical result (0.148148...) |
This is (supposedly) the result of using the function 'sim_distance' as it appears on this page, but it actually uses the statement |
Yehiel Milman |
| Printed | Page 11 Bottom line of python code section describing the function sim_distance |
The sim_distance function should return 1/(1+sqrt(sum_of_squares)) rather than 1/(1+sum_of_squares) when inverting the Euclidean distance. The formula for Euclidean distance includes a square root, but the square root is never taken of the sum of all the squares in this segment of code. |
Thea |
| Printed | Page 11 Euclidian Distance Score code snippet |
In the code snippet for sim_distance: |
Koen Mannaerts |
| Printed | Page 13 sim_pearson code fragment |
The definition of sim_pearson is subject to integer math errors, at least in Python 2.5.1. The number of overlapping elements is saved as "n = len(si)". This is used later to determine the numerator of the formula. |
Anonymous |
| | Page 13 in the sample code |
# if they are no ratings in common, return 0 |
Anonymous |
| Printed | Page 14 first paragraph of "Ranking the Critics" |
"learning which movie critics have tastes simliar to mine" |
Anonymous |
| Printed | Page 15 Table 2-2 |
FinalScore = Total/Sim.Sum |
Fuchen Ying |
| Printed | Page 20 |
In the pydelicious.py module, an exception is raised when "feedparser" is imported. |
Anonymous |
| Printed | Page 21 1st piece of code |
I think the API might have changed and doesn't include the 'href' key anymore. |
Lina Faller |
| Printed | Page 31 2nd code sample |
The code to split text into words uses this regular expression: |
Anonymous |
| Printed | Page 50 Very bottom of page |
The variable "outersum" is defined but never used. |
Anonymous |
| Printed | Page 62 2nd paragraph (suggested), just after def getentryid |
The first edition of the book does not include the working definition for the addlinkref function. That is, as printed, the function only contains the pass statement, and unfortunately is never updated. This causes the code in the Inbound Links section to fail (particularly the PageRank algorithm). I suggest that the completed code be placed on p. 62, just after the getentryid definition. [The full function definition does appear in the PCI_code.zip file available via the Examples link, however.] |
Marisano James |
| Printed | Page 68 1st code segment of the "Word Distance" section; 3rd from last line |
The line |
Marisano James |
| Printed | Page 69 1st code segment (4th line down) |
|
Anonymous |
| Printed | Page 70 Figure 4-3 |
Page C has only three links to other pages. It should have four links to other pages. |
Jarno Mielikainen |
| Printed | Page 70 Figure 4-3 |
The text states "C links to four other pages", but the figure shows only 3 additional arrows besides the one pointing to A. In the computation the total number of links on C is 5. Just an inconsistency. |
Anonymous |
| Printed | Page 70 Last Paragraph |
Looking at Figure 4-3, Page B and page C both have three links going to pages other than A. The text in the final paragraph states "B also has links to three other pages and C links to four other pages". Essentially, it looks like there should be an extra arrow coming out of C in the diagram to bring the total number of arrows to four. |
Bryce Thomas |
| Printed | Page 71 Beginning of code section, calculatepagerank |
To be most effective, a call to self.dbcommit() should be inserted after the self.con.execute('drop table if exists pagerank') call (i.e. just after the first command of the calculatepagerank function). This permits calculatepagerank to be called multiple times in a single Python session without the database returning an error stating that it already contains a pagerank table, and subsequently remaining locked due to the error. |
Marisano James |
| | Page 81 last lines of last two paragraphs of the codes |
The formulas to calculate output_deltas and hidden_deltas are wrong. According to the Delta Rule (http://en.wikipedia.org/wiki/Delta_rule), either these formulas should be error divided by dtanh, or the dtanh itself should be 1/(1-y*y). |
Eric Wang |
| Printed | Page 88 Code definition for printschedule method |
for d in range(0, len(r), 2): |
Jakob Homan |
| Printed | Page 88,90 Throughout schedulecost and printschedule method definitions |
All instances of determining the return flight (returnf and ret variables) should have their destination and origin assignments switched, in order to find a flight back from LGA. As printed, the code finds the same flight for both origin and return. |
Jakob Homan |
| Printed | Page 88 bottom of page |
The index for the line beginning with out= should be [int(r[2*d])] rather than [r[d]], and the index for the line beginning with ret= should be [int(r[2*d+1])] rather than [r[d+1]]. |
Anonymous |
| Printed | Page 89 code example output |
There have been previous reports of other issues with this example: for instance, the airport name appears nowhere in schedule.txt and is not handled in the code, and the discussion and code are somewhat inconsistent. In addition, running the example code produces output that is very different from the printout in the book. In summary, the illustration here is very confusing. |
Anonymous |
| | Page 90 the last 3 and 4 lines |
int(sol[d]) should be int(sol[2*d]) |
Eric Wang |
| Printed | Page 90 3rd paragraph, first sentence |
The text states that, "There are a huge number of possibilities for the getcost function defined here." There is no getcost function, however; "getcost" should be replaced with "schedulecost". |
Marisano James |
| Printed | Page 91 3rd paragraph |
In theory, you could try every possible combination, but in this example there are 10 flights, all with 6 possibilities, giving a total of 6^10 combinations. |
Anonymous |
| Printed | Page 91 3rd paragraph |
I believe that calculation should be: |
Anonymous |
| Printed | Page 91 3rd paragraph |
Oh, and 9**16 is much closer to 300 trillion than it is to 300 billion |
Anonymous |
| Printed | Page 92 1st code snippet |
> return r |
Anonymous |
| Printed | Page 92 first code snippet |
The randomoptimize function should return bestr, not r. |
Andy Young |
| Printed | Page 92 2nd line of Python session section (toward top of page) |
The domain goes from 0-10, i.e. 0..9 rather than 0..7. (See schedule.txt for confirmation; also on p. 97, 8 could not be included as part of the solution if the highest available value were 7.) This means the line specifying the domain should read: |
Marisano James |
| Printed | Page 93 hillclimb function code sample |
The spelling of neighbors is inconsistent in the code sample. All instances of the word neighbors/neighbours in the code sample should be changed either to "neighbors" or "neighbours", but not a mix of both. |
Bryce Thomas |
| Printed | Page 93 bottom third of page, about halfway through the code section |
The code as printed in the first edition of the book allows for negative index references (in Python these will not cause index out of range errors, but will instead read from the opposite end of the list) and does not explore the full domain. One way to address these shortcomings is to change the domain to take on actual values from 0 up to 9 inclusive, and to make the following changes: |
Marisano James |
| Printed | Page 98 mutate function definition |
The mutate function is missing an else clause (or some other mechanism for returning a default value). A possible solution would be to remove the elif conditional and guarantee that the function always returns a value: |
Jeremy Mason |
| Printed | Page 98 code example in function def mutate |
The mutate function returns None if neither the if nor the elif condition is true. This results in None being appended to the pop list, which causes errors in the cost function. |
Matt Mercer |
| | Page 98 the mutate function of code example |
The mutate function should have an else clause that returns vec. Otherwise None will occasionally be added to pop. |
Eric Wang |
| Printed | Page 108 second code example |
for i in range(len(dorms): slots += [i,i] |
Anonymous |
| Printed | Page 109 bottom |
The output of dorm.printsolution(s) uses the value of s as generated by s=optimize.randomoptimize(), for a solution cost of 18 (the original generated solution.) |
Anonymous |
| Printed | Page 121 2nd sample code block |
After importing docclass, the db should be initialized. |
Anonymous |
| Printed | Page 123 1st paragraph |
The function for calculating the weighted probabilities (weightedprob) is wrong. In more detail, the variable "total" is calculated incorrectly. "total" is a weighting factor for "basicprob" (the probability of finding a document with a given feature in a given category). In the book "total" is calculated as "the number of times this feature has appeared in all categories". However, the weighting factor "total" should be equal to the number of items in the category under consideration. |
Anonymous |
| Printed | Page 129 very last line of code |
I believe the last argument to the invchi2 function should *not* be multiplied by 2. That is, it should be: |
Roy Pardee |
| Printed | Page 150 buildtree function |
The recursive calls at the end of the buildtree function should propagate the scoref parameter. Otherwise if you use a scoref function besides the default "entropy" function it will only be used on the first call. |
Stan Dyck |
| Printed | Page 157 bottom of page, second last line of code |
The mdclassify function is called qualified by treepredict2, but treepredict2 is neither imported (it doesn't exist anyway) nor defined anywhere. Changing |
Bryce Thomas |
| Printed | Page 160 getaddressdata function code sample top half of page |
It appears as though the Zillow API does not return "totalRooms" anymore (assuming it once did). Furthermore, some of the houses that it searches for appear to return no actual values from Zillow, or still cause the exception block to be hit. Later on, when the code asks for len(row) in the variance method of treepredict.py, this will cause "TypeError: object of type 'NoneType' has no len()". |
Bryce Thomas |
| Printed | Page 161 Modelling "Hotness" |
Hot or Not API is no longer available, so Hot or Not stuff will not work. |
Bryce Thomas |
| Printed | Page 175 top half of page, gaussian function and interactive interpreter sample |
The gaussian function shown at the top of the page does not produce the results shown in the interactive interpreter sample code. |
Bryce Thomas |
| Printed | Page 186 code example |
numpredict.cumulativegraph(data, (1,1), 6) |
Anonymous |
| Printed | Page 186 interactive interpreter code sample |
The cumulativegraph function is being passed the vector (1,1), which means it will draw the cumulative probability for the price of bottles of wine that have a rating of 1 and are 1 year old. These wine bottles would be so terrible that the cumulative probability would reach 1 at a price of 0, meaning there's nothing to see on the generated graph. |
Bryce Thomas |
| Printed | Page 188 interactive interpreter code sample |
As with the error on page 186, using the vector (1,1), a wine bottle with a rating of 1 that is 1 year old, means that the wine would be so bad that it would all cost essentially 0. Instead, use a vector such as (99,10). |
Bryce Thomas |
| Printed | Page 198 First paragraph, 2nd to last sentence |
"This data is used ...", should probably read, "These data are used ..." [Data is the plural form of datum.] |
Anonymous |
| Printed | Page 199 code sample |
The paragraph after the code sample states "The points will be O if the people are a match and X if they are not." The code sample however draws a scatterplot where people that are a match are represented with a green O and people that are not a match are represented with a red O (not an X). To get a scatterplot that uses red X's instead of red O's, change the line: |
Bryce Thomas |
| Printed | Page 199 middle of page (code section) |
If the plotagematches function is really to be called from one's Python session (rather than being added to the advancedclassify.py file) then one does not need to reload advancedclassify, and should only type "plotagematches(agesonly)" instead of "advancedclassify.plotagematches(agesonly)" into the Python session. |
Marisano James |
| Printed | Page 204 First Paragraph |
There is a sentence which states "There are two other points, X0 and X1, which are examples that have to be classified". The diagram on the other hand uses points X1 and X2, but no X0. I believe that the sentence should be changed to "There are two other points, X1 and X2, which are examples that have to be classified." |
Bryce Thomas |
| Printed | Page 208 top of page, 2nd line of getlocation function body |
The URL for Yahoo! Maps Web Services has changed. The line reading, |
Marisano James |
| Printed | Page 209 halfway down the page |
Apparently there are two 824 3rd Avenues in New York City: one that's approximately 0.9 miles from 220 W 42nd St, and another that's about 6.6 miles away. It might be good to include zip codes with some, or all, of the addresses from the matchmaker.csv file to avoid confusion. [The new Yahoo! Maps Services returns the latitude and longitude for the second address (the one that's about 6.6 miles away) as the default, rather than those of the address that's approximately 0.9 miles away, as listed in the book.] |
Marisano James |
| Printed | Page 210 In the scaledata function |
In the book, scaleinput should be defined as: |
Anonymous |
| Printed | Page 213 top |
The printed first edition does not include the source code for the veclength function, |
Marisano James |
| Printed | Page 213 top |
Just above the definition of the rbf function there should be an |
Marisano James |
| Printed | Page 213 top of page, in rbf definition |
To be in accordance with the other gammas, and with the code listed in PCI_code.zip, the first line of the rbf function definition should read, |
Marisano James |
| Printed | Page 213 bottom third of page (2nd from last line of the nlclassify function) |
The line, |
Marisano James |
| Printed | Page 214 Interactive interpreter sample at top of page |
Before you can execute any of the statements, you first need to reload advancedclassify using: |
Bryce Thomas |
| Printed | Page 214 top of page, just before the first call to nlclassify |
Just after the missing reload(advancedclassify) statement (See Bryce's comment for more), one should execute |
Marisano James |
| Printed | Page 218 top |
The lines reading, |
Marisano James |
| Printed | Page 219 Facebook section |
The Facebook API has changed dramatically since publication, and the book's code no longer functions. Running the book code returns a "This API version is deprecated" message. |
Jeffery Shipman |
| Printed | Page 230 makematrix() function |
Non-Negative Matrix Factorization is well defined only when there are no all-zero rows or columns. By keeping only "words that are common but not too common," the makematrix() function can eliminate all the words in some documents, yielding all-zero rows. The evidence that this has occurred is error messages and NaNs in the output. It can be fixed in two ways. A quick-and-dirty fix is to eliminate fewer words in makematrix(); this can be accomplished with simple changes to the if statement. A more robust fix is to remove the resulting all-zero rows. |
Scott Ainsworth |
| Printed | Page 261 3rd line from bottom of page |
isinstance(t, node): |
Marisano James |
| Printed | Page 263 4th line from the bottom of the page |
The line |
Marisano James |
| Printed | Page 271 toward the bottom of the tournament function |
As currently presented, when there is a tie between players i and j, it is as if player i lost. Presently the code reads, |
Marisano James |
| Printed | Page 313 9th line |
The command should be |
Anonymous |
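
Several of the page 11 reports above concern a missing square root in sim_distance and the two scores 0.148148 and 0.294298. The sketch below compares the two formulas as those reports describe them. It is not the book's listing: it is written for Python 3, the function names sim_distance_printed and sim_distance_sqrt are invented here, and the ratings are hypothetical values chosen only so that the squared differences sum to 5.75, the value implied by both reported scores.

```python
from math import sqrt

# Hypothetical ratings (not the book's dataset); squared differences sum to 5.75.
prefs = {
    "critic_a": {"film_1": 2.5, "film_2": 3.0, "film_3": 3.5, "film_4": 2.5},
    "critic_b": {"film_1": 3.0, "film_2": 1.5, "film_3": 5.0, "film_4": 3.5},
}


def sum_of_squares(prefs, p1, p2):
    """Sum of squared rating differences over the items both people rated."""
    shared = [item for item in prefs[p1] if item in prefs[p2]]
    return sum((prefs[p1][item] - prefs[p2][item]) ** 2 for item in shared)


def sim_distance_printed(prefs, p1, p2):
    """Formula the reports attribute to the printed code: no square root."""
    return 1 / (1 + sum_of_squares(prefs, p1, p2))


def sim_distance_sqrt(prefs, p1, p2):
    """Formula the reports propose: invert the true Euclidean distance."""
    return 1 / (1 + sqrt(sum_of_squares(prefs, p1, p2)))


# Both helpers assume the two people share at least one rated item.
print(sim_distance_printed(prefs, "critic_a", "critic_b"))  # about 0.1481
print(sim_distance_sqrt(prefs, "critic_a", "critic_b"))     # about 0.2943
```

With a sum of squares of 5.75, the first formula gives 1/6.75 and the second gives 1/(1 + sqrt(5.75)), which is consistent with the two figures quoted in the reports.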
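
The three page 98 reports above all describe the same problem: mutate can fall through both branches and return None. The sketch below shows one way to guarantee a return value, following the suggestion to return vec unchanged. It is an illustration under stated assumptions rather than the book's code: it targets Python 3, and passing domain and step as parameters is a choice made here so that the example is self-contained.

```python
import random


def mutate(vec, domain, step=1):
    """Nudge one randomly chosen element of vec up or down by step.

    domain is a list of (low, high) bounds, one pair per element of vec.
    Unlike the version described in the reports, this always returns a list.
    """
    i = random.randint(0, len(domain) - 1)
    if random.random() < 0.5 and vec[i] > domain[i][0]:
        return vec[:i] + [vec[i] - step] + vec[i + 1:]
    elif vec[i] < domain[i][1]:
        return vec[:i] + [vec[i] + step] + vec[i + 1:]
    # Fallback for the case the reports describe: neither branch applied
    # (for example, the element is already at its upper bound).
    return vec


# Hypothetical domain and starting vector, chosen to hit the fallback often.
domain = [(0, 9)] * 4
results = [mutate([9, 9, 9, 9], domain) for _ in range(200)]
print(all(r is not None for r in results))  # True
```

Returning vec itself means that an unmutated individual is the same list object as its parent; returning a copy instead would be a reasonable alternative if the caller modifies population members in place.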
