This errata list catalogs errors, and their corrections, found after the product was released. If an error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date Corrected".
The following errata were submitted by our customers and approved as valid errors by the author or editor.
Version | Location | Description | Submitted By | Date submitted | Date corrected
|
Chapter 9 Section Running the Trials |
val trials = seedRdd.flatMap(trialReturns(_, numTrials / parallelism, bFactorWeights.value, factorMean
fails with:
org.apache.spark.SparkException: Task not serializable
Note from the Author or Editor: There is more discussion, including a potential workaround, at https://github.com/sryza/aas/issues/64. Let's continue the discussion there.
|
Ranko Mosic |
Mar 08, 2016 |
|
PDF |
Page 13
2nd Paragraph, curl command |
The curl command used is `curl -o donation.zip http://bit.ly/1Aoywaq`. bit.ly responds with a 301 redirect, which curl on my system (curl 7.37.1, x86_64-apple-darwin14.0) does not follow by default. To alleviate this, the command should be `curl -L -o donation.zip http://bit.ly/1Aoywaq`.
Note from the Author or Editor: Yes, the command should start with "curl -L -o ..."
|
whaley |
Apr 10, 2015 |
Aug 07, 2015 |
PDF |
Page 28
5th line from bottom |
In the text: "... decompressing and then serializing the results, and finally, performing computations on the aggregated data", the word "serializing" should be "deserializing".
|
Sean Owen |
May 11, 2015 |
Aug 07, 2015 |
PDF |
Page 33
The code snippet on top of the page |
The code snippet reads: val misses = parsed.filter($"is_match" === false), where the false value isn't wrapped in the lit() function.
In the paragraph below, where this code is talked through, the author claims that "we need to wrap the boolean literal false with the lit function".
This needs clarification: is the code snippet missing a lit() around false, or is the paragraph below incorrect, meaning we don't in fact need lit() there?
Note from the Author or Editor: Hm, are you sure? That seems to compile fine. The Column class defines an === method that takes "Any" and wraps its argument in lit() anyway. I agree the text says it's necessary, but I'm not sure it is. I'm also not sure I'm able to update the text of this chapter anymore, but I would indeed make these consistent one way or the other.
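For reference, both forms compile and are equivalent, because Column's === method accepts Any and wraps a bare literal in lit() itself. A minimal sketch, reusing the chapter's parsed DataFrame:
import org.apache.spark.sql.functions.lit
// Equivalent: === lifts the bare boolean into a Column via lit() internally
val misses1 = parsed.filter($"is_match" === false)
val misses2 = parsed.filter($"is_match" === lit(false))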
|
Jacek Jankowiak |
Jan 23, 2021 |
|
ePub |
Page 38%
1 |
Hi,
The Wikipedia files appear to be corrupt:
$ curl -s -L http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles-multistream.xml.bz2 | bzip2 -cd | ~/hadoop-2.7.0/bin/hadoop fs -put - wikidump.xml
bzip2: Data integrity error when decompressing.
Input file = (stdin), output file = (stdout)
It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.
You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
Same thing when I try to download and extract them from Firefox:
ubuntu@ip-10-0-1-186:/data$ bzip2 -d enwiki-latest-pages-articles-multistream.xml.bz2
bzip2: Data integrity error when decompressing.
Input file = enwiki-latest-pages-articles-multistream.xml.bz2, output file = enwiki-latest-pages-articles-multistream.xml
It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.
You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
bzip2: Deleting output file enwiki-latest-pages-articles-multistream.xml, if it exists.
ubuntu@ip-10-0-1-186:/data$ ls
enwiki-latest-pages-articles-multistream.xml.bz2 lost+found
ubuntu@ip-10-0-1-186:/data$ bzip2 -dtvv enwiki-latest-pages-articles-multistream.xml.bz2
enwiki-latest-pages-articles-multistream.xml.bz2:
[1: huff+mtf rt+rld]
[1: huff+mtf rt+rld]
[2: huff+mtf rt+rld]
[1: huff+mtf rt+rld]
[2: huff+mtf rt+rld]
[3: huff+mtf rt+rld]
[4: huff+mtf rt+rld]
[5: huff+mtf rt+rld]
[1: huff+mtf data integrity (CRC) error in data
You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
ubuntu@ip-10-0-1-186:/data$ bzip2recover enwiki-latest-pages-articles-multistream.xml.bz2
bzip2recover 1.0.6: extracts blocks from damaged .bz2 files.
bzip2recover: searching for block boundaries ...
block 1 runs from 80 to 4640
block 2 runs from 4808 to 1948251
block 3 runs from 1948300 to 3752034
block 4 runs from 3752200 to 5832866
block 5 runs from 5832915 to 7818462
block 6 runs from 7818511 to 9886990
...
Any ideas?
Note from the Author or Editor: Hm, you're right. It seems like the dumps starting with April 3 have this problem. March 4 seems OK. We should change the text to refer to that specific version.
On page 102, the URL should change latest -> 20150304 in two places:
$ curl -s -L http://dumps.wikimedia.org/enwiki/20150304/\
enwiki-20150304-pages-articles-multistream.xml.bz2 \
...
|
David Laxer |
May 26, 2015 |
Aug 07, 2015 |
Printed, PDF |
Page 43
lines 4 and 5 |
There are two dead links:
1. “Collaborative Filtering for Implicit Feedback Datasets”
shortener: http://bit.ly/1ALoX4q which goes to: https://research.yahoo.com/files/HuKorenVolinsky-ICDM08.pdf which is now 404.
The paper can now be found here:
http://yifanhu.net/PUB/cf.pdf
2. “Large-scale Parallel Collaborative Filtering for the Netflix Prize”
shortener http://bit.ly/16im1AT which now goes to: https://www.labs.hpe.com/about
The paper can be now found here:
https://endymecy.gitbooks.io/spark-ml-source-analysis/content/%E6%8E%A8%E8%8D%90/papers/Large-scale%20Parallel%20Collaborative%20Filtering%20the%20Netflix%20Prize.pdf
Note from the Author or Editor: The links should be updated for the 2nd edition. Both links are correct in the draft PDF I am looking at now. The second link for "Large-scale Parallel Collaborative Filtering for the Netflix Prize" goes to http://dl.acm.org/citation.cfm?id=1424269 instead now.
|
Clem Wang |
Jun 09, 2017 |
|
PDF |
Page 62
2nd paragraph from the bottom |
The description of the logic of the middle decision tree node is incorrect, and the text should update to match the diagram.
Replace this sentence:
If the date has passed by more than three days, I predict yes, it’s spoiled.
with:
If the date has passed, but that was three or fewer days ago, I take my chances and predict it's not spoiled.
|
Sean Owen |
May 11, 2015 |
Aug 07, 2015 |
PDF |
Page 71
Start of final paragraph |
The paragraph should start with "The decision tree algorithm" but starts with "he decision tree algorithm".
|
Sean Owen |
May 11, 2015 |
Aug 07, 2015 |
Printed, PDF |
Page 72
2nd equation (6th from bottom) |
the term log(1/p)
is missing the subscript i for p
In LaTex, it should be:
$$I_{E}(p) = \sum_{i=1}^{N} p_i \log\left(\frac{1}{p_i}\right) = -\sum_{i=1}^{N} p_i \log(p_i)$$
Note from the Author or Editor: Yes, you're right. I'll fix that for future printing.
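As a quick sanity check on the corrected formula (an illustrative aside, not part of the erratum): for a uniform distribution over N classes, p_i = 1/N for every i, so
$$I_{E}(p) = \sum_{i=1}^{N} \frac{1}{N}\log(N) = \log(N),$$
the maximum possible entropy.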
|
Clem Wang |
Jun 10, 2017 |
|
PDF |
Page 92
Very end, continuing into page 93 |
From a reader report at https://github.com/sryza/aas/issues/33 :
On page 92 in calculating sumSquares, the code
val sumSquares = dataAsArray.fold(
new Array[Double](numCols)
)(
(a,b) => a.zip(b).map(t => t._1 + t._2 * t._2)
)
Because RDD.fold requires the operator to be commutative and associative, and the asymmetry in the map() function violates this, the result may differ depending on the number of partitions in the RDD.
Yes, this code should be replaced with a call to aggregate:
val sumSquares = dataAsArray.aggregate(
new Array[Double](numCols)
)(
(a, b) => a.zip(b).map(t => t._1 + t._2 * t._2),
(a, b) => a.zip(b).map(t => t._1 + t._2)
)
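To see the partition dependence concretely, here is a minimal illustrative sketch (not from the book): because fold applies the same operator both within partitions and when merging partition results, the asymmetric operator re-squares partial sums at merge time.
val op = (a: Array[Double], b: Array[Double]) => a.zip(b).map(t => t._1 + t._2 * t._2)
val onePart = sc.parallelize(Seq(Array(2.0), Array(3.0)), 1).fold(Array(0.0))(op)
// onePart(0) == 169.0: the single partition yields 2^2 + 3^2 = 13, and the merge squares it again
val twoParts = sc.parallelize(Seq(Array(2.0), Array(3.0)), 2).fold(Array(0.0))(op)
// twoParts(0) == 97.0: partitions yield 4 and 9, and the merge computes (0 + 4^2) + 9^2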
|
Sean Owen |
Jul 17, 2015 |
Aug 07, 2015 |
Printed |
Page 100
Middle of 3rd paragraph |
There appears to be a confusing inconsistency in the assignment of rows and columns to terms and documents. In the middle of page 100, and in the line running from the bottom of page 100 to the top of page 101, rows represent terms and columns represent documents. But in the 5th paragraph of page 101, and from there to the end of the chapter, rows are documents and columns are terms. See in particular the passage from the bottom of page 107 to the top of page 108.
Note from the Author or Editor: Agree. I will forward to Sandy for a look. I think it may be best to change all references to refer to a "document-term" matrix where docs are rows.
|
John Boersma |
Nov 04, 2015 |
|
PDF |
Page 104
Last line of code on page |
The code snippet refers to the file "stopwords.txt", but doesn't say where this file comes from. It is available at https://github.com/sryza/aas/blob/master/ch06-lsa/src/main/resources/stopwords.txt and this should be explicit in the text.
To address this, in the text that precedes the listing, after the sentence "The following snippet takes the RDD of plain-text documents and both lemmatizes it and filters out stop words:", instead end that sentence with a period and add the sentence:
Note that this code relies on a file of stopwords called stopwords.txt, which is available in the accompanying source code repo at https://github.com/sryza/aas/blob/master/ch06-lsa/src/main/resources/stopwords.txt and should be downloaded into the current working directory first:
|
Sean Owen |
Jul 09, 2015 |
Aug 07, 2015 |
PDF |
Page 104, 107
Code listings in each page |
See https://github.com/sryza/aas/issues/34
in RunLSA.scala
error: value containsKey is not a member of scala.collection.immutable.Map[String,Int]
case (term, freq) => bTermToId.containsKey(term)
http://www.scala-lang.org/api/2.11.5/index.html#scala.collection.immutable.Map
looks like it should be "contains" instead of "containsKey"
On page 104, the following import needs to be added at the start of the code listing:
import scala.collection.JavaConversions._
On page 107, the same import can be removed from the listing. In addition, in that listing termFreqs.values().sum should become termFreqs.values.sum, and bTermIds.containsKey(term) should become bTermIds.contains(term)
|
Sean Owen |
Jul 17, 2015 |
Aug 07, 2015 |
Printed |
Page 107
|
Just a suggestion.
For completeness, it might be worth adding the line:
val bIdfs = sc.broadcast(idfs).value
though this is easily extrapolated by a reader who follows the code and/or can be looked up in the accompanying repo online.
Note from the Author or Editor: Since the text is explaining the computation and broadcast of one data structure, bTermIds, I'd rather not inject a second one there.
However, I don't think it would hurt to add a little text here, as it does feel like the next chunk of code should be executable as-is, yet this necessary second broadcast is not mentioned. It is in the accompanying source.
Before "Finally, we tie it all together ...", add "Similarly, broadcast idfs as bIdfs." Code font for idfs and bIdfs.
|
Renat Bekbolatov |
May 11, 2015 |
Aug 07, 2015 |
PDF |
Page 107
Code listing at top of page |
The listing at the top of the page does not define numDocs. See https://github.com/sryza/aas/issues/31
The suggested fix is to insert this line of code before the first line of this listing (beginning "val idfs =..."):
val numDocs = docTermFreqs.count()
Also, this listing needs another small fix: the "toMap" at the end needs to be "collectAsMap()".
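Putting both fixes together, the listing would begin roughly as follows (a sketch; the docFreqs variable and the IDF formula are assumed from the chapter's context):
val numDocs = docTermFreqs.count()
val idfs = docFreqs.map {
  case (term, count) => (term, math.log(numDocs.toDouble / count))
}.collectAsMap()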
|
Sean Owen |
Jul 15, 2015 |
Aug 07, 2015 |
PDF, ePub, Mobi |
Page 107
SVD definition |
The original text says:
V is a k x n matrix ...
It should be:
V is a n x k matrix ...
Explanation:
If V is a k x n matrix, its transpose is an n x k matrix, and the matrix multiplication U S Vt is not possible (unless n = k):
U S Vt
(m x k) x (k x k) x (n x k)
But if V is an n x k matrix, its transpose is a k x n matrix, and everything works:
U S Vt
(m x k) x (k x k) x (k x n)
Note from the Author or Editor: That's correct. In the third bullet point following the equation M = U S VT, it should start with "VT is a k x n matrix ..." (that's V with superscript T)
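For reference, the corrected dimensions line up as follows (an illustrative aside):
$$\underbrace{M}_{m \times n} = \underbrace{U}_{m \times k}\,\underbrace{S}_{k \times k}\,\underbrace{V^{T}}_{k \times n}$$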
|
Carlos Pavia |
Feb 07, 2015 |
Aug 07, 2015 |
PDF |
Page 108
Code listing in middle of page |
See https://github.com/sryza/aas/issues/36
termDocMatrix is not defined. It should actually be "vecs", defined on the previous page.
|
Sean Owen |
Jul 20, 2015 |
Aug 07, 2015 |
Printed |
Page 109
code block in the middle, 4th line from bottom |
If the reader only uses the book, the variable "termIds" will be misleading here, because it was earlier defined to be of type Map[String, Int], whereas now it is the reverse of that.
Again, this could be found in the online repo and guessed, but readers will find it useful to have consistent variable names if they are only reading the book.
Note from the Author or Editor: Yeah, they are different code blocks, and it's maybe clearer in the full source code, which works, but this does result in a conflict in listings.
I think the simplest clarification involves changing the previous code on page 107. Replace "termIds" with "termToId", and "bTermIds" with "bTermToId" on page 107. Then page 109 has no conflict.
I can make a parallel change in the source code in the repo.
|
Renat Bekbolatov |
May 11, 2015 |
Aug 07, 2015 |
Printed |
Page 109
last paragraph, 2nd~3rd line |
We can find the terms relevant to each of the top concepts in a similar manner using Y, ...
->
We can find the documents relevant to each of the top concepts in a similar manner using Y, ...
Is it right?
Note from the Author or Editor: Agreed, the text clearly shows using V for terms, and then U for documents. Your change is correct.
|
Edberg |
Dec 29, 2015 |
|
Printed |
Page 111
def wikiXml... |
missing "None" line
Note from the Author or Editor: Yes, after the line in the listing:
page.getTitle.contains("(disambiguation)")) {
and before
} else {
should appear a new line containing just:
None
It should be indented like "Some" below.
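Applied to the listing, the corrected branch would look roughly like this (a sketch reconstructed from the note; the surrounding condition and the Some branch are assumed to match the book's listing):
if (page.isEmpty || !page.isArticle || page.isRedirect ||
    page.getTitle.contains("(disambiguation)")) {
  None
} else {
  Some((page.getTitle, page.getContent))
}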
|
Renat Bekbolatov |
May 11, 2015 |
Aug 07, 2015 |
PDF |
Page 114, 117, 118
Various code listings |
From a reader report --
It seems like the variables idTerms and termIds may be used inconsistently in the code listings and in the accompanying source code. The idTerms variable is not shown in the listings in the book. Corresponding listings in the source code use termIds.
It seems like idTerms should be a map from id to terms, and termIds should be terms to ids, but the convention is largely reversed. Is that intentional? That's fine if so. But either way, it needs to be consistent between the code and the listing.
|
Sean Owen |
Jul 09, 2015 |
Aug 07, 2015 |
Printed |
Page 134
First line of second code block on the page |
Either two typos, or one typo if another line is added. Explanation below.
A:
1) we probably want to use variable name "componentCounts" instead of "topComponentCounts" which was not introduced.
2) we should be looking up "componentCounts(1)._1" instead of "componentCounts(1)._2"
B:
It is possible that there was a missing line - but it is also not in the official repo:
val topComponentCounts = componentCounts.take(10).map(_.swap)
Note from the Author or Editor: Yes, the best fix is to change
topComponentCounts(1)._2
to
componentCounts(1)._1
on page 134. I will fix the source code repo too.
Could we also add Renat Bekbolatov to the acknowledgements section as part of this erratum? Lots of good catches and deserves recognition.
|
Renat Bekbolatov |
May 15, 2015 |
Aug 07, 2015 |
Printed |
Page 134
second line above |
"while the second largest contains only 4%" -> "contains only 4"
Is it right?
Note from the Author or Editor: Yes, it should read "4 vertices" instead of "4%" for clarity.
|
Edberg |
Jan 02, 2016 |
|
PDF |
Page 136
. |
There is an error on pg. 136 of the latest release of Advanced Analytics with Spark.
The marginal totals on the contingency table: the bottom row total should be "A total" rather than "B total".
Note from the Author or Editor: Agree, though I believe it's the third column label of this table (on p. 138 in the final PDF) that should be "A Total" rather than the third row label, given the following text.
|
Anonymous |
Feb 19, 2015 |
Aug 07, 2015 |
Printed |
Page 139
3rd line from bottom |
"val inner = (YY * NN - YN * NY) - T / 2.0"
->
"val inner = YY * NN - YN * NY"
Note from the Author or Editor: Yes, this code is inconsistent with the formula on the preceding page (page 138, at the bottom). However, I think we should change the formula: the extra term here is the Yates continuity correction, and it is probably the right version of the chi-squared test to show people.
So, on page 138, the numerator of the formula should add two things: absolute value around the current inner product, and then a "- T / 2" term, to read:
(|YY * NN − YN * NY| - T / 2)^2
Immediately following, before the sentence "If our samples are...", we should insert a clarifying remark:
Note that this formulation of the chi-squared statistic includes a term "- T / 2", which is Yates's continuity correction (http://en.wikipedia.org/wiki/Yates%27s_correction_for_continuity) and is not included in some formulations of the chi-squared statistic.
Then this line of code on 139:
val inner = (YY*NN-YN*NY) - T / 2.0
needs to be
val inner = math.abs(YY*NN-YN*NY) - T / 2.0
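With those changes, the full statistic would read (a sketch of the standard Yates-corrected chi-squared for a 2x2 table, using the book's YY/YN/NY/NN cell notation with T the total count):
$$\chi^2 = \frac{T\,\big(|YY \cdot NN - YN \cdot NY| - T/2\big)^2}{(YY + YN)(NY + NN)(YY + NY)(YN + NN)}$$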
|
Renat Bekbolatov |
May 15, 2015 |
Aug 07, 2015 |
PDF |
Page 156
Code segment at end of page |
The code has some typos in it:
the distance() method is missing the beginning curly brace and the call inside it to GeometryEngine.distance() is missing the ending paren
Note from the Author or Editor: Yes, the distance function should read:
def distance(other: Geometry): Double = {
GeometryEngine.distance(geometry, other, spatialReference)
}
|
Chaz Chandler |
Apr 19, 2015 |
Aug 07, 2015 |
PDF |
Page 159
Code block at beginning of page |
The code blocks described in this section are only a portion of the code necessary. If the intent of the text is for the reader to follow along in the console while reading, then the reader will be stuck without referencing the full code on GitHub (https://github.com/sryza/aas/blob/master/ch08-geotime/src/main/scala/com/cloudera/datascience/geotime/GeoJson.scala). It may be useful to point out that this is just an illustrative excerpt, since adding the full code listing may unnecessarily add to the length of the text. Also, the implicit declarations in the text aren't nested within the GeoJsonProtocol object like they are in the GitHub code.
Note from the Author or Editor: I agree. At "Esri Geometry API to represent the longitude and latitude of the pickup and dropoff locations:", let's remove the colon and finish the sentence with a period, then add, "Note that the code listings below are only illustrative extracts from the complete code that you will need to execute to follow along with this chapter. Please refer to the accompanying Chapter 8 source code repository, in particular GeoJson.scala."
|
Chaz Chandler |
Apr 19, 2015 |
Aug 07, 2015 |
PDF |
Page 165
first line of top code block |
The first import statement should read "import com.cloudera.datascience.geotime._" (ie, datascience instead of science and geotime instead of geojson). See https://github.com/sryza/aas/blob/master/ch08-geotime/src/main/scala/com/cloudera/datascience/geotime/GeoJson.scala#L6
Note from the Author or Editor: Yes, it should read "import com.cloudera.datascience.geotime._" instead of "import com.cloudera.science.geojson._"
|
Chaz Chandler |
Apr 19, 2015 |
Aug 07, 2015 |
Printed |
Page 165
(1) First sentence, (2) 5th line from below |
A couple of minor suggestions:
(1) First sentence:
"...we need to use ... tools ... in[to] the Spark shell ..."
(2) 5th line from below:
It is not exactly clear what the "frs sequence" refers to, but from context we can guess it means areaSortedFeatures.
Note from the Author or Editor: On page 165, opening sentence, change "Now we need to use" to "Now we need to import"
Page 165, "in the frs sequence" should be "in the areaSortedFeatures sequence"
|
Renat Bekbolatov |
May 19, 2015 |
Aug 07, 2015 |
Printed |
Page 170
First line of code block on the bottom of the page |
Small typo:
previously unseen "bdrdd" -> "boroughDurations"
Note from the Author or Editor: Correct, bdrdd should read boroughDurations in the final snippet on 170.
|
Renat Bekbolatov |
May 19, 2015 |
Aug 07, 2015 |
Printed |
Page 177
code segment at the bottom |
Just a tiny thing, might be worth noting.
Using the current version of the codebase in the way it is described in the book, the script actually fetches different instruments - not ones that track S&P 500 or NASDAQ index values. (That might also explain the unexpectedly low correlation numbers between these two on page 185.)
Also a couple of other very minor typos to fix in future prints:
p. 175, "Variance-Covariance" section, last words: "...deriving a[n] estimate ..."
p. 176, "Our Model" section, end of first paragraph: "... of possibl[e|y] different ..."
Note from the Author or Editor: The typo fixes are confirmed, yes. I'll ask Sandy to look at the download.
|
Renat Bekbolatov |
May 21, 2015 |
Aug 07, 2015 |
Printed, PDF |
Page 180
Bottom two code snippets |
Errors:
val stocks: Seq[Array[Double]] =
Should be:
val stocks: Seq[Array[(DateTime, Double)]]
Error
val factors: Seq[Array[Double]] =
Should be
val factors: Seq[Array[(DateTime, Double)]] =
Note from the Author or Editor: Yes, the type is incorrect and should be Seq[Array[(DateTime, Double)]] in both cases. In the accompanying source code, there's no problem since the type is simply left off. I think it's best to just match that.
The type declaration ": Seq[Array[Double]]" can be deleted in both occurrences, leaving just "val stocks = ..." and "val factors = ..."
|
Dr Zach Izham |
May 15, 2015 |
Aug 07, 2015 |
Printed |
Page 195
Quote, Ch. 10 |
What is SCHPON[...]?
Note from the Author or Editor: It's shorthand for Sulfur, Carbon, Hydrogen, Phosphorus, Oxygen, Nitrogen. I'll try to stick in a footnote.
|
Edberg |
Feb 15, 2016 |
|
PDF |
Page 199
Last command on page |
https://github.com/sryza/aas/issues/38
The URL below doesn't work anymore:
ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/data/HG00103/alignment/HG00103.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam
It appears to now be at:
ftp://ftp.ncbi.nih.gov/1000genomes/ftp/phase3/data/HG00103/alignment/HG00103.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam
There's a UK mirror at:
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00103/alignment/HG00103.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam
|
Sean Owen |
Jul 24, 2015 |
Aug 07, 2015 |
PDF |
Page 199
First line of code |
The "adamLoad" method used in the text and source was removed after adam 0.16.0, which the text uses. The source repo depends on 0.16.0, but here in the "git clone" command it's important to also checkout 0.16.0:
git clone -b adam-parent-0.16.0 https://github.com/bigdatagenomics/adam.git
|
Sean Owen |
Aug 03, 2015 |
|
PDF |
Page 199
2nd block of code |
export $ADAM_HOME=path/to/adam
should be
export ADAM_HOME=path/to/adam
or
ADAM_HOME=path/to/adam; export ADAM_HOME
(no dollar sign in any case)
This works in bash, sh and ksh
In tcsh or csh it should be
setenv ADAM_HOME path/to/adam
Note from the Author or Editor: Yes, the "export ADAM_HOME=..." is correct.
|
David G Pisano |
Sep 22, 2015 |
|
PDF |
Page 222
2nd paragraph following "Loading Data with Thunder" |
The location of the resource hyperlinked as python/thunder/utils/data/fish/tif-stack has changed. It should be:
https://github.com/thunder-project/thunder/tree/v0.4.1/python/thunder/utils/data/fish/tif-stack
|
Sean Owen |
May 01, 2015 |
Aug 07, 2015 |
PDF |
Page 240
2nd paragraph |
"A task only contributes to the accumulator the first time it runs. For example, if a task completes successfully, but its outputs are lost and it needs to be rerun, it will not increment the accumulator again."
That's wrong. It is only true for actions. When using accumulators in transformations, they may be incremented multiple times. This can happen for various reasons (reuse of an RDD, task failure, task rerun, etc.).
https://spark.apache.org/docs/latest/programming-guide.html#accumulators-a-nameaccumlinka
Note from the Author or Editor: Yes I think this at least has to be converted to a note explaining that this could over-count as implemented here. Better would be to convert the example to make the accumulator updates inside aggregate() instead somehow, as that's correct. Sandy WDYT?
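A minimal illustrative sketch of the distinction (not from the book; data stands for any existing RDD of numbers, and the Spark 1.x accumulator API the book uses is assumed):
val mapAcc = sc.accumulator(0)
val actionAcc = sc.accumulator(0)
// Update inside a transformation: may be applied more than once if the
// stage is recomputed (task retry, or the RDD being reused without caching)
val doubled = data.map { x => mapAcc += 1; x * 2 }
// Update inside an action: Spark guarantees each task's update is
// applied exactly once, even if the task is re-executed
doubled.foreach { _ => actionAcc += 1 }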
|
Lars Francke |
Jul 30, 2015 |
Aug 07, 2015 |
PDF |
Page 240
2nd & 3rd paragraph |
Thanks for fixing the Accumulator docs.
2nd paragraph (the new one): "but its output are lots" -> lost
3rd paragraph: This paragraph is very hard to understand and parse for me. Especially now that the previous paragraph basically says the opposite.
Note from the Author or Editor: Indeed lots -> lost needs to be fixed.
Sandy up to you whether you want to revise the 3rd paragraph. The link may be: "For cases where these behaviors are acceptable, accumulators can be a big win, because ..."
|
Lars Francke |
Aug 18, 2015 |
|
Printed |
Page 242
last line before section beginning |
To make it work by default (without implicit conversions):
val (train, test) = ...
to
val Array(train, test) = ...
Note from the Author or Editor: Yes, this needs to be
val Array(train, test) = ...
to work "out of the box". This line of code wasn't actually in the repo as it was just an example, so didn't catch it as a compiler error.
|
Renat Bekbolatov |
May 11, 2015 |
Aug 07, 2015 |
Printed |
Page 242
first line |
Code explanation (1) for the first line on this page reads: "Swap the order of the tuples to sort on the numbers instead of the counts."
I think the example code on the previous page sorts the data on the counts instead of the numbers. Am I incorrect?
Note from the Author or Editor: You're right that this is what the example on the previous page does, and it's also what this example does. Really, the bullet text should be different, to explain that it's doing this to still sort on count. I will adjust it.
|
Edberg |
Feb 12, 2016 |
|
ePub |
Page 314
1st paragraph in "Document-Document Relevance" |
"where u sub-i is the row in U corresponding to term i". I think that "term" should be "document", since U is the document space.
Note from the Author or Editor: Yes, it should read "document i".
|
Brent Schneeman |
Apr 14, 2015 |
Aug 07, 2015 |