Errata for Deep Learning for Coders with fastai and PyTorch

The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction will be displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Version | Location | Description | Submitted By | Date submitted | Date corrected
Printed
Page 16
1st paragraph

There are two folders containing different versions of the notebooks. The full folder contains the exact notebooks used to create the book you're reading now, with all the prose and outputs. The stripped version has the same headings and code cells, but all outputs and prose have been removed.

The folders are no longer called "full" and "stripped". The repository at https://github.com/fastai/fastbook contains the "full" version of the book.

The clean folder (https://github.com/fastai/fastbook/tree/master/clean) is the stripped version.


Andrew Nakamura  Aug 24, 2020  Sep 18, 2020
Printed
Page 28
5th paragraph

We have to tell fastai how to get labels from the filenames, which we do by calling from_name_func (which means that the *filenames* can be extracted using a function applied to the filename) .... That should say labels instead of filenames


Note from the Author or Editor:
Replace filenames by labels in the parenthesis:
(which means that the labels can be extracted using a function applied to the filename)

John O'Reilly  Sep 13, 2020  Dec 18, 2020
Printed
Page 28
-5

" and passing x[0].isupper(), which evaluates to True if the first letter is uppercase (i.e., it’s a cat)."

The code is not passing 'x[0].isupper()'; instead it passes a function, 'is_cat'.

Note from the Author or Editor:
Replace 'x[0].isupper()' by 'is_cat' in this sentence.

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
Printed
Page 136
3rd paragraph

In the 3rd line of the 3rd paragraph, it should be "for a total of 784 pixels", not "768 pixels". Change 768 to 784.

Note from the Author or Editor:
Please replace 768 by 784 as advised.

Mohammed Maheer  Oct 13, 2020  Dec 18, 2020
Printed
Page 143
l.1

"If you’ve done numeric programming in PyTorch before, you may recognize these as being similar to NumPy arrays."

PyTorch should be NumPy.

Note from the Author or Editor:
Yes, replace PyTorch by NumPy in this sentence.

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
Printed
Page 147
last paragraph

"we'll get back 1,010 absolute values" should read
"we'll get back 1,010x28x28 absolute values" -
(valid_3_tens-mean3).abs().shape -> torch.Size([1010, 28, 28])

Note from the Author or Editor:
change to "get back 1,010 matrices of absolute values"

Peter Butterfill  Sep 08, 2020  Sep 18, 2020
Printed
Page 147
in the middle

tensor([1,2,3]) + tensor([1,1,1])

this does not make sense as an example.

i guess it should be

tensor([1,2,3]) + tensor(1)

Note from the Author or Editor:
Yes, this should be `tensor([1,2,3]) + tensor(1)`
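
For reference, the corrected example broadcasts the rank-0 tensor across every element (shown here with torch.tensor; the book uses fastai's tensor helper):

import torch

print(torch.tensor([1, 2, 3]) + torch.tensor(1))   # tensor([2, 3, 4])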

HIDEMOTO NAKADA  Feb 08, 2021  May 07, 2021
PDF
Page 149
The last two paragraphs

All the vector symbols X and W should be in lowercase in order to be consistent with the function above the second to last paragraph.

Note from the Author or Editor:
In the last two paragraphs of p149, replace all (code-formatted) 'W' by 'w' and 'X' by 'x'

ZHANG Hongyuan  Dec 18, 2020  May 07, 2021
PDF
Page 149
middle paragraph

while we are discussing how to discriminate '3' and '7' in this chapter, this paragraph is talking about '8'.
I guess '8' here should be '3'.

and the syntax of the function for probability of being the number 8 is strange.

def pr_eight(x, w) = (x*w).sum()

it should be something like,

def pr_three(x, w): return (x*w).sum()

Note from the Author or Editor:
For consistency with the previous page, I guess we can use 3s here, so replace all 8 by 3 (I counted five instances on this page) and pr_eight by pr_three.

HIDEMOTO NAKADA  Mar 19, 2021  May 07, 2021
Printed
Page 173
3rd paragraph

"To decide if an output represents a 3 or a 7, we can just check whether it's greater than 0."

I understand that the threshold here is 0.5, so 0 should be 0.5.

Also, the code snippet just below this paragraph
----
(preds>0.0).float() == train_y[:4]
----
0.0 should be 0.5

Note from the Author or Editor:
Yes, in the paragraph mentioned, 0 should be replaced with 0.5 and in the code snippet immediately below, 0.0 should be 0.5.

HIDEMOTO NAKADA  Feb 08, 2021  May 07, 2021
PDF
Page 198
2nd paragraph, last line

"...would then look like something like Figure 5-3" should be "would then look something like Figure 5-3"

Note from the Author or Editor:
This should be fixed as suggested

ZHANG Hongyuan  Mar 03, 2021  May 07, 2021
Printed
Page 200
last paragraph

"So, we want to transform our numbers between 0 and 1 to instead be between negative infinity and infinity. "

The logarithm does not give us infinity for 1; it should be 'negative infinity and 0'.

Note from the Author or Editor:
Indeed, "negative infinity and infinity" should be replaced by "negative infinity and 0"

HIDEMOTO NAKADA  Feb 13, 2021  May 07, 2021
Printed
Page 201
last paragraph (not the Sylvain says section)

modification should be multiplication in the following:
"Computer scientists love using logarithms, because it means that modification, which can create really really large and really really small numbers, can be replaced by addition"

Peter Butterfill  Sep 08, 2020  Sep 18, 2020
Printed
Page 255
last line of the 2nd paragraph

'The Last Skywalker'

I don't know if this is some kind of joke, but the Star Wars movie was
'The Last Jedi' or 'The Rise of Skywalker'.

Note from the Author or Editor:
Replace "The Last Skylwalker" by "The Rise of Skylwalker"
Replace "last_skywalker" in the next code examples by "rise_skywalker" (two instances)

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
Printed
Page 272
3rd paragraph

The term 'embedding matrices' used here means something different from the one used in p.268. I believe you are talking about the output of the embedding layer here.

it looks like 'embedding' without 'matrices' is better here.

Note from the Author or Editor:
In that paragraph replace "embedding matrices" by "embeddings"

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
Printed
Page 335
in the middle list

xxunk
Indicates the next word is unknown


I understand this special token means the word itself is unknown, not the next word.

Note from the Author or Editor:
Yes, replace "Indicates the next word is unknown" by "Indicates this word is unknown"

HIDEMOTO NAKADA  Feb 16, 2021  May 07, 2021
Printed
Page 341
2nd paragraph

"We then cut this stream into a certain number of batches (which is our batch size)."

'batch size' usually means 'size of each batch', not 'number of batches'.

This is really confusing. Am I missing something?


Note from the Author or Editor:
In the mentioned sentence, replace "number of batches" by "number of chunks of contiguous text"
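
As a rough illustration of the corrected wording (a toy token stream, not fastai's actual LMDataLoader code), a stream of 90 tokens with a batch size of 6 is cut into 6 contiguous chunks:

stream = list(range(90))   # pretend these are 90 token ids
bs = 6
chunk_len = len(stream) // bs
chunks = [stream[i * chunk_len:(i + 1) * chunk_len] for i in range(bs)]
print(len(chunks), len(chunks[0]))   # 6 chunks of 15 contiguous tokens each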

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
Printed
Page 364
Last code section

The class SiameseImage is derived from Tuple (note the capital T). The native Python datatype is written all in lower case 'tuple' but I think what's meant here is fastuple, the extended tuple in fastcore. I wonder if Tuple was renamed fastuple at some point.

Note from the Author or Editor:
Please replace all instances of "Tuple" (with the capital T, code-formatted) by "fastuple" (I counted two on p364: one in the code of the class SiameseImage and one in the paragraph before; plus one on p365).

Nils Brünggel  Oct 13, 2020  Dec 18, 2020
Printed
Page 399
last itemize

There are "Embedding dropout" and "Input dropout" are listed.

What is the difference between them?

Note from the Author or Editor:
p 399 replace the first two items of the last list with
- Embedding dropout (inside the embedding layer, drops some random lines of embeddings)
- Input dropout (applied after the embedding layer)

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
Printed
Page 401
Questionnaire 33.

"Why do we scale the weights with dropout?"
I understand the weights are not scaled; the 'activations' are scaled.

Note from the Author or Editor:
In q33 replace "weights" by "activations"

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
Printed
Page 410
first paragraph

[channels_in, features_out, rows, columns] should be read:
[features_out, channels_in, rows, columns]
Indeed, a few lines later, the shape of the kernel is shown to be [4,1,3,3], which corresponds to my suggested correction, but not to the original print.
Or am I missing something?

Note from the Author or Editor:
Please make the modification proposed by the reader.
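
This ordering can be verified directly in PyTorch, whose convolution weights are stored as [features_out, channels_in, rows, columns]:

import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3)
print(conv.weight.shape)   # torch.Size([4, 1, 3, 3])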

vallotton  Oct 03, 2020  Dec 18, 2020
Printed
Page 423
bottom figures

What these figures show is not the red, green, and blue channels of the original image. Instead, you show the same image three times using a different colormap. Indeed, the three images show exactly the same pattern of intensity, which they shouldn't (for example, if the channels were really shown, the green grass should appear more prominently in the green channel than in the red channel). Or am I missing something? Thanks for a great book though!

Note from the Author or Editor:
Add a parenthesis:
The first axis contains the channels red, green, and blue (here highlighted with the corresponding color maps):

pascal vallotton  Oct 04, 2020  Dec 18, 2020
Printed
Page 427
second paragraph

"We'll use the same as one as earlier..." should read:
"We'll use the same one as earlier..."

Note from the Author or Editor:
Please make that modification.

pascal vallotton  Oct 04, 2020  Dec 18, 2020
Printed
Page 428
-2nd paragraph

"except in camel_case."

'camel_case' should be 'snake_case'

Note from the Author or Editor:
Indeed, replace 'camel_case' with 'snake_case' in the second to last paragraph.

HIDEMOTO NAKADA  Feb 16, 2021  May 07, 2021
Printed
Page 433
1st paragraph

"The percentage of nonzero weights is getting much better",

'nonzero' should be 'near zero'

Note from the Author or Editor:
Replace nonzero by near-zero

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
Printed
Page 447
first paragraph

"What if we intitialized gamma to zero for every one of those final batchnorm layers?"
Shouldn't beta also be intitialised to zero? Only then would the residual mapping be guaranteed to be zero at intitialisation. This is what seems to be done e.g. in https://arxiv.org/pdf/1901.09321.pdf
Thanks for clearing it up!

Note from the Author or Editor:
Add a precision:
"What if we initialized gamma to zero for every one of those final batchnorm layers? Since beta is already initialized to zero, our conv(x) for those..."
with beta code-formatted
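
For context on the added precision: in PyTorch, a batchnorm layer's beta (bias) is already initialized to zero and gamma (weight) to one, so only gamma needs to be zeroed explicitly. A minimal check:

import torch.nn as nn

bn = nn.BatchNorm2d(16)
print(bn.bias.data.unique())     # tensor([0.])  -- beta starts at zero
print(bn.weight.data.unique())   # tensor([1.])  -- gamma starts at one
nn.init.zeros_(bn.weight)        # the trick discussed: zero gamma as well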

vallotton  Oct 05, 2020  Dec 18, 2020
Printed
Page 447
l.3

"where conv is the function from the previous chapter that adds a second convolution, then a ReLU, then a batchnorm layer"

the conv in p.437 uses batchnorm *before* ReLU.

Note from the Author or Editor:
On p447 replace ""where conv is the function from the previous chapter that adds a second convolution, then a ReLU, then a batchnorm layer" by "where conv is the function from the previous chapter that adds a second convolution, then a batchnorm layer, then a ReLU"
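
A rough sketch of the corrected ordering (not the book's exact function): the convolution is followed by a batchnorm layer and only then by the ReLU.

import torch.nn as nn

def conv(ni, nf, ks=3, act=True):
    # convolution, then batchnorm, then (optionally) ReLU
    layers = [nn.Conv2d(ni, nf, kernel_size=ks, stride=2, padding=ks // 2),
              nn.BatchNorm2d(nf)]
    if act:
        layers.append(nn.ReLU())
    return nn.Sequential(*layers)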

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
Printed
Page 464
3rd paragraph

Note: fastai has changed, so this should be
head = create_head(512*2,2,ps=0.5), not
head = create_head(512*4,2,ps=0.5), and the text should be revised accordingly.

Note from the Author or Editor:
Indeed, this should be changed as suggested.

Conwyn Flavell  Apr 15, 2021  May 07, 2021
Printed
Page 465
just above 'x = self.emb_drop(x)'

"You can pass `emb_drop` to `__init__` to change this value:"

I could not find an init parameter emb_drop for TabularModel:
https://github.com/fastai/fastai/blob/master/fastai/tabular/model.py

This should be "You can pass `embed_p` to `__init__` to change the dropout probability:" ?

Note from the Author or Editor:
Replace `emb_drop` by `embed_p` on p467, in the sentence "you can pass emb_drop to __init__ to change this value"
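
A minimal usage sketch of the corrected parameter name, assuming the current fastai API (the sizes below are arbitrary placeholders):

from fastai.tabular.all import TabularModel

# embed_p is the dropout probability applied inside the embedding layer
model = TabularModel(emb_szs=[(10, 5)], n_cont=3, out_sz=2,
                     layers=[200, 100], embed_p=0.1)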

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
Printed
Page 487
last list

CancelFitException and CancelBatchException

the label and explanation do not match.

CancelFitException and CancelBatchException
should be exchanged.

Note from the Author or Editor:
In p487-488, switch the labels CancelFitException and CancelBatchException.

HIDEMOTO NAKADA  Jan 16, 2021  Dec 18, 2020
Printed
Page 501
-2

"Scale (1d tensor): (1) 256 x 256"

It is a 2d tensor, not 1d, I guess.

Note from the Author or Editor:
In the last paragraph of the page, replace 1d in "Scaled (1d tensor): (1) 256 x 256" by 2d.

HIDEMOTO NAKADA  Jan 16, 2021  Dec 18, 2020
Printed
Page 502
rules of Einstein summation



"2. Each index can appear at most twice in any term."

What do you mean by 'term' here? If you count the right-hand side, it can appear
more than twice.

In https://ajcr.net/Basic-guide-to-einsum/ we can see an example like 'i,i->i'

"3. Each term must contain identical nonrepeated indices. "

this is also doubtful according to the blog above.

Note from the Author or Editor:
Replace the rules by:

1. Repeated indices on the left side are implicitly summed over if they are not on the right side.
2. Each index can appear at most twice on the left side.
3. The unrepeated indices on the left side must appear on the right side.
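
A few torch.einsum calls that illustrate the corrected rules (the 'i,i->i' case is the one mentioned in the linked blog post):

import torch

x = torch.arange(3.)                  # shape (3,)
m = torch.arange(6.).reshape(2, 3)    # shape (2, 3)

# Rule 1: 'i' is repeated on the left and absent on the right, so it is summed over.
print(torch.einsum('i,i->', x, x))    # dot product (a scalar)

# With 'i' also on the right, nothing is summed: elementwise product.
print(torch.einsum('i,i->i', x, x))   # shape (3,)

# Rule 3: the unrepeated index 'b' on the left must appear on the right.
print(torch.einsum('bi,i->b', m, x))  # matrix-vector product, shape (2,)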

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
Printed
Page 503
l.2

torch.einsum('bi,ij,bj->b', a, b, c)

The character 'b' is used both as a matrix name and as an index character, which is quite confusing.

Note from the Author or Editor:
Replace "torch.einsum('bi,ij,bj->b', a, b, c)" by "torch.einsum('bi,ij,bj->b', x, y, z)" and later on "torch.einsum('bik,bkj->b', a, b)" by "torch.einsum('bik,bkj->b', x, y)"

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
Printed
Page 505
l.3

"the scale of our activations will go from 1 to 0.1, and after 100 layers"

The following code uses 50, not 100.

Note from the Author or Editor:
Replace 100 by 50
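
A small experiment in the spirit of the code being referenced (the 0.01 multiplier is an assumption for this sketch, chosen so each of the 50 layers shrinks the activations by roughly a factor of 10):

import torch

x = torch.randn(200, 100)
for _ in range(50):
    x = x @ (torch.randn(100, 100) * 0.01)
print(x.std())   # essentially zero: the activations have vanished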

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
Printed
Page 509
5th paragraph

"For the gradients of the ReLU and our linear layer, we use the gradients of the loss with respect to the output (in out.g) and apply the chain rule to compute the gradients of the loss with respect to the output (in inp.g)."

The last 'the gradients of the loss with respect to the output (in inp.g)' is doubtful.
'output' should be 'input', or 'output of the previous layer'?

Note from the Author or Editor:
Replace
"For the gradients of the ReLU and our linear layer, we use the gradients of the loss with respect to the output (in out.g) and apply the chain rule to compute the gradients of the loss with respect to the output (in inp.g)."
by
"For the gradients of the ReLU and our linear layer, we use the gradients of the loss with respect to the output (in out.g) and apply the chain rule to compute the gradients of the loss with respect to the input (in inp.g)."

HIDEMOTO NAKADA  Jan 16, 2021  May 07, 2021
Printed
Page 510
the last paragraph of the column

Here, SymPy has taken the derivative of x**2 for us!

'x' should be 'sx':

Here, SymPy has taken the derivative of sx**2 for us!

Note from the Author or Editor:
Yes, the sentence should be fixed as proposed.
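
For reference, the corrected sentence can be checked directly:

from sympy import symbols, diff

sx, sy = symbols('sx sy')
print(diff(sx**2, sx))   # 2*sx -- SymPy has taken the derivative of sx**2 for us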

HIDEMOTO NAKADA  Feb 19, 2021  May 07, 2021
Printed
Page 513
1st paragraph

The computation of bwd for the Lin(LayerFunction), in the second line, refers to self.inp and self.out.

class Lin(LayerFunction):
    def __init__(self, w, b): self.w,self.b = w,b

    def forward(self, inp): return inp@self.w + self.b

    def bwd(self, out, inp):
        inp.g = out.g @ self.w.t()
        self.w.g = self.inp.t() @ self.out.g
        self.b.g = out.g.sum(0)

Should bwd be changed to the below?

def bwd(self, out, inp):
    inp.g = out.g @ self.w.t()
    self.w.g = inp.t() @ out.g   # <--- inp instead of self.inp; ditto for out
    self.b.g = out.g.sum(0)

Kaushik Sinha  Jun 28, 2021  Nov 05, 2021
Printed
Page 521
3rd paragraph

"To do the dot product of our weight matrix (2 by number of activations) with the
activations (batch size by activations by rows by cols), we use a custom einsum":

The activations do not include the batch size.

Note from the Author or Editor:
Replace "(batch size by activations by rows by cols)" by "(batch size by rows by cols)"

HIDEMOTO NAKADA  Feb 19, 2021  May 07, 2021
Printed
Page 524
2nd code snippet

x.shape

I cannot understand why we check the shape of x here.
I guess it should be

act.shape

Note from the Author or Editor:
p521 replace

x.shape
torch.Size([1, 3, 224, 224])

by

act.shape
torch.Size([512, 7, 7])

HIDEMOTO NAKADA  Feb 19, 2021  May 07, 2021
Printed
Page 532
last line

"Before we do, we’ll call a hook, if it’s defined. "

'Before' should be 'After', according to the code above.

# I assume that by 'do' the authors mean 'call forward'.



Note from the Author or Editor:
replace the sentence "Before we do, we'll call a hook, if it's defined" by "After, we call a hook, if it's defined".
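
A minimal sketch (not fastai's actual class) of what the corrected sentence describes: each layer's forward computation runs first, and the hook, if defined, is called afterwards on the result.

import torch.nn as nn

class HookedSequential(nn.Module):
    def __init__(self, *layers, hook=None):
        super().__init__()
        self.layers, self.hook = nn.ModuleList(layers), hook

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)                 # run the layer first...
            if self.hook is not None:
                self.hook(layer, x)      # ...then call the hook, if it's defined
        return x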

HIDEMOTO NAKADA  Feb 19, 2021  May 07, 2021