Errata

Natural Language Processing with Transformers, Revised Edition

The errata list records errors found after the product was released, together with their corrections. If an error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Version Location Description Submitted By Date submitted Date corrected
Page BLEU section in chapter 6
2nd $p_n$ equation

Safari books online, chapter 6.

In the numerator of the 2nd $p_n$ equation,

$\sum_{snt \in C}$ should be $\sum_{snt' \in C}$

based on the original paper.

Thanks.

Note from the Author or Editor:
Agreed, thanks for reporting!

Haesun Park  Jul 31, 2022 
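
For readers checking the corrected equation, here is a minimal sketch of corpus-level modified n-gram precision as defined in the original BLEU paper. Clipped n-gram counts are summed over every candidate sentence in the numerator, and raw candidate n-gram counts are summed over every candidate sentence in the denominator. The function names (`ngrams`, `modified_precision`) and the whitespace tokenization are illustrative assumptions, not code from the book.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidates, references, n):
    """Corpus-level modified n-gram precision p_n.

    candidates: list of token lists, one per generated sentence.
    references: list of lists of token lists (one or more references per candidate).
    """
    clipped_total = 0  # numerator: clipped counts summed over all candidate sentences
    count_total = 0    # denominator: raw counts summed over all candidate sentences
    for cand, refs in zip(candidates, references):
        cand_counts = ngrams(cand, n)
        # Maximum count of each n-gram across this sentence's references.
        max_ref_counts = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped_total += sum(
            min(count, max_ref_counts[gram]) for gram, count in cand_counts.items()
        )
        count_total += sum(cand_counts.values())
    return clipped_total / count_total if count_total else 0.0


# A degenerate candidate that repeats one word: clipping caps "the" at 2 matches.
candidates = ["the the the the the the".split()]
references = [["the cat is on the mat".split()]]
p1 = modified_precision(candidates, references, n=1)
```

In this toy case the reference contains "the" twice, so only two of the six candidate unigrams count as matches and `p1` is 2/6.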
Chapter 2. Text Classification
7th paragraph


I'm running the code below from the notebook, but the "emotion" dataset cannot be loaded. It complains that it cannot find the file at the Dropbox location. The error is ... FileNotFoundError: Couldn't find file at

# hide_output
from datasets import load_dataset

emotions = load_dataset("emotion")

Note from the Author or Editor:
Thank you for reporting! We fixed this in the repo in February 2023, but let us know if you still have issues.

Charlie Bulosan  Dec 20, 2022 
ePub
Page Ch2, Ch3 ipynb files
First code block of the Ch2 and Ch3 ipynb files

Hi,
My name is Sunjip Yim, and I am a lecturer at Woosong University, South Korea.
I am teaching the code from the book
"Natural Language Processing with Transformers, Revised Edition" to my students.

Last year all of the book's code worked well.
However, this semester the first code block of the Ch2 and Ch3 notebooks on GitHub fails.
I suspect the remaining chapters of the book will have the same problem.
I used Google Colab to run the code.
The error message is as follows:

# Uncomment and run this cell if you're on Colab or Kaggle

from install import *
install_requirements(is_chapter2=True)


Cloning into 'notebooks'...
remote: Enumerating objects: 515, done.
remote: Counting objects: 100% (161/161), done.
remote: Compressing objects: 100% (39/39), done.
remote: Total 515 (delta 139), reused 126 (delta 122), pack-reused 354
Receiving objects: 100% (515/515), 28.61 MiB | 30.27 MiB/s, done.
Resolving deltas: 100% (246/246), done.
/content/notebooks/notebooks
? Installing base requirements ...


---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-5-f181490118c1> in <cell line: 5>()
3 get_ipython().run_line_magic('cd', 'notebooks')
4 from install import *
----> 5 install_requirements(is_chapter2=True)

/content/notebooks/install.py in install_requirements(is_chapter2, is_chapter6, is_chapter7, is_chapter7_v2, is_chapter10, is_chapter11)
29 process_install = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
30 if process_install.returncode != 0:
---> 31 raise Exception("? Failed to install base requirements")
32 else:
33 print("? Base requirements installed!")

Exception: ? Failed to install base requirements


My skills lie in data analytics rather than in the technical side.
Would you please fix the code on GitHub and let me know?
The broken code is causing problems in my class.

I would really appreciate it if your company or the authors could fix this problem quickly and let me know the solution.

Sincerely,
Sunjip Yim

Note from the Author or Editor:
Hello! We have fixed this with the following patch to the requirements; please let us know if you're still getting an error.

github.com/nlp-with-transformers/notebooks/pull/104

Sunjip Yim  May 03, 2023 
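
The traceback in this report shows that `install.py` captures the installer's stdout and stderr, so the underlying error is never printed. When diagnosing a similar failure, a sketch along the following lines can surface the captured output. The helper name `run_and_report` is hypothetical, and the command shown is a stand-in that always fails; the real install command is not given in the report.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run a command and include its captured stderr in the error if it fails."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    if proc.returncode != 0:
        raise RuntimeError(
            f"Command {cmd!r} failed with exit code {proc.returncode}:\n{proc.stderr}"
        )
    return proc.stdout


# Stand-in for a failing install command (an assumption for illustration only).
failing_cmd = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(1)"]
try:
    run_and_report(failing_cmd)
except RuntimeError as err:
    message = str(err)  # now contains the text the failing process wrote to stderr
```

Including stderr in the exception message makes the root cause (for example, an unsatisfiable dependency pin) visible directly in the notebook output.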
Printed
Page Preface xix
Using Code Examples

Hello,

When I try to use the supplemental material on GitHub in the Google Colab environment, I get the following error when running cell 1 of chapter 1:

---------------
# Uncomment and run this cell if you're on Colab or Kaggle
!git clone .........
%cd notebooks
from install import *
install_requirements()



Cloning into 'notebooks'...
remote: Enumerating objects: 515, done.
remote: Counting objects: 100% (161/161), done.
remote: Compressing objects: 100% (39/39), done.
remote: Total 515 (delta 139), reused 126 (delta 122), pack-reused 354
Receiving objects: 100% (515/515), 28.61 MiB | 23.29 MiB/s, done.
Resolving deltas: 100% (246/246), done.
/content/notebooks/notebooks/notebooks/notebooks
? Installing base requirements ...

---------------------------------------------------------------------------

Exception Traceback (most recent call last)

<ipython-input-14-86579c675642> in <cell line: 5>()
3 get_ipython().run_line_magic('cd', 'notebooks')
4 from install import *
----> 5 install_requirements()

/content/notebooks/install.py in install_requirements(is_chapter2, is_chapter6, is_chapter7, is_chapter7_v2, is_chapter10, is_chapter11)
29 process_install = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
30 if process_install.returncode != 0:
---> 31 raise Exception("? Failed to install base requirements")
32 else:
33 print("? Base requirements installed!")

Exception: ? Failed to install base requirements

Note from the Author or Editor:
Hello! We have fixed this with the following patch to the requirements; please let us know if you're still getting an error.

github.com/nlp-with-transformers/notebooks/pull/104

Thomas Kranz  May 08, 2023 
Page 127
footnote 3

In footnote 3, 'model_name = "gpt-xl" with model_name = "gpt"'

should be

'model_name = "gpt2-xl" with model_name = "gpt2"'

Thanks.

Note from the Author or Editor:
Yes, that's a typo. Thanks for reporting!

Haesun Park  Jul 25, 2022 
Page 153
$F_{LCS}$ equation

In the denominator of the $F_{LCS}$ equation,

$R_{LCS} + \beta P_{LCS}$
should be
$R_{LCS} + \beta^2 P_{LCS}$

Thanks.

Note from the Author or Editor:
Indeed the exponent 2 is missing, thanks for reporting!

Haesun Park  Jul 31, 2022 
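
For reference, the full ROUGE-L F-score with the corrected denominator is written out below. The numerator factor $(1 + \beta^2)$ comes from the standard ROUGE-L definition, not from this report.

$$F_{LCS} = \frac{(1 + \beta^2)\, R_{LCS}\, P_{LCS}}{R_{LCS} + \beta^2 P_{LCS}}$$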
Page 153
1st paragraph

The book states that ROUGE-L calculates the "longest common substring" between the reference and the generated text, but it should be the "longest common subsequence".

Note from the Author or Editor:
That's correct, thanks for reporting!

Kirushikesh DB  Aug 14, 2022 
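
The distinction matters because a subsequence may skip tokens while a substring must be contiguous. The following minimal sketch computes both lengths over token lists. The function names and example sentences are illustrative assumptions, not taken from the book.

```python
def lcs_subsequence_length(a, b):
    """Length of the longest common subsequence (matches need not be adjacent)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_substring_length(a, b):
    """Length of the longest common substring (matches must be contiguous)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # A mismatch resets the run length to zero instead of carrying it over.
            curr.append(prev[j - 1] + 1 if x == y else 0)
            best = max(best, curr[j])
        prev = curr
    return best


reference = "the cat sat on the mat".split()
generated = "the cat quietly sat on a mat".split()
subsequence_len = lcs_subsequence_length(reference, generated)  # the, cat, sat, on, mat
substring_len = lcs_substring_length(reference, generated)      # "the cat" or "sat on"
```

Here the inserted words "quietly" and "a" break the contiguous match, so the substring length is 2 while the subsequence length is 5.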
Page 154
In <Note> box

In Note box,

"The average value is stored in the attribute mid"
should be
"The median value is stored in the attribute mid"

Thanks.

Note from the Author or Editor:
It should indeed be median, thanks for reporting!

Haesun Park  Jul 31, 2022 
Page 154
1st paragraph under <note> box

It says "T5 is slightly better on ROUGE-1 and the LCS scores".
However, T5's ROUGE-1 is 0.486486 and LCS is 0.378378, while BART's ROUGE-1 is 0.582278 and LCS is 0.455696.
So T5 is not better than BART.
Please let me know the sentence's meaning.
Thanks.

Note from the Author or Editor:
Indeed, this sentence might have referred to old values in the table. We should change the sentences to "PEGASUS is the best model overall (higher ROUGE scores are better), but again these [...]" and "[...] to outperform T5 and at least match BART on [...]".

Haesun Park  Jul 31, 2022 
Page 169
Table 7-1

In 3rd row of table 7-1,

'answers.answer_text' should be 'answers.text'

Thanks

Note from the Author or Editor:
That's correct!

Haesun Park  Aug 03, 2022 
Page 185
13th line from the top.

In 13th line from the top,

'q_review_id columns of SubjQA' should be 'id columns of SubjQA'

Thanks

Note from the Author or Editor:
Agreed, thanks for reporting!

Haesun Park  Aug 03, 2022 
Page 190
5th line from the top and 12th line

In 5th line from the top and 12th line,

Shouldn't EvalRetriever be changed to EvalDocuments?

Thanks.

Note from the Author or Editor:
Agreed, thanks for reporting!

Haesun Park  Aug 03, 2022 
Page 197
4th line from the bottom

In 4th line from the bottom,

Shouldn't EvalReader be changed to EvalAnswers?

Thanks.

Note from the Author or Editor:
Indeed, thanks for reporting!

Haesun Park  Aug 03, 2022 
Page 205
5th line from the bottom

In 5th line from the bottom,

Shouldn't DPRetriever be changed to dpr_retriever?

Thanks

Note from the Author or Editor:
Indeed, thanks for reporting!

Haesun Park  Aug 03, 2022 
Page 221
3rd line from the bottom

reduction=batchmean should be reduction="batchmean"

Thanks

Note from the Author or Editor:
Indeed, this can be fixed in the last sentence of the text.

Haesun Park  Aug 10, 2022 
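
The quotes matter because `reduction` takes a string value. Assuming the argument in question is the `reduction` parameter of PyTorch's `nn.KLDivLoss`, the `"batchmean"` option sums the pointwise KL terms and divides by the batch size. The pure-Python sketch below mirrors that arithmetic without requiring PyTorch; the function name `kl_div_batchmean` and the toy distributions are assumptions for illustration.

```python
import math


def kl_div_batchmean(log_q, p):
    """KL(p || q) summed over all elements and divided by the batch size.

    log_q: batch of log-probabilities (the model output).
    p: batch of target probabilities.
    """
    total = 0.0
    for log_q_row, p_row in zip(log_q, p):
        for lq, pi in zip(log_q_row, p_row):
            if pi > 0:  # terms with p == 0 contribute 0 by convention
                total += pi * (math.log(pi) - lq)
    return total / len(p)


# Two examples, two classes; the model predicts a uniform distribution for both.
log_q = [[math.log(0.5), math.log(0.5)], [math.log(0.5), math.log(0.5)]]
p = [[0.5, 0.5], [1.0, 0.0]]
loss = kl_div_batchmean(log_q, p)
```

The first example matches its target exactly and contributes 0. The second contributes log 2, so the batch mean is (log 2) / 2.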
Page 243
1st paragraph

In last sentence of 1st paragraph,

In terms of average latency, shouldn't "three-fold gain compared to our BERT baseline" be changed to "three-fold gain compared to our DistilBERT" or "five-fold gain compared to our BERT baseline"?

Thanks.

Note from the Author or Editor:
Indeed, that's correct. Thanks for reporting!

Haesun Park  Aug 10, 2022 
Page 244
Last equation

In last equation,

Subscript k should be changed to j to match the description above.

Thanks.

Note from the Author or Editor:
Thanks for submitting this report! The equation is correct, but I agree we can change the subscript from k to j for clarity.

Haesun Park  Aug 10, 2022