Errata for Hands-On Large Language Models

This list contains errors, and their corrections, that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Color Key: Serious technical mistake | Minor technical mistake | Language or formatting error | Typo | Question | Note | Update

Version | Location | Description | Submitted by | Date submitted
ePub | Page: Chapter 12, Section 3: Training No Reward Model
Figure 12-32

At the very bottom of the figure, it says, "Increase likelihood of rejected generation." Shouldn't it say "decrease" instead of "increase," as we are trying to optimize a trainable model to have a lower likelihood of generating rejected samples?
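
For context, here is a minimal sketch of the preference-optimization (DPO) objective that Figure 12-32 illustrates. The variable names are assumptions for illustration, not the book's code; minimizing this loss raises the likelihood of the chosen generation and lowers the likelihood of the rejected one, which supports the "decrease" reading:

    import torch.nn.functional as F

    # Sketch only: logp_* are assumed summed log-probabilities of the chosen and
    # rejected generations under the trainable policy and the frozen reference.
    def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
        margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
        # -logsigmoid(beta * margin) shrinks as the chosen generation becomes more
        # likely, and the rejected one less likely, than under the reference model.
        return -F.logsigmoid(beta * margin).mean()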

Anonymous | Sep 17, 2024
ePub | Page: Chapter 1, Section "Interfacing with Large Language Models", subsection "Open Models", "NOTE" box
"NOTE" box, second sentence

The sentence misstates the meaning of "permissive" by saying that it means a model "cannot" be used for certain purposes. The opposite is true:
- a permissive license means a model "can" be used for those purposes
- a restrictive license means a model "cannot" be used for those purposes

Original text:
"For instance, some publicly shared models have a permissive commercial license, which means that the model cannot be used for commercial purposes."

Aaron Carver | Sep 19, 2024
O'Reilly learning platform | Page: Choosing a Single Token from the Probability Distribution (Sampling/Decoding)
6th paragraph (1-indexed)

In the code example where the prompt "The capital of France is" is tokenized and passed through a language model, the textbook states that the expected shape of the lm_head_output tensor is [1, 6, ...] – and I think this may be incorrect.

Since the prompt tokenizes into 5 tokens, the actual shape of `lm_head_output` should be [1, 5, ...], as the model hasn't yet predicted the next token. The model produces one output per input token and does not generate additional tokens unless explicitly programmed to do so.

When the code looks up the most probable next token, it does so at the last position ([-1]), and that index corresponds to 4, not 5.
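
To verify the reported shape, a sketch along these lines could be run (the checkpoint name is an assumption for illustration; the chapter's exact model and setup may differ):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumed checkpoint; substitute whichever model the chapter uses.
    name = "microsoft/Phi-3-mini-4k-instruct"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    prompt = "The capital of France is"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    print(input_ids.shape)       # e.g. torch.Size([1, 5]) if it tokenizes to 5 tokens

    # A single forward pass yields one logits vector per *input* position:
    lm_head_output = model(input_ids).logits
    print(lm_head_output.shape)  # [1, 5, vocab_size] rather than [1, 6, ...]

    # The next-token prediction sits at the last position, index -1 (i.e., 4):
    print(tokenizer.decode(lm_head_output[0, -1].argmax(-1)))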

Also, I think that adding the following code snippet would greatly enhance this example:
    for i, t_pred_next in enumerate(lm_head_output[0]):
        print(f"{tokenizer.decode(input_ids[0][:i+1])} -> {tokenizer.decode(t_pred_next.argmax(-1))}")

This prints:
The -> code
The capital -> of
The capital of -> the
The capital of France -> is
The capital of France is -> Paris

It makes clear what has been generated and what is "only predicted so far" at each step.

Let me know your thoughts on my reasoning – looking forward to your answer! Thanks for all the amazing content and knowledge shared with the community over the past years.

María Benavente | Sep 22, 2024