Errata for Hands-On Large Language Models

This list contains errors, and their corrections, that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Color Key: Serious technical mistake | Minor technical mistake | Language or formatting error | Typo | Question | Note | Update

Version | Location | Description | Submitted by | Date submitted
ePub | Page: Chapter 12, Section 3: Training No Reward Model
Figure 12-32

At the very bottom of the figure, it says, "Increase likelihood of rejected generation." Shouldn't it say "decrease" instead of "increase," as we are trying to optimize a trainable model to have a lower likelihood of generating rejected samples?
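
For context, here is a minimal sketch of the preference-optimization (DPO) objective that Figure 12-32 illustrates. The variable names are assumptions for illustration, not the book's code; minimizing this loss raises the likelihood of the chosen generation and lowers the likelihood of the rejected one, which supports the "decrease" reading:

    import torch.nn.functional as F

    # Sketch only: logp_* are assumed summed log-probabilities of the chosen and
    # rejected generations under the trainable policy and the frozen reference.
    def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
        margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
        # -logsigmoid(beta * margin) shrinks as the chosen generation becomes more
        # likely, and the rejected one less likely, than under the reference model.
        return -F.logsigmoid(beta * margin).mean()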

Anonymous | Sep 17, 2024
ePub | Page: Chapter 1, Section "Interfacing with Large Language Models", subsection "Open Models", "NOTE" box
"NOTE" box, second sentence

The sentence misstates the meaning of "permissive" by saying that it means a model "cannot" be used for certain purposes. The opposite is true:
- a permissive license means a model "can" be used for those purposes
- a restrictive license means a model "cannot" be used for those purposes

Original text:
"For instance, some publicly shared models have a permissive commercial license, which means that the model cannot be used for commercial purposes."

Aaron Carver | Sep 19, 2024
O'Reilly learning platform | Page: Choosing a Single Token from the Probability Distribution (Sampling/Decoding)
6th paragraph (1-indexed)

In the code example where the prompt "The capital of France is" is tokenized and passed through a language model, the textbook states that the expected shape of the lm_head_output tensor is [1, 6, ...] – and I think this may be incorrect.

Since the prompt tokenizes into 5 tokens, the actual shape of `lm_head_output` should be [1, 5, ...], as the model hasn't yet predicted the next token. The model produces one output per input token and does not generate additional tokens unless explicitly programmed to do so.

When the code looks up the most probable next token, it does so at the last position ([-1]), and that index corresponds to 4, not 5.
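
To verify the reported shape, a sketch along these lines could be run (the checkpoint name is an assumption for illustration; the chapter's exact model and setup may differ):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumed checkpoint; substitute whichever model the chapter uses.
    name = "microsoft/Phi-3-mini-4k-instruct"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    prompt = "The capital of France is"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    print(input_ids.shape)       # e.g. torch.Size([1, 5]) if it tokenizes to 5 tokens

    # A single forward pass yields one logits vector per *input* position:
    lm_head_output = model(input_ids).logits
    print(lm_head_output.shape)  # [1, 5, vocab_size] rather than [1, 6, ...]

    # The next-token prediction sits at the last position, index -1 (i.e., 4):
    print(tokenizer.decode(lm_head_output[0, -1].argmax(-1)))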

Also, I think that adding the following code snippet would greatly enhance this example:
    for i, t_pred_next in enumerate(lm_head_output[0]):
        print(f"{tokenizer.decode(input_ids[0][:i+1])} -> {tokenizer.decode(t_pred_next.argmax(-1))}")

This prints:
The -> code
The capital -> of
The capital of -> the
The capital of France -> is
The capital of France is -> Paris

It makes clear what has been generated and what is "only predicted so far" at each step.

Let me know your thoughts on my reasoning – looking forward to your answer! Thanks for all the amazing content and knowledge shared with the community over the past years.

María Benavente | Sep 22, 2024