The errata list records the errors, and their corrections, that were found after the product was released. If an error was corrected in a later version or reprint, the date of that correction appears in the "Date corrected" column.
The following errata were submitted by our customers and approved as valid errors by the author or editor.
Version |
Location |
Description |
Submitted By |
Date submitted |
Date corrected |
|
Page ?
Chapter 2, Running the Experiment, 4th paragraph |
The asymptote of the optimal action is given as:
1/n + (1-e)/n = (2-e)/n
Perhaps I am missing something, but I think the asymptotic behavior should be:
e/n + (1-e) = (e(1-n) + n)/n
which I got from algorithm 2-1:
For an e proportion of the time, we pick the optimum solution with a probability of 1/n.
During the remaining (1-e) proportion of the time, we always pick the optimum solution (in the long-running asymptote).
These two equations [ (2-e)/n or (e(1-n) +n)/n ] will give very different results. For example, if n = 10, the result for e -> 0 (perfect exploitation) for the given equation is 1/5 while my equation always achieves 1, a perfect selection of the optimal solution, irrespective of the number of actions possible.
Also, a more minor detail: in line 7 of Algorithm 2-1, I think the denominator should be N(a) (lowercase a, not capital A), to denote that we are dividing by the number of times that action was taken, not the total number of actions taken so far.
Note from the Author or Editor: Page 30, Algorithm 2-1:
In step 7, replace "N(A)" with "N(a)" -- lowercase a.
Page 32, last paragraph which begins with: "Looking toward the end of the experiment..."
The equation near the bottom should be replaced. The text should read: "The asymptote of the optimal action is e/n + (1-e), where ..."
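A quick way to sanity-check the corrected asymptote is to simulate a converged e-greedy agent; the sketch below (illustrative Python, not from the book) empirically approaches e/n + (1-e) rather than (2-e)/n:

    import random

    def optimal_action_rate(epsilon, n, trials=100_000):
        """Empirical asymptote for a converged epsilon-greedy agent: explore
        uniformly with probability epsilon, otherwise exploit the known-best
        action (index 0 here)."""
        hits = 0
        for _ in range(trials):
            if random.random() < epsilon:
                action = random.randrange(n)   # explore: uniform over all n actions
            else:
                action = 0                     # exploit: the optimal action
            hits += (action == 0)
        return hits / trials

    # n = 10, epsilon = 0.1: roughly 0.91 = 0.1/10 + 0.9, not (2 - 0.1)/10 = 0.19
    print(optimal_action_rate(epsilon=0.1, n=10))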
|
Kenji Oman |
Dec 22, 2020 |
Jan 13, 2023 |
|
Page 48
Figure 2-8 |
The "Right" and "Left" labels in Figure 2-8 need to be reveresed.
Note from the Author or Editor: Page 48, Figure 2-8.
The "Right" and "Left" labels on all of the four images are the wrong way around. "Left" should be on the top, "Right" should be on the bottom.
|
Andrew |
Mar 29, 2021 |
Jan 13, 2023 |
|
Page Prediction Error
Chapter 1 -> Fundamental Concepts in Reinforcement Learning -> The First RL Algorithm -> Prediction error |
The sentence in question reads: “Knowledge of the previous state and the prediction error helps alter the weights. Multiplying these together, the result is δx(s)=[0,1]. Adding this to the current weights yields w=[1,0].”
I think the result of this formula `δx(s)` should be [0,-1] instead of [0,1] since the prior sentence says, “The value of Equation 1-2, δ, is equal to −1”. Considering the state x(s) = [0,1], multiplying δx(s) would yield [0,-1]. Then it would make sense that adding [0,-1] to the prior weights w = [1,1] to yield the new weights w = [1,0].
Note from the Author or Editor: Page 15, the sentence that currently reads "Multiplying these together, the result is δx(s)=[0,1]."
Should be: "Multiplying these together, the result is δx(s)=[0,-1]."
Note the minus sign at the end.
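As a numeric check of the corrected sentence, the update is w ← w + δx(s); a minimal sketch (NumPy, with variable names chosen here for illustration):

    import numpy as np

    delta = -1                  # the prediction error from Equation 1-2
    x_s = np.array([0, 1])      # feature vector x(s) of the previous state
    w = np.array([1, 1])        # current weights

    update = delta * x_s        # [0, -1]: note the minus sign
    w = w + update              # [1, 0]: the new weights
    print(update, w)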
|
Nhan Tran |
Dec 28, 2022 |
Jan 13, 2023 |
Printed |
Page 29
Equation 2-3 |
Equation 2-3 should be: "r = r + α (r' - r)" -- note that the prime belongs on the first r inside the parentheses.
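Assuming r' denotes the newly observed reward, the corrected equation is the familiar incremental-average update; a minimal sketch (illustrative names, not the book's code):

    def update_reward_estimate(r, r_prime, alpha):
        """Corrected Equation 2-3: nudge the running estimate r toward the
        newly observed reward r' by step size alpha."""
        return r + alpha * (r_prime - r)

    print(update_reward_estimate(r=0.0, r_prime=1.0, alpha=0.1))   # 0.1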
|
Phil Winder |
Jan 02, 2023 |
Jan 13, 2023 |
|
Page 54
Algorithm 2-4 |
The algorithm exits the loop when DELTA is less than or equal to theta, but DELTA is always calculated as:
DELTA <- max(DELTA, (anything))
DELTA will never get smaller than its initial value. If that initial value is greater than theta, the algorithm will never exit its loop.
Note from the Author or Editor: Page 54, Algorithm 2-4:
1. From the end of step 2, remove ", Δ ← 0".
2. Insert a new step between 3 and 4 (call it 3a, so that the references in the text remain correct): "3a. Δ ← 0", indented to align with the word "loop" on line 4.
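The fix amounts to resetting Δ at the top of every sweep rather than once before the loop; a minimal sketch of that control flow (illustrative Python, with states and backup as placeholder names, not the book's listing):

    def sweep_until_converged(V, states, backup, theta=1e-6):
        """Keep sweeping until the largest change made in a single sweep is
        at most theta. Delta is reset inside the loop (the new step 3a), so
        it measures only the most recent sweep."""
        while True:
            delta = 0.0                              # step 3a: reset every sweep
            for s in states:
                v_old = V[s]
                V[s] = backup(s, V)                  # whatever update the algorithm applies
                delta = max(delta, abs(v_old - V[s]))
            if delta <= theta:                       # now guaranteed to terminate once V stabilizes
                return V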
|
Patrick Doyle |
Jun 04, 2022 |
Jan 13, 2023 |
|
Page 62, 63, 64
(page 62) Equation 3-5, (page 63) Algorithm 3-1, (page 64) 2nd Paragraph 2nd Line |
In the Q-Learning formula the argmax should be just max.
Note from the Author or Editor: (page 62) Equation 3-5, (page 63) Algorithm 3-1, (page 64) 2nd Paragraph 2nd Line
Replace "argmax" with "max"
|
Manuel |
Mar 15, 2021 |
Jan 13, 2023 |
|
Page 65
Algorithm 3-2 |
Step 6 states:
Choose a from s using pi, breaking ties randomly
Since this is in a loop, the value of "a" updated at the end of the loop will be obliterated by choosing a new value for "a".
Note from the Author or Editor: Page 65, Algorithm 3-2:
1. Change step 4 to say "s, a ← Initialize s from the environment and choose a using π"
2. Remove step 6 entirely
3. Update all subsequent numbers to be contiguous
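A sketch of the corrected control flow (illustrative Python; env.reset, env.step, and choose_action are placeholder names, not the book's API), where the first action is chosen once before the loop and only a' is chosen inside it:

    def sarsa_episode(Q, env, choose_action, alpha, gamma):
        """SARSA sketch with the corrected flow: a is chosen once before the
        loop, and inside the loop only a' is chosen, so the carried-over
        action is never overwritten."""
        s = env.reset()
        a = choose_action(Q, s)                     # revised step 4: choose the first action here
        done = False
        while not done:
            s_next, r, done = env.step(a)
            a_next = choose_action(Q, s_next)       # a' for the next step
            target = r if done else r + gamma * Q[s_next, a_next]
            Q[s, a] += alpha * (target - Q[s, a])
            s, a = s_next, a_next                   # carry a' forward instead of re-choosing a
        return Q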
|
Patrick Doyle |
Jun 05, 2022 |
Jan 13, 2023 |
|
Page 123
Algorithm 5-1, step 7 |
Missing ln when calculating the gradient of π - it should have been:
θ ← θ + αγ^tG∇lnπ(a ∣ s, θ)
Note from the Author or Editor: Page 123, Algorithm 5-1:
Add "ln" to step 7, so that the equation reads as: "θ ← θ + αγ^tG∇lnπ(a ∣ s, θ)"
Page 125, Algorithm 5-2:
Add "ln" to step 9, so that the equation reads as: "θ ← θ + αγ^t????∇lnπ(a ∣ s, θ)"
|
Anonymous |
Aug 08, 2022 |
Jan 13, 2023 |
|
Page 129
Algorithm 5-3 |
1. Variable t isn't being updated after each step.
2. At step 6, there's no need to break ties randomly, since we aren't dealing with a deterministic action-value function but with a stochastic policy that outputs probabilities.
3. At step 8, V(s, θ) should have been V(s, w) (the weights "w" belong to the critic model V, while the weights "θ" belong to the actor model π, as denoted in step 1).
Similar errors appear at page 134 (Algorithm 5-4) at the corresponding steps (6 and 8).
Note from the Author or Editor: On page 129, Algorithm 5-3:
1. Add a 13th step to update t: "t <- t + 1", indent to align with line 12
2. Step 6: Remove ", breaking ties randomly" from the text
3. Step 8: change "V(s, θ)" to "V(s, w)"
On page 134, Algorithm 5-4:
1. Add a 15th step to update t: "t <- t + 1", indent to align with line 12
2. Step 6: Remove ", breaking ties randomly" from the text
3. Step 8: at the end of the line change "V(s, θ)" to "V(s, w)"
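A sketch of a one-step actor-critic loop reflecting all three fixes for Algorithm 5-3: sample from the stochastic policy without tie-breaking, evaluate the critic as V(s, w), and advance t on every step (illustrative Python; the policy, critic, and env interfaces are placeholders, not the book's code):

    def actor_critic_episode(env, policy, critic, gamma):
        """One-step actor-critic sketch: the actor is pi(a|s, theta), the critic is V(s, w)."""
        s = env.reset()
        t = 0
        done = False
        while not done:
            a = policy.sample(s)                           # stochastic policy, so no tie-breaking needed
            s_next, r, done = env.step(a)
            v_next = 0.0 if done else critic.value(s_next)
            delta = r + gamma * v_next - critic.value(s)   # TD error computed from V(s, w)
            critic.update(s, delta)                        # adjust the critic weights w
            policy.update(s, a, delta * gamma ** t)        # adjust the actor weights theta
            s = s_next
            t += 1                                         # the added step: advance t each iteration
        return policy, critic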
|
Anonymous |
Aug 09, 2022 |
Jan 13, 2023 |