9. Algorithm Summary

There are three defining characteristics of the algorithms we have introduced in this book. First, is an algorithm on-policy or off-policy? Second, what types of action spaces can it be applied to? And third, what functions does it learn?

REINFORCE, SARSA, A2C, and PPO are all on-policy algorithms, whereas DQN and Double DQN + PER are off-policy. SARSA, DQN, and Double DQN + PER are value-based algorithms that learn to approximate the Qπ function. They act by choosing the action with the highest estimated Q-value, which requires evaluating every available action. Consequently, they are only applicable to environments with discrete action spaces.
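The on-policy/off-policy split and the discrete-action restriction can both be seen in how value-based methods compute their learning targets. The following is a minimal sketch in plain Python, assuming a tabular Q stored as a dict keyed by `(state, action)` in place of a neural network. The helper names `greedy_action`, `sarsa_target`, and `q_learning_target` are hypothetical, chosen for illustration. DQN computes the same kind of max-based target with a network (and a target network), and Double DQN further separates action selection from action evaluation; neither refinement is shown here.

```python
from typing import Dict, Hashable, Sequence, Tuple

# Hypothetical tabular stand-in for a learned Q-function: (state, action) -> value.
QTable = Dict[Tuple[Hashable, int], float]


def greedy_action(q: QTable, state: Hashable, actions: Sequence[int]) -> int:
    """Return argmax_a Q(state, a).

    Every action must be evaluated, so this only works when the action set
    is finite and enumerable, i.e. a discrete action space.
    """
    return max(actions, key=lambda a: q[(state, a)])


def sarsa_target(q: QTable, reward: float, next_state: Hashable,
                 next_action: int, gamma: float, done: bool) -> float:
    """On-policy target: bootstrap from the action the current policy actually took."""
    if done:
        return reward
    return reward + gamma * q[(next_state, next_action)]


def q_learning_target(q: QTable, reward: float, next_state: Hashable,
                      actions: Sequence[int], gamma: float, done: bool) -> float:
    """Off-policy target (the form DQN uses): bootstrap from the greedy action,
    regardless of which action the behavior policy takes next."""
    if done:
        return reward
    best = greedy_action(q, next_state, actions)
    return reward + gamma * q[(next_state, best)]


if __name__ == "__main__":
    actions = [0, 1]
    q = {("s1", 0): 1.0, ("s1", 1): 3.0}
    # Suppose an exploratory (e.g. epsilon-greedy) policy picked action 0 in s1.
    print(sarsa_target(q, 1.0, "s1", next_action=0, gamma=0.9, done=False))
    print(q_learning_target(q, 1.0, "s1", actions, gamma=0.9, done=False))
```

In the usage example, the SARSA target bootstraps from Q(s1, 0), the action the exploring policy actually chose, while the Q-learning target bootstraps from the greedy Q(s1, 1). This is why the latter can learn from transitions collected by any behavior policy, such as old experience in a replay buffer.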

REINFORCE is a pure policy-based algorithm and so only learns a policy π. A2C and PPO are hybrid methods which learn a policy π and the Vπ function. REINFORCE, A2C, and PPO can all be applied to ...
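The difference between a pure policy-based method and a hybrid one shows up in what signal scales the policy gradient. REINFORCE weights log-probabilities by Monte Carlo returns, whereas an actor-critic method also learns Vπ and uses it to form an advantage estimate. The sketch below is a minimal illustration in plain Python, assuming plain floats stand in for network outputs; a real implementation would compute these losses with an autodiff library and stop gradients through the advantage. It uses the simple one-step advantage r + γV(s') − V(s) for brevity; n-step returns and GAE are common alternatives. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Monte Carlo returns G_t, the weighting REINFORCE uses (no learned V)."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def td_advantages(rewards: Sequence[float], values: Sequence[float],
                  next_values: Sequence[float], dones: Sequence[bool],
                  gamma: float) -> List[float]:
    """One-step advantages A_t = r_t + gamma * V(s_{t+1}) - V(s_t) from a learned critic.

    The bootstrap term is dropped when the episode terminated at step t.
    """
    return [r + gamma * nv * (0.0 if d else 1.0) - v
            for r, v, nv, d in zip(rewards, values, next_values, dones)]


def actor_critic_losses(log_probs: Sequence[float], advantages: Sequence[float],
                        values: Sequence[float]) -> Tuple[float, float]:
    """Policy loss -mean(log pi(a|s) * A) and critic loss mean((target - V)^2),
    where each one-step target equals A_t + V(s_t)."""
    n = len(log_probs)
    policy_loss = -sum(lp * a for lp, a in zip(log_probs, advantages)) / n
    value_loss = sum(a * a for a in advantages) / n  # (target - V)^2 == A^2 here
    return policy_loss, value_loss
```

Because the critic's one-step TD target minus V(s) is exactly the advantage, the critic's squared error can be written as the mean squared advantage in this simplified setting.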