8. Parallelization Methods

One common theme in our discussions of the deep RL algorithms introduced in this book is that they are sample-inefficient. For nontrivial problems, millions of experiences are typically required before an agent learns to perform well. It may take days or weeks to generate sufficiently many experiences if data gathering is done sequentially by a single agent running in a single process.

Another theme arising from our discussion of the DQN algorithm is the importance of diverse and decorrelated training data for stabilizing and accelerating learning. Experience replay helps achieve this, but it depends on DQN being an off-policy algorithm. Consequently, this approach is not available to policy gradient algorithms, which learn on-policy and cannot reuse experiences generated by earlier versions of the policy. Parallelization offers an alternative: running multiple agents in parallel, each in its own instance of the environment, yields diverse, decorrelated experiences without any reuse of old data.
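One way to realize this idea is lock-free asynchronous updating in the style of Hogwild!: each worker gathers its own on-policy data and applies gradient updates directly to a network whose parameters live in shared memory. The sketch below assumes PyTorch; the network, data, and loss are placeholders standing in for real rollouts and a real policy gradient loss.

```python
# A minimal Hogwild!-style sketch, assuming PyTorch. The network, data,
# and loss here are placeholders, not a full policy gradient algorithm.
import torch
import torch.nn as nn
import torch.multiprocessing as mp

def worker(shared_net, seed, steps):
    """Gather (fake) on-policy data and apply lock-free gradient
    updates directly to the shared parameters."""
    torch.manual_seed(seed)                      # decorrelates the workers
    opt = torch.optim.SGD(shared_net.parameters(), lr=1e-2)
    for _ in range(steps):
        states = torch.randn(32, 4)              # stand-in for fresh rollouts
        targets = torch.randn(32, 2)             # stand-in for learning targets
        loss = nn.functional.mse_loss(shared_net(states), targets)
        opt.zero_grad()
        loss.backward()
        opt.step()                               # updates shared params in place

if __name__ == "__main__":
    net = nn.Linear(4, 2)
    net.share_memory()                           # place parameters in shared memory
    procs = [mp.Process(target=worker, args=(net, i, 100)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Because each worker trains on data it generated itself with the current shared parameters, the updates remain (approximately) on-policy, while the diversity across workers plays the decorrelating role that experience replay plays for DQN.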
