5. Improving DQN

In this chapter, we will look at three modifications to the DQN algorithm—target networks, Double DQN [141], and Prioritized Experience Replay [121]. Each modification addresses a separate issue with DQN, so they can be combined to yield significant performance improvements.

In Section 5.1 we discuss target networks, which are lagged copies of Q^π(s, a). The target network is used to generate the maximum Q-value in the next state s′ when calculating Q^π_tar, in contrast to the DQN algorithm from Chapter 4, which used Q^π(s, a) itself when calculating Q^π_tar. This helps to stabilize training by reducing the speed at which Q^π_tar changes.
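To make the mechanics concrete, below is a minimal sketch of this target calculation, assuming a PyTorch Q-network and batches of transitions stored as tensors. The network architecture, the batch dictionary keys, and the update period are illustrative assumptions, not the book's implementation.

```python
import copy

import torch
import torch.nn as nn

gamma = 0.99

# Q-network being trained and its lagged copy (the target network).
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = copy.deepcopy(q_net)

def dqn_target(batch):
    """Compute Q_tar using the lagged target network for the next-state values."""
    with torch.no_grad():
        # Max over actions of the target network's Q-value estimates for s'.
        next_q = target_net(batch["next_states"]).max(dim=1).values
        # Terminal transitions contribute only the reward.
        return batch["rewards"] + gamma * (1.0 - batch["dones"]) * next_q

def update_target_net(step, update_period=1000):
    """Periodically copy the training network's weights into the target network."""
    if step % update_period == 0:
        target_net.load_state_dict(q_net.state_dict())
```

Because the target network's weights change only every `update_period` steps, the targets the Q-network is regressing toward move more slowly than they would if the training network were used directly.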

Next, we discuss the Double DQN algorithm in Section 5.2. Double DQN uses two Q-networks to calculate Q^π_tar: one network selects the maximizing action in the next state, and the other evaluates the Q-value of that action. Decoupling action selection from action evaluation reduces the systematic overestimation of Q-values that occurs when a single network does both.
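The change relative to the previous sketch is small. Below is a sketch of the Double DQN target calculation under the same assumptions, reusing the illustrative `q_net`, `target_net`, and `gamma` defined above.

```python
def double_dqn_target(batch):
    """Double DQN target: select the next action with the training network,
    then evaluate that action with the target network."""
    with torch.no_grad():
        # Training network chooses the greedy action in the next state.
        next_actions = q_net(batch["next_states"]).argmax(dim=1, keepdim=True)
        # Target network provides the Q-value estimate for that action.
        next_q = target_net(batch["next_states"]).gather(1, next_actions).squeeze(1)
        return batch["rewards"] + gamma * (1.0 - batch["dones"]) * next_q
```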
