5. Improving DQN

In this chapter, we will look at three modifications to the DQN algorithm—target networks, Double DQN [141], and Prioritized Experience Replay [121]. Each modification addresses a separate issue with DQN, so they can be combined to yield significant performance improvements.

In Section 5.1 we discuss target networks, which are lagged copies of Q^π(s, a). The target network is used to generate the maximum Q-value in the next state s′ when calculating Q^π_tar, in contrast to the DQN algorithm from Chapter 4, which used Q^π(s, a) itself when calculating Q^π_tar. This helps to stabilize training by reducing the speed at which Q^π_tar changes.
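To make the mechanics concrete, below is a minimal sketch of this target calculation, assuming a PyTorch Q-network and batches of transitions stored as tensors. The network architecture, the batch dictionary keys, and the update period are illustrative assumptions, not the book's implementation.

```python
import copy

import torch
import torch.nn as nn

gamma = 0.99

# Q-network being trained and its lagged copy (the target network).
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = copy.deepcopy(q_net)

def dqn_target(batch):
    """Compute Q_tar using the lagged target network for the next-state values."""
    with torch.no_grad():
        # Max over actions of the target network's Q-value estimates for s'.
        next_q = target_net(batch["next_states"]).max(dim=1).values
        # Terminal transitions contribute only the reward.
        return batch["rewards"] + gamma * (1.0 - batch["dones"]) * next_q

def update_target_net(step, update_period=1000):
    """Periodically copy the training network's weights into the target network."""
    if step % update_period == 0:
        target_net.load_state_dict(q_net.state_dict())
```

Because the target network's weights change only every `update_period` steps, the targets the Q-network is regressing toward move more slowly than they would if the training network were used directly.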

Next, we discuss the Double DQN algorithm in Section 5.2. Double DQN uses two Q-networks to calculate Q^π_tar: one network selects the maximizing action in the next state, and the other evaluates the Q-value of that action. Decoupling action selection from action evaluation reduces the systematic overestimation of Q-values that occurs when a single network does both.
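The change relative to the previous sketch is small. Below is a sketch of the Double DQN target calculation under the same assumptions, reusing the illustrative `q_net`, `target_net`, and `gamma` defined above.

```python
def double_dqn_target(batch):
    """Double DQN target: select the next action with the training network,
    then evaluate that action with the target network."""
    with torch.no_grad():
        # Training network chooses the greedy action in the next state.
        next_actions = q_net(batch["next_states"]).argmax(dim=1, keepdim=True)
        # Target network provides the Q-value estimate for that action.
        next_q = target_net(batch["next_states"]).gather(1, next_actions).squeeze(1)
        return batch["rewards"] + gamma * (1.0 - batch["dones"]) * next_q
```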
