This is probably the paper most widely recognized for introducing deep Q-networks (DQN), although the authors had already published an earlier version of the work under the title Playing Atari with Deep Reinforcement Learning.
Although the concept of using neural networks as function approximators was not new, reinforcement learning was known to diverge or become unstable when combined with nonlinear function approximators of this kind. In their original paper, the authors proposed two very important ideas to solve (or at least alleviate) this problem:
- The use of Experience Replay: to break the correlation between consecutive observed experiences, they propose storing each transition (s, a, r, s', terminal) in a replay memory. When training the network, they use minibatches sampled uniformly at random from this memory instead of the most recent observations (a minimal sketch of such a buffer follows this list).
- The use of a Target Network: they maintain two networks with identical architecture: the training network (with parameters θ) and the target network (with parameters θ'). During training, they adjust the parameters θ while using the target network's outputs (under θ') to compute the Q-learning targets. The parameters θ', however, are only updated every C steps (by copying the training network's parameters) and are held fixed between those updates (see the training-step sketch below).
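Below is a minimal sketch of the replay-memory idea in Python. The `ReplayMemory` class and its method names are illustrative choices, not the paper's implementation; the essential points are the fixed capacity and the uniform random sampling of minibatches.

```python
import random
from collections import deque, namedtuple

# Illustrative transition container; field names are assumptions.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "terminal"])

class ReplayMemory:
    """Fixed-size buffer that stores transitions and samples them uniformly at random."""

    def __init__(self, capacity):
        # Oldest transitions are discarded once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, terminal):
        self.buffer.append(Transition(state, action, reward, next_state, terminal))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation
        # between consecutive transitions.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```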
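And here is a rough sketch, in PyTorch, of how the two sets of parameters interact during a single training step. The network architecture, hyperparameters, and helper names (`q_net`, `target_net`, `train_step`) are assumptions made for illustration; only the structure of the update (targets computed with the frozen θ', a hard copy every C steps) reflects the idea described above.

```python
import copy
import torch
import torch.nn as nn

# Toy network standing in for the paper's convolutional architecture.
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = copy.deepcopy(q_net)          # parameters θ' start as a copy of θ
optimizer = torch.optim.RMSprop(q_net.parameters(), lr=2.5e-4)
gamma, C = 0.99, 10_000                    # illustrative discount factor and copy interval

def train_step(batch, step):
    # Tensors sampled from the replay memory; terminals is a 0/1 float tensor.
    states, actions, rewards, next_states, terminals = batch

    # Q(s, a; θ) for the actions that were actually taken.
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Targets r + γ max_a' Q(s', a'; θ') computed with the *frozen* parameters θ'.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * next_q * (1.0 - terminals)

    loss = nn.functional.smooth_l1_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Every C steps, copy θ into θ' and hold it fixed until the next copy.
    if step % C == 0:
        target_net.load_state_dict(q_net.state_dict())
```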
The proposed approach learned to play several Atari games using only the raw screen pixels (and the game score as the reward signal) as input, outperforming human players on most of the games.