The default replay memory is set to 50 episodes in your code, while the DeepMind Atari paper uses 1,000,000 -- this seems like a major limitation during training -- can you achieve similar performance with only 50?
The DeepMind paper trains for roughly 100 epochs of 50,000 episodes each -- do you see roughly the same training time to proficiency on the same games here, with similar scores?
Looking at a few new environments and value-function networks, I have experienced somewhat unstable average max value-function outputs (growing unboundedly or fluctuating heavily). I'm wondering whether it makes sense to do some kind of action-value target clipping or re-scaling (roughly along the lines of the sketch below). This seems to occur when the network has difficulty differentiating next states from current states (many of the inputs are the same or indistinguishable, but a few differ slightly). Is this kind of instability expected? What kind of training time is needed to achieve more stability here?
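For reference, a minimal sketch of what I mean by clipping the bootstrap term of the target -- `q_values_fn` here is a hypothetical callable standing in for the target network, not something in this repo:

```python
import numpy as np

def clipped_td_targets(rewards, next_states, dones, q_values_fn,
                       gamma=0.99, clip=1.0):
    """Q-learning targets with the bootstrap term clipped to [-clip, clip].

    rewards, dones : arrays of shape (batch,)
    next_states    : array of shape (batch, state_dim)
    q_values_fn    : assumed to return action values of shape (batch, n_actions)
    """
    next_q = q_values_fn(next_states)            # (batch, n_actions)
    bootstrap = gamma * next_q.max(axis=1)       # greedy next-state value
    bootstrap = np.clip(bootstrap, -clip, clip)  # keep targets from blowing up
    return rewards + (1.0 - dones) * bootstrap
```

(For comparison, the DeepMind paper clips rewards to [-1, 1] rather than clipping the targets themselves.)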
Re: number 1, note that 1,000,000 refers to frames, whereas 50 in this code refers to episodes. I am currently trying a 1,000-episode memory (among other hyperparameter changes) and will submit a pull request with those details if it works well. 1,000 episodes seems to be in the ballpark of 1 million frames (60 fps * ~16 seconds per episode * 1,000 episodes ≈ 1 million), though episode length in frames will grow over time as performance improves.
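A rough back-of-the-envelope version of that conversion, where the ~16-second average episode length is just an assumption about early-training episodes:

```python
FPS = 60                    # Atari emulator frame rate
AVG_EPISODE_SECONDS = 16    # assumed average episode length early in training
N_EPISODES = 1000

frames_in_memory = FPS * AVG_EPISODE_SECONDS * N_EPISODES
print(frames_in_memory)     # 960000 -- close to the paper's 1,000,000 frames
```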