The default replay memory is set to 50 episodes in your code, while the DeepMind Atari paper uses 1,000,000 -- this seems like a major limitation during training -- can you achieve similar performance with only 50?
The DeepMind paper trains for roughly 100 epochs of 50,000 episodes each -- do you see roughly the same training time to proficiency on the same games here, with similar scores?
Looking at a few new environments and value-function networks, I have experienced somewhat unstable average max value-function outputs (growing unboundedly or fluctuating heavily). I'm wondering whether it makes sense to do some kind of action-value target clipping or re-scaling (roughly along the lines of the sketch below). This seems to occur when the network has difficulty differentiating next states from current states (many of the inputs are the same or indistinguishable, but a few differ slightly). Is this kind of instability expected? What kind of training time is needed to achieve more stability here?
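For reference, a minimal sketch of what I mean by clipping the bootstrap term of the target -- `q_values_fn` here is a hypothetical callable standing in for the target network, not something in this repo:

```python
import numpy as np

def clipped_td_targets(rewards, next_states, dones, q_values_fn,
                       gamma=0.99, clip=1.0):
    """Q-learning targets with the bootstrap term clipped to [-clip, clip].

    rewards, dones : arrays of shape (batch,)
    next_states    : array of shape (batch, state_dim)
    q_values_fn    : assumed to return action values of shape (batch, n_actions)
    """
    next_q = q_values_fn(next_states)            # (batch, n_actions)
    bootstrap = gamma * next_q.max(axis=1)       # greedy next-state value
    bootstrap = np.clip(bootstrap, -clip, clip)  # keep targets from blowing up
    return rewards + (1.0 - dones) * bootstrap
```

(For comparison, the DeepMind paper clips rewards to [-1, 1] rather than clipping the targets themselves.)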
Re: number 1, note that 1,000,000 refers to frames, whereas 50 in this code refers to episodes. I am currently trying a 1,000-episode memory (among other hyperparameter changes) and will submit a pull request with those details if it works well. 1,000 episodes seems to be in the ballpark of 1 million frames (60 fps * ~16 seconds per episode * 1,000 episodes ≈ 1 million), though episode length in frames will grow over time as performance improves.
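A rough back-of-the-envelope version of that conversion, where the ~16-second average episode length is just an assumption about early-training episodes:

```python
FPS = 60                    # Atari emulator frame rate
AVG_EPISODE_SECONDS = 16    # assumed average episode length early in training
N_EPISODES = 1000

frames_in_memory = FPS * AVG_EPISODE_SECONDS * N_EPISODES
print(frames_in_memory)     # 960000 -- close to the paper's 1,000,000 frames
```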