A couple of questions ... #2

Open
osh opened this issue May 15, 2016 · 1 comment

osh commented May 15, 2016

  1. The default replay memory in your code holds 50 episodes, whereas the DeepMind Atari paper uses 1,000,000. This seems like a major limitation when training. Can you achieve similar performance with 50?
  2. The DeepMind paper trains for roughly 100 epochs of 50,000 episodes each. Do you see roughly the same training time to proficiency on the same games here, with similar scores?
  3. Looking at a few new environments and value-function networks, I have seen somewhat unstable average max value function outputs (growing without bound or fluctuating widely). I'm wondering whether it makes sense to do some kind of action-value target clipping or re-scaling. This seems to occur when the network has difficulty differentiating next states from current states (most of the inputs are the same or indistinguishable, but a few are slightly different). Is this kind of instability expected? What kind of training time is needed to reach more stability here?
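
For readers following question 3, here is a minimal sketch of what action-value target clipping could look like, assuming standard one-step Q-learning targets. The function name `clipped_td_targets` and all of its parameters are hypothetical and do not come from this repository. Clipping the reward and bounding the target are shown as two separate, optional steps.

```python
import numpy as np

def clipped_td_targets(rewards, next_q_values, terminal, gamma=0.99,
                       reward_clip=1.0, target_clip=None):
    """One-step Q-learning targets with optional clipping (illustrative only).

    rewards:       shape (batch,)
    next_q_values: shape (batch, n_actions), Q(s', a) for every action a
    terminal:      shape (batch,), True where s' ends the episode
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    next_q_values = np.asarray(next_q_values, dtype=np.float64)
    terminal = np.asarray(terminal, dtype=bool)

    # Step 1: clip rewards so one large reward cannot dominate the target.
    rewards = np.clip(rewards, -reward_clip, reward_clip)

    # Step 2: bootstrap from the best next action, except at episode end.
    max_next_q = next_q_values.max(axis=1)
    targets = rewards + gamma * np.where(terminal, 0.0, max_next_q)

    # Step 3 (optional): bound the target itself to damp runaway growth.
    if target_clip is not None:
        targets = np.clip(targets, -target_clip, target_clip)
    return targets
```

Whether bounding the target helps in a given environment is an open question here. A bound that is too tight would also cap the values the network is able to learn.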

milesbrundage commented May 16, 2016

Re: number 1, note that 1,000,000 refers to frames, whereas 50 in this code refers to episodes. I am currently trying a 1,000-episode memory (among other hyperparameter changes) and will submit a pull request with those details if it works well. 1,000 episodes seems to be in the ballpark of 1 million frames: 60 fps * 1,000 episodes * roughly 16 seconds per episode = 960,000 frames, though the episode length in frames will increase over time if performance improves.
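
The episode-to-frame conversion above can be sketched as a quick calculation. The helper names are hypothetical, and the 16 seconds per episode and 60 fps are the rough figures quoted in this comment, not measured values.

```python
def estimate_frames(episodes, seconds_per_episode=16, fps=60):
    """Approximate number of frames held by an episode-based replay memory."""
    return episodes * seconds_per_episode * fps

def episodes_for_frames(frames, seconds_per_episode=16, fps=60):
    """Smallest episode count whose estimated frame total reaches `frames`."""
    frames_per_episode = seconds_per_episode * fps
    return -(-frames // frames_per_episode)  # ceiling division on integers

print(estimate_frames(50))             # default memory: 48000 frames
print(estimate_frames(1000))           # proposed memory: 960000 frames
print(episodes_for_frames(1_000_000))  # 1042 episodes for 1M frames
```

Under these assumptions the default 50-episode memory holds about 48,000 frames, roughly 5% of the 1,000,000 frames in the paper. Both estimates will drift as episodes get longer with better play.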
