Areas for improvement #1
Comments
Hi @milesbrundage, I like your ideas for improvement. I had been trying similar things myself, but didn't find time to complete the experiments. If you find success with certain hyperparameters and modifications, do send in a pull request. It would help people get started with DQNs if the default run worked and learned a good policy. Thanks again!
Sounds good! I just started a new run with epsilon annealing and a lot of hyperparameter changes. I will see how that goes and send a pull request if it goes well.
How is your improvement going? I would like to add grayscaling and frame skip too.
I unfortunately haven't had time to do frame skip or grayscaling yet, but I have been training on Breakout with hyperparameter changes and epsilon annealing for about 7000 iterations so far. It's still early, but I'm hopeful it will improve a lot eventually. If you are interested, here is my example.py with the epsilon annealing and video recording code (it records every 100 iterations and outputs to /tmp/). I think the epsilon annealing should probably go on longer, but this is one way to do it, and it is easily modified to run longer by changing the numbers. It explores at the specified epsilon for 99 iterations and then switches to full exploitation just for recording purposes.
(I had an error in the above the first time I posted it, but just fixed it. My computer has crashed a few times while running this, so I have sometimes changed the code when restoring, but I think the above is good now. Let me know if you find any issues!) Update: this is now a pull request. I've never done a pull request before, so go easy on me if I did it wrong ;)
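For readers following along, the schedule described in this comment (explore with an annealed epsilon for 99 iterations, then run one fully greedy iteration for recording) could be sketched roughly as below. This is not the example.py from the comment or the pull request; the function names, the linear schedule, and the default values (`eps_start`, `eps_end`, `anneal_iterations`) are all assumptions chosen for illustration.

```python
def annealed_epsilon(iteration, eps_start=1.0, eps_end=0.1, anneal_iterations=1000):
    """Linearly move epsilon from eps_start to eps_end, then hold it at eps_end.

    All default values here are illustrative assumptions, not tuned settings.
    """
    fraction = min(max(float(iteration) / anneal_iterations, 0.0), 1.0)
    return eps_start + fraction * (eps_end - eps_start)


def epsilon_for_iteration(iteration, record_every=100, **anneal_kwargs):
    """Return (epsilon, record) for a 0-indexed training iteration.

    Every record_every-th iteration is played greedily (epsilon = 0) and
    flagged for recording; all other iterations use the annealed epsilon.
    """
    if (iteration + 1) % record_every == 0:
        return 0.0, True
    return annealed_epsilon(iteration, **anneal_kwargs), False


if __name__ == "__main__":
    for it in (0, 50, 98, 99, 100, 999, 5000):
        eps, record = epsilon_for_iteration(it)
        print(it, round(eps, 3), record)
```

Making the annealing "go on longer" then amounts to raising `anneal_iterations`, without touching the recording logic.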
I also just saw that the description of the Breakout environment (and the other Atari environments) seems to suggest that actions are already repeated automatically, though I'm not sure how this should relate to implementing frame skip: https://gym.openai.com/envs/Breakout-v0
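To make the frame skip question concrete, here is a minimal sketch of what a frame skip helper usually looks like: repeat one action for several environment steps and sum the rewards. It assumes the classic Gym `step` return value of `(observation, reward, done, info)`; `step_with_frame_skip` and `CountingEnv` are hypothetical names used only for this illustration, not part of this repository or of Gym. If the environment does repeat actions internally, as the linked description suggests, a helper like this would stack on top of that repeat, so the effective skip would be larger than `skip`.

```python
def step_with_frame_skip(env, action, skip=4):
    """Repeat `action` for up to `skip` environment steps.

    Rewards are summed over the skipped steps, and the loop stops early if
    the episode ends. Assumes env.step returns (observation, reward, done, info).
    """
    total_reward = 0.0
    observation, done, info = None, False, {}
    for _ in range(skip):
        observation, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return observation, total_reward, done, info


class CountingEnv:
    """Hypothetical stand-in environment used only to exercise the helper.

    Each step returns the step count as the observation and a reward of 1;
    the episode ends after `horizon` steps.
    """

    def __init__(self, horizon=10):
        self.horizon = horizon
        self.t = 0

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.horizon, {}


if __name__ == "__main__":
    env = CountingEnv(horizon=10)
    done = False
    while not done:
        obs, reward, done, _ = step_with_frame_skip(env, action=0, skip=4)
        print(obs, reward, done)
```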
I'm working with this code and have made a few changes already (I would submit a pull request, but the way I've done them is pretty hacky and I've never done a pull request before :) ). They are:
Other possible areas for improvement:
I'd potentially be interested in submitting pull requests for some of these if I can figure out how, but I thought I'd post this first to get thoughts on the above and see if people have other ideas for key areas of improvement.