Areas for improvement #1

Open
milesbrundage opened this issue May 4, 2016 · 6 comments

Comments

@milesbrundage

milesbrundage commented May 4, 2016

I'm working with this code and have already made a few changes (I would submit a pull request, but the way I've done them is pretty hacky and I've never done a pull request before :) ). They are:

  • changing hyperparameters (.9 --> .99 for the discount factor, .1 --> .05 for epsilon). These are based on the Mnih et al. 2015 hyperparameters. I'm not sure what the performance effect is, but .1 seems high for a final epsilon level.
  • adding video output
  • epsilon annealing (this was done hackily, by manually specifying epsilons for different episode intervals, but it could be done more cleanly; a rough linear version is sketched after this list. Per Mnih et al., I start with an epsilon of 1 and anneal roughly linearly over a thousand episodes).
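For what it's worth, a cleaner and roughly equivalent way to do the annealing is a single linear interpolation. This is only a sketch; the function name and the defaults (1.0 down to 0.05 over 1000 episodes) are my own choices, not something from the repo:

def annealed_epsilon(episode, eps_start=1.0, eps_end=0.05, anneal_episodes=1000):
    # Linearly interpolate from eps_start to eps_end over anneal_episodes,
    # then hold at eps_end for the rest of training.
    fraction = min(float(episode) / anneal_episodes, 1.0)
    return eps_start + fraction * (eps_end - eps_start)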

Other possible areas for improvement:

  • grayscaling for efficiency
  • frame skip for efficiency (a rough sketch of both is below)
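To make those two concrete, here is a minimal sketch of what grayscaling and frame skip could look like as preprocessing. It only assumes NumPy and the usual env.step() interface; the function names and the default skip of 4 are my own choices, not anything from this repo:

import numpy as np

def grayscale(frame):
    # Collapse an RGB frame of shape (H, W, 3) to a single luminance channel,
    # stored as uint8 to keep replay memory small.
    return np.dot(frame[..., :3], [0.299, 0.587, 0.114]).astype(np.uint8)

def step_with_skip(env, action, skip=4):
    # Repeat the chosen action for `skip` frames and sum the rewards,
    # so the agent only has to pick an action every `skip`-th frame.
    total_reward = 0.0
    observation, done, info = None, False, {}
    for _ in range(skip):
        observation, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return observation, total_reward, done, info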

I'd potentially be interested in submitting pull requests for some of these if I can figure out how, but I thought I'd post this first to get thoughts on the above and see if people have other ideas for key areas of improvement.

@sherjilozair
Owner

sherjilozair commented May 6, 2016

Hi @milesbrundage,
Thanks for trying out the code.

I like your ideas for improvement. I had been trying similar stuff myself, but didn't find time to complete the experiments.

If you find success with certain hyperparameters and modifications, do send in a pull request. It would help people get started with DQNs if the default run works and learns a good policy.

Thanks again!

@milesbrundage
Author

Sounds good! Just started a new run with epsilon annealing and a lot of hyperparameter changes... will see how that goes and send a pull request if it goes well.

@ShibiHe

ShibiHe commented May 31, 2016

How are your improvements going? I would like to add grayscaling and frame skip too.

@milesbrundage
Author

milesbrundage commented May 31, 2016

I unfortunately haven't had time to do frame skip or grayscaling yet, but I have been running training on Breakout with the hyperparameter changes and epsilon annealing for about 7000 episodes so far. It's still early, but I'm hopeful it will improve a lot eventually. If you are interested, here is my example.py with the epsilon annealing and video recording code (it records every 100th episode and outputs to /tmp/). I think the epsilon annealing should probably go on longer, but this is one way to do it, and it's easily modified to run longer by changing the numbers. It explores at the specified epsilon for 99 episodes and then goes full exploitation for one episode, purely for recording purposes.

import sys
import gym
from dqn import Agent

num_episodes = 10000

env_name = sys.argv[1] if len(sys.argv) > 1 else "Breakout-v0"
env = gym.make(env_name)
env.monitor.start('/tmp/Breakout4-v0', video_callable=lambda count: count % 100 == 0)

agent = Agent(state_size=env.observation_space.shape,
              number_of_actions=env.action_space.n,
              save_name=env_name)

for e in xrange(num_episodes):
    # Hand-rolled epsilon schedule: anneal from 1.0 down to 0.05 in steps of 0.1
    # over roughly the first 1000 episodes, then hold at 0.05.
    if e < 100 and e % 100 != 0:
        epsilon = 1
    if 99 < e < 200 and e % 100 != 0:
        epsilon = .9
    if 199 < e < 300 and e % 100 != 0:
        epsilon = .8
    if 299 < e < 400 and e % 100 != 0:
        epsilon = .7
    if 399 < e < 500 and e % 100 != 0:
        epsilon = .6
    if 499 < e < 600 and e % 100 != 0:
        epsilon = .5
    if 599 < e < 700 and e % 100 != 0:
        epsilon = .4
    if 699 < e < 800 and e % 100 != 0:
        epsilon = .3
    if 799 < e < 900 and e % 100 != 0:
        epsilon = .2
    if 899 < e < 1000 and e % 100 != 0:
        epsilon = .1
    if 1000 < e and e % 100 != 0:
        epsilon = .05
    # Every 100th episode (including the very first) runs greedily, purely so the
    # recorded videos show the current policy without exploration noise.
    if e % 100 == 0:
        epsilon = 0
    if e == 0:
        epsilon = 0

    observation = env.reset()
    done = False
    agent.new_episode()
    total_cost = 0.0
    total_reward = 0.0
    frame = 0
    while not done:
        frame += 1
        #env.render()
        action, values = agent.act(observation)
        #action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        total_cost += agent.observe(reward)
        total_reward += reward
    print "total reward", total_reward
    print "mean cost", total_cost/frame

env.monitor.close()


@milesbrundage
Author

milesbrundage commented May 31, 2016

(I had an error in the above the first time I posted it, but it's fixed now. My computer has crashed a few times while running this, so I've sometimes changed the code when restoring it, but I think the version above is good. Let me know if you find any issues!)

Update: this is now a pull request... I've never done a pull request before so go easy on me if I did it wrong ;)

@milesbrundage
Author

I also just saw that the description of the Breakout environment (and the other Atari environments) suggests that actions are already automatically repeated, though I'm not sure how that should relate to implementing frame skip: https://gym.openai.com/envs/Breakout-v0
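If it helps, one way I'd check what the environment is already doing is to look at the ALE env's configured frame skip. This is only a guess at the relevant attribute; `unwrapped` and `frameskip` are assumptions about gym's wrapper and AtariEnv, not anything from this repo:

import gym

env = gym.make("Breakout-v0")
# Assumption: the underlying AtariEnv stores its action-repeat setting as
# `frameskip` (either an int, or a (low, high) range for stochastic repeat).
print(env.unwrapped.frameskip)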
