
DQN is not converging even after 15M timesteps #214

Closed
2 tasks done
MilanVZinzuvadiya opened this issue Nov 3, 2020 · 8 comments
Labels: question (Further information is requested), RTFM (Answer is the documentation)

Comments

@MilanVZinzuvadiya

MilanVZinzuvadiya commented Nov 3, 2020

Question

I am training Pong-v4/PongNoFrameskip-v4 with DQN. It gives me a reward of around -20 to -21 even after 1.5e7 timesteps. I have tried various parameters for DQN, but it still gives me the same output. I could not find proper hyperparameters for DQN, so I think there is a problem with DQN.

Additional context

At the beginning, training starts at around -20.4 to -20.2. After 3e6 timesteps the reward reaches -21 and then fluctuates between -20.8 and -21.

I tried several variants of DQN, experimenting with different combinations of the following hyperparameters:
learning_starts in [50k (default), 5k, 100k],
gamma in [0.98, 0.99, 0.999],
exploration_final_eps in [0.02, 0.05],
learning_rate in [1e-3, 1e-4, 5e-4], and
buffer_size in [50k, 500k, 1000k].

These combinations were applied to the code below:

from stable_baselines3 import DQN

model = DQN('CnnPolicy', env, verbose=1, learning_starts=50000, gamma=0.98, exploration_final_eps=0.02, learning_rate=1e-3)
model.learn(total_timesteps=int(1.5e7), log_interval=10)

Since I have already tried the combinations mentioned above, I suspect there is a bug in the DQN implementation.
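
For reference, a minimal sketch of how a sweep over the values listed above might be scripted. This is not the author's actual script; it assumes env is the same Pong environment object used in the snippet above, and in practice only a subset of the full grid would be run.

from itertools import product

from stable_baselines3 import DQN

# Values taken from the lists above; the full Cartesian grid is large,
# so in practice only selected combinations would be trained.
grid = product(
    [50_000, 5_000, 100_000],      # learning_starts
    [0.98, 0.99, 0.999],           # gamma
    [0.02, 0.05],                  # exploration_final_eps
    [1e-3, 1e-4, 5e-4],            # learning_rate
    [50_000, 500_000, 1_000_000],  # buffer_size
)

for learning_starts, gamma, eps, lr, buffer_size in grid:
    # `env` is assumed to be constructed elsewhere, as in the snippet above.
    model = DQN('CnnPolicy', env, verbose=1,
                learning_starts=learning_starts, gamma=gamma,
                exploration_final_eps=eps, learning_rate=lr,
                buffer_size=buffer_size)
    model.learn(total_timesteps=int(1.5e7), log_interval=10)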

Checklist

  • I have read the documentation (required)
  • I have checked that there is no similar issue in the repo (required)
MilanVZinzuvadiya added the question label on Nov 3, 2020
@Miffyli
Collaborator

Miffyli commented Nov 3, 2020

Have you tried using the parameters and/or other code from the zoo repository? I recently used the parameters from the SB2 zoo (without prioritization/dueling/etc.) when matching performance, and things worked out as expected (see #110).

@MilanVZinzuvadiya
Author

Thanks, @Miffyli! I have been stuck on this issue for more than 2 weeks of my research work.
I didn't use exactly the same combination; in particular, I didn't use the frame stacking mentioned in those parameters.
I have now started training with the exact combination from those parameters and will give you an update after training.

@araffin
Member

araffin commented Nov 3, 2020

Yes, as mentioned in the documentation, please use the rl zoo if you want to replicate results. It is only one line:

python train.py --algo dqn --env PongNoFrameskip-v4 --eval-episodes 10 --eval-freq 50000

Note: the reward in evaluation is the clipped one for now (see #181).

EDIT: I'm currently doing a run as well to check, and it gives me:
Eval num_timesteps=750000, episode_reward=-15.20 +/- 2.36 (so it already looks good even before 1M steps).
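
Regarding the clipped evaluation reward: below is a hedged sketch of evaluating a saved agent on an environment where AtariWrapper's reward clipping is disabled, so the raw Pong score (-21 to 21) is reported. It is not the zoo's evaluation script; the load path is a placeholder, and it assumes make_atari_env is available in stable_baselines3.common.env_util.

from stable_baselines3 import DQN
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import VecFrameStack

# Same Atari preprocessing as in training, but without reward clipping,
# so the evaluation return is the true game score.
env = make_atari_env("PongNoFrameskip-v4", n_envs=1,
                     wrapper_kwargs=dict(clip_reward=False))
env = VecFrameStack(env, n_stack=4)  # same frame stacking as during training

model = DQN.load("path/to/dqn_pong")  # placeholder path to a saved agent
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"episode_reward={mean_reward:.2f} +/- {std_reward:.2f}")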

araffin added the RTFM label on Nov 3, 2020
@araffin
Member

araffin commented Nov 3, 2020

Closing this, as I'm getting Eval num_timesteps=1850000, episode_reward=20.40 +/- 0.66 (an almost perfect score after ~2M steps) with the RL Zoo; see the command in the previous comment (using SB3 v0.10.0).

araffin closed this as completed on Nov 3, 2020
@chongyi-zheng

I got the same issue after training for 10M timesteps on Pong, so is there anything wrong with the benchmark hyperparameters, or does the performance depend on the PyTorch version?

Here is my command

--algo dqn --env PongNoFrameskip-v4

and config log

========== PongNoFrameskip-v4 ==========
Seed: 3242554354
OrderedDict([('batch_size', 32),
             ('buffer_size', 10000),
             ('env_wrapper',
              ['stable_baselines3.common.atari_wrappers.AtariWrapper']),
             ('exploration_final_eps', 0.01),
             ('exploration_fraction', 0.1),
             ('frame_stack', 4),
             ('gradient_steps', 1),
             ('learning_rate', 0.0001),
             ('learning_starts', 100000),
             ('n_timesteps', 10000000.0),
             ('optimize_memory_usage', True),
             ('policy', 'CnnPolicy'),
             ('target_update_interval', 1000),
             ('train_freq', 4)])

I will try the command above today and report back.
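
For anyone reproducing this without the zoo scripts, here is a hedged sketch of how the config above might translate into direct SB3 calls. It is an untested sketch, not the zoo's own training code; the env_wrapper and frame_stack entries are assumed to correspond to AtariWrapper (applied by make_atari_env) and VecFrameStack.

from stable_baselines3 import DQN
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

env = make_atari_env("PongNoFrameskip-v4", n_envs=1, seed=0)  # applies AtariWrapper
env = VecFrameStack(env, n_stack=4)                           # frame_stack: 4

model = DQN(
    "CnnPolicy",
    env,
    batch_size=32,
    buffer_size=10_000,
    exploration_final_eps=0.01,
    exploration_fraction=0.1,
    gradient_steps=1,
    learning_rate=1e-4,
    learning_starts=100_000,
    optimize_memory_usage=True,
    target_update_interval=1000,
    train_freq=4,
    verbose=1,
)
model.learn(total_timesteps=int(1e7))  # n_timesteps: 10000000.0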

@araffin
Member

araffin commented Apr 3, 2021

> I got the same issue after training for 10M timesteps on Pong, so is there anything wrong with the benchmark hyperparameters, or does the performance depend on the PyTorch version?

Make sure to have the latest SB3, RL Zoo, gym, and PyTorch versions.

For the v1.0 release, I trained DQN on many environments and below is the learning curve for Pong (because of frameskip, the displayed number of timesteps must be divided by 4):

[Figure: training episodic reward of DQN on Pong]

You can find the pre-trained agent and associated hyperparameters in the rl-trained-agents folder.
To plot the training curve:

python scripts/plot_train.py -a dqn -e Pong -f rl-trained-agents/
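
To inspect the pre-trained agent outside the zoo scripts, it can be loaded with SB3's standard API. The path below only assumes the zoo's usual rl-trained-agents layout and may need adjusting.

from stable_baselines3 import DQN

# Assumed location inside rl-trained-agents/; adjust the path to the actual zip file.
model = DQN.load("rl-trained-agents/dqn/PongNoFrameskip-v4_1/PongNoFrameskip-v4.zip")
print(model.policy)                              # network architecture
print(model.buffer_size, model.learning_starts)  # a couple of restored hyperparameters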

@chongyi-zheng

I have rerun my experiments with different seeds and I am seeing a weird result. Currently, the code seems to be seed-dependent: I get poor performance with seed = 1738194436 and promising performance with seed = 3242554354. Would you mind doing a run to confirm this?

@araffin
Member

araffin commented Apr 5, 2021

> I have rerun my experiments with different seeds and I am seeing a weird result.

See the documentation sections on "Tips and Tricks" and "Reproducibility".

One thing you can do is increase the replay buffer size to 1e5 or 1e6 (if it fits in your RAM). I think I may have forgotten to set it back to a higher value, even though the smaller one seems to work in most cases (cf. the benchmark).
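
As an illustration of both points (explicit seeding for reproducibility and a larger replay buffer), a hedged sketch, not taken from the zoo:

from stable_baselines3 import DQN
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.utils import set_random_seed
from stable_baselines3.common.vec_env import VecFrameStack

SEED = 42  # arbitrary value; fix it so runs are comparable across seeds
set_random_seed(SEED)

env = make_atari_env("PongNoFrameskip-v4", n_envs=1, seed=SEED)
env = VecFrameStack(env, n_stack=4)

# Larger replay buffer (1e5 here, 1e6 if it fits in RAM), as suggested above;
# the remaining hyperparameters are left at SB3 defaults in this sketch.
model = DQN("CnnPolicy", env, buffer_size=100_000, seed=SEED, verbose=1)
model.learn(total_timesteps=int(1e7))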
