
DQN is not converging even after 15M timesteps #214

Closed
2 tasks done
MilanVZinzuvadiya opened this issue Nov 3, 2020 · 8 comments
Labels: question (Further information is requested), RTFM (Answer is the documentation)

Comments

@MilanVZinzuvadiya

MilanVZinzuvadiya commented Nov 3, 2020

Question

I am training Pong-v4/PongNoFrameskip-v4 with DQN. It gives me a reward of around -20 to -21 even after 1.5e7 timesteps. I have tried various parameters for DQN, but it still gives me the same output. I could not find proper hyperparameters for DQN, so I think there is a problem with DQN.

Additional context

At the beginning, training starts at around -20.4 to -20.2. After 3e6 timesteps the reward reaches -21 and then fluctuates between -20.8 and -21.

I tried several variants of DQN, experimenting with different combinations of the following hyperparameters:
learning_starts in [50k (default), 5k, 100k],
gamma in [0.98, 0.99, 0.999],
exploration_final_eps in [0.02, 0.05],
learning_rate in [1e-3, 1e-4, 5e-4], and
buffer_size in [50k, 500k, 1000k].

These combinations were applied to the code below:

from stable_baselines3 import DQN

model = DQN('CnnPolicy', env, verbose=1, learning_starts=50000, gamma=0.98, exploration_final_eps=0.02, learning_rate=1e-3)
model.learn(total_timesteps=int(1.5e7), log_interval=10)

Since I have already tried the combinations mentioned above, I suspect there is a bug in the DQN implementation.
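
For reference, a minimal sketch of how a sweep over the values listed above might be scripted. This is not the author's actual script; it assumes env is the same Pong environment object used in the snippet above, and in practice only a subset of the full grid would be run.

from itertools import product

from stable_baselines3 import DQN

# Values taken from the lists above; the full Cartesian grid is large,
# so in practice only selected combinations would be trained.
grid = product(
    [50_000, 5_000, 100_000],      # learning_starts
    [0.98, 0.99, 0.999],           # gamma
    [0.02, 0.05],                  # exploration_final_eps
    [1e-3, 1e-4, 5e-4],            # learning_rate
    [50_000, 500_000, 1_000_000],  # buffer_size
)

for learning_starts, gamma, eps, lr, buffer_size in grid:
    # `env` is assumed to be constructed elsewhere, as in the snippet above.
    model = DQN('CnnPolicy', env, verbose=1,
                learning_starts=learning_starts, gamma=gamma,
                exploration_final_eps=eps, learning_rate=lr,
                buffer_size=buffer_size)
    model.learn(total_timesteps=int(1.5e7), log_interval=10)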

Checklist

  • I have read the documentation (required)
  • I have checked that there is no similar issue in the repo (required)
MilanVZinzuvadiya added the question label on Nov 3, 2020
@Miffyli
Collaborator

Miffyli commented Nov 3, 2020

Have you tried using the parameters and/or other code from the zoo repository? I recently used the parameters from the SB2 zoo (without prioritization/dueling/etc.) when matching performance, and things worked out as expected (see #110).

@MilanVZinzuvadiya
Author

Thanks, @Miffyli! I have been stuck on this issue for more than 2 weeks of my research work.
I didn't use exactly the same combination; in particular, I didn't use the frame stacking mentioned in those parameters.
I have now started training with the exact combination from those parameters and will give you an update after training.

@araffin
Member

araffin commented Nov 3, 2020

Yes, as mentioned in the documentation, please use the rl zoo if you want to replicate results. It is only one line:

python train.py --algo dqn --env PongNoFrameskip-v4 --eval-episodes 10 --eval-freq 50000

Note: the reward in evaluation is the clipped one for now (see #181).

EDIT: I'm currently doing a run as well to check, and it gives me:
Eval num_timesteps=750000, episode_reward=-15.20 +/- 2.36 (so it already looks good even before 1M steps).
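
Regarding the clipped evaluation reward: below is a hedged sketch of evaluating a saved agent on an environment where AtariWrapper's reward clipping is disabled, so the raw Pong score (-21 to 21) is reported. It is not the zoo's evaluation script; the load path is a placeholder, and it assumes make_atari_env is available in stable_baselines3.common.env_util.

from stable_baselines3 import DQN
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import VecFrameStack

# Same Atari preprocessing as in training, but without reward clipping,
# so the evaluation return is the true game score.
env = make_atari_env("PongNoFrameskip-v4", n_envs=1,
                     wrapper_kwargs=dict(clip_reward=False))
env = VecFrameStack(env, n_stack=4)  # same frame stacking as during training

model = DQN.load("path/to/dqn_pong")  # placeholder path to a saved agent
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"episode_reward={mean_reward:.2f} +/- {std_reward:.2f}")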

araffin added the RTFM label on Nov 3, 2020
@araffin
Member

araffin commented Nov 3, 2020

Closing this, as I'm getting Eval num_timesteps=1850000, episode_reward=20.40 +/- 0.66 (an almost perfect score after ~2M steps) with the RL Zoo; see the command in the previous comment (using SB3 v0.10.0).

araffin closed this as completed on Nov 3, 2020
@chongyi-zheng

I got the same issue after training for 10M timesteps on Pong, so is there anything wrong with the benchmark hyperparameters, or does the performance depend on the PyTorch version?

Here is my command

--algo dqn --env PongNoFrameskip-v4

and config log

========== PongNoFrameskip-v4 ==========
Seed: 3242554354
OrderedDict([('batch_size', 32),
             ('buffer_size', 10000),
             ('env_wrapper',
              ['stable_baselines3.common.atari_wrappers.AtariWrapper']),
             ('exploration_final_eps', 0.01),
             ('exploration_fraction', 0.1),
             ('frame_stack', 4),
             ('gradient_steps', 1),
             ('learning_rate', 0.0001),
             ('learning_starts', 100000),
             ('n_timesteps', 10000000.0),
             ('optimize_memory_usage', True),
             ('policy', 'CnnPolicy'),
             ('target_update_interval', 1000),
             ('train_freq', 4)])

I will try the command above today and report back.
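
For anyone reproducing this without the zoo scripts, here is a hedged sketch of how the config above might translate into direct SB3 calls. It is an untested sketch, not the zoo's own training code; the env_wrapper and frame_stack entries are assumed to correspond to AtariWrapper (applied by make_atari_env) and VecFrameStack.

from stable_baselines3 import DQN
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

env = make_atari_env("PongNoFrameskip-v4", n_envs=1, seed=0)  # applies AtariWrapper
env = VecFrameStack(env, n_stack=4)                           # frame_stack: 4

model = DQN(
    "CnnPolicy",
    env,
    batch_size=32,
    buffer_size=10_000,
    exploration_final_eps=0.01,
    exploration_fraction=0.1,
    gradient_steps=1,
    learning_rate=1e-4,
    learning_starts=100_000,
    optimize_memory_usage=True,
    target_update_interval=1000,
    train_freq=4,
    verbose=1,
)
model.learn(total_timesteps=int(1e7))  # n_timesteps: 10000000.0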

@araffin
Member

araffin commented Apr 3, 2021

> I got the same issue after training for 10M timesteps on Pong, so is there anything wrong with the benchmark hyperparameters, or does the performance depend on the PyTorch version?

Make sure to have the latest SB3, RL Zoo, gym, and PyTorch versions.

For the v1.0 release, I trained DQN on many environments and below is the learning curve for Pong (because of frameskip, the displayed number of timesteps must be divided by 4):

[Figure: training episodic reward of DQN on Pong]

You can find the pre-trained agent and associated hyperparameters in the rl-trained-agents folder.
To plot the training curve:

python scripts/plot_train.py -a dqn -e Pong -f rl-trained-agents/
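
To inspect the pre-trained agent outside the zoo scripts, it can be loaded with SB3's standard API. The path below only assumes the zoo's usual rl-trained-agents layout and may need adjusting.

from stable_baselines3 import DQN

# Assumed location inside rl-trained-agents/; adjust the path to the actual zip file.
model = DQN.load("rl-trained-agents/dqn/PongNoFrameskip-v4_1/PongNoFrameskip-v4.zip")
print(model.policy)                              # network architecture
print(model.buffer_size, model.learning_starts)  # a couple of restored hyperparameters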

@chongyi-zheng

I have rerun my experiments with different seeds and I am seeing a weird result. Currently, the code seems to be seed-dependent: I get poor performance with seed = 1738194436 and promising performance with seed = 3242554354. Would you mind doing a run to confirm this?

@araffin
Member

araffin commented Apr 5, 2021

> I have rerun my experiments with different seeds and I am seeing a weird result.

See the documentation sections on "Tips and Tricks" and "Reproducibility".

One thing you can do is increase the replay buffer size to 1e5 or 1e6 (if it fits in your RAM). I think I may have forgotten to set it back to a higher value, even though the smaller one seems to work in most cases (cf. the benchmark).
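
As an illustration of both points (explicit seeding for reproducibility and a larger replay buffer), a hedged sketch, not taken from the zoo:

from stable_baselines3 import DQN
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.utils import set_random_seed
from stable_baselines3.common.vec_env import VecFrameStack

SEED = 42  # arbitrary value; fix it so runs are comparable across seeds
set_random_seed(SEED)

env = make_atari_env("PongNoFrameskip-v4", n_envs=1, seed=SEED)
env = VecFrameStack(env, n_stack=4)

# Larger replay buffer (1e5 here, 1e6 if it fits in RAM), as suggested above;
# the remaining hyperparameters are left at SB3 defaults in this sketch.
model = DQN("CnnPolicy", env, buffer_size=100_000, seed=SEED, verbose=1)
model.learn(total_timesteps=int(1e7))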
