DQN is not converging even after 15M timesteps #214
Comments
Have you tried using the parameters and/or other code from the zoo repository? I used parameters from the SB2 zoo (without prioritization/dueling/etc.) recently when matching the performance, and things worked out as expected (see #110).
Thanks, @Miffyli! For my research work, I have been stuck on this issue for more than 2 weeks.
Yes, as mentioned in the documentation, please use the RL Zoo if you want to replicate results. It is only one line:

```bash
python train.py --algo dqn --env PongNoFrameskip-v4 --eval-episodes 10 --eval-freq 50000
```

Note: the reward in evaluation is the clipped one for now (see #181).

EDIT: I'm currently doing one run too to check, and it gives me: [learning curve not preserved]
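If you want to check the true (unclipped) Pong score yourself, here is a rough sketch of a manual evaluation assuming the standard SB3 Atari setup (the model path `dqn_pong` is a placeholder for your own saved agent):

```python
from stable_baselines3 import DQN
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import VecFrameStack

# Build an evaluation env whose rewards are NOT clipped,
# so the reported score is the real Pong score in [-21, 21].
# wrapper_kwargs are forwarded to AtariWrapper.
eval_env = make_atari_env(
    "PongNoFrameskip-v4",
    n_envs=1,
    seed=0,
    wrapper_kwargs={"clip_reward": False},
)
eval_env = VecFrameStack(eval_env, n_stack=4)  # same frame stacking as during training

# "dqn_pong" is a placeholder path to a previously saved agent
model = DQN.load("dqn_pong", env=eval_env)

mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10)
print(f"mean_reward={mean_reward:.1f} +/- {std_reward:.1f}")
```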
Closing this as I'm getting the expected results: [learning curve not preserved]
I got the same issue here after training for 10M timesteps on Pong, so is there anything wrong with the benchmark hyperparameters, or does the performance depend on the PyTorch version? Here are my command and config log: [not preserved]
I will try the command above today and report back.
Make sure to have the latest SB3, RL Zoo, gym, and PyTorch versions. For the v1.0 release, I trained DQN on many environments; below is the learning curve for Pong (because of frame skip, the displayed number of timesteps must be divided by 4): [learning curve not preserved]. You can find the pre-trained agent and associated hyperparameters in the RL Zoo repository.
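For example, assuming a local checkout of the RL Zoo, the pre-trained Pong agent can be run with the `enjoy.py` script (flags may differ slightly between zoo versions):

```bash
python enjoy.py --algo dqn --env PongNoFrameskip-v4 --folder rl-trained-agents/ -n 5000
```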
I have rerun my experiments with different seeds and see a weird result. Currently, the code seems to be seed-dependent: I get poor performance with this one: [results not preserved]
See the documentation sections on "tips and tricks" and "reproducibility".

One thing you can do is increase the replay buffer size to 1e5 or 1e6 (if it fits in your RAM); I think I may have forgotten to set it back to a higher value, even though it seems to work in most cases (cf. the benchmark).
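As an illustration, a minimal sketch of that change with the standard SB3 Atari setup (the values are suggestions, not the exact benchmarked hyperparameters):

```python
from stable_baselines3 import DQN
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

env = make_atari_env("PongNoFrameskip-v4", n_envs=1, seed=0)
env = VecFrameStack(env, n_stack=4)

# Note: a 1e6 replay buffer of stacked 84x84 uint8 frames can take tens of GB of RAM;
# fall back to buffer_size=100_000 if that does not fit.
model = DQN(
    "CnnPolicy",
    env,
    buffer_size=1_000_000,
    seed=0,
    verbose=1,
)
```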
Question
I am training Pong-v4 / PongNoFrameskip-v4 with DQN. It gives me a reward of around -20 to -21 even after 1.5e7 timesteps. I tried various parameters for DQN, but it still gives me the same output. I could not find proper hyperparameters for DQN, so I think there is a problem with DQN.
Additional context
In the beginning, training of the agent starts at around -20.4 to -20.2. After 3e6 timesteps it reaches -21, and then it fluctuates in a range between -20.8 and -21.
I tried the following variations of DQN, experimenting with different combinations of the following:
- `learning_starts` in [default 50k, 5k, 100k]
- `gamma` in [0.98, 0.99, 0.999]
- `exploration_final_eps` in [0.02, 0.05]
- `learning_rate` in [1e-3, 1e-4, 5e-4]
- `buffer_size` in [50k, 500k, 1000k]
The above combinations are applied in the code below.
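A minimal sketch of the kind of script I am running (assuming the standard SB3 Atari wrappers; the hyperparameter values shown are one combination from the lists above):

```python
from stable_baselines3 import DQN
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

# Standard Atari preprocessing (frame skip, 84x84 grayscale) plus 4-frame stacking
env = make_atari_env("PongNoFrameskip-v4", n_envs=1, seed=0)
env = VecFrameStack(env, n_stack=4)

# One combination from the sweep above
model = DQN(
    "CnnPolicy",
    env,
    learning_rate=1e-4,
    buffer_size=50_000,
    learning_starts=50_000,
    gamma=0.99,
    exploration_final_eps=0.02,
    verbose=1,
)
model.learn(total_timesteps=int(1.5e7))
model.save("dqn_pong")
```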
Since I already tried the above-mentioned combinations, I tend to think there is a bug in the DQN implementation.