[Question] The actual training timesteps don't correspond with the hyper-parameters for Atari #367
❓ Question
Hi,
As the title says, the actual number of training timesteps does not match the hyper-parameters, and the issue seems to occur only in Atari environments. Here are some commands and images for reference:
Experiment Command:
python train.py --algo ppo --env PongNoFrameskip-v4
Training Plotting Command:
python scripts/plot_train.py -a ppo -e PongNoFrameskip-v4 -f logs
Evaluation Plotting Command:
python scripts/all_plots.py -a ppo -e PongNoFrameskip-v4 -f logs --no-million -max 10000000
The plots show that the number of training timesteps is about 4e7 rather than the 1e7 set by n_timesteps in the hyper-parameters. In my experiments, the issue does not occur in any environment other than Atari. To reproduce it, replace the hyper-parameter n_timesteps with a small number such as 1e4; the episode lengths in the logs will then add up to many more than 1e4 samples.
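For anyone who wants to check the sample count themselves, here is a minimal sketch of how the episode lengths can be summed. It assumes the logs are Monitor-style CSV files with a leading `#` metadata line and an `l` (episode length) column; the helper name `total_logged_steps` and the sample log contents are made up for illustration and are not real training output.

```python
import csv
import io


def total_logged_steps(monitor_csv_text: str) -> int:
    """Sum the episode-length column ("l") of a Monitor-style CSV log."""
    # Drop blank lines and the leading "#{...}" metadata line, if present.
    lines = [
        line
        for line in monitor_csv_text.splitlines()
        if line and not line.startswith("#")
    ]
    reader = csv.DictReader(io.StringIO("\n".join(lines)))
    return sum(int(row["l"]) for row in reader)


# Made-up sample log, for illustration only (assumed column layout: r, l, t).
sample = """#{"t_start": 0.0, "env_id": "PongNoFrameskip-v4"}
r,l,t
-21.0,3056,10.5
-20.0,3400,21.2
-21.0,3100,31.9
"""

n_timesteps = 1e4  # the small budget suggested above
logged = total_logged_steps(sample)
print(f"logged steps: {logged}, ratio to n_timesteps: {logged / n_timesteps:.2f}")
```

Running the same sum over the real log files for an Atari run and a non-Atari run should make the discrepancy easy to compare against n_timesteps.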
Thank you so much in advance!