Rework eval mode; major refactoring
## Eval rework
This release adds an eval mode that matches the OpenAI Baselines setup: spawn two environments, one for training and one for eval. In the same process (blocking), training runs as usual; at each checkpoint, one episode runs on the eval env and the eval stats are updated.
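For illustration, here is a minimal sketch of the blocking train/eval loop, written against the classic (pre-0.26) `gym` API, with a random policy and a `CKPT_EVERY` constant standing in for the real agent and checkpoint schedule:

```python
import gym

train_env = gym.make("CartPole-v1")
eval_env = gym.make("CartPole-v1")  # second env, reserved for eval
CKPT_EVERY = 1000  # stand-in for the actual checkpoint frequency

def run_eval_episode(env):
    """Run one full episode on the eval env and return its total reward."""
    state, done, total = env.reset(), False, 0.0
    while not done:
        action = env.action_space.sample()  # stand-in for agent.act(state)
        state, reward, done, _ = env.step(action)
        total += reward
    return total

state = train_env.reset()
for t in range(10_000):
    action = train_env.action_space.sample()  # stand-in for the learned policy
    state, reward, done, _ = train_env.step(action)
    if done:
        state = train_env.reset()
    if t % CKPT_EVERY == 0:
        # Blocking eval: pause training, run one episode on eval_env,
        # update eval stats, then resume in the same process.
        print(f"t={t} eval_return={run_eval_episode(eval_env)}")
```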
The stats logic is the same as before, except that the original `body.df` is now split in two: `body.train_df` and `body.eval_df`. The eval df uses the main env stats except for `t` and `reward`, which reflect progress on the eval env. Correspondingly, session analysis also produces both versions of the data: `body.eval_df` is used to generate `session_df`, `session_graph`, and `session_fitness_df`, whereas `body.train_df` is used to generate a new set, `trainsession_df`, `trainsession_graph`, and `trainsession_fitness_df`, for debugging.
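For concreteness, a small sketch of the split; the column names here are assumptions, not the exact `body.train_df`/`body.eval_df` schema:

```python
import pandas as pd

def eval_row_from(train_row, eval_t, eval_reward):
    """Eval stats mirror the latest training stats, except t and reward,
    which are taken from the eval env."""
    row = dict(train_row)
    row.update(t=eval_t, reward=eval_reward)
    return row

train_rows = [{"epi": 5, "t": 4000, "reward": 21.0, "loss": 0.8}]
eval_rows = [eval_row_from(train_rows[-1], eval_t=200, eval_reward=35.0)]

train_df = pd.DataFrame(train_rows)  # drives the trainsession_* outputs
eval_df = pd.DataFrame(eval_rows)    # drives the session_* outputs
print(eval_df)
```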
The previous process-based eval functionality is kept, but is now called `parallel_eval`. This can be useful for more robust checkpointing and eval.
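A minimal, self-contained sketch of the process-based pattern; `load_agent` and `run_eval_episodes` below are dummy stand-ins for the real checkpoint-loading and rollout code:

```python
from multiprocessing import Process
import random

def load_agent(ckpt_path):
    return {"ckpt": ckpt_path}  # stand-in for restoring weights

def run_eval_episodes(agent, num_episodes):
    return [random.random() for _ in range(num_episodes)]  # stand-in rollouts

def parallel_eval(ckpt_path, num_episodes=4):
    """Run eval in a child process from a saved checkpoint, so the
    trainer never blocks."""
    agent = load_agent(ckpt_path)
    returns = run_eval_episodes(agent, num_episodes)
    print(f"{ckpt_path}: mean eval return {sum(returns) / len(returns):.3f}")

if __name__ == "__main__":
    # Fire-and-forget from the training loop at checkpoint time:
    p = Process(target=parallel_eval, args=("ckpt/model_1000.pt",))
    p.start()
    p.join()  # in real use, the trainer would continue without joining
```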
## Refactoring
- purge useless computations
- properly and efficiently gather and organize all update-variable computations (see the sketch after this list).
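As a rough illustration of the second point, here is a hedged sketch of a DQN-style update that derives all update variables (Q predictions, targets, loss) in a single pass per network rather than recomputing them separately; the names and the toy linear nets are assumptions, not the actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def compute_update_vars(net, target_net, batch, gamma=0.99):
    """Gather every update variable in one pass over the batch,
    with one forward pass per network."""
    states, actions, rewards, next_states, dones = batch
    q_preds = net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        q_targets = rewards + gamma * (1 - dones) * next_q
    loss = F.mse_loss(q_preds, q_targets)
    return loss, q_preds.detach(), q_targets

# Toy usage with dummy linear nets and a random batch
net, target_net = nn.Linear(4, 2), nn.Linear(4, 2)
batch = (torch.randn(8, 4), torch.randint(0, 2, (8,)), torch.randn(8),
         torch.randn(8, 4), torch.zeros(8))
loss, q_preds, q_targets = compute_update_vars(net, target_net, batch)
loss.backward()
```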
This also speeds up run time by 2x. For Atari BeamRider with DQN on a V100 GPU, manual benchmarking measures 110 FPS when training every 4 frames, while eval reaches 160 FPS. This translates to 10M frames in roughly 24 hours.
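A quick back-of-the-envelope check of the 10M-frame figure:

```python
train_fps = 110            # frames/sec when training every 4 frames
total_frames = 10_000_000
print(f"{total_frames / train_fps / 3600:.1f} h")  # ~25.3 h, roughly a day
```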