Eval mode rework; refactoring #279

Merged: kengz merged 29 commits into master from eval-env on Feb 4, 2019
Conversation

kengz (Owner) commented Feb 3, 2019

Eval rework

This PR adds an eval mode that works the same way as OpenAI Baselines: spawn 2 environments, 1 for training and 1 for eval. In the same process (blocking), run training as usual, then at each checkpoint run an episode on the eval env and update the stats.
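
For concreteness, here is a minimal sketch of this blocking train/eval flow. The helper names (make_env, make_agent, train_step, run_episode, update_eval_stats) are hypothetical placeholders, not SLM-Lab API; only the spec keys under meta come from this PR.

```python
# Minimal sketch of the blocking train/eval flow described above (illustrative only).
def run_session(spec):
    train_env = make_env(spec)   # env used for training
    eval_env = make_env(spec)    # second env reserved for eval
    agent = make_agent(spec, train_env)
    for t in range(spec['meta']['max_tick']):
        train_step(agent, train_env)  # train as usual
        if t % spec['meta']['eval_frequency'] == 0:
            # blocking: run one episode on the eval env, then update stats
            eval_return = run_episode(agent, eval_env)
            update_eval_stats(agent.body, eval_return)
```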

The logic for the stats is the same as before, except that the original body.df is now split into two: body.train_df and body.eval_df. The eval df uses the main env stats, except for t and reward, which reflect progress on the eval env. Correspondingly, session analysis also produces both versions of the data.

Data from body.eval_df is used to generate session_df, session_graph and session_fitness_df, whereas data from body.train_df is used to generate a new set, trainsession_df, trainsession_graph and trainsession_fitness_df, for debugging.
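
A rough illustration of the df split and of how session analysis consumes both sides. The column names and the Body/analyze_session shapes are placeholders for illustration, not the exact SLM-Lab schema.

```python
import pandas as pd

# Illustrative sketch of the train_df/eval_df split on the body.
class Body:
    def __init__(self):
        cols = ['epi', 't', 'reward', 'loss', 'explore_var']  # placeholder columns
        self.train_df = pd.DataFrame(columns=cols)  # stats from the training env
        self.eval_df = pd.DataFrame(columns=cols)   # t, reward reflect the eval env

def analyze_session(body):
    # eval data drives the primary artifacts (session_df, session_graph, ...)
    session_df = body.eval_df.copy()
    # train data produces the trainsession_* counterparts for debugging
    trainsession_df = body.train_df.copy()
    return session_df, trainsession_df
```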

The previous process-based eval functionality is kept, but is now treated as parallel_eval. This can be useful for more robust checkpointing and eval.

Additionally:

  • spec key env.e.save_frequency is generalized to meta.eval_frequency
  • spec key env.e.max_tick_unit is generalized to meta.max_tick_unit
  • body.log_summary() now prints directly from eval_df and train_df
  • group all update methods into a single agent.update() method to allow for a clean eval mode
  • group all ad hoc variable updates (entropy, log_prob, grad_norm) under their own methods
  • introduce ctx_lab_mode, a contextual lab mode used to run the eval loop within a context (see the sketch after this list)
  • introduce run_eval_episode for eval mode
  • set variables such as explore_var to their end_val inside the eval context, and restore them after the context exits
  • update specs
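
As referenced in the list above, here is a sketch of the set-then-restore pattern behind ctx_lab_mode and run_eval_episode. The environment-variable mechanism, the run_episode helper and the explore_var_end_val attribute are assumptions for illustration, not the exact SLM-Lab implementation.

```python
import os
from contextlib import contextmanager

@contextmanager
def ctx_lab_mode(mode):
    '''Temporarily switch the lab mode, restoring the previous one on exit.'''
    prev_mode = os.environ.get('lab_mode')
    os.environ['lab_mode'] = mode
    try:
        yield
    finally:
        if prev_mode is None:
            os.environ.pop('lab_mode', None)
        else:
            os.environ['lab_mode'] = prev_mode

def run_eval_episode(agent, eval_env):
    '''Run one episode on the eval env under the eval context.'''
    with ctx_lab_mode('eval'):
        prev_explore_var = agent.body.explore_var
        agent.body.explore_var = agent.body.explore_var_end_val  # pin to end_val for eval
        total_reward = run_episode(agent, eval_env)  # hypothetical episode runner
        agent.body.explore_var = prev_explore_var    # restore after the context
    return total_reward
```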

Refactor

  • declare lab modes EVAL_MODES, TRAIN_MODES; refactor by introducing util.in_eval_lab_modes() (sketched below)
  • purge body.last_loss in favor of a single body.loss
  • make the NoOpLRScheduler API consistent
  • purge unnecessary computations in memory and elsewhere
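
A sketch of the mode grouping and the util.in_eval_lab_modes() check referenced above. The exact mode names and the environment-variable lookup are assumptions for illustration.

```python
import os

EVAL_MODES = ('eval', 'enjoy')           # assumed eval-type modes
TRAIN_MODES = ('search', 'train', 'dev')  # assumed train-type modes

def get_lab_mode():
    return os.environ.get('lab_mode')

def in_eval_lab_modes():
    '''True if the current lab mode is one of the eval modes.'''
    return get_lab_mode() in EVAL_MODES
```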

kengz added 29 commits February 2, 2019 11:26
@kengz kengz merged commit 033f4e3 into master Feb 4, 2019
@kengz kengz deleted the eval-env branch February 4, 2019 08:55