Eval mode rework; refactoring #279
Merged
Eval rework
This PR adds an eval mode that works the same way as in OpenAI Baselines: spawn 2 environments, 1 for training and 1 for eval. In the same process (blocking), run training as usual, then at each checkpoint run an episode on the eval env and update the stats.
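A minimal sketch of this blocking, same-process loop, under stated assumptions. `ToyEnv`, `fixed_policy`, `run_eval_episode`, `train` and the dict-per-row logs are hypothetical stand-ins, not the lab's actual classes or API, and `eval_frequency` is only named after the `meta.eval_frequency` setting. It also mirrors the stats split described in this PR: each eval row copies the training stats and overrides only `t` and `reward` with values from the eval episode.

```python
class ToyEnv:
    """Hypothetical stand-in for an env whose episodes last `length` steps."""

    def __init__(self, length):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # dummy state

    def step(self, action):
        self.t += 1
        reward = float(action)
        done = self.t >= self.length
        return 0.0, reward, done


def fixed_policy(state):
    """Placeholder policy: always picks action 1 (reward 1.0 per step)."""
    return 1


def run_eval_episode(act, eval_env):
    """Run one full episode on the eval env; return (steps, total reward)."""
    state = eval_env.reset()
    total_reward, done = 0.0, False
    while not done:
        state, reward, done = eval_env.step(act(state))
        total_reward += reward
    return eval_env.t, total_reward


def train(max_total_t=20, eval_frequency=10):
    # Two separate envs: one for training, one reserved for eval.
    train_env, eval_env = ToyEnv(length=5), ToyEnv(length=8)
    train_rows, eval_rows = [], []
    state, epi, ep_reward = train_env.reset(), 0, 0.0
    for total_t in range(1, max_total_t + 1):
        state, reward, done = train_env.step(fixed_policy(state))
        ep_reward += reward
        # An agent update would happen here in a real training loop.
        row = {'epi': epi, 'total_t': total_t, 't': train_env.t, 'reward': ep_reward}
        if done:
            train_rows.append(row)
            state, epi, ep_reward = train_env.reset(), epi + 1, 0.0
        if total_t % eval_frequency == 0:
            # Blocking: training pauses while one eval episode runs.
            eval_t, eval_reward = run_eval_episode(fixed_policy, eval_env)
            # Eval row reuses the training stats except t and reward.
            eval_rows.append({**row, 't': eval_t, 'reward': eval_reward})
    return train_rows, eval_rows
```

Calling `train(max_total_t=20, eval_frequency=10)` yields four training rows (one per 5-step episode) and two eval rows. The eval row at `total_t=10` keeps `epi` and `total_t` from training but reports `t=8` and `reward=8.0` from the 8-step eval episode.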
The logic for the stats is the same as before, except that the original `body.df` is now split into two: `body.train_df` and `body.eval_df`. The eval df uses the main env stats except for `t` and `reward`, which reflect progress on the eval env. Correspondingly, session analysis also produces both versions of the data. Data from `body.eval_df` is used to generate `session_df`, `session_graph` and `session_fitness_df`, whereas data from `body.train_df` is used to generate a new set, `trainsession_df`, `trainsession_graph` and `trainsession_fitness_df`, for debugging.

The previous process-based eval functionality is kept, but is now referred to as `parallel_eval`. This can be useful for more robust checkpointing and eval.

Additionally:
- `env.e.save_frequency` is generalized to `meta.eval_frequency`
- `env.e.max_tick_unit` is generalized to `meta.max_tick_unit`
- `body.log_summary()` now prints directly from `eval_df` and `train_df`
- `agent.update()` method to allow for a clean eval mode
- `ctx_lab_mode` for contextual lab mode, to run the eval loop within a context
- `run_eval_episode` for eval mode
- set `explore_var` to `end_val` using the eval context, and restore it after the context

Refactor
- `EVAL_MODES`, `TRAIN_MODES`; refactor by introducing `util.in_eval_lab_modes()`
- `body.last_loss` in favor of a single `body.loss`
- `NoOpLRScheduler` API consistent
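The eval-context items listed above (`ctx_lab_mode`, pinning `explore_var` to `end_val` and restoring it afterwards, and `util.in_eval_lab_modes()`) can be sketched with `contextlib.contextmanager`. This is a hypothetical sketch: it assumes the lab mode is stored in a `lab_mode` environment variable and that the mode tuples hold the values shown; `Explorer`, `get_lab_mode` and `eval_context` are illustrative names, not the PR's code.

```python
import os
from contextlib import contextmanager

# Assumed mode names; the PR's actual EVAL_MODES / TRAIN_MODES may differ.
EVAL_MODES = ('enjoy', 'eval')
TRAIN_MODES = ('search', 'train', 'dev')


def get_lab_mode():
    """Hypothetical: read the current lab mode from the environment."""
    return os.environ.get('lab_mode')


def in_eval_lab_modes():
    """True if the current lab mode is one of the eval modes."""
    return get_lab_mode() in EVAL_MODES


@contextmanager
def ctx_lab_mode(lab_mode):
    """Temporarily switch lab mode, restoring the previous value on exit."""
    prev = os.environ.get('lab_mode')
    os.environ['lab_mode'] = lab_mode
    try:
        yield
    finally:
        if prev is None:
            os.environ.pop('lab_mode', None)
        else:
            os.environ['lab_mode'] = prev


class Explorer:
    """Hypothetical holder of an exploration variable decaying toward end_val."""

    def __init__(self, start_val, end_val):
        self.explore_var = start_val
        self.end_val = end_val


@contextmanager
def eval_context(explorer):
    """Enter eval lab mode and pin explore_var to end_val; restore both after."""
    prev_var = explorer.explore_var
    with ctx_lab_mode('eval'):
        explorer.explore_var = explorer.end_val
        try:
            yield
        finally:
            explorer.explore_var = prev_var
```

Restoring in `finally` means the training-time `explore_var` and lab mode come back even if the eval episode raises, so the training loop resumes exactly where it left off.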