Eval mode rework; refactoring #279

kengz · 2019-02-03T23:30:24Z

Eval rework

This PR adds an eval mode that works the same way as in OpenAI Baselines: spawn 2 environments, 1 for training and 1 for eval. In the same process (blocking), run training as usual; then, at each checkpoint (ckpt), run an episode on the eval env and update the stats.
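
The blocking train-then-eval flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ToyEnv`, `run_session`, and the way `max_tick` and `eval_frequency` are passed in are hypothetical stand-ins, and only the name `run_eval_episode` is taken from the PR.

```python
class ToyEnv:
    """Hypothetical stand-in for an environment (not the lab's env class)."""

    def __init__(self, episode_len=5):
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # dummy state

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_len
        return 0.0, 1.0, done  # dummy state, reward of 1 per step, done flag


def run_eval_episode(eval_env, policy):
    """Run one full episode on the eval env and return its total reward."""
    eval_env.reset()
    total_reward, done = 0.0, False
    while not done:
        _, reward, done = eval_env.step(policy())
        total_reward += reward
    return total_reward


def run_session(max_tick=20, eval_frequency=10):
    """Train on one env; at every checkpoint, block and evaluate on the other."""
    env, eval_env = ToyEnv(), ToyEnv()  # 1 env for training, 1 for eval
    policy = lambda: 0  # placeholder policy (assumption)
    eval_stats = []
    env.reset()
    for tick in range(1, max_tick + 1):
        _, _, done = env.step(policy())  # training step as usual
        if done:
            env.reset()
        if tick % eval_frequency == 0:  # checkpoint reached
            eval_stats.append((tick, run_eval_episode(eval_env, policy)))
    return eval_stats
```

Because the eval episode runs in the same process, training pauses until it finishes; the eval env keeps its own clock, so the training env state is left untouched.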

The logic for the stats is the same as before, except that the original body.df is now split into two: body.train_df and body.eval_df. The eval df uses the main env stats except for t and reward, which reflect progress on the eval env. Correspondingly, session analysis also produces both versions of the data.

Data from body.eval_df is used to generate session_df, session_graph, session_fitness_df, whereas the data from body.train_df is used to generate a new set of trainsession_df, trainsession_graph, trainsession_fitness_df for debugging.
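
One way to picture the split, as a sketch only: an eval row starts from the latest training row and overrides `t` and `reward` with values from the eval env. Plain dicts stand in for dataframe rows here; `make_eval_row` and the column names `epi` and `loss` are hypothetical and not taken from this PR.

```python
def make_eval_row(train_row, eval_t, eval_reward):
    """Copy the main env stats, then override t and reward with eval values."""
    row = dict(train_row)  # shallow copy so the training row stays unchanged
    row.update(t=eval_t, reward=eval_reward)
    return row


# Hypothetical rows; in the lab these would live in body.train_df / body.eval_df
train_rows = [{'epi': 3, 't': 200, 'reward': 12.0, 'loss': 0.5}]
eval_rows = [make_eval_row(train_rows[-1], eval_t=150, eval_reward=30.0)]
```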

The previous process-based eval functionality is kept, but is now treated as parallel_eval. It can be useful for more robust checkpointing and eval.

Additionally:

spec key env.e.save_frequency is generalized to meta.eval_frequency
spec key env.e.max_tick_unit is generalized to meta.max_tick_unit
body.log_summary() now prints directly from eval_df and train_df
group all update methods under agent.update() to allow for a clean eval mode
group all adhoc variable updates (entropy, log_prob, grad_norm) under its methods
introduce ctx_lab_mode for contextual lab mode to run eval loop with context
introduce run_eval_episode for eval mode
set variables such as explore_var to end_val using eval context, and restore after context
update specs
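
The contextual lab mode and the temporary override of variables such as explore_var can be sketched with context managers. This is an illustration under assumptions, not the PR's implementation: storing the mode in an environment variable named `lab_mode` is assumed, and `ctx_eval_vars` is a hypothetical helper name.

```python
import os
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def ctx_lab_mode(lab_mode):
    """Temporarily switch the lab mode, restoring the previous one on exit."""
    prev = os.environ.get('lab_mode')  # may be None if never set
    os.environ['lab_mode'] = lab_mode
    try:
        yield
    finally:
        if prev is None:  # guard against a None previous value
            os.environ.pop('lab_mode', None)
        else:
            os.environ['lab_mode'] = prev


@contextmanager
def ctx_eval_vars(body, end_val):
    """Hypothetical helper: pin explore_var to end_val, then restore it."""
    prev = body.explore_var
    body.explore_var = end_val
    try:
        yield
    finally:
        body.explore_var = prev


# Usage: evaluate with the final exploration value, then resume training
body = SimpleNamespace(explore_var=0.8)
with ctx_lab_mode('eval'), ctx_eval_vars(body, end_val=0.1):
    pass  # an eval episode would run here
```

The try/finally blocks restore the previous state even if the eval episode raises.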

Refactor

declare lab modes EVAL_MODES, TRAIN_MODES; refactor by introducing util.in_eval_lab_modes()
purge body.last_loss in favor of a single body.loss
make NoOpLRScheduler API consistent
purge useless computations in memory, etc.
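
A minimal sketch of the lab mode check follows. The PR does not list which modes belong to each group, so the tuple members are assumptions, as is reading the mode from a `lab_mode` environment variable.

```python
import os

# Assumed membership; the PR only names the two constants
EVAL_MODES = ('enjoy', 'eval')
TRAIN_MODES = ('search', 'train', 'dev')


def in_eval_lab_modes():
    """Return True if the current lab mode is one of the eval modes."""
    return os.environ.get('lab_mode') in EVAL_MODES
```

Call sites can then branch on a single helper instead of comparing mode strings inline.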

kengz added 29 commits February 2, 2019 11:26

add contextual lab mode in util

a2fbea2

store action_pd in body, group all such updates under agent.update()

cd75ce7

refactor calc_df_row and add body.eval_update

9b3d6c4

make calc_df_row use t to reflect consistency with reward

06f2d75

remove adhoc mean_entropy compute. delegate to body df update

a3ac629

use eval_df to compute body.current_reward_ma for ckpt

1e98816

simplify to body.grad_norm, unify under action_pd_update efficiently

56a4864

update body debugging variables to use mean; move compute to flush

8c41437

retire body.last_loss

442eb92

guard ctx_lab_mode against None value

b871d40

make NoOpLRScheduler API consistent, simplify get_mean_lr

46333b4

improve logging for both train and eval using df directly

38b4847

format numerical logging to use :g instead of :.f

dc60e43

purge useless memory total_reward history

d7987a7

run_eval_episode in eval context

0ec465e

update changed key in openai gym max_timestep

250c195

set_rand_seed and close eval_env

83baa73

guard openai env re-register

5907f60

refactor parallel_eval method into retro_analysis, activate via metaspec

fb3858b

move max_tick_unit from env to meta_spec

9c68b28

generalize save_frequency to meta.eval_frequency

c1005f2

expand session analysis to work for both kinds of body df

ca9e07d

remove reindex_session_data

9a4fc03

refactor in_eval_lab_modes

377e7de

fix retro analysis wrong prefix in reading from file

df31c72

mute SessionSpace ckpt and eval

184f1a5

add tmp SpaceSession substitution for eval_df

e33a388

set vars to end_val for eval ctx

7a8cbce

fix codestyle

ec9756f

kengz merged commit 033f4e3 into master Feb 4, 2019

kengz deleted the eval-env branch February 4, 2019 08:55
