v3.1.0: L1 fitness norm, code and spec refactor, online eval
L1 fitness norm (breaking change)
- change fitness vector norm from L2 to L1 for intuitiveness and non-extreme values
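To illustrate the motivation, here is a minimal sketch (not the lab's actual implementation) comparing mean-based L1 and L2 norms on a fitness vector: the L2 norm is inflated by a single extreme component, while the L1 norm stays proportional.

```python
# Sketch only: hypothetical mean-based norms, not the lab's exact formulas.

def l1_norm(vec):
    """Mean absolute value; stays on the same scale as the components."""
    return sum(abs(v) for v in vec) / len(vec)

def l2_norm(vec):
    """Root mean square; a single extreme component dominates the result."""
    return (sum(v * v for v in vec) / len(vec)) ** 0.5

fitness_vec = [1.0, 1.0, 10.0]
print(l1_norm(fitness_vec))  # 4.0
print(l2_norm(fitness_vec))  # ≈ 5.831
```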
code and spec refactor
- #254 PPO cleanup: remove hack and restore minimization scheme
- #255 remove `use_gae` and `use_nstep` params; infer from `lam`, `num_step_returns`
- #260 fix decay `start_step` offset, add unit tests for rate decay methods
- #262 make epi start from 0 instead of 1 for code logic consistency
- #264 switch `max_total_t`, `max_epi` to `max_tick` and `max_tick_unit` for directness; retire `graph_x` for the unit above
- #266 add Atari fitness std, fix CUDA coredump issue
- #269 update gym, remove box2d hack
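On the `start_step` offset fix in #260: a minimal sketch of a linear rate decay with a start-step offset, using hypothetical names rather than the lab's actual API, showing where the offset must be applied.

```python
# Hypothetical linear decay; illustrates the start_step offset, not the lab's code.

def linear_decay(start_val, end_val, start_step, end_step, step):
    """Linearly decay from start_val to end_val between start_step and end_step."""
    if step < start_step:  # decay has not started yet
        return start_val
    if step >= end_step:   # decay finished
        return end_val
    # the offset: progress is measured from start_step, not from step 0
    frac = (step - start_step) / (end_step - start_step)
    return start_val + frac * (end_val - start_val)

print(linear_decay(1.0, 0.1, 100, 200, 50))   # 1.0 (before start_step)
print(linear_decay(1.0, 0.1, 100, 200, 150))  # 0.55 (halfway through decay)
```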
Online Eval mode
#252 #257 #261 #267
Evaluation sessions now run during training in subprocesses. They do not interfere with the training process: the lab spawns multiple subprocesses to do independent evaluation, each appending results to an eval file; at the end, a final eval finishes, plots all the graphs, and saves all the eval data.
- enabled by the meta spec key `'training_eval'`
- configure `NUM_EVAL_EPI` in `analysis.py`
- update `enjoy` and `eval` mode syntax; see README
- change ckpt behavior to use a tag, e.g. `ckpt-epi10-totalt1000`
- add new `eval` mode to the lab; it runs on a checkpoint file, see below
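The ckpt tag scheme can be sketched as a pair of small helpers; `make_ckpt_tag` and `parse_ckpt_tag` are illustrative names, not the lab's actual functions.

```python
# Sketch of the ckpt tag naming scheme, e.g. `ckpt-epi10-totalt1000`.
# Helper names are hypothetical; only the tag format comes from the notes.
import re

def make_ckpt_tag(epi, total_t):
    """Compose a ckpt tag recording the episode and total timestep."""
    return f'ckpt-epi{epi}-totalt{total_t}'

def parse_ckpt_tag(tag):
    """Recover (epi, total_t) from a ckpt tag."""
    m = re.fullmatch(r'ckpt-epi(\d+)-totalt(\d+)', tag)
    if m is None:
        raise ValueError(f'not a ckpt tag: {tag}')
    return int(m.group(1)), int(m.group(2))

tag = make_ckpt_tag(10, 1000)
print(tag)                  # ckpt-epi10-totalt1000
print(parse_ckpt_tag(tag))  # (10, 1000)
```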
Eval Session
- add a proper eval Session which loads from a ckpt as above and does not interfere with existing files. It can be run from the terminal, and it is also used by the internal eval logic, e.g. the command
python run_lab.py data/dqn_cartpole_2018_12_20_214412/dqn_cartpole_t0_spec.json dqn_cartpole eval@dqn_cartpole_t0_s2_ckpt-epi10-totalt1000
- when an eval session is done, it averages all of its episodes and appends a row to `eval_session_df.csv`
- after that it deletes the ckpt files it just used (to prevent large storage use)
- then it runs a trial analysis to update `eval_trial_graph.png`, and an accompanying `trial_df` as the average of all `session_df`s
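The session-to-trial averaging step can be sketched as follows; the lab uses pandas DataFrames, but plain dicts with hypothetical column names keep the sketch minimal.

```python
# Sketch: form a trial row as the column-wise average of session rows.
# Column names are hypothetical; the lab's actual session_df schema may differ.
from statistics import mean

session_dfs = [
    {'total_reward': 180.0, 'strength': 0.8},
    {'total_reward': 200.0, 'strength': 1.0},
]

def average_sessions(dfs):
    """Average each column across all session rows to form the trial row."""
    keys = dfs[0].keys()
    return {k: mean(d[k] for d in dfs) for k in keys}

trial_df = average_sessions(session_dfs)
print(trial_df)  # {'total_reward': 190.0, 'strength': 0.9}
```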
How eval mode works
- checkpoints save the models using a scheme which records their `epi` and `total_t`. This allows one to eval using a ckpt model
- after creating ckpt files, if `spec.meta.training_eval` is set in `train` mode, a subprocess launches using the ckpt prepath to run an eval Session, in the same way as above: `python run_lab.py data/dqn_cartpole_2018_12_20_214412/dqn_cartpole_t0_spec.json dqn_cartpole eval@dqn_cartpole_t0_s2_ckpt-epi10-totalt1000`
- the eval session runs as above; ckpt now runs at the starting timestep, at each ckpt timestep, and at the end
- the main Session waits for the final eval session and its final eval trial to finish before closing, to ensure that other processes, such as zipping, wait for them
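The launch-and-wait flow above can be sketched with the standard library; `build_eval_cmd` and `launch_eval` are illustrative names, not the lab's actual API.

```python
# Hypothetical sketch of launching the ckpt-triggered eval subprocess
# without blocking training, then waiting on it before the main Session closes.
import subprocess

def build_eval_cmd(spec_file, spec_name, ckpt_prepath):
    """Compose the same CLI call shown in the example command above."""
    return ['python', 'run_lab.py', spec_file, spec_name, f'eval@{ckpt_prepath}']

def launch_eval(spec_file, spec_name, ckpt_prepath):
    """Spawn the eval Session as a subprocess; returns immediately."""
    return subprocess.Popen(build_eval_cmd(spec_file, spec_name, ckpt_prepath))

# Usage: the main Session calls launch_eval(...) after writing a ckpt,
# keeps training, then proc.wait() on the final eval before closing.
```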
Example eval trial graph: