train mode with resume; enjoy mode refactor #455 (Merged)
train mode with resume

Fixes #444. This adds the capability to resume a training run in a past-future-consistent manner. See the explanation below.
Suppose we run a training for 10 million (10M) frames to completion, and see that further improvement may be possible if we had run it for longer, say 20M frames. If only we could go back in time and set the frames to 20M to begin with.
The resume mode allows us to do that without time traveling. We can edit the spec file in the present and resume training, so the run picks up where it left off as if it had been using the edited spec all along. Of course, the modification to the spec file must itself be consistent with the past and the future; e.g. we cannot suddenly modify the initial learning rate or variable values.
To achieve this, the lab relies on 3 objects and their `load` methods:

- `algorithm.load()`: this already loads the algorithm and its model weights for enjoy mode; now it is also used for `train@` mode
- `body.train_df`: this object tracks the training metrics data, hence it needs to be loaded
- `env.clock`: this tracks the time within the session

Since everything in the lab runs according to `env.clock`, the above are all we need to restore for resuming training. Once the network and training metrics are restored, and the clock is set correctly, everything runs from the designated point in time.

NOTE: for off-policy algorithms the replay memory is not restored, simply due to the cost of storing replay data (GBs of data per session, and slow writes during frequent checkpoints). Hence the behavior of off-policy replay on resume is slightly different: the memory needs to fill up again from the resume point, and training only restarts once the specified replay size threshold is reached, so we lose a small fraction of the total timesteps.
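To make the mechanism concrete, here is a minimal sketch of how the three restores fit together. This is not the lab's actual implementation: the `resume_session` helper, the `train_df.csv` file name, and the direct clock assignment are all hypothetical, while the `agent.algorithm`, `agent.body.train_df`, and `env.clock` attribute names follow the conventions described above.

```python
# Minimal sketch of the resume flow (hypothetical helper, not the lab's code).
import pandas as pd

def resume_session(session, predir):
    '''Restore the three pieces of state needed to resume training.'''
    # 1. restore the algorithm and its network weights,
    #    reusing the same load path that enjoy mode uses
    session.agent.algorithm.load()

    # 2. restore the training metrics so the data continues seamlessly
    #    (file name is hypothetical)
    session.agent.body.train_df = pd.read_csv(f'{predir}/train_df.csv')

    # 3. restore the clock; since everything runs according to env.clock,
    #    setting it to the last recorded frame resumes the run in place
    #    (direct assignment is a stand-in for the lab's clock API)
    last_frame = int(session.agent.body.train_df['frame'].iloc[-1])
    session.env.clock.frame = last_frame

    # note: for off-policy algorithms the replay memory is NOT restored;
    # it refills from the resume point, as described in the NOTE above
```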
Usage example
Specify train mode as `train@{predir}`, where `{predir}` is the data directory of the last training run, or simply use `latest` to use the latest run, e.g.:
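(The concrete example command did not survive extraction; below is a sketch assuming the lab's standard `python run_lab.py <spec_file> <spec_name> <lab_mode>` invocation, with illustrative spec and directory names.)

```bash
# resume from a specific past run's data directory (names illustrative)
python run_lab.py slm_lab/spec/demo.json dqn_cartpole train@data/dqn_cartpole_2019_07_01_120000

# or resume from the latest run of this spec
python run_lab.py slm_lab/spec/demo.json dqn_cartpole train@latest
```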
enjoy mode refactor

The `train@` resume mode API allows the `enjoy` mode to be refactored, and the two share a similar syntax. Continuing with the example above, to enjoy a trained model, we now use:
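(Again, the original command was elided; the following is a hypothetical invocation, with an illustrative path for the saved session spec file that `enjoy@` now takes.)

```bash
# point enjoy@ at the saved session spec file of the trained run (paths illustrative)
python run_lab.py slm_lab/spec/demo.json dqn_cartpole enjoy@data/dqn_cartpole_2019_07_01_120000/dqn_cartpole_t0_s0_spec.json
```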
The refactored changes are summarized below:

- `enjoy@{prename}` -> `enjoy@{session_spec_file}`
- remove `eval_model_prepath` and `ckpt` injection from the meta spec, and related methods
- remove `ckpt` entirely, and related methods
Misc

- `self_desc` for better clarity
- `read_spec_and_run` -> `get_spec_and_run`
- `post_init_nets` -> `end_init_nets`
- `in_eval_lab_modes` -> `in_eval_lab_mode`
- `in_train_lab_mode`