train mode with resume; enjoy mode refactor #455 (Merged)
train mode with resume

Fixes #444. This adds the capability to resume a training run in a past-future-consistent manner. See the explanation below.
Suppose we run a training for 10 million (10M) frames to completion, and see that further improvement may be possible if we had run it for longer, say 20M frames. If only we could go back in time and set the frames to 20M to begin with.
The resume mode allows us to do that without time traveling. We can edit the spec file in the present and resume training, so the run picks up where it left off as if it had been using the edited spec all along. Of course, the modification to the spec file must itself be consistent with the past and the future; e.g. we cannot suddenly modify the initial learning rate or variable values.
To achieve this, the lab relies on 3 objects and their `load` methods:

- `algorithm.load()`: this already loads the algorithm and its model weights for enjoy mode; now it is also used for `train@` mode
- `body.train_df`: this object tracks the training metrics data, hence it needs to be loaded
- `env.clock`: this tracks the time within the session

Since everything in the lab runs according to `env.clock`, the above are all we need to restore for resuming training. Once the network and training metrics are restored, and the clock is set correctly, everything runs from the designated point in time.

NOTE: for off-policy algorithms the replay memory is not restored, simply due to the cost of storing replay data (GBs of data per session, and slow writes during frequent checkpoints). Hence the behavior of off-policy replay on resume is slightly different: the memory needs to fill up again from the resume point, and training only restarts once the specified replay size threshold is reached, so we lose a small fraction of the total timesteps.
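To make the mechanism concrete, here is a minimal sketch of how the three restores fit together. This is not the lab's actual implementation: the `resume_session` helper, the `train_df.csv` file name, and the direct clock assignment are all hypothetical, while the `agent.algorithm`, `agent.body.train_df`, and `env.clock` attribute names follow the conventions described above.

```python
# Minimal sketch of the resume flow (hypothetical helper, not the lab's code).
import pandas as pd

def resume_session(session, predir):
    '''Restore the three pieces of state needed to resume training.'''
    # 1. restore the algorithm and its network weights,
    #    reusing the same load path that enjoy mode uses
    session.agent.algorithm.load()

    # 2. restore the training metrics so the data continues seamlessly
    #    (file name is hypothetical)
    session.agent.body.train_df = pd.read_csv(f'{predir}/train_df.csv')

    # 3. restore the clock; since everything runs according to env.clock,
    #    setting it to the last recorded frame resumes the run in place
    #    (direct assignment is a stand-in for the lab's clock API)
    last_frame = int(session.agent.body.train_df['frame'].iloc[-1])
    session.env.clock.frame = last_frame

    # note: for off-policy algorithms the replay memory is NOT restored;
    # it refills from the resume point, as described in the NOTE above
```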
Usage example
Specify train mode as `train@{predir}`, where `{predir}` is the data directory of the last training run, or simply use `latest` to use the latest run, e.g.:
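(The concrete example command did not survive extraction; below is a sketch assuming the lab's standard `python run_lab.py <spec_file> <spec_name> <lab_mode>` invocation, with illustrative spec and directory names.)

```bash
# resume from a specific past run's data directory (names illustrative)
python run_lab.py slm_lab/spec/demo.json dqn_cartpole train@data/dqn_cartpole_2019_07_01_120000

# or resume from the latest run of this spec
python run_lab.py slm_lab/spec/demo.json dqn_cartpole train@latest
```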
enjoy mode refactor

The `train@` resume mode API allows the `enjoy` mode to be refactored, and the two share a similar syntax. Continuing with the example above, to enjoy a trained model, we now use:
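(Again, the original command was elided; the following is a hypothetical invocation, with an illustrative path for the saved session spec file that `enjoy@` now takes.)

```bash
# point enjoy@ at the saved session spec file of the trained run (paths illustrative)
python run_lab.py slm_lab/spec/demo.json dqn_cartpole enjoy@data/dqn_cartpole_2019_07_01_120000/dqn_cartpole_t0_s0_spec.json
```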
The refactored changes are summarized below:

- `enjoy@{prename}` -> `enjoy@{session_spec_file}`
- remove `eval_model_prepath` and `ckpt` injection from the meta spec, and related methods
- remove `ckpt` entirely, and related methods
Misc

- `self_desc` for better clarity
- `read_spec_and_run` -> `get_spec_and_run`
- `post_init_nets` -> `end_init_nets`
- `in_eval_lab_modes` -> `in_eval_lab_mode`
- `in_train_lab_mode`