-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PBT implementation #705
PBT implementation #705
Conversation
ab54b42
to
70f5c55
Compare
logger = logging.getLogger(__name__) | ||
|
||
|
||
class BaseExploit: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An overall comments about Exploit
, for Deep Learning training Trial, how do we expect user to re-use the parent model weights
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The weights should be saved in a file (any name, it does not matter) in trial.working_dir
. PBT takes care of copying the files over to child trials.
This is explained here, but maybe it should be explained in the docstring of BaseExploit
too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess so, user may need to know where they should save and load the weights in the their code, as we only do the copy, right? the actual save and load still depends on user code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed the save and load depends on user code, otherwise we would need to write specific versions for Pytorch, Tensorflow and other frameworks. Do you think I should improve the documentation of PBT to make this more clear?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, may be mention that with PBT in Orion, where user code should save and load weights, so an under-performing model will be replaced with better performing one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this to documentation: 6c9b5ff
Do you think there is something missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good for me. just one more thing, do we have any place explain in user code, how do they refer to the trial.working_dir
, as a env value or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is documented here: https://orion.readthedocs.io/en/stable/user/script.html#command-line-templating. I'll reference it in PBT's doc.
The directory was created inside the Consumer which is specific to the cmdline interface. It should be done inside workon so that the python and cmdline API uses the same logic. Also the generic WorkingDir class was not handling directly cases where experiment has or not a defined working dir. There is no apparent need for a generic class to create temporary directory so this commit rework the class to handle the experiment object directly. This also allows to handle in __exit__ resetting experiment.working_dir when it is set temporarily.
Why: The trial working dir should be unique to the trial and depend on the experiment's working dir. We can use the id of the trial (or variants ignoring fidelity, experiment id or lies) to define a unique working dir. How: Instead of setting the working dir directly, we set the experiment's working dir.
Why: Now that we can branch a trial we need to keep a trace to the trial's ancestry. Also, when branching a trial but keeping the same hyperparameters, it should not lead to the same ID since it is now a separate trial that will be executed independently. The hash of a trial will thus also depend on the value of the parent. How: For simplicity the parent attribute only reference to the ID of the parent trial (full id), just like it is done for the reference to the experiment. We should use a lazy tree node implementation like for the EVC experiment tree node to reference trial parent object instead of trial id. This should be done in the future TrialClient class.
Why: The tree node will be used for PBT and probably for trial objects as well in the future so it must be generalized.
We need these methods for fast retrievals in the tree in PBT algorithm.
Why: Database filled with previous version of Oríon will have trials with no exp_working_dir. We must make sure that these trials when suggested will have their exp_working_dir set properly before being passed to the function to optimize.
Why: We often need to compare trials and always relying on some specific attributes is cumbersome. We can use trial.id to easily support the __eq__ operator.
Co-authored-by: Lin Dong <[email protected]>
The name Lineage name was confusing. The class is not for a full Lineage and rather for a single node of the lineage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks
The algorithm Population Based Training is implemented using trees with jump edges across them to keep track of jumps across trees during exploitation phases. These edges are particularly important to keep track of trial ancestry. If a trial is broken, we look at the true ancestor (from original tree instead of exploited one) to avoid being stuck on a tree that often leads to broken trials and also to increase likelihood of exploiting other trees. This also facilitates the backtracking strategy and visualisations of population evolution.
Set and create exp working dir inside workon
The directory was created inside the Consumer which is specific to the
cmdline interface. It should be done inside workon so that the python
and cmdline API uses the same logic. Also the generic WorkingDir class
was not handling directly cases where experiment has or not a defined
working dir. There is no apparent need for a generic class to create
temporary directory so this commit rework the class to handle the
experiment object directly. This also allows to handle in exit
resetting experiment.working_dir when it is set temporarily.
Infer trial working dir based on exp.working_dir
Why:
The trial working dir should be unique to the trial and depend on the
experiment's working dir. We can use the id of the trial (or variants
ignoring fidelity, experiment id or lies) to define a unique working
dir.
How:
Instead of setting the working dir directly, we set the experiment's
working dir.
Add parent attribute to Trial
Why:
Now that we can branch a trial we need to keep a trace to the trial's
ancestry. Also, when branching a trial but keeping the same
hyperparameters, it should not lead to the same ID since it is now a
separate trial that will be executed independently. The hash of a trial
will thus also depend on the value of the parent.
How:
For simplicity the parent attribute only reference to the ID of the
parent trial (full id), just like it is done for the reference to the
experiment. We should use a lazy tree node implementation like for the
EVC experiment tree node to reference trial parent object instead of
trial id. This should be done in the future TrialClient class.
Move tree.py to utils
Why:
The tree node will be used for PBT and probably for trial objects as
well in the future so it must be generalized.
Add Tree.node_depth and Tree.get_nodes_at_depth
We need these methods for fast retrievals in the tree in PBT algorithm.