PBT implementation #705

bouthilx · 2021-12-01T02:41:11Z

Population Based Training (PBT) is implemented with trees connected by jump edges, which record the jumps from one tree to another during exploitation phases. These edges are essential to keep track of trial ancestry. If a trial is broken, we look at its true ancestor (in the original tree rather than the exploited one) to avoid getting stuck on a tree that often leads to broken trials, and to increase the likelihood of exploiting other trees. The jump edges also support the backtracking strategy and visualisations of the population's evolution.
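A minimal sketch of the idea, assuming a simplified node structure: `Node`, `fork`, `exploit` and `true_ancestor` are hypothetical names, not Oríon's API. In this sketch's convention, a trial created by exploitation hangs under the exploited trial (possibly in another tree), and a jump edge points back to the trial it replaced in the original tree, so a broken trial can fall back to that original-tree ancestor.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical lineage node: one trial of the PBT population."""

    trial_id: str
    # Tree edge: the trial this one was forked from.
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)
    # Jump edge: the trial (in the original tree) replaced by exploitation.
    base: Optional["Node"] = field(default=None, repr=False)
    jumps: List["Node"] = field(default_factory=list, repr=False)

    def fork(self, trial_id: str) -> "Node":
        """Create a child trial that continues from this one in the same tree."""
        child = Node(trial_id, parent=self)
        self.children.append(child)
        return child

    def exploit(self, source: "Node", trial_id: str) -> "Node":
        """Replace this trial with a fork of ``source``, possibly from another tree.

        The new node hangs under ``source``, and a jump edge remembers that it
        stems from ``self`` in the original tree.
        """
        child = source.fork(trial_id)
        child.base = self
        self.jumps.append(child)
        return child

    def true_ancestor(self) -> Optional["Node"]:
        """Ancestor in the original tree: follow the jump edge first, else the tree edge."""
        return self.base if self.base is not None else self.parent
```

For example, if `a1` (tree A) exploits `b1` (tree B), the new trial `c = a1.exploit(b1, "c")` has `c.parent is b1`, but `c.true_ancestor() is a1`. If `c` breaks, the search can resume from tree A instead of staying on tree B.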

Set and create exp working dir inside workon

The directory was created inside the Consumer, which is specific to the
cmdline interface. It should be done inside workon so that the Python
and cmdline APIs use the same logic. Also, the generic WorkingDir class
did not directly handle whether or not the experiment has a defined
working dir. There is no apparent need for a generic class to create
temporary directories, so this commit reworks the class to handle the
experiment object directly. This also lets __exit__ reset
experiment.working_dir when it was set temporarily.
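A minimal sketch of such a context manager, assuming the experiment only exposes a `working_dir` attribute (here a `SimpleNamespace` stands in for the real experiment). The constructor signature and behaviour below are illustrative assumptions, not the actual WorkingDir implementation: an existing dir is created if missing, otherwise a temporary dir is set on the experiment and reset in `__exit__`.

```python
import os
import tempfile
from types import SimpleNamespace


class WorkingDir:
    """Hypothetical sketch: give ``experiment`` a working dir for the duration of workon."""

    def __init__(self, experiment):
        self.experiment = experiment
        self._tmpdir = None

    def __enter__(self):
        if self.experiment.working_dir:
            # The experiment defines its own dir: only make sure it exists.
            os.makedirs(self.experiment.working_dir, exist_ok=True)
        else:
            # No dir defined: create a temporary one and set it on the experiment.
            self._tmpdir = tempfile.TemporaryDirectory(prefix="orion-")
            self.experiment.working_dir = self._tmpdir.name
        return self.experiment.working_dir

    def __exit__(self, exc_type, exc_value, traceback):
        if self._tmpdir is not None:
            # Undo the temporary assignment, then delete the temporary dir.
            self.experiment.working_dir = None
            self._tmpdir.cleanup()
            self._tmpdir = None
        return False  # Do not swallow exceptions.


if __name__ == "__main__":
    experiment = SimpleNamespace(working_dir=None)
    with WorkingDir(experiment) as path:
        print(os.path.isdir(path))
    print(experiment.working_dir)
```

Because the temporary assignment is undone in `__exit__`, the experiment never keeps a pointer to a directory that no longer exists.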

Infer trial working dir based on exp.working_dir

Why:

The trial working dir should be unique to the trial and depend on the
experiment's working dir. We can use the id of the trial (or variants
ignoring fidelity, experiment id or lies) to define a unique working
dir.

How:

Instead of setting the working dir directly, we set the experiment's
working dir.
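A minimal sketch of deriving a unique trial working dir from the experiment's working dir, assuming the trial id is a hash of its hyperparameters that can optionally ignore the fidelity. The helper names `trial_id` and `trial_working_dir` and the MD5-over-JSON hashing are hypothetical choices for illustration.

```python
import hashlib
import json
import os


def trial_id(params, fidelity_name=None):
    """Hypothetical id: hash of the hyperparameters, optionally ignoring the fidelity."""
    kept = {key: value for key, value in params.items() if key != fidelity_name}
    blob = json.dumps(kept, sort_keys=True).encode("utf-8")
    return hashlib.md5(blob).hexdigest()


def trial_working_dir(exp_working_dir, params, fidelity_name=None):
    """Only the experiment's dir is set; the trial dir is inferred inside it."""
    return os.path.join(exp_working_dir, trial_id(params, fidelity_name))
```

Ignoring the fidelity means that the same configuration trained for more epochs maps to the same directory, so it can resume from the files saved at a lower fidelity.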

Add parent attribute to Trial

Why:

Now that we can branch a trial, we need to keep track of the trial's
ancestry. Also, when branching a trial but keeping the same
hyperparameters, it should not lead to the same ID since it is now a
separate trial that will be executed independently. The hash of a trial
will thus also depend on the value of the parent.

How:

For simplicity, the parent attribute only references the ID of the
parent trial (full id), just like it is done for the reference to the
experiment. We should use a lazy tree node implementation, like the
EVC experiment tree node, to reference the parent trial object instead
of its id. This should be done in the future TrialClient class.
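A minimal sketch of why the parent must enter the hash, assuming a simplified `Trial` whose id is an MD5 of its params, experiment and parent id. The fields, the hashing scheme and the `branch` method are hypothetical stand-ins, not Oríon's Trial class.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Trial:
    params: Dict[str, Any]
    experiment: str
    parent: Optional[str] = None  # Full id of the parent trial, like `experiment`.

    @property
    def id(self) -> str:
        # The parent is part of the hash, so a branch with identical
        # hyperparameters still gets a distinct id.
        payload = {"params": self.params, "experiment": self.experiment, "parent": self.parent}
        return hashlib.md5(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()

    def branch(self, params=None) -> "Trial":
        """Create a child trial, keeping the same params unless new ones are given."""
        new_params = dict(params if params is not None else self.params)
        return Trial(params=new_params, experiment=self.experiment, parent=self.id)
```

Without `parent` in the payload, `trial.branch()` would collide with its parent's id even though it is a separate trial executed independently.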

Move tree.py to utils

Why:

The tree node will be used for PBT and probably for trial objects as
well in the future so it must be generalized.

Add Tree.node_depth and Tree.get_nodes_at_depth

We need these methods for fast retrievals in the tree in PBT algorithm.
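A minimal sketch of the two methods on a simplified tree node. The `TreeNode` class here is hypothetical; as assumptions of this sketch, `node_depth` counts edges up to the root and `get_nodes_at_depth` returns the descendants a given number of levels below the node it is called on, walking one level at a time.

```python
from typing import List, Optional


class TreeNode:
    def __init__(self, item, parent: Optional["TreeNode"] = None):
        self.item = item
        self.parent = parent
        self.children: List["TreeNode"] = []
        if parent is not None:
            parent.children.append(self)

    @property
    def node_depth(self) -> int:
        """Number of edges between this node and the root."""
        depth, node = 0, self
        while node.parent is not None:
            depth += 1
            node = node.parent
        return depth

    def get_nodes_at_depth(self, depth: int) -> List["TreeNode"]:
        """Descendants exactly ``depth`` levels below this node, in breadth-first order."""
        level = [self]
        for _ in range(depth):
            level = [child for node in level for child in node.children]
        return level
```

In a PBT population, calling `get_nodes_at_depth(k)` on a root would, for instance, collect all trials of generation `k` of that tree.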

src/orion/algo/pbt/pbt.py

donglinjy · 2021-12-25T15:01:45Z

src/orion/algo/pbt/exploit.py

+logger = logging.getLogger(__name__)
+
+
+class BaseExploit:


An overall comment about Exploit: for deep learning training trials, how do we expect users to reuse the parent model's weights?

The weights should be saved in a file (any name, it does not matter) in trial.working_dir. PBT takes care of copying the files over to child trials.

This is explained here, but maybe it should be explained in the docstring of BaseExploit too?

I guess so. Users may need to know where they should save and load the weights in their code, since we only do the copy, right? The actual saving and loading still depends on user code.

Indeed, saving and loading depend on user code; otherwise we would need to write specific versions for PyTorch, TensorFlow and other frameworks. Do you think I should improve the documentation of PBT to make this clearer?

Yeah, maybe mention that with PBT in Orion, the user code should save and load weights in the trial's working dir, so that an under-performing model can be replaced with a better-performing one.

I added this to documentation: 6c9b5ff

Do you think there is something missing?

Looks good to me. Just one more thing: is there anywhere that explains how user code refers to trial.working_dir, as an environment variable or something?

It is documented here: https://orion.readthedocs.io/en/stable/user/script.html#command-line-templating. I'll reference it in PBT's doc.
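A minimal sketch of what user code could look like, assuming the script receives the trial's working dir as an argument (for example through command-line templating) and stores its "weights" in a JSON file. The file name `checkpoint.json`, the state layout and the toy update rule are hypothetical stand-ins for framework-specific saving and loading.

```python
import json
import os
import tempfile

CHECKPOINT = "checkpoint.json"  # Any file name works; PBT copies the files over.


def load_state(working_dir):
    path = os.path.join(working_dir, CHECKPOINT)
    if os.path.exists(path):
        # Resume, or start from the weights copied from an exploited parent trial.
        with open(path) as f:
            return json.load(f)
    return {"epoch": 0, "weights": [0.0, 0.0]}


def save_state(working_dir, state):
    os.makedirs(working_dir, exist_ok=True)
    with open(os.path.join(working_dir, CHECKPOINT), "w") as f:
        json.dump(state, f)


def train(working_dir, epochs, lr):
    state = load_state(working_dir)
    for epoch in range(state["epoch"], epochs):
        state["weights"] = [w + lr for w in state["weights"]]  # Stand-in for a training step.
        state["epoch"] = epoch + 1
    save_state(working_dir, state)
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(train(tmp, epochs=2, lr=0.5))
```

The key point is that the script always loads from and saves to the trial's working dir; when PBT replaces an under-performing trial, the copied checkpoint is picked up by `load_state` without any framework-specific support from Oríon.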


donglinjy

lgtm, thanks

bouthilx changed the title from "WIP: PBT implementation" to "PBT implementation" Dec 15, 2021

bouthilx marked this pull request as ready for review December 15, 2021 13:59

bouthilx requested a review from donglinjy December 15, 2021 13:59

bouthilx force-pushed the feature/pbt branch from ab54b42 to 70f5c55 December 15, 2021 21:10

donglinjy reviewed Dec 25, 2021

src/orion/algo/pbt/pbt.py (outdated, resolved)

donglinjy reviewed Dec 25, 2021

src/orion/algo/pbt/pbt.py (outdated, resolved)

donglinjy reviewed Dec 25, 2021

src/orion/algo/pbt/pbt.py (resolved)

donglinjy reviewed Dec 25, 2021

src/orion/algo/pbt/pbt.py (outdated, resolved)

donglinjy reviewed Dec 26, 2021

src/orion/algo/pbt/pbt.py (resolved)

bouthilx mentioned this pull request Jan 5, 2022

Handle broken trials in Hyperband, ASHA and EvolutionaryES #757

Open

bouthilx added the feature Introduces a new feature label Jan 12, 2022

bouthilx added 18 commits January 19, 2022 14:01

Move tree.py to utils

5ebcf0a

Why: The tree node will be used for PBT and probably for trial objects as well in the future so it must be generalized.

Add Tree.node_depth and Tree.get_nodes_at_depth

480f653

We need these methods for fast retrievals in the tree in PBT algorithm.

Add Lineage for PBT and tests

5c142b0

Adding tests for Lineages WIP

e12cd52

Add TreeNode.leafs

357a762

Modularize PBT

95c1126

Add tests for Explore module

d0c2870

Add tests for PBT fidelity budgets

12b34ff

Rename PopulationBasedTraining to PBT

84d2de7

Remove old base PBT modules

72b2946

Add logging and some fixes for exploit/explore

fadc818

Add documentation for PBT

6ae63f4

Add generic tests for PBT

2297afa

isort

61e927b

Handle Trial.parents for previous versions of Oríon

2ba4946

bouthilx and others added 8 commits January 19, 2022 14:07

Add PBT rst doc file

013e8fe

Add backward.ensure_trial_working_dir

b21281b

Why: Databases filled by previous versions of Oríon will have trials with no exp_working_dir. We must make sure that these trials have their exp_working_dir set properly when they are suggested, before being passed to the function to optimize.
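A minimal sketch of such a backward-compatibility helper, assuming it receives the experiment and the trial and simply fills the missing attribute from `experiment.working_dir`. The signature is an assumption, and `SimpleNamespace` objects stand in for real experiments and trials.

```python
from types import SimpleNamespace


def ensure_trial_working_dir(experiment, trial):
    """Hypothetical helper: fill exp_working_dir on trials saved by older Oríon versions."""
    if getattr(trial, "exp_working_dir", None) is None:
        trial.exp_working_dir = experiment.working_dir
    return trial


# Usage with stand-in objects:
experiment = SimpleNamespace(working_dir="/tmp/exp")
legacy_trial = SimpleNamespace(id="abc")  # Stored before exp_working_dir existed.
ensure_trial_working_dir(experiment, legacy_trial)
print(legacy_trial.exp_working_dir)  # /tmp/exp
```

Trials that already carry an `exp_working_dir` are left untouched, so the helper is safe to call on every suggested trial.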

Add Trial.__eq__

31fae92

Why: We often need to compare trials and always relying on some specific attributes is cumbersome. We can use trial.id to easily support the __eq__ operator.
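A minimal sketch of equality based on `trial.id`, using a hypothetical, simplified `Trial` class. The sketch also defines `__hash__`, since Python sets `__hash__` to `None` on a class that overrides `__eq__` without it, which would make trials unusable in sets and as dict keys.

```python
class Trial:
    def __init__(self, trial_id, params):
        self.id = trial_id
        self.params = params

    def __eq__(self, other):
        if not isinstance(other, Trial):
            return NotImplemented  # Let Python fall back for non-Trial objects.
        return self.id == other.id

    def __hash__(self):
        # Keep hashing consistent with __eq__ so trials work in sets and dicts.
        return hash(self.id)
```

Two trial objects loaded separately from the database then compare equal as long as they share the same id, regardless of which other attributes were fetched.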

Fix exploit & explore arg docs

68eab2b

Add missing SPACE_ERROR

6eed9dc

Update src/orion/algo/pbt/pbt.py

02a2150

Co-authored-by: Lin Dong <[email protected]>

Clarify PBT model weights saving in doc

e65cb4e

Rename Lineage to LineageNode

95eef3a

The name Lineage was confusing. The class does not represent a full lineage but rather a single node of the lineage.

bouthilx force-pushed the feature/pbt branch from cc6257d to 95eef3a January 19, 2022 19:07

Adapt Lineage -> LineageNode in docs

4917a50

donglinjy approved these changes Jan 21, 2022


Clarify PBT doc on trial.working_dir

4b2a088

bouthilx merged commit 845cfb7 into Epistimio:develop Jan 26, 2022

bouthilx mentioned this pull request Feb 11, 2022

Release v0.2.2rc1 #794

Merged
