Skip to content

[Tune] Synchronous Mode for PBT#10283

Merged
richardliaw merged 13 commits intoray-project:masterfrom
amogkam:sync-pbt
Aug 31, 2020
Merged

[Tune] Synchronous Mode for PBT#10283
richardliaw merged 13 commits intoray-project:masterfrom
amogkam:sync-pbt

Conversation

@amogkam
Copy link
Contributor

@amogkam amogkam commented Aug 24, 2020

Why are these changes needed?

Enable:

If True, will use synchronous implementation
            of PBT. Perturbations will occur only after all trials are
            synced at the same time_attr every perturbation_interval.
            Defaults to False. See Appendix A.1 here

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/latest/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failure rates at https://ray-travis-tracker.herokuapp.com/.
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested (please justify below)

Copy link
Contributor

@krfricke krfricke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question on the exploitation in synchronous mode, other than that it looks fine.

trial.experiment_tag = new_tag
trial_executor.start_trial(
trial, new_state.last_checkpoint, train=False)
trial.on_checkpoint(new_state.last_checkpoint)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the exploiting trial still need to restore the state from the (new state) last checkpoint? I.e. why is there a distinction between synchronous and non-synchronous mode here, and why is on_checkpoint called in sync mode?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ maybe leave a note about this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment. If the trial is paused, we don't want to start it and then pause it again. We can just do on_checkpoint to update the most recent checkpoint, and when it gets started again, it will use the new checkpoint.

@krfricke krfricke added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Aug 25, 2020
@amogkam amogkam removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Aug 28, 2020
@amogkam amogkam added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Aug 28, 2020
@richardliaw richardliaw merged commit afde3db into ray-project:master Aug 31, 2020
@amogkam amogkam deleted the sync-pbt branch October 2, 2020 06:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

tests-ok The tagger certifies test failures are unrelated and assumes personal liability.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants