# Hyperparameter Optimization with Ray Tune and Weights and Biases #76
## Conversation
Overall this looks great, thank you for the effort!
I have a few more general questions about these changes.
> The W&B was initialized inside the `accelerate_base_model.py`; I have moved it to `trlx.py`.

Can you explain why this had to be done?
W&B logging is disabled when `ray.is_initialized`. Ray Tune initializes multiple concurrent trials (experiments), and W&B was erroring out for a few trials, and then a new run was initialized with metrics logged to the errored-out run. It was not an issue with random or grid search, but with Bayesian search only.
It seems to me that hyperband as-is is a bit crippled (or maybe any other trial scheduler as well), since it keeps "pausing" runs, which effectively terminates them, and then later restarts those runs from scratch. Have you looked into some way of customizing this process so runs resume from checkpoints? Also, can something similar be done for the wandb state?
Can you also help me with how to select the parameter space for sweeps in W&B, since it doesn't pick up the correct one? I follow your steps from the videos, but I still have to manually change the sweep configuration. Did you do something special between your first video and the second one?
Thank you for the review.
Hey @reciprocated, to answer this quickly, I am adding a feature that will generate a W&B report with relevant charts so that you or any user will not have to worry about doing it manually. I am working on it.

Hey @reciprocated, my latest commit adds the ability to create a W&B report automatically. I will share the result once the training is done.
### Hyperparameter optimization for TRLX

**Stack:** Ray Tune + Weights & Biases

**Context:** The hyperparameter optimization system is designed based on the structure of the `examples/` provided. For now I have built the system using `examples/ppo_sentiments.py`.

### How to use it?
#### Setup search space

In Ray Tune's terminology we need to define a `param_space`, which is passed to `tune.Tuner(...)`. To define a `param_space`:

- Copy `configs/*.yml` to `configs/ray_tune_configs/my_search_config.yml`.
- Use `strategy` and `values` to define the space, as shown below.
- `strategy` and `values` are parsed using the `get_param_space` function.
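The original YAML snippet is not reproduced here; as a rough, hedged illustration (the strategy names and the exact shape of `get_param_space` below are assumptions, not necessarily what the PR implements), each tunable parameter carries a `strategy` string and a list of `values` that get translated into Ray Tune sampling primitives:

```python
# Hedged sketch: one plausible way `strategy`/`values` pairs from the YAML
# could be mapped onto Ray Tune's search-space API. The real parsing lives
# in the PR's get_param_space; the strategy names here are assumed.
from ray import tune

def get_param_space(config: dict) -> dict:
    strategies = {
        "uniform": lambda v: tune.uniform(*v),        # values = [low, high]
        "loguniform": lambda v: tune.loguniform(*v),  # values = [low, high]
        "choice": lambda v: tune.choice(v),           # values = list of options
    }
    return {
        name: strategies[spec["strategy"]](spec["values"])
        for name, spec in config.items()
    }

# Example: {"lr": {"strategy": "loguniform", "values": [1e-6, 1e-4]}}
# would become {"lr": tune.loguniform(1e-6, 1e-4)}.
```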
#### Setup tune config
The tune config describes the search algorithm, scheduler, metric and mode. This is defined in `configs/ray_tune_configs/my_search_config.yml` itself.

- Search algorithm: pass `bohb` to enable it. For random search use `random`.
- Scheduler: [HyperBandForBOHB](https://docs.ray.io/en/latest/tune/api_docs/schedulers.html#tune-scheduler-bohb); pass `hyperbandforbohb` to enable it. By default it uses `fifo`.
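For orientation, here is a minimal sketch of how these choices map onto Ray Tune objects, assuming the current Ray Tune API; the metric name, `num_samples`, and the exact wiring inside `train_sweep.py` are assumptions:

```python
# Illustrative only: combining the BOHB search algorithm with the
# HyperBandForBOHB scheduler in Ray Tune. BOHB additionally requires
# the ConfigSpace and hpbandster packages to be installed.
from ray import tune
from ray.tune.search.bohb import TuneBOHB
from ray.tune.schedulers import HyperBandForBOHB

tune_config = tune.TuneConfig(
    search_alg=TuneBOHB(),         # selected when the YAML says "bohb"
    scheduler=HyperBandForBOHB(),  # selected for "hyperbandforbohb"
    metric="mean_reward",          # assumed metric name
    mode="max",
    num_samples=8,
)
```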
#### The training function
The training function takes a `config` argument. It's the `ppo_sentiments.py` example packaged as a function. Multiple training functions can be implemented using the same format.
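In outline, the expected shape is roughly the following (the function name and the reported metric are placeholders; the real body mirrors `examples/ppo_sentiments.py`):

```python
# Skeleton only: a Tune-compatible training function. The body is elided
# because it is essentially examples/ppo_sentiments.py wrapped in a function;
# the function name and metric name here are placeholders.
def ppo_sentiments(config: dict):
    # 1. Merge the hyperparameters sampled by Ray Tune (the `config` dict)
    #    into the default PPO configuration.
    # 2. Run training exactly as in examples/ppo_sentiments.py.
    # 3. Report the optimization metric (e.g. mean reward) back to Ray Tune
    #    so the search algorithm and scheduler can act on it.
    ...
```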
#### The `train_sweep.py` file

The main logic is implemented here.
#### Usage

`python train_sweep.py --config configs/ray_tune_configs/ppo_config.yml --example-name ppo_sentiments`
### Notable changes

- W&B was initialized inside `accelerate_base_model.py`; I have moved it to `trlx.py`.
- W&B logging is disabled when `ray.is_initialized`. Ray Tune initializes multiple concurrent trials (experiments), and W&B was erroring out for a few trials, and then a new run was initialized with metrics logged to the errored-out run. It was not an issue with random or grid search, but with Bayesian search only.

### How is W&B tracking done?
Once all the trials are done, the metrics are logged using the `log_trials` function. Since we are not tracking live, we will not have access to system metrics. But since we are looking to find the best hyperparameters and keep track of the experiments, I log the metrics after the experiments are done. This also makes it flexible for any search algorithm to be used, especially those that are hard to parallelize, like Bayesian optimization.
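As a hedged sketch of the idea only (the real `log_trials` in the PR may look different; the W&B calls and the Ray Tune `Result` fields below are assumptions based on the two libraries' public APIs):

```python
# Sketch: replay finished Ray Tune trials into W&B once the sweep is over.
import wandb

def log_trials(results, project_name: str):
    for result in results:                 # one Ray Tune Result per trial
        run = wandb.init(
            project=project_name,
            config=result.config,          # the trial's hyperparameters
            reinit=True,
        )
        # Replay the per-step metrics recorded during the trial.
        for row in result.metrics_dataframe.to_dict(orient="records"):
            wandb.log(row)
        run.finish()
```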
### Result

The experiments are logged to the wandb `project_name`; example here. We can easily convert it to a W&B sweep using the UI.

*(Screen recording attachment: Screen.Recording.2022-11-02.at.6.15.52.PM.mov)*
#### The most relevant hyperparameters

We can use the parameter importance chart. We can easily find the best parameters from the parallel coordinates plot.

*(Screen recording attachment: Screen.Recording.2022-11-02.at.6.20.20.PM.mov)*
### TODOs: