Hyperparameter optimization #228

vwxyzjn · 2022-07-10T01:11:02Z

Description

This PR adds a first pass of hyperparameter optimization.

The API design roughly looks like

import optuna
from cleanrl_utils.tuner import Tuner
    
tuner = Tuner(
    script="cleanrl/ppo.py",
    metric="charts/episodic_return",
    metric_last_n_average_window=50,
    direction="maximize",
    target_scores={
        "CartPole-v1": [0, 500],
        "Acrobot-v1": [-500, 0],
    },
    params_fn=lambda trial: {
        "learning-rate": trial.suggest_loguniform("learning-rate", 0.0003, 0.003),
        "num-minibatches": trial.suggest_categorical("num-minibatches", [1, 2, 4]),
        "update-epochs": trial.suggest_categorical("update-epochs", [1, 2, 4]),
        "num-steps": trial.suggest_categorical("num-steps", [5, 16, 32, 64, 128]),
        "vf-coef": trial.suggest_uniform("vf-coef", 0, 5),
        "max-grad-norm": trial.suggest_uniform("max-grad-norm", 0, 5),
        "total-timesteps": 10000,
        "num-envs": 16,
    },
    pruner=optuna.pruners.MedianPruner(n_startup_trials=5),
    wandb_kwargs={"project": "cleanrl"},
)
tuner.tune(
    num_trials=10,
    num_seeds=3,
)
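As a rough illustration of how metric_last_n_average_window and num_seeds could combine into one score per trial, here is a minimal sketch. It assumes the window means "average the last n logged values of the metric" and that the per-seed results are then averaged. Both helper names are hypothetical and are not taken from cleanrl_utils/tuner.py.

```python
from statistics import mean


def last_n_average(values, window=50):
    """Average the last `window` logged metric values (fewer if the run is short)."""
    if not values:
        raise ValueError("no metric values were logged")
    return mean(values[-window:])


def trial_score(returns_per_seed, window=50):
    """Average the windowed metric over all seeds of one trial (assumed aggregation)."""
    return mean(last_n_average(returns, window) for returns in returns_per_seed)


# Hypothetical episodic returns from three seeds of a single trial.
runs = [
    [10.0, 20.0, 30.0, 40.0],
    [20.0, 40.0, 60.0, 80.0],
    [0.0, 0.0, 50.0, 50.0],
]
score = trial_score(runs, window=2)
```

Averaging over a trailing window rather than taking the final value makes the score less sensitive to a single lucky or unlucky episode.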

Preliminary docs are available at https://cleanrl-jlu83xh5n-vwxyzjn.vercel.app/advanced/hyperparameter-tuning/

Types of changes

Bug fix
New feature
New algorithm
Documentation

Checklist:

I've read the CONTRIBUTION guide (required).
I have ensured pre-commit run --all-files passes (required).
I have updated the documentation and previewed the changes via mkdocs serve.
I have updated the tests accordingly (if applicable).

vercel · 2022-07-10T01:11:05Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Updated
cleanrl	✅ Ready (Inspect)	Visit Preview	Aug 26, 2022 at 1:54AM (UTC)

vwxyzjn · 2022-07-18T22:28:45Z

@dosssman, @yooceii, @Dipamc77 would you mind giving this a try? See https://cleanrl-jlu83xh5n-vwxyzjn.vercel.app/advanced/hyperparameter-tuning/ for the current tutorial. Would love to hear your feedback.

dosssman · 2022-07-19T00:43:49Z

@vwxyzjn Thanks for the great addition.

I tried to follow the instructions to get it to work, but there were a few snags along the way:

The poetry rule to install optuna does not seem to be present. Also, in the docs, shouldn't it be something like poetry install -E optuna instead of the current poetry install optuna? For now, I just installed it using pip install optuna to at least test the scripts.
Running pip install optuna did not seem to be enough. I also had to run pip install rich to get tuner_example.py to at least start.
A bit tangential to this hyperparameter search feature, but in the CleanRL starting documentation, it is stated that the library requires either Python 3.8 or 3.9. (See the corresponding single comment.)

I have yet to test tuner scripts other than tuner_example.py, but it looks good so far.

vwxyzjn · 2022-07-19T01:12:09Z

Thanks, I will definitely add the dependencies in poetry. I usually do this as the last step because of potential poetry conflicts with other branches. Glad to hear tuner_example.py is working fine.

dipamc · 2022-07-19T17:56:54Z

I haven't been able to run the code yet, but I read through it. Here are some thoughts:

The minimum and maximum reward being required beforehand is a bit of a limitation. It's also not clear to me that normalization is needed when running on a single environment. If it isn't, maybe it can be made optional, or at least stated in the docs, so that anyone trying a new env doesn't need to provide them.
Should link to the documentation of what other samplers are available under trial that is passed to sampler_fn.

docs/advanced/hyperparameter-tuning.md

yooceii · 2022-07-24T23:41:28Z

I faced the same issue that @dosssman had when installing optuna, but other than that, the example works.
I wonder if it's better to run a hyperparameter sweep that works across multiple envs or one per individual env, since different envs might have different optimal parameter settings.

vwxyzjn · 2022-08-23T03:02:23Z

Thanks, @Dipamc77 @dosssman @yooceii @kinalmehta, for the review.

The minimum and maximum reward being required beforehand is a bit of a limitation.

Addressed. Users can now pass target_scores={"CartPole-v1": None} to skip providing the minimum and maximum.
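To make the normalization being discussed concrete, here is a minimal sketch of min-max scaling against target_scores, where None leaves the raw score untouched. The function name, the exact formula, and the "MyNewEnv-v0" entry are assumptions for illustration; the actual logic in cleanrl_utils/tuner.py may differ.

```python
def normalize_score(env_id, raw_score, target_scores):
    """Scale a raw score to roughly [0, 1] using the env's [min, max] bounds.

    If the bounds are None, return the raw score unchanged (assumed behavior).
    """
    bounds = target_scores[env_id]
    if bounds is None:
        return raw_score
    low, high = bounds
    return (raw_score - low) / (high - low)


target_scores = {
    "CartPole-v1": [0, 500],
    "Acrobot-v1": [-500, 0],
    "MyNewEnv-v0": None,  # hypothetical env with unknown score range
}

cartpole = normalize_score("CartPole-v1", 250, target_scores)
acrobot = normalize_score("Acrobot-v1", -100, target_scores)
unknown = normalize_score("MyNewEnv-v0", 42.0, target_scores)
```

Scaling matters mainly when scores from several environments are averaged into one objective, since otherwise the environment with the largest raw range would dominate.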

Should link to the documentation of what other samplers are available under trial that is passed to sampler_fn

Done, and I added another API to pass a user-specified sampler.

I wonder if it's better to run a hyperparameter sweep that works across multiple envs or one per individual env, since different envs might have different optimal parameter settings.

If users want to do that, they could create one instance of the tuner per environment, like this:

from cleanrl_utils.tuner import Tuner
tuner = Tuner(
    script="cleanrl/ppo.py",
    metric="charts/episodic_return",
    metric_last_n_average_window=50,
    direction="maximize",
    target_scores={
        "CartPole-v1": None,
    },
    params_fn=lambda trial: {
        "learning-rate": trial.suggest_loguniform("learning-rate", 0.0003, 0.003),
        "num-minibatches": trial.suggest_categorical("num-minibatches", [1, 2, 4]),
        "update-epochs": trial.suggest_categorical("update-epochs", [1, 2, 4]),
        "num-steps": trial.suggest_categorical("num-steps", [5, 16, 32, 64, 128]),
        "vf-coef": trial.suggest_uniform("vf-coef", 0, 5),
        "max-grad-norm": trial.suggest_uniform("max-grad-norm", 0, 5),
        "total-timesteps": 10000,
        "num-envs": 4,
    },
)
tuner.tune(
    num_trials=100,
    num_seeds=3,
)
tuner = Tuner(
    script="cleanrl/ppo.py",
    metric="charts/episodic_return",
    metric_last_n_average_window=50,
    direction="maximize",
    target_scores={
        "Acrobot-v1": None,
    },
    params_fn=lambda trial: {
        "learning-rate": trial.suggest_loguniform("learning-rate", 0.0003, 0.003),
        "num-minibatches": trial.suggest_categorical("num-minibatches", [1, 2, 4]),
        "update-epochs": trial.suggest_categorical("update-epochs", [1, 2, 4]),
        "num-steps": trial.suggest_categorical("num-steps", [5, 16, 32, 64, 128]),
        "vf-coef": trial.suggest_uniform("vf-coef", 0, 5),
        "max-grad-norm": trial.suggest_uniform("max-grad-norm", 0, 5),
        "total-timesteps": 10000,
        "num-envs": 4,
    },
)
tuner.tune(
    num_trials=100,
    num_seeds=3,
)
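The two near-identical blocks above could also be written as a loop over environment IDs. The following is only a sketch under the assumption that the Tuner arguments shown in this thread are accepted as keyword arguments; ppo_params, tuner_kwargs, and tune_each_env are hypothetical helper names. The Tuner import is kept inside the function so the helpers can be inspected without CleanRL installed.

```python
def ppo_params(trial):
    """Search space copied from the example above; `trial` is an optuna trial."""
    return {
        "learning-rate": trial.suggest_loguniform("learning-rate", 0.0003, 0.003),
        "num-minibatches": trial.suggest_categorical("num-minibatches", [1, 2, 4]),
        "update-epochs": trial.suggest_categorical("update-epochs", [1, 2, 4]),
        "num-steps": trial.suggest_categorical("num-steps", [5, 16, 32, 64, 128]),
        "vf-coef": trial.suggest_uniform("vf-coef", 0, 5),
        "max-grad-norm": trial.suggest_uniform("max-grad-norm", 0, 5),
        "total-timesteps": 10000,
        "num-envs": 4,
    }


def tuner_kwargs(env_id):
    """Build the Tuner arguments for a single environment."""
    return {
        "script": "cleanrl/ppo.py",
        "metric": "charts/episodic_return",
        "metric_last_n_average_window": 50,
        "direction": "maximize",
        "target_scores": {env_id: None},
        "params_fn": ppo_params,
    }


def tune_each_env(env_ids, num_trials=100, num_seeds=3):
    """Run one independent study per environment (needs CleanRL and optuna)."""
    from cleanrl_utils.tuner import Tuner

    for env_id in env_ids:
        tuner = Tuner(**tuner_kwargs(env_id))
        tuner.tune(num_trials=num_trials, num_seeds=num_seeds)


# Usage: tune_each_env(["CartPole-v1", "Acrobot-v1"])
```

Each environment gets its own study, so the best parameters are reported per environment rather than as one compromise setting.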

cleanrl_utils/tuner.py

dosssman

Looking good to me.
Tested the tuner_example and it is working out of the box.
I had to abort it quite early though.
Great work.

docs/advanced/hyperparameter-tuning.md

cleanrl_utils/tuner.py

Comments addressed

vwxyzjn · 2022-08-26T02:05:28Z

Refactored the documentation a bit to help users better get started. Merging now.

braham-snyder · 2022-08-27T12:59:44Z

Any chance anyone wants to explain the biggest advantage of Optuna over wandb's hyperparameter optimization? The latter's practically already built-in.

(Btw thanks for a great library!)

vwxyzjn · 2022-08-27T13:09:40Z

Hi @braham-snyder, wandb's hyperparameter optimization is great. I have used it before and it's easy to use.

Feature-wise, Optuna supports more functionality, e.g., pruning less promising experiments or multi-objective optimization.
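To illustrate what pruning less promising experiments means, here is a simplified, standard-library-only sketch of a median rule: a trial is stopped when its intermediate value at a given step is worse than the median of earlier trials at that step. This is not Optuna's implementation; the function name and the data layout are assumptions, and optuna.pruners.MedianPruner handles more cases.

```python
from statistics import median


def should_prune(value, step, finished_trials, n_startup_trials=5, direction="maximize"):
    """Return True if `value` at `step` is worse than the median of earlier trials.

    `finished_trials` is a list of trials, each a list of intermediate values
    indexed by step.
    """
    if len(finished_trials) < n_startup_trials:
        return False  # not enough history yet, so never prune
    peers = [values[step] for values in finished_trials if len(values) > step]
    if not peers:
        return False  # no earlier trial reached this step
    cutoff = median(peers)
    return value < cutoff if direction == "maximize" else value > cutoff


# Hypothetical intermediate returns from five earlier trials.
history = [[10, 20], [12, 22], [8, 18], [11, 21], [9, 19]]
prune_weak = should_prune(5, step=0, finished_trials=history)
prune_strong = should_prune(15, step=0, finished_trials=history)
prune_early = should_prune(5, step=0, finished_trials=history[:3])
```

Here the weak trial falls below the median of 10 at step 0 and would be stopped, while the same value is kept when only three trials have finished, because the startup threshold has not been reached.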

braham-snyder · 2022-08-27T13:26:54Z

Thanks!

Hyperparameter optimization (6b01ee0)

vwxyzjn added 2 commits July 9, 2022 21:57
add gitignore (72ac02b)
pre-commit (161b165)

quick refactor (9a0355b)

vwxyzjn mentioned this pull request Jul 15, 2022: Average PPO implementation #212 (Closed, 19 tasks)

Add docs (c9cc5f1)

vwxyzjn requested review from dosssman and yooceii July 18, 2022 22:27

yooceii reviewed Jul 24, 2022 (docs/advanced/hyperparameter-tuning.md)
yooceii approved these changes Jul 24, 2022
yooceii previously requested changes Jul 24, 2022 (docs/advanced/hyperparameter-tuning.md)

vwxyzjn mentioned this pull request Aug 6, 2022: DQN on MountainCar #255 (Closed, 2 tasks)

vwxyzjn added 2 commits August 22, 2022 18:00
Merge branch 'master' into hpopt (22edce4)
pre-commit (3ac8795)

clarify docs (444cf67)
update docs (86e8ef1)

vwxyzjn added 3 commits August 22, 2022 22:39
push changes (c1ef1a0)
push changes (60c5651)
typo (94f830b)

kinalmehta reviewed Aug 23, 2022 (cleanrl_utils/tuner.py)

Quick fix (0fda282)
clarification (773ad72)

dosssman approved these changes Aug 25, 2022 (docs/advanced/hyperparameter-tuning.md, cleanrl_utils/tuner.py)

Update docs on python version (ff25c60)
add test cases (d6debc2)
add tests (fc45151)
update config (8d43bda)
update test cases (14b7bf5)
Merge branch 'master' into hpopt (fe3f087)
Refactor docs (bbe27fa)

This was referenced Aug 26, 2022:
Multi-objective hyperparameter optimization #265 (Open)
Roadmap for CleanRL #115 (Closed)

vwxyzjn merged commit 25dc24e into master Aug 26, 2022
