
multi-objective optimization (e.g. qNEHVI) vs. scalarized objectives. Which to choose? #1210

Closed
sgbaird opened this issue Oct 15, 2022 · 6 comments

@sgbaird
Contributor

sgbaird commented Oct 15, 2022

I put together a tutorial illustrating the use of Ax's multi-objective optimization functionality and comparing this against scalarization approaches. When using a scalarized quantity to compare performance, it makes sense that the scalarized objectives do better than MOO. However, when looking at Pareto fronts and comparing them against a naive scalarization approach (sum the two objectives), I was surprised to see that, in general, the naive scalarization Pareto fronts seem better. This was on a straightforward, 3-parameter task with a single local maximum AFAIK. The task is meant as a teaching demo (see e.g. notebook tutorials). In particular, the notebook is 6.1-multi-objective.ipynb, linked above.

I noticed that I regularly got the following warning during MOO:

c:\Users\<USERNAME>\Miniconda3\envs\sdl-demo\lib\site-packages\ax\modelbridge\transforms\winsorize.py:240: UserWarning:

Automatic winsorization isn't supported for an objective in `MultiObjective` without objective thresholds. Specify the winsorization settings manually if you want to winsorize metric frechet.

c:\Users\<USERNAME>\Miniconda3\envs\sdl-demo\lib\site-packages\ax\modelbridge\transforms\winsorize.py:240: UserWarning:

Automatic winsorization isn't supported for an objective in `MultiObjective` without objective thresholds. Specify the winsorization settings manually if you want to winsorize metric luminous_intensity.

Out of the sklearn preprocessing scalers, winsorization seems most comparable to RobustScaler (interestingly, it was the third hit when searching for "winsorization sklearn"). There's also a winsorization function in SciPy (scipy.stats.mstats.winsorize). This is my attempt to frame it in terms of things I'm somewhat familiar with.
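To make the analogy concrete, here is a minimal sketch (an editor's illustration, not from the notebook) contrasting SciPy's winsorize with sklearn's RobustScaler on the same one-dimensional data: winsorization clips the tails, while RobustScaler keeps the outliers but centers and scales by the median and IQR.

    import numpy as np
    from scipy.stats.mstats import winsorize
    from sklearn.preprocessing import RobustScaler

    rng = np.random.default_rng(0)
    y = np.concatenate([rng.normal(0, 1, 100), [50.0, -40.0]])  # bulk data plus two outliers

    # Winsorization: clip the extreme 5% in each tail to the 5th/95th percentile values.
    y_winsorized = winsorize(y, limits=(0.05, 0.05))

    # RobustScaler: outliers are kept, but centering/scaling uses the median and IQR,
    # so the extremes barely influence how the bulk of the data is transformed.
    y_robust = RobustScaler().fit_transform(y.reshape(-1, 1)).ravel()

    print(y.max(), y_winsorized.max(), y_robust.max())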

  • Maybe I chose a poorly suited task to use for making this comparison.
  • Does anything seem amiss in the implementation?
  • Is part of the issue perhaps that I'm not specifying thresholds? (See the threshold sketch after this list.)
  • Open to any thoughts/feedback
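As a rough sketch of the thresholds question, objective thresholds could be specified up front with Ax's Service API, which should also address the automatic-winsorization warning above. The parameter ranges and threshold values below are placeholders, not the notebook's actual settings:

    from ax.service.ax_client import AxClient
    from ax.service.utils.instantiation import ObjectiveProperties

    ax_client = AxClient()
    ax_client.create_experiment(
        name="sdl_demo_moo",  # hypothetical name
        parameters=[
            {"name": f"x{i}", "type": "range", "bounds": [0.0, 1.0]} for i in range(3)
        ],
        objectives={
            # explicit minimize + threshold for each objective
            "frechet": ObjectiveProperties(minimize=True, threshold=10.0),
            "luminous_intensity": ObjectiveProperties(minimize=True, threshold=100.0),
        },
    )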

@sdaulton
Contributor

Thanks for documenting this. It looks like you have a cool use case! I took a look at your notebook. Just to confirm my understanding:

For multi-objective optimization (qNEHVI):

  • you have 8 objectives (delta_*)
  • frechet and luminosity are tracking metrics (not objectives that are optimized)

For optimizing a scalarized objective:

  • you are optimizing a scalarized objective of frechet+luminosity

If this is the case, it is not surprising that optimizing a scalarized objective of frechet+luminosity works well since you are looking at the Pareto frontier for those metrics (separately) and those metrics are not targeted by qNEHVI (based on the configured experiment). Furthermore, optimizing (learning the pareto frontier across) 8 objectives simultaneously is difficult. Why not optimize frechet and luminosity with qNEHVI rather than (delta_*) if you care about frechet and luminosity? (Note: I don't know what these metrics are)

A couple other notes:

  • is 5000 a good choice of threshold for the delta_* metrics?
  • Are your simulations noisy? If not, get_observed_pareto_frontiers would be an easy way of evaluating the Pareto frontier across the evaluated (in-sample) designs (a sketch follows below).
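A minimal sketch of that suggestion, assuming an already-run Ax `experiment` with noiseless evaluations:

    from ax.plot.pareto_frontier import plot_pareto_frontier
    from ax.plot.pareto_utils import get_observed_pareto_frontiers
    from ax.utils.notebook.plotting import render

    # Pareto frontiers over the in-sample (observed) designs; returns a list,
    # one frontier per pair of objectives configured on the experiment.
    frontiers = get_observed_pareto_frontiers(experiment=experiment)
    render(plot_pareto_frontier(frontiers[0]))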

@sgbaird
Contributor Author

sgbaird commented Oct 16, 2022

@sdaulton thanks for your response!

For multi-objective optimization (qNEHVI):

  • you have 8 objectives (delta_*)
  • frechet and luminosity are tracking metrics (not objectives that are optimized)

I had two sets of simulations in the order that I was exploring things, which probably made it confusing. In the first set of simulations, I had 8 objectives (delta_*), but in the second set of simulations, I defined Frechet distance of the currently observed discrete spectrum relative to the target spectrum as the first objective, and an approximate luminosity (i.e. the radiated power of the LEDs) as the second objective. I compared MOO with frechet and luminosity (set to minimize and no thresholds) to single objective optimization with scalarized_objective = frechet + luminosity.
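For reference, a hedged sketch of the scalarization described above, where the evaluation function returns a single summed metric so that Ax sees one objective (the helper functions and metric names here are placeholders, not the notebook's actual identifiers):

    def evaluate_scalarized(parameters: dict) -> dict:
        frechet = compute_frechet(parameters)                 # hypothetical helper
        luminous_intensity = compute_luminosity(parameters)   # hypothetical helper
        # Naive scalarization: unweighted sum of the two objectives (both minimized).
        return {"scalarized_objective": frechet + luminous_intensity}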

If this is the case, it is not surprising that optimizing a scalarized objective of frechet+luminosity works well since you are looking at the Pareto frontier for those metrics (separately) and those metrics are not targeted by qNEHVI (based on the configured experiment). Furthermore, optimizing (learning the pareto frontier across) 8 objectives simultaneously is difficult. Why not optimize frechet and luminosity with qNEHVI rather than (delta_*) if you care about frechet and luminosity? (Note: I don't know what these metrics are)

AFAIK, qNEHVI was operating directly on frechet and luminosity (two-objective optimization). I was surprised to see that the Pareto fronts seemed better with the scalarized objective than for qNEHVI with frechet and luminosity.

A couple other notes:

  • is 5000 a good choice of threshold for the delta_* metrics?

Basically it's something "low", where the max might be 50k or so.

  • Are your simulations noisy? If not, get_observed_pareto_frontiers would an easy way of evaluating the pareto frontier across the evaluated (in-sample) designs

The simulations aren't noisy. Thank you! I was wondering about that. I'll plan on running it again with get_observed_pareto_frontiers.

@sgbaird
Contributor Author

sgbaird commented Oct 16, 2022

@sdaulton I'm noticing that get_observed_pareto_frontiers takes different arguments than compute_posterior_pareto_frontier. In particular, the following are not present in the former:

    primary_objective=objectives[0].metric,
    secondary_objective=objectives[1].metric,
    absolute_metrics=[objectives[0].metric_names[0], objectives[1].metric_names[0]],

Should I refactor my hacky scalarized objective (where I sum the two objectives in the evaluate function) and use a proper ax.core.objective.ScalarizedObjective instead? #883

Right now, the scalarized kwargs to compute_posterior_pareto_frontier are:

    primary_objective=experiment.tracking_metrics[0],
    secondary_objective=experiment.tracking_metrics[1],
    absolute_metrics=[
        experiment.tracking_metrics[0].name,
        experiment.tracking_metrics[1].name,
        scalar_name,
    ],

Or is there some other workaround you'd suggest for comparing the two Pareto frontiers?
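For context, a hedged sketch of how those kwargs might assemble into a full call (the notebook's exact invocation may differ; `experiment` and `scalar_name` are assumed to exist as in the notebook, and `num_points` is a placeholder):

    from ax.plot.pareto_utils import compute_posterior_pareto_frontier

    frontier = compute_posterior_pareto_frontier(
        experiment=experiment,
        data=experiment.fetch_data(),
        primary_objective=experiment.tracking_metrics[0],
        secondary_objective=experiment.tracking_metrics[1],
        absolute_metrics=[
            experiment.tracking_metrics[0].name,
            experiment.tracking_metrics[1].name,
            scalar_name,
        ],
        num_points=20,
    )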

@sdaulton
Contributor

Ah, thanks for clarifying your setup. For the second MOO experiment on frechet and luminosity, what are the inferred objective thresholds? Also, it looks like those plots are gone from your notebook.

Should I refactor my hacky scalarized objective (where I sum the two objectives in the evaluate function) and use a proper ax.core.objective.ScalarizedObjective instead?

ScalarizedObjective will model the outcomes independently, whereas if you scalarize the metrics yourself and provide a single scalar metric to Ax, only the scalarized metric will be modeled. If the objectives are quite correlated, then modeling the scalarized metric will likely give better results.
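A minimal sketch of the ScalarizedObjective route (metric names and weights are placeholders): each metric is modeled separately, and the weighted sum is what gets optimized.

    from ax.core.metric import Metric
    from ax.core.objective import ScalarizedObjective
    from ax.core.optimization_config import OptimizationConfig

    scalarized = ScalarizedObjective(
        metrics=[Metric(name="frechet"), Metric(name="luminous_intensity")],
        weights=[1.0, 1.0],   # naive equal weighting, as in the summed-metric approach
        minimize=True,
    )
    optimization_config = OptimizationConfig(objective=scalarized)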

For plotting the observed metrics (including tracking metrics) for the evaluated designs (as in get_observed_pareto_frontiers), it might be easier to follow this example: https://ax.dev/tutorials/multiobjective_optimization.html#Plot-empirical-data.

This style of plot is also nice because it shows the observations collected over time, which might provide more insight into the behavior of the method during data collection.
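A rough sketch in the spirit of that tutorial section (the tutorial's exact code may differ): pull the observed values into a DataFrame and scatter the two metrics, colored by trial index to show how the collected observations evolve over time. `experiment` and the metric names are assumed from the notebook.

    import matplotlib.pyplot as plt

    df = experiment.fetch_data().df  # long format: one row per (arm, metric)
    wide = df.pivot(index="arm_name", columns="metric_name", values="mean")
    trial_index = df.groupby("arm_name")["trial_index"].first().reindex(wide.index)

    sc = plt.scatter(wide["frechet"], wide["luminous_intensity"], c=trial_index, cmap="viridis")
    plt.colorbar(sc, label="trial index")
    plt.xlabel("frechet")
    plt.ylabel("luminous_intensity")
    plt.show()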

@lena-kashtelyan
Contributor

@sgbaird, did you get full answers to your questions, or are there unresolved follow-ups?

@sgbaird
Contributor Author

sgbaird commented Oct 31, 2022

@sdaulton my bad, I thought I had responded to this already. I will need to go back and check what the inferred thresholds were. Thanks for the detailed response!

I plan to follow the example you linked and post the updated results here.

@lena-kashtelyan I think it's resolved to a good enough point. Will close for now! Thanks for checking in.
