Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make performance test reproducible #837

Merged
merged 7 commits into from
Dec 6, 2022
Merged

make performance test reproducible #837

merged 7 commits into from
Dec 6, 2022

Conversation

sonichi
Copy link
Contributor

@sonichi sonichi commented Dec 4, 2022

Why are these changes needed?

Make the performance test in test_notebook_example.py reproducible.
Improve the retrain logic as in #829.

Related issue number

Closes #829, #794, #777

Checks

@@ -1459,6 +1463,8 @@ def config2params(self, config: dict) -> dict:
)
if self._task not in CLASSIFICATION and "criterion" in config:
params.pop("criterion")
if "random_state" not in params:
params["random_state"] = 12032022
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems odd to fix the random state here, might be surprising for users of the class. Maybe we can consider setting it for the test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The random state for other estimators is fixed. For random forest and extra trees it's not. I consider this a hidden bug that prevents reproducibility. I agree it could be surprising for users. I'll add it in documentation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, makes sense 👍

@sonichi sonichi requested a review from skzhang1 December 5, 2022 00:37
@sonichi sonichi linked an issue Dec 5, 2022 that may be closed by this pull request
flaml/automl.py Outdated
time_left = self._state.time_budget - self._state.time_from_start
time_budget_s = (
self._state.time_budget - self._state.time_from_start
if self._state.time_budget < 1e10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain in the documentation somewhere about this large constant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed it.

@@ -648,6 +652,7 @@ def custom_metric(
datasets, but will incur more overhead in time.
If dict: the dict contains the keywords arguments to be passed to
[ray.tune.run](https://docs.ray.io/en/latest/tune/api_docs/execution.html).
free_mem_ratio: float between 0 and 1, default=0. The free memory ratio to keep during training.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Provide a guideline (could be done in another PR) on when one should consider setting this ratio.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue #841 created.

flaml/automl.py Outdated
):
self.init_eci = learner_class.cost_relative2lgbm()
self.init_eci = learner_class.cost_relative2lgbm() if budget < 1e10 else 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define 1e10 as a constant variable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants