-
Notifications
You must be signed in to change notification settings - Fork 508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make performance test reproducible #837
Conversation
@@ -1459,6 +1463,8 @@ def config2params(self, config: dict) -> dict: | |||
) | |||
if self._task not in CLASSIFICATION and "criterion" in config: | |||
params.pop("criterion") | |||
if "random_state" not in params: | |||
params["random_state"] = 12032022 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems odd to fix the random state here, might be surprising for users of the class. Maybe we can consider setting it for the test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The random state for other estimators is fixed. For random forest and extra trees it's not. I consider this a hidden bug that prevents reproducibility. I agree it could be surprising for users. I'll add it in documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, makes sense 👍
flaml/automl.py
Outdated
time_left = self._state.time_budget - self._state.time_from_start | ||
time_budget_s = ( | ||
self._state.time_budget - self._state.time_from_start | ||
if self._state.time_budget < 1e10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explain in the documentation somewhere about this large constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed it.
@@ -648,6 +652,7 @@ def custom_metric( | |||
datasets, but will incur more overhead in time. | |||
If dict: the dict contains the keywords arguments to be passed to | |||
[ray.tune.run](https://docs.ray.io/en/latest/tune/api_docs/execution.html). | |||
free_mem_ratio: float between 0 and 1, default=0. The free memory ratio to keep during training. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provide a guideline (could be done in another PR) on when one should consider setting this ratio.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue #841 created.
flaml/automl.py
Outdated
): | ||
self.init_eci = learner_class.cost_relative2lgbm() | ||
self.init_eci = learner_class.cost_relative2lgbm() if budget < 1e10 else 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Define 1e10 as a constant variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed it.
Related work items: microsoft#493, microsoft#777, microsoft#820, microsoft#837, microsoft#843, microsoft#848, microsoft#849, microsoft#850, microsoft#853, microsoft#855, microsoft#857, microsoft#869, microsoft#870, microsoft#888, microsoft#894, microsoft#923, microsoft#924, microsoft#925, microsoft#934, microsoft#952, microsoft#962, microsoft#973, microsoft#975, microsoft#995
Why are these changes needed?
Make the performance test in test_notebook_example.py reproducible.
Improve the retrain logic as in #829.
Related issue number
Closes #829, #794, #777
Checks