[ADD] Robustly refit models in final ensemble in parallel #471
Conversation
Codecov Report
@@ Coverage Diff @@
## development #471 +/- ##
================================================
+ Coverage 64.65% 85.23% +20.58%
================================================
Files 231 232 +1
Lines 16304 16456 +152
Branches 3009 3048 +39
================================================
+ Hits 10542 14027 +3485
+ Misses 4714 1578 -3136
+ Partials 1048 851 -197
Hi, thanks for the PR.
I think the changes make the codebase look better.
I added some minor comments.
autoPyTorch/api/base_task.py
Outdated
if old_identifier_index is not None:
    replace_old_identifiers_to_refit_identifiers[list(self.models_.keys())[old_identifier_index]] = refit_identifier
else:
    self._logger.warning(f"Refit for {config} failed. Updating ensemble weights accordingly.")
Do we still update the ensemble weights?
Thanks, I have fixed it.
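For context, here is a minimal, hypothetical sketch of the behaviour discussed above: ensemble weights are carried over to the refitted identifiers, and members whose refit failed keep their original identifier and weight. The names (`remap_ensemble_weights`, `replacements`) are illustrative and not autoPyTorch's actual API.

```python
# Hypothetical sketch, not autoPyTorch's actual implementation.
from typing import Dict, Hashable


def remap_ensemble_weights(
    ensemble_weights: Dict[Hashable, float],
    replacements: Dict[Hashable, Hashable],
) -> Dict[Hashable, float]:
    """Carry each weight over to the refitted model's identifier.

    Identifiers without a replacement (e.g. because their refit failed)
    keep their original key and weight, so the ensemble stays consistent.
    """
    return {
        replacements.get(identifier, identifier): weight
        for identifier, weight in ensemble_weights.items()
    }


# Example: model "b_old" failed to refit, so it keeps its old identifier.
weights = {"a_old": 0.6, "b_old": 0.4}
print(remap_ensemble_weights(weights, {"a_old": "a_refit"}))
# {'a_refit': 0.6, 'b_old': 0.4}
```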
Thanks for the PR. Looks good to me!
Hi, thanks for the work!
I checked your changes and approved them:)
Thanks for the changes. I'm just leaving some minor comments.
autoPyTorch/api/base_task.py
Outdated
metric=self._metric,
dask_client=self._dask_client,
backend=self._backend,
memory_limit=self._memory_limit,
Shouldn't this use the memory limit populated above at lines 807-809:
- memory_limit=self._memory_limit,
+ memory_limit=memory_limit,
Thanks for pointing it out. I have fixed it now.
temporary_directory='./tmp/autoPyTorch_example_tmp_01',
output_directory='./tmp/autoPyTorch_example_out_01',
delete_tmp_folder_after_terminate=False,
delete_output_folder_after_terminate=False,
If the uncommenting was on purpose, I'd suggest we remove the lines above.
No, it was an artefact of debugging; I have fixed it now. Thanks.
Thanks for the changes. Looks good now.
self.config["early_stopping_rounds"] = early_stopping | ||
|
||
if self.has_val_set: | ||
early_stopping = 150 if X_train.shape[0] > 10000 else max(round(150 * 10000 / X_train.shape[0]), 10) |
We don't do early stopping if self.has_val_set is set to False?
Yeah, we can't, as we rely on external libraries to implement this for us.
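To make the heuristic above concrete, here is a small self-contained sketch; the helper name and the surrounding variables are illustrative, not the actual autoPyTorch learner code.

```python
# Illustrative sketch of the early-stopping heuristic shown above;
# names here are hypothetical, not the actual learner implementation.

def early_stopping_rounds(n_train: int) -> int:
    """Patience heuristic: 150 rounds for training sets above 10k samples;
    smaller sets get proportionally more patience (never fewer than 10 rounds)."""
    return 150 if n_train > 10000 else max(round(150 * 10000 / n_train), 10)


# Early stopping is only configured when a validation set exists, because the
# underlying libraries need one to evaluate the stopping criterion.
has_val_set = True
config = {}
if has_val_set:
    config["early_stopping_rounds"] = early_stopping_rounds(5000)
print(config)  # {'early_stopping_rounds': 300}
```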
@@ -21,7 +21,7 @@
 # noinspection PyInterpreter
 setuptools.setup(
     name="autoPyTorch",
-    version="0.2",
+    version="0.2.1",
I believe we should also update the version in __version__.py.
Yeah that's exactly my commit as well :P. I made the change but didn't add it in the previous commit.
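As a side note, a common way to avoid the two version strings drifting apart is to keep the number only in `__version__.py` and read it from `setup.py`. Below is a minimal sketch of that pattern; it is an assumption about how one could wire it, not necessarily how autoPyTorch's `setup.py` is actually written.

```python
# setup.py (sketch) -- single source of truth for the version string.
import os

import setuptools

version: dict = {}
# autoPyTorch/__version__.py is assumed to contain a line like:
#     __version__ = "0.2.1"
with open(os.path.join("autoPyTorch", "__version__.py")) as fh:
    exec(fh.read(), version)

setuptools.setup(
    name="autoPyTorch",
    version=version["__version__"],
)
```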
Looks good. I'll proceed with merging this PR.
* [FIX] Documentation and docker workflow file (#449)
* fixes to documentation and docker
* fix to docker
* Apply suggestions from code review
* add change log for release (#450)
* [FIX] release docs (#452)
* Release 0.2
* Release 0.2.0
* fix docs new line
* [FIX] ADD forecasting init design to pip data files (#459)
* add forecasting_init.json to data files under setup
* avoid undefined reference in scale_value
* checks for time series dataset split (#464)
* checks for time series dataset split
* maint
* Update autoPyTorch/datasets/time_series_dataset.py
  Co-authored-by: Ravin Kohli <[email protected]>
  Co-authored-by: Ravin Kohli <[email protected]>
* [FIX] Numerical stability scaling for timeseries forecasting tasks (#467)
* resolve rebase conflict
* add checks for scaling factors
* flake8 fix
* resolve conflict
* [FIX] pipeline options in `fit_pipeline` (#466)
* fix update of pipeline config options in fit pipeline
* fix flake and test
* suggestions from review
* [FIX] results management and visualisation with missing test data (#465)
* add flexibility to avoid checking for test scores
* fix flake and test
* fix bug in tests
* suggestions from review
* [ADD] Robustly refit models in final ensemble in parallel (#471)
* add parallel model runner and update running traditional classifiers
* update pipeline config to pipeline options
* working refit function
* fix mypy and flake
* suggestions from review
* fix mypy and flake
* suggestions from review
* finish documentation
* fix tests
* add test for parallel model runner
* fix flake
* fix tests
* fix traditional prediction for refit
* suggestions from review
* add warning for failed processing of results
* remove unnecessary change
* update autopytorch version number
* update autopytorch version number and the example file
* [DOCS] Release notes v0.2.1 (#476)
* Release 0.2.1
* add release docs
* Update docs/releases.rst
  Co-authored-by: Difan Deng <[email protected]>
Similar to `fit_pipeline`, the `refit` function now runs the models found in the final ensemble in parallel using dask. It is also robust to failures while refitting, in which case it reuses the original model instead.

Types of changes
Note that a Pull Request should only contain one of refactoring, new features or documentation changes.
Please separate these changes and send us individual PRs for each.
For more information on how to create a good pull request, please refer to The anatomy of a perfect pull request.
Checklist:
Description
To enable catching errors and adding constraints, I have used the `ExecuteTAEFuncWithQueue` class. As the code for training models in parallel is also used for running the traditional models, I have created a `run_models_on_dataset` function which encapsulates this functionality.

Motivation and Context
Currently, refit runs all the models sequentially and fails if any one of the models to be refitted fails. Moreover, there is no way to limit the time and memory used for the refit. With this PR, I have added the regular `TAE`, which is used for search and other model fittings, allowing us to exit gracefully when a refit fails as well as to add the relevant constraints. This PR fixes #469.
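To illustrate the idea (not the actual `run_models_on_dataset` or TAE implementation), here is a hedged sketch of refitting ensemble members in parallel with dask and falling back to the already-fitted model when a refit fails; all names and the toy `fit_config` function are hypothetical.

```python
# Hypothetical sketch of parallel, failure-tolerant refitting with dask;
# this is not autoPyTorch's run_models_on_dataset implementation.
from dask.distributed import Client


def fit_config(config: dict) -> str:
    """Stand-in for fitting one pipeline configuration on the dataset."""
    if config.get("broken"):
        raise RuntimeError("simulated fit failure")
    return f"model refitted with {config}"


if __name__ == "__main__":
    original_models = {"a": "previously fitted model a", "b": "previously fitted model b"}
    configs = {"a": {"lr": 0.01}, "b": {"broken": True}}

    client = Client(processes=False)  # in-process scheduler is enough for the example
    futures = {name: client.submit(fit_config, cfg) for name, cfg in configs.items()}

    refitted = {}
    for name, future in futures.items():
        try:
            # A per-model wall-clock limit plays the role of the TAE's constraints here.
            refitted[name] = future.result(timeout=60)
        except Exception:
            # Robust refit: keep the original model for this ensemble member.
            refitted[name] = original_models[name]

    print(refitted)
    client.close()
```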
How has this been tested?
I have added a test for `run_models_on_dataset` which ensures that at least one of the 5 random configs is successful. I have also extended the test for tabular classification to verify that refit works as expected, i.e., the ensemble is updated.