[ADD] Robustly refit models in final ensemble in parallel #471

Merged
merged 18 commits into from
Aug 23, 2022

Conversation

Contributor

@ravinkohli ravinkohli commented Aug 16, 2022

Similar to fit_pipeline, the refit function now runs the models found in the final ensemble in parallel using dask. It is also robust to failures during refitting: if a model fails to refit, the original model is reused instead.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Note that a Pull Request should only contain one of refactoring, new features or documentation changes.
Please separate these changes and send us individual PRs for each.
For more information on how to create a good pull request, please refer to The anatomy of a perfect pull request.

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully run tests with your changes locally?

Description

To catch errors and enforce constraints, I have used the ExecuteTAEFuncWithQueue class. Because the code for training models in parallel is also used to run the traditional models, I have created a run_models_on_dataset function that encapsulates this functionality.

Motivation and Context

Currently, refit runs all the models sequentially and fails if any one of the models to be refitted fails. Moreover, there is no way to limit the time and memory used for the refit. This PR uses the regular TAE, which is already used for the search and for other model fits; this lets us exit gracefully when a refit fails and apply the relevant constraints.
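
As a rough illustration of the behaviour described here (parallel refits, with a failed refit falling back to the original model), a minimal sketch follows. It is not the code from this PR: it swaps in the standard library's `ThreadPoolExecutor` for dask, does not model the TAE's time and memory limits, and the names `refit_with_fallback` and `refit_fn` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Hashable, Tuple


def refit_with_fallback(
    models: Dict[Hashable, object],
    refit_fn: Callable[[object], object],
    max_workers: int = 2,
) -> Tuple[Dict[Hashable, object], Dict[Hashable, str]]:
    """Refit every model in parallel; keep the original model on failure.

    Hypothetical helper: a thread pool stands in for the dask client,
    and resource limits are not enforced here.
    """
    refitted: Dict[Hashable, object] = {}
    failures: Dict[Hashable, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit all refits up front so they can run concurrently.
        futures = {key: pool.submit(refit_fn, model) for key, model in models.items()}
        for key, future in futures.items():
            try:
                refitted[key] = future.result()
            except Exception as exc:
                # One failed refit must not abort the others:
                # record the error and reuse the original model.
                failures[key] = repr(exc)
                refitted[key] = models[key]
    return refitted, failures
```

Calling it with a `refit_fn` that raises for one model returns the original object under that key and lists the key in `failures`, while the other models are replaced by their refitted versions.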

This PR fixes #469.

How has this been tested?

I have added a test for run_models_on_dataset which ensures that at least one of the 5 random configs is successful. I have also extended the test for tabular classification to verify that refit works as expected, i.e., the ensemble is updated.

codecov bot commented Aug 16, 2022

Codecov Report

Merging #471 (1a73f05) into development (c7220f7) will increase coverage by 20.58%.
The diff coverage is 83.79%.

❗ Current head 1a73f05 differs from pull request most recent head 6b5bcda. Consider uploading reports for the commit 6b5bcda to get more accurate results

@@               Coverage Diff                @@
##           development     #471       +/-   ##
================================================
+ Coverage        64.65%   85.23%   +20.58%     
================================================
  Files              231      232        +1     
  Lines            16304    16456      +152     
  Branches          3009     3048       +39     
================================================
+ Hits             10542    14027     +3485     
+ Misses            4714     1578     -3136     
+ Partials          1048      851      -197     
| Impacted Files | Coverage Δ |
|---|---|
| autoPyTorch/evaluation/test_evaluator.py | 94.59% <ø> (ø) |
| ...luation/time_series_forecasting_train_evaluator.py | 90.79% <ø> (+10.42%) ⬆️ |
| autoPyTorch/evaluation/train_evaluator.py | 89.06% <ø> (+1.56%) ⬆️ |
| ...tup/traditional_ml/traditional_learner/learners.py | 81.20% <66.66%> (+18.53%) ⬆️ |
| ...ml/traditional_learner/base_traditional_learner.py | 92.59% <77.77%> (+6.87%) ⬆️ |
| autoPyTorch/api/base_task.py | 82.60% <81.25%> (+1.66%) ⬆️ |
| ...line/components/setup/traditional_ml/base_model.py | 74.35% <83.33%> (+10.47%) ⬆️ |
| autoPyTorch/ensemble/abstract_ensemble.py | 88.00% <85.71%> (-0.89%) ⬇️ |
| autoPyTorch/utils/parallel_model_runner.py | 86.36% <86.36%> (ø) |
| autoPyTorch/evaluation/abstract_evaluator.py | 77.26% <100.00%> (+9.46%) ⬆️ |
| ... and 155 more | |

Contributor

@nabenabe0928 nabenabe0928 left a comment

Hi thanks for the PR.
I think the changes make the codebase look better.
I added some minor comments.

autoPyTorch/ensemble/abstract_ensemble.py (outdated, resolved)
autoPyTorch/utils/parallel_model_runner.py (outdated, resolved)
autoPyTorch/utils/parallel_model_runner.py (outdated, resolved)
autoPyTorch/utils/parallel_model_runner.py (outdated, resolved)
autoPyTorch/api/base_task.py (outdated, resolved)
autoPyTorch/api/base_task.py (resolved)
@ravinkohli added the "enhancement" label (New feature or request) on Aug 16, 2022
@ravinkohli changed the title from "[ADD] Robustly refit models in final ensemble in parallel." to "[ADD] Robustly refit models in final ensemble in parallel" on Aug 16, 2022
if old_identifier_index is not None:
    replace_old_identifiers_to_refit_identifiers[list(self.models_.keys())[old_identifier_index]] = refit_identifier
else:
    self._logger.warning(f"Refit for {config} failed. Updating ensemble weights accordingly.")
Contributor

Do we still update the ensemble weights?

Contributor Author

thanks, I have fixed it
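
For readers following this thread, the identifier swap in the snippet under review can be sketched roughly as below. This is an assumption-laden illustration, not the code in abstract_ensemble.py: the helper name `swap_refit_identifiers`, the `(seed, num_run, budget)` identifier shape and the use of `None` to mark a failed refit are all hypothetical. It assumes ensemble weights are stored positionally, so keeping the list order keeps them valid without an update.

```python
from typing import Dict, List, Optional, Tuple

# Assumed identifier shape for illustration: (seed, num_run, budget).
Identifier = Tuple[int, int, float]


def swap_refit_identifiers(
    ensemble_identifiers: List[Identifier],
    refit_results: Dict[Identifier, Optional[Identifier]],
) -> List[Identifier]:
    """Replace each identifier by its refit counterpart when the refit succeeded.

    A value of None (or a missing key) means the refit failed, so the old
    identifier, and therefore the original model, is kept. Positions are
    preserved, so positional ensemble weights still line up.
    """
    updated: List[Identifier] = []
    for old in ensemble_identifiers:
        new = refit_results.get(old)
        updated.append(new if new is not None else old)
    return updated
```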

@dengdifan
Contributor

Thanks for the PR. Looks good to me!

@ravinkohli added the "first priority" label (PRs to be checked as a priority) on Aug 17, 2022
Contributor

@nabenabe0928 nabenabe0928 left a comment

Hi, thanks for the work!
I checked your changes and approved them:)

Collaborator

@theodorju theodorju left a comment

Thanks for the changes. I'm just leaving some minor comments.

metric=self._metric,
dask_client=self._dask_client,
backend=self._backend,
memory_limit=self._memory_limit,
Collaborator

Shouldn't this use the memory limit populated above at lines 807-809:

Suggested change
- memory_limit=self._memory_limit,
+ memory_limit=memory_limit,

Contributor Author

Thanks for pointing it out. I have fixed it now

Comment on lines 45 to 48
temporary_directory='./tmp/autoPyTorch_example_tmp_01',
output_directory='./tmp/autoPyTorch_example_out_01',
delete_tmp_folder_after_terminate=False,
delete_output_folder_after_terminate=False,
Collaborator

If uncommenting these was on purpose, I'd suggest we remove the lines above.

Contributor Author

no, it was an artefact of debugging, I have fixed it now. Thanks

Collaborator

@theodorju theodorju left a comment

Thanks for the changes. Looks good now.

self.config["early_stopping_rounds"] = early_stopping

if self.has_val_set:
    early_stopping = 150 if X_train.shape[0] > 10000 else max(round(150 * 10000 / X_train.shape[0]), 10)
Contributor

So we don't use early stopping if self.has_val_set is False?

Contributor Author

yeah, we can't as we rely on external libraries to implement this for us
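
To make the heuristic in the snippet above concrete, here is the same expression as a standalone function with a few worked values. The function name `early_stopping_rounds` and the plain integer argument (standing in for `X_train.shape[0]`) are only for illustration.

```python
def early_stopping_rounds(n_train: int) -> int:
    """Patience as computed by the expression quoted in this thread.

    Above 10000 training rows the patience is fixed at 150; at or below
    that, it grows inversely with the number of rows. n_train must be > 0.
    """
    if n_train > 10000:
        return 150
    return max(round(150 * 10000 / n_train), 10)


for n in (20000, 10000, 5000, 1000):
    print(n, early_stopping_rounds(n))
# 20000 150
# 10000 150
# 5000 300
# 1000 1500
```

Since 150 * 10000 / n_train is at least 150 whenever n_train is at most 10000, the lower bound of 10 in `max(...)` never takes effect in this expression.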

@@ -21,7 +21,7 @@
 # noinspection PyInterpreter
 setuptools.setup(
     name="autoPyTorch",
-    version="0.2",
+    version="0.2.1",
Collaborator

I believe we should also update the version in __version__.py.

Contributor Author

Yeah that's exactly my commit as well :P. I made the change but didn't add it in the previous commit.

@theodorju
Collaborator

Looks good. I'll proceed with merging this PR.

@theodorju theodorju merged commit ce78f89 into automl:development Aug 23, 2022
@ravinkohli ravinkohli mentioned this pull request Aug 23, 2022
10 tasks
ravinkohli added a commit that referenced this pull request Aug 23, 2022
* [FIX] Documentation and docker workflow file (#449)

* fixes to documentation and docker

* fix to docker

* Apply suggestions from code review

* add change log for release (#450)

* [FIX] release docs (#452)

* Release 0.2

* Release 0.2.0

* fix docs new line

* [FIX] ADD forecasting init design to pip data files (#459)

* add forecasting_init.json to data files under setup

* avoid undefined reference in scale_value

* checks for time series dataset split (#464)

* checks for time series dataset split

* maint

* Update autoPyTorch/datasets/time_series_dataset.py

Co-authored-by: Ravin Kohli <[email protected]>

Co-authored-by: Ravin Kohli <[email protected]>

* [FIX] Numerical stability scaling for timeseries forecasting tasks (#467)

* resolve rebase conflict

* add checks for scaling factors

* flake8 fix

* resolve conflict

* [FIX] pipeline options in `fit_pipeline` (#466)

* fix update of pipeline config options in fit pipeline

* fix flake and test

* suggestions from review

* [FIX] results management and visualisation with missing test data (#465)

* add flexibility to avoid checking for test scores

* fix flake and test

* fix bug in tests

* suggestions from review

* [ADD] Robustly refit models in final ensemble in parallel (#471)

* add parallel model runner and update running traditional classifiers

* update pipeline config to pipeline options

* working refit function

* fix mypy and flake

* suggestions from review

* fix mypy and flake

* suggestions from review

* finish documentation

* fix tests

* add test for parallel model runner

* fix flake

* fix tests

* fix traditional prediction for refit

* suggestions from review

* add warning for failed processing of results

* remove unnecessary change

* update autopytorch version number

* update autopytorch version number and the example file

* [DOCS] Release notes v0.2.1 (#476)

* Release 0.2.1

* add release docs

* Update docs/releases.rst

Co-authored-by: Difan Deng <[email protected]>
Labels
enhancement New feature or request first priority PRs to be checked as a priority
Projects
None yet
4 participants