Extract task class from automl #857

markharley · 2022-12-20T21:14:12Z

Why are these changes needed?

This is the next PR in the series of refactors. It introduces the Task class as a first step toward abstracting the task types. For now, it adds only the abstract base class Task and a GenericTask implementation, which holds the task-specific logic that previously lived in AutoML.
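A minimal sketch of the base-class/implementation split described above, under heavy simplification. This is not FLAML's actual interface: the real `Task` ABC in `flaml/automl/task/task.py` has more methods and different signatures. The sketch mirrors only three things visible in this PR: a `name`, a `__str__` that returns it, and an abstract `evaluate_model_CV`. The `CLASSIFICATION` set, `is_classification`, the `fit_and_score` callback and the `mean_predictor_mae` toy learner are hypothetical illustrations.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Sequence

# Hypothetical task-name grouping, for illustration only.
CLASSIFICATION = {"binary", "multiclass", "classification"}

# fit_and_score(X_train, y_train, X_val, y_val) -> validation loss (assumed callback shape)
FitAndScore = Callable[[list, list, list, list], float]


class Task(ABC):
    """Abstract base for task types (sketch, not FLAML's real interface)."""

    def __init__(self, task_name: str):
        self.name = task_name

    def __str__(self) -> str:
        # A Task renders as its name, so str(task) works wherever a name was used.
        return self.name

    def is_classification(self) -> bool:
        return self.name in CLASSIFICATION

    @abstractmethod
    def evaluate_model_CV(
        self, fit_and_score: FitAndScore, X: Sequence, y: Sequence, n_splits: int = 3
    ) -> float:
        """Return the mean validation loss over n_splits folds."""


class GenericTask(Task):
    """Concrete task holding logic that used to live in AutoML (sketch)."""

    def evaluate_model_CV(self, fit_and_score, X, y, n_splits=3):
        X, y = list(X), list(y)
        n = len(X)
        fold_size = n // n_splits
        losses: List[float] = []
        for k in range(n_splits):
            # Contiguous, unshuffled folds; the last fold absorbs any remainder.
            start = k * fold_size
            stop = n if k == n_splits - 1 else start + fold_size
            losses.append(
                fit_and_score(
                    X[:start] + X[stop:], y[:start] + y[stop:],
                    X[start:stop], y[start:stop],
                )
            )
        return sum(losses) / n_splits


def mean_predictor_mae(X_train, y_train, X_val, y_val) -> float:
    """Toy learner: predict the training mean, score by mean absolute error."""
    prediction = sum(y_train) / len(y_train)
    return sum(abs(v - prediction) for v in y_val) / len(y_val)
```

For example, `GenericTask("regression").evaluate_model_CV(mean_predictor_mae, range(6), [0, 0, 0, 3, 3, 3], n_splits=2)` returns `3.0`. Routing CV through the task object is what lets a caller such as `compute_estimator` write `task.evaluate_model_CV(...)` and leave the choice of implementation to each task type, while `Task` itself cannot be instantiated.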

Related issue number

None

Checks

I've used pre-commit to lint the changes in this PR, or I've made sure lint with flake8 output is two 0s.
I've included any doc changes needed for https://microsoft.github.io/FLAML/. See https://microsoft.github.io/FLAML/docs/Contribute#documentation to build and test documentation locally.
I've added tests (if relevant) corresponding to the changes introduced in this PR.
I've made sure all auto checks have passed.

Moved some of the packages into an automl subpackage to tidy things up before the task-based refactor. This responds to discussions with the group and to a comment on the first task-based PR. The only changes here are moving subpackages and modules into the new automl subpackage, fixing imports to work with this structure, and fixing some dependencies in setup.py.

int-chaos

Everything else looks good to me!

int-chaos · 2023-02-24T05:21:50Z

notebook/automl_xgboost.ipynb

@@ -29,7 +29,7 @@
    "\n",
    "FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the `notebook` option:\n",
    "```bash\n",
-    "pip install flaml[notebook]\n",
+    "pip install flaml[notebook]==1.0.8\n",


Is there a reason why this version is 1.0.8 and not 1.1.2, since all the other ones are 1.1.2?

Good point, not sure why. I'll bump it. Thanks!

I've bumped it now and pushed.

sonichi

I reviewed all but generic_task.py, which contains a large chunk of code moved from automl.py. Can someone confirm that no unnecessary changes were introduced during the move?

sonichi · 2023-02-27T17:05:41Z

flaml/automl/ml.py

@@ -674,14 +561,13 @@ def compute_estimator(
            free_mem_ratio=0,
        )
    else:
-        val_loss, metric_for_logging, train_time, pred_time = evaluate_model_CV(
+        val_loss, metric_for_logging, train_time, pred_time = task.evaluate_model_CV(


In the future, get_val_loss can be moved into Task too.

sonichi · 2023-02-27T17:09:22Z

flaml/automl/task/generic_task.py

@@ -0,0 +1,849 @@
+import logging


Most code in this file is moved from automl.py. I didn't verify whether any unnecessary changes were made during the move. I'd appreciate it if another reviewer could check it.

sonichi · 2023-03-06T22:57:30Z

flaml/automl/model.py

-                    columns=[self.hcrystaball_model.name],
-                    index=X.index,
-                )
+                forecast = Series(preds)


@int-chaos please check this.

Yes, I did this. It fixes an issue in automl.py/predict(), where converting the DataFrame into a Series was causing an error. There's no fundamental change to the overall output, since automl.py/predict() converts the result to a pd.Series anyway when self._label_transformer is true.

Could you create an issue to update the notebooks and examples in the documentation website?

qingyun-wu · 2023-03-07T22:23:14Z

flaml/automl/automl.py

-                must be the timestamp column (datetime type). Other
-                columns in the dataframe are assumed to be exogenous
-                variables (categorical or numeric).
+                For time series forecast tasks, the first column of X_train must be the timestamp column (datetime type). Other columns in the dataframe are assumed to be exogenous variables (categorical or numeric).


Check formatting issues.

The CI formatting check passes; which specific issues should we check?

qingyun-wu · 2023-03-08T00:39:05Z

flaml/default/suggest.py

@@ -45,6 +46,7 @@ def meta_feature(task, X_train, y_train, meta_feature_names):


 def load_config_predictor(estimator_name, task, location=None):
+    task = str(task)


Check the task type.

Not sure I understand. This accepts either str- or Task-valued inputs; what is there to check?
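The point of `task = str(task)` can be seen in a small sketch. It assumes only that a Task's `__str__` returns its name, which is what the `return self.name` snippet and the "add Task.__str__" commit suggest. `normalize_task_name` and `checked_task_name` are hypothetical helpers standing in for the first line of `load_config_predictor`, and `Task` here is a stand-in class. The stricter variant shows one possible answer to the reviewer's concern: `str()` accepts anything, so a wrong type such as `None` would otherwise become the string `"None"` silently.

```python
from typing import Union


class Task:
    """Minimal stand-in for FLAML's Task (hypothetical; only __str__ matters here)."""

    def __init__(self, name: str):
        self.name = name

    def __str__(self) -> str:
        return self.name


def normalize_task_name(task: Union[str, Task]) -> str:
    # str() is the identity on str and returns .name on a Task,
    # so both input kinds collapse to the same lookup key.
    return str(task)


def checked_task_name(task: Union[str, Task]) -> str:
    # Optional stricter variant: reject anything that is neither str nor Task,
    # since str() would otherwise silently turn e.g. None into "None".
    if not isinstance(task, (str, Task)):
        raise TypeError(f"task must be str or Task, got {type(task).__name__}")
    return str(task)
```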

qingyun-wu · 2023-03-08T18:34:26Z

flaml/automl/ml.py

@@ -354,8 +356,8 @@ def sklearn_metric_loss_score(
    return score


-def get_y_pred(estimator, X, eval_metric, obj):
-    if eval_metric in ["roc_auc", "ap", "roc_auc_weighted"] and "binary" in obj:
+def get_y_pred(estimator, X, eval_metric, task):


Add an annotation for the required task type.
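One way the requested annotation could look, as a sketch only. The signature annotates `task` as `Union[str, Task]`, and the body keeps the removed line's logic (`"binary"` membership test on the task name) by going through `str(task)`. `Task` and `ThresholdModel` are hypothetical stand-ins. FLAML's real `get_y_pred` also has branches for other probability-based metrics, which are omitted here.

```python
from typing import Any, Sequence, Union

# Metrics that need P(class 1) rather than hard labels on binary tasks (from the diff).
PROBA_METRICS = ("roc_auc", "ap", "roc_auc_weighted")


class Task:
    """Stand-in exposing only a name and __str__ (hypothetical)."""

    def __init__(self, name: str):
        self.name = name

    def __str__(self) -> str:
        return self.name


def get_y_pred(estimator: Any, X: Sequence, eval_metric: str, task: Union[str, Task]):
    """Return the predictions that `eval_metric` needs.

    task: a task name such as "binary", or a Task whose str() is that name.
    """
    if eval_metric in PROBA_METRICS and "binary" in str(task):
        # Ranking metrics on binary tasks score the positive-class probability.
        return [row[1] for row in estimator.predict_proba(X)]
    return estimator.predict(X)


class ThresholdModel:
    """Toy binary classifier on 1-D inputs, for the usage example."""

    def predict_proba(self, X):
        return [[1.0 - x, x] for x in X]

    def predict(self, X):
        return [int(x >= 0.5) for x in X]
```

With this toy model, `get_y_pred(ThresholdModel(), [0.2, 0.9], "roc_auc", Task("binary"))` yields the probabilities `[0.2, 0.9]`, while the same call with `"accuracy"` falls through to hard labels.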

qingyun-wu · 2023-03-08T18:47:28Z

flaml/automl/task/task.py

+        return self.name
+
+    @abstractmethod
+    def evaluate_model_CV(


The role of get_val_loss is similar to that of evaluate_model_CV, so it needs to be moved from ml.py to here as well.

As far as I know, get_val_loss doesn't need to vary by task type, so why should it be moved here? Could we leave it for now, and move it to the Task class if the need for task-specific implementations ever arises?

qingyun-wu · 2023-03-08T18:52:18Z

flaml/automl/automl.py

@@ -2539,7 +1535,10 @@ def cv_score_agg_func(val_loss_folds, log_metrics_folds):

        self._state._start_time_flag = self._start_time_flag = time.time()
        task = task or self._settings.get("task")
-        self._estimator_type = "classifier" if task in CLASSIFICATION else "regressor"
+        if isinstance(task, str):


Does the docstring for "task" in the fit function and the constructor need updating?

Done in all of automl.py
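The diff above replaces a membership test on a task-name string with a branch on `isinstance(task, str)`, so `fit` accepts either form. The diff does not show the branch body. The sketch below assumes the string is turned into a Task object, which then answers the classifier/regressor question. `resolve_task`, `estimator_type`, the stand-in `Task` and its `CLASSIFICATION` tuple are all hypothetical.

```python
from typing import Union

# Illustrative subset of classification task names (assumption).
CLASSIFICATION = ("binary", "multiclass", "classification")


class Task:
    """Stand-in Task (hypothetical): a name plus one predicate."""

    def __init__(self, name: str):
        self.name = name

    def __str__(self) -> str:
        return self.name

    def is_classification(self) -> bool:
        return self.name in CLASSIFICATION


def resolve_task(task: Union[str, Task]) -> Task:
    """Accept a task name or an existing Task; always return a Task."""
    if isinstance(task, str):
        # Hypothetical: the real code may build the Task through a factory.
        task = Task(task)
    return task


def estimator_type(task: Union[str, Task]) -> str:
    # Ask the Task object instead of testing a raw string against a set of names.
    return "classifier" if resolve_task(task).is_classification() else "regressor"
```

Passing an existing Task through unchanged keeps any state it carries, and a plain name is upgraded once at the entry point, so downstream code only ever sees Task objects.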

qingyun-wu

Thanks for addressing the comments. The PR looks good to me now.

sonichi

Please address #857 (comment)

I'll merge after that.

Related work items: microsoft#493, microsoft#777, microsoft#820, microsoft#837, microsoft#843, microsoft#848, microsoft#849, microsoft#850, microsoft#853, microsoft#855, microsoft#857, microsoft#869, microsoft#870, microsoft#888, microsoft#894, microsoft#923, microsoft#924, microsoft#925, microsoft#934, microsoft#952, microsoft#962, microsoft#973, microsoft#975, microsoft#995

markharley and others added 19 commits November 14, 2022 19:44

Fix doc building post automl subpackage refactor

eb7aac9

Fix broken links in website post automl subpackage refactor

0eda959

Fix broken links in website post automl subpackage refactor

c3a567c

Remove vw from test deps as this is breaking the build

f148cac

Move default back to the top-level

7ce03a9

I'd moved this to automl, as that's where it's used internally, but I had missed that it is actually part of the public interface, so it makes sense for it to live where it was.

Re-add top level modules with deprecation warnings

739b256

flaml.data, flaml.ml and flaml.model are re-added at the top level, re-exported from flaml.automl for backwards compatibility. A deprecation warning is added so that their removal can be planned for later.
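The commit text doesn't show how FLAML's shim issues its warning. The sketch below uses one generic pattern instead, built on a stand-in module rather than FLAML's real files. A module-level `__getattr__` (PEP 562) forwards every lookup on the old module name to the new location and emits a `DeprecationWarning` each time. This swapped-in pattern warns on each attribute access rather than once at import. `make_deprecated_alias` and the `EXAMPLE_SETTING` attribute are hypothetical.

```python
import types
import warnings


def make_deprecated_alias(old_name: str, new_module: types.ModuleType) -> types.ModuleType:
    """Build a module named `old_name` that forwards attribute lookups to
    `new_module`, emitting a DeprecationWarning on each forwarded lookup."""
    alias = types.ModuleType(old_name)

    def __getattr__(attr: str):
        # PEP 562: called only when `attr` is not found in alias.__dict__.
        warnings.warn(
            f"{old_name} is deprecated; use {new_module.__name__} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(new_module, attr)

    alias.__getattr__ = __getattr__
    return alias


# Stand-in for the relocated module (hypothetical contents).
new_data = types.ModuleType("flaml.automl.data")
new_data.EXAMPLE_SETTING = "value"

old_data = make_deprecated_alias("flaml.data", new_data)
```

Reading `old_data.EXAMPLE_SETTING` returns `"value"` and emits a `DeprecationWarning` naming the new location. Because Python hides `DeprecationWarning` outside `__main__` by default, users typically see it under test runners or with `-W default`.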

Merge branch 'main' into subpackage-refactor-for-automl

4c008e8

Merge branch 'main' into subpackage-refactor-for-automl

e7c8f91

Merge branch 'main' into subpackage-refactor-for-automl

f845df8

Merge remote-tracking branch 'upstream/main' into subpackage-refactor-for-automl

36ffb1e

Merge microsoft/main into here

2386989

Merge remote-tracking branch 'upstream/main' into subpackage-refactor-for-automl

3c53eca

Fix model.py line-endings

d747851

WIP

3a6b95b

WIP - Notes below

1e51966

Got to the point where the methods from AutoML are pulled into GenericTask. Started removing the private markers and no longer passing automl into these methods. Done with decide_split_type; started on prepare_data. The others still need to be done afterwards.

Merge remote-tracking branch 'upstream/main' into extract-task-class-from-automl

938e3c9

Re-add generic_task

1cc1f1d

Merge remote-tracking branch 'upstream/main' into extract-task-class-from-automl

9ca5f18

markharley requested a review from liususan091219 December 20, 2022 21:14

markharley marked this pull request as draft December 20, 2022 21:15

markharley requested review from qingyun-wu, sonichi and int-chaos December 20, 2022 21:17

sonichi reviewed Dec 21, 2022

flaml/automl/task/task.py (outdated)

EgorKraevTransferwise and others added 5 commits January 9, 2023 10:42

Fix tests: add Task.__str__

5a0694b

Fix tests: test for ray.ObjectRef

b5b6cc8

Hotwire TS_Sklearn wrapper to fix test fail

dfcca3b

Merge remote-tracking branch 'upstream/main' into extract-task-class-from-automl

84114ff

Merge remote-tracking branch 'origin/extract-task-class-from-automl' into extract-task-class-from-automl

0e2877a

coffepowered reviewed Feb 22, 2023

flaml/automl/task/task.py

int-chaos reviewed Feb 24, 2023

markharley added 4 commits February 25, 2023 14:04

Merge remote-tracking branch 'upstream/main' into extract-task-class-from-automl

c6aa576

Bump version to 1.1.2 in automl_xgboost

000603c

Add docstrings to the Task ABC

138b536

Fix import in custom_learner

88f258d

sonichi reviewed Feb 27, 2023

int-chaos added 2 commits March 6, 2023 15:58

fix 'optimize_for_horizon' for ts_sklearn

148f0c9

remove debugging print statements

eb9ef2c

sonichi reviewed Mar 6, 2023

EgorKraevTransferwise added 6 commits March 7, 2023 15:27

Merge remote-tracking branch 'origin/main' into extract-task-class-from-automl

22f8562

Conflicts: flaml/automl/automl.py, flaml/automl/ml.py

Check for is_forecast() before is_classification() in decide_split_type

9ab48c5
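The commit above reorders the checks in decide_split_type. The simplified sketch below shows why the order matters, assuming a task such as `ts_forecast_classification` counts both as forecasting and as classification, so whichever predicate is tested first decides the split. The task-name sets and this function's signature are assumptions for illustration, not the real `decide_split_type`. The split names follow FLAML's documented `split_type` options.

```python
# Hypothetical task-name groups; "ts_forecast_classification" belongs to both.
FORECAST = {"ts_forecast", "ts_forecast_regression", "ts_forecast_classification"}
CLASSIFICATION = {"binary", "multiclass", "classification", "ts_forecast_classification"}


def decide_split_type(task_name: str, requested: str = "auto") -> str:
    """Pick a default train/validation split strategy (simplified sketch)."""
    if requested != "auto":
        return requested
    # Forecasting is tested first: a time-series classification task is also a
    # classification task, and a stratified split would ignore temporal order.
    if task_name in FORECAST:
        return "time"
    if task_name in CLASSIFICATION:
        return "stratified"
    return "uniform"
```

Swapping the two `if` blocks would send `ts_forecast_classification` to `"stratified"`, which is presumably the behavior the commit fixes.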

Merge remote-tracking branch 'mark_fork/extract-task-class-from-automl' into extract-task-class-from-automl

1e57d61

Conflicts: flaml/automl/ml.py, flaml/automl/task/generic_task.py

Attempt to fix formatting fail

3b723a3

Another attempt to fix formatting fail

8d94b68

And another attempt to fix formatting fail

3a95f09

qingyun-wu reviewed Mar 8, 2023

EgorKraevTransferwise added 3 commits March 9, 2023 15:33

Add type annotations for task arg in signatures and docstrings

d7ce833

Fix formatting

927eb96

Fix linting

57f104e

qingyun-wu approved these changes Mar 10, 2023

sonichi approved these changes Mar 10, 2023

sonichi added this pull request to the merge queue Mar 10, 2023

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 10, 2023

sonichi added this pull request to the merge queue Mar 10, 2023

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 10, 2023

sonichi added this pull request to the merge queue Mar 11, 2023

Merged via the queue into microsoft:main with commit 27b2712 Mar 11, 2023

markharley commented Dec 20, 2022

int-chaos left a comment

int-chaos Feb 24, 2023

markharley Feb 25, 2023

markharley Feb 26, 2023

sonichi left a comment

sonichi Feb 27, 2023

sonichi Feb 27, 2023

sonichi Mar 6, 2023

int-chaos Mar 7, 2023

sonichi Mar 8, 2023

qingyun-wu Mar 7, 2023

EgorKraevTransferwise Mar 9, 2023

qingyun-wu Mar 8, 2023

EgorKraevTransferwise Mar 9, 2023

qingyun-wu Mar 8, 2023

EgorKraevTransferwise Mar 9, 2023

qingyun-wu Mar 8, 2023

EgorKraevTransferwise Mar 9, 2023

qingyun-wu Mar 8, 2023

EgorKraevTransferwise Mar 9, 2023

qingyun-wu left a comment

sonichi left a comment
