Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract task class from automl #857

Merged
merged 66 commits into from
Mar 11, 2023

Conversation

markharley
Copy link
Collaborator

Why are these changes needed?

This is the next PR in the series of refactors. In this we introduce the Task class to begin abstracting the task types. At present, we only introduce the abstract base Task class and a GenericTask which contains the task specific logic that was contained in AutoML.

Related issue number

None

Checks

markharley and others added 19 commits November 14, 2022 19:44
Moved some of the packages into an automl subpackage to tidy before the
task-based refactor. This is in response to discussions with the group
and a comment on the first task-based PR.

Only changes here are moving subpackages and modules into the new
automl, fixing imports to work with this structure and fixing some
dependencies in setup.py.
I'd moved this to automl as that's where it's used internally, but had
missed that this is actually part of the public interface so makes sense
to live where it was.
flaml.data, flaml.ml and flaml.model are re-added to the top level,
being re-exported from flaml.automl for backwards compatability. Adding
a deprecation warning so that we can have a planned removal later.
Got to the point where the methods from AutoML are pulled to
GenericTask. Started removing private markers and removing the passing
of automl to these methods. Done with decide_split_type, started on
prepare_data. Need to do the others after
flaml/automl/task/task.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@int-chaos int-chaos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

everything else looks good to me!

@@ -29,7 +29,7 @@
"\n",
"FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the `notebook` option:\n",
"```bash\n",
"pip install flaml[notebook]\n",
"pip install flaml[notebook]==1.0.8\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason why this version is 1.0.8 and not 1.1.2 since all the other ones are 1.1.2

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, not sure why. I'll bump it. Thanks!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've bumped it now and pushed

Copy link
Contributor

@sonichi sonichi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed all but generic_task.py, which contains a large chunk of code moved from automl.py. Can someone confirm there is no unnecessary change incurred during the move?

@@ -674,14 +561,13 @@ def compute_estimator(
free_mem_ratio=0,
)
else:
val_loss, metric_for_logging, train_time, pred_time = evaluate_model_CV(
val_loss, metric_for_logging, train_time, pred_time = task.evaluate_model_CV(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In future, get_val_loss can be moved into Task too.

@@ -0,0 +1,849 @@
import logging
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most code in this file is moved from automl.py. I didn't verify whether there is any unnecessary change when moving. I'll appreciate if another reviewer could check it.

flaml/automl/model.py Show resolved Hide resolved
columns=[self.hcrystaball_model.name],
index=X.index,
)
forecast = Series(preds)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@int-chaos please check this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I did this. this fixes the issue with automl.py/predict(), where converting the DataFrame into a Series was causing an error. There's no fundamental change to the overall output since in automl.py/predict() when self._label_transformer is true, it converts it to a pd.Series.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I did this. this fixes the issue with automl.py/predict(), where converting the DataFrame into a Series was causing an error. There's no fundamental change to the overall output since in automl.py/predict() when self._label_transformer is true, it converts it to a pd.Series.

Could you create an issue to update the notebooks and examples in the documentation website?

must be the timestamp column (datetime type). Other
columns in the dataframe are assumed to be exogenous
variables (categorical or numeric).
For time series forecast tasks, the first column of X_train must be the timestamp column (datetime type). Other columns in the dataframe are assumed to be exogenous variables (categorical or numeric).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check formating issues.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI formatting check passes, what specific issues should we check?

@@ -45,6 +46,7 @@ def meta_feature(task, X_train, y_train, meta_feature_names):


def load_config_predictor(estimator_name, task, location=None):
task = str(task)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check task type

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand. This will accept either str- or Task-valued inputs, what's there to check?

@@ -354,8 +356,8 @@ def sklearn_metric_loss_score(
return score


def get_y_pred(estimator, X, eval_metric, obj):
if eval_metric in ["roc_auc", "ap", "roc_auc_weighted"] and "binary" in obj:
def get_y_pred(estimator, X, eval_metric, task):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the annotation on the required task type.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return self.name

@abstractmethod
def evaluate_model_CV(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the role of get_val_loss is similar to evaluate_model_CV and needs to be moved from ml.py to here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, there's no need to modify get_val_loss according to task type, so why should it be moved to here? Could we say we leave it for now, and move it to the Task class if there ever arises a need to for different implementations according to task?

@@ -2539,7 +1535,10 @@ def cv_score_agg_func(val_loss_folds, log_metrics_folds):

self._state._start_time_flag = self._start_time_flag = time.time()
task = task or self._settings.get("task")
self._estimator_type = "classifier" if task in CLASSIFICATION else "regressor"
if isinstance(task, str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to update the doc str for "task" in the fit function and the constructor?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in all of automl.py

Copy link
Contributor

@qingyun-wu qingyun-wu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the comments. The PR looks good to me now.

Copy link
Contributor

@sonichi sonichi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address #857 (comment)

I'll merge after that.

@sonichi sonichi added this pull request to the merge queue Mar 10, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 10, 2023
@sonichi sonichi added this pull request to the merge queue Mar 10, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 10, 2023
@sonichi sonichi added this pull request to the merge queue Mar 11, 2023
Merged via the queue into microsoft:main with commit 27b2712 Mar 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants