-
Notifications
You must be signed in to change notification settings - Fork 508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extract task class from automl #857
Extract task class from automl #857
Conversation
Moved some of the packages into an automl subpackage to tidy before the task-based refactor. This is in response to discussions with the group and a comment on the first task-based PR. Only changes here are moving subpackages and modules into the new automl, fixing imports to work with this structure and fixing some dependencies in setup.py.
I'd moved this to automl as that's where it's used internally, but had missed that this is actually part of the public interface so makes sense to live where it was.
flaml.data, flaml.ml and flaml.model are re-added to the top level, being re-exported from flaml.automl for backwards compatability. Adding a deprecation warning so that we can have a planned removal later.
Got to the point where the methods from AutoML are pulled to GenericTask. Started removing private markers and removing the passing of automl to these methods. Done with decide_split_type, started on prepare_data. Need to do the others after
…into extract-task-class-from-automl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
everything else looks good to me!
notebook/automl_xgboost.ipynb
Outdated
@@ -29,7 +29,7 @@ | |||
"\n", | |||
"FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the `notebook` option:\n", | |||
"```bash\n", | |||
"pip install flaml[notebook]\n", | |||
"pip install flaml[notebook]==1.0.8\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason why this version is 1.0.8 and not 1.1.2 since all the other ones are 1.1.2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, not sure why. I'll bump it. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've bumped it now and pushed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed all but generic_task.py, which contains a large chunk of code moved from automl.py. Can someone confirm there is no unnecessary change incurred during the move?
@@ -674,14 +561,13 @@ def compute_estimator( | |||
free_mem_ratio=0, | |||
) | |||
else: | |||
val_loss, metric_for_logging, train_time, pred_time = evaluate_model_CV( | |||
val_loss, metric_for_logging, train_time, pred_time = task.evaluate_model_CV( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In future, get_val_loss
can be moved into Task too.
@@ -0,0 +1,849 @@ | |||
import logging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most code in this file is moved from automl.py. I didn't verify whether there is any unnecessary change when moving. I'll appreciate if another reviewer could check it.
columns=[self.hcrystaball_model.name], | ||
index=X.index, | ||
) | ||
forecast = Series(preds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@int-chaos please check this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I did this. this fixes the issue with automl.py/predict(), where converting the DataFrame into a Series was causing an error. There's no fundamental change to the overall output since in automl.py/predict() when self._label_transformer is true, it converts it to a pd.Series.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I did this. this fixes the issue with automl.py/predict(), where converting the DataFrame into a Series was causing an error. There's no fundamental change to the overall output since in automl.py/predict() when self._label_transformer is true, it converts it to a pd.Series.
Could you create an issue to update the notebooks and examples in the documentation website?
…om-automl # Conflicts: # flaml/automl/automl.py # flaml/automl/ml.py
…l' into extract-task-class-from-automl # Conflicts: # flaml/automl/ml.py # flaml/automl/task/generic_task.py
must be the timestamp column (datetime type). Other | ||
columns in the dataframe are assumed to be exogenous | ||
variables (categorical or numeric). | ||
For time series forecast tasks, the first column of X_train must be the timestamp column (datetime type). Other columns in the dataframe are assumed to be exogenous variables (categorical or numeric). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check formating issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI formatting check passes, what specific issues should we check?
@@ -45,6 +46,7 @@ def meta_feature(task, X_train, y_train, meta_feature_names): | |||
|
|||
|
|||
def load_config_predictor(estimator_name, task, location=None): | |||
task = str(task) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check task type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand. This will accept either str- or Task-valued inputs, what's there to check?
flaml/automl/ml.py
Outdated
@@ -354,8 +356,8 @@ def sklearn_metric_loss_score( | |||
return score | |||
|
|||
|
|||
def get_y_pred(estimator, X, eval_metric, obj): | |||
if eval_metric in ["roc_auc", "ap", "roc_auc_weighted"] and "binary" in obj: | |||
def get_y_pred(estimator, X, eval_metric, task): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add the annotation on the required task type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
return self.name | ||
|
||
@abstractmethod | ||
def evaluate_model_CV( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the role of get_val_loss
is similar to evaluate_model_CV
and needs to be moved from ml.py
to here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know, there's no need to modify get_val_loss
according to task type, so why should it be moved to here? Could we say we leave it for now, and move it to the Task class if there ever arises a need to for different implementations according to task?
@@ -2539,7 +1535,10 @@ def cv_score_agg_func(val_loss_folds, log_metrics_folds): | |||
|
|||
self._state._start_time_flag = self._start_time_flag = time.time() | |||
task = task or self._settings.get("task") | |||
self._estimator_type = "classifier" if task in CLASSIFICATION else "regressor" | |||
if isinstance(task, str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to update the doc str for "task" in the fit function and the constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in all of automl.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing the comments. The PR looks good to me now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address #857 (comment)
I'll merge after that.
Related work items: microsoft#493, microsoft#777, microsoft#820, microsoft#837, microsoft#843, microsoft#848, microsoft#849, microsoft#850, microsoft#853, microsoft#855, microsoft#857, microsoft#869, microsoft#870, microsoft#888, microsoft#894, microsoft#923, microsoft#924, microsoft#925, microsoft#934, microsoft#952, microsoft#962, microsoft#973, microsoft#975, microsoft#995
Why are these changes needed?
This is the next PR in the series of refactors. In this we introduce the Task class to begin abstracting the task types. At present, we only introduce the abstract base Task class and a GenericTask which contains the task specific logic that was contained in AutoML.
Related issue number
None
Checks