Converting sklearn pipelines with ebm models to onnx #9

ReneeErnst · 2023-06-13T17:31:33Z

ebm2onnx version: 3.1.1
Python version: 3.9.7
Operating System: ubuntu

Description

Feature request: the ability to use EBM models in sklearn pipelines and to convert such a pipeline to ONNX. This would require some work so that the model can be registered when using sklearn-onnx.

I want to save a sklearn pipeline that includes an EBM model to ONNX, rather than JUST the EBM model. This is a common use case: you want to pair your data processing with the model object in a single pipeline. This functionality does not appear to be included at this time.

Ideally, ebm2onnx would handle saving pipelines that include EBM models to ONNX. Below is an example script that would ideally work.

What I Did

import ebm2onnx
import pandas as pd
from interpret import glassbox
from sklearn import compose, impute, pipeline, preprocessing

features = [
    "feature_a",
    "feature_b",
    "feature_c",
    "feature_d",
    "feature_e",
    "feature_f",
    "feature_g",
]

df_train = pd.DataFrame(
    {
        "feature_a": [0, 0.5, 2, 5],
        "feature_b": [0, 0.5, 2, 5],
        "feature_c": [0, 0.5, 2, 5],
        "feature_d": [0, 0.5, 2, 5],
        "feature_e": [0, 1, 0, 1],
        "feature_f": [1, 0, 1, 0],
        "feature_g": ["a", "b", "can_not_determine", "can_not_determine"],
        "target": [1, 1, 0, 0],
    }
)
numeric_mean_transformer = pipeline.Pipeline(
    steps=[
        ("imputer", impute.SimpleImputer(strategy="mean")),
        ("scaler", preprocessing.StandardScaler()),
    ]
)

numeric_median_transformer = pipeline.Pipeline(
    steps=[
        ("imputer", impute.SimpleImputer(strategy="median")),
        ("scaler", preprocessing.StandardScaler()),
    ]
)

categorical_transformer = pipeline.Pipeline(
    steps=[
        (
            "onehot",
            preprocessing.OneHotEncoder(
                sparse=True,
                # Assumes I have 2 bool and 1 cat feature, and I'm specifying what
                # values I want to drop when one hot encoding.
                drop=[0, 0, "can_not_determine"],
                handle_unknown="ignore",
            ),
        )
    ]
)

preprocessor = compose.ColumnTransformer(
    transformers=[
        (
            "num_mean",
            numeric_mean_transformer,
            ["feature_a", "feature_b"],
        ),
        (
            "num_median",
            numeric_median_transformer,
            ["feature_c", "feature_d"],
        ),
        ("cat", categorical_transformer, ["feature_e", "feature_f", "feature_g"]),
    ]
)

my_pipeline = pipeline.Pipeline(
    [
        ("preprocessor", preprocessor),
        (
            "model",
            glassbox.ExplainableBoostingClassifier(
                max_bins=8,
                min_samples_leaf=2,
                max_leaves=2,
                learning_rate=0.5,
                validation_size=0.5,
                early_stopping_rounds=5,
                interactions=10,
                random_state=42,
            ),
        ),
    ]
)

my_pipeline.fit(df_train[features], df_train["target"])

onnx_pipeline = ebm2onnx.to_onnx(
    my_pipeline, ebm2onnx.get_dtype_from_pandas(df_train[features])
)


ReneeErnst · 2023-06-13T17:36:13Z

I may have jumped into this too fast, and will update if I get this working. I think I can register this converter and make it work, as documented here: https://onnx.ai/sklearn-onnx/pipeline.html

ReneeErnst · 2023-06-13T18:01:30Z

Yeah, looks like some additional work would be needed. It would be great to have this included in ebm2onnx.

MainRo · 2023-06-26T09:47:47Z

Yes, this is something that would be great to have.
The conversion of the full sklearn pipeline cannot be done by ebm2onnx. This converter is only for the EBM model.

I will look further in the sklearn documentation but according to the link you provided, we just need to register the ebm converter:
https://onnx.ai/sklearn-onnx/pipeline.html#new-converters-in-a-pipeline

Then, skl2onnx should be able to convert the whole pipeline including the ebm model.

ReneeErnst · 2023-06-28T01:46:29Z

That makes complete sense. After poking around a bit, I figured that it likely wouldn't end up in ebm2onnx, but instead need to be registered. Hopefully that's something that could happen - it would be an awesome add.

MainRo · 2024-11-04T10:44:26Z

Hello @ReneeErnst I have something working.
Is it ok that I add your code as an example?

ReneeErnst · 2024-11-04T13:18:43Z

> Hello @ReneeErnst I have something working. Is it ok that I add your code as an example?

Sure - go for it!

MainRo · 2024-11-05T17:33:02Z

Release v3.3.0 contains initial support for scikit-learn pipelines.
Some improvements are still needed (regressors, and sklearn predictions sometimes differ from onnx-runtime), but it does work.
You can test it here: https://mybinder.org/v2/gh/interpretml/ebm2onnx/master?filepath=examples%2Fsklearn-pipeline.ipynb
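
Since sklearn predictions can sometimes differ from onnx-runtime, it may help to check the two outputs row by row after converting a pipeline. The following is only a minimal sketch using the standard library: the helper name `find_mismatched_rows`, the tolerances, and the sample probabilities are all illustrative assumptions, not part of ebm2onnx. In practice the two inputs would presumably be the result of `my_pipeline.predict_proba(...)` and the probability output of an onnx-runtime session. ONNX outputs are often float32, so a small tolerance is likely more useful than exact equality.

```python
import math
from typing import List, Sequence


def find_mismatched_rows(
    reference: Sequence[Sequence[float]],
    candidate: Sequence[Sequence[float]],
    rel_tol: float = 1e-5,
    abs_tol: float = 1e-6,
) -> List[int]:
    """Return indices of rows whose values differ beyond the tolerances.

    Hypothetical helper for illustration; not an ebm2onnx or skl2onnx API.
    """
    if len(reference) != len(candidate):
        raise ValueError("both outputs must contain the same number of rows")
    mismatched = []
    for index, (ref_row, cand_row) in enumerate(zip(reference, candidate)):
        same_width = len(ref_row) == len(cand_row)
        # A row matches only if it has the same width and every value is close.
        if not same_width or not all(
            math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
            for r, c in zip(ref_row, cand_row)
        ):
            mismatched.append(index)
    return mismatched


# Made-up probabilities standing in for the sklearn pipeline output and
# for the onnx-runtime output of the converted pipeline.
sklearn_proba = [[0.10, 0.90], [0.80, 0.20], [0.45, 0.55]]
onnx_proba = [[0.10, 0.90], [0.70, 0.30], [0.45, 0.55]]

print(find_mismatched_rows(sklearn_proba, onnx_proba))  # prints [1]
```

Returning row indices rather than a single boolean makes it easier to look up the offending input rows and see which preprocessing step or feature value triggers the difference.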

MainRo added the enhancement New feature or request label Jun 26, 2023

MainRo closed this as completed Nov 5, 2024
