Converting sklearn pipelines with ebm models to onnx #9

Closed
ReneeErnst opened this issue Jun 13, 2023 · 7 comments

Labels
enhancement New feature or request

Comments

@ReneeErnst

ReneeErnst commented Jun 13, 2023

  • ebm2onnx version: 3.1.1
  • Python version: 3.9.7
  • Operating System: ubuntu

Description

Feature request: the ability to use EBM models in scikit-learn pipelines and convert the whole pipeline to ONNX. This would require some work to register the model with sklearn-onnx.

I want to save a sklearn pipeline that includes an EBM model to ONNX, rather than JUST the EBM model. This is a common use case where you want to pair your data processing with the model object in a pipeline. It does not appear that this functionality is included at this time.

Ideally, ebm2onnx would be able to save pipelines that include EBM models to ONNX. An example script that would ideally work is below.

What I Did

import ebm2onnx
import pandas as pd
from interpret import glassbox
from sklearn import compose, impute, pipeline, preprocessing

features = [
    "feature_a",
    "feature_b",
    "feature_c",
    "feature_d",
    "feature_e",
    "feature_f",
    "feature_g",
]

df_train = pd.DataFrame(
    {
        "feature_a": [0, 0.5, 2, 5],
        "feature_b": [0, 0.5, 2, 5],
        "feature_c": [0, 0.5, 2, 5],
        "feature_d": [0, 0.5, 2, 5],
        "feature_e": [0, 1, 0, 1],
        "feature_f": [1, 0, 1, 0],
        "feature_g": ["a", "b", "can_not_determine", "can_not_determine"],
        "target": [1, 1, 0, 0],
    }
)
numeric_mean_transformer = pipeline.Pipeline(
    steps=[
        ("imputer", impute.SimpleImputer(strategy="mean")),
        ("scaler", preprocessing.StandardScaler()),
    ]
)

numeric_median_transformer = pipeline.Pipeline(
    steps=[
        ("imputer", impute.SimpleImputer(strategy="median")),
        ("scaler", preprocessing.StandardScaler()),
    ]
)

categorical_transformer = pipeline.Pipeline(
    steps=[
        (
            "onehot",
            preprocessing.OneHotEncoder(
                sparse=True,
                # Assumes two boolean features and one categorical feature; each
                # entry is the value to drop for that feature when one-hot encoding.
                drop=[0, 0, "can_not_determine"],
                handle_unknown="ignore",
            ),
        )
    ]
)

preprocessor = compose.ColumnTransformer(
    transformers=[
        (
            "num_mean",
            numeric_mean_transformer,
            ["feature_a", "feature_b"],
        ),
        (
            "num_median",
            numeric_median_transformer,
            ["feature_c", "feature_d"],
        ),
        ("cat", categorical_transformer, ["feature_e", "feature_f", "feature_g"]),
    ]
)

my_pipeline = pipeline.Pipeline(
    [
        ("preprocessor", preprocessor),
        (
            "model",
            glassbox.ExplainableBoostingClassifier(
                max_bins=8,
                min_samples_leaf=2,
                max_leaves=2,
                learning_rate=0.5,
                validation_size=0.5,
                early_stopping_rounds=5,
                interactions=10,
                random_state=42,
            ),
        ),
    ]
)

my_pipeline.fit(df_train[features], df_train["target"])

onnx_pipeline = ebm2onnx.to_onnx(
    my_pipeline, ebm2onnx.get_dtype_from_pandas(df_train[features])
)
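
As an aside on the `drop` and `handle_unknown="ignore"` settings in the encoder above: the following is a minimal pure-Python sketch, not scikit-learn's implementation, of how a dropped category and an unseen category both end up as an all-zero row. The helper `one_hot_with_drop` and the unseen value `"z"` are hypothetical and exist only for illustration.

```python
def one_hot_with_drop(value, categories, dropped):
    """Encode `value` as indicator columns, one per category except `dropped`.

    Values that are not in `categories` produce all zeros, which mimics
    the effect of handle_unknown="ignore".
    """
    kept = [c for c in categories if c != dropped]
    return [1 if value == c else 0 for c in kept]


# Categories of feature_g from the example data.
categories = ["a", "b", "can_not_determine"]

encoded_known = one_hot_with_drop("a", categories, "can_not_determine")
encoded_dropped = one_hot_with_drop("can_not_determine", categories, "can_not_determine")
encoded_unseen = one_hot_with_drop("z", categories, "can_not_determine")  # "z" is hypothetical

print(encoded_known, encoded_dropped, encoded_unseen)
```

With this combination of settings, the dropped category and any unseen category are indistinguishable after encoding, which is worth keeping in mind when comparing outputs of a converted pipeline.
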
@ReneeErnst
Author

I may have jumped into this too fast, and will update if I get this working. I think I can register this converter and make it work, as documented here: https://onnx.ai/sklearn-onnx/pipeline.html

@ReneeErnst
Author

Yeah, looks like some additional work would be needed. It would be great to have this included in ebm2onnx.

MainRo added the "enhancement" label on Jun 26, 2023
@MainRo
Collaborator

MainRo commented Jun 26, 2023

Yes, this is something that would be great to have.
The conversion of the full sklearn pipeline cannot be done by ebm2onnx. This converter is only for the EBM model.

I will look further in the sklearn documentation but according to the link you provided, we just need to register the ebm converter:
https://onnx.ai/sklearn-onnx/pipeline.html#new-converters-in-a-pipeline

Then, skl2onnx should be able to convert the whole pipeline including the ebm model.

@ReneeErnst
Author

That makes complete sense. After poking around a bit, I figured that it likely wouldn't end up in ebm2onnx, but instead need to be registered. Hopefully that's something that could happen - it would be an awesome add.

@MainRo
Collaborator

MainRo commented Nov 4, 2024

Hello @ReneeErnst I have something working.
Is it ok that I add your code as an example?

@ReneeErnst
Author

> Hello @ReneeErnst I have something working. Is it ok that I add your code as an example?

Sure - go for it!

@MainRo
Collaborator

MainRo commented Nov 5, 2024

Release v3.3.0 contains initial support for scikit-learn pipelines.
There are still some improvements needed (regressors are not covered yet, and sklearn predictions sometimes differ from onnx-runtime), but it does work.
You can test it here: https://mybinder.org/v2/gh/interpretml/ebm2onnx/master?filepath=examples%2Fsklearn-pipeline.ipynb
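
One way to quantify how far the ONNX output drifts from scikit-learn is to compare the two sets of scores under a tolerance. The sketch below is plain Python with made-up numbers: `compare_predictions` is a hypothetical helper, and the tolerances are arbitrary assumptions rather than values recommended by ebm2onnx. In practice the two lists would come from the fitted pipeline and from the ONNX runtime session respectively.

```python
import math


def compare_predictions(reference, candidate, rel_tol=1e-5, abs_tol=1e-6):
    """Return the indices where two score sequences disagree beyond tolerance."""
    if len(reference) != len(candidate):
        raise ValueError("prediction sequences must have the same length")
    return [
        i
        for i, (r, c) in enumerate(zip(reference, candidate))
        if not math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
    ]


# Made-up scores standing in for sklearn and onnx-runtime outputs.
sklearn_scores = [0.12, 0.80, 0.5000001, 0.33]
onnx_scores = [0.12, 0.80, 0.5000002, 0.35]

mismatches = compare_predictions(sklearn_scores, onnx_scores)
print(f"{len(mismatches)} of {len(sklearn_scores)} predictions differ: {mismatches}")
```

Reporting the indices rather than a single pass/fail flag makes it easier to inspect which input rows trigger a difference.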

MainRo closed this as completed on Nov 5, 2024