MissingShapeCalculator: Unable to find a shape calculator for type '<class 'xgboost.sklearn.XGBClassifier'>'. #1076

Open
Annanapan opened this issue Feb 28, 2024 · 3 comments


Annanapan commented Feb 28, 2024

When converting the pipeline to ONNX, I get the following error:

Unable to find a shape calculator for type '<class 'xgboost.sklearn.XGBClassifier'>'.
It usually means the pipeline being converted contains a
transformer or a predictor with no corresponding converter
implemented in sklearn-onnx. If the converted is implemented
in another library, you need to register
the converted so that it can be used by sklearn-onnx (function
update_registered_converter). If the model is not yet covered
by sklearn-onnx, you may raise an issue to
https://github.com/onnx/sklearn-onnx/issues
to get the converter implemented or even contribute to the
project. If the model is a custom model, a new converter must
be implemented. Examples can be found in the gallery.

The pipeline contains a preprocessor and an XGBoost classifier. I created it as follows:

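import xgboost as xgb
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType, Int64TensorType, StringTensorType
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

num_features = X.select_dtypes(include=['int64', 'float64']).columns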
num_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='mean')), 
    ('scaler', StandardScaler())])

cat_features = X.select_dtypes(include=['object', 'category']).columns
cat_transformer = Pipeline(steps=[
    ('ordinal', OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1)),
    ('imputer', SimpleImputer(strategy='constant', fill_value=-1)),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', num_transformer, num_features),
        ('cat', cat_transformer, cat_features)
    ])


model_xgbt = xgb.XGBClassifier(
    booster='gbtree',
    n_estimators=1000,
    learning_rate=0.1,
    max_depth=5,
    scale_pos_weight=4,
    random_state=42
)

pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                           ('classifier', model_xgbt)])
pipeline.fit(X_train, y_train)


initial_type = [
    ('age', FloatTensorType([None, 1])),
    ('zipCode', StringTensorType([None, 1])),
    ('es', StringTensorType([None, 1])),
    ('ec', StringTensorType([None, 1])),
    ('oc', StringTensorType([None, 1])),
    ('income', StringTensorType([None, 1])),
    ('nw', StringTensorType([None, 1])),
    ('ie', StringTensorType([None, 1])),
    ('irt', StringTensorType([None, 1])),
    ('ig', StringTensorType([None, 1])),
    ('r', StringTensorType([None, 1])),
    ('tr', StringTensorType([None, 1])), 
    ('st', Int64TensorType([None, 1]))
]

onnx_model = convert_sklearn(pipeline, initial_types=initial_type)

with open("pipeline_model_xgbt.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

The error occurs when converting the pipeline. As far as I can tell, all the steps in the preprocessor are supported by sklearn-onnx. I wonder whether ONNX cannot handle complex transformers.

versions:
skl2onnx: 1.16.0
sklearn: 1.4.0
Python: 3.11.7

@Annanapan Annanapan changed the title Unable to find a shape calculator for type '<class 'xgboost.sklearn.XGBClassifier'>'. MissingShapeCalculator: Unable to find a shape calculator for type '<class 'xgboost.sklearn.XGBClassifier'>'. Feb 28, 2024
@Annanapan
Author

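I added:

from onnxmltools.convert.xgboost.operator_converters.XGBoost import convert_xgboost
from skl2onnx import update_registered_converter
from skl2onnx.common.shape_calculator import calculate_linear_classifier_output_shapes
from xgboost import XGBClassifier

update_registered_converter(
    XGBClassifier,
    "XGBoostClassifier",
    calculate_linear_classifier_output_shapes,
    convert_xgboost,
    options={"nocl": [True, False], "zipmap": [True, False, "columns"]}
)

and the error disappears, but the ONNX predictions differ from the predictions made by calling the pipeline directly.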
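
To pin down how the ONNX predictions differ from the pipeline's, it can help to compare class probabilities row by row instead of labels alone. The following is a minimal sketch, not code from this thread: `to_rows` and `compare_probabilities` are hypothetical helper names, and the tolerance of 1e-4 is an assumed value, chosen because a graph fed float32 inputs is unlikely to reproduce float64 results bit for bit.

```python
def to_rows(proba):
    """Normalize probabilities to a list of lists of floats.

    Accepts a 2-D array-like, or a list of {label: probability} dicts
    (the layout a ZipMap output produces). Dict rows are ordered by
    sorted label so the columns line up with sorted class labels.
    """
    rows = []
    for row in proba:
        if isinstance(row, dict):
            rows.append([float(row[label]) for label in sorted(row)])
        else:
            rows.append([float(value) for value in row])
    return rows


def compare_probabilities(expected, actual, atol=1e-4):
    """Return (largest absolute difference, indices of rows exceeding atol)."""
    worst = 0.0
    bad_rows = []
    pairs = zip(to_rows(expected), to_rows(actual))
    for index, (exp_row, act_row) in enumerate(pairs):
        diff = max(abs(e - a) for e, a in zip(exp_row, act_row))
        worst = max(worst, diff)
        if diff > atol:
            bad_rows.append(index)
    return worst, bad_rows
```

Here `expected` would be the result of `pipeline.predict_proba(X_test)` and `actual` the probability output of the ONNX session (with onnxruntime, typically the second element returned by `InferenceSession.run`). A handful of small differences points to float32 rounding; large differences on many rows point to the inputs or the conversion.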

@Annanapan
Author

Is this the right way to prepare the input for prediction using onnx?

import numpy as np

# Prepare X_test_inputs for the ONNX model

X_test_inputs = {c: X_test[c].values for c in X_test.columns}

for c in num_features:
    v = X_test[c].dtype
    if v == "float64":
        X_test_inputs[c] = X_test_inputs[c].astype(np.float32)
for k in X_test_inputs:
    X_test_inputs[k] = X_test_inputs[k].reshape((X_test_inputs[k].shape[0], 1))
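
One way to check the input preparation is to derive every cast from the declared `initial_type` list, so that each array passed to the runtime has exactly the dtype its input declares (float32 for `FloatTensorType`, int64 for `Int64TensorType`, strings for `StringTensorType`). A float64-only check like the loop above would, for example, leave an int64 column untouched even if it is declared as `FloatTensorType`. The sketch below works under that assumption; `ONNX_TO_NUMPY`, `plan_casts`, and `build_onnx_inputs` are hypothetical names, not part of skl2onnx or onnxruntime, and `df` is assumed to be a pandas DataFrame.

```python
# Hypothetical mapping from the class name of a declared skl2onnx input
# type to the NumPy dtype name the matching input array should have.
ONNX_TO_NUMPY = {
    "FloatTensorType": "float32",
    "Int64TensorType": "int64",
    "StringTensorType": "object",  # array of Python str values
}


def plan_casts(initial_types):
    """Return {column name: NumPy dtype name} for (name, type) pairs."""
    plan = {}
    for name, tensor_type in initial_types:
        type_name = type(tensor_type).__name__
        if type_name not in ONNX_TO_NUMPY:
            raise TypeError(f"no cast rule for {type_name} (column {name!r})")
        plan[name] = ONNX_TO_NUMPY[type_name]
    return plan


def build_onnx_inputs(df, initial_types):
    """Cast each declared column to its dtype and reshape it to (n, 1)."""
    inputs = {}
    for name, dtype in plan_casts(initial_types).items():
        values = df[name].to_numpy()
        if dtype == "object":
            # Note: missing values become the string "nan" here.
            values = values.astype(str).astype(object)
        else:
            values = values.astype(dtype)
        inputs[name] = values.reshape(-1, 1)
    return inputs
```

With the names from this thread, the call would be `X_test_inputs = build_onnx_inputs(X_test, initial_type)`. Because the casts follow the declarations, any mismatch between a column's dtype and its declared tensor type is handled in one place.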

@xadupre
Collaborator

xadupre commented Apr 4, 2024

You should follow this tutorial to register an XGBoost model: https://onnx.ai/sklearn-onnx/auto_tutorial/plot_gexternal_xgboost.html
