Divergence in HistGradientBoostingClassifier's scores #1051
Comments
After some digging, I think this might be related to missing categorical support: everything works as expected when using one-hot encoding in the preprocessor. @xadupre I am happy to try filing a PR if you think it's a good idea to add support for categoricals. What do you think?
I did not check their implementation recently, but if scikit-learn supports categories the same way lightgbm does, I guess they use the rule
The right way of doing it is to implement the latest onnx specifications (onnx/onnx#5874) and then to update onnxruntime to support it.
The problem with one-hot encoding is that histogram gradient boosting might learn weird interactions between the one-hot encoded features during modeling. Therefore, it might not be the same as specifying that feature as categorical in the model definition with
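To illustrate the difference, here is a minimal plain-Python sketch. It is not scikit-learn's or onnxruntime's actual implementation. It assumes that a native categorical split sends a sample left when its category belongs to a chosen set, whereas a split on a one-hot column can only isolate one category per node. The helper names `native_split` and `one_hot_split` are hypothetical.

```python
def native_split(category, left_set):
    """Set-membership split: a single node can group several categories."""
    return "left" if category in left_set else "right"


def one_hot_split(category, column):
    """Threshold split on one one-hot column (value > 0.5 goes left)."""
    one_hot_value = 1.0 if category == column else 0.0
    return "left" if one_hot_value > 0.5 else "right"


categories = ["a", "b", "c", "d"]

# One native node can separate {a, c} from {b, d}.
native = [native_split(c, {"a", "c"}) for c in categories]

# One one-hot node can only separate "a" from everything else.
one_hot = [one_hot_split(c, "a") for c in categories]

print(native)
print(one_hot)
```

Under these assumptions, reproducing the `{a, c}` grouping with one-hot columns takes at least two nodes, so the trees learned in the two setups are generally not the same.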
Sorry for being so late in replying. Sadly, we haven't found the capacity to contribute upstream this quarter. 👎
Agreed. The other downside of one-hot encoding is that you need a lot of memory when the cardinality of the categorical feature(s) is high.
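A rough back-of-the-envelope estimate of that memory cost, assuming a dense float32 matrix (a sparse one-hot representation would need far less). The row count and cardinality are made-up illustration values, and `dense_bytes` is a hypothetical helper.

```python
def dense_bytes(n_rows, n_columns, bytes_per_value=4):
    """Size in bytes of a dense matrix (float32 by default)."""
    return n_rows * n_columns * bytes_per_value


n_rows = 1_000_000      # assumed dataset size
cardinality = 10_000    # assumed number of distinct categories

one_hot = dense_bytes(n_rows, cardinality)  # one column per category
ordinal = dense_bytes(n_rows, 1)            # one integer-coded column
ratio = one_hot // ordinal

print(f"one-hot: {one_hot / 1e9:.1f} GB, ordinal: {ordinal / 1e6:.1f} MB")
```

With these numbers the dense one-hot matrix is larger than the single ordinal column by a factor equal to the cardinality.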
I think an update to onnxruntime is pending review :)
Hi,
I am trying to build a standard pipeline for tabular data that works nicely with ONNX. Ideally, the pipeline would:
To keep debugging simple, I have built a pipeline that covers points 1-3. Preprocessing works fine, but HistGradientBoostingClassifier returns different predictions (see gist). Any ideas why this might happen? Are there known issues with HistGradientBoostingClassifier?

Thank you!
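For anyone reproducing this, a divergence like the one described can be quantified with a small helper. This is only a sketch: it assumes the class probabilities from both runtimes are already available as nested lists, the numbers below are made up, and `compare_probabilities` is a hypothetical name, not part of scikit-learn, skl2onnx, or onnxruntime.

```python
def compare_probabilities(expected, actual, atol=1e-5):
    """Report the largest probability gap and how many predicted labels differ."""
    max_diff = 0.0
    label_mismatches = 0
    for row_e, row_a in zip(expected, actual):
        for p_e, p_a in zip(row_e, row_a):
            max_diff = max(max_diff, abs(p_e - p_a))
        # The predicted label is the index of the highest probability.
        if row_e.index(max(row_e)) != row_a.index(max(row_a)):
            label_mismatches += 1
    return {
        "max_abs_diff": max_diff,
        "label_mismatches": label_mismatches,
        "within_tolerance": max_diff <= atol,
    }


# Made-up probabilities standing in for the two runtimes' outputs.
sklearn_proba = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
onnx_proba = [[0.9, 0.1], [0.7, 0.3], [0.6, 0.4]]

report = compare_probabilities(sklearn_proba, onnx_proba)
print(report)
```

Small gaps are usually expected from float32 rounding. Flipped labels, as in the second row here, point to a structural difference such as the handling of categorical splits.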
Package versions: