Code generated for XGBoost models returns invalid scores when tree_method is set to "hist" #168
Comments
Hey @eafpres ! Thank you so much for reporting this. This is indeed quite bad. It'd be very helpful if you could provide some steps to reproduce the issue, perhaps the model's hyperparameters, or some sort of synthetic dataset on which the problem reproduces. Update: if you can confirm whether the same problem occurs in other languages (e.g. Java), that'd be just fantastic.
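For context, a minimal reproduction of the kind being requested here might look like the sketch below. The dataset, hyperparameters, and comparison are illustrative assumptions rather than the reporter's actual setup; only m2cgen's export_to_python API and the generated score() function are taken from the library.

```python
import m2cgen
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_classification

# Synthetic multiclass dataset (placeholder; not the reporter's data).
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_classes=3, random_state=42)

# Train with the histogram tree method, which is what the report is about.
model = xgb.XGBClassifier(n_estimators=50, max_depth=4,
                          tree_method="hist", random_state=42)
model.fit(X, y)

# Export the model to pure Python and load the generated score() function.
generated_code = m2cgen.export_to_python(model)
namespace = {}
exec(generated_code, namespace)
score = namespace["score"]

# Compare the library's probabilities with the generated code, row by row.
native = model.predict_proba(X)
converted = np.array([score(list(row)) for row in X])
print("max abs diff:", np.abs(native - converted).max())
```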
@izeigerman I will try to create a repro for you! Thank you for looking into it. FYI, to be clear, it isn't just multiclass; the binary cases seem to produce the same issue (for me).
Something is clearly wrong there. E.g. row 13: the probabilities don't even add up to 1.0 in m2cgen's predictions, which doesn't make sense.
@eafpres Got it! May I please also ask you to share the m2cgen version that you're using?
Hi -- version: m2cgen 0.6.0, installed using pip on 2020-02-19.
Excellent, thank you. Another thing you may try is to downgrade to version 0.5.0. We had some significant changes in XGBoost-related code in 0.6.0 which may or may not have caused this behavior.
Hello @eafpres ! Brief question: are you using DART in your XGBoost model?
@StrikerRUS -- I am not using DART, here is my configuration:
I've got a reproduction. (See the last row.)
@eafpres for the sake of this investigation, can you please try without tree_method = 'hist'?
@izeigerman -- I will test it and report back. Thanks for the amazing support.
Hi @izeigerman -- I changed my configuration to use tree_method = 'exact' and the results from the two models are now exactly identical. This was for the multi-class case. I'll also be testing the binary case and will confirm that.
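For reference, the workaround discussed in this exchange amounts to a one-line change when constructing the booster; the surrounding hyperparameters below are placeholders, not the reporter's configuration.

```python
import xgboost as xgb

# Train with the exact greedy algorithm instead of the histogram method
# while the "hist" code-generation issue remains open.
model = xgb.XGBClassifier(
    n_estimators=100,      # placeholder values, not the reporter's settings
    max_depth=4,
    tree_method="exact",   # was "hist"
)
```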
@eafpres Got it, thank you so much for helping to triage the problem. We'll then be looking into why it misbehaves when the 'hist' tree method is used.
@eafpres @izeigerman See lines 93 to 94 in d058b8f.
@StrikerRUS as far as I remember, this is exactly how I reproduced it. Have you tried the 'hist' method?
@izeigerman Hmm, interesting... Thank you for the hint! I'll continue investigation.
I can get you a repro in the next couple of days.
@eafpres the issue has been fixed in the new release.
I saw the release yesterday! I have to update a model that I'm using this for this week, so I'll test it both ways and advise whether it works. I apologize for not getting the repro to you. Thanks for working on this so promptly!
Hi @izeigerman, do you have any thoughts on my previous comment, or should I open a new issue for my problem?
I have trained XGBoost models in Python and am using the CLI interface to convert the serialized models to pure Python. However, when I use the pure Python code, the results differ from the predictions made with the model directly.
Python 3.7
xgboost 0.90
My model has a large number of parameters (somewhat over 500).
Here are predicted class probabilities from the original model:
Here are the same predicted probabilities using the generated Python code via m2cgen:
We can see that the results are similar but not the same. As a result, a significant number of cases move into different classes between the two sets of predictions.
I have also tested this with binary classification models and have the same issues.
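The conversion workflow itself was not posted in full. A minimal sketch of that kind of CLI round trip is shown below; the file names and dataset are assumptions, not the reporter's actual code.

```python
import pickle

import xgboost as xgb
from sklearn.datasets import make_classification

# Placeholder data and model, standing in for the reporter's training setup.
X, y = make_classification(n_samples=200, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
model = xgb.XGBClassifier(tree_method="hist", random_state=0).fit(X, y)

# Serialize the trained model so the m2cgen CLI can read it.
with open("model.pickle", "wb") as f:
    pickle.dump(model, f)

# Then, from the shell, generate the pure-Python implementation:
#   m2cgen model.pickle --language python > model_as_code.py
# The generated module exposes a score(input) function whose output can be
# compared against model.predict_proba() row by row, as in the sketch
# earlier in this thread.
```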