Warn when passing labels with missing values #4483
Hi, I want to make my first contribution. Can I work on this?
Sure. Please do give it a shot.
Thanks for starting this discussion, @ChristophAymannsQC! @kurchi1205, before you get too far into implementing this, I think we should get the maintainers' opinions on this question. I have some initial thoughts.
I'm also not totally convinced that the cost of this check would be worth the value of the warning. Implementing this as described would make every Dataset construction in LightGBM a bit slower. I'm curious to hear what @StrikerRUS and @shiyu1994 think.
@jameslamb Thanks a lot for your thoughts. Didn't want to step on anyone's toes here. :)
I agree that we should raise a warning instead of dropping samples silently. At least, I cannot figure out when it would make sense to pass a sample with
@ChristophAymannsQC Thank you for pointing that out!
^ @shiyu1994 what do you think about this? Do you think the check would not be too costly for Dataset construction, and that we should add this as a feature request?
I think checking for NaN in labels has little effect on Dataset construction speed, since the heavier work is parsing features and constructing bins. We can add this as a feature request.
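To make the cost argument concrete, here is a minimal sketch of such a check, assuming the labels arrive as something NumPy can convert to a float array. `check_labels_for_nan` is a hypothetical helper, not an existing LightGBM function, and this is not how LightGBM builds a Dataset internally; it only illustrates that detecting NaN labels is one vectorized pass over a single column.

```python
import warnings

import numpy as np


def check_labels_for_nan(label):
    """Warn if the label vector contains NaN and return how many it has.

    Hypothetical helper, not part of LightGBM's API. The check is a single
    vectorized pass over the labels only, independent of the feature count.
    """
    y = np.asarray(label, dtype=np.float64)
    n_missing = int(np.count_nonzero(np.isnan(y)))
    if n_missing > 0:
        warnings.warn(
            f"{n_missing} of {y.size} labels are missing (NaN).",
            UserWarning,
            stacklevel=2,
        )
    return n_missing


# Usage: record the warning instead of printing it to stderr.
y = np.array([1.5, np.nan, 3.0, np.nan])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    count = check_labels_for_nan(y)
print(count, len(caught))  # 2 1
```

Because the check touches only the label vector, its cost grows linearly with the number of rows and does not depend on how many features there are.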
OK, thanks! I've added this to #2302 along with other feature requests. @kurchi1205, are you still interested in contributing this? We'd love the help! If you pick this up, please consider the following:
Let us know here if you need some additional guidance.
This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!
Summary
Currently (it seems to me) missing values (np.nan) in the labels are silently replaced by 0s when objective=regression (I haven't checked other objective functions), possibly related to this. It would be nice if this were made transparent to users who might be inadvertently passing missing values. Alternatively, observations with missing values could be dropped.
Minimal example
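Without such a warning, the "drop" alternative from the summary can be applied on the user side by filtering out rows whose label is missing before building the Dataset. A minimal sketch, assuming the features and labels are NumPy arrays (or convertible to them); `drop_missing_labels` is a hypothetical helper, not part of LightGBM:

```python
import numpy as np


def drop_missing_labels(X, y):
    """Drop rows whose label is NaN; return the filtered (X, y).

    Hypothetical helper, not part of LightGBM's API. Missing values in the
    features are left untouched; only rows with a NaN label are removed.
    """
    X = np.asarray(X)
    y = np.asarray(y, dtype=np.float64)
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y must have the same number of rows")
    keep = ~np.isnan(y)  # boolean mask of rows with a usable label
    return X[keep], y[keep]


X = np.array([[0.1, 1.0], [0.2, np.nan], [0.3, 3.0]])
y = np.array([10.0, np.nan, 30.0])
X_clean, y_clean = drop_missing_labels(X, y)
print(X_clean.shape, y_clean)  # (2, 2) [10. 30.]
```

The filtered arrays can then be passed to lightgbm.Dataset(X_clean, label=y_clean) as usual. Filtering explicitly keeps the decision visible to the user instead of relying on however NaN labels happen to be handled internally.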