Add support for pandas nullable types to the sklearn api #4173
Comments
I'd like to highlight that
But this warning has been there at least since 2019: #2486 (comment).
Closed in favor of tracking this in #2302. We decided to keep all feature requests in one place. You are welcome to contribute this feature! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on implementing it.
Reopening since I'm working on this.
My understanding of LightGBM/python-package/lightgbm/basic.py, lines 549 to 551 in af5b40e, is that the data gets converted to either float32 or float64, depending on the common dtype. So my approach is checking if there are any
I'd like to get feedback on this approach before opening a PR. The latency for predictions actually seems to be lower with this, since this allows me to change the logic in LightGBM/python-package/lightgbm/basic.py, line 504 in af5b40e,
pd.api.types.is_numeric_dtype on each dtype.
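The idea in the comment above (check each column's dtype, then cast everything to one float dtype) can be sketched roughly as follows. This is only an illustration, not the code in basic.py: `frame_to_float_array` is a hypothetical helper, and the rule "float32 only when every column is already float32" is an assumption made to keep the example short.

```python
import numpy as np
import pandas as pd


def frame_to_float_array(df: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper, not LightGBM's code: cast a numeric DataFrame to floats."""
    # is_numeric_dtype accepts NumPy dtypes as well as pandas nullable dtypes
    # such as Int64 and boolean.
    bad_columns = [
        str(name)
        for name, dtype in df.dtypes.items()
        if not pd.api.types.is_numeric_dtype(dtype)
    ]
    if bad_columns:
        raise ValueError("non-numeric columns: " + ", ".join(bad_columns))
    # Assumption: keep float32 only when every column already is float32.
    all_float32 = all(dtype == np.dtype(np.float32) for dtype in df.dtypes)
    target = np.float32 if all_float32 else np.float64
    # For a float target, astype turns pd.NA into NaN.
    return df.astype(target).to_numpy()


df = pd.DataFrame(
    {
        "x1": pd.array([1, None, 3], dtype="Int64"),
        "x2": pd.array([True, False, None], dtype="boolean"),
        "x3": [0.5, 1.5, 2.5],
    }
)
X = frame_to_float_array(df)
```

Here `X` is a plain float ndarray in which the missing entries of `x1` and `x2` show up as NaN, which is the form a float-based conversion path expects.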
…4927)
* map nullable dtypes to regular float dtypes
* cast x3 to float after introducing missing values
* add test for regular dtypes
* use .astype and then values. update nullable_dtypes test and include test for regular numpy dtypes
* more specific allowed dtypes. test no copy when single float dtype df
* use np.find_common_type. set np.float128 to None when it isn't supported
* set default as type(None)
* move tests that use lgb.train to test_engine
* include np.float32 when finding common dtype
* Apply suggestions from code review
Co-authored-by: Nikita Titov <[email protected]>
* add linebreak
Co-authored-by: Nikita Titov <[email protected]>
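To illustrate the "common dtype" step named in the commit message, here is a small sketch of picking one float dtype for a frame that mixes nullable and regular columns. It is not the merged implementation: `common_float_dtype` is a hypothetical name, it assumes every column is numeric, and it uses `np.result_type` in place of `np.find_common_type`, which is deprecated in recent NumPy releases.

```python
import numpy as np
import pandas as pd


def common_float_dtype(df: pd.DataFrame) -> np.dtype:
    """Hypothetical helper: a float dtype wide enough for every (numeric) column."""
    numpy_dtypes = [
        # pandas masked dtypes (Int64, boolean, ...) expose their underlying
        # NumPy dtype as `numpy_dtype`; regular NumPy dtypes pass through as-is.
        np.dtype(getattr(dtype, "numpy_dtype", dtype))
        for dtype in df.dtypes
    ]
    # Seeding the promotion with float32 means the result is always a float
    # dtype of at least 32 bits, even when every column is an integer or bool.
    return np.result_type(np.float32, *numpy_dtypes)


small = pd.DataFrame(
    {
        "a": pd.array([1, None], dtype="Int8"),
        "b": np.array([0.5, 1.5], dtype=np.float32),
    }
)
wide = pd.DataFrame(
    {
        "a": pd.array([1, None], dtype="Int64"),
        "b": np.array([0.5, 1.5], dtype=np.float32),
    }
)
small_dtype = common_float_dtype(small)  # float32: int8 fits in float32
wide_dtype = common_float_dtype(wide)    # float64: int64 forces the wider type
```

The float32 seed mirrors the "include np.float32 when finding common dtype" item: without it, a frame made only of integer columns would promote to an integer dtype, which cannot hold NaN.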
Hi @jmoralez, thanks for pushing this ahead. I am currently using lightgbm v3.3.2 and attempting to use Int64 as a dtype for some features, but I am still seeing the usual error.
Can you confirm the status of this feature? Is this a bug, or am I missing something?
Hi @DanielMS93. The changes were merged recently, so they weren't included in version 3.3.2; they will be available in the next release. If you want to use them now, you can install from GitHub.
... or install a nightly build if you are not comfortable compiling C++ code.
It seems like they also failed to make it into the 3.3.3 release. At least I don't see it in the release notes, and I am still seeing the error with an
Any word on when this will make it in?
@leahmcguire v3.3.3 was a special patch release just to keep CRAN from removing the R package. It doesn't contain the fix from #4927. The change from #4927 will be in v4.0.0. We don't have an estimated date, but you can subscribe to #5153 to be notified when that release goes out.
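For anyone who wants to guard their own code on this, a rough version check could look like the sketch below. `has_nullable_dtype_support` is a hypothetical helper, not part of lightgbm; it only looks at the major version of a released package, so it would report False for a nightly or source build that already contains #4927.

```python
import re


def has_nullable_dtype_support(version: str) -> bool:
    """Hypothetical helper: True when `version` is 4.0.0 or later."""
    match = re.match(r"(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # Per the comment above, the change from #4927 ships in v4.0.0.
    return int(match.group(1)) >= 4


checks = {v: has_nullable_dtype_support(v) for v in ("3.3.2", "3.3.3", "4.0.0")}
```

In practice the argument would be `lightgbm.__version__`, and the False branch would fall back to casting nullable columns to float before calling fit or predict.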
This issue has been automatically locked since there has not been any recent activity since it was closed. |
Summary
I would like to use lightgbm's sklearn API with the new pandas nullable types. Currently, lightgbm does not recognize these types as valid. Reproducer using lightgbm 3.0.0 and pandas 0.24.1:
Motivation
As a user, I would like to build a data processing pipeline using the latest pandas features. I can work around lightgbm's limitations but this is cumbersome and some information could be lost in the conversion.
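As a small illustration of the information that a manual conversion can lose (this is not the original reproducer, and the values are made up for the example): casting a nullable integer column to float64 replaces pd.NA with NaN, and integers above 2**53 no longer survive a round trip.

```python
import pandas as pd

# Illustrative values only: 2**53 + 1 is the first integer float64 cannot represent.
big = 2**53 + 1
ids = pd.Series([big, 7], dtype="Int64")

as_float = ids.astype("float64")          # the usual workaround before fitting
round_trip = as_float.astype("Int64")     # try to get the integers back

lost_precision = int(round_trip.iloc[0]) != big

# Missing values become NaN, so the column is no longer integer-typed.
with_na = pd.Series([1, None], dtype="Int64").astype("float64")
```

Small values such as 7 come back unchanged, so the loss only shows up for large identifiers or counters, which makes it easy to miss.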
Thank you!
Description
References
Pandas nullable integer dtype documentation.