Inconsistent prediction when the order of the columns of pandas dataframe is different from training #4102

Closed
louis925 opened this issue Mar 24, 2021 · 3 comments

Comments


louis925 commented Mar 24, 2021

Description

Reordering the columns of a pandas DataFrame at prediction time gives inconsistent results if the column order differs from the one used in training. The LightGBM Python API doesn't seem to match the columns of a pandas DataFrame by name.

Reproducible example

import numpy as np
import pandas as pd
import lightgbm as lgb
df = pd.DataFrame(
    [[0, 1, 1]] * 24 + [[0, 1, 0]] * 24 + [[1, 0, 0]] * 21 + [[1, 0, 1]] * 21,
    columns=['y', 'x1', 'x2'],
)
cols_feat = ['x1', 'x2']
y_true = df['y']
lgb_train = lgb.Dataset(df[cols_feat], y_true)
model = lgb.train({}, lgb_train)

# original order of columns
y_pred = model.predict(df[cols_feat])
print(y_pred)
# [1.23953194e-05, ..., 9.99985834e-01, ...]
print('loss:', np.mean((y_pred - y_true)**2))
# loss: 1.7559308212239284e-10

# reverse the order of columns
y_pred2 = model.predict(df[cols_feat[::-1]])
print(y_pred2)
# [1.23953194e-05, ..., 9.99985834e-01, ..., 1.23953194e-05, ...]
print('loss:', np.mean((y_pred2 - y_true)**2))
# loss: 0.49998666045229395, much worse than the original order

Environment info

LightGBM version or commit hash: 2.3.1

Command(s) you used to install LightGBM

pip3 install lightgbm

Additional Comments

Not sure if this is a bug or should be a feature request. The LightGBM model clearly knows the feature names (model.feature_name()), but the prediction stage doesn't check them, which can lead to confusion. It would be great if LightGBM could check the order of the columns at prediction time. Thanks!

@StrikerRUS
Collaborator

> Not sure if this is a bug or should be a feature request.

It is more likely to be a feature request. Please refer to #4020 (comment).

@StrikerRUS
Collaborator

I remember that we already have such a feature request. Anyone is welcome to contribute!
Refer to #812 (comment).
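
Until such a check exists in the library, one possible workaround is to reorder the columns yourself before calling predict. The sketch below is only an illustration: aligned_columns is a hypothetical helper, not part of LightGBM, and it assumes the model was trained on a pandas DataFrame so that model.feature_name() returns the training column names in training order.

```python
def aligned_columns(trained_names, frame_columns):
    """Return the training-time feature names, in training order.

    Hypothetical helper (not a LightGBM API). Raises ValueError if the
    frame to predict on lacks any feature the model was trained with.
    """
    trained = list(trained_names)
    available = set(frame_columns)
    missing = [name for name in trained if name not in available]
    if missing:
        raise ValueError(f"columns missing at prediction time: {missing}")
    return trained


# Assumed usage with the reproducible example from this issue:
#   cols = aligned_columns(model.feature_name(), df.columns)
#   y_pred2 = model.predict(df[cols])  # columns now in training order
```

Selecting df[cols] puts the features back in training order regardless of how the DataFrame was arranged, and the explicit error surfaces a missing column instead of letting a silently misaligned prediction through.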

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023