
LightGBMError: Check failed: best_split_info.left_count > 0 for ranking task #2742

Closed
kenyeung128 opened this issue Feb 5, 2020 · 33 comments · Fixed by #2824

@kenyeung128

kenyeung128 commented Feb 5, 2020

Hi,

When will lightgbm 2.3.2 be released? The online documentation (https://lightgbm.readthedocs.io/) describes 2.3.2 with some new features, but the latest Python package is lightgbm 2.3.1. Thanks.

@StrikerRUS
Collaborator

@kenyeung128 Hi!

You can download the latest nightly version from this page: https://lightgbm.readthedocs.io/en/latest/Installation-Guide.html if you do not want to build from source.

I think we need some time to test the large changes introduced in #2699, fix some critical issues, wait for news on #2628, and so on.

Also, given the removal of some parameters, I believe the next release should be at least 2.4.0, or even 3.0.0, according to the semantic versioning we try to follow.

cc @guolinke

@kenyeung128
Author

Hi @StrikerRUS, thanks for the info. I have followed the installation guide and installed the Python package from the latest source code.

However, I get the error below when I try to fit an LGBMRanker:

```
LightGBMError: Check failed: best_split_info.left_count > 0 at /data/github/LightGBM/src/treelearner/serial_tree_learner.cpp, line 706 .
```

Do you have any idea?

@guolinke
Collaborator

guolinke commented Feb 6, 2020

@kenyeung128 can you provide a reproducible example?

@kenyeung128
Author

Hi @guolinke,

The example is like this:

```python
gbm = lgb.LGBMRanker(n_estimators=800, score='ndcg')
gbm.fit(x_train.drop(columns=columns_exclude), y_train, group=q_train,
        categorical_feature=columns_category,
        eval_set=[(x_valid.drop(columns=columns_exclude), y_valid)],
        eval_group=[q_valid], eval_at=[1, 2, 3], early_stopping_rounds=200,
        verbose=True, eval_metric='ndcg')
```

It seems the root cause is the use of categorical_feature: some category values appear in x_train but not in x_valid, so the exception is thrown. In the previous version, 2.3.1, it works.
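The mismatch described above can be detected before fitting. A minimal sketch (plain Python, with toy stand-in columns, not LightGBM's API) that reports categorical values present in the training split but absent from the validation split:

```python
def missing_categories(train_col, valid_col):
    """Return categorical values seen in training but absent from validation."""
    return sorted(set(train_col) - set(valid_col))

# Toy columns standing in for one categorical feature in x_train / x_valid.
train_col = ["red", "blue", "green", "green"]
valid_col = ["red", "blue"]

print(missing_categories(train_col, valid_col))  # ['green']
```

Running such a check per categorical column before calling fit can tell you whether this situation applies to your data.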

@guolinke
Collaborator

guolinke commented Feb 9, 2020

Thanks @kenyeung128 , could you provide the data for debugging?

@kenyeung128
Author

kenyeung128 commented Feb 10, 2020

@guolinke it's quite a big dataset; I tried to reproduce it with a smaller dataset, but the error didn't come up. I suspect it's related to the categorical values not being evenly distributed between train/valid, and somehow that triggers the exception.

@StrikerRUS StrikerRUS changed the title lightgbm 2.3.2 release LightGBMError: Check failed: best_split_info.left_count > 0 for ranking task Feb 17, 2020
@guolinke
Collaborator

@kenyeung128 is it possible to reproduce this with randomly generated data?

@rgranvil
Contributor

@guolinke I've also been running into this error when fitting models with categorical predictors in the latest master branch. Here is a reproducible example in R v3.5.2:

Code:

```r
library(lightgbm)
set.seed(1)
data <- data.frame(
  y = as.integer(runif(1000) > .5)
  ,x = sample(c(1,1,1,2), 1000, replace = TRUE)
)
data_matrix <- as.matrix(data[, "x", drop = FALSE])
dtrain <- lgb.Dataset(data_matrix, label = data$y, categorical_feature = "x")

model <- lgb.train(
  params = list(objective = "binary")
  ,data = dtrain
  ,nrounds = 1
)
```

Error:

```
[LightGBM] [Fatal] Check failed: best_split_info.right_count > 0 at /tmp/RtmpE5HQrB/R.INSTALL1226e7bebac9e/lightgbm/src/src/treelearner/serial_tree_learner.cpp, line 706 .
```
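Note that the feature in this repro is deliberately skewed: `sample(c(1,1,1,2), ...)` draws category 1 with probability 0.75, so a split on the rare category 2 can leave one child nearly (or, after binning, exactly) empty. A rough Python illustration of the same skew (not LightGBM code, just the data-generation step):

```python
import random
from collections import Counter

random.seed(1)
# Mirror sample(c(1,1,1,2), 1000, replace = TRUE): category 1 drawn with p = 0.75.
x = [random.choice([1, 1, 1, 2]) for _ in range(1000)]
counts = Counter(x)
print(counts[1], counts[2])  # category 1 dominates, roughly 3:1
```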

@guolinke
Collaborator

@kenyeung128 @rgranvil thanks very much, could you try #2824?

@rgranvil
Contributor

@guolinke I confirmed #2824 fixes the issue for me. Thanks for the quick fix.

@chameleonTK

I still hit this issue when using the GPU build.

@DeiPlusAY

I also found this issue on the GPU build.

@Dronablo

Dronablo commented Jun 2, 2020

I also see this issue from time to time on a relatively fresh GPU build. On a ~4 GB dataset with 250,000 rows and 1,500 columns it can train normally for hours, and then I get:

```
LightGBMError: Check failed: (best_split_info.right_count) > (0) at D:\Python\projects\lib\2003_lgbm\LightGBM\src\treelearner\serial_tree_learner.cpp, line 614
```

which corresponds to

```cpp
CHECK_GT(best_split_info.right_count, 0);
```

in the current code. Any ideas, @guolinke @StrikerRUS?
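For context, `CHECK_GT(best_split_info.right_count, 0)` asserts that the chosen split leaves at least one row in each child. A toy Python analogue of that invariant (illustrative names only, not LightGBM's internals):

```python
class LightGBMError(Exception):
    pass

def apply_split(rows, threshold):
    """Partition rows at a threshold and enforce the non-empty-child invariant."""
    left = [r for r in rows if r <= threshold]
    right = [r for r in rows if r > threshold]
    if not left or not right:
        # Analogue of CHECK_GT(best_split_info.left_count/right_count, 0).
        raise LightGBMError("Check failed: best_split_info child count > 0")
    return left, right

print(apply_split([1, 2, 3, 4], 2))  # ([1, 2], [3, 4])
```

The reports in this thread correspond to the case where the tree learner chose a split whose left or right partition turned out to be empty, tripping this guard.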

@chutcheson

chutcheson commented Oct 4, 2020

I'm encountering a very similar issue on 3.0.0 on GPU. I dropped my categorical columns because I was concerned they were the cause (based on another issue on GitHub).

But I still get the error, though perhaps less frequently.

I also seem to hit the error quite randomly. For instance, this set of parameters passed to train produced the error:

```python
{'n_estimators': 70, 'learning_rate': 0.5087241341951028, 'objective': 'multiclass', 'num_classes': 12, 'num_leaves': 18, 'device': 'GPU', 'verbose': -1}
```

But these two did not:

```python
{'n_estimators': 97, 'learning_rate': 0.25286309514637206, 'objective': 'multiclass', 'num_classes': 12, 'num_leaves': 15, 'device': 'GPU', 'verbose': -1}

{'n_estimators': 169, 'learning_rate': 0.6275184529172673, 'objective': 'multiclass', 'num_classes': 12, 'num_leaves': 20, 'device': 'GPU', 'verbose': -1}
```

And the exact error that I received:

```
Trial 20 failed because of the following error: LightGBMError('Check failed: (best_split_info.left_count) > (0) at /tmp/pip-install-aw2q0tg1/lightgbm/compile/src/treelearner/serial_tree_learner.cpp, line 630 .\n')
```

Just for reference, my original function call:

```python
clf = lightgbm.train(params=params, train_set=train_dataset)
```

@MrRobot2211

Is this still going on? We should start a new entry.

@Marmeladenbrot

Marmeladenbrot commented Jan 15, 2021

I have this error on CPU with the latest version on Windows 10 x64.

The data is private, so sadly I can't share it.

```
Check failed: (best_split_info.right_count) > (0) at c:\users\vssadministrator\appdata\local\temp\pip-req-build-lqyem8a1\compile\src\treelearner\serial_tree_learner.cpp, line 661 .
```

@muttoni

muttoni commented Jan 21, 2021

Same here on CPU, on macOS using virtualenv:

```
  File "/Users/user/.virtualenvs/flask/lib/python3.8/site-packages/lightgbm/sklearn.py", line 770, in fit
    super(LGBMRegressor, self).fit(X, y, sample_weight=sample_weight,
  File "/Users/user/.virtualenvs/flask/lib/python3.8/site-packages/lightgbm/sklearn.py", line 612, in fit
    self._Booster = train(params, train_set,
  File "/Users/user/.virtualenvs/flask/lib/python3.8/site-packages/lightgbm/engine.py", line 252, in train
    booster.update(fobj=fobj)
  File "/Users/user/.virtualenvs/flask/lib/python3.8/site-packages/lightgbm/basic.py", line 2458, in update
    _safe_call(_LIB.LGBM_BoosterUpdateOneIter(
  File "/Users/user/.virtualenvs/flask/lib/python3.8/site-packages/lightgbm/basic.py", line 55, in _safe_call
    raise LightGBMError(decode_string(_LIB.LGBM_GetLastError()))
lightgbm.basic.LightGBMError: Check failed: (best_split_info.left_count) > (0) at /Users/runner/work/1/s/python-package/compile/src/treelearner/serial_tree_learner.cpp, line 651 .
```

Dependencies:

```
Requirement already satisfied: lightgbm in /Users/user/.virtualenvs/flask/lib/python3.8/site-packages (3.1.1)
Requirement already satisfied: scikit-learn!=0.22.0 in /Users/user/.virtualenvs/flask/lib/python3.8/site-packages (from lightgbm) (0.24.0)
Requirement already satisfied: scipy in /Users/user/.virtualenvs/flask/lib/python3.8/site-packages (from lightgbm) (1.6.0)
Requirement already satisfied: wheel in /Users/user/.virtualenvs/flask/lib/python3.8/site-packages (from lightgbm) (0.36.2)
Requirement already satisfied: numpy in /Users/user/.virtualenvs/flask/lib/python3.8/site-packages (from lightgbm) (1.19.5)
Requirement already satisfied: joblib>=0.11 in /Users/user/.virtualenvs/flask/lib/python3.8/site-packages (from scikit-learn!=0.22.0->lightgbm) (1.0.0)
Requirement already satisfied: threadpoolctl>=2.0.0 in /Users/user/.virtualenvs/flask/lib/python3.8/site-packages (from scikit-learn!=0.22.0->lightgbm) (2.1.0)
```

The df is a pickled file that works fine in Kaggle notebooks.

@shiyu1994
Collaborator

@muttoni @chutcheson Could you provide a reproducible example, or, if possible, share your dataset with us? We are trying to remove this bug before the next release. Any help is really appreciated. Thanks.

@arkothiwala

arkothiwala commented Feb 9, 2021

lgbm bug.zip

The issue is reproducible with the attached dataset. @shiyu1994 @StrikerRUS

Error message:

```
LightGBMError: Check failed: (best_split_info.left_count) > (0) at D:\a\1\s\python-package\compile\src\treelearner\serial_tree_learner.cpp, line 651 .
```

Code to reproduce:

```python
from sklearn.model_selection import train_test_split
import pandas as pd
import lightgbm

train_df, test_df = pd.read_csv('train.csv'), pd.read_csv('test.csv')
final_df = pd.concat([train_df, test_df], ignore_index=True)
y = final_df["domain"]
x = final_df.drop("domain", axis=1)

clf = lightgbm.LGBMClassifier(n_jobs=-1, boosting_type='goss', random_state=0)

# Changing the test_size parameter removes the error
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
clf.fit(x_train, y_train)

# Uncomment this line to remove the error
# clf.fit(x, y)
```

A few observations:

  1. It does not throw the error when test_size in train_test_split is changed from 0.2 to any other value (0.21, 0.3, etc.).
  2. It does not throw the error when running clf.fit(x, y) instead of clf.fit(x_train, y_train).
  3. It throws the same error when random_state is changed in either sklearn.model_selection.train_test_split or lightgbm.LGBMClassifier.
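One way to screen for the split sensitivity observed above is to look for categories with very few rows in the training partition, since those are the likely triggers of an empty child. A small plain-Python helper (names and threshold are hypothetical, not a LightGBM API):

```python
from collections import Counter

def rare_categories(train_col, min_count=2):
    """Return categories whose count in the training split falls below min_count."""
    counts = Counter(train_col)
    return sorted(c for c, n in counts.items() if n < min_count)

train_col = ["a"] * 50 + ["b"] * 30 + ["c"]  # "c" appears only once
print(rare_categories(train_col))  # ['c']
```

If changing test_size makes the error disappear, comparing this helper's output across the two splits may show which category's count crossed the threshold.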

@victor-ab

I ran into the same problem here when using multiclassova. Multiclass worked nicely.

@zacheberhart-kd

I also have this issue with a workflow similar to the example @arkothiwala posted.

@shiyu1994
Collaborator

shiyu1994 commented Feb 19, 2021

@arkothiwala @zacheberhart-kd Thanks! The example runs fine on the latest master branch. You can clone the source code and build the Python package from source.

@Teeeto

Teeeto commented Mar 12, 2021

I built from source today and have this error on GPU, on both Windows and Linux.

```python
params = {
    'objective': 'multiclass',
    'metric': ['auc_mu', 'multiclass'],
    'device': 'gpu',
    'num_threads': 6,
    'num_leaves': 127,
    'num_class': 7,
    'max_bin': 63,
    'min_data_in_leaf': 50,
    'bin_construct_sample_cnt': 51001000,
    'learning_rate': 0.03,
    'verbose': 1
}
```

```
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 29705
[LightGBM] [Info] Number of data points in the train set: 8293105, number of used features: 964
[LightGBM] [Info] Using GPU Device: TITAN RTX, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 64 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 8
[LightGBM] [Info] 935 dense feature groups (7402.75 MB) transferred to GPU in 2.895087 secs. 1 sparse feature groups
[LightGBM] [Info] Start training from score -1.620340
[LightGBM] [Info] Start training from score -2.736469
[LightGBM] [Info] Start training from score -2.609875
[LightGBM] [Info] Start training from score -2.799483
[LightGBM] [Info] Start training from score -2.936053
[LightGBM] [Info] Start training from score -3.110144
[LightGBM] [Info] Start training from score -0.682573
[LightGBM] [Fatal] Check failed: (best_split_info.left_count) > (0) at /C/lgb/LightGBM/python-package/compile/src/treelearner/serial_tree_learner.cpp, line 653 .

Traceback (most recent call last):
  File "", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/lightgbm/engine.py", line 249, in train
    booster.update(fobj=fobj)
  File "/usr/local/lib/python3.8/dist-packages/lightgbm/basic.py", line 2636, in update
    _safe_call(_LIB.LGBM_BoosterUpdateOneIter(
  File "/usr/local/lib/python3.8/dist-packages/lightgbm/basic.py", line 110, in _safe_call
    raise LightGBMError(_LIB.LGBM_GetLastError().decode('utf-8'))
lightgbm.basic.LightGBMError: Check failed: (best_split_info.left_count) > (0) at /C/lgb/LightGBM/python-package/compile/src/treelearner/serial_tree_learner.cpp, line 653 .
```

@shiyu1994
Collaborator

@Teeeto Thanks for the information. For the GPU version, this bug has a separate issue: #2793. Currently we are focusing on releasing a new version, and fixing this bug is not included in the release plan, but I can investigate it next week.

@abbey2017

@Teeeto Increasing the number of features resolved the exception for me on CPU.

@Teeeto

Teeeto commented Mar 15, 2021

> @Teeeto Increasing the number of features resolved the exception for me on CPU.

I don't have this issue on CPU. However, CPU is slow: one iteration every 2 minutes on my dataset (a 60-core machine). GPU is about 30x faster.

> @Teeeto Thanks for the information. For GPU version, this bug has a separated issue #2793. Currently, we are focusing on releasing a new version. And fixing this bug is not included in the new release plan. But I can investigate this next week.

Thank you! I think it would be great to prioritize this fix, since the issue is prohibitive in certain use cases.

@Teeeto

Teeeto commented Mar 29, 2021

Just built on Linux; the problem still persists. There is a circle of closed issues referencing each other, and I'm not sure which one to reopen.

@Jumabek

Jumabek commented Jun 28, 2021

The issue disappeared when I upgraded from 3.1.1 to 3.2.1:

```shell
pip uninstall lightgbm
pip install lightgbm==3.2.1
```

@sharpe5

sharpe5 commented Aug 13, 2021

Summary of this entire thread:

  • The error occurs sporadically when category 0 is the most common feature value.
  • Install v3.2.1 to fix it (tested on Windows 10 x64, CPU only).

The solution from @Jumabek might work on Linux, but on Windows, pip does not have v3.2.1 yet.

If using Anaconda, install LightGBM v3.2.1 with:

```shell
conda install -c conda-forge lightgbm
```
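The "category 0 is the most common feature value" condition from the summary above can be checked quickly before deciding whether to upgrade; a minimal sketch (plain Python, illustrative data):

```python
from collections import Counter

def most_common_value(col):
    """Return the most frequent value in a feature column."""
    return Counter(col).most_common(1)[0][0]

col = [0, 0, 0, 1, 2, 0, 1]
print(most_common_value(col) == 0)  # True
```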

@StrikerRUS
Collaborator

@sharpe5

> but under Windows, pip does not have v3.2.1 yet.

The v3.2.1 wheel for Windows has been on PyPI since April; please double-check:
https://pypi.org/project/lightgbm/#files


@dpalbrecht

dpalbrecht commented Dec 19, 2022

I've been consistently hitting this issue in Kaggle and Colab notebooks (package v3.3.2) while using the HDFSequence example on a GPU, when I increase the number of leaves. Is there a generally accepted fix for this issue?

@ZaydH

ZaydH commented Jan 1, 2023

I also observe this issue with LightGBM version 3.3.2 and a GPU.

My setup for what it's worth:

  • OS: Pop!_OS 22.04
  • Python: 3.7.13
  • CUDA: 12.0
  • Nvidia Driver: 525.60.11
  • GPU: Nvidia 3090

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 19, 2023