feature request: Add parameter to control maximum group size for Lambdarank #5053
Comments
Thanks for using LightGBM and for this report. This error message comes from the following place in the code: LightGBM/src/metric/dcg_calculator.cpp Lines 134 to 144 in 9a4e706
I found that by running git grep 'exceeds upper limit'. Note that the threshold is hard-coded here: LightGBM/src/metric/dcg_calculator.cpp Line 17 in 9a4e706
Using git blame, I can see that the limit has been set to 10,000 since the very first commit of LightGBM 6 years ago. I think this might be hard-coded instead of being determined by
@shiyu1994 @guolinke do you think LightGBM should allow increasing this limit via a new parameter? I'm not that familiar with Lambdarank, so I'm not sure whether, for example, the use of such large query groups in LightGBM should be discouraged.
@jameslamb , yes it is hardcoded.
@antaradas94 Thanks for using LightGBM. The limit is hard-coded here: LightGBM/src/metric/dcg_calculator.cpp Line 17 in 8e721c5
So you may try to enlarge the number in the source code to meet your need. And then recompile the python package from the source code. Guidelines for compiling python package:
The maximum number of documents per query is limited mainly because the complexity of computation of gradients per query in Lambdarank is
Sorry for the late response. If you have any further questions, please feel free to post here.
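To give a feel for why very large query groups get expensive, here is a rough sketch that counts the candidate document pairs in a single query. It assumes a pairwise objective that may compare every document in a query with every other one; pair_count is a hypothetical helper for illustration and is not LightGBM code.

```python
def pair_count(n_docs: int) -> int:
    """Number of unordered document pairs in one query of n_docs documents."""
    if n_docs < 0:
        raise ValueError("n_docs must be non-negative")
    return n_docs * (n_docs - 1) // 2


# Pair counts for a small group, a group at the 10,000 limit, and a larger one.
for n in (100, 10_000, 100_000):
    print(f"{n:>7} docs -> {pair_count(n):>13,} pairs")
```

Going from 10,000 to 100,000 documents in one query multiplies the pair count by roughly 100, which is the kind of growth a cap on group size guards against.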
@jameslamb I think we can change
I think we can turn that constant into a parameter so that users are not forced to recompile LightGBM. Although, I guess it's quite rare that users need to increase the default value.
Ok, thanks @StrikerRUS and @shiyu1994 . I've changed the title of this issue and added it to #2302. Per this repo's policy on feature requests, I'm going to close this issue for now. @antaradas94 if you are interested in contributing this feature, please comment here and we can answer any questions you have. If not, anyone else reading this is encouraged to comment here if you're interested in contributing this feature. Otherwise, you can change the source code and recompile LightGBM yourself.
@jameslamb, hi.
I'm not sure about the relationship between those two configuration values, sorry. @guolinke can you answer this question? I think it is like
I'll re-open this since it's being discussed.
@octatour no, all documents are used; truncation_level only affects the loss calculation. It ensures that at least one document in each pair (in the pairwise loss accumulation) is above the truncation_level.
@guolinke okay, got it. Thank you for the explanation. It wasn't clear to me from the documentation and the paper that truncation_level controls the number of pairs used for the loss.
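A small sketch of the pair selection described above, for anyone else who finds it unclear. It assumes "above the truncation_level" means "ranked within the top truncation_level positions by current score", and it ignores any further filtering the real objective may apply (for example, skipping pairs with equal labels). pairs_with_truncation is a hypothetical helper, not a LightGBM function.

```python
def pairs_with_truncation(n_docs: int, truncation_level: int) -> int:
    """Count rank-position pairs (i, j), i < j, where position i is in the top truncation_level."""
    top = min(truncation_level, n_docs)
    # A document at 0-based position i can pair with the n_docs - 1 - i documents ranked below it.
    return sum(n_docs - 1 - i for i in range(top))


# All 5 documents still take part, but only pairs touching the top 2 positions are counted.
print(pairs_with_truncation(5, 2))
# A truncation level at or above the group size keeps every pair.
print(pairs_with_truncation(5, 10))
```

Under these assumptions every document can still appear in some pair, so truncation_level limits the pairs that contribute to the loss rather than the documents that are used.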
This comment was marked as off-topic.
Sorry, this was locked accidentally. Just unlocked it. We'd still love help with this feature!
@NigamSomya thanks for using LightGBM. Your question seems to be generally "how do I build the Python package from source" and not specific to Lambdarank, so I've created #6437 and hidden your comment here. Let's please discuss over there.
There is not. At the moment, you'll have to change it in the code and recompile LightGBM.
Hi, would love to contribute if this is still open!
Sure! We'd welcome the contribution. You can ask any questions here, and tag @shiyu1994 and @metpavel
Description
I have around 12 groups in a dataset of over ~1 million rows, and several groups easily have over 10,000 rows.
It would be really helpful if the per-query limit were increased, or if we could set it to the number of rows we wish.
I tried to search for a way to increase the upper limit, but haven't come across any. If one exists, please do let me know.
Thanks :)
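For anyone hitting the same error, a quick way to see which groups are affected before training is to count rows per query. This is only a sketch: MAX_DOCS_PER_QUERY mirrors the 10,000 limit discussed in this thread, oversized_groups is a hypothetical helper (not part of LightGBM), and the strict "greater than" comparison is an assumption based on the "exceeds upper limit" wording of the error.

```python
from collections import Counter

# Assumed to match the hard-coded limit discussed in this thread.
MAX_DOCS_PER_QUERY = 10_000


def oversized_groups(query_ids, limit=MAX_DOCS_PER_QUERY):
    """Return {query_id: row_count} for every query group with more than `limit` rows."""
    sizes = Counter(query_ids)
    return {qid: n for qid, n in sizes.items() if n > limit}


# Toy data: one group over the limit, one small group, one exactly at the limit.
query_ids = ["q1"] * 12_000 + ["q2"] * 500 + ["q3"] * 10_000
print(oversized_groups(query_ids))
```

Groups reported here would need to be split or subsampled, or the limit raised in the source and LightGBM recompiled, as described in the comments.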