Large LightGBM causes javac error "Code too Large" #103
Hey @chris-smith-zocdoc! This is rather weird, because we're aware of this Java limitation, and it's why we came up with subroutines in the first place. I remember testing this implementation with as many as 500-1000 estimators with XGBoost, LightGBM, and Random Forest without any problem. Can you please provide some steps to reproduce this issue locally, ideally using a public dataset or one available in the scikit-learn package?
Ah, I see. So the individual trees are pretty large. OK, we may want to consider wrapping individual estimators in their own subroutines based on some threshold values for …
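The threshold idea mentioned above could be sketched roughly like this: wrap an individual tree's expression in its own subroutine only when it is "large". This is a hypothetical illustration, not m2cgen's actual implementation; the function name `assemble_estimators`, the character-count size metric, and the `size_threshold` value are all assumptions made for the sketch.

```python
# Hypothetical sketch: wrap only "large" tree expressions in their own
# subroutines. The size metric (expression string length) and threshold
# are illustrative assumptions, not m2cgen internals.

def assemble_estimators(tree_exprs, size_threshold=5000):
    """Split tree expressions into inline parts and standalone subroutines.

    Returns (inline_parts, subroutines), where inline_parts are the
    expressions (or calls to generated subroutines) to be summed in the
    main scoring method, and subroutines are Java-like method definitions.
    """
    inline_parts = []
    subroutines = []
    for idx, expr in enumerate(tree_exprs):
        if len(expr) > size_threshold:
            # Large tree: emit it as its own method and call it instead.
            name = f"tree_subroutine{idx}"
            subroutines.append(
                f"public static double {name}(double[] input) {{\n"
                f"    return {expr};\n"
                f"}}"
            )
            inline_parts.append(f"{name}(input)")
        else:
            # Small tree: keep its expression inline.
            inline_parts.append(expr)
    return inline_parts, subroutines
```

With a size-based rule like this, a model made of many shallow trees stays in one method, while any single oversized tree gets factored out before it can push a method past the JVM's 64KB bytecode limit.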
Hi, the same situation happens with large sklearn decision trees. Do you have a solution for that (or is one planned if it doesn't exist)?
Hey @MilanCugur! Please take a look at the following discussion - #297. The workaround mentioned there may potentially help with your issue.
When generating code for a large number of trees, the generated code exceeds the 64KB method size limit in Java.
From Stack Overflow
One solution is to add subfunctions (https://github.com/BayesWitnesses/m2cgen/blob/master/m2cgen/assemblers/boosting.py#L43-L48) instead of having the body of every tree inside `subroutine0`. The amount of code that will fit inside each function depends on its depth and width, so we might require some heuristic or tunable parameter. In my case, I ended up with 10 trees per subfunction.

I'm not sure if there are similar limits in other languages.
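The fixed-count workaround described above ("10 trees per subfunction") could be sketched as a simple chunking pass over the generated tree expressions. This is an illustrative sketch, not m2cgen's code: the function name `chunk_trees_into_subroutines`, the `tree_bodies` input, and the emitted Java method names are all assumptions.

```python
# Hypothetical sketch: group generated tree expressions into subroutines of
# a fixed size so that no single Java method grows past the JVM's 64KB
# bytecode limit. Names and structure are illustrative, not m2cgen's API.

def chunk_trees_into_subroutines(tree_bodies, trees_per_subroutine=10):
    """Return (subroutine_defs, dispatcher) as Java-like source strings."""
    subroutines = []
    for i in range(0, len(tree_bodies), trees_per_subroutine):
        chunk = tree_bodies[i:i + trees_per_subroutine]
        name = f"subroutine{i // trees_per_subroutine}"
        # Each subroutine sums the scores of its own small group of trees.
        body = "\n        + ".join(chunk)
        subroutines.append(
            f"public static double {name}(double[] input) {{\n"
            f"    return {body};\n"
            f"}}"
        )
    # The top-level scoring method just sums the subroutine results.
    calls = " + ".join(f"subroutine{j}(input)" for j in range(len(subroutines)))
    dispatcher = (
        "public static double score(double[] input) {\n"
        f"    return {calls};\n"
        "}"
    )
    return subroutines, dispatcher

# 25 trees at 10 per subroutine -> 3 subroutines plus one dispatcher.
subs, dispatcher = chunk_trees_into_subroutines(
    [f"tree{k}(input)" for k in range(25)], trees_per_subroutine=10
)
```

Making `trees_per_subroutine` a tunable parameter (or replacing it with a size heuristic) would let deep, wide trees get smaller groups while shallow ensembles keep the current behavior.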