Large LightGBM causes javac error "Code too Large" #103
Hey @chris-smith-zocdoc! This is rather weird, because we're aware of this Java limitation, and it's why we came up with subroutines in the first place. I remember testing this implementation with as many as 500-1000 estimators with XGBoost, LightGBM, and Random Forest without any problem. Can you please provide some steps to reproduce this issue locally, ideally using a public dataset or one available in the scikit-learn package?
Ah, I see. So the individual trees are pretty large. OK, we may want to consider wrapping individual estimators in their own subroutines based on some threshold values for …
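The threshold idea mentioned above could be sketched roughly like this: wrap an individual tree's expression in its own subroutine only when it is "large". This is a hypothetical illustration, not m2cgen's actual implementation; the function name `assemble_estimators`, the character-count size metric, and the `size_threshold` value are all assumptions made for the sketch.

```python
# Hypothetical sketch: wrap only "large" tree expressions in their own
# subroutines. The size metric (expression string length) and threshold
# are illustrative assumptions, not m2cgen internals.

def assemble_estimators(tree_exprs, size_threshold=5000):
    """Split tree expressions into inline parts and standalone subroutines.

    Returns (inline_parts, subroutines), where inline_parts are the
    expressions (or calls to generated subroutines) to be summed in the
    main scoring method, and subroutines are Java-like method definitions.
    """
    inline_parts = []
    subroutines = []
    for idx, expr in enumerate(tree_exprs):
        if len(expr) > size_threshold:
            # Large tree: emit it as its own method and call it instead.
            name = f"tree_subroutine{idx}"
            subroutines.append(
                f"public static double {name}(double[] input) {{\n"
                f"    return {expr};\n"
                f"}}"
            )
            inline_parts.append(f"{name}(input)")
        else:
            # Small tree: keep its expression inline.
            inline_parts.append(expr)
    return inline_parts, subroutines
```

With a size-based rule like this, a model made of many shallow trees stays in one method, while any single oversized tree gets factored out before it can push a method past the JVM's 64KB bytecode limit.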
Hi, the same situation happens with large sklearn decision trees. Do you have a solution for that (or is one planned if it doesn't exist)?
Hey @MilanCugur! Please take a look at the following discussion - #297. The workaround mentioned there may potentially help with your issue.
When generating code for a large number of trees, the generated code exceeds the 64KB method size limit in Java.
From Stack Overflow
One solution is to add subfunctions (https://github.com/BayesWitnesses/m2cgen/blob/master/m2cgen/assemblers/boosting.py#L43-L48) instead of having the body of every tree inside `subroutine0`. The amount of code that will fit inside each function depends on its depth and width, so we might require some heuristic or tunable parameter. In my case, I ended up with 10 trees per subfunction.

I'm not sure if there are similar limits in other languages.
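The fixed-count workaround described above ("10 trees per subfunction") could be sketched as a simple chunking pass over the generated tree expressions. This is an illustrative sketch, not m2cgen's code: the function name `chunk_trees_into_subroutines`, the `tree_bodies` input, and the emitted Java method names are all assumptions.

```python
# Hypothetical sketch: group generated tree expressions into subroutines of
# a fixed size so that no single Java method grows past the JVM's 64KB
# bytecode limit. Names and structure are illustrative, not m2cgen's API.

def chunk_trees_into_subroutines(tree_bodies, trees_per_subroutine=10):
    """Return (subroutine_defs, dispatcher) as Java-like source strings."""
    subroutines = []
    for i in range(0, len(tree_bodies), trees_per_subroutine):
        chunk = tree_bodies[i:i + trees_per_subroutine]
        name = f"subroutine{i // trees_per_subroutine}"
        # Each subroutine sums the scores of its own small group of trees.
        body = "\n        + ".join(chunk)
        subroutines.append(
            f"public static double {name}(double[] input) {{\n"
            f"    return {body};\n"
            f"}}"
        )
    # The top-level scoring method just sums the subroutine results.
    calls = " + ".join(f"subroutine{j}(input)" for j in range(len(subroutines)))
    dispatcher = (
        "public static double score(double[] input) {\n"
        f"    return {calls};\n"
        "}"
    )
    return subroutines, dispatcher

# 25 trees at 10 per subroutine -> 3 subroutines plus one dispatcher.
subs, dispatcher = chunk_trees_into_subroutines(
    [f"tree{k}(input)" for k in range(25)], trees_per_subroutine=10
)
```

Making `trees_per_subroutine` a tunable parameter (or replacing it with a size heuristic) would let deep, wide trees get smaller groups while shallow ensembles keep the current behavior.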