Read and create the booster model from a JSON file #5370

KhaitovR · 2022-07-13T09:34:16Z

Hi guys!

I have looked through all the tasks related to saving models in JSON, but I could not figure out whether a booster object can be created from a JSON file.
I would really appreciate it if you could tell me how to create a booster from JSON.

I think the JSON model will be smaller than the STRING model.
Perhaps the JSON dump is missing a lot of necessary information, which might be why it weighs so much less.
I'll be glad if you correct me.

Reproducible example

import lightgbm as lgb
from sklearn.datasets import make_regression
from sys import getsizeof

X, y = make_regression(n_samples=10000, n_features=50, n_informative=20, noise=0.3, random_state=7)

model = lgb.train(
    params={
        'objective':'regression',
        'max_depth':8,
        'num_leaves':int(2**8*0.6),
        'verbose':-1
    },
    train_set=lgb.Dataset(X, y, free_raw_data=True)
)

print('lgb version:', lgb.__version__)
print('train shape (rows, columns)', (X.shape[0], X.shape[1]))
print('JSON-size mb:', format(getsizeof(model.dump_model(importance_type='gain'))/(1024*1024), '.4f'))
print('STRING-size mb:', format(getsizeof(model.model_to_string())/(1024*1024), '.4f'))

# lgb version: 3.3.2
# train shape (rows, columns) (10000, 50)
# JSON-size mb: 0.0006
# STRING-size mb: 0.7999
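
A note on the measurement above: `Booster.dump_model()` returns a Python dict rather than a JSON string, and `sys.getsizeof` reports only the shallow size of the object it is given. For a dict that is the container itself, not the nested trees, so the small JSON figure may reflect the top-level dict alone. A minimal sketch of the difference, using only the standard library and a hypothetical nested dict (`fake_dump`, invented here as a stand-in for a real dump):

```python
import json
from sys import getsizeof

# Hypothetical stand-in for a model dump: a few top-level keys,
# with the bulk of the data nested under "tree_info".
fake_dump = {
    "name": "tree",
    "version": "v3",
    "tree_info": [
        {"tree_index": i, "leaf_values": [0.1 * j for j in range(100)]}
        for i in range(100)
    ],
}

# Shallow size: counts the top-level dict container only.
shallow_bytes = getsizeof(fake_dump)

# Serialized size: counts every nested value once written out as JSON text.
serialized_bytes = len(json.dumps(fake_dump).encode("utf-8"))

print("shallow dict size (bytes):", shallow_bytes)
print("serialized JSON size (bytes):", serialized_bytes)
```

For the real model, comparing `len(json.dumps(model.dump_model()))` with `len(model.model_to_string())` should give a like-for-like size comparison.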

Additional Comments

Related task: #2604

dishkakrauch · 2022-07-13T11:26:31Z

@StrikerRUS could you help us out with this issue, please?

jmoralez · 2022-07-14T18:55:23Z

Hi. I see that the dump model functions were added in #97 as a way to analyze the underlying model. They're currently used in the trees to dataframe functions:

LightGBM/python-package/lightgbm/basic.py, line 2820 in 44fe591:

def trees_to_dataframe(self) -> pd_DataFrame:

LightGBM/R-package/R/lgb.model.dt.tree.R, line 51 in 44fe591:

lgb.model.dt.tree <- function(model, num_iteration = NULL) {

I don't think that format was meant to be an alternative to the original serialization, but there's an ongoing discussion in #4887 about using JSON as the default (and, in the future, the only) serialization format. To answer your question: I don't think it's currently possible to restore a booster from JSON.

KhaitovR · 2022-07-15T07:37:39Z

I appreciate your quick response!
I hope that this feature will be added as soon as possible.
Should I close the task?

jmoralez · 2022-07-21T02:12:56Z

I think we can close it, but feel free to comment on #4887 if you'd like to add anything to the proposal.

github-actions · 2023-08-19T03:32:56Z

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

jameslamb added the question label Jul 13, 2022

jmoralez added the awaiting response label Jul 14, 2022

github-actions bot removed the awaiting response label Jul 15, 2022

jmoralez closed this as completed Jul 21, 2022

github-actions bot locked as resolved and limited conversation to collaborators Aug 19, 2023
