Read and create the booster model from a JSON file #5370

Closed
KhaitovR opened this issue Jul 13, 2022 · 5 comments

@KhaitovR

Hi guys!

I have looked through all the issues related to saving models in JSON, but I could not figure out whether it is possible to create a Booster object from a JSON file.
I would really appreciate it if you could tell me how to create a Booster from JSON.

It looks like the JSON model is smaller than the string model.
Perhaps a lot of necessary information is missing from the dumped JSON model, and that is why the JSON is so much smaller.
I'd be glad if you could correct me.

Reproducible example

import lightgbm as lgb
from sklearn.datasets import make_regression
from sys import getsizeof

X, y = make_regression(n_samples=10000, n_features=50, n_informative=20, noise=0.3, random_state=7)

model = lgb.train(
    params={
        'objective':'regression',
        'max_depth':8,
        'num_leaves':int(2**8*0.6),
        'verbose':-1
    },
    train_set=lgb.Dataset(X, y, free_raw_data=True)
)

print('lgb version:', lgb.__version__)
print('train shape (rows, columns)', (X.shape[0], X.shape[1]))
print('JSON-size mb:', format(getsizeof(model.dump_model(importance_type='gain'))/(1024*1024), '.4f'))
print('STRING-size mb:', format(getsizeof(model.model_to_string())/(1024*1024), '.4f'))

# lgb version: 3.3.2
# train shape (rows, columns) (10000, 50)
# JSON-size mb: 0.0006
# STRING-size mb: 0.7999
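sys.getsizeof() reports only the memory of the object itself, not of the objects it refers to (as stated in the Python documentation), so for the nested dict returned by dump_model() it measures just the outer dict, while for the str returned by model_to_string() it measures the whole text. A fairer comparison is the size of the serialized JSON text. A minimal standard-library sketch; the `nested` dict is a hypothetical stand-in for dump_model() output, not real LightGBM data:

```python
import json
import sys


def shallow_size(obj) -> int:
    """Bytes reported by sys.getsizeof: the container only, not what it references."""
    return sys.getsizeof(obj)


def serialized_json_size(obj) -> int:
    """Bytes of the UTF-8 encoded JSON text, i.e. roughly what a file would hold."""
    return len(json.dumps(obj).encode("utf-8"))


# Hypothetical stand-in for the nested dict returned by Booster.dump_model():
# one outer key holding many small nested dicts.
nested = {
    "tree_info": [
        {"tree_index": i, "tree_structure": {"leaf_value": 0.1 * i}}
        for i in range(1000)
    ]
}

print("getsizeof (outer dict only):", shallow_size(nested))
print("serialized JSON size:", serialized_json_size(nested))
```

For the model in the reproducible example, the analogous comparison would be serialized_json_size(model.dump_model()) against len(model.model_to_string().encode("utf-8")), which measures both formats as text.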

Additional Comments

Related issue: #2604

@dishkakrauch

@StrikerRUS, could you help us out with this issue, please?

@jmoralez
Collaborator

Hi. I see that the dump model functions were added in #97 as a way to analyze the underlying model. They're currently used in the trees-to-dataframe functions:

- Python package: Booster.trees_to_dataframe(self) -> pd_DataFrame
- R package: lgb.model.dt.tree(model, num_iteration = NULL)
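To make concrete what the dump contains, here is a minimal sketch (not LightGBM's own prediction code) that walks a dict shaped like the output of dump_model() and sums the leaf values of all trees to get a raw score. It assumes only numerical splits with decision_type "<=" and ignores the missing-value rules encoded by missing_type and default_left; `toy_model` is hand-written for illustration, not real LightGBM output, and `predict_row` is a hypothetical helper.

```python
import json


def predict_row(dumped_model: dict, row) -> float:
    """Raw score for one row: sum of the leaf reached in every tree.

    Assumption: only numerical '<=' splits; missing_type/default_left are ignored.
    """
    total = 0.0
    for tree in dumped_model["tree_info"]:
        node = tree["tree_structure"]
        # Internal nodes hold a split; leaves hold a "leaf_value".
        while "leaf_value" not in node:
            if node.get("decision_type", "<=") != "<=":
                raise NotImplementedError("only numerical '<=' splits are handled")
            value = row[node["split_feature"]]
            node = node["left_child"] if value <= node["threshold"] else node["right_child"]
        total += node["leaf_value"]
    return total


# Hypothetical two-tree model in the shape of dump_model() output.
toy_model = json.loads("""
{
  "tree_info": [
    {"tree_index": 0,
     "tree_structure": {
       "split_feature": 0, "threshold": 0.5, "decision_type": "<=",
       "left_child": {"leaf_value": -1.0},
       "right_child": {"split_feature": 1, "threshold": 2.0, "decision_type": "<=",
                       "left_child": {"leaf_value": 0.5},
                       "right_child": {"leaf_value": 1.5}}}},
    {"tree_index": 1, "tree_structure": {"leaf_value": 0.25}}
  ]
}
""")

print(predict_row(toy_model, [1.0, 3.0]))
```

With a real model trained on purely numerical data without missing values, such as the one in the issue, predict_row(model.dump_model(), X[0]) would be expected to agree closely with model.predict(X[:1], raw_score=True)[0]; that expectation is an assumption worth checking. Evaluating the trees this way is still not the same as restoring a Booster object.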

I don't think that format was meant to be an alternative to the original serialization, but there's an ongoing discussion in #4887 about using JSON as the default (and, in the future, only) serialization format. To answer your question, though: I don't think it's currently possible to restore a booster from JSON.

@KhaitovR
Author

I appreciate your quick response!
I hope this feature will be added soon.
Should I close the issue?

@jmoralez
Collaborator

I think we can close it, but feel free to comment on #4887 if you'd like to add anything to the proposal.

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 19, 2023
4 participants