
Load back parameters when loading Booster from lgb.cv() saved to text file? #4883

Closed
acmilannesta opened this issue Dec 12, 2021 · 7 comments

@acmilannesta

Could I also specify keep_training_booster = True in lgbm.cv function?
When I save returned cv booster from each fold and reload later, the booster.param will return an empty dictionary.

Originally posted by @acmilannesta in #1364 (comment)

@jameslamb
Collaborator

Thanks for opening a new issue.

Can you please add more details for what you are trying to do? A reproducible example with an explanation of what you expected would be very helpful.

@shiyu1994
Collaborator

@acmilannesta Thanks for using LightGBM. Could you tell us how you saved and reloaded the boosters? That would be helpful for us.

@acmilannesta
Author

> @acmilannesta Thanks for using LightGBM. Could you tell us how you saved and reloaded the boosters? That would be helpful for us.

I basically call lgbm.cv and pass return_cvbooster=True to get a list of out-of-fold CV boosters. I saved each model to a .txt file using booster.save_model. When I later load a model with model = lgbm.Booster(model_file="xxx") and call model.params, I get an empty dictionary.

I learned that passing the keep_training_booster=True argument keeps the training parameters, but I didn't find that argument in lgbm.cv.

@jameslamb
Collaborator

Ok, thanks @acmilannesta, I think I understand (although a reproducible example would eliminate the need to guess).

CVBooster.save_model() does not actually work. In the next release, that method will raise a NotImplementedError and the CVBooster object will be pickleable. See the discussion in #3556 (comment).

I think the code below captures the behavior you're talking about.

import pandas as pd
import numpy as np
import lightgbm as lgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=4_000, n_informative=10)

params = {
    'objective': 'regression_l2',
    'learning_rate': 0.123,
    'min_data_in_leaf': 3,
    'verbose': -1
}
dtrain = lgb.Dataset(data=X, label=y)

cv_results = lgb.cv(
    params=params,
    train_set=dtrain,
    return_cvbooster=True,
    nfold=2,
    stratified=False,
    shuffle=False,
)
cv_booster = cv_results["cvbooster"]

# check parameters of each Booster
for bst in cv_booster.boosters:
    print(bst.params)

# {'objective': 'regression_l2', 'learning_rate': 0.123, 'min_data_in_leaf': 3, 'verbose': -1, 'num_iterations': 100}
# {'objective': 'regression_l2', 'learning_rate': 0.123, 'min_data_in_leaf': 3, 'verbose': -1, 'num_iterations': 100}

# save one of the boosters and reload it
cv_booster.boosters[0].save_model("first-booster.txt")
loaded_booster = lgb.Booster(
    model_file="first-booster.txt"
)
print(loaded_booster.params)
# {}

However, I don't believe the statement "by passing keep_training_booster = True, it will keep the training parameters" is correct.

# train a single model and save it to file
bst = lgb.train(
    params=params,
    train_set=dtrain,
    keep_training_booster=True
)
bst.save_model("lgb-train.txt")

# re-load that model and check params
loaded_booster = lgb.Booster(model_file="lgb-train.txt")
loaded_booster.params
# {}

#2613 documents the feature request "populate Booster.params when loading a model from a text file", but it isn't implemented yet.


Given all that, please subscribe to #2613 and #3556 for notifications about improvements to saving and loading Booster and CVBooster objects.

Until those updates are made, I think you could achieve the behavior you want with one of the following approaches:

  1. Use pickle / joblib / cloudpickle to store Boosters instead of saving them as text files.
    •  import joblib
       joblib.dump(bst, "lgb-train.pkl")
       loaded_booster = joblib.load("lgb-train.pkl")
       loaded_booster.params
       # {'objective': 'regression_l2', 'learning_rate': 0.123, 'min_data_in_leaf': 3, 'verbose': -1, 'num_iterations': 100}
  2. Store params in a JSON file alongside your model, and re-set them when loading from a text file.
    •  import json
      
       # save model and params to text files
       with open("lgb-train-params.json", "w") as f:
           f.write(json.dumps(params))
           bst.save_model("lgb-train.txt")
      
       # load model and params from text files
       with open("lgb-train-params.json", "r") as f:
           loaded_params = json.loads(f.read())
           loaded_booster = lgb.Booster(
               model_file="lgb-train.txt",
               params=loaded_params
           )
  3. Use your own code to recover parameters from the model .txt file and set them when creating the Booster.
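For that third option, here is a minimal sketch of what such a parser could look like. It assumes the model .txt file contains the `parameters:` ... `end of parameters` block, with one `[name: value]` entry per line, that recent LightGBM versions write when saving a model; the helper name is ours, not part of the library, and note the recovered values come back as strings.

```python
import re

def parse_params_from_model_text(model_text):
    """Recover the parameters section from a LightGBM model .txt dump.

    Assumes the dump contains a block of the form:
        parameters:
        [objective: regression_l2]
        ...
        end of parameters
    This is a hypothetical helper, not part of LightGBM's API.
    """
    params = {}
    in_params = False
    for line in model_text.splitlines():
        line = line.strip()
        if line == "parameters:":
            in_params = True
            continue
        if line == "end of parameters":
            break
        if in_params:
            match = re.match(r"\[(\w+): (.*)\]", line)
            if match:
                params[match.group(1)] = match.group(2)
    return params
```

You could then pass the recovered dict when reconstructing the Booster, e.g. `lgb.Booster(model_file="first-booster.txt", params=parse_params_from_model_text(open("first-booster.txt").read()))`, casting numeric values back from strings as needed.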

@jameslamb jameslamb changed the title Could I also specify keep_training_booster = True in lgbm.cv function? Load back parameters when loading Booster from lgb.cv() saved to text file Jan 1, 2022
@jameslamb jameslamb changed the title Load back parameters when loading Booster from lgb.cv() saved to text file Load back parameters when loading Booster from lgb.cv() saved to text file? Jan 1, 2022
@acmilannesta
Author

acmilannesta commented Jan 3, 2022

Thank you for the detailed explanations! I think the joblib / pickle solution will work for me.

@jameslamb
Collaborator

Great! Sorry for the inconvenience. Hopefully #2613 will be resolved in one of the next few releases.

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed.
To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues
including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 16, 2023