
Load back parameters when loading Booster from lgb.cv() saved to text file? #4883

Closed
acmilannesta opened this issue Dec 12, 2021 · 7 comments

@acmilannesta

Could I also specify keep_training_booster = True in lgbm.cv function?
When I save returned cv booster from each fold and reload later, the booster.param will return an empty dictionary.

Originally posted by @acmilannesta in #1364 (comment)

@jameslamb
Collaborator

Thanks for opening a new issue.

Can you please add more details for what you are trying to do? A reproducible example with an explanation of what you expected would be very helpful.

@shiyu1994
Collaborator

@acmilannesta Thanks for using LightGBM. Could you tell us how you saved and reloaded the boosters? That would be helpful for us.

@acmilannesta
Author

> @acmilannesta Thanks for using LightGBM. Could you tell us how you saved and reloaded the boosters? That would be helpful for us.

I basically call lgbm.cv and pass return_cvbooster=True to get a list of out-of-fold CV boosters. I saved each model to a .txt file using booster.save_model. When I later load a model with model = lgbm.Booster(model_file="xxx") and call model.params, I get an empty dictionary.

I learned that passing the keep_training_booster=True argument keeps the training parameters, but I didn't find that argument in lgbm.cv.

@jameslamb
Collaborator

Ok, thanks @acmilannesta, I think I understand (although a reproducible example would eliminate the need to guess).

CVBooster.save_model() does not actually work. In the next release, that method will raise a NotImplementedError and the CVBooster object will be pickleable. See the discussion in #3556 (comment).

I think the code below captures the behavior you're talking about.

import pandas as pd
import numpy as np
import lightgbm as lgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=4_000, n_informative=10)

params = {
    'objective': 'regression_l2',
    'learning_rate': 0.123,
    'min_data_in_leaf': 3,
    'verbose': -1
}
dtrain = lgb.Dataset(data=X, label=y)

cv_results = lgb.cv(
    params=params,
    train_set=dtrain,
    return_cvbooster=True,
    nfold=2,
    stratified=False,
    shuffle=False,
)
cv_booster = cv_results["cvbooster"]

# check parameters of each Booster
for bst in cv_booster.boosters:
    print(bst.params)

# {'objective': 'regression_l2', 'learning_rate': 0.123, 'min_data_in_leaf': 3, 'verbose': -1, 'num_iterations': 100}
# {'objective': 'regression_l2', 'learning_rate': 0.123, 'min_data_in_leaf': 3, 'verbose': -1, 'num_iterations': 100}

# save one of the boosters and reload it
cv_booster.boosters[0].save_model("first-booster.txt")
loaded_booster = lgb.Booster(
    model_file="first-booster.txt"
)
print(loaded_booster.params)
# {}

However, I don't believe the statement "by passing keep_training_booster = True, it will keep the training parameters" is correct.

# train a single model and save it to file
bst = lgb.train(
    params=params,
    train_set=dtrain,
    keep_training_booster=True
)
bst.save_model("lgb-train.txt")

# re-load that model and check params
loaded_booster = lgb.Booster(model_file="lgb-train.txt")
loaded_booster.params
# {}

#2613 documents the feature request "populate Booster.params when loading a model from a text file", but it isn't implemented yet.


Given all that, please subscribe to #2613 and #3556 for notifications about improvements to saving and loading Booster and CVBooster objects.

Until those updates are made, I think you could achieve the behavior you want with one of the following approaches:

  1. Use pickle / joblib / cloudpickle to store Boosters instead of saving them as text files.
    •  import joblib
       joblib.dump(bst, "lgb-train.pkl")
       loaded_booster = joblib.load("lgb-train.pkl")
       loaded_booster.params
       # {'objective': 'regression_l2', 'learning_rate': 0.123, 'min_data_in_leaf': 3, 'verbose': -1, 'num_iterations': 100}
  2. Store params in a JSON file alongside your model, and re-set them when loading from a text file.
    •  import json
      
       # save model and params to text files
       with open("lgb-train-params.json", "w") as f:
           f.write(json.dumps(params))
           bst.save_model("lgb-train.txt")
      
       # load model and params from text files
       with open("lgb-train-params.json", "r") as f:
           loaded_params = json.loads(f.read())
           loaded_booster = lgb.Booster(
               model_file="lgb-train.txt",
               params=loaded_params
           )
  3. Use your own code to recover parameters from the model .txt file and set them when creating the Booster.
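For that third option, here is a minimal sketch of what such a parser could look like. It assumes the model .txt file contains the `parameters:` ... `end of parameters` block, with one `[name: value]` entry per line, that recent LightGBM versions write when saving a model; the helper name is ours, not part of the library, and note the recovered values come back as strings.

```python
import re

def parse_params_from_model_text(model_text):
    """Recover the parameters section from a LightGBM model .txt dump.

    Assumes the dump contains a block of the form:
        parameters:
        [objective: regression_l2]
        ...
        end of parameters
    This is a hypothetical helper, not part of LightGBM's API.
    """
    params = {}
    in_params = False
    for line in model_text.splitlines():
        line = line.strip()
        if line == "parameters:":
            in_params = True
            continue
        if line == "end of parameters":
            break
        if in_params:
            match = re.match(r"\[(\w+): (.*)\]", line)
            if match:
                params[match.group(1)] = match.group(2)
    return params
```

You could then pass the recovered dict when reconstructing the Booster, e.g. `lgb.Booster(model_file="first-booster.txt", params=parse_params_from_model_text(open("first-booster.txt").read()))`, casting numeric values back from strings as needed.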

@jameslamb jameslamb changed the title Could I also specify keep_training_booster = True in lgbm.cv function? Load back parameters when loading Booster from lgb.cv() saved to text file Jan 1, 2022
@jameslamb jameslamb changed the title Load back parameters when loading Booster from lgb.cv() saved to text file Load back parameters when loading Booster from lgb.cv() saved to text file? Jan 1, 2022
@acmilannesta
Author

acmilannesta commented Jan 3, 2022

Thank you for the detailed explanations! I think the joblib / pickle solution will work for me.

@jameslamb
Collaborator

Great! Sorry for the inconvenience. Hopefully #2613 will be resolved in one of the next few releases.

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed.
To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues
including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 16, 2023