Support init score #32
Just updated the master branch, here is an example. Would you mind creating a concrete example showing the advantage of using `init_scores`? Thanks
Here is a draft, but it does not produce the desired effect (i.e., no visible effect of `init_scores` on convergence):

```python
import numpy as np
from sklearn import datasets, metrics, model_selection
from pylightgbm.models import GBMRegressor

# Parameters
seed = 1337
path_to_exec = "~/Documents/apps/LightGBM/lightgbm"
offset = 1e4

np.random.seed(seed)  # for reproducibility

X, y = datasets.make_regression(n_samples=1000, random_state=seed)

# shifting the distribution by a huge margin to see if `init_scores` helps convergence
y += offset

x_train, x_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.2, random_state=seed)

params = {'exec_path': path_to_exec,
          'num_iterations': 10, 'learning_rate': 0.1,
          'num_leaves': 10, 'is_training_metric': True,
          'min_data_in_leaf': 10, 'is_unbalance': False,
          'early_stopping_round': 10, 'verbose': False}

clf = GBMRegressor(**params)

# Baseline: fit without init_scores
clf.fit(x_train, y_train, test_data=[(x_test, y_test)])
y_pred = clf.predict(x_test)
print("MSE: {}, best round: {}".format(
    metrics.mean_squared_error(y_test, y_pred), clf.best_round))

# Fit with init_scores set to the known offset
clf.fit(x_train, y_train,
        test_data=[(x_test, y_test)],
        init_scores=offset * np.ones(len(x_train)))
y_pred = clf.predict(x_test)
print("MSE: {}, best round: {}".format(
    metrics.mean_squared_error(y_test, y_pred), clf.best_round))
```

Any thoughts?
I am solving a regression task where np.average(target_learn) = 15, and I have a [500k x 400] matrix. The best setting for me so far is 7500 iterations with learning_rate = 0.002. In such a case (a very small learning_rate) this option can save about 1500 iterations (a 20% boost) in the xgboost version. I will test this and let you know. Thanks, I appreciate your work!
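For reference, a minimal sketch of what I assume "in xgboost version" refers to: xgboost's `DMatrix.set_base_margin`, which seeds boosting from supplied scores instead of zero, analogous to `init_scores` here. The data, shapes, and parameters below are stand-ins for illustration:

```python
import numpy as np
import xgboost as xgb

# Stand-in data: the real case is a [500k x 400] matrix with mean(target) ~= 15
rng = np.random.RandomState(0)
X = rng.rand(1000, 400)
y = 15 + rng.randn(1000)

dtrain = xgb.DMatrix(X, label=y)
# Seed boosting from the target mean instead of zero
dtrain.set_base_margin(np.full(len(y), y.mean()))

params = {'objective': 'reg:squarederror', 'eta': 0.002}
bst = xgb.train(params, dtrain, num_boost_round=100)  # the commenter uses 7500

# The trees learn corrections on top of the base margin, so the same margin
# must be supplied at prediction time as well
dtest = xgb.DMatrix(X)
dtest.set_base_margin(np.full(X.shape[0], y.mean()))
y_pred = bst.predict(dtest)
```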
Hmm, I believe there is some bug (not sure whether it is here or in LightGBM). In your example I need to do `y_pred = clf.predict(x_test) + offset * np.ones(len(x_test))` to get the correct test prediction... Also, the log shows the wrong test error if an init_score is present.
Thinking about this: `y_pred = clf.predict(x_test) + offset * np.ones(len(x_test))` is probably correct behavior for general-purpose usage (mainly for continuously learning from another model), so only the printing of val_error needs to be fixed...
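As a sketch of the pattern described above (hypothetical helper name, assuming a constant init score as in the example):

```python
def predict_with_init(model, X, init_score):
    """Hypothetical helper: predict() returns only the learned correction
    on top of the init score, so the init score is added back."""
    return model.predict(X) + init_score

# With the example above, where init_scores was the constant `offset`:
# y_pred = predict_with_init(clf, x_test, offset)
```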
There is a possibility to give an init score (as an array) to LightGBM in the form of an additional file (train.txt.init).
Can you support this as well, as an input to the fit() function?
It is very suitable for regression tasks, where an init of all zeros is not good and the better choice is the mean of the target.
thx
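For context, a minimal sketch of the file-based mechanism mentioned above, assuming the CLI convention that the init file is named after the training data file with a `.init` suffix (as in the `train.txt.init` example) and contains one score per line; the target values are stand-ins:

```python
import numpy as np

# y_train: training targets, in the same row order as the rows of train.txt
y_train = np.array([14.2, 15.8, 15.1, 16.0])  # stand-in values for illustration

# Use the target mean as the initial score for every row,
# as suggested above for regression
init_scores = np.full(len(y_train), y_train.mean())

# One initial score per line, next to the training data file
np.savetxt("train.txt.init", init_scores)
```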