
Support init score #32

Open
gugatr0n1c opened this issue Nov 13, 2016 · 5 comments

Comments

@gugatr0n1c

LightGBM makes it possible to give an init score (as an array) in the form of an additional file (train.txt.init).

Could you support this as well, as an input to the fit() function?

It is very suitable for regression tasks, where initializing with zeros is not good and the mean of the target is a better choice.
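
For illustration, a minimal sketch of what I have in mind (the init_scores parameter name below is just my assumption, not the current API):

import numpy as np

# example regression target centered around 15 (placeholder data)
y_train = 15.0 + np.random.randn(1000)

# init score: start boosting from the target mean instead of zero
init_score = np.full(len(y_train), y_train.mean())

# hypothetical call, assuming fit() accepted such an array:
# model.fit(x_train, y_train, init_scores=init_score)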

thx

@ArdalanM
Owner

I just updated the master branch. Here is an example.

Would you mind creating a concrete example showing the advantage of using 'init_scores'?

Thanks

@ArdalanM
Owner

Here is a draft, but it does not produce the desired effect (i.e. init_score does not make the model converge faster given the same number of iterations):

import numpy as np
from sklearn import datasets, metrics, model_selection
from pylightgbm.models import GBMRegressor

# Parameters
seed = 1337
path_to_exec = "~/Documents/apps/LightGBM/lightgbm"
offset = 1e4

np.random.seed(seed) # for reproducibility
X, y = datasets.make_regression(n_samples=1000, random_state=seed)

# shifting distribution by a huge margin to see if `init_scores` help for convergence
y += offset
x_train, x_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.2, random_state=seed)

params = {'exec_path': path_to_exec,
          'num_iterations': 10, 'learning_rate': 0.1,
          'num_leaves': 10, 'is_training_metric': True,
          'min_data_in_leaf': 10, 'is_unbalance': False,
          'early_stopping_round': 10, 'verbose': False}
clf = GBMRegressor(**params)

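# Baseline: fit without init_scores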
clf.fit(x_train, y_train,
        test_data=[(x_test, y_test)])
y_pred = clf.predict(x_test)
print("MSE: {}, best round: {}".format(metrics.mean_squared_error(y_test, y_pred), clf.best_round))

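# Same data, but boosting starts from the offset passed as init_scores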
clf.fit(x_train, y_train,
        test_data=[(x_test, y_test)],
        init_scores=offset * np.ones(len(x_train)))
y_pred = clf.predict(x_test)
print("MSE: {}, best round: {}".format(metrics.mean_squared_error(y_test, y_pred), clf.best_round))

Any thoughts?

@gugatr0n1c
Author

I am solving a regression task where np.average(target_learn) = 15.

And I have a [500k x 400] matrix; the best setting for me so far is 7500 iterations with learning_rate = 0.002. In such a case (very small learning_rate) this option can save about 1500 iterations (a 20% boost) in the xgboost version.
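
For reference, a rough sketch of how I use the equivalent option in xgboost (base_margin); the data shapes and parameters here are only placeholders:

import numpy as np
import xgboost as xgb

X = np.random.rand(1000, 400)
y = 15.0 + np.random.randn(1000)

dtrain = xgb.DMatrix(X, label=y)
# start boosting from the target mean instead of the default base score
dtrain.set_base_margin(np.full(len(y), y.mean()))

bst = xgb.train({'eta': 0.002, 'objective': 'reg:linear'}, dtrain, num_boost_round=100)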

I will test this and let you know. Thanks, I appreciate your work!

@gugatr0n1c
Author

Hmm, I believe there is a bug (not sure whether it is here or in LightGBM).

In your example I need to do:

y_pred = clf.predict(x_test) + offset * np.ones(len(x_test))

to get the correct test prediction...
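
A minimal workaround sketch on top of your example (the helper name is mine):

def predict_with_init(model, X, init):
    # add the constant init score back, since predict() appears to return
    # only the boosted part learned on top of it
    return model.predict(X) + init

y_pred = predict_with_init(clf, x_test, offset)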

Also, the log shows the wrong test error when init_score is provided.

@gugatr0n1c
Author

Thinking about this: y_pred = clf.predict(x_test) + offset * np.ones(len(x_test)) is probably correct behavior for general-purpose usage (mainly for continuing learning from another model).
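
For example, a rough sketch of that continued-learning usage, reusing your example above (whether GBMRegressor behaves exactly like this is my assumption):

# first model, trained normally
base = GBMRegressor(**params)
base.fit(x_train, y_train)

# second model continues boosting from the first model's predictions
stage2 = GBMRegressor(**params)
stage2.fit(x_train, y_train, init_scores=base.predict(x_train))

# the final prediction adds the two stages together,
# consistent with the predict() behavior above
y_pred = base.predict(x_test) + stage2.predict(x_test)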

So only the printing of val_error needs to be fixed...
