
Question about transfer learning development #407

Closed
JDE65 opened this issue Jan 30, 2023 · 9 comments

Comments

@JDE65

JDE65 commented Jan 30, 2023

The EBM first minimizes the error without interactions (step 1), then with 1st-order interactions (step 2), then with higher-order interactions (step 3).
Is it possible to replace the higher-order interaction process in step 3 (or to add a step 4?) with a newly added feature that would benefit from the parameters learned in steps 1 and 2 (or 3), so that we would achieve transfer learning?
If so, do you have any view on how to implement this?

@paulbkoch
Collaborator

Hi @JDE65 -- I'm a little unclear on what you're asking for here. Are you trying to construct an EBM with 3-way and possibly 4-way terms? And to do that, you would like to use the list of pairs when deciding which 3-way interactions to consider, because the list of possible 3- and 4-way interactions is very large?

@JDE65
Author

JDE65 commented Jan 30, 2023

Hi @paulbkoch
Thank you for your reply, and sorry for being unclear.
I would like to see how I could use the new 0.3.0 release to build a transfer learning mechanism.
Let's assume I have trained an EBM on 20 features with a large dataset, with no interactions or only 1st-order interactions. I would then like to train the model on a new, smaller dataset with (let's say) 22 features: 20 similar to those in the large dataset and 2 new features that were unavailable in the original training set.
If I understand the model correctly, it is first trained on the features, then on the 1st-order interactions, then on higher-order interactions.
I wonder whether I could extend the process and continue training on the 2 new features rather than (or after) the higher-order interactions, and what I would need to do to achieve this?

@paulbkoch
Collaborator

paulbkoch commented Jan 30, 2023

Ok, I see now what you are trying to do @JDE65. The short answer is that this is theoretically possible to do with the EBM model class, but to do it currently you would need to modify our code and write some of your own utility functions to handle the post-process merging.

I'll just first point out that you might, in some settings, want to continue shaping the original 20 features when training on the 22 features later on, but you've indicated that you only want to shape the 2 new features. That's good because it's simpler.

The first thing you would need is to be able to continue training from a previous model. If you have a regression problem you can do this easily by calculating the residual error using the original EBM that you constructed in the first step and subtracting the predicted value from the actual. For classification this doesn't work though, and you would need something akin to the init_score parameter in LightGBM. We have a PR that adds this functionality actually (#371), but it still needs some work before merging, so this functionality is not part of our package yet.
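To make the regression route concrete, here is a minimal sketch of the residual trick (ebm_original, X_small, and y_small are illustrative names, and the split into the first 20 and last 2 columns is an assumption for this example, not something from the thread):

# Regression only: continue training by fitting a second EBM on the residuals
# of the original 20-feature model. Names and column slicing are illustrative.
from interpret.glassbox import ExplainableBoostingRegressor

residuals = y_small - ebm_original.predict(X_small[:, :20])   # actual minus predicted
ebm_extra = ExplainableBoostingRegressor(interactions=0)      # shape only the 2 new features
ebm_extra.fit(X_small[:, 20:], residuals)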

Using one of the methods above for either regression or classification, you would create a new EBM that would have only 2 features. At this point you would have one EBM with 20 features and another EBM with 2 features. Next you need to combine these. Since the models are additive, this works mathematically, but you need to modify all our existing attributes to accept 22 features. Mostly this involves appending the attributes of the two new features to the existing attributes in the EBM with 20 features.

We do have a function called merge_ebms that you might want to look at (see example: https://github.com/interpretml/interpret/blob/develop/examples/python/notebooks/Merging%20EBM%20Models.ipynb). It won't do what you are looking for since it is designed to merge two EBMs that have the same number of features, but you may find some inspiration there.

One additional quirk about EBMs that I'll point out: a legal EBM can consist of only 2 attributes (ebm.bins_ and ebm.term_scores_), so you can first try getting it to work with just those two attributes and delete the rest. Of course, nice things like visualizations work better if you also have histograms and the rest of the attributes.
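As a rough, hedged sketch of that post-process merge (only bins_ and term_scores_ are named above; term_features_ and intercept_ are assumed attribute names here, and the exact layout may differ between releases, so treat this purely as an illustration of the idea):

# Append the 2-feature EBM's terms onto the 20-feature EBM.
# ebm_20 and ebm_2 are hypothetical names for the two fitted models.
import copy

merged = copy.deepcopy(ebm_20)                      # start from the 20-feature model
merged.bins_ += ebm_2.bins_                         # binning info for the 2 new features
merged.term_scores_ += ebm_2.term_scores_           # additive scores for the new terms
# shift the small EBM's feature indices (0, 1) so they point at columns 20 and 21
merged.term_features_ += [tuple(i + 20 for i in t) for t in ebm_2.term_features_]
merged.intercept_ += ebm_2.intercept_               # additive intercepts sum
# histograms, feature names/types, etc. would also need appending for visualizations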

@JDE65
Author

JDE65 commented Jan 31, 2023

Thanks a lot. The answer is crystal clear.
I had indeed noticed the merge_ebms function.
I'll look deeper into PR #371 and take inspiration from LightGBM's init_score.

Now a dumb question that relates to the handling of missing values.
Let's assume I have 2 different datasets. For the large dataset (let's say US consumer information) I have 20 features, whereas for the smaller one I have 22 features: the same 20 as in the large dataset plus 2 additional ones. Would it make sense to train a 1st model EBM_US on 22 features, where the 2 extra features would just be filled with NA:

EBM = ExplainableBoostingClassifier(**EBM_HYPERPARAMETERS)
EBM_US = EBM.fit(X_US, y_US)

I would then use this trained model EBM_US with the Canadian dataset that has 22 features, and do
EBM_CA = EBM_US.fit(X_CA, y_CA)
I would somehow achieve a partial and simplified transfer learning, wouldn't I?

@paulbkoch
Collaborator

Hi @JDE65 -- That won't do what you expect. In scikit-learn, calling fit a second time will overwrite the previous model, so you'll just get a trained model from the data X_CA & y_CA.

But also, for EBMs we learn from missing values. If all the values for those two extra features are missing, however, there won't be any signal in those features, so you should get a corresponding ebm.term_scores_[term_index] that is filled with zeros. Doing it like this might help you avoid some work later when munging the two models together, so I think you have a good idea here to reduce some of the work.

@JDE65
Author

JDE65 commented Jan 31, 2023

Thanks a lot for this.
In scikit-learn, 'partial_fit' and 'warm_start' are available for many models. As I understand it, this isn't available for EBMs. Is it considered for future development?

@paulbkoch
Collaborator

Hi @JDE65 -- Stagewise fitting is indeed something we've discussed. There's a longer discussion regarding it in issues #403 and #304, and it looks like one of our users might create a PR for this 👍

I was talking with Rich Caruana the other day and he suggested a much easier way to approach your original problem. Instead of modifying our code to stagewise fit and then editing models to add new univariate features afterwards, you can get very close to the final solution by making a "partly useless pair". Imagine you made a 23rd feature that was always false. Since it's always false, if you made a pair with that feature, one axis will always be useless and the model won't learn anything from it. Your "pair" is now effectively a main. So now, you just need to do something like this:

ebm = ExplainableBoostingClassifier(mains=[0, 1, 2, ..., 19], interactions=[(20, 22), (21, 22)])

And since our framework is set up by default to do stagewise fitting of pairs, it'll do what you want with less hassle.
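A hedged, end-to-end sketch of that trick (X, y, and the hstack step are illustrative assumptions; the mains/interactions arguments follow the example above):

# Add a constant, always-false 23rd column (index 22) so that pairing it with
# the 2 new features (indices 20 and 21) degenerates into mains for them.
import numpy as np
from interpret.glassbox import ExplainableBoostingClassifier

X23 = np.hstack([X, np.zeros((X.shape[0], 1))])     # X assumed to be a numpy array with 22 columns

ebm = ExplainableBoostingClassifier(
    mains=list(range(20)),                          # boost only the original 20 features as mains
    interactions=[(20, 22), (21, 22)],              # "pairs" that act as mains for the 2 new features
)
ebm.fit(X23, y)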

@JDE65
Author

JDE65 commented Feb 10, 2023

Thanks so much, I love the pragmatism of the approach you discussed with Rich Caruana.
I'll test it.

Of course, I will also look into #403 and #304, and am waiting for a PR to come... 👍👍👍

@paulbkoch
Collaborator

Closing this issue for now since we're tracking the sub-topics discussed in other threads already. Please re-open if you want to add more.
