Question: Boosting in multiple iterations #403
Hi @mtl-tony -- This is a very good discussion to have, and to be honest it's something I have been thinking about for a while without coming to a firm viewpoint yet. Everything you are proposing makes sense, and the main questions IMHO are about how to present this as a coherent interface. I'll start by describing a few building blocks that I've mostly concluded we should have.

First off, I think your earlier PR is relevant here. If we agree that we need an init_score parameter that accepts raw scores, then we might as well give it additional functionality and allow it to also accept a previously built model, or a predict function. Our measure_interactions function does this already for similar purposes:
and then:
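To illustrate the flexible-init_score idea in rough form (a hedged sketch only -- the ability of init_score to accept a fitted model or predict function is the proposal under discussion, not settled API):

```python
# Hypothetical sketch -- the model/function forms of init_score are proposals.
from interpret.utils import measure_interactions

# init_score as raw per-sample scores...
ranked = measure_interactions(X, y, init_score=raw_scores)

# ...or as a previously fitted model, from which the raw scores
# would be derived internally via its predict function:
ranked = measure_interactions(X, y, init_score=fitted_ebm)
```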
Another critical question is how we specify which features or interactions get boosted on within each phase. I have a few ideas on this which I'll post in the other thread that you're on (#225). The last question, which I believe to be the hardest, is how we end up combining the models. A couple of ideas:
My lean at this point would be to enable this scenario with a combination of init_score and merge_ebms. The plus here is that these functions already exist for other reasons, so re-using them adds no additional complexity to our API. The downside is that this isn't the most integrated approach in the way that a pipeline-like solution would be. We'd need to write some clear examples and documentation to highlight these capabilities, at minimum.

There's another related aspect worth mentioning here: we try to intelligently guess the feature types, and also where to cut continuous features into separate bins. merge_ebms handles the scenario where the features were binned differently, but it's less than ideal when the bin cuts do not match up, since the bins in the final model need to be a superset of the bins in the models being merged. The number of cut points therefore increases with each merge operation when they do not match up. Ideally, if multiple EBMs are meant to be merged, they would all be constructed with the same feature preprocessor definitions. This will happen by default if you pass in the exact same data each time, so in the above scenario it shouldn't be a problem. Having some way of harmonizing the preprocessor definitions is something we'll need, though, to support other complex merging scenarios. You can already mostly do this through the feature_types parameter, but it's currently clunky and more complex than I'd like. Perhaps allowing something like "feature_types=other_ebm" might work. It's another area that needs more investigation.

I'm really interested in getting feedback on any of the above ideas, or suggestions for alternate approaches.
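As a rough sketch of how the init_score plus merge_ebms combination could look (hypothetical -- accepting a model as init_score on fit is the proposal being discussed here, not an existing signature):

```python
# Hypothetical sketch -- init_score on fit() is the proposed addition.
from interpret.glassbox import ExplainableBoostingRegressor, merge_ebms

# Phase 1: boost on an initial configuration.
ebm1 = ExplainableBoostingRegressor()
ebm1.fit(X, y)

# Phase 2: boost on top of phase 1's raw scores.
ebm2 = ExplainableBoostingRegressor()
ebm2.fit(X, y, init_score=ebm1)

# Combine the phases into a single model.
combined = merge_ebms([ebm1, ebm2])
```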
Hello Paul, sorry about closing and reopening -- I misclicked. Thanks so much for taking the time to give such a detailed response, and it's really cool seeing that your team seems to be working on some of these ideas already. I'm curious what the best way is to implement this concept of opening up the boosting to users, as I agree it's tough to make it fit in a nice way where users can play around with it. I would need to think more as well, but I feel a pipeline method could work; I'm just unsure if it's the cleanest way, as you mentioned. I'll try some stuff on my side and share with you if I have a prototype, since as you said most of the functions are coded, so it's just about figuring out the structure.

Regarding the preprocessing portion mentioned, I believe that the method you described of taking a different EBM as an input to allow for the same buckets ("feature_types=other_ebm") would be a great feature. For people that might need to update models, it would be ideal for the buckets not to change every time, so that they can better compare new updated models on newer data.

I guess I have two follow-up questions to this discussion.
Hi @mtl-tony -- Yes, I agree that "feature_types=other_ebm" would be a nice feature to have in the package. It's also important in the context of federated learning, which was the original impetus for merge_ebms. In the federated learning scenario we would also prefer to have synchronized bins across the federated models. I think one possible solution there is to create an initial "shell" EBM by specifying zero boosting rounds on some representative dataset. This shell EBM would act sort of like a scikit-learn preprocessor, but with support for interactions, which isn't natural in the scikit-learn model. You could then distribute the shell EBM to the federated locations, and they would each train EBMs with a common binning definition via feature_types=shell_ebm. The shell EBM could also be editable with the suite of model editing tools that we have yet to write. Another option would be to design a new class for preprocessing that is interaction aware, but even if we do that, I think supporting feature_types=other_ebm would still be a nice-to-have.

It would be great to brainstorm more on the pipeline solution, as it's the least fleshed out idea, and having a prototype would be the best way to explore it. We don't call merge_ebms for the outer bags, but we do share some internal processing (see this function, which is called by both:
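The shell-EBM idea above could look roughly like this (a hypothetical sketch: feature_types accepting another EBM is the proposed feature, and using max_rounds=0 to get a binning-only model is an assumption about how "zero boosting rounds" would be specified):

```python
# Hypothetical sketch of the "shell" EBM idea.
from interpret.glassbox import ExplainableBoostingClassifier

# Central site: build a shell model that only learns the binning.
shell_ebm = ExplainableBoostingClassifier(max_rounds=0)
shell_ebm.fit(X_representative, y_representative)

# Each federated site: train with the shared binning definition.
local_ebm = ExplainableBoostingClassifier(feature_types=shell_ebm)
local_ebm.fit(X_local, y_local)
```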
For 3rd order interactions, the memory issues are solvable since we really just need to keep a list of the top N interactions, and we can do that efficiently with a heap. A naive implementation would still need to examine all possible triples, which grows at a cubic rate, so even after fixing the memory issues there are still CPU considerations. Probably with some simple heuristics though we can prune the search space aggressively. For example, we could only allow triples from the set of features which formed pairs. There are lots of variations on how you might do this though.

Lastly, the public interface needs a bit of thought. Today we have one parameter for main binning (max_bins=256) and pairs (max_interaction_bins=32). If we wanted triples, we'd probably want to change this to something like max_bins=256, max_pair_bins=32, max_higher_bins=8, or something similar. We'd also need a way to specify how many interactions to generate. Currently we have the interactions=10 parameter which only makes pairs, so we'd have to break that into pairs=10 and higher_interactions=5, or something like that. We also currently overload the interactions parameter to specify specific pairs/triples/etc with interactions=[(0,1), (1,2,11), ...], and that all gets a bit messier if we have multiple parameters for interactions. What if someone specifies pairs=[(1,2,3)], for example? Sure, we can throw an error if a triple is passed as a pair, but it just makes everything a bit more confusing to new users who probably just want to use defaults to start with anyways.

Given the unknowns, and the less than elegant interface changes (unless someone figures out something better), my default thinking is that maybe we should enable triples/quads/etc via the more complex and flexible interface that we're discussing here and see how people use it. If everyone raves about it, then consider making triples a more default experience.
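The bounded-heap idea can be sketched in a few lines of plain Python (a toy illustration with a stand-in strength function, not the real interaction-strength measure):

```python
import heapq
from itertools import combinations

def top_n_triples(n_features, strength, n=5):
    """Keep only the n strongest triples with a bounded min-heap, so
    memory stays O(n) even though O(n_features^3) triples are examined."""
    heap = []  # min-heap of (strength, triple); the weakest kept triple is on top
    for triple in combinations(range(n_features), 3):
        s = strength(triple)
        if len(heap) < n:
            heapq.heappush(heap, (s, triple))
        elif s > heap[0][0]:
            heapq.heapreplace(heap, (s, triple))  # evict the current weakest
    return sorted(heap, reverse=True)

# Stand-in strength function for demonstration only.
scores = top_n_triples(10, strength=lambda t: sum(t), n=3)
```

Pruning heuristics like "only triples drawn from features that formed pairs" would shrink the `combinations` input set rather than change this loop.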
One more aspect to mention: Today the measure_interactions function does not work with 3rd order interactions, but upgrading it to do so is relatively easy (in C++ though). That'll be important toward letting people experiment with GA3Ms. |
Hello, I've been thinking about this more today and I think the pipeline method would be nice. The advantage of the pipeline method is that you can give more freedom to your users while still keeping your core classes, ExplainableBoostingClassifier and ExplainableBoostingRegressor, for those who just want a standard GA2M; those classes would simply build a predefined pipeline for you. I imagine the format would be somewhere along these lines of:

And you would have the default pipelines that you currently have by simply calling ExplainableBoostingRegressor or ExplainableBoostingClassifier. Do you see any issues with framing the problem in this way? I feel the torch nn and sklearn pipeline frameworks could be borrowed from to make this work, but I wanted your opinion on whether I'm forgetting anything important in the building of an EBM model. I assume each boosting class, or whatever it could be named, would also include the preprocessing/binning portion.

Also, regarding the C++, I'm no expert but I'm curious if you could send me where the FAST implementation is and where it would need to be added for 3rd order interactions? Even if not included in the model, I still think having a 3rd order interaction detection algorithm would be useful for general data analysis.

Thanks again,
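Purely as a hypothetical sketch of a pipeline format along these lines (none of these class names exist in interpret; they are invented for illustration, borrowing the sklearn Pipeline shape):

```python
# Hypothetical -- class and step names are invented for illustration.
model = EBMPipeline([
    ("bin", EBMPreprocessor(max_bins=256)),
    ("mains", BoostMains(max_rounds=5000)),
    ("pairs", BoostInteractions(order=2, count=10)),
])
model.fit(X, y)
```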
Hi @mtl-tony -- Makes sense to me, and I think that format is something that people in the community are comfortable with and probably expect. I think the equivalent in a procedural format would be something like:
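The snippet itself isn't preserved above; as a hedged reconstruction of the general shape (function and parameter names here are assumptions, guided by the "merge" rule explained next):

```python
# Hypothetical procedural sketch -- parameter names are assumptions.
ebm_mains = ExplainableBoostingRegressor(interactions=0)
ebm_mains.fit(X, y)

# Detect pairs on the residual of the mains model.
pairs = measure_interactions(X, y, init_score=ebm_mains)

# Boost the pairs on top of the mains; merge= would exclude the mains
# and take init_score plus the bin definitions from the given model.
ebm_pairs = ExplainableBoostingRegressor(interactions=pairs)
ebm_pairs.fit(X, y, merge=ebm_mains)
```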
With the rule that if the parameter "merge" is used in the fit function, then the mains are excluded, and things like init_score and the bin definitions are taken from the given model. The obvious drawback of the procedural format is that it's more verbose and requires repetitively passing in X, y, and the model from the previous stage. On the positive side, it's probably easier to introspect interior state like the pairs variable in a notebook environment, and also probably easier to insert more customized steps between the stages (think something weird like accessing a database or web API). Perhaps the right approach is to have a pipeline-like system as syntactic sugar over something like the above?

The C++ code that does interaction detection is here:
The code above needs to be changed to loop over an arbitrary number of dimensions up to k_cDimensionsMax. Handling tensors in C++ is a lot more work than in Python. As an example, here's another function that loops over the tensor dimensions:
Hey @paulbkoch, thanks for sharing the procedural format! I think if we have something like that it would fit most needs while still allowing users to 'look into' the model easily. Once those functions are ready I think it would be good, and yeah, the pipeline could come later, as it would be easy to add once the procedural portion is there. The main focus would be adding that merge parameter you mentioned, so I'll start working on that portion, as it's one of the main keys to having the flexibility of fitting. Regarding the interaction detection algorithm, I took a look and I think my C++ skills are not adequate yet, so I'll maybe come back to that problem once I've had some more experience there. Considering you've answered all my questions regarding this topic, I'll close the issue and do a PR once I have something concrete regarding the fit + merge. Thanks so much for your insights. Tony
Hi @mtl-tony -- That would be great! My recommendation here would be to try breaking up that bigger feature into smaller PR chunks. I would try to independently develop and test the "feature_types=other_ebm" ability first since that is going to be a necessary component for any "merge" parameter. Once that and the init_score are both in, then "merge" just becomes an act of expressing both together, and also excluding the mains, which we already have. |
Another possibility would be to use scikit-learn's warm_start methodology. It might look something like:
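A hedged sketch of the warm_start variant (warm_start is a real scikit-learn convention for continuing a previous fit, but applying it to EBMs this way is hypothetical):

```python
# Hypothetical -- EBMs do not currently support warm_start.
ebm = ExplainableBoostingRegressor(interactions=0, warm_start=True)
ebm.fit(X[features_subset_1], y)   # first boosting phase: some mains
ebm.set_params(interactions=10)
ebm.fit(X, y)                      # continue boosting: remaining terms
```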
I'm currently attempting to fit the EBM in multiple boosting iterations for a specific problem I have. I'd compare this to how all main features are fit first and then, based on the residuals, the interaction terms are fit, as shown below in your code ebm.py.
Fitting main features:
interpret/python/interpret-core/interpret/glassbox/ebm/ebm.py
Lines 529 to 552 in 133a893
Fitting Interactions:
interpret/python/interpret-core/interpret/glassbox/ebm/ebm.py
Lines 665 to 688 in 133a893
For my problem I need to fit my main features in 2 iterations. I'm planning to first fit a subset of my features X in a first boosting iteration, which we can call model 1. Then I would fit a different subset of features using the residuals from model 1 (leveraging the init_score available in the cyclic_gradient_boost function). I'm doing this by simply modifying the ebm.py code to allow me to make this adjustment.
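The two-iteration idea can be shown with a tiny stand-alone toy (plain Python, not the interpret internals): stage 1 fits per-bin means on one feature, and stage 2 fits stage 1's residuals on a second feature, which is exactly the role init_score plays.

```python
# Toy two-stage additive fit -- illustrative only, not the interpret code.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [1.0, 3.0, 2.0, 4.0]

def fit_bin_means(col, target):
    """Return the mean target per distinct bin value of one feature."""
    sums, counts = {}, {}
    for v, t in zip(col, target):
        sums[v] = sums.get(v, 0.0) + t
        counts[v] = counts.get(v, 0) + 1
    return {v: sums[v] / counts[v] for v in sums}

# Stage 1: fit feature A (column 0).
col_a = [row[0] for row in X]
model_a = fit_bin_means(col_a, y)
init_score = [model_a[v] for v in col_a]

# Stage 2: fit feature B (column 1) on the residuals of stage 1.
residuals = [t - s for t, s in zip(y, init_score)]
col_b = [row[1] for row in X]
model_b = fit_bin_means(col_b, residuals)

# The combined additive model reproduces y exactly on this toy data.
pred = [model_a[a] + model_b[b] for (a, b) in X]
```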
I have 2 questions pertaining to this.
One idea would be to split the fitting into two functions, _fit_main() and _fit_interactions(). For standard problems users could still call .fit(), which would work the same way it does now; within .fit() it would use the _fit_main() and _fit_interactions() functions. For more specific problems you could call them yourself in a specific order, like the example below:

Step 1) ._fit_main() for a first subset of features
Step 2) ._fit_main() for the rest of the features
Step 3) ._fit_interactions() for your pairwise interactions
Step 4) ._fit_interactions() for your 3rd order + 4th order interactions that are available now as of 0.3

I appreciate any feedback you have :)