Random assignment for a subset of variables in predict. #5177
Comments
Call Path For Prediction
I have organized most of the call paths for LightGBM's prediction functions to figure out how to implement this feature. There may be some trivial mistakes because I'm new to LightGBM's source code, but these shouldn't be a problem :smirk:
Thoughts on implementation
It seems to me that this feature is about adding one or more input parameters to the APIs in the External Interface Layer and passing them down through a call path to control prediction behavior in the Model Layer.
As for controlling prediction behavior in the Model Layer, I think there is a choice here:
Considering that we may get similar feature requests in the future, choice 2 is the more scalable option. But choice 2 is obviously a more complex modification that could introduce code defects. In contrast, choice 1 is easier to understand and less coupled with other code designs. Stacking more similar functions is not an elegant code design, but it is not a problem either. So I think choice 1 is the better option.
It seems no one is working on this issue. I'd like to work on it and already have a preliminary scheme. May I have this issue assigned to me and get some suggestions on my preliminary scheme? @jameslamb
Hi @suncanghuai, thank you very much for the detailed plan and careful thoughts! The scheme looks good to me. I think option 1 is better for now; option 2 could be left for the future if there are indeed more similar functions. Please go ahead, and ping us if you have any further questions.
* Introduce a struct of parameters PredictionControlParameter
* Add a new field random_assign_features to class Config
* Add new GetLeaf functions that apply random assign mechanism to class Tree
Summary
Add random assignment for a subset of variables in predict.
Motivation
Random assignment for a variable V in a tree is a way to remove the effect of that variable from the model. The procedure assigns the sample to a child at random at every node that uses V for splitting, and proceeds as usual at all other nodes.
Random assignment of a subset of variables W = {V1, V2, ..., Vi} is similar: the assignment is random at every node whose split uses any variable in W, and as usual at the rest.
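The traversal described above can be sketched in a few lines. This is only an illustration of the idea, not LightGBM code: the `Node` layout, the function name `predict_random_assign`, and the fair coin (probability 0.5 for each child) are assumptions made for the example.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence, Set


@dataclass
class Node:
    """One node of a binary regression tree (hypothetical layout)."""
    feature: int = -1            # index of the split feature; unused in leaves
    threshold: float = 0.0       # go left when x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: float = 0.0           # prediction stored in a leaf

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def predict_random_assign(root: Node, x: Sequence[float],
                          random_features: Set[int],
                          rng: random.Random) -> float:
    """Walk the tree, choosing a child at random at every split on a feature in W."""
    node = root
    while not node.is_leaf:
        if node.feature in random_features:
            # Random assignment (assumed: both children equally likely).
            go_left = rng.random() < 0.5
        else:
            # Usual decision rule.
            go_left = x[node.feature] <= node.threshold
        node = node.left if go_left else node.right
    return node.value


# Toy tree: the root splits on feature 0, its right child splits on feature 1.
tree = Node(feature=0, threshold=0.5,
            left=Node(value=1.0),
            right=Node(feature=1, threshold=0.5,
                       left=Node(value=2.0), right=Node(value=3.0)))

rng = random.Random(0)
usual = predict_random_assign(tree, [0.2, 0.9], set(), rng)   # no perturbation
noised = predict_random_assign(tree, [0.2, 0.9], {0}, rng)    # W = {feature 0}
```

With an empty set the function reduces to ordinary prediction, so a single code path serves both the perturbed and unperturbed cases.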
This is useful for computing honest variable importance over a new test set, as the difference between the prediction error with and without perturbation.
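As a rough sketch of that difference, assuming mean squared error as the error measure and two prediction callables (one ordinary, one with random assignment applied), the importance could be computed as follows. The names `mse` and `perturbation_importance` are hypothetical, and the perturbed predictor in the usage lines is a deterministic stand-in so the arithmetic is easy to follow.

```python
from typing import Callable, Sequence

Predictor = Callable[[Sequence[float]], float]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error (assumed error measure for this sketch)."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def perturbation_importance(predict: Predictor,
                            predict_perturbed: Predictor,
                            X: Sequence[Sequence[float]],
                            y: Sequence[float]) -> float:
    """Test-set error with perturbation minus test-set error without it."""
    base_error = mse(y, [predict(x) for x in X])
    noised_error = mse(y, [predict_perturbed(x) for x in X])
    return noised_error - base_error


# Toy usage: the unperturbed model is exact, the stand-in perturbed model
# always predicts 0, so the importance equals the mean of y squared.
X_test = [[1.0], [2.0]]
y_test = [2.0, 4.0]
importance = perturbation_importance(lambda x: 2 * x[0], lambda x: 0.0,
                                     X_test, y_test)   # (4 + 16) / 2 - 0 = 10.0
```

Because random assignment is stochastic, in practice one would likely average the perturbed error over several random draws; that averaging is left out here.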
This method can also be used to compute the interaction between two subsets of variables V and W, defined as the difference between the paired importance (the importance when both V and W are assigned randomly) and the added importance (the sum of the importances when each of V and W is assigned randomly on its own).
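A minimal sketch of this definition, assuming an importance function that maps a set of feature indices to its perturbation importance. The name `interaction` and the sign convention (paired minus added) are assumptions for illustration; the toy importance values are made up to show the arithmetic.

```python
from typing import Callable, FrozenSet, Iterable

ImportanceFn = Callable[[FrozenSet[int]], float]


def interaction(importance: ImportanceFn,
                v: Iterable[int], w: Iterable[int]) -> float:
    """Paired importance of V and W minus the sum of their separate importances."""
    v_set, w_set = frozenset(v), frozenset(w)
    paired = importance(v_set | w_set)                 # V and W perturbed together
    added = importance(v_set) + importance(w_set)      # each perturbed on its own
    return paired - added


# Toy importance table (made-up numbers, not measured results).
toy_importance = {
    frozenset({0}): 1.0,
    frozenset({1}): 2.0,
    frozenset({0, 1}): 4.5,
}
score = interaction(toy_importance.__getitem__, {0}, {1})   # 4.5 - (1.0 + 2.0) = 1.5
```

A score near zero would suggest the two subsets contribute additively, while a large magnitude points to an interaction between them.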
References
The method is referenced in:
Hemant Ishwaran. "Variable importance in binary regression trees and forests." Electron. J. Statist. 1 519 - 537, 2007. https://doi.org/10.1214/07-EJS039