Add approx_contrib option for feature contributions #4219
@gravesee Thanks a lot for the feature request! Just to clarify, are you aware that you can get feature importance with the help of the following function (and the corresponding Booster methods in the language wrappers)? LightGBM/include/LightGBM/c_api.h Lines 1233 to 1246 in d517ba1
@StrikerRUS thanks for the response. It looks like the code you linked to returns the overall feature importance gain for each feature in the model. What I am requesting in this issue is the approximate feature contribution of every feature for each input record. The output would be a matrix where the number of rows matches the number of input records and the number of columns matches the number of features + 1 (with one extra column for the model bias).
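For reference, LightGBM's existing TreeSHAP-based path already produces output of exactly this shape. A minimal sketch, assuming `X_train`, `y_train`, and `X_test` already exist:

```python
import lightgbm as lgb

# X_train, y_train, X_test are assumed to exist in the surrounding code
model = lgb.LGBMRegressor().fit(X_train, y_train)

# TreeSHAP contributions: one row per record, one column per feature,
# plus a final column for the model bias
contribs = model.predict(X_test, pred_contrib=True)
assert contribs.shape == (X_test.shape[0], X_test.shape[1] + 1)
```

The request here is for a second, faster approximate backend producing a matrix of the same shape.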
@gravesee Ah, I see! Thanks for clarifying!
Closed in favor of #2302. We decided to keep all feature requests in one place. You are welcome to contribute this feature! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on implementing this feature.
XGBoost has two methods for calculating feature contributions: TreeSHAP and an approximate method. The approximate method is much faster and has some nice properties w.r.t. monotonicity that aren't guaranteed when using TreeSHAP. The approximate method is also easier to port to production systems where the LightGBM model object can't be used directly.
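For comparison, XGBoost exposes both modes through the same `Booster.predict` call. A minimal sketch, assuming `dtrain` and `dtest` are pre-built `DMatrix` objects:

```python
import xgboost as xgb

# dtrain and dtest are assumed to be pre-built DMatrix objects
bst = xgb.train({"objective": "reg:squarederror"}, dtrain, num_boost_round=50)

# Exact TreeSHAP contributions: shape (n_samples, n_features + 1)
shap_contribs = bst.predict(dtest, pred_contribs=True)

# Faster approximate contributions (the option requested for LightGBM)
approx = bst.predict(dtest, pred_contribs=True, approx_contribs=True)
```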
Motivation
TreeSHAP is the gold standard, but there are practical reasons to prefer a fast, simple method of calculating feature contributions. The approach outlined below can still be used for interpretation, is easier to port to other production systems, and is quicker to calculate.
Description
The approximate method first distributes the leaf weights up through the internal nodes of the tree: each parent's weight is the cover-weighted average of its left and right children's weights. If LightGBM already calculates internal node values, this step becomes even simpler to implement. A sketch of this step follows.
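A minimal sketch of the bottom-up weight pass, assuming a simplified, hypothetical tree representation (not LightGBM's actual dump format): internal nodes are dicts with `left`, `right`, `feature`, and `threshold` keys, leaves carry `value`, and every node carries `cover` (the number of training samples that reached it):

```python
def distribute_weights(node):
    """Annotate every node with a 'weight': leaves keep their leaf value,
    and each parent gets the cover-weighted average of its children."""
    if "value" in node:  # leaf: its weight is the leaf value itself
        node["weight"] = node["value"]
    else:
        w_left = distribute_weights(node["left"])
        w_right = distribute_weights(node["right"])
        c_left = node["left"]["cover"]
        c_right = node["right"]["cover"]
        node["weight"] = (c_left * w_left + c_right * w_right) / (c_left + c_right)
    return node["weight"]
```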
After the weights have been distributed up through all nodes of the tree, the contribution of a split is the child node weight minus the parent node weight along the path a record takes to its leaf; these per-split contributions are then aggregated per feature across all trees in the ensemble.
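Continuing the sketch above (same hypothetical node layout), the contributions for one record and one tree can be collected by walking the decision path and crediting each split's feature with the change in weight:

```python
def contributions(root, x, n_features):
    """Return a list of length n_features + 1: per-feature contributions
    plus the model bias (the root weight) in the last slot."""
    contrib = [0.0] * (n_features + 1)
    contrib[n_features] = root["weight"]  # root weight serves as the bias
    node = root
    while "value" not in node:  # descend until a leaf is reached
        child = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
        contrib[node["feature"]] += child["weight"] - node["weight"]
        node = child
    return contrib
```

Because the per-split terms telescope, each row sums to the value of the leaf the record lands in; summing the rows over all trees reproduces the raw model prediction.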
References
http://blog.datadive.net/interpreting-random-forests/