
Add approx_contrib option for feature contributions #4219

Closed
gravesee opened this issue Apr 23, 2021 · 4 comments

@gravesee (Contributor)

xgboost has two methods for calculating feature contributions: TreeSHAP and Approximate. The approximate method is much faster and has some nice monotonicity properties that TreeSHAP does not guarantee. It is also easier to port to production systems where the lightgbm object can't be used directly.

Motivation

TreeSHAP is the gold standard but there are practical reasons for preferring a fast, simple method of calculating feature contributions. The approach outlined below can still be used for interpretation, is easier to port to other production systems, and is quicker to calculate.

Description

The approximate method first distributes the leaf weights up through the internal nodes of the tree: each parent node's weight is the cover-weighted average of its left and right children's weights. If lightgbm already stores internal node values, this becomes even simpler to implement.

Once weights have been assigned to every node, the contribution of a split is the child node's weight minus the parent node's weight; these differences are accumulated per feature along each record's decision path and summed across all trees in the ensemble.
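For concreteness, here is a minimal sketch of this approach (the "Saabas" method) written against the JSON produced by Booster.dump_model(). It assumes the usual tree_structure keys (split_feature, threshold, left_child, right_child, leaf_value, leaf_count), handles only numerical <= splits, and ignores categorical splits and missing-value handling; approx_contrib and the helper names are hypothetical, not an existing lightgbm API.

import numpy as np

def node_value(node):
    # Cover-weighted average of the leaf values below this node.
    if "leaf_value" in node:  # leaf dicts carry leaf_value; internal nodes do not
        return node["leaf_value"], node.get("leaf_count", 1)
    left_value, left_count = node_value(node["left_child"])
    right_value, right_count = node_value(node["right_child"])
    total = left_count + right_count
    return (left_value * left_count + right_value * right_count) / total, total

def tree_contrib(node, x, contrib):
    # Walk one record's decision path, crediting each split's feature with
    # the change in node value that taking the split causes.
    value, _ = node_value(node)
    while "split_feature" in node:
        feature = node["split_feature"]
        child = (node["left_child"] if x[feature] <= node["threshold"]
                 else node["right_child"])
        child_value, _ = node_value(child)
        contrib[feature] += child_value - value
        node, value = child, child_value

def approx_contrib(model_json, X):
    # Returns an (n_records, n_features + 1) matrix; the last column holds
    # the bias, i.e. the sum of the trees' root values.
    n_records, n_features = X.shape
    out = np.zeros((n_records, n_features + 1))
    for tree in model_json["tree_info"]:
        root = tree["tree_structure"]
        out[:, -1] += node_value(root)[0]
        for i in range(n_records):
            tree_contrib(root, X[i], out[i, :n_features])
    return out

By construction each row sums to the model's raw prediction, which is what makes the attribution easy to audit in production. A real implementation would compute the node values once per tree instead of recomputing them per record, as this sketch does for brevity.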

References

http://blog.datadive.net/interpreting-random-forests/

@StrikerRUS (Collaborator)

@gravesee Thanks a lot for the feature request!

Just to clarify: are you aware that you can get feature importance with the help of the following function (and the corresponding Booster methods in the language wrappers)?

/*!
 * \brief Get model feature importance.
 * \param handle Handle of booster
 * \param num_iteration Number of iterations for which feature importance is calculated, <= 0 means use all
 * \param importance_type Method of importance calculation:
 *   - ``C_API_FEATURE_IMPORTANCE_SPLIT``: result contains numbers of times the feature is used in a model;
 *   - ``C_API_FEATURE_IMPORTANCE_GAIN``: result contains total gains of splits which use the feature
 * \param[out] out_results Result array with feature importance
 * \return 0 when succeed, -1 when failure happens
 */
LIGHTGBM_C_EXPORT int LGBM_BoosterFeatureImportance(BoosterHandle handle,
                                                    int num_iteration,
                                                    int importance_type,
                                                    double* out_results);
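For reference, the same aggregates are available from the Python wrapper; a quick self-contained sketch (synthetic data, parameter choices arbitrary):

import lightgbm as lgb
import numpy as np

X, y = np.random.rand(200, 5), np.random.rand(200)
booster = lgb.train({"objective": "regression", "verbose": -1},
                    lgb.Dataset(X, y), num_boost_round=10)
print(booster.feature_importance(importance_type="split"))  # times each feature is used
print(booster.feature_importance(importance_type="gain"))   # total gain per feature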

@gravesee (Author)

@StrikerRUS thanks for the response. The function you linked returns one aggregate importance value per feature for the whole model. What I am requesting in this issue is the approximate feature contribution of every feature for each input record. The output would be a matrix with one row per input record and one column per feature, plus an extra column for the model bias.
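For comparison, a sketch of the requested shape using the TreeSHAP contributions that LightGBM already exposes via pred_contrib=True (synthetic data; the approximate method would return a matrix of the same shape, just computed more cheaply):

import lightgbm as lgb
import numpy as np

X, y = np.random.rand(100, 5), np.random.rand(100)
booster = lgb.train({"objective": "regression", "verbose": -1},
                    lgb.Dataset(X, y), num_boost_round=10)
contrib = booster.predict(X, pred_contrib=True)
print(contrib.shape)  # (100, 6): one column per feature plus a final bias column
# Each row sums to the raw prediction for that record.
np.testing.assert_allclose(contrib.sum(axis=1), booster.predict(X), rtol=1e-6)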

@StrikerRUS (Collaborator)

@gravesee Ah, I see! Thanks for clarifying!

@StrikerRUS (Collaborator)

Closed in favor of tracking this request in #2302. We decided to keep all feature requests in one place.

Contributions of this feature are welcome! Please re-open this issue (or post a comment, if you are not the topic starter) if you are actively working on implementing it.
