SHAP feature contribution for linear trees #4002
Comments
@SpeckledJim2 feature contributions are not yet implemented for linear trees. I'll make a PR to update the docs to mention this. Pull requests are welcome if anyone would like to work on implementing SHAP (i.e. predicting feature contributions) for linear trees. Regarding your other comment, I agree it's somewhat confusing to have both the constant leaf values and the linear coefficients in the output. On the other hand, it might be worth keeping both: the constant values are calculated automatically even for linear trees (so they cost no extra work), and they give us the option to recover the basic constant-value tree from the output. I don't have a strong view either way on this, so comments from others are welcome.
Thanks for the reply. Re the tree output table, I agree that leaving the constant values in there makes sense, as you get them "for free" anyway. The extra thing to include, I think, would be the coefficients of the linear model for each leaf if possible, but it might not be something that has widespread use, so I don't have a strong opinion on it either. Re feature contributions for linear models, I am happy to help test anything developed, but my skills are limited to R at the moment. If there is a way to help there, do let me know.
@SpeckledJim2 the coefficients of the linear model are already available in the output of
Closed in favor of being in #2302. We decided to keep all feature requests in one place. Contributions of this feature are welcome! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on implementing this feature.
Description
When using linear_tree = TRUE and predict() with predcontrib = TRUE, the sum of the feature contributions does not equal the predicted value.
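For reference, the check in the example below relies on the additivity property of SHAP-style contributions. As a sketch, write $\phi_j(x)$ for the contribution of feature $j$, $\phi_0$ for the expected-value (bias) term, and $p$ for the number of features. These symbols are notation assumed here for illustration, not names used by LightGBM. The raw prediction is then expected to satisfy:

```latex
\hat{y}(x) = \phi_0 + \sum_{j=1}^{p} \phi_j(x)
```

The matrix returned with predcontrib = TRUE has one column per feature plus a final column holding the expected value, so taking rowSums() over all of its columns should reproduce the output of predict() up to floating-point error.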
Reproducible example
library(lightgbm)
x <- matrix(data = sample(rnorm(100L), size = 100L), ncol = 1L)
y <- 2L * x + runif(nrow(x), 0L, 0.1)
lgb_params_1 <- list(
objective = "regression"
, linear_tree = FALSE
, verbose = -1L
, metric = "mse"
, seed = 0L
, num_leaves = 2L
, bagging_freq = 1L
, subsample = 1.0
)
dtrain <- lgb.Dataset(data = x, label = y)
bst_lin_1 <- lgb.train(data = dtrain, nrounds = 10L, params = lgb_params_1, valids = list("train" = dtrain))
lgb_params_2 <- lgb_params_1
lgb_params_2$linear_tree <- TRUE # this is the only parameter that has changed
dtrain <- lgb.Dataset(data = x, label = y)
bst_lin_2 <- lgb.train(data = dtrain, nrounds = 10L, params = lgb_params_2, valids = list("train" = dtrain))
pred_1 <- predict(bst_lin_1, x, predcontrib = FALSE) # predict on model 1
pred_contrib_1 <- rowSums(predict(bst_lin_1, x, predcontrib = TRUE)) # predict on model 1 with feature contribs
diff_1 <- pred_1 - pred_contrib_1
sd(diff_1) # very close to zero as expected
pred_2 <- predict(bst_lin_2, x, predcontrib = FALSE) # predict on model 2
pred_contrib_2 <- rowSums(predict(bst_lin_2, x, predcontrib = TRUE)) # predict on model 2 with feature contribs
diff_2 <- pred_2 - pred_contrib_2
sd(diff_2) # not zero - rowSums do not total to predicted values
Environment info
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
LightGBM installed from source (following R installation instructions), version 3.1.1.99
Additional Comments
This is possibly the same as issue #3998.
I think that the lgb.model.dt.tree function might need some work as well, since you might not want to show the constant leaf values in the output if linear_tree = TRUE.
Many thanks