Add a function to plot tree with a case #4784
Comments
@782365867 Thanks for using LightGBM. Could you please elaborate more on your bad cases? Or could you please provide an example to illustrate this? |
I'm using LightGBM for a regression problem.
For example, the features are x1, x2, x3, … and the label is y.
One case in the test data had a high MAPE, so I wanted to find out why.
In the first tree, this bad case got a prediction of 123.918, much higher than the real value of 40.
I found that one feature's value (cspu_id_mean_code) was too high, so I replaced it with the value of cspu_id_median_code, and the tree looked like this:
It had less error than before, so I substituted the feature cspu_id_median_code for cspu_id_mean_code and trained again.
The function proposed in issue #4784 would be a tool for finding and investigating such causes more easily.
|
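The workflow described above starts by spotting the test case with the worst percentage error. A minimal sketch of that step in plain Python follows. Only the first pair (prediction 123.918, real value 40) comes from the thread; the other values and the helper name `absolute_percentage_errors` are hypothetical and only there for illustration.

```python
def absolute_percentage_errors(y_true, y_pred):
    """Per-row absolute percentage error: |prediction - truth| / |truth|."""
    return [abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)]


# First pair is the bad case from the thread; the rest are made-up values.
y_true = [40.0, 55.0, 60.0]
y_pred = [123.918, 50.0, 63.0]

errors = absolute_percentage_errors(y_true, y_pred)

# Index of the row with the largest error, i.e. the case to inspect first.
worst_idx = max(range(len(errors)), key=errors.__getitem__)
print(worst_idx, errors[worst_idx])
```

The index found this way is the row one would then trace through the tree.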
Is it convenient to share your model and data so that we can check if there's any bug in this case? As for lighting the nodes and path of the bad case, I think it is hard to clearly define what a |
@782365867 I think LightGBM might already have some of the features you need to explore the tree structure in the way you're talking about. predict(X, pred_leaf=True) will return an array of shape (rows_in_X, num_iterations), where a value [i, j] means "the index of the leaf node that observation i falls into in tree j" (all 0-based).
Here's a small example in Python that shows how to use this feature and LightGBM's data frame representation to answer the question "given a single observation, figure out the path that observation matches in a specific tree".

    import lightgbm as lgb
    # note: load_boston was removed in scikit-learn 1.2; any regression dataset works here
    from sklearn.datasets import load_boston

    X, y = load_boston(return_X_y=True)
    reg = lgb.LGBMRegressor()
    reg.fit(X, y)

    # pick the row to inspect and keep it 2-D (1 row, n_features columns)
    case_idx = 123
    data_for_case = X[case_idx, :].reshape(1, X.shape[1])

    # for each iteration, which leaf node did this case fall in?
    leaf_preds = reg.predict(data_for_case, pred_leaf=True)
    iteration_idx = 10
    leaf_node_idx = leaf_preds[0, iteration_idx]

    # in the tree with index 10, which splits did this observation match?
    tree_df = reg.booster_.trees_to_dataframe()
    nodes_to_keep = []
    node_index_str = f"{iteration_idx}-L{leaf_node_idx}"
    # walk from the leaf up to the root by following parent_index
    while True:
        nodes_to_keep.append(node_index_str)
        parent_node = tree_df[tree_df["node_index"] == node_index_str]["parent_index"].values[0]
        if parent_node is None:
            break
        node_index_str = parent_node

    print(
        tree_df[
            (tree_df["node_index"].isin(nodes_to_keep))
            & (tree_df["tree_index"] == iteration_idx)
        ][["node_depth", "node_index", "split_feature", "threshold", "decision_type", "missing_direction"]]
    )

This example displays something like the following.

         node_depth node_index split_feature  threshold decision_type missing_direction
    380           1      10-S0     Column_12     5.1550            <=              left
    381           2      10-S2      Column_7     3.7972            <=              left
    383           3      10-S5      Column_5     6.9470            <=              left
    384           4      10-L3          None        NaN          None              None

Just noting it in case it might help you right now, since any possible feature coming out of this discussion won't be released for a little while. |
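The leaf-to-root walk in the example can be pulled into a small reusable helper. The sketch below is one possible way to do that, not part of LightGBM: the function name `path_to_root` is hypothetical, and the parent links are hand-built values shaped like LightGBM node indices (such as `10-S0` and `10-L3`), so the block runs without a trained model. It assumes the root's parent is `None`, as the loop in the example does.

```python
from typing import Dict, List, Optional


def path_to_root(parent_of: Dict[str, Optional[str]], leaf: str) -> List[str]:
    """Return the node indices on the path from the root down to `leaf`."""
    path: List[str] = []
    node: Optional[str] = leaf
    # Climb parent links until the root, whose parent is None.
    while node is not None:
        path.append(node)
        node = parent_of[node]
    # The walk collected leaf -> root; reverse it to read root -> leaf.
    return path[::-1]


# Hand-built, hypothetical parent links for a four-node path in tree 10.
# With a real model, a similar mapping could be built from the data frame:
#   parent_of = dict(zip(tree_df["node_index"], tree_df["parent_index"]))
parent_of = {
    "10-S0": None,
    "10-S2": "10-S0",
    "10-S5": "10-S2",
    "10-L3": "10-S5",
}

path = path_to_root(parent_of, "10-L3")
print(path)
```

Building the mapping once avoids filtering the whole data frame on every step of the walk.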
Thank you. It may be useful for my work. I will try it this weekend. ^_^
|
I think @782365867 wanted something like this: |
Hey nice @jmoralez ! I definitely think that's what @782365867 was looking for. |
@jmoralez do you think the code you wrote to create that plot should be added to |
This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM! |
Sure. I have to handle some details because I deleted some of the constraints stuff to make it work. I'll reopen this when I have a working implementation. |
@jameslamb I was trying this out but I'm not sure how to integrate it with the constraints coloring. Do you think making the edges bold on the path the sample took would be a good alternative? |
@jmoralez WDYT about using blue color for arrows and edges? |
Thanks @StrikerRUS! I'll try that and open a PR. |
…`plot_tree` and `create_tree_digraph` (fixes microsoft#4784) (microsoft#5119)
* highlight path in plot_tree
* lint
* rename x to example_case. support categorical features. add test
* lint
* check for exactly one row. test empty example_case
* Apply suggestions from code review
Co-authored-by: Nikita Titov <[email protected]>
* handle missing values in numeric splits
* remove literal. add categorical split function
* make categorical feature more important. lint
* add enum. update categorical split. apply suggestions
* update numeric split decision
* lint
* Update python-package/lightgbm/plotting.py
Co-authored-by: Nikita Titov <[email protected]>
Co-authored-by: Nikita Titov <[email protected]>
Yes, it is. I finished this function myself last year, hahaha. |
This issue has been automatically locked since there has not been any recent activity since it was closed. |
Summary
When we use LightGBM to predict, we may find some bad cases. I would like a function that plots the tree with the nodes and path of a given case highlighted (perhaps colored orange).
Motivation
It is useful for finding the cause of a bad case.
Description
The plot_tree function can be used. Just add one parameter, 'test_data', with a default of None. If the parameter is None, the function plots the tree as usual. If the parameter is an instance of DataFrame or np.array, the function plots the tree and highlights the nodes and path of the bad case.
References