Add a function to plot tree with a case #4784

Closed
782365867 opened this issue Nov 9, 2021 · 15 comments · Fixed by #5119
Comments

@782365867

Summary

When we use LightGBM to predict, we may find some bad cases. I would like a function that plots the tree with the nodes and path of a given case highlighted (for example, colored orange).

Motivation

It would be useful for finding the cause of a bad case.

Description

The existing plot_tree function could be extended. Just add one parameter, ‘test_data’, defaulting to None. If the parameter is None, the function plots the tree as usual. If it is an instance of DataFrame or np.array, the function plots the tree and highlights the nodes and path of the bad case.

References

@shiyu1994
Collaborator

@782365867 Thanks for using LightGBM. Could you please elaborate more on your bad cases? Or could you please provide an example to illustrate this?

@782365867
Author

782365867 commented Nov 11, 2021 via email

@shiyu1994
Collaborator

@782365867

In the first tree: this is the badcase, the predict result was 123.918, higher than the real value 40;

Is it convenient to share your model and data so that we can check if there's any bug in this case?

As for lighting the nodes and path of the bad case, I think it is hard to clearly define what a bad case is.

@jameslamb
Collaborator

find and mine the reason more easily;

@782365867 I think LightGBM might already have some of the features you need to explore the tree structure in the way you're talking about. predict(X, pred_leaf=True) will return an array of shape (rows_in_X, num_iterations), where the value at [i, j] means "the index of the leaf node that observation i falls into in tree j" (all 0-based).

Here's a small example in Python that shows how to use this feature and LightGBM's data frame representation to answer the question "given a single observation, figure out the path that observation matches in a specific tree".

import lightgbm as lgb
# NOTE: load_boston was removed in scikit-learn 1.2; on newer versions,
# substitute any regression dataset that returns a NumPy feature matrix.
from sklearn.datasets import load_boston

X, y = load_boston(return_X_y=True)
reg = lgb.LGBMRegressor()
reg.fit(X, y)

# pick the row to inspect and reshape it into a single-row 2-D array
case_idx = 123
data_for_case = X[case_idx, :].reshape(1, -1)

# for each iteration, which leaf node did this case fall into?
leaf_preds = reg.predict(data_for_case, pred_leaf=True)

iteration_idx = 10
leaf_node_idx = leaf_preds[0, iteration_idx]

# in the tree with index 10 (0-based), which splits did this observation match?
tree_df = reg.booster_.trees_to_dataframe()
nodes_to_keep = []

# walk from the leaf up to the root by following parent_index
node_index_str = f"{iteration_idx}-L{leaf_node_idx}"
while True:
    nodes_to_keep.append(node_index_str)
    parent_node = tree_df[tree_df["node_index"] == node_index_str]["parent_index"].values[0]
    if parent_node is None:
        # the root node has no parent
        break
    else:
        node_index_str = parent_node

columns_to_show = ["node_depth", "node_index", "split_feature", "threshold", "decision_type", "missing_direction"]
on_path = tree_df["node_index"].isin(nodes_to_keep) & (tree_df["tree_index"] == iteration_idx)
print(tree_df[on_path][columns_to_show])

This example displays something like the following.

     node_depth node_index split_feature  threshold decision_type missing_direction
380           1      10-S0     Column_12     5.1550            <=              left
381           2      10-S2      Column_7     3.7972            <=              left
383           3      10-S5      Column_5     6.9470            <=              left
384           4      10-L3          None        NaN          None              None

Just noting it in case it might help you right now, since any possible feature coming out of this discussion won't be released for a little while.
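
The example above walks from the leaf up to the root through parent_index. The same path can also be traced top-down by evaluating each split against the row. Below is a minimal sketch of that idea on a hand-built toy tree. The key names are modeled on the nested dicts found under "tree_structure" in the output of Booster.dump_model(); the tree values and the helper name trace_path are hypothetical, and only numeric "<=" splits without missing values are handled.

```python
# Hand-built toy tree. Key names are modeled on the nested "tree_structure"
# dicts from Booster.dump_model(); all values here are made up for illustration.
toy_tree = {
    "split_index": 0, "split_feature": 2, "threshold": 5.0, "decision_type": "<=",
    "left_child": {
        "split_index": 1, "split_feature": 0, "threshold": 1.5, "decision_type": "<=",
        "left_child": {"leaf_index": 0, "leaf_value": 10.0},
        "right_child": {"leaf_index": 1, "leaf_value": 20.0},
    },
    "right_child": {"leaf_index": 2, "leaf_value": 30.0},
}


def trace_path(node, row):
    """Return the labels of the nodes a row visits, root first.

    Hypothetical helper: assumes numeric "<=" splits and no missing values.
    """
    path = []
    while "leaf_index" not in node:
        path.append(f"S{node['split_index']}")
        go_left = row[node["split_feature"]] <= node["threshold"]
        node = node["left_child"] if go_left else node["right_child"]
    path.append(f"L{node['leaf_index']}")
    return path


print(trace_path(toy_tree, [2.0, 0.0, 4.0]))  # ['S0', 'S1', 'L1']
```

The returned labels use the same S/L naming as the node_index column, minus the tree prefix, so they could be used to pick which nodes to highlight in a plot.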

@782365867
Author

782365867 commented Nov 12, 2021 via email

@jmoralez
Collaborator

I think @782365867 wanted something like this:
image
where x is a sample and the path that sample takes through the tree is highlighted. Is that correct @782365867?

@jameslamb
Collaborator

Hey nice @jmoralez ! I definitely think that's what @782365867 was looking for.

@jameslamb
Collaborator

@jmoralez do you think the code you wrote to create that plot should be added to lightgbm? Seems like it was really useful for model inspection like in #3835 (comment)

@no-response

no-response bot commented Jan 4, 2022

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!

@no-response no-response bot closed this as completed Jan 4, 2022
@jmoralez
Collaborator

jmoralez commented Jan 4, 2022

@jmoralez do you think the code you wrote to create that plot should be added to lightgbm?

Sure. I have to handle some details because I deleted some of the constraints stuff to make it work. I'll reopen this when I have a working implementation.

@jmoralez
Collaborator

jmoralez commented Apr 1, 2022

@jameslamb I was trying this out but I'm not sure on how to integrate it with the constraints coloring. Do you think making the edges bold on the path the sample took would be a good alternative?

image

@StrikerRUS
Collaborator

@jmoralez WDYT about using blue color for arrows and edges?

image

@jmoralez
Collaborator

jmoralez commented Apr 1, 2022

Thanks @StrikerRUS! I'll try that and open a PR.

@jmoralez jmoralez reopened this Apr 1, 2022
rserran pushed a commit to rserran/LightGBM that referenced this issue Aug 10, 2022
…`plot_tree` and `create_tree_digraph` (fixes microsoft#4784) (microsoft#5119)

* highlight path in plot_tree

* lint

* rename x to example_case. support categorical features. add test

* lint

* check for exactly one row. test empty example_case

* Apply suggestions from code review

Co-authored-by: Nikita Titov <[email protected]>

* handle missing values in numeric splits

* remove literal. add categorical split function

* make categorical feature more important. lint

* add enum. update categorical split. apply suggestions

* update numeric split decision

* lint

* Update python-package/lightgbm/plotting.py

Co-authored-by: Nikita Titov <[email protected]>

Co-authored-by: Nikita Titov <[email protected]>
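
The commit bullets above mention handling missing values in numeric splits and adding a categorical split function. The following is a rough sketch of what such routing decisions can look like; it is not the code merged in #5119. It assumes the split metadata uses a missing_type of "None", "Zero" or "NaN", a default_left flag, and categorical thresholds written as "||"-separated category codes such as "1||3||5". The function names are hypothetical, and the exact comparison with 0.0 is a simplification.

```python
import math


def numeric_direction(fval, threshold, missing_type, default_left):
    """Pick 'left' or 'right' at a numeric split (sketch, assumed semantics)."""
    if math.isnan(fval) and missing_type != "NaN":
        # assumption: NaN is treated as zero unless the split handles NaN itself
        fval = 0.0
    is_missing = (
        (missing_type == "Zero" and fval == 0.0)
        or (missing_type == "NaN" and math.isnan(fval))
    )
    if is_missing:
        # missing values follow the split's default direction
        return "left" if default_left else "right"
    return "left" if fval <= threshold else "right"


def categorical_direction(fval, threshold):
    """Pick a side at a categorical split with a threshold like '1||3||5'."""
    if math.isnan(fval) or int(fval) < 0:
        # assumption: missing or negative categories go right
        return "right"
    left_categories = {int(code) for code in threshold.split("||")}
    return "left" if int(fval) in left_categories else "right"


print(numeric_direction(float("nan"), 1.0, "NaN", default_left=False))  # right
print(categorical_direction(3.0, "1||3||5"))  # left
```

A path highlighter would call one of these at every internal node, choosing by decision_type, and mark the edge it returns.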
@782365867
Author

I think @782365867 wanted something like this: image where x is a sample and the path that sample takes through the tree is highlighted. Is that correct @782365867?

Yes, it is.

I finished this function myself last year, hahaha.

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed.
To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues
including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 15, 2023