plot tree with high cardinality feature #5687
Comments
Hi @moziada, thanks for using LightGBM. I think a possible solution would be to check the number of categories in the split and, if there are too many, collapse them and show them in the tooltip instead, which would look like this:
diff --git a/python-package/lightgbm/plotting.py b/python-package/lightgbm/plotting.py
index d2a57209..87c287c0 100644
--- a/python-package/lightgbm/plotting.py
+++ b/python-package/lightgbm/plotting.py
@@ -493,7 +493,13 @@ def _to_graphviz(
direction = _determine_direction_for_numeric_split(
example_case[split_feature], root['threshold'], root['missing_type'], root['default_left']
)
- label += f"<B>{_float2str(root['threshold'], precision)}</B>"
+ if is_categorical_split and len(root['threshold'].split('||')) > 5:
+ threshold = '...'
+ tooltip = root['threshold']
+ else:
+ threshold = root['threshold']
+ tooltip = None
+ label += f"<B>{_float2str(threshold, precision)}</B>"
for info in ['split_gain', 'internal_value', 'internal_weight', "internal_count", "data_percentage"]:
if info in show_info:
output = info.split('_')[-1]
@@ -525,7 +531,8 @@ def _to_graphviz(
if "data_percentage" in show_info:
label += f"<br/>{_float2str(root['leaf_count'] / total_count * 100, 2)}% of data"
label = f"<{label}>"
- graph.node(name, label=label, shape=shape, style=style, fillcolor=fillcolor, color=color, penwidth=penwidth)
+ tooltip = None
+ graph.node(name, label=label, shape=shape, style=style, fillcolor=fillcolor, color=color, penwidth=penwidth, tooltip=tooltip)
if parent is not None:
graph.edge(parent, name, decision, color=color, penwidth=penwidth)
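The collapsing rule in the diff above can be tried in isolation. The following is a minimal sketch, not the code that ended up in LightGBM: the helper name `collapse_categorical_threshold` and its `max_shown` parameter are hypothetical, and it assumes the categorical threshold arrives as a single string with categories joined by `||`, as the diff suggests.

```python
from typing import Optional, Tuple


def collapse_categorical_threshold(threshold: str, max_shown: int = 5) -> Tuple[str, Optional[str]]:
    """Return (text for the node label, tooltip text or None).

    Hypothetical helper: mirrors the rule in the diff, where a split with
    more than `max_shown` categories is drawn as '...' and the full list
    is moved into the tooltip.
    """
    categories = threshold.split("||")
    if len(categories) > max_shown:
        # Too many categories to draw inside the node: show a placeholder
        # and keep the complete threshold string for the tooltip.
        return "...", threshold
    # Few enough categories: draw them as-is, no tooltip needed.
    return threshold, None


small = collapse_categorical_threshold("0||1||2")
large = collapse_categorical_threshold("||".join(str(i) for i in range(8)))
print(small)
print(large)
```

With this rule, a node's width no longer grows with the number of categories, which is the layout problem the issue describes.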
This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!
I shouldn't have marked this as awaiting response. It's a valid issue and I think it should be fixed. I'm reopening.
This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!
Hey @jameslamb, sorry I did not see your response earlier. I will look into it and try to make a contribution.
After searching, I found that this is a common issue with JupyterLab, and people have opened issues about it (like this issue). I have also tested the tooltip in Jupyter Notebook, and it works fine; no workarounds are needed.
@jmoralez, sorry, I forgot to mention you with my results.
I think this is a good feature. Maybe we can add an argument for the maximum number of thresholds to show and, in the description of that argument, specify that the user may need to use the
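A configurable limit like the one suggested here could look roughly as follows. This is only a sketch under stated assumptions: `shorten_categories`, `dot_node`, and the `max_category_values` argument are hypothetical names chosen for illustration, the threshold is assumed to be a `||`-joined string, and the DOT text is built by hand (assuming values contain no double quotes) so the example runs without Graphviz installed. The `tooltip` node attribute itself is a standard Graphviz attribute.

```python
from typing import Optional


def shorten_categories(threshold: str, max_category_values: int = 10) -> str:
    """Keep at most `max_category_values` categories, then append '...'."""
    values = threshold.split("||")
    if len(values) <= max_category_values:
        return threshold
    return "||".join(values[:max_category_values]) + "||..."


def dot_node(name: str, label: str, tooltip: Optional[str] = None) -> str:
    """Build one DOT node statement, adding a tooltip only when one is given."""
    attrs = [f'label="{label}"']
    if tooltip is not None:
        attrs.append(f'tooltip="{tooltip}"')
    return f'{name} [{", ".join(attrs)}]'


full = "||".join(str(i) for i in range(12))
label = shorten_categories(full, max_category_values=3)
# Attach the full list as a tooltip only when the label was actually shortened.
statement = dot_node("split0", label, tooltip=full if label != full else None)
print(statement)
```

Showing the first few categories rather than a bare `...` keeps some information visible in the node while still bounding its width.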
Sure, I am working on a PR.
I am having difficulty testing the changes I have made. Is there a systematic way to make changes and install the new version into the current Python environment?
Since you're only touching Python code, compile the library just one time by running the following from the root of the repo.
rm -r ./build
mkdir ./build
cd ./build
cmake ..
make -j2
Then every time you change the Python package, run this from the root of the repo.
cd ./python-package
python setup.py install --precompile
Then every time you want to run the tests, point pytest at the plotting test file.
pytest tests/python_package_test/test_plotting.py
I have made the changes, but I noticed that all
That page is automatically generated from the docstrings in
You can also render the docs locally following this.
Thanks, I have managed to render it correctly and it seems fine to me. I have opened a pull request here: #5818
When I try to plot a tree that splits on a feature with high cardinality, the overall diagram gets messed up no matter how I adjust the width and height.
There should be a way to avoid writing all the values used in the split inside the nodes.