[python] don't format string values with precision #1465
Sure, here is a minimal example:

```python
import pandas as pd
import numpy as np
import lightgbm as lgb

np.random.seed(0)
# One categorical column ('a'/'b') and one integer column.
x = pd.concat([
    pd.Series(np.random.choice(['a', 'b'], size=100), dtype='category'),
    pd.Series(np.random.randint(0, 2, size=100)),
], axis=1)
y = np.random.randint(0, 2, size=100)
ds = lgb.Dataset(x, y)
params = {'min_child_samples': 2}
gbm = lgb.train(params, ds)
# Setting precision makes float2str format every threshold, including string ones.
lgb.create_tree_digraph(gbm, precision=2)
```

I'm not sure of the best way to modify the test. The cheapest thing I could do is convert all 30 columns to categorical, which guarantees that the test only passes with the modification, but I feel it's not good for the test overall.
If I only convert a couple of columns to categorical, the model will just try to ignore those and split on the other columns.
Thanks for the example! The error is caused by the scientific notation, e.g. I think that
Now it works correctly with scientific notation and notifies the user that
I thought the error was caused by categorical values? The split threshold would be on a set of categories, i.e.:
LightGBM cannot handle strings at all (see #789).
I'm not talking about the categorical values or codes of the dataset itself. I'm talking about the threshold that will be shown in the plot. It's a string if the split is based on set membership.
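As a minimal sketch of the problem, assume split nodes laid out like entries in LightGBM's `dump_model()` output, where a categorical split stores its threshold as a string of category codes joined by `||`. The two node dicts below are hypothetical, not real model output. The pre-patch `float2str` handles the numeric threshold but fails on the string one:

```python
def float2str(value, precision=None):
    # Pre-patch version: applies the fixed-point "f" format whenever precision is set.
    return "{0:.{1}f}".format(value, precision) if precision is not None else str(value)


# Hypothetical split nodes; the categorical one carries a string threshold.
numerical_node = {"decision_type": "<=", "threshold": 0.5000000000000001}
categorical_node = {"decision_type": "==", "threshold": "0||1"}

print(float2str(numerical_node["threshold"], precision=2))  # 0.50

try:
    float2str(categorical_node["threshold"], precision=2)
except ValueError as err:
    print(err)  # Unknown format code 'f' for object of type 'str'
```

Without `precision` the string passes through `str()` untouched, which is why the failure only shows up when `precision` is set.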
The threshold cannot be a string either.
Idk man, when I use categorical data,
Please provide the example and the resulting graph with the strings you're talking about.
Oh, I see. I thought you had a threshold value like this one
python-package/lightgbm/plotting.py
```diff
@@ -268,7 +268,8 @@ def _to_graphviz(tree_info, show_info, feature_names, precision=None,
         raise ImportError('You must install graphviz to plot tree.')

     def float2str(value, precision=None):
-        return "{0:.{1}f}".format(value, precision) if precision is not None else str(value)
+        return "{0:.{1}f}".format(value, precision) \
+            if (precision is not None) and (not isinstance(value, str)) else str(value)
```
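A quick standalone check of the patched `float2str` from the diff (the sample values are made up; `"0||1"` stands in for a categorical threshold string):

```python
def float2str(value, precision=None):
    # Patched version: fixed-point formatting only for non-string values.
    return "{0:.{1}f}".format(value, precision) \
        if (precision is not None) and (not isinstance(value, str)) else str(value)


print(float2str(0.123456, precision=2))  # 0.12
print(float2str(0.123456))               # 0.123456
print(float2str("0||1", precision=2))    # 0||1
```

Numeric thresholds still respect `precision`, while string thresholds fall through to `str()` instead of raising.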
Please remove the brackets and use `string_type` from the `compat.py` module instead of `str` in `isinstance()`.
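Presumably `string_type` matters for Python 2 support: there, `unicode` values are not instances of `str`, so `isinstance(value, str)` would miss them. The sketch below assumes `compat.py` defines `string_type` as `basestring` on Python 2 and `str` on Python 3; the definition here is a stand-in, not a copy of the real module. The brackets are dropped as requested.

```python
import sys

# Stand-in for lightgbm's compat.string_type (assumed definition):
# basestring covers both str and unicode on Python 2; str suffices on Python 3.
if sys.version_info[0] >= 3:
    string_type = str
else:
    string_type = basestring  # noqa: F821 (only defined on Python 2)


def float2str(value, precision=None):
    # Format numbers with fixed precision; pass strings (e.g. categorical thresholds) through.
    return "{0:.{1}f}".format(value, precision) \
        if precision is not None and not isinstance(value, string_type) else str(value)


print(float2str(1.5, precision=3))      # 1.500
print(float2str(u"1||2", precision=3))  # 1||2
```

`not` binds tighter than `and`, so removing the parentheses leaves the condition unchanged.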
OK, thank you! No need to do this.
Please remove these brackets too and I'll merge.
I think the Travis error is unrelated?
Thank you very much for your contribution and patience!
Yeah, I've restarted the test.
I was running into this:

```
ValueError: Unknown format code 'f' for object of type 'str'
```

for string values when setting `precision` in `create_tree_digraph`.