bug: histogram(nbins=N) off by one error #9687
Comments
Yeah, looks like a bug! Great example, really makes the problem obvious :)
/take
Took a quick look at the code, it'll always put the maximum value of the column in an extra bin based on the current design I'm not sure about the Also, the |
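To make the extra-bin behaviour described above concrete, here is a minimal pure-Python sketch. It is not the ibis implementation: the function names and the integer bin formula are assumptions chosen for illustration. It only shows how a floor-based index over `[min, max]` sends the maximum value to index `nbins`, and how clamping folds it back into the last bin.

```python
from collections import Counter


def naive_bin_index(value, lo, hi, nbins):
    """Hypothetical bin formula: floor((value - lo) / bin_width).

    Written with integer arithmetic so the result is exact:
    bin_width = (hi - lo) / nbins, hence the index is
    (value - lo) * nbins // (hi - lo).
    """
    return (value - lo) * nbins // (hi - lo)


def clamped_bin_index(value, lo, hi, nbins):
    """Same formula, but the maximum is folded into the last bin."""
    return min(naive_bin_index(value, lo, hi, nbins), nbins - 1)


values = range(1000)
lo, hi = min(values), max(values)

naive_counts = Counter(naive_bin_index(v, lo, hi, 10) for v in values)
clamped_counts = Counter(clamped_bin_index(v, lo, hi, 10) for v in values)

print(sorted(naive_counts.items()))    # 11 distinct bin indices
print(sorted(clamped_counts.items()))  # 10 distinct bin indices
```

With the naive formula only the value equal to the maximum reaches index `nbins`, which matches the symptom reported in this issue.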
Yes, I agree it would be lovely to have that functionality to specify the lower and upper range of the bins, along with a few other useful features like non-uniform bin-widths and weights like in |
## Description of changes

```
import ibis

ibis.options.interactive = True
ibis.options.repr.interactive.max_rows = 20

t = ibis.range(1000).unnest().name("index").as_table()
t.select(hist=t["index"].histogram(nbins=10)).value_counts()
```

```
┏━━━━━━━┳━━━━━━━━━━━━┓
┃ hist  ┃ hist_count ┃
┡━━━━━━━╇━━━━━━━━━━━━┩
│ int64 │ int64      │
├───────┼────────────┤
│     5 │        100 │
│     9 │        100 │
│     0 │        100 │
│     3 │        100 │
│     6 │        100 │
│     2 │        100 │
│     7 │        100 │
│     8 │        100 │
│     1 │        100 │
│     4 │        100 │
└───────┴────────────┘
```

## Issues closed

* Resolves #9687.

I had to make a slight change to ``histogram`` to account for an edge case that was tested for Impala. It would fail if ``nbins`` was not passed, which is a rather niche use case because ``np.histogram`` for example requires the number of bins to be passed either explicitly or implicitly.

I also found a slight quirk with the current design when fixing this: if a ``base`` is passed that is not the minimum value, values smaller than the base are out of bounds and were assigned a negative bin index. It now clips those out-of-bound values to the bin index of -1 to group them together, rather than potentially having bin indices of -1 and -2 for example, so this now aligns with how ``np.histogram`` assigns a bin index of 0 for out-of-bound values.
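The clipping of values below ``base`` can be sketched in plain Python as follows. This is an illustrative assumption, not the code from the pull request: `bin_index_with_base` and its integer formula are hypothetical, and only the behaviour described above (everything below the base grouped under -1, the maximum kept in the last bin) is taken from the description.

```python
def bin_index_with_base(value, base, hi, nbins):
    """Hypothetical bin index for bins spanning [base, hi].

    The raw index is floor((value - base) / bin_width), computed exactly
    with integers. Values below `base` give a negative raw index, which
    is clipped to -1 so they all share one bin; the maximum is clamped
    into the last regular bin.
    """
    raw = (value - base) * nbins // (hi - base)
    return max(-1, min(raw, nbins - 1))


# Values 0..999 with a base of 200: everything below 200 is out of bounds.
indices = {v: bin_index_with_base(v, base=200, hi=999, nbins=10) for v in range(1000)}

print(indices[0], indices[150], indices[200], indices[999])
```

Without the `max(-1, ...)` clip, the value 0 would get a raw index of -3 and the value 150 a raw index of -1, so out-of-bound values would be spread across several negative bins.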
What happened?
There is one value in bin 10 and 99 values in bin 9.
What version of ibis are you using?
main
What backend(s) are you using, if any?
duckdb
Relevant log output
No response