Fixed error in plot_histogram when number_to_keep is smaller that the number of keys#7481
Fixed error in plot_histogram when number_to_keep is smaller that the number of keys#7481mergify[bot] merged 26 commits intoQiskit:mainfrom
Conversation
Pull Request Test Coverage Report for Build 2454345820
💛 - Coveralls |
releasenotes/notes/fix_plot_histogram_number-a0a4a023dfad3c70.yaml
Outdated
Show resolved
Hide resolved
releasenotes/notes/fix_plot_histogram_number-a0a4a023dfad3c70.yaml
Outdated
Show resolved
Hide resolved
|
See also implementation in #6698. That one (mine) stalled out b/c lack of review, and failing docs build on CI (master branch) at the time. Your implementation is cleaner, but that one includes a test against a reference image to check the bar alignment. The |
|
Several months ago, something with CI & the docs build wasn't working, and it was making the CI show as failing. I'm fairly sure the issue was also present on the |
|
@drewrisinger Thank you for your comments. Could you check again. I added a test for the multiple datasets issue. |
drewrisinger
left a comment
There was a problem hiding this comment.
Diff LGTM, thanks for the hard work :)
Passes CI tests ✔️
| with BytesIO() as img_buffer_ref: | ||
| figure_ref.savefig(img_buffer_ref, format="png") | ||
| img_buffer_ref.seek(0) |
There was a problem hiding this comment.
Someone from the qiskit team will need to check how saving reference images is supposed to work (is it supposed to be saved to a file?). I didn't figure that out on my previous pass at roughly this issue.
Looks good otherwise, took me a second to figure out what this test is actually doing but looks good now.
There was a problem hiding this comment.
Yeah.. that.
We use snapshot testing for this kind of things (introduced in #4544). However, it seems like test/python/visualization/test_plot_histogram.py is not using that method and fixing that is beyond this PR. Do you think it make sense to have this on hold until that is fixed?
| with BytesIO() as img_buffer: | ||
| figure_truncated.savefig(img_buffer, format="png") | ||
| img_buffer.seek(0) | ||
| self.assertImagesAreEqual(Image.open(img_buffer_ref), Image.open(img_buffer), 0.2) |
There was a problem hiding this comment.
I'm not sure about this 0.2 threshold. I don't have a good reference for how big this should be.
| all_inds = [] | ||
| # if multiple executions, we consider number_to_keep for each execution | ||
| # and this may result in more than number_to_keep slots | ||
| multiple_exec_keys_dict = OrderedDict() |
There was a problem hiding this comment.
You're using an OrderedDict here. I'm not quite sure if it's relevant, but have you checked that this works if the keys are in different orders in different dictionaries?
There was a problem hiding this comment.
@drewrisinger you're right. No need for OrderedDict here.
There was a problem hiding this comment.
OrderedDict is essentially the default in python 3.7+ anyways.
|
@jakelishman Could you review this PR as well? Thank you! |
| all_inds = [] | ||
| # if multiple executions, we consider number_to_keep for each execution | ||
| # and this may result in more than number_to_keep slots | ||
| multiple_exec_keys_dict = defaultdict() |
There was a problem hiding this comment.
I do not fully get why this one needs to be a defaultdict. Isn't the regular dict enough in this context?
| labels_dict[key] = 1 | ||
| values.append(0) |
There was a problem hiding this comment.
this block is the same as True branch of if number_to_keep is None: I think it is equivalent to write if number_to_keep is None or key in multiple_exec_keys_dict: in line 294.
| if HAS_MATPLOTLIB: | ||
| import matplotlib as mpl |
There was a problem hiding this comment.
consider using lazy testers for this (see https://github.com/Qiskit/qiskit-terra/blob/main/CONTRIBUTING.md#checking-for-optionals), introduced in https://github.com/Qiskit/qiskit-terra/pull/7525/files
| with BytesIO() as img_buffer_ref: | ||
| figure_ref.savefig(img_buffer_ref, format="png") | ||
| img_buffer_ref.seek(0) |
There was a problem hiding this comment.
Yeah.. that.
We use snapshot testing for this kind of things (introduced in #4544). However, it seems like test/python/visualization/test_plot_histogram.py is not using that method and fixing that is beyond this PR. Do you think it make sense to have this on hold until that is fixed?
jakelishman
left a comment
There was a problem hiding this comment.
I'm sorry I had disappeared for so long - I've been busy writing my thesis at the same time, and my notifications got all overwhelmed.
I pushed up a couple of commits to fix up some minor things (documentation, release note, etc), but I have a larger problem. I tried just to do some simple two-dataset plots, and it doesn't show the right thing:
from qiskit.visualization import plot_histogram
data = [
{"00": 3, "01": 5, "10": 6, "11": 12},
{"00": 5, "01": 7, "10": 6, "11": 12},
]
plot_histogram(data, number_to_keep=2)produces this plot:

but I'd expect it to show the bars 10 and 11 in the first data set, and 01 and 11 in the second data set.
The handling in _plot_histogram_data is rather confusing to me - it loops through all the data multiple times, and applies most_common twice as well. The multiple_exec_keys_dict is populated only with the value "1", but these are never read (perhaps it should just be a set?), but even then, I don't really understand its purpose. I would really expect that the only processing that needs to be done should happen at the very top of plot_histogram, such as:
def _keep_largest_items(execution, number_to_keep):
"""Keep only the largest values in a dictionary, and sum the rest into a new key 'rest'."""
sorted_counts = sorted(execution.items(), key=lambda p: p[1])
rest = sum(count for key, count in sorted_counts[:-number_to_keep])
return dict(sorted_counts[-number_to_keep:], rest=rest)
def _unify_labels(data):
"""Make all dictionaries in data have the same set of keys, using 0 for missing values."""
data = tuple(data)
all_labels = set().union(*(execution.keys() for execution in data))
base = {label: 0 for label in all_labels}
out = []
for execution in data:
new_execution = base.copy()
new_execution.update(execution)
out.append(new_execution)
return out
def plot_histogram(...):
# ... normal setup ...
if isinstance(data, dict):
data = [data]
if number_to_keep is not None:
data = _unify_labels(_keep_largest_items(execution, number_to_keep) for execution in data)
# continue with normal processingThat's a much more logical approach - number_to_keep modifies the data we should consider when performing all other metrics. In this model, all the additional handling of labels in _plot_histogram_data would be totally unnecessary - it wouldn't need to take number_to_keep as an argument any more, nor return a modified set of labels.
I didn't test that code so it probably needs tweaking, but I think it's logically much simpler, and should avoid the bugs in the present version.
It would be good to have testing against some reference images for this function - then a reviewer would be able to see that the outcomes look correct just by looking at the test cases, without running their own examples (which is what we look for in a test suite).
|
@jakelishman Thank you for the review! |
|
@1ucian0 Please check my implementation of snapshot testing/ |
1ucian0
left a comment
There was a problem hiding this comment.
LGTM! Thanks @iuliazidaru
Any further comment @jakelishman ?
jakelishman
left a comment
There was a problem hiding this comment.
Sorry again...
I'm not 100% convinced that the output for multiple data sets with number_to_keep is non-zero is the least surprising/confusing thing we could do (it seems to suggest that certain keys in data sets are 0 when in reality they're just accounted for in the "rest" bar), but I don't really have a suggestion for how to do it better, and I've let this PR drag on far too long. If somebody has a better way, they can open a new issue.
Thanks for sticking with us, despite my terminal slowness @iuliazidaru.
Fixes #7461
Fixes #6692
Summary
Fix issue #7461 plot_histogram fails when given number_to_keep parameter
Fixes also related issue #6692, as it affects the same code.
Details and comments
When number_to_keep parameter is used and has a value less than the number of keys, the plot didn't work.
This PR considers also the case of displaying multiple result lists.