Skip to content

Fixed error in plot_histogram when number_to_keep is smaller that the number of keys#7481

Merged
mergify[bot] merged 26 commits intoQiskit:mainfrom
iuliazidaru:issue7461
Jun 7, 2022
Merged

Fixed error in plot_histogram when number_to_keep is smaller that the number of keys#7481
mergify[bot] merged 26 commits intoQiskit:mainfrom
iuliazidaru:issue7461

Conversation

@iuliazidaru
Copy link
Contributor

@iuliazidaru iuliazidaru commented Jan 5, 2022

Fixes #7461
Fixes #6692

Summary

Fix issue #7461 plot_histogram fails when given number_to_keep parameter
Fixes also related issue #6692, as it affects the same code.

Details and comments

When number_to_keep parameter is used and has a value less than the number of keys, the plot didn't work.
This PR considers also the case of displaying multiple result lists.

@iuliazidaru iuliazidaru changed the title plot_histogram fails when given number_to_keep parameter #7461 plot_histogram fails when given number_to_keep parameter Jan 5, 2022
@coveralls
Copy link

coveralls commented Jan 5, 2022

Pull Request Test Coverage Report for Build 2454345820

  • 18 of 20 (90.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.01%) to 84.415%

Changes Missing Coverage Covered Lines Changed/Added Lines %
qiskit/visualization/counts_visualization.py 18 20 90.0%
Totals Coverage Status
Change from base Build 2454202826: 0.01%
Covered Lines: 54669
Relevant Lines: 64762

💛 - Coveralls

Copy link
Contributor

@drewrisinger drewrisinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Fix spelling errors
  • Clarify release note fixed code example
  • Add test to validate results from _plot_histogram_data()

@drewrisinger
Copy link
Contributor

See also implementation in #6698. That one (mine) stalled out b/c lack of review, and failing docs build on CI (master branch) at the time. Your implementation is cleaner, but that one includes a test against a reference image to check the bar alignment.

The plot_histogram tests are also missing skipUnless(HAS_MATPLOTLIB) (see e.g. https://github.com/Qiskit/qiskit-terra/blob/309292946906843f3c535c0a4465de534cf2ac69/test/python/visualization/test_circuit_drawer.py#L50-L57), should be straightforward to add to the pre-existing test as well as yours.

@iuliazidaru
Copy link
Contributor Author

See also implementation in #6698. T
@drewrisinger What was the problem with the build? I think I have the same issue now and I don't know how to fix it...

@drewrisinger
Copy link
Contributor

What was the problem with the build?

Several months ago, something with CI & the docs build wasn't working, and it was making the CI show as failing. I'm fairly sure the issue was also present on the main/master branch (according to the issues at the time, I don't remember which one). I don't remember anything more than that, it's been a while. The CI log should still be present on that PR if you want to look at it for comparison.

@iuliazidaru
Copy link
Contributor Author

@drewrisinger Thank you for your comments. Could you check again. I added a test for the multiple datasets issue.

drewrisinger
drewrisinger previously approved these changes Jan 14, 2022
Copy link
Contributor

@drewrisinger drewrisinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Diff LGTM, thanks for the hard work :)

Passes CI tests ✔️

Comment on lines +207 to +209
with BytesIO() as img_buffer_ref:
figure_ref.savefig(img_buffer_ref, format="png")
img_buffer_ref.seek(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Someone from the qiskit team will need to check how saving reference images is supposed to work (is it supposed to be saved to a file?). I didn't figure that out on my previous pass at roughly this issue.

Looks good otherwise, took me a second to figure out what this test is actually doing but looks good now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah.. that.

We use snapshot testing for this kind of things (introduced in #4544). However, it seems like test/python/visualization/test_plot_histogram.py is not using that method and fixing that is beyond this PR. Do you think it make sense to have this on hold until that is fixed?

with BytesIO() as img_buffer:
figure_truncated.savefig(img_buffer, format="png")
img_buffer.seek(0)
self.assertImagesAreEqual(Image.open(img_buffer_ref), Image.open(img_buffer), 0.2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this 0.2 threshold. I don't have a good reference for how big this should be.

all_inds = []
# if multiple executions, we consider number_to_keep for each execution
# and this may result in more than number_to_keep slots
multiple_exec_keys_dict = OrderedDict()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're using an OrderedDict here. I'm not quite sure if it's relevant, but have you checked that this works if the keys are in different orders in different dictionaries?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drewrisinger you're right. No need for OrderedDict here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OrderedDict is essentially the default in python 3.7+ anyways.

@iuliazidaru
Copy link
Contributor Author

@jakelishman Could you review this PR as well? Thank you!

@1ucian0 1ucian0 self-assigned this Feb 14, 2022
@1ucian0 1ucian0 changed the title plot_histogram fails when given number_to_keep parameter Fixed error in plot_histogram when number_to_keep is smaller that the number of keys Feb 22, 2022
Copy link
Member

@1ucian0 1ucian0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good in general. Just two potentially blocking comments:

  • The snapshot testing
  • Maybe adding a release note about the fix of #6692 ?

thanks!

all_inds = []
# if multiple executions, we consider number_to_keep for each execution
# and this may result in more than number_to_keep slots
multiple_exec_keys_dict = defaultdict()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not fully get why this one needs to be a defaultdict. Isn't the regular dict enough in this context?

Comment on lines 300 to 301
labels_dict[key] = 1
values.append(0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this block is the same as True branch of if number_to_keep is None: I think it is equivalent to write if number_to_keep is None or key in multiple_exec_keys_dict: in line 294.

Comment on lines 23 to 24
if HAS_MATPLOTLIB:
import matplotlib as mpl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +207 to +209
with BytesIO() as img_buffer_ref:
figure_ref.savefig(img_buffer_ref, format="png")
img_buffer_ref.seek(0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah.. that.

We use snapshot testing for this kind of things (introduced in #4544). However, it seems like test/python/visualization/test_plot_histogram.py is not using that method and fixing that is beyond this PR. Do you think it make sense to have this on hold until that is fixed?

Copy link
Member

@jakelishman jakelishman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry I had disappeared for so long - I've been busy writing my thesis at the same time, and my notifications got all overwhelmed.

I pushed up a couple of commits to fix up some minor things (documentation, release note, etc), but I have a larger problem. I tried just to do some simple two-dataset plots, and it doesn't show the right thing:

from qiskit.visualization import plot_histogram
data = [
    {"00": 3, "01": 5, "10": 6, "11": 12},
    {"00": 5, "01": 7, "10": 6, "11": 12},
]
plot_histogram(data, number_to_keep=2)

produces this plot:
Figure_2
but I'd expect it to show the bars 10 and 11 in the first data set, and 01 and 11 in the second data set.

The handling in _plot_histogram_data is rather confusing to me - it loops through all the data multiple times, and applies most_common twice as well. The multiple_exec_keys_dict is populated only with the value "1", but these are never read (perhaps it should just be a set?), but even then, I don't really understand its purpose. I would really expect that the only processing that needs to be done should happen at the very top of plot_histogram, such as:

def _keep_largest_items(execution, number_to_keep):
    """Keep only the largest values in a dictionary, and sum the rest into a new key 'rest'."""
    sorted_counts = sorted(execution.items(), key=lambda p: p[1])
    rest = sum(count for key, count in sorted_counts[:-number_to_keep])
    return dict(sorted_counts[-number_to_keep:], rest=rest)

def _unify_labels(data):
    """Make all dictionaries in data have the same set of keys, using 0 for missing values."""
    data = tuple(data)
    all_labels = set().union(*(execution.keys() for execution in data))
    base = {label: 0 for label in all_labels}
    out = []
    for execution in data:
        new_execution = base.copy()
        new_execution.update(execution)
        out.append(new_execution)
    return out

def plot_histogram(...):
    # ... normal setup ...
    if isinstance(data, dict):
        data = [data]
    if number_to_keep is not None:
        data = _unify_labels(_keep_largest_items(execution, number_to_keep) for execution in data)

    # continue with normal processing

That's a much more logical approach - number_to_keep modifies the data we should consider when performing all other metrics. In this model, all the additional handling of labels in _plot_histogram_data would be totally unnecessary - it wouldn't need to take number_to_keep as an argument any more, nor return a modified set of labels.

I didn't test that code so it probably needs tweaking, but I think it's logically much simpler, and should avoid the bugs in the present version.

It would be good to have testing against some reference images for this function - then a reviewer would be able to see that the outcomes look correct just by looking at the test cases, without running their own examples (which is what we look for in a test suite).

@iuliazidaru
Copy link
Contributor Author

@jakelishman Thank you for the review!
I changed an if yesterday, let me check if that affected the code or there is other issue.

@iuliazidaru
Copy link
Contributor Author

@1ucian0 Please check my implementation of snapshot testing/
@jakelishman Could you review again.
Thank you!

Copy link
Member

@1ucian0 1ucian0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @iuliazidaru

Any further comment @jakelishman ?

Copy link
Member

@jakelishman jakelishman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry again...

I'm not 100% convinced that the output for multiple data sets with number_to_keep is non-zero is the least surprising/confusing thing we could do (it seems to suggest that certain keys in data sets are 0 when in reality they're just accounted for in the "rest" bar), but I don't really have a suggestion for how to do it better, and I've let this PR drag on far too long. If somebody has a better way, they can open a new issue.

Thanks for sticking with us, despite my terminal slowness @iuliazidaru.

@jakelishman jakelishman added Changelog: Fixed Add a "Fixed" entry in the GitHub Release changelog. automerge labels Jun 7, 2022
@mergify mergify bot merged commit 1658bb4 into Qiskit:main Jun 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Changelog: Fixed Add a "Fixed" entry in the GitHub Release changelog.

Projects

None yet

5 participants