-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster Layout and MDC Store Creation #448
Conversation
The creation of COPA layouts relies on a number of specialized circuit structures which require non-trivial time to construct. In the context of iterative GST estimation with nested circuit lists (i.e. the default) this results in unnecessarily repeat construction of these objects. This is an initial implementation of a caching scheme allowing for more efficient re-use of these circuit structures across iterations.
Cache the expanded SPAM-free circuits to reduce recomputing things unnecessarily.
This updates the implementation of the SeparatePOVMCircuit containter class. The most important change is adding an attribute for the full_effect_labels that avoids uneeded reconstruction. To add protection then, to ensure that this is kept in sync with everything else, the povm_label and effect_labels attributes (which feed into full_effect_labels) have been promoted to properties with setters that ensure the full_effect_labels are kept synced.
Adds a new method to OpModel that allows for doing instrument expansion and povm expansion in bulk, speeding things up be avoiding recomputation of shared quantities. Also adds a pipeline for re-using completed or split circuits (as produced by the related OpModel methods) for more efficient re-use of done work.
Some minor performance oriented tweaks to the init for COPA layouts.
Refactor some of the ordered dictionaries in matrix layout creation into regular ones.
Start adding infrastructure for caching things used in MDC store creation and for plumbing in stuff from layout creation.
Performance optimization for the method for adding omitted frequencies to incorporate caching of the number of outcomes per circuit (which is somewhat expensive since it goes through the instrument/povm expansion code). Additionally refactor some other parts of this code for improved efficiency. Also makes a few minor tweaks to the method for adding counts to speed that up as well. Can probably make this a bit faster still by merging the two calls to reduce redundancy, but that is a future us problem. Additionally make a few microoptimizations to the dataset code for grabbing counts, and to slicetools adding a function for directly giving a numpy array for a slice (instead of needing to cast from a list). Miscellaneous cleanup of old commented out code that doesn't appear needed any longer.
Fix a bug I introduced in dataset indexing into something that could be None.
Another minor bug caught by testing.
Not sure why this didn't get caught on the circuit update branch, but oh well...
Fixes minor error in split_circuits.
Improve the performance of __getitem__ when indexing into static circuits by making use of the _copy_init code path.
Implement caching of circuit structures tailored to the map forward simulator's requirements.
This finishes the process of refactoring expand_instruments_and_separate_povm from a circuit method to a method of OpModel.
Refactor expand_instruments_and_separate_povm to use the multi-circuit version under the hood to reduce code duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this branch came off of the feature-faster-circuit-primitives branch. That's making Git think this PR contains all the material from PR #445. I'll hold off on a full review of this PR until (1) #445 is merged and (2) this PR is updated, as needed, in order for Git to see the minimal set of changes.
That said, I do have three actionable comments. Two comments are given in-line. Here's a broader comment.
Right now the changes introduce free-functions like create_matrix_copa_layout_circuit_cache
and create_map_copa_layout_circuit_cache
. I'd prefer that these were static class functions, and that the name class signifier (_matrix_
, _map__
, etc..) be dropped from the function names. From there, you could invoke the function just with mld.sim.create_copa_layout_circuit_cache(...)
.
If you'd like you could make create_copa_layout_circuit_cache
a method in the ForwardSimulator base class. If that method were to raise a NotImplementedError by default then you could replace some branching if-statements with
try:
precomp_layour_circuit_cache = mdl.sim.create_copa_layout_circuit_cache(...)
except NotImplementedError:
precomp_layour_circuit_cache = None
Refactor cache creation functions into static methods of the corresponding forward simulator class. Also add an empty base version of this method, and clean up a few miscellaneous things caught by review.
Thanks for the feedback and suggestions, @rileyjmurray. I have incorporated your feedback and refactored the cache creation methods into static methods of the corresponding forward simulator classes. I also addressed the two inline comments, so I have marked those as resolved. |
@@ -78,24 +82,29 @@ def add_expanded_circuits(indices, add_to_this_dict): | |||
for i in indices: | |||
nospam_c = unique_nospam_circuits[i] | |||
for unique_i in circuits_by_unique_nospam_circuits[nospam_c]: # "unique" circuits: add SPAM to nospam_c | |||
observed_outcomes = None if (dataset is None) else dataset[ds_circuits[unique_i]].unique_outcomes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question for folks, but particularly aimed at @enielse. A comparable line to this one for computing the quantity named observed_outcomes
appears here, but in the map layout code this queries the outcomes
property of DataSetRow
instead of the unique_outcomes
property as is done here. I think this should be unique_outcomes
in both cases (I don't see a reason why it wouldn't also be for map), but I wanted to run this by you before I made that change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any reason why map cannot also use unique_outcomes
instead of outcomes
, but am basing that just on looking through this code while reviewing. Expert @enielse opinion still welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good overall, but the current changes have broken unit tests for some forward simulators. I don't know why all of the failures are happening. One of the failures is because TorchForwardSimulator uses Circuit.expand_instruments_and_separate_povm
.
TIL that you can copy a link to a specific line of the github actions log, neat. Yup, this makes sense. One of the changes made on this branch was moving that method from the |
Fix a few minor issues related to refactored code and updates made in this branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excited to get this in and get those speedups in layout creation! First, I have a few comments: one which I think will error in certain edge cases, and others that are optional but may cut down on branching to increase maintainability.
|
||
#pre-compute a dictionary caching completed circuits for layout construction performance. | ||
unique_circuits = list({ckt for circuit_list in circuit_lists for ckt in circuit_list}) | ||
if isinstance(mdl.sim, (_fwdsims.MatrixForwardSimulator, _fwdsims.MapForwardSimulator)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we checking this because only matrix and map forward sims have this defined? If so, thoughts on a try/catch on NotImplementedError instead so that if create_[...]_cache
is defined on another forward sim, we don't have to remember and come back to change this check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This staticmethod is implemented in the ForwardSimulator
base class, but raises a NotImplementedError
. Though that might eventually change (it would be possible in principle to implement caching for the base COPA layout class' cache creation, but I haven't done so yet).
I've been spending too much time the past few months on SWE youtube, and one of the SWE design philosophies du jour that I've been enamored with as of late is 'Locality of Behavior' (LoB, as described here). While I'm not a purist by any means, in this instance I think it is easier (for someone reading this for the first time, or the first time in a while) to understand this portion of the code by explicitly signaling for which forward simulators the caching will happen, rather than the try/except paradigm which would require looking into the source code for the various forward simulator classes to identify when it won't happen.
That said, you're right there is a trade-off in doing so, which is needing to remember to come back to this and explicitly enable caching in protocols which support it. (Realistically it'd most likely be one of either me, you or Riley messing around with pyGSTi internals at this level, so hopefully we'd remember).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hadn't heard of LoB before, but I like it.
cache['split_circuits'] = {ckt: split_ckt for ckt, split_ckt in zip(circuits, split_circuits)} | ||
|
||
#There is some potential aliasing that happens in the init that I am not | ||
#doing here, but I think 90+% of the time this ought to be fine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this mean? In the <10% of cases that you expect to run into, will the cache/resulting layout still be created correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, it took me a minute to remember what I was thinking about here. (It wasn't loaded into cache, ba dum dum tsss...)
The thing that happens in the init, that isn't happening here, are the following calls:
unique_circuits, to_unique = self._compute_unique_circuits(circuits)
aliases = circuits.op_label_aliases if isinstance(circuits, _CircuitList) else None
ds_circuits = _lt.apply_aliases_to_circuits(unique_circuits, aliases)
This allows for the possibility of aliasing in the keys of the DataSet
. This aliasing is the part that I didn't implement in the code shown below. The thing that would happen in the <10% of cases I was alluding to, i.e. the cases where someone is using aliasing (as far as I am aware this is not a widely used feature) is that ds_row
would be None
some amount (maybe all) of the time, and we'd pass a list with None
entries into bulk_expand_instruments_and_separate_povm
. This is allowed (and is in fact the default), but it would result in expanded circuits being created for every POVM and instrument effect, instead of just those for which outcomes were observed. The resulting layout wouldn't be wrong, but would be less memory efficient than it could otherwise be, as it would have space allocated for probabilities we won't be calculating. It would also make the bulk_expand_instruments_and_separate_povm
less efficient than it otherwise could be.
All of that said, I am not entirely sure why I didn't implement the aliasing for the DataSet
keys at the time. The answer is probably something along the lines of: "I wasn't sure how the aliasing code worked and couldn't be arsed to figure it out at the time." Anyhow, I see no reason not to add this back in (I'm pretty sure I can just essentially copy and paste the lines above into the cache creation routine), so I'll give it a crack.
@@ -78,24 +82,29 @@ def add_expanded_circuits(indices, add_to_this_dict): | |||
for i in indices: | |||
nospam_c = unique_nospam_circuits[i] | |||
for unique_i in circuits_by_unique_nospam_circuits[nospam_c]: # "unique" circuits: add SPAM to nospam_c | |||
observed_outcomes = None if (dataset is None) else dataset[ds_circuits[unique_i]].unique_outcomes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any reason why map cannot also use unique_outcomes
instead of outcomes
, but am basing that just on looking through this code while reviewing. Expert @enielse opinion still welcome.
Add in support for data set key aliasing in COPA layout cache creation.
Rework some of the if statement branching in the layout creation to instead use fallback behavior of get more.
I accidentally put down the wrong directory for temp testing files in the RB testing code.
Thanks for the careful feedback, @sserita! I have just pushed some changes addressing this feedback, to summarize the main changes:
There is also a miscellaneous typo fix in the RB unit tests that I accidentally introduced with the IRB branch that I'm throwing in for good measure, in case you notice the bizarre inclusion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The aliasing looks good. I just have one question left on the RB test change.
test/unit/protocols/test_rb.py
Outdated
@@ -163,9 +163,9 @@ def test_combined_design_access(self): | |||
|
|||
def test_serialization(self): | |||
|
|||
self.irb_design.write('../../test_packages/temp_test_files/test_InterleavedRBDesign_serialization') | |||
self.irb_design.write('../../test/test_packages/temp_test_files/test_InterleavedRBDesign_serialization') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused about why this change is needed. Aren't we in test/unit/protocols
, so ../../test_packages
is the right way to get to those files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought so originally myself, but I noticed when running unit tests locally last night that a new test_packages
directory was being created at the base level (i.e. at the same level as test
and pygsti
) and populated with these files. In retrospect I suppose I could have equivalently removed one of the sets of '..'. Not sure why this is the case though, tbh, maybe something to do with how pytest sets the cwd?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is also possible this was related to which directory I was launching pytest from, so if you have a venv handy to run these tests on and have a different experience that'd be good to know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The serialization test paths are indeed relative to where you launch pytest. Launching in the repo root dir (as the Actions do), this actually meant the temp files were being written outside of the pyGSTi repo. I've changed this to use absolute paths based on where the test file is located.
That was the only outstanding issue, so I'm approving this!
Building off of the changes in PR #445 as part of an effort at reducing the overhead associated with running GST analysis, this PR further speeds up the construction of COPA layouts and MDC stores/objective functions. Both in general, but even more so in the context of iterative GST estimation.
Note: This branch is forked off of 'feature-faster-circuit-primitives', and so some of these commits are from that branch and will get cleaned up whenever #445 gets merged into develop (in case you notice duplicates).
Here is a summary of the changes:
expand_instruments_and_separate_povm
which is quite expensive. Like layouts we construct a fresh MDC store at each iteration, so by caching this information we significantly reduce the amount of repeated calculations.expand_instruments_and_separate_povm
method has been refactored, moving it from being a method of Circuits which takes as input a model into being a method of OpModels which takes as input a circuit. This method was a strange fit for the Circuit class, as it relies pretty heavily on knowledge of the internal details of a model, and from an abstraction hierarchy level this was also a out of character thing for a Circuit to be asked to do. It makes much more sense for an OpModel (the implementation effectively assumed you'd only be calling it on an OpModel given the attributes and methods it assumed were implemented, so now that is enforced), and OpModel is also where the conceptually similarcomplete_circuit(s)
andsplit_circuit(s)
methods live. In addition to the refactor, I have also added a newbulk_expand_instruments_and_separate_povm
which more efficiently performs this operation on a list of circuits.I look forward to any and all feedback.
P.S. I didn't mention how these changes affect the bottom line on runtime yet. For single-qubit GST with the XYI gate set, the full TP parameterization, and a maximum depth of 128 (as before, I haven't yet tested this on the 2Q case, or for other experiment designs, so ymmv) overall we have:
Those changes are measured relative to the runtimes following the changes made on #445. Combined with #445 this gives an overall reduction in end-to-end runtime (for the 1Q test case described above) of ~50% when using the MatrixForwardSimulator, and ~15% when using the MapForwardSimulator (proportionally less of the overall runtime is spent performing these subroutines with map, so this difference makes sense).