Faster Circuit Primitives #445

coreyostrove · 2024-05-31T04:09:13Z

I have always been slightly perturbed by the amount of time that things such as COPA layout creation take when performing GST analysis. There are a number of reasons for this, but one family of reasons is that layout generation relies heavily on a number of Circuit primitives that are not as performant as they can be. This PR is a step toward rectifying this through re-implementation of a number of methods identified as bottlenecks through profiling in the context of actual end-to-end GST analysis.

In terms of the top-level numbers, for 1-qubit GST fitting using the 'full TP' parameterization with an experiment design containing 1120 circuits (max depth 128), the changes on this branch collectively reduce the time required for layout creation by ~50%, and reduce the overall runtime of the end-to-end protocol by ~15%. Your mileage will of course vary, however, and I would anticipate higher relative savings for larger/deeper experiment designs. I have not done profiling for the two-qubit case yet, so I won't posit any guesses on the impact there (I would guess it to be less relatively speaking, but can't easily ballpark this without actually testing).

Summary of Changes:

Here is a high-level summary of the changes. I will also be leaving some comments on particular bits of code to explain any more detailed reasoning for some changes (when in doubt the answer is likely that I tried a few different things and picked the fastest way to do something).

Note on synthetic benchmarks: For the quoted figures below these were mostly done using a randomly generated set of test circuits generated according to the following code snippet:

import pygsti
from pygsti.circuits.circuitconstruction import list_random_circuits_onelen
from pygsti.modelpacks import smq1Q_XYI
import numpy as np
from numpy.random import default_rng
from itertools import product
lens = [1,2,4,8,16,32,64]
op_labels = tuple(smq1Q_XYI.target_model().operations.keys())
random_circuits = []
for length in lens:
    random_circuits.extend(list_random_circuits_onelen(op_labels, length, 100, seed = 1234))
random_circuits_2 = []
for length in lens:
    random_circuits_2.extend(list_random_circuits_onelen(op_labels, length, 100, seed = 12345))
rng = default_rng(1234)
rng.shuffle(random_circuits)
rng = default_rng(12345)
rng.shuffle(random_circuits_2)
random_pairs = list(product(random_circuits,repeat=2))

For equality testing this was run against random_pairs, for inclusion testing against random_circuits and random_circuits_2, and for other tests against random_circuits unless stated otherwise.

Circuit equality testing: The primary change was adding a length check to allow quicker short circuiting. A somewhat faster codepath was also added for comparisons between static circuits, avoiding recomputation of a circuit's tuple representation. In synthetic testing this gave a ~3X speed up.
Inclusion testing: Nothing specific was added here, but the change to circuit equality testing also sped this up by a slightly larger factor of ~3.4.
Hashing: Hashes are now precomputed and cached at the point at which a circuit is made static (this is an accepted practice in python for immutable objects, e.g. strings cache their hashes). Related to this change, at staticization time we precompute the tuple representation of the circuit and store this as well. This is the thing which actually gets hashed. Previously the tup property was called whenever this was needed and this property call regenenerated the tuple representation from scratch each time, which is expensive. In synthetic tests this resulted in a ~4.5X speedup, but in the context of the GST profiling described above this was closer to 10X (the computational savings should scale with circuit depth, so this isn't too shocking of a difference). As part of this change I have added static assertions to a few methods previously missing them for safety (we don't want to inadvertently invalidate the cached hash).
Copying: Previously copying a circuit went through the standard __init__ pathway to create a copied circuit instance. With the new implementation we utilize a much faster specialized codepath inspired by the previously existing _fastinit one. But with a few more microptimizations to take advantage of some of the additional promises we can rely on in the context of copying (which we can't as easily in _fastinit). Speedups here range from 3-15X depending on the input and output circuit types. With the low end corresponding to switching static to editable (or vice versa), and the high end keeping the input type unchanged.

This is the end of the changes where I profiled these using synthetic tests. For the rest I did this in the context of GST optimization using cProfile/snakeviz, but didn't carefully record before/after relative changes.

Microoptimization galore: There are a bunch of microoptimizations of the sort that usually don't matter except for when code is particularly hot, and the circuit code is. I won't give an exhaustive list, but this includes things like:
- Attribute access is faster than method/property access/resolution (by 2X or more in some cases). So many trivial (and some non-trivial) property calls have had the corresponding attributes inlined.
- Inlining functions to reduce function call overhead.
- List comprehensions are faster than generator expressions. So, e.g. tuple([i for i in iterable]) is measurably faster than tuple(i for i in iterable) in practice.
- List comprehensions are faster than the 'build via appends in for loop' approach. Often by a notable amount.
- Using falsy/truthiness directly is faster than the if len(obj)>0 idiom if all you're doing is checking if an iterable is empty.
Rework SeparatePOVMCircuit internals: The full_effect_labels property was getting called a lot, and each time it was it was regenerating the result from scratch (with some non-trivial preprocessing). I've updated the internals to cache this, and also deal with updating the cached value when necessary. This gave a measurable speedup.
New sandwich method for circuits. This takes as input two tuples of Labels which are prepended and appended to a circuit. This is around twice as fast as doing this with a pair of calls to __add__ (since we only need 1 call to _fastinit). This is not an uncommon task (I added as part of some changes to the circuit completion code in OpModel), so if you spot any places where this is being done it'd be worth switching (maybe @rileyjmurray can write a fancy regex to find these instances).

Additional changes (either outside of the Circuit class, or otherwise not directly performance focused):

Bulk versions of complete_circuit and split_circuit: There are now versions of the complete_circuit and split_circuit methods for the OpModel class, which respectively add prep and povm labels to a circuit or split them off, which operate on a complete list of circuits all at once. This allows for avoiding some model specific lookups (for values which don't change circuit-to-circuit) many times and is faster overall.
expand_subcircuits_inplace: The implementation of expand_subcircuits_inplace has been reworked so that the behavior is now to recursively expand CircuitLabel objects. Previously the expansion routine only took a single pass through the circuit, meaning that if the circuit contained any nested CircuitLabels (which are perfectly happy with being nested, an easy way to get something like this is to call to_label on a circuit containing CircuitLabels) we would only end up expanding out the top level. I noticed there were a few places in the layout creation/forward simulation code that implicitly assumed only a single-level of CircuitLabels would be present, which would likely have eventually led to some really annoying bugs to track down. This change covers that edge case, and should only be slightly more expensive than the previous routine in the typical case where there is only a single-level of CircuitLabels.
I've added a deprecation warning to the simulate method. If there is strong opposition to removing this functionality we can discuss this, of course. My reasoning here is that given the rest of pygsti's structure this functionality really isn't something we should be asking a Circuit to do, as the preferred pattern nowadays is delegate this job to the Model/ForwardSimulator, rather than asking the circuit itself. I don't know how widespread the use of this is (my guess is that this is largely vestigial), but I'd recommend slating this for removal in 0.9.14. Related note, I don't know if we have a running list of things we've deprecated and their expiry date, but if not we should assemble one.

Finally, there are a number of miscellaneous documentation fixes, bugfixes, and updates to unit tests.

P.S. There are some more layout creation performance enhancements inbound on a separate branch, but I've opted to keep these changes together in a separate PR to keep things thematic and make it easier to review without context switching.

This commit speeds up the implementation of some core circuit primitives used in pyGSTi. The most notable are the circuit inequality check and hashing. 1. Equality checking is sped up by introducing a relatively inexpensive length comparison which allows fast short circuiting of the test. 2. For hashing we now leverage staticness more directly by caching both the hash, and the tuple representation used for hashing. To go along with these changes are (pre-emptive) bugfixes for some inplace circuit modification methods which previously did not check for whether a circuit was static (which would result in hash invalidation). Similarly we have added restrictions to the setter methods for some circuit parameters that feed into the hash to prevent modification of these for static circuits.

Remove duplicated docstring on class definition, leaving the attribute and class description alone.

This refactors the value stored in the private attribute _compilable_layer_indices_tup to remove the leading '__CMPLBL__' string value. This was only used in the construction of the python tuple representation of the circuit when calling the `tup` property. Elsewhere the inclusion required awkwardly having to check the length and then slicing into the tuple to skip this string when accessing the actual indices. As such, the inclusion of this string has been added to the code for `tup`, and the remaining code has been cleaned up to match. In addition to being cleaner, this should be more performant (as we need to do fewer checks when working with the value of this object now). Additionally, this commit also switches to directly inlining access to the _compilable_layer_indices_tup instead of going through the property in a few places to reduce method call overhead.

Inline access to _line_labels, _occurence_id where appropriate to reduce method call overhead. Additionally add a codepath for static circuit comparisons to __lt__ and __gt__.

This reimplements the circuit copy method using a faster code path inspired by the _fastinit code path used for quickly creating new circuits elsewhere in module. This adds a specialized _copy_init method, takes advantage of caching pre-computed values in static circuits, and generally inlining and reducing function call and assignment overheads across the board. Initial testing indicated 20-60x speed ups over original copy method.

This is out of place as a circuit method now, and should be removed.

The expected format for the labels when a circuit is editable is a nested list of lists of simple labels. When making an editable copy of a static circuit the new version of copy was not putting the labels in this format as needed. We now cache the previously editable version of the circuit layers at the point where the circuit is made static, and use this for speeding up the conversion back to an editable format.

Correctly handle empty compilable_layer_indices_tup case.

I am going to be almost immediately reverting this change, but I want this version saved in the git history for future reference.

This reverts commit 2b8e375.

This reverts commit 510e6f7.

Refactors the `is_simple` method for Labels into a class attribute.

This commit adds the following: A new helper function for speeding up copying by avoiding double circuit tuple construction. A new method for sandwiching a circuit with two labels (faster than is using the add operation twice). A bugfix for add which checks for the need to add to the line_labels of a circuit when the added label has a new one not presently in the circuit. Corrections and inlining of editable label to static label conversions. A slightly faster reworking of expand_subcircuits_inplace.

…cuits Bulk implementation of split_circuits and new option for complete_circuits that returns the split form of the circuit as an optional return value (which can speed things up in layout creation when both are needed).

Update the implementation of the expand_subcircuits_inplace method to recursively expand nested subcircuits.

This updates the implementation of the SeparatePOVMCircuit containter class. The most important change is adding an attribute for the full_effect_labels that avoids uneeded reconstruction. To add protection then, to ensure that this is kept in sync with everything else, the povm_label and effect_labels attributes (which feed into full_effect_labels) have been promoted to properties with setters that ensure the full_effect_labels are kept synced.

Fixes a bug in the updated subcircuit expansion function that didn't account for circuit repetitions in calculating depths.

Update the name of the circuit _fastinit kwarg used in stdinput.

Adds new unit tests for new circuit functionality, and fills in a couple gaps for previously untested methods/conditions. Removes skipped test for a long gone method.

Minor unit test updates to fix errors in test_packages. Also fixes some wonky formatting left over from profiling.

Not sure why this didn't get caught on the circuit update branch, but oh well...

This timer test occasionally fails due to sleep lasting slightly less time than the specified duration. I think this is fundamentally due simply to the relatively sloppy specs on sleep provided by OSes (most are apparently only accurate at the 1-10ms level, and while usually they run long, they can occasionally run short too).

sserita

Thanks for the great work as usual @coreyostrove!

It seems like the major parts of this PR are in the new initialization via _copy_init, the tup property, the copy() and expand_subcircuits_inplace() functions, as well as some of the dunder functions (__eq__ most notably). These look pretty good to me reading through them, but I think it'll be good to get this into develop and pay attention to any issues as we get the rest of 0.9.13 stuff in.

I've also mentioned some other smaller things that will hopefully make it more readable or changes I wasn't sure were necessary.

pygsti/circuits/circuit.py

pygsti/baseobjs/label.py

pygsti/circuits/circuit.py

pygsti/models/model.py

pygsti/circuits/circuit.py

This incorporates some feedback from Stefan's review. Refactors the split_circuit and complete_circuit methods to use the multi-circuit variants under the hood. Fixes some bugs that were discovered in testing as part of making that change. Reduce duplication in the tup method and make some efficiency improvements.

coreyostrove · 2024-06-04T15:44:59Z

Thanks for the feedback, @sserita! I have responded to a number of your questions/comments above. I've also just pushed a commit incorporating some of your suggested changes.

Fully remove the old methods and switch to using the class attribute.

Fixes minor error in split_circuits.

rileyjmurray

I left a million comments (and by a million, I mean 31).

Practically all of my comments fell into one of two categories.

I'm uneasy about having is_simple be an exposed static variable for the various Label classes. I'd like to know what alternatives there might be.
There's lots of code that uses list comprehensions when generators would suffice and be faster. I've opened an issue about this so we know to investigate more broadly (Audit code for use of list comprehensions where generators would be faster #450). I suspect that an automated fix of this with find-and-replace might not be possible, so I think we should address these inefficiencies when we see them.

rileyjmurray · 2024-06-05T12:11:39Z

pygsti/baseobjs/label.py

@@ -201,6 +198,10 @@ class LabelTup(Label, tuple):
    acted upon by the object this label refers to.
    """

+    #flag used in certain Circuit subroutines
+    #Whether this is a "simple" (opaque w/a true name, from a circuit perspective) label or not.
+    is_simple= True


Missing white space around equals sign.

pygsti/baseobjs/label.py

pygsti/circuits/circuit.py

pygsti/models/explicitmodel.py

test/test_packages/cmp_chk_files/idt_nQsequenceCache.pkl

rileyjmurray · 2024-06-05T18:42:52Z

@coreyostrove I've gone through all my PR comments and (tried to) close the ones that were just about generators vs list comprehensions. There are still some meaningful comments left, so here's direct links to them:

Faster Circuit Primitives #445 (comment) (there are many copies of this one)
Faster Circuit Primitives #445 (comment)
Faster Circuit Primitives #445 (comment)

This removes unneeded list construction from a few sorted calls. Also refactors filter calls into list comprehensions and refactors a few instances of chain (which can be kind of slow).

Minor update to assign to _line_labels directly.

coreyostrove · 2024-06-05T23:15:28Z

Thanks for the feedback, @sserita and @rileyjmurray. I have addressed a number of your comments, so I have gone through and marked those as such to make it easier to see which conversation threads still need following up on.

Right now it looks like the main open discussions remaining are:

Code reuse options for the new _copy_init codepath. (Incidentally, I've actually wound up using _copy_init for something other than direct copying on the 'feature-faster-layout-creation' branch, so it is proving useful for accelerating some other circuit methods besides copying).
The right way to refactor the is_simple attribute/method for Label classes.
Whether or not we want to add a static flag/static checks to the SeparatePOVMCircuit class.

Let me know if there is anything I have missed, or if there is something I have marked as resolved which you think still needs working on.

sserita · 2024-06-06T16:22:30Z

@coreyostrove My current stance on your short list:

This would be something that would be great to sort out, but if you have already tried and it was unclear the best path for reuse, then your current solution of comments pointing people to keep the functions in sync is a reasonable one. It does make some sense that the restricted version of _copy_init has optimizations that makes it hard to use within _bare_init... It was just not obvious to me at first glance that those were truly limiting.
I think this is definitely something we should fix, and I left a suggestion on the relevant thread.
When I made my comment about this, I didn't realize we were in SeparatePOVMCircuit. I'm OK with shelving this for now - I just wanted to make sure that Circuit had consistent static logic, not necessarily force other classes to adopt the static mode.

Update the convention used for the Label class attribute, and add a property for accessing this attribute externally.

coreyostrove · 2024-06-06T21:30:50Z

Awesome, I just pushed the changes addressing the Label attribute conventions, so with that I think we're in good shape. Thanks @sserita and @rileyjmurray for your careful reviews and valuable feedback!

sserita

Thanks for the great work as usual @coreyostrove!

I'm approving, but this is again going to wait on merge until we clear beta tests up...

enielse · 2024-06-11T16:11:08Z

These updates all look great to me.

sserita · 2024-07-25T18:44:14Z

Beta tests clear, merging!

Corey Ostrove added 20 commits May 21, 2024 14:55

Remove duplicate docstring

76cc86c

Remove duplicated docstring on class definition, leaving the attribute and class description alone.

Inline attribute access and optimize methods for static circuits

33acdb1

Inline access to _line_labels, _occurence_id where appropriate to reduce method call overhead. Additionally add a codepath for static circuit comparisons to __lt__ and __gt__.

Deprecate simulate

205ff31

This is out of place as a circuit method now, and should be removed.

Fix tuple creation

64505d3

Correctly handle empty compilable_layer_indices_tup case.

Implement caching of editable labels

2b8e375

I am going to be almost immediately reverting this change, but I want this version saved in the git history for future reference.

Revert "Implement caching of editable labels"

f74dff3

This reverts commit 2b8e375.

Revert "Fix mistakes in creation of editable copy"

d02eff9

This reverts commit 510e6f7.

Refactor simple label check

54e34a6

Refactors the `is_simple` method for Labels into a class attribute.

Make expand_subcircuits_inplace recursive

682c512

Update the implementation of the expand_subcircuits_inplace method to recursively expand nested subcircuits.

Merge branch 'develop' into feature-faster-circuit-primitives

130cea6

expand_subcircuits_inplace bugfix

6c382c9

Fixes a bug in the updated subcircuit expansion function that didn't account for circuit repetitions in calculating depths.

Update kwarg for compilable_layer_indices

49f6ace

Update the name of the circuit _fastinit kwarg used in stdinput.

Update circuit unit tests

83e3b81

Adds new unit tests for new circuit functionality, and fills in a couple gaps for previously untested methods/conditions. Removes skipped test for a long gone method.

coreyostrove requested a review from enielse May 31, 2024 04:09

coreyostrove self-assigned this May 31, 2024

coreyostrove added this to the 0.9.13 milestone May 31, 2024

Minor Updates/Fixes

e5872d8

Minor unit test updates to fix errors in test_packages. Also fixes some wonky formatting left over from profiling.

coreyostrove marked this pull request as ready for review May 31, 2024 05:55

coreyostrove requested review from rileyjmurray and a team as code owners May 31, 2024 05:55

coreyostrove requested a review from sserita May 31, 2024 05:56

coreyostrove changed the title ~~WIP: Faster Circuit Primitives~~ Faster Circuit Primitives May 31, 2024

This was referenced May 31, 2024

Pickles are Bad, JSON is Good #446

Open

PyTorch-backed forward simulation #390

Merged

Corey Ostrove added 2 commits June 2, 2024 21:03

Update test_stdinputparser.py

981762f

Not sure why this didn't get caught on the circuit update branch, but oh well...

sserita requested changes Jun 3, 2024

View reviewed changes

Corey Ostrove added 2 commits June 4, 2024 12:49

Refactor Label is_simple

cdd3d50

Fully remove the old methods and switch to using the class attribute.

Fix indentation error

ef1904f

Fixes minor error in split_circuits.

coreyostrove mentioned this pull request Jun 5, 2024

Faster Layout and MDC Store Creation #448

Open

rileyjmurray mentioned this pull request Jun 5, 2024

Audit code for use of list comprehensions where generators would be faster #450

Closed

rileyjmurray requested changes Jun 5, 2024

View reviewed changes

Corey Ostrove added 2 commits June 5, 2024 15:00

Incorporate additional feedback

a2082b1

This removes unneeded list construction from a few sorted calls. Also refactors filter calls into list comprehensions and refactors a few instances of chain (which can be kind of slow).

Minor update

5b8da09

Minor update to assign to _line_labels directly.

Update Label Attribute Conventions

22838c9

Update the convention used for the Label class attribute, and add a property for accessing this attribute externally.

sserita approved these changes Jun 6, 2024

View reviewed changes

sserita merged commit 6a708e5 into develop Jul 25, 2024
4 checks passed

sserita deleted the feature-faster-circuit-primitives branch July 25, 2024 18:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Faster Circuit Primitives #445

Faster Circuit Primitives #445

coreyostrove commented May 31, 2024 •

edited

Loading

sserita left a comment

coreyostrove commented Jun 4, 2024

rileyjmurray left a comment

rileyjmurray Jun 5, 2024

rileyjmurray commented Jun 5, 2024

coreyostrove commented Jun 5, 2024 •

edited

Loading

sserita commented Jun 6, 2024

coreyostrove commented Jun 6, 2024

sserita left a comment

enielse commented Jun 11, 2024

sserita commented Jul 25, 2024

Faster Circuit Primitives #445

Faster Circuit Primitives #445

Conversation

coreyostrove commented May 31, 2024 • edited Loading

Summary of Changes:

sserita left a comment

Choose a reason for hiding this comment

coreyostrove commented Jun 4, 2024

rileyjmurray left a comment

Choose a reason for hiding this comment

rileyjmurray Jun 5, 2024

Choose a reason for hiding this comment

rileyjmurray commented Jun 5, 2024

coreyostrove commented Jun 5, 2024 • edited Loading

sserita commented Jun 6, 2024

coreyostrove commented Jun 6, 2024

sserita left a comment

Choose a reason for hiding this comment

enielse commented Jun 11, 2024

sserita commented Jul 25, 2024

coreyostrove commented May 31, 2024 •

edited

Loading

coreyostrove commented Jun 5, 2024 •

edited

Loading