In developing a suite of benchmarks intended to evaluate quantum computers by their ability to provide an end user with a usable, productive result, a practical view must be taken when defining metrics of success for the test quantum algorithms. In particular, quantum state tomography, while able to provide insight into the path through Hilbert space that a quantum state takes throughout a quantum algorithm, requires an exponentially large number of measurements to determine the accuracy with which a quantum computer performs the prescribed algorithm. [Citations on state tomography.]
Other ways of defining the empirical difference between outputs of a quantum computer can take into account only what the user is able to see; that is, the measurement probabilities over the computational basis states. In this case, it is standard to borrow measures of divergence from probability theory, such as the Hellinger Distance.
The Hellinger Distance is a similarity measure for two probability distributions $p = \{p_x\}$ and $q = \{q_x\}$ of the following form:
\begin{equation}
H(p, q) = \frac{1}{\sqrt{2}} \sqrt{\sum_x \left(\sqrt{p_x} - \sqrt{q_x}\right)^2} = \sqrt{1 - \sum_x \sqrt{p_x q_x}}.
\end{equation}
The squared Hellinger Fidelity, used in quantum computing applications such as Qiskit, is defined from the Hellinger Distance above as
\begin{equation}
F_H(p, q) = \left(1 - H^2(p, q)\right)^2 = \left(\sum_x \sqrt{p_x q_x}\right)^2.
\end{equation}
The squared distance $H^2$ is a type of $f$-divergence, and it is easy to show that $H(p, q)$ will be equal to zero if $p$ and $q$ are identical, and equal to one if $p$ and $q$ have disjoint supports. [Citation on Hellinger distance.] As a consequence, the squared Hellinger Fidelity $F_H(p, q)$ will always be nonzero if there is any shared support between $p$ and $q$, which leads to a problem when considering the expected behavior of highly noisy quantum devices.
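These boundary properties are easy to verify numerically. The following is a minimal, self-contained Python sketch of the two definitions above; the helper names are ours, not from any particular library.
\begin{verbatim}
import math

def hellinger_distance(p, q):
    # H(p, q) = sqrt(1 - sum_x sqrt(p_x * q_x))
    support = set(p) | set(q)
    bc = sum(math.sqrt(p.get(x, 0.0) * q.get(x, 0.0)) for x in support)
    return math.sqrt(max(1.0 - bc, 0.0))

def squared_hellinger_fidelity(p, q):
    # F_H(p, q) = (1 - H^2)^2 = (sum_x sqrt(p_x * q_x))^2
    return (1.0 - hellinger_distance(p, q) ** 2) ** 2

p = {"00": 0.5, "11": 0.5}
q = {"01": 0.5, "10": 0.5}   # disjoint support from p

assert abs(hellinger_distance(p, p)) < 1e-12          # identical -> H = 0
assert abs(squared_hellinger_fidelity(p, p) - 1.0) < 1e-12
assert abs(hellinger_distance(p, q) - 1.0) < 1e-12    # disjoint -> H = 1
assert abs(squared_hellinger_fidelity(p, q)) < 1e-12
\end{verbatim}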
It is tempting to prescribe the squared Hellinger Fidelity as the measure of success for the output of a quantum computer. Say that a quantum algorithm, when run on a noiseless device acting on some initial state, should produce the state
\begin{equation}
|\psi\rangle = \sum_{x \in \{0,1\}^n} a_x |x\rangle,
\end{equation}
with ideal computational-basis measurement probabilities $p_x = |a_x|^2$.
When tested, however, a real device instead returns the (possibly mixed) state $\rho$, with measurement probabilities $q_x = \langle x | \rho | x \rangle$.
Then the benchmarker is left only with the resulting set of estimators $\hat{q}_x$, in the form of the normalized measurement results of the noisy algorithm. In constructing a benchmark suite, it is easy enough to choose algorithm instances wherein the values $p_x$ are known analytically. Therefore, for a given algorithm and empirical results, the squared Hellinger Fidelity measuring the quality of the results $\hat{q}$ against the theoretical noiseless results $p$ is calculated as
\begin{equation}
F_H(p, \hat{q}) = \left(\sum_x \sqrt{p_x \hat{q}_x}\right)^2.
\end{equation}
In practice, this yields a quantity bounded on $[0, 1]$, which offers a scale on which to compare two quantum devices running the same algorithm; a higher value of $F_H$ means a result closer to the theoretical result, and therefore the device with the higher $F_H$ has performed the test algorithm better.
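As a usage sketch, this quantity can be computed directly from raw device counts with Qiskit's \texttt{hellinger\_fidelity}; the counts below are hypothetical, standing in for the output of a noisy two-qubit Bell-state instance.
\begin{verbatim}
from qiskit.quantum_info import hellinger_fidelity

# Analytically known noiseless distribution p for a Bell-state instance
ideal = {"00": 0.5, "11": 0.5}

# Hypothetical raw counts returned by a noisy device (8192 shots)
counts = {"00": 3800, "01": 250, "10": 270, "11": 3872}

# Normalize counts into the estimators q-hat before comparing
shots = sum(counts.values())
measured = {bits: c / shots for bits, c in counts.items()}

print(hellinger_fidelity(ideal, measured))  # close to, but below, 1.0
\end{verbatim}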
Consider the output of any quantum algorithm running on a quantum computer with maximal noise; that is, near-zero coherence time and uncorrelated errors in each one- or two-qubit gate. On average, the result of a quantum algorithm of any significant length will be completely thermalized by accumulated noise and error, leading to a uniformly random output $u$, with $u_x = 2^{-n}$ over the $n$-qubit computational basis.
It is impossible to obtain information from such a device (the output is independent of the input), and yet the squared Hellinger Fidelity between the output distribution and the theoretically correct state will always be nonzero. Namely, this limiting quantity is
\begin{equation}
F_H(p, u) = \left(\sum_x \sqrt{p_x \cdot 2^{-n}}\right)^2 = \frac{1}{2^n} \left(\sum_x \sqrt{p_x}\right)^2.
\end{equation}
The limiting behavior of the squared Hellinger Fidelity measured on a maximally noisy device is essentially a measure of the spread of the noiseless distribution. In the case of quantum algorithms where the expected distribution is highly centralized on a single state (most of the probability mass is on a single marked state $x_0$, so $p_x \approx \delta_{x, x_0}$), a maximally noisy quantum device will see a squared Hellinger Fidelity fall off exponentially with the number of qubits:
\begin{equation}
F_H(p, u) \approx \left(\sqrt{1 \cdot 2^{-n}}\right)^2 = 2^{-n}.
\end{equation}
This result already poses a problem in the low qubit count limit. If it is understood that the squared Hellinger Fidelity is theoretically capable of returning values below this floor, a user may believe that the device is achieving more than it really is. In this case, despite the fact that the scale theoretically goes as low as $0$, a completely useless quantum device would achieve $F_H = 2^{-n}$ by sheer random guessing for an $n$-qubit instance.
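To make the floor concrete, the following sketch evaluates $F_H$ between a fully peaked noiseless distribution and the uniform output of a maximally noisy device for small qubit counts (again with illustrative helper names):
\begin{verbatim}
import math

def squared_hellinger_fidelity(p, q):
    support = set(p) | set(q)
    return sum(math.sqrt(p.get(x, 0.0) * q.get(x, 0.0)) for x in support) ** 2

for n in range(1, 9):
    dim = 2 ** n
    peaked = {0: 1.0}                          # all mass on one marked state
    uniform = {x: 1.0 / dim for x in range(dim)}
    # Prints F_H = 2^{-n}: 0.5, 0.25, 0.125, ..., ~0.0039
    print(n, squared_hellinger_fidelity(peaked, uniform))
\end{verbatim}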
It is also of interest to measure the quality of results for quantum algorithms with some significant variance in the theoretical result distribution. In this case, the skew can be much worse. For the sake of simplicity, consider the case where the noiseless state is an equal superposition over $m$ of the $2^n$ basis states:
\begin{equation}
|\psi\rangle = \frac{1}{\sqrt{m}} \sum_{x \in S} |x\rangle, \qquad |S| = m,
\end{equation}
so that $p_x = 1/m$ for $x \in S$ and $p_x = 0$ otherwise.
Then the squared Hellinger Fidelity returns
\begin{equation}
F_H(p, u) = \left(\sum_{x \in S} \sqrt{\frac{1}{m} \cdot \frac{1}{2^n}}\right)^2 = \frac{m}{2^n}.
\end{equation}
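As an illustrative instance of this skew, take $n = 4$ qubits and an equal superposition over $m = 8$ basis states; a maximally noisy device then scores
\begin{equation}
F_H(p, u) = \frac{m}{2^n} = \frac{8}{16} = \frac{1}{2},
\end{equation}
despite returning no information about the algorithm whatsoever.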
When trying to measure the performance of quantum algorithms with significant spread, the high-noise limiting behavior of the squared Hellinger Fidelity may lead to artificially high results, especially at low qubit counts. This directly impedes the usefulness of the measure in a well-rounded, applications-focused benchmarking suite, in which a sliding scale is intended to give the user confidence that high performance means the algorithm is far from noise-limited.
There are several ways to amend the squared Hellinger Fidelity as a metric to account for the shortcomings described so far. The authors argue that, due to the dependence on the spread of the noiseless distribution, a simple rescaling to account for the fidelity inflation at low qubit counts would not be enough. Instead, the best approach is to take full advantage of the fact that the analytic noiseless quantum state results are known, and normalize the measure against the performance of fully thermalizing noise for each algorithm instance\footnotemark . That is, define the Noise-Normalized Squared Hellinger Fidelity as a linear transformation which ensures that a uniform result distribution is always evaluated to be equal to $2^{-n}$, leaving the measure consistent in the case that the noiseless distribution is fully centralized. The authors believe that the exponential decay with respect to qubit count is sufficient to distinguish extremely noisy devices, so long as it is understood that this method will have artificially high minimum measured fidelities at low qubit counts. In algorithms where the effective spread is a nontrivial function of $n$, the scaling can be arbitrary; this situation makes it that much more difficult to distinguish between accurate quantum computers producing non-localized results and high-noise machines matching some amount of spread anyway.
The proposed linear transformation takes the form
\begin{equation}
F_H^{\mathrm{norm}}(p, \hat{q}) = \frac{F_H(p, \hat{q}) - F_H(p, u)}{1 - F_H(p, u)} \left(1 - \frac{1}{2^n}\right) + \frac{1}{2^n},
\end{equation}
which guarantees that a fidelity equal to one is unchanged and that the uniform distribution decays exponentially; that is, $F_H^{\mathrm{norm}}(p, p) = 1$ and $F_H^{\mathrm{norm}}(p, u) = 2^{-n}$.
In the case where the polarization fidelity is preferred, replace the above with
\begin{equation}
F_H^{\mathrm{pol}}(p, \hat{q}) = \frac{F_H(p, \hat{q}) - F_H(p, u)}{1 - F_H(p, u)},
\end{equation}
which instead maps the uniform result distribution to zero.
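A minimal sketch of both rescalings, assuming the noiseless distribution \texttt{p\_ideal} is known analytically (function names here are illustrative, not from any particular library):
\begin{verbatim}
import math

def squared_hellinger_fidelity(p, q):
    support = set(p) | set(q)
    return sum(math.sqrt(p.get(x, 0.0) * q.get(x, 0.0)) for x in support) ** 2

def _uniform(n_qubits):
    dim = 2 ** n_qubits
    return {format(x, "0%db" % n_qubits): 1.0 / dim for x in range(dim)}

def noise_normalized_fidelity(p_ideal, q_measured, n_qubits):
    # Linear map fixing F = 1 and sending the uniform result to 2^{-n}
    floor = 2.0 ** -n_qubits
    f = squared_hellinger_fidelity(p_ideal, q_measured)
    f_uni = squared_hellinger_fidelity(p_ideal, _uniform(n_qubits))
    return (f - f_uni) / (1.0 - f_uni) * (1.0 - floor) + floor

def polarization_fidelity(p_ideal, q_measured, n_qubits):
    # Same rescaling, but sending the uniform result to zero
    f = squared_hellinger_fidelity(p_ideal, q_measured)
    f_uni = squared_hellinger_fidelity(p_ideal, _uniform(n_qubits))
    return (f - f_uni) / (1.0 - f_uni)
\end{verbatim}
With this convention, a device performing worse than random guessing yields a negative polarization value.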