Many Known Quantum Algorithms Are Optimal: Symmetry-Based Proofs



Formulation of the Problem
1.1. Need for quantum computing

Modern computers are extremely fast, but there are still many practical problems that require even faster computations. For example, high-performance computers, after computing for several hours, help us come up with a reasonably accurate prediction of tomorrow's weather. It turns out that similar algorithms can help us predict where a tornado will turn in the next 15 minutes, but this computation also requires several hours on modern computers, too late for this prediction to be practically useful.

How can we make computers faster? There are many interesting engineering ideas on how to do it, but there is also a fundamental limitation: according to relativity theory, nothing can travel faster than the speed of light $c \approx 300\,000$ km/sec; see, e.g., [9,33]. For a usual laptop, which is about 30 cm in size, this means that it takes at least $10^{-9}$ seconds (1 nanosecond) for a signal to go from one side of the laptop to the other. During this time, a usual 4 GHz laptop already performs 4 operations. From this viewpoint, the only way to make computers substantially faster is to make them significantly smaller.

Already in modern computers, each memory cell is very small: as small as 10 nanometers (nm), to be compared with the nm-scale size of a single molecule. As a result, each cell contains several thousand molecules. If we make cells even smaller, their size will be comparable with the size of a single molecule. At such sizes, we can no longer use Newtonian mechanics; we need to take into account that the micro-world is governed by different equations, the equations of quantum physics [9,33]. Computing on such a level is known as quantum computing.

1.2. Need for quantum algorithms

One of the important challenges of quantum computing is that in quantum physics, in contrast to Newtonian physics, the results are non-deterministic: we can only predict the probabilities of different outcomes. The classical example of such a probabilistic uncertainty is radioactivity, one of the first observed quantum phenomena: we can predict the probability that an atom will decay, and thus accurately predict the amount of radiation, but we cannot predict at which moment of time each individual atom will decay.

• We want to understand how the world works, to predict what will happen; this is, [...] algorithm that speeds up such a search. This algorithm, proposed by Lov Grover, finds an element in an unsorted list of $n$ elements in time proportional to $\sqrt{n}$, which is much faster than the $n$ steps needed in the non-quantum case [16,17,26,35]. Quantum algorithms are also useful in optimization.

• An additional way to speed up computations comes from the fact that in prediction problems, such as predicting tomorrow's weather, to be on the safe side, we take into account today's meteorological data in all nearby locations, even though most of this data is actually irrelevant. To speed up computations, it is desirable to decide which inputs are relevant and which are not. In this analysis, quantum computing also helps: namely, we can use the Deutsch-Jozsa algorithm; see, e.g., [26,35].

[...] behind the proofs of their optimality. In Section 3, we describe the relation between optimality, which we want to prove, and symmetries, i.e., invariance with respect to different transformations. After that, we present the proofs of optimality of different quantum algorithms for quantum data processing: Grover's algorithm in Section 4, the parallelization-related teleportation algorithm in Section 5, and the optimization-related quantum annealing algorithm in Section 6.

It should be mentioned that other quantum algorithms are also known to be optimal: optimality of the Deutsch-Jozsa algorithm is proven in [20], and optimality of a quantum communication algorithm in [15].

In "classical" (= non-quantum) physics, each object, each system can be in different states $s, s', \ldots$ In quantum physics, such classical states are denoted by $|s\rangle, |s'\rangle$, etc. An unusual feature of quantum physics is that, in addition to such states, we can also have superpositions of such states, i.e., states of the type

$$c \cdot |s\rangle + c' \cdot |s'\rangle + \ldots, \tag{1}$$

where $c, c', \ldots$ are complex numbers for which

$$|c|^2 + |c'|^2 + \ldots = 1, \tag{2}$$

where, as usual, for a complex number $c = a + b \cdot \mathrm{i}$, its modulus $|c|$ is defined as $|c| = \sqrt{a^2 + b^2}$. If the system is in the state (1), and we use a classical measurement instrument to measure the state, then:

• we will get state $s$ with probability $|c|^2$,
• we will get state $s'$ with probability $|c'|^2$, etc.

These probabilities should add up to 1, which explains the formula (2).
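The measurement rule above can be illustrated with a minimal Python sketch. The function names and the example amplitudes (0.6 and 0.8i) are hypothetical choices made for illustration; they do not come from the original text.

```python
import math
import random

def measurement_probabilities(amplitudes):
    """Return the probability |c|^2 of observing each classical state."""
    return [abs(c) ** 2 for c in amplitudes]

def is_normalized(amplitudes, tol=1e-12):
    """Check the normalization condition: the squared moduli add up to 1."""
    total = sum(measurement_probabilities(amplitudes))
    return math.isclose(total, 1.0, abs_tol=tol)

def measure(amplitudes, rng=random):
    """Simulate one classical measurement; return the index of the observed state."""
    probs = measurement_probabilities(amplitudes)
    return rng.choices(range(len(amplitudes)), weights=probs, k=1)[0]

# Example superposition c|s> + c'|s'> with hypothetical c = 0.6 and c' = 0.8i.
state = [0.6, 0.8j]
probs = measurement_probabilities(state)
outcome = measure(state)
```

Here `probs` holds the two outcome probabilities, and repeated calls to `measure` would return index 0 or 1 with the corresponding frequencies.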

In particular, a quantum analogue of a bit (binary digit), i.e., of a system that can be in two different states 0 and 1, is a quantum bit (qubit, for short) that can be in any state

$$c_0 \cdot |0\rangle + c_1 \cdot |1\rangle, \tag{3}$$

where $c_0$ and $c_1$ are complex numbers for which $|c_0|^2 + |c_1|^2 = 1$. In the state (3), the probability that we will observe 0 is $|c_0|^2$, and the probability that we will observe 1 is equal to $|c_1|^2$.

Similarly, for a 2-bit system, which in classical physics can be in 4 different states 00, 01, 10, and 11, a general quantum state is equal to

$$c_{00} \cdot |00\rangle + c_{01} \cdot |01\rangle + c_{10} \cdot |10\rangle + c_{11} \cdot |11\rangle,$$

where $|c_{00}|^2 + |c_{01}|^2 + |c_{10}|^2 + |c_{11}|^2 = 1$. [...]

In general, if we have $n$ classical states $s_1, \ldots, s_n$, and we want to detect, in a quantum state $\sum_i \alpha_i \cdot s_i$, which of these states we are in, we get each $s_i$ with probability $|\alpha_i|^2$, and once the measurement process detects the state $s_i$, the actual state turns into $s_i$.

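Instead of the classical states $s_1, \ldots$, we can use any other sequence of states $s'_i = \sum_j t_{ij} \cdot s_j$, as long as they are orthonormal (= orthogonal and normal) in the sense that:

• for each $i$, we have $\sum_j |t_{ij}|^2 = 1$ (normality), and
• for each $i \ne i'$, we have $\sum_j t_{ij} \cdot t^*_{i'j} = 0$ (orthogonality), where $t^*$ denotes the complex conjugate.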

In this case, if we have a state $\sum_i \alpha_i \cdot s'_i$, then with probability $|\alpha_i|^2$, the measurement result is $s'_i$ and the state turns into $s'_i$.

In general, instead of a sequence of orthogonal vectors, we can have a sequence [...]

In quantum physics, if the first system was in the general quantum state (1) and the second system is in a similar quantum state

$$a \cdot |t\rangle + a' \cdot |t'\rangle + \ldots, \tag{6}$$

then the state of the composite system, known as the tensor product of the states (1) and (6),

$$(c \cdot |s\rangle + c' \cdot |s'\rangle + \ldots) \otimes (a \cdot |t\rangle + a' \cdot |t'\rangle + \ldots),$$

is equal to

$$c \cdot a \cdot |s, t\rangle + c \cdot a' \cdot |s, t'\rangle + \ldots + c' \cdot a \cdot |s', t\rangle + c' \cdot a' \cdot |s', t'\rangle + \ldots$$

In particular, for classical states, e.g., when $c = a = 1$ and $c' = \ldots = a' = \ldots = 0$, we get $|s\rangle \otimes |t\rangle = |s, t\rangle$.

States may change with time. In quantum physics, all changes are linear, for the same reason why the composition of two states is linear. In other words, each state $\sum_i c_i \cdot |s_i\rangle$ is transformed into the state $\sum_i c'_i \cdot |s_i\rangle$ for which $c'_i = \sum_j T_{ij} \cdot c_j$ for some matrix $T$. Since the probabilities must still add up to 1, this matrix must be unitary, i.e., $T^\dagger T = T T^\dagger = I$, where $(T^\dagger)_{ij} = T^*_{ji}$. Note that every such transformation is reversible: once we apply the transformation $T$, we can then apply the transformation $T^\dagger$ and, due to the property $T^\dagger T = I$, get back the original state.
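The tensor product and the reversibility of linear state changes can be checked numerically. The following is a minimal sketch: the helper names and the example amplitudes are hypothetical, and the standard Hadamard matrix is used only as a convenient example of a unitary transformation.

```python
import math

def tensor(first, second):
    """Tensor product of two states given as {label: amplitude} dictionaries."""
    return {(s, t): c * a for s, c in first.items() for t, a in second.items()}

def apply(matrix, vector):
    """Apply a linear transformation: c'_i = sum_j T_ij * c_j."""
    return [sum(t_ij * c_j for t_ij, c_j in zip(row, vector)) for row in matrix]

def dagger(matrix):
    """Conjugate transpose: (T^dagger)_ij = conj(T_ji)."""
    return [[complex(matrix[j][i]).conjugate() for j in range(len(matrix))]
            for i in range(len(matrix[0]))]

# Hypothetical example states of the two subsystems.
first = {"s": 0.6, "s'": 0.8}
second = {"t": 1 / math.sqrt(2), "t'": 1j / math.sqrt(2)}
joint = tensor(first, second)

# A standard unitary matrix (Hadamard), used here only as an example of T.
h = 1 / math.sqrt(2)
T = [[h, h], [h, -h]]
original = [0.6, 0.8j]
restored = apply(dagger(T), apply(T, original))  # T^dagger T = I
```

The joint state stays normalized, and applying $T^\dagger$ after $T$ returns the original coefficients up to rounding.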

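For 1-qubit systems, one such transformation is the Hadamard transformation $H$, for which

$$H|0\rangle = \frac{1}{\sqrt{2}} \cdot |0\rangle + \frac{1}{\sqrt{2}} \cdot |1\rangle, \qquad H|1\rangle = \frac{1}{\sqrt{2}} \cdot |0\rangle - \frac{1}{\sqrt{2}} \cdot |1\rangle.$$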

In this section, we will deal only with functions $y = f(x_1, \ldots, x_n)$ of boolean (0-1) [...] irreversible. For example, for the "and"-function [...]

To make the corresponding transformation reversible, a function $y = f(x_1, \ldots, x_n)$ is represented as

$$T_f: (x_1, \ldots, x_n, y) \mapsto (x_1, \ldots, x_n, y \oplus f(x_1, \ldots, x_n)),$$

where $a \oplus b$ is exclusive "or" or, what is the same, addition modulo 2, an operation for which $0 \oplus 0 = 1 \oplus 1 = 0$ and $0 \oplus 1 = 1 \oplus 0 = 1$. One can check that the thus defined transformation is reversible: namely, if we apply the transformation $T_f$ twice, we get back the original state $(x_1, \ldots, x_n, y)$, simply because $a \oplus a = 0$ for all $a$.
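As a small illustration of this reversibility argument, the following Python sketch builds the map that sends $(x_1, \ldots, x_n, y)$ to $(x_1, \ldots, x_n, y \oplus f(x_1, \ldots, x_n))$ for the "and"-function and checks that applying it twice restores every input. The helper names are hypothetical.

```python
from itertools import product

def make_reversible(f):
    """Build T_f: (x_1, ..., x_n, y) -> (x_1, ..., x_n, y XOR f(x_1, ..., x_n))."""
    def t_f(*bits):
        *xs, y = bits
        return (*xs, y ^ f(*xs))
    return t_f

def and_function(x1, x2):
    """The boolean "and" of two bits."""
    return x1 & x2

t_and = make_reversible(and_function)

# Applying T_f twice restores every input, so T_f is its own inverse.
all_inputs = list(product((0, 1), repeat=3))
restored = [t_and(*t_and(*bits)) for bits in all_inputs]

# By contrast, the plain "and" maps three different inputs to the same output 0,
# so the input cannot be recovered from the output alone.
preimages_of_zero = [xs for xs in product((0, 1), repeat=2) if and_function(*xs) == 0]
```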

Comment. While this is the prevailing representation of functions in quantum computing, it should be mentioned that in some cases, a different representation is preferable; see, e.g., [11].

What is invariance (symmetry)
In many cases, there are some natural transformations that do not change the system. This "not changing" is called invariance. For example, suppose that we have an unsorted list, and we are looking for an element with a certain property in this list.

For convenience, we can denote one of the list's elements by $s_1$, another one by $s_2$, [...] invariances.

In physics, invariance is called symmetry, since it naturally generalizes geometric invariances (symmetries); see, e.g., [33?].

Usually, when we talk about "optimal", we mean that on the set of all alternatives [...] i.e., in general, probabilistic, algorithms for solving a given problem, we may want to maximize the probability $f(A)$ that the algorithm $A$ will lead to the desired solution. [...]

• when $A$ and $A'$ are equally good; we will denote it by $A \sim A'$.

In precise terms, by an optimality criterion on the set $\mathcal{A}$ of all alternatives, we mean a pair of relations $<$ and $\sim$ with the following natural properties:

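Lemma. For every final $T$-invariant optimality criterion, its optimal alternative $A_{\rm opt}$ is also $T$-invariant, i.e., $T(A_{\rm opt}) = A_{\rm opt}$.

Proof. The fact that $A_{\rm opt}$ is optimal means that for every $A \in \mathcal{A}$, we have either $A < A_{\rm opt}$ or $A \sim A_{\rm opt}$. In particular, this is true for $T^{-1}(A)$: either $T^{-1}(A) < A_{\rm opt}$ or $T^{-1}(A) \sim A_{\rm opt}$. Since the criterion is $T$-invariant, we conclude that either $A < T(A_{\rm opt})$ or $A \sim T(A_{\rm opt})$. This is true for every alternative $A$, which means that the alternative $T(A_{\rm opt})$ is also optimal. However, the optimality criterion is final, which means that there is only one optimal alternative. Thus, indeed, $T(A_{\rm opt}) = A_{\rm opt}$. The lemma is proven.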
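The statement of the lemma can be illustrated on a toy example. In the sketch below, everything is a hypothetical illustration rather than part of the original argument: the alternatives are the integers from -3 to 3, the transformation is $T(a) = -a$, and the criterion compares the values of the $T$-invariant objective function $-a^2$, which has a unique maximum.

```python
# Hypothetical toy setup: alternatives are integers -3..3, and T maps a to -a.
alternatives = range(-3, 4)

def transform(a):
    """The symmetry transformation T."""
    return -a

def quality(a):
    """Objective function; it is T-invariant because quality(-a) == quality(a)."""
    return -(a ** 2)

# The criterion is T-invariant: a comparison between two alternatives
# does not change when T is applied to both of them.
criterion_is_invariant = all(
    (quality(a) < quality(b)) == (quality(transform(a)) < quality(transform(b)))
    for a in alternatives for b in alternatives
)

# The criterion is final: exactly one alternative is optimal.
best_value = max(quality(a) for a in alternatives)
optimal = [a for a in alternatives if quality(a) == best_value]
```

In this toy case, the single optimal alternative is 0, and it is mapped to itself by the transformation, in line with the lemma.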

Due to this lemma, if the optimality criterion is $T$-invariant, then to find the optimal alternative, it is sufficient to consider only $T$-invariant alternatives. Let us start checking optimality with Grover's algorithm.

[...] this element has the desired property. We want to find an element that has this property.

For simplicity, we will consider the case when there is exactly one such element $i_0$.

Let us describe this problem in quantum computing-related terms. What we want is an index $i_0$ of the desired element. In quantum computing terms, this means that we want to end up in a state $|i_0\rangle$. As we have mentioned, in general, quantum processes are probabilistic, so instead of the exact state $|i_0\rangle$, we may end up in a superposition state:

$$c_1 \cdot |1\rangle + \ldots + c_n \cdot |n\rangle.$$

In quantum terms, the algorithm that checks whether a given element has the [...]

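In terms of the Hadamard states $|0'\rangle = H|0\rangle$ and $|1'\rangle = H|1\rangle$, we get the following: [...] So, for $|0'\rangle$, nothing changes, and for $|1'\rangle$, the additional bit $|1'\rangle$ remains the same, but the previous state (13), i.e., the superposition $c_1 \cdot |1\rangle + \ldots + c_n \cdot |n\rangle$, changes to:

$$\sum_{i \ne i_0} c_i \cdot |i\rangle - c_{i_0} \cdot |i_0\rangle. \tag{14}$$

Let us denote this transformation from (13) to (14) by $U$.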
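The sign change described here can be checked with a small numerical sketch. It assumes that the checking step acts on an index $i$ and an additional bit $y$ as $|i, y\rangle \to |i, y \oplus f(i)\rangle$, where $f(i) = 1$ only for the desired element, and that the additional bit is prepared in the state $(|0\rangle - |1\rangle)/\sqrt{2}$. The list size, the marked index, and all names below are hypothetical.

```python
import math

def apply_check(state, f):
    """Apply |i, y> -> |i, y XOR f(i)> to a state given as {(i, y): amplitude}."""
    result = {}
    for (i, y), amplitude in state.items():
        key = (i, y ^ f(i))
        result[key] = result.get(key, 0.0) + amplitude
    return result

n, i0 = 4, 2                       # hypothetical list size and marked index

def f(i):
    """Return 1 if element i has the desired property, 0 otherwise."""
    return 1 if i == i0 else 0

c = [0.5, 0.5, 0.5, 0.5]           # coefficients c_i of the index states
h = 1 / math.sqrt(2)
additional_bit = {0: h, 1: -h}     # the state (|0> - |1>)/sqrt(2)

state = {(i, y): c[i] * a for i in range(n) for y, a in additional_bit.items()}
after = apply_check(state, f)

# Read off the new coefficients of the index states, relative to the
# unchanged state of the additional bit.
new_c = [after[(i, 0)] / additional_bit[0] for i in range(n)]
```

Only the coefficient of the marked index changes sign, while the additional bit stays in the same state.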

Our goal is to start with some state, and, by applying this transformation $U$ and [...] is therefore reasonable to require that our optimality criterion is invariant with respect to all permutations. Due to the above Lemma, this implies that the optimal algorithm should also be permutation-invariant, in particular:

• that the initial state should be permutation-invariant, and
• that all transformations $S$ should be permutation-invariant.

The fact that the initial state is permutation-invariant means that $c_i = c_{i'}$ for all $i$ and $i'$, since every two indices $i$ and $i'$ can be obtained from each other by an appropriate permutation. Thus, the initial state must have the form

$$c_1 \cdot |1\rangle + \ldots + c_1 \cdot |n\rangle$$

for some $c_1$. Due to the normalization requirement (2), we have $|c_1| = 1/\sqrt{n}$. In quantum mechanics, states differing by a constant factor are considered the same state, so we can simply take $c_1 = 1/\sqrt{n}$. Then the initial state takes the form:

$$\frac{1}{\sqrt{n}} \cdot |1\rangle + \ldots + \frac{1}{\sqrt{n}} \cdot |n\rangle.$$

This is exactly the initial state of Grover's algorithm.

A general transformation is described by a matrix $S_{ij}$. For this matrix, permutation invariance means that all the elements $S_{ii}$ are equal to each other, by a similar argument as before. Let us denote this common value by $a$. Similarly, all the elements $S_{ij}$ with $i \ne j$ should also be equal to each other. Let us denote this common value by $b$. In these terms, the corresponding linear transformation transforms the vector $c_i$ into a new vector

$$c'_i = a \cdot c_i + b \cdot \sum_{j \ne i} c_j.$$

This expression can be equivalently described as

$$c'_i = (a - b) \cdot c_i + b \cdot C, \quad \text{where } C = \sum_{j=1}^n c_j. \tag{17}$$

We want to make sure that this transformation preserves the fact that the probabilities add up to 1, i.e., that

$$\sum_{j=1}^n |c'_j|^2 = 1. \tag{18}$$
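The effect of such a permutation-invariant transformation can be explored numerically. The sketch below is an illustration under stated assumptions: it uses the values $a = 1 - 2/n$ and $b = -2/n$ that the derivation following this passage arrives at, models the transformation $U$ as a sign change of the marked coefficient, and uses the standard Grover iteration count of about $(\pi/4) \cdot \sqrt{n}$. The list size, the marked index, and the function names are hypothetical.

```python
import math

def invariant_transform(c, a, b):
    """Permutation-invariant map: c'_i = a*c_i + b*sum_{j != i} c_j = (a - b)*c_i + b*C."""
    total = sum(c)
    return [(a - b) * c_i + b * total for c_i in c]

def flip_marked(c, i0):
    """Transformation U: change the sign of the coefficient of the marked element."""
    return [-c_i if i == i0 else c_i for i, c_i in enumerate(c)]

def norm_squared(c):
    """Sum of the squared (real) coefficients, i.e., the total probability."""
    return sum(c_i ** 2 for c_i in c)

n, i0 = 16, 5                        # hypothetical list size and marked index
a, b = 1 - 2 / n, -2 / n             # values obtained later in the derivation
c = [1 / math.sqrt(n)] * n           # permutation-invariant initial state

steps = math.floor(math.pi / 4 * math.sqrt(n))   # standard Grover iteration count
for _ in range(steps):
    c = invariant_transform(flip_marked(c, i0), a, b)

success_probability = c[i0] ** 2
```

For these values, the total probability stays equal to 1 after every step, and after three iterations the probability of observing the marked index is above 0.96, compared with 1/16 for a random guess.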
As we have mentioned earlier, it is sufficient to consider situations in which all the coefficients $c_i$ are real numbers. In this case, $|c'_j|^2 = (c'_j)^2$, and, due to (17), the condition (18) takes the form

$$\sum_{j=1}^n \left((a - b) \cdot c_j + b \cdot C\right)^2 = 1,$$

i.e., due to the fact that $\sum_j c_j^2 = 1$ and $\sum_j c_j = C$,

$$(a - b)^2 + \left(2 \cdot (a - b) \cdot b + n \cdot b^2\right) \cdot C^2 = 1.$$

This equality has to hold for all $C$, so we must have $(a - b)^2 = 1$ and $b \cdot (2 \cdot (a - b) + n \cdot b) = 0$. If $b = 0$, then $a = \pm 1$, so the transformation $S$ either leaves the state unchanged or multiplies all the coefficients $c_i$ by $-1$, i.e., in effect, also leaves the state unchanged. So, to get a non-trivial transformation, we need to take $b \ne 0$. In this case, $2 \cdot (a - b) + n \cdot b = 0$; since, without losing generality, we can take $a - b = 1$, we get $b = -2/n$. Thus, $a = (a - b) + b = 1 - 2/n$, and this transformation takes the form

$$c'_i = c_i - \frac{2}{n} \cdot \sum_{j=1}^n c_j.$$

This is also exactly the transformation used in Grover's algorithm (up to an overall sign, which does not affect the state). [...]

[...] characteristics. This is how, e.g., 3D printing works. This solution is based on the fact that in classical (non-quantum) physics we can, in principle, measure all characteristics of a system without changing it.

The problem is that in quantum physics, such a straightforward approach is not [...] if we start with a state $\alpha_0 \cdot |0\rangle + \alpha_1 \cdot |1\rangle$, all we get after the measurement is either 0 or 1, with no way to reconstruct the values $\alpha_0$ and $\alpha_1$ that characterize the original state.

Since we cannot use the usual straightforward approach for communicating a state, we need to use an indirect approach. This approach is known as teleportation.

[...] be naturally described if we associate these two possible states with 0 and 1.

In these terms, the problem is as follows:

• As a result of this process, Bob should have the same state.

Let us indicate states corresponding to Alice with a subscript $A$, and states corresponding to Bob with a subscript $B$. The state (22) is not exclusively Alice's and it is not exclusively Bob's, so to describe this state, we will use the next letter, the letter $C$. In these terms, Alice has a state

$$\alpha_0 \cdot |0_C\rangle + \alpha_1 \cdot |1_C\rangle \tag{23}$$

that she wants to communicate to Bob. To make teleportation possible, Alice and Bob prepare a special entangled state: [...] (24)

In the beginning, the state $C$ is independent of $A$ and $B$. So, the joint state is a tensor product of the $AB$-state (24) and the $C$-state (23): [...] (25)

In the first stage of the standard teleportation algorithm, Alice performs a measurement procedure on the parts $A$ and $C$ which are available to her. In general, to describe the possible results of measuring a state $s$ with respect to linear spaces $L_i$, we need to represent $s$ as the sum

$$s = \sum_i s_i, \quad \text{with } s_i \in L_i. \tag{26}$$

In the standard teleportation algorithm, we perform the measurement with respect to the following four linear spaces $L_i = L_B \otimes t_i$, where $L_B$ is the set of all possible linear combinations of $|0_B\rangle$ and $|1_B\rangle$, and the states $t_i$ have the following form: [...] (27)

One can easily check that the states $t_i$ are orthonormal, hence the spaces $L_i$ are orthogonal.

To describe the result of measuring the state (25) with respect to these linear spaces, we must first represent the state (25) in the form $s = \sum_i s_i$, with $s_i \in L_i$. For this purpose, we can use the fact that, due to the formulas (27), we have [...] (28)

Substituting the expressions (28) into the formula (25), we get [...] So, we get a representation of the type (26), with [...] Here, for each $i$, we have [...] So, with equal probability of $\frac{1}{4}$, we get one of the following four states, and Alice knows which one it is: [...]

Teleportation is possible because we have prepared an entangled state (24), i.e., a state $s_{AB}$ in which the states of Alice and Bob are not independent, i.e., a state that does not have the form $s_A \otimes s_B$. However, (24) is not the only possible entangled state. Let us consider, instead, a general joint state of two qubits: [...] (24a)

What will happen if we use this more general entangled state instead of the one that is used in the known teleportation algorithm? For the state (24a), the joint state of all three subsystems has the form [...] Substituting expressions (28) into this formula, we get [...] and $S_2, \ldots$ are described by similar expressions.

This means that after the measurement, Bob will have the normalized state $S_1 / \|S_1\|$.
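The states that Bob ends up with can be computed numerically for any shared two-qubit state. The sketch below makes several assumptions that are not taken from the text: the measurement on the pair $(A, C)$ is assumed to use the four standard Bell states, the shared pair in the example is assumed to be the standard state $(|0_A 0_B\rangle + |1_A 1_B\rangle)/\sqrt{2}$, and all names and the example values of $\alpha_0, \alpha_1$ are hypothetical. The source's own formulas for the entangled state and for the measurement states may differ in labeling or signs.

```python
import math

R = 1 / math.sqrt(2)

# Assumed measurement basis on the pair (A, C): the four standard Bell states,
# stored as {(a, c): amplitude}.
BELL_STATES = [
    {(0, 0): R, (1, 1): R},
    {(0, 0): R, (1, 1): -R},
    {(0, 1): R, (1, 0): R},
    {(0, 1): R, (1, 0): -R},
]

def bob_states(alpha, entangled):
    """Return Bob's unnormalized states S_i, one per measurement outcome.

    alpha     -- (alpha_0, alpha_1), the state of C to be communicated
    entangled -- {(a, b): amplitude}, the shared state of A and B
    """
    results = []
    for t in BELL_STATES:
        s = [0j, 0j]
        for (a, c), t_amp in t.items():
            for (a2, b), e_amp in entangled.items():
                if a2 == a:
                    # Project alpha_c * e_ab |c>|a>|b> onto the state t of (A, C).
                    s[b] += complex(t_amp).conjugate() * alpha[c] * e_amp
        results.append(s)
    return results

def norm(s):
    """Euclidean norm of a list of complex coefficients."""
    return math.sqrt(sum(abs(x) ** 2 for x in s))

alpha = (0.6, 0.8j)                        # hypothetical state to teleport
standard_pair = {(0, 0): R, (1, 1): R}     # assumed standard entangled pair
states = bob_states(alpha, standard_pair)
probabilities = [norm(s) ** 2 for s in states]
normalized = [[x / norm(s) for x in s] for s in states]
```

Under these assumptions, each outcome has probability 1/4, and each normalized state differs from the original one only by a swap of the coefficients and/or a sign change, which Bob can undo once Alice tells him the outcome. Passing a different dictionary as `entangled` shows how the outcome probabilities and Bob's states change for a more general shared state.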

To perform teleportation, we need to transform this state into the original state $\alpha_0 \cdot |0_B\rangle + \alpha_1 \cdot |1_B\rangle$. Thus, the transformation from the resulting state $S_1 / \|S_1\|$ to the original [...]

In terms of these new states, the entangled state (24a) takes the form [...] From the requirement that the squares of the absolute values of all the coefficients add up to 1, we conclude that $2 \cdot \mathrm{const}^2 = 1$. Then $\mathrm{const} = \frac{1}{\sqrt{2}}$ and the entangled state takes the familiar form [...] This is exactly the entangled state used in the standard teleportation algorithm. So, we can make the following conclusion. [...]

[...] is that often, the existing optimization techniques lead to a local optimum. One way to avoid local optima is annealing: whenever we find ourselves in a possibly local optimum, we jump out with some probability and continue the search for the true optimum. Since quantum processes are probabilistic, a natural way to organize such a probabilistic perturbation of the deterministic optimization is to use quantum effects, i.e., to perform quantum annealing. This idea was first proposed in [10,19] and has been used successfully [...]

It turns out that often, quantum annealing works much better than the non-quantum one; see, e.g., [4,5,6,8,18,21,22,24,28,31,32,34]. Quantum annealing is the main technique behind the only commercially available computational devices that use quantum effects, D-Wave computers; see, e.g., [4,21,32].

The efficiency of quantum annealing depends on the proper selection of the annealing schedule, i.e., the schedule that describes how the perturbations decrease with time.

Empirically, it has been found that two schedules work best: power law and exponential ones. In this section, following [14], we prove that these two schedules are indeed optimal (in some reasonable sense).

In general, the state of a quantum system is described by a complex-valued function $\psi(t)$ (known as the wave function), and the dynamics of a quantum system is described by Schroedinger's equation

$$\mathrm{i} \cdot \hbar \cdot \frac{\partial \psi}{\partial t} = H\psi, \tag{30}$$

where, as before, $\mathrm{i} \stackrel{\rm def}{=} \sqrt{-1}$ and $H$ is a corresponding linear operator. In these terms, annealing-type modification means adding additional terms, decreasing with time, to the operator $H$, i.e., replacing the original equation (30) with the modified equation

$$\mathrm{i} \cdot \hbar \cdot \frac{\partial \psi}{\partial t} = H\psi + \gamma(t) \cdot H_0\psi, \tag{31}$$

where $H_0$ describes the deviation, and $\gamma(t)$ monotonically tends to 0 as $t$ increases.

The efficiency of quantum annealing strongly depends on the proper selection of the annealing schedule, i.e., the dependence of $\gamma(t)$ on time $t$. Empirically, depending on the specific optimization problem, two schedules work the best:

• power-law schedules, and
• exponential schedules.

In this section, we provide a theoretical proof that these schedules are indeed optimal.

[...] we will have $C \cdot H$, where $C$ is the ratio of the units. If for the original operator $H$ the best schedule was $\gamma(t)$, then for the new operator $C \cdot H$, the best schedule is $C \cdot \gamma(t)$, since the corresponding equation

$$\mathrm{i} \cdot \hbar \cdot \frac{\partial \psi}{\partial t} = C \cdot H\psi + C \cdot \gamma(t) \cdot H_0\psi \tag{32}$$

is equivalent to the original equation (31) if we re-scale the time, i.e., consider $t/C$ instead of the original time $t$.

Since, as we have mentioned, the choice of the energy unit is rather arbitrary, this means that we cannot select a single annealing schedule $\gamma(t)$: with each such optimal schedule, in different energy units, a schedule $C \cdot \gamma(t)$ is optimal. Thus, we can only [...] is replaced by a new value $\lambda \cdot t$. For example, if we replace minutes by seconds, then 2 minutes becomes $60 \cdot 2 = 120$ seconds.

The numerical value of time also depends on the choice of the starting point. If we replace the original starting point with the one which is $t_0$ units earlier, then all numerical values $t$ are replaced with shifted values $t + t_0$. It also makes sense to require that the relative quality of two families not depend on the choice of the starting point, i.e., that if $\{C \cdot \gamma_1(t)\}_{C>0} < \{C \cdot \gamma_2(t)\}_{C>0}$, then we should have $\{C \cdot \gamma_1(t + t_0)\}_{C>0} < \{C \cdot \gamma_2(t + t_0)\}_{C>0}$.

According to our Lemma, once we have assumed that the optimality criterion is invariant, the optimal family must be invariant.

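For invariance with respect to changing the unit of time, this means that

$$\{C \cdot \gamma(\lambda \cdot t)\}_{C>0} = \{C \cdot \gamma(t)\}_{C>0}.$$

This equality means, in particular, that the function $\gamma(\lambda \cdot t)$ from the first family belongs to the second family, i.e., that for every $t$ and $\lambda$, we have $\gamma(\lambda \cdot t) = C(\lambda) \cdot \gamma(t)$ for some $C$ depending on $\lambda$. It is known (see, e.g., [1]) that the only monotonic solutions to this functional equation are power laws $\gamma(t) = A \cdot t^a$.

Similarly, invariance with respect to changing the starting point means that $\gamma(t + t_0) = C(t_0) \cdot \gamma(t)$ for some $C$ depending on $t_0$, and the only monotonic solutions to this functional equation are exponential laws $\gamma(t) = A \cdot \exp(a \cdot t)$.

Thus, the power law and the exponential law are indeed the only invariant functions, and thus, the optimal law must be either a power law or an exponential law.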
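The two invariance properties can be checked numerically. The following sketch uses hypothetical parameter values and helper names; it tests whether changing the unit of time ($t \to \lambda \cdot t$) or the starting point ($t \to t + t_0$) multiplies a schedule by a single constant, i.e., keeps it within the same family $\{C \cdot \gamma(t)\}_{C>0}$.

```python
import math

def power_law(t, A=2.0, a=-1.5):
    """Power-law schedule gamma(t) = A * t**a (A and a are hypothetical values)."""
    return A * t ** a

def exponential_law(t, A=2.0, a=-0.3):
    """Exponential schedule gamma(t) = A * exp(a * t) (A and a are hypothetical values)."""
    return A * math.exp(a * t)

def ratio_is_constant(gamma, change, times, tol=1e-9):
    """Check that gamma(change(t)) / gamma(t) is the same for all t, i.e., that
    the transformed schedule equals C * gamma(t) for a single constant C."""
    ratios = [gamma(change(t)) / gamma(t) for t in times]
    return all(math.isclose(r, ratios[0], rel_tol=tol) for r in ratios)

def rescale(t):
    """Change of the time unit with lambda = 60 (e.g., minutes to seconds)."""
    return 60 * t

def shift(t):
    """Change of the starting point with t_0 = 3."""
    return t + 3.0

times = [0.5, 1.0, 2.0, 7.0]

power_scale_invariant = ratio_is_constant(power_law, rescale, times)
exp_shift_invariant = ratio_is_constant(exponential_law, shift, times)
power_shift_invariant = ratio_is_constant(power_law, shift, times)
exp_scale_invariant = ratio_is_constant(exponential_law, rescale, times)
```

In this check, the power law is invariant under re-scaling but not under shifts, and the exponential law is invariant under shifts but not under re-scaling, which matches the two functional equations above.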

Author Contributions: All three authors contributed equally to the paper.