The Quantum Esscher Transform
Abstract
The Esscher Transform is a tool of broad utility in various domains of applied probability. It provides the solution to a constrained minimum relative entropy optimization problem. In this work, we study the generalization of the Esscher Transform to the quantum setting. We examine a relative entropy minimization problem for a quantum density operator, potentially of wide relevance in quantum information theory. The form of the resulting solution motivates us to define the quantum Esscher Transform, which subsumes the classical Esscher Transform as a special case. Envisioning potential applications of the quantum Esscher Transform, we also discuss its implementation on fault-tolerant quantum computers. Our algorithm is based on the modern techniques of block-encoding and quantum singular value transformation (QSVT). We show that given block-encoded inputs, our algorithm outputs a subnormalized block-encoding of the quantum Esscher transform within accuracy $\epsilon$ in $\tilde{O}(\kappa d \log^2(1/\epsilon))$ queries to the inputs, where $\kappa$ is the condition number of the input density operator and $d$ is the number of constraints.
1 Introduction
In probability and statistics, it is often important to find distributions that have low relative entropy with respect to a given fixed distribution. In addition, further constraints, the form and interpretation of which depend on the problem at hand, are frequently imposed on the target distribution.
An interesting example is the following: consider the process of inferring probability distributions from a set of measurement data. These data play the role of the constraints—they put restrictions on what the true distribution could be—and the available data may not suffice to uniquely determine a probability distribution. In this situation, a common approach is to invoke Jaynes’ maximum entropy principle (MaxEnt) [Jay57]. In essence, MaxEnt advocates that the selected distribution be the one that simultaneously maximizes entropy and satisfies the given constraints.
However, the situation becomes more nuanced if we already possess some knowledge of the system, say, a prior distribution. In such cases, a more refined strategy emerges: the minimum relative entropy principle. As expounded in [SJ80, OP07, ZTF13], this principle, regarded as a generalization of MaxEnt, operates by minimizing the distinguishability (characterized by the relative entropy) between the prior distribution and the distribution to be selected, while respecting the imposed constraints. This systematic approach to incorporating new data makes it fundamental in Bayesian statistics. The updating procedure results in the posterior distribution which reflects the most current understanding of the system in light of the observed data.
In the case when the measurement data is presented in the form of expectation values of selected random variables, the solution to the corresponding relative entropy minimization problem takes the form known as an Esscher Transform. Named after Swedish mathematician and economist Fredrik Esscher, who introduced the concept in 1932 in his work on risk theory [Esc32], the Esscher Transform, also known as ‘exponential tilting’ in statistics, and its various extensions have since then found many applications beyond minimizing relative entropy. Notable examples include option pricing (in mathematical finance) [GS93], importance sampling (for rare-event simulation) [Sie76] and Lévy processes (in financial economics) [HS06]. More recently, it has also made inroads into machine learning [BSS23], in the context of empirical risk minimization.
In this paper, we discuss the extension of the above problem to the quantum setting. We consider the following optimization problem:
$$
\min_{\sigma} \; D(\sigma \,\|\, \rho) \qquad \text{s.t.} \quad \mathrm{Tr}(\sigma X_j) = c_j, \quad j \in [d], \tag{1.1}
$$
where $\rho$ is the a priori state, $\sigma$ ranges over density operators, and $X_j$, $j \in [d]$, are observables with prescribed expectation values $c_j$. Refer to Problem 2.4 for the precise formulation. In the first part of this work, we derive the formal solution to this constrained optimization problem; the statement and its proof are found in Theorem 2.5. The solution methodology is modelled after its classical predecessor, albeit with added technical intricacies. The form of the solution then motivates us to define the quantum Esscher Transform, see Definition 2.8. The quantum Esscher Transform can be viewed as a generalization of the (classical) Esscher Transform, and indeed subsumes the latter as a special case. In the second part of this work, with an eye toward potential applications, we discuss the implementation of the quantum Esscher Transform on fault-tolerant quantum computers. Our algorithm is based on the modern techniques of block-encoding and the quantum singular value transformation (QSVT) [GSLW19, MRTC21]. As an input model we consider purifications of the density operator $\rho$ and block-encodings of the operators $X_j$. The main algorithm is Algorithm 1, whose complexity is discussed in Theorem 4.3. The quantum Esscher transform could find applications in quantum analogues of problems in statistics, machine learning, and finance.
1.1 Preliminaries and notation
We define the following notations. Let be the set of positive natural numbers. For , . Here , , and refer to the spectral, -, - and trace norms respectively. The symbol denotes component-wise product, e.g. for vectors , for matrices . Throughout this paper, will be base . For convenience, when calculus is involved we shall differentiate as if it were base . For a matrix we write to mean the eigenvalues of are in . Thus, means is positive semidefinite. We denote a Hilbert space by , if its dimension is to be explicitly specified, the set of linear operators on by , and the set of density operators on by . Let . The kernel of is and the support of is . Note that . denotes the -qubit identity operator, i.e. it is of size . We use to hide polylog factors, i.e., . We use to define expression in terms of .
A probability space is denoted by , where is the sample space, is the -algebra over , and is the probability measure on . While all the discussions in our work are well-defined for general probability spaces, for our purposes we shall restrict our discussion to finite sample spaces, i.e., , and set . In this setting, can be viewed as a -dimensional vector residing in the hypercube , with components , and normalization . Note that technically, a probability measure is a function on the -algebra , not . Since we are dealing with a finite sample space here, knowing for all gives us full knowledge of , from the additivity property of measures. Thus we can and shall simply view as a function on and write in place of . Finally, given probability measures and , we say is absolutely continuous with respect to (written ) if for all .
2 Quantum Esscher Transform
2.1 Esscher Transform
The Esscher Transform was first defined by F. Esscher in his work on risk theory [Esc32]. Let be a probability mass function, where and . The function is also a probability mass function, and it is called the Esscher Transform of with parameter . We can replace probability mass functions with probability density functions (accordingly, ).
The Esscher Transform is a map from and onto the space of probability mass/density functions, as . In this work, we never invoke and simply call the Esscher Transform of , in the same spirit as the Fourier Transform. In the context of probability theory, let be a probability space and a random dimensional vector. This setting motivates the equivalent definition (see Remark 2.3 below) of Esscher Transforms for measures/distributions.
Definition 2.1 (Esscher Transform for probability distributions).
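Given a probability distribution $P$ on a finite sample space $\Omega$, a random variable $X: \Omega \to \mathbb{R}^d$ and a parameter $\theta \in \mathbb{R}^d$, the probability distribution
$$
Q(\omega) = \frac{P(\omega)\, e^{\theta \cdot X(\omega)}}{\mathbb{E}_P\!\left[e^{\theta \cdot X}\right]}, \qquad \omega \in \Omega,
$$
is called the Esscher Transform of $P$ with parameter $\theta$, with respect to $X$. For brevity, we say $Q$ is the $(X, \theta)$-Esscher Transform of $P$.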
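As a small numerical illustration of Definition 2.1, the following sketch tilts a distribution on a three-point sample space. The function name `esscher_transform` and the example values are illustrative choices and do not appear in the original text.

```python
import math

def esscher_transform(P, X, theta):
    """Return Q(w) = P(w) exp(theta . X(w)) / E_P[exp(theta . X)].

    P     : list of probabilities P(w), one entry per sample point w
    X     : list of d-dimensional tuples; X[w] is the value of X at w
    theta : d-dimensional tuple of tilting parameters
    """
    weights = [
        p * math.exp(sum(t * x for t, x in zip(theta, xw)))
        for p, xw in zip(P, X)
    ]
    Z = sum(weights)  # normalisation constant E_P[exp(theta . X)]
    return [w / Z for w in weights]

# Hypothetical example: uniform prior on {0, 1, 2}, X(w) = w, theta = 0.5.
P = [1 / 3, 1 / 3, 1 / 3]
X = [(0.0,), (1.0,), (2.0,)]
Q = esscher_transform(P, X, (0.5,))

mean_P = sum(p * x[0] for p, x in zip(P, X))
mean_Q = sum(q * x[0] for q, x in zip(Q, X))
```

A positive parameter shifts probability mass toward larger values of the random variable, so `mean_Q` exceeds `mean_P`, while a zero parameter returns the original distribution.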
This definition is connected to the following problem. Fix $c \in \mathbb{R}^d$. When and how can we derive from $P$ another probability measure $Q$ such that the expectation of $X$ with respect to $Q$, $\mathbb{E}_Q[X]$, is equal to $c$? Among such probability measures, if they exist, how can we find the one that is closest (in some sense) to $P$? Take as a measure of closeness the relative entropy between $Q$ and $P$,
$$
D(Q \,\|\, P) = \sum_{\omega \in \Omega} Q(\omega) \log \frac{Q(\omega)}{P(\omega)}.
$$
The definition of $D(Q\|P)$ requires that $Q$ be absolutely continuous with respect to $P$, otherwise $D(Q\|P) = \infty$. Without loss of generality, we can assume $P$ is strictly positive on $\Omega$. If this were not so, let $\Omega_0 \subset \Omega$ denote the subset on which $P = 0$. Since $Q$ is absolutely continuous w.r.t. $P$, we have $Q = 0$ on $\Omega_0$, so we are reduced to an ‘effective $\Omega$’, namely $\Omega \setminus \Omega_0$, on which $P$ is strictly positive. The aforementioned question can then be cast as an optimization problem with multiple constraints:
$$
\min_{Q} \; D(Q \,\|\, P) \qquad \text{s.t.} \quad \mathbb{E}_Q[X_j] = c_j, \quad j \in [d], \qquad \sum_{\omega \in \Omega} Q(\omega) = 1. \tag{2.1}
$$
Note that there are $d+1$ constraints on $Q$, hence in feasible, non-redundant cases we have $d + 1 \le |\Omega|$. We have the following solution to the optimization problem.
Theorem 2.2.
The proof is elaborated in Appendix A.
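To make the role of the tilting parameter concrete, here is a minimal sketch for a single constraint ($d = 1$), assuming a strictly positive prior. It finds the parameter by bisection, using the fact that the tilted mean is nondecreasing in the parameter, and compares the relative entropy of the tilted distribution with that of another feasible distribution. The helper names, the example numbers, and the use of bisection are illustrative assumptions; natural logarithms are used, which rescales the relative entropy but does not change the minimizer.

```python
import math

def tilt(P, x, theta):
    """Exponentially tilted distribution Q(w) proportional to P(w) exp(theta x(w))."""
    w = [p * math.exp(theta * xi) for p, xi in zip(P, x)]
    Z = sum(w)
    return [wi / Z for wi in w]

def mean(Q, x):
    return sum(q * xi for q, xi in zip(Q, x))

def solve_theta(P, x, target, lo=-50.0, hi=50.0, iters=200):
    """Bisection for theta such that the tilted mean equals target.

    Assumes min(x) < target < max(x) and that [lo, hi] brackets the solution.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(tilt(P, x, mid), x) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def relative_entropy(Q, P):
    """D(Q || P) with natural logarithms, for strictly positive P."""
    return sum(q * math.log(q / p) for q, p in zip(Q, P) if q > 0)

# Hypothetical example data.
P = [0.5, 0.3, 0.2]
x = [0.0, 1.0, 2.0]
target = 1.2

theta_star = solve_theta(P, x, target)
Q_star = tilt(P, x, theta_star)

# Another distribution with the same mean: 0.4 * 1 + 0.4 * 2 = 1.2.
Q_alt = [0.2, 0.4, 0.4]
```

In this example the tilted distribution satisfies the constraint and has relative entropy no larger than that of `Q_alt`, in line with Theorem 2.2.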
Remark 2.3.
Let us comment on a subtlety. Above, we have called the Esscher Transform of . Recall that the Esscher Transform as originally defined by Esscher pertains to probability mass/density functions instead of measures. Here we show that using the same terminology for probability measures is well-justified (at least for the case when is discrete). The random variable induces from the probability measure the probability mass function on . Assume we have, for probability measures and random variable , that
Then for the probability mass functions and we have
i.e., is the Esscher Transform of as defined above.
2.2 Quantum version
2.2.1 Problem statement
Many entities in classical probability theory have meaningful generalizations in quantum theory. For example, sample spaces, probability distributions and random variables find their respective counterparts in Hilbert spaces, density operators and observables (the latter also include the former as special instances). The quantum counterpart of the relative entropy is the quantum relative entropy,
$$
D(\sigma \,\|\, \rho) = \mathrm{Tr}\left[\sigma \left(\log \sigma - \log \rho\right)\right],
$$
defined for density operators $\sigma$ and $\rho$. As in the classical case, the definition of $D(\sigma\|\rho)$ imposes constraints on $\sigma$ and $\rho$ in order to have $D(\sigma\|\rho) < \infty$. Namely, $\mathrm{supp}(\sigma) \subseteq \mathrm{supp}(\rho)$ (see Chapter 11, [Wil13]) or equivalently, $\ker(\rho) \subseteq \ker(\sigma)$. Using terminology from measure theory, if this condition is satisfied we say $\sigma$ is absolutely continuous with respect to $\rho$ ($\sigma \ll \rho$). This is analogous to the absolute continuity between probability distributions in classical probability theory. Now we formally state the quantized version of the optimization problem (2.1).
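The support condition can be checked numerically. The sketch below evaluates the quantum relative entropy through eigendecompositions and returns infinity when the first argument has weight on the kernel of the second. The function name, the tolerance, and the use of natural logarithms are assumptions made for illustration.

```python
import numpy as np

def quantum_relative_entropy(sigma, rho, tol=1e-12):
    """Tr[sigma (ln sigma - ln rho)], or inf if supp(sigma) is not inside supp(rho)."""
    s_vals, _ = np.linalg.eigh(sigma)
    r_vals, r_vecs = np.linalg.eigh(rho)

    # Support check: sigma must vanish on the kernel of rho.
    ker = r_vecs[:, r_vals <= tol]
    if ker.shape[1] > 0 and np.linalg.norm(ker.conj().T @ sigma @ ker) > tol:
        return np.inf

    # Tr[sigma ln sigma], with the convention 0 ln 0 = 0.
    pos = s_vals > tol
    term1 = np.sum(s_vals[pos] * np.log(s_vals[pos]))

    # Tr[sigma ln rho] = sum_k ln(r_k) <r_k| sigma |r_k> over the support of rho.
    overlaps = np.real(np.einsum("ik,ij,jk->k", r_vecs.conj(), sigma, r_vecs))
    supp = r_vals > tol
    term2 = np.sum(overlaps[supp] * np.log(r_vals[supp]))
    return float(term1 - term2)

# Hypothetical qubit examples.
mixed = np.diag([0.5, 0.5])
pure = np.diag([1.0, 0.0])

d_finite = quantum_relative_entropy(pure, mixed)    # supports are nested
d_infinite = quantum_relative_entropy(mixed, pure)  # support condition violated
```

Here `d_finite` equals $\ln 2$, whereas `d_infinite` is infinite because the maximally mixed state is not absolutely continuous with respect to the pure state.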
Problem 2.4.
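Let $\mathcal{H}$ be an $n$-dimensional Hilbert space and $\rho \in \mathcal{D}(\mathcal{H})$ be a density operator. With $d \in \mathbb{N}$, for $j \in [d]$, let $X_j$ be an observable with $\lambda_{\min}(X_j)$ and $\lambda_{\max}(X_j)$ denoting its smallest and largest eigenvalue respectively. For $c \in \mathbb{R}^d$ with each $c_j$ lying between $\lambda_{\min}(X_j)$ and $\lambda_{\max}(X_j)$, solve
$$
\min_{\sigma \in \mathcal{D}(\mathcal{H})} \; D(\sigma \,\|\, \rho) \qquad \text{s.t.} \quad \mathrm{Tr}(\sigma X_j) = c_j, \quad j \in [d]. \tag{2.2}
$$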
Here denotes a generic eigenvalue of . Note that because are Hermitian, is real. As before, we require , otherwise the constraints cannot be satisfied. Finally, we can assume WLOG that . This amounts to dividing the constraint throughout by if necessary.
2.2.2 Solution
Before delving into the solution, let us briefly comment on a few possible concerns. First, requires taking the logarithm of , which poses a problem if is not strictly positive definite. This issue is circumvented if, as mentioned above, . The analysis becomes relatively straightforward if we partition the Hilbert space into suitable subspaces and examine over them separately. To this end, we introduce the following notation. Let be a subspace of . For , denote , where is the projector onto .
Second, as in the classical case, we hope to solve this optimization problem using Lagrange multipliers. With a fixed , is a real-valued function of complex matrices. How do we optimize such functions? In principle we could convert everything into real numbers—, so we could view as a function of real parameters and implement conventional optimization methods. However, this conversion is generally tedious, and the resulting expression for cumbersome. The ‘Wirtinger Calculus’ provides a relatively simple methodology for the optimization of such functions, through the use of ‘Wirtinger derivatives’. We state the main definitions and results of this framework in Appendix B.
We have the following result, which partially resolves Problem 2.4:
Theorem 2.5.
The solution to Problem 2.4 takes the form
(2.3) |
where
(2.4) |
The optimal values are to be determined from the constraints
(2.5) |
Proof.
To facilitate the presentation of the solution, certain parts of the argument sequence are collated into lemmas and placed below the main body of this proof.
Step 1. First, for any candidate solution we enforce . By Lemma 2.6, this implies and furthermore enables the decomposition of into a direct sum: . With this decomposition, we can consider the trace of the operators over just the subspace . More specifically, 111Recall that for any , , so . and
Thus, we can replace in Problem 2.4 by , and the operators by their restrictions to . Note that is positive definite.
Step 2. Next we obtain the form of . For ease of presentation let us simply denote by . With now positive definite, is well-defined. Now we invoke Proposition B.1 to extract the optimal by setting .
Set up the Lagrangian
(2.6) |
where and are the Lagrange multipliers. Making use of Propositions B.2 and B.3, setting to zero gives
It remains to determine from the constraints . Plugging in the above expression for into the constraints we have
Step 3. Now we show that as given in Eq. 2.4 indeed minimizes . But this follows easily from Lemma 2.7. Furthermore, since is a strictly convex functional of , it can have at most one minimizer in the convex set , thereby showing the uniqueness of . Finally, again by Lemma 2.7 we note that satisfies , where the last equality holds because and share the same minimum/maximum points, provided at those points. ∎
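The following sketch illustrates the statement of Theorem 2.5 numerically for a single qubit and a single constraint, assuming a full-rank prior so that its logarithm is defined without restricting to the support. The parameter is found by bisection, which relies on the expectation value being nondecreasing in the parameter (a consequence of the convexity of the log-partition function). All names and numbers are hypothetical, and natural logarithms are used.

```python
import numpy as np

def herm_fun(A, f):
    """Apply a scalar function f to a Hermitian matrix A via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * f(vals)) @ vecs.conj().T

def esscher_state(rho, X, theta):
    """exp(ln rho + theta X), normalised to unit trace (rho assumed full rank)."""
    E = herm_fun(herm_fun(rho, np.log) + theta * X, np.exp)
    return E / np.trace(E).real

def solve_theta(rho, X, target, lo=-30.0, hi=30.0, iters=100):
    """Bisection for theta such that Tr(sigma(theta) X) equals target."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.trace(esscher_state(rho, X, mid) @ X).real < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rel_entropy(sigma, rho):
    """Tr[sigma (ln sigma - ln rho)] for full-rank sigma and rho."""
    return np.trace(sigma @ (herm_fun(sigma, np.log) - herm_fun(rho, np.log))).real

# Hypothetical prior with coherences, and the Pauli-Z observable.
rho = np.array([[0.6, 0.2], [0.2, 0.4]])
X = np.diag([1.0, -1.0])
target = 0.5

theta_star = solve_theta(rho, X, target)
sigma_star = esscher_state(rho, X, theta_star)

# Another feasible state: Tr(sigma_alt X) = 0.75 - 0.25 = 0.5.
sigma_alt = np.diag([0.75, 0.25])
```

In this example `sigma_star` meets the constraint and its relative entropy to the prior does not exceed that of `sigma_alt`, as the theorem predicts.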
Lemma 2.6.
Let be normal operators, so that they have spectral decompositions. If , then and can be partitioned into a direct sum:
Proof.
Expand in terms of the eigenbasis of , . Let be the index subset such that , so . We have
where the annihilation of the last three terms comes about because for , .
Note that the partition of an operator into a direct sum over another operator’s ker and supp subspaces does not hold in general. ∎
The following lemma is the quantized version of Lemma A.1. We employ analogous arguments and notation, starting with
Lemma 2.7.
Let and be observables on . Fix . Then for any density operator satisfying , we have
(2.7) |
Moreover the inequality is saturated if for some :
(2.8) |
Proof.
Each gives rise to a corresponding (note that need not be in ). Then for any satisfying , we have
(2.9) | |||||
Since this holds for all , we conclude that . Furthermore, if is such that , then letting and rerunning the same argument sequence above gives
In particular, this also shows that . ∎
Motivated by the form of the state in Theorem 2.5, we make the following definition:
Definition 2.8 (Quantum Esscher Transform).
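Given a density operator $\rho$, observables $X_1, \dots, X_d$ and $\theta \in \mathbb{R}^d$, the density operator
$$
\sigma = \frac{\exp\left(\ln \rho + \sum_{j=1}^{d} \theta_j X_j\right)}{\mathrm{Tr}\left[\exp\left(\ln \rho + \sum_{j=1}^{d} \theta_j X_j\right)\right]}
$$
is called the $(X, \theta)$-quantum Esscher transform of $\rho$.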
Remark 2.9.
The optimal state in Theorem 2.5 is thus a quantum Esscher transform of the prior state. Also note that the quantum Esscher transform subsumes the classical Esscher transform as a special case, wherein the prior state and the observables are all diagonal and thus commute.
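The commuting special case can be verified directly. The sketch below implements the quantum Esscher transform for a full-rank prior and several observables, then checks that diagonal inputs reproduce classical exponential tilting. The function names and example data are hypothetical.

```python
import numpy as np

def herm_fun(A, f):
    """Apply a scalar function f to a Hermitian matrix A via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * f(vals)) @ vecs.conj().T

def quantum_esscher(rho, Xs, thetas):
    """exp(ln rho + sum_j theta_j X_j), normalised to unit trace.

    Assumes rho is full rank so that ln rho is well defined.
    """
    M = herm_fun(rho, np.log)
    for theta, X in zip(thetas, Xs):
        M = M + theta * X
    E = herm_fun(M, np.exp)
    return E / np.trace(E).real

# Classical special case: diagonal prior and diagonal observables.
P = np.array([0.5, 0.3, 0.2])
x1 = np.array([0.0, 1.0, 2.0])
x2 = np.array([1.0, -1.0, 1.0])
thetas = (0.4, -0.7)

sigma = quantum_esscher(np.diag(P), [np.diag(x1), np.diag(x2)], thetas)

# Classical exponential tilting of P for comparison.
w = P * np.exp(thetas[0] * x1 + thetas[1] * x2)
Q = w / w.sum()
```

The diagonal of `sigma` coincides with the classical tilted distribution `Q`, and its off-diagonal entries vanish.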
2.2.3 Connection to quantum imaginary time evolution
Quantum imaginary-time evolution (QITE) is a conceptual tool which relates to the finding of ground states of Hamiltonians [MJE19, MST20]. From the real-time Schrödinger equation one obtains the imaginary-time Schrödinger equation by performing a Wick rotation, i.e. . For general mixed states , the imaginary-time Liouville-von Neumann equation [BK91] is given by
(2.10) |
from which the solution is derived as
(2.11) |
where is the normalisation factor.
In [OP07] it was asserted that under certain conditions, namely ‘when the prior and posterior states are close to each other with respect to the Fisher information metric’, the minimizing relative entropy problem could be solved by formally integrating a ‘quantum trajectory’ equation [OP07, Bra96]. This equation takes on the same form as Eq. 2.10, and thus its solution is given by Eq. 2.11. More specifically, we have
where are the Lagrange multipliers. Here we simply observe that resembles the imaginary-time-evolved state in Eq. (2.11)
if is one-dimensional and after making the substitution . Since the quantum Esscher transform provides an exact solution to the problem, under the aforementioned condition we note
the connection between the quantum Esscher transform and QITE.
Next, we discuss how to implement the quantum Esscher Transform on quantum computers using modern techniques based on block-encodings (BE) and the quantum singular value transformation (QSVT). Before doing so we collate the relevant tools and techniques of the framework in the next section.
3 Overview on block-encodings and quantum singular value transformations
The technique of quantum signal processing [LYC16] and its lifting, via ‘qubitization’, to quantum singular value transformation (QSVT) [LC19, GSLW19] provide a concise way to formulate quantum algorithms, particularly for linear algebraic tasks. This framework has provided more efficient implementations of several existing quantum algorithms, such as Hamiltonian simulation [LC17, LC19], amplitude amplification and estimation [GSLW19, RF23] and quantum linear systems solving [GSLW19], and even led to the discovery of new algorithms. For our purposes, we do not actually need the full generality of QSVT. As our matrices of interest are Hermitian and thus admit spectral decompositions, a relaxed version of QSVT—quantum eigenvalue transformation (QET)—suffices. We direct readers interested in learning more about QSVT to [GSLW19, MRTC21, DMB23].
Definition 3.1 (Block-Encoding).
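Let $A$ be an $s$-qubit matrix, $\alpha, \epsilon \in \mathbb{R}_+$ and $a \in \mathbb{N}$. We say that the $(s+a)$-qubit unitary $U$ is an $(\alpha, a, \epsilon)$-block-encoding of $A$ if
$$
\left\| A - \alpha \left(\langle 0|^{\otimes a} \otimes I_s\right) U \left(|0\rangle^{\otimes a} \otimes I_s\right) \right\| \le \epsilon.
$$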
Remark 3.2.
Note that if is an -BE of , then equivalently it is a -BE of . Also, if we have a -BE of then we also have a -BE of , where and . Making the increment simply corresponds to tacking on an extra -qubit identity operator . More specifically, if is an -BE of then is an -BE of , since
Finally, if is already an error bound, clearly serves as another error bound, albeit a weaker one.
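For intuition, an exact block-encoding of a Hermitian matrix can be written down classically through a standard unitary dilation with one ancilla qubit. The sketch below builds such a unitary, checks that its top-left block is the subnormalized matrix, and confirms that tensoring on an extra identity leaves that block unchanged, as described in Remark 3.2. This dilation is one textbook construction chosen for illustration and is not claimed to be the construction used in this paper; the names and matrices are hypothetical.

```python
import numpy as np

def psd_sqrt(M):
    """Square root of a positive semidefinite Hermitian matrix."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.conj().T

def block_encode(A, alpha):
    """One-ancilla unitary whose top-left block equals A / alpha.

    Requires A Hermitian with spectral norm at most alpha.
    """
    B = A / alpha
    C = psd_sqrt(np.eye(B.shape[0]) - B @ B)  # commutes with B
    return np.block([[B, C], [C, -B]])

# Hypothetical Hermitian matrix with spectral norm below alpha = 2.
A = np.array([[1.0, 0.5], [0.5, -1.0]])
alpha = 2.0
n = A.shape[0]

U = block_encode(A, alpha)
extracted = alpha * U[:n, :n]          # alpha (<0| x I) U (|0> x I)

# Padding with one more ancilla: I_2 (x) U has the same top-left block.
U_padded = np.kron(np.eye(2), U)
extracted_padded = alpha * U_padded[:n, :n]
```

With the ancilla taken as the leading tensor factor, projecting onto its zero state corresponds to reading off the top-left block, which recovers `A` exactly here.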
[GSLW19] provides a construction of exact block-encodings for density operators, assuming access to oracles which prepare the purifications of the density operators:
Definition 3.3 (Purified quantum query-access).
Let be an -qubit density operator. We say has purified quantum query-access if we have access to a -qubit unitary operator , where
prepares , the purification of (i.e. ) with the help of ancilla qubits.
Footnote 2: Theoretically, any -qubit quantum state can be purified with at most ancilla qubits, so one can assume . In practice however, it could be more convenient to use more than ancillas for purification. Thus we make the more relaxed assumption that .
Proposition 3.4 (Block-encoding of density operators – Lemma 45, [GSLW19]).
Let be an -qubit density operator with purified quantum query-access via . Then is a -BE of .
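The construction of Lemma 45 in [GSLW19] has the form $(G^\dagger \otimes I)(I \otimes \mathrm{SWAP})(G \otimes I)$, where $G$ prepares the purification. The sketch below simulates it classically for a small density matrix. Completing the purification vector to a unitary by a QR decomposition of a random matrix is an assumption made purely for simulation, as are all names and the example state.

```python
import numpy as np

def purification_unitary(rho):
    """Unitary G on (ancilla, system) with G|0>|0> a purification of rho, up to a phase."""
    n = rho.shape[0]
    p, V = np.linalg.eigh(rho)
    p = np.clip(p, 0.0, None)
    # psi[k, l] = sqrt(p_k) * v_k[l]; k labels the ancilla, l the system.
    psi = (np.sqrt(p)[:, None] * V.T).reshape(n * n)
    # Complete psi to an orthonormal basis; the first column of Q is psi up to a phase.
    rng = np.random.default_rng(0)
    M = rng.normal(size=(n * n, n * n)).astype(complex)
    M[:, 0] = psi
    G, _ = np.linalg.qr(M)
    return G

def density_block_encoding(rho):
    """(G^dagger x I)(I x SWAP)(G x I) on registers (ancilla, system copy, system)."""
    n = rho.shape[0]
    G = purification_unitary(rho)
    I = np.eye(n)
    SWAP = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            SWAP[i * n + j, j * n + i] = 1.0
    return np.kron(G.conj().T, I) @ np.kron(I, SWAP) @ np.kron(G, I)

# Hypothetical single-qubit state.
rho = np.array([[0.7, 0.1], [0.1, 0.3]])
n = rho.shape[0]
U = density_block_encoding(rho)
block = U[:n, :n]  # first two registers projected onto |0>|0>
```

The extracted block equals the density matrix itself, i.e. the block-encoding is exact with unit subnormalization, matching Proposition 3.4.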
For general matrices which need not be density operators, [CGJ18, GSLW19] also showed how to implement their block-encodings efficiently, assuming the existence of quantum random access memory (QRAM) [GLM08]. Given block-encodings of operators , we can construct block-encodings of their linear combinations and products. For linear combinations, we make use of an auxiliary tool known as a ‘state preparation pair’. Recall that is the /Manhattan norm.
Definition 3.5 (State Preparation Pair).
Let and . The pair of unitaries is called a ()-state-preparation-pair for if
such that and for .
One can think of a state preparation pair as encoding the desired state/vector in the first elements of a length- column vector whose elements are , up to an error of . The role of is to take care of normalization.
Proposition 3.6 (Linear combination of block-encoded matrices – Lemma 52, [GSLW19]).
Let
i. be -qubit operators with respective ()-BEs ,
ii. for ,
iii. be a -state-preparation-pair for .
Then there exists a -BE of , given by
where
is a -qubit unitary.
In Proposition 3.6, the block-encodings are required to share the same subnormalization factor. Later on, we will need a slight generalization of the above result whereby this requirement is dropped.
Proposition 3.7 (Generalized linear combination of block-encoded matrices).
Let
i. be -qubit operators with respective ()-BEs for ,
ii. for ,
iii. be a -state-preparation-pair for .
Then there exists a -BE of , given by
where
is a -qubit unitary.
Proof.
The following is adapted from the proof of Lemma 52, [GSLW19]. By definition of state-preparation pairs (see Definition 3.5), and such that . First we evaluate the block extraction of . We have
In going from the first equality to the second, we have made use of the fact that for state preparation pairs for . The second summand in is thus annihilated. Therefore,
where the last inequality was obtained using . ∎
Remark 3.8.
In the special case where the block-encodings of the ’s have the same subnormalization factors, i.e., for all , we recover Proposition 3.6 from Proposition 3.7 . To see this, observe that if is a -state-preparation-pair for , then , thus implying is a -state-preparation-pair for . According to Proposition 3.6, is then a -BE of . This is in agreement with Proposition 3.7.
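The linear-combination construction can be simulated for two matrices with a common subnormalization factor of one (the setting of Proposition 3.6). In the sketch below, the state-preparation pair is realized by two real rotations whose first columns carry the square roots of the normalized coefficient magnitudes, with the signs placed on the right-hand unitary. The dilation used for the individual block-encodings, the coefficients, and all names are illustrative assumptions.

```python
import numpy as np

def psd_sqrt(M):
    vals, vecs = np.linalg.eigh(M)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.conj().T

def block_encode(A):
    """One-ancilla exact block-encoding of a Hermitian A with spectral norm at most 1."""
    C = psd_sqrt(np.eye(A.shape[0]) - A @ A)
    return np.block([[A, C], [C, -A]])

def rotation_with_first_column(v):
    """Real 2x2 rotation whose first column is the unit vector v."""
    return np.array([[v[0], -v[1]], [v[1], v[0]]])

A0 = np.array([[1.0, 0.0], [0.0, -1.0]])  # Pauli Z
A1 = np.array([[0.0, 1.0], [1.0, 0.0]])   # Pauli X
y = np.array([0.5, -1.5])                 # coefficients of the linear combination

beta = np.abs(y).sum()                    # 1-norm of y
c_left = np.sqrt(np.abs(y) / beta)
c_right = np.sign(y) * c_left             # beta * c_left[j] * c_right[j] = y[j]
P_L = rotation_with_first_column(c_left)
P_R = rotation_with_first_column(c_right)

U0, U1 = block_encode(A0), block_encode(A1)
dim = U0.shape[0]
zeros = np.zeros((dim, dim))
W = np.block([[U0, zeros], [zeros, U1]])  # sum_j |j><j| (x) U_j

T = np.kron(P_L.conj().T, np.eye(dim)) @ W @ np.kron(P_R, np.eye(dim))
n = A0.shape[0]
lin_comb = beta * T[:n, :n]
```

Multiplying the extracted block by the 1-norm of the coefficient vector recovers the desired linear combination, here $0.5\,A_0 - 1.5\,A_1$.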
We now arrive at a milestone within the QSVT framework: the ability to implement block-encodings of polynomials of a matrix from a given block-encoding of the matrix. In many applications, however, the functions of interest are not polynomials. In such cases, one has to first approximate the desired function by a polynomial in order to apply QSVT/QET.
Theorem 3.9 (Polynomial Eigenvalue Transformation – Theorem 56, [GSLW19]).
Let be an -encoding of a Hermitian matrix (equivalently, a -encoding of ) and be a degree- polynomial satisfying on . Then, one can construct a quantum circuit which is a -encoding of . consists of and gates, one controlled-, and other one- and two-qubit gates.
Proposition 3.10 (Bounded Polynomial Approximation – Corollary 66, [GSLW19]).
Let , , and let be such that for all . Suppose is such that . Let , then there is an efficiently computable polynomial of degree such that
(3.1) | ||||
(3.2) | ||||
(3.3) |
If we choose sufficiently large such that , then we also have an -independent bound on : .
Theorem 3.9 and Proposition 3.10 are to be used in conjunction to produce block-encodings of general functions of Hermitian matrices. In doing so, we first note that Theorem 3.9 produces an encoding of , not . Thus, with a polynomial approximation of , say , it is generally not true that . What we need is a polynomial approximation not of , but of a (horizontally) scaled version of , , so that . Second, we also have to take into account the polynomial approximation error incurred in producing the final desired block encoding . We take care of these matters in Corollary 3.11, which, given the block-encoding of an arbitrary Hermitian matrix , produces a block-encoding of , where is a generic real-valued function.
Corollary 3.11 (Block-encoding functions of general Hermitian matrices).
Given
i. A Hermitian matrix , and , an -encoding of .
ii. , a smooth function on an open interval containing . Assume the function satisfies the conditions in Proposition 3.10 with and series-of-coefficients bound .
iii. Polynomial approximation error tolerance for : .
Then there exists a quantum circuit which is a -encoding of . The construction of makes queries to .
Proof.
First, . Define the scaling map , so that under this map . By assumption on there exists , , such that (i.) , (ii.) on and (iii.) for some .
By Proposition 3.10, given polynomial approximation error tolerance there exists a polynomial of degree which -approximates on and is bounded above by on . Since , we have
In order to apply Theorem 3.9, our polynomial has to be real and upper-bounded by on . Observe that for any complex-valued function and domain ,
Since itself is real-valued, is qualified to assume the role of in Proposition 3.10. That is, the real polynomial also -approximates on and is bounded above by on . Thus, letting in Theorem 3.9 we obtain , a -encoding of , where . Putting these together and noting that , we have
Thus, choosing gives us a -encoding of . ∎
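The role of the horizontal rescaling can be seen in a classical calculation. The sketch below approximates a rescaled and subnormalized exponential by a polynomial on $[-1, 1]$ and applies it to the eigenvalues of a subnormalized matrix. Chebyshev interpolation stands in here for the polynomial of Proposition 3.10, which is an assumption made for simplicity; the scaling constant `K`, the degree, and the example matrix are likewise hypothetical.

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

def apply_to_hermitian(func, H):
    """Apply a scalar function (or polynomial object) to a Hermitian matrix."""
    vals, vecs = np.linalg.eigh(H)
    return (vecs * func(vals)) @ vecs.conj().T

alpha = 3.0                                   # subnormalization of the block-encoding
A = np.array([[1.0, 2.0], [2.0, -1.0]])       # eigenvalues +-sqrt(5), inside [-alpha, alpha]
f = np.exp
K = 2.0 * np.exp(alpha)                       # keeps the scaled function below 1/2 in magnitude

def g(x):
    """Scaled target on [-1, 1]: g(x) = f(alpha x) / K."""
    return f(alpha * x) / K

P = Chebyshev.interpolate(g, 20)              # degree-20 interpolant on [-1, 1]

approx = K * apply_to_hermitian(P, A / alpha) # K * P(A / alpha)
exact = apply_to_hermitian(f, A)              # f(A)
error = np.linalg.norm(approx - exact, 2)

grid = np.linspace(-1.0, 1.0, 401)
max_abs_P = np.max(np.abs(P(grid)))
```

Approximating `f` itself on $[-1, 1]$ and applying the result to `A / alpha` would yield an approximation of $f(A/\alpha)$ rather than $f(A)$, which is why the polynomial targets the scaled function.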
4 Implementation on quantum computers
In this section, we provide a quantum algorithm implementing the quantum Esscher Transform, based on block-encodings and QSVT. We assume the inputs come in the form of block-encodings. Our algorithm outputs the Esscher-transformed state in block-encoded form; Section 4.3 discusses how to obtain the physical state itself from this output.
Reference [GSLW19] demonstrates how to construct block-encodings for density operators within the purified quantum query-access model (see Definition 3.3 and Proposition 3.4 above). For the Hermitian operators which are generally not density operators, their block-encodings can be constructed efficiently for many physical Hamiltonians, or if the ’s are stored in sparse data structures or KP trees. Along the way we shall also need as an auxiliary tool ‘state-preparation pairs’ (see Definition 3.5), to prepare linear combinations of the Hamiltonians. We assume immediate access to these, as we do for block-encodings. For the construction of state-preparation pairs, one can refer to [vAG18].
4.1 Technical lemmas
The logarithm of the density matrix is a key ingredient of the quantum Esscher transform. Here we provide a technical lemma on constructing a block-encoding of the logarithm of a density matrix from the block-encoding of that matrix.
Lemma 4.1 (Block-encoding of ).
Given , a -BE of an -qubit density operator , where , and polynomial approximation error tolerance . Then we have a -BE of , the construction of which makes queries to .
Proof.
First we construct a polynomial approximation of . More specifically, we check that the function satisfies the conditions of Proposition 3.10, with the appropriate and . Corollary 3.11 then gives us the desired block-encoding.
The following derivation is based on the proof of Corollary 67, [GSLW19] and Lemma 11, [GL19]. Negative power functions share with the common property of going to infinity as approaches , thus the Taylor expansions of these functions are performed about . Choose , and . The Taylor series of about is . With , the series-of-coefficients bound in Proposition 3.10 is
Corollary 3.11 gives us the unitary , which is a -encoding of , which can be constructed using queries to . ∎
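The dependence of the polynomial degree on the condition number can be observed with a plain truncated Taylor series of the natural logarithm about $1$. This is only a proxy for the bounded polynomial of Proposition 3.10, and the grid-based error check, function names, and parameter values are illustrative assumptions.

```python
import math

def log_taylor(x, degree):
    """Truncated Taylor series of ln(x) about 1: sum_{l=1}^{degree} (-1)^(l+1) (x-1)^l / l."""
    y = x - 1.0
    return sum((-1) ** (l + 1) * y ** l / l for l in range(1, degree + 1))

def max_error(kappa, degree, grid=50):
    """Largest truncation error on a uniform grid over [1/kappa, 1]."""
    xs = [1.0 / kappa + (1.0 - 1.0 / kappa) * i / grid for i in range(grid + 1)]
    return max(abs(log_taylor(x, degree) - math.log(x)) for x in xs)

def degree_needed(kappa, eps):
    """Smallest degree whose grid error on [1/kappa, 1] is at most eps."""
    degree = 1
    while max_error(kappa, degree) > eps:
        degree += 1
    return degree

eps = 1e-4
deg_small = degree_needed(4.0, eps)    # eigenvalues bounded below by 1/4
deg_large = degree_needed(16.0, eps)   # eigenvalues bounded below by 1/16
```

The worst case sits at the left endpoint $1/\kappa$, where the series converges geometrically with ratio $1 - 1/\kappa$, so the required degree grows roughly in proportion to $\kappa \log(1/\epsilon)$.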
Next, we provide a lemma to construct the block-encoding of an exponentiated matrix from the block-encoding of that matrix.
Lemma 4.2 (Block-encoding of ).
Given , a -BE of and polynomial approximation error tolerance , there is a -BE of , constructible using queries to . Here
4.2 Algorithm
We now provide the algorithm implementing the quantum Esscher transform, see Algorithm 1. We specify the constraints on the inputs and the guarantees on the output in the algorithm itself. A detailed step-by-step analysis of Algorithm 1 is provided below, after which the overall (query) complexity is stated. We summarize this information in Theorem 4.3.
Theorem 4.3.
Proof of Theorem 4.3.
Now we analyze the steps of Algorithm 1 in more detail to give the query complexity of QEsscher().
Step 1. From Proposition 3.4 we construct , a -BE of . This makes queries to .
Step 2. This step entails a polynomial approximation to the logarithm function on the interval . Denote by the approximation error tolerance. Choose . Lemma 4.1 gives , a -BE of . The construction of makes queries to , where is the degree of the approximating polynomial (see Proposition 3.10/Corollary 3.11).
Step 4. Now we make use of our access to the state-preparation-pair . To form linear combinations of block-encodings, the number of ancilla qubits required for each constituent block-encoding should be the same, see Proposition 3.6/3.7. Remark 3.2 shows that we can always equalize this number of ancilla qubits by padding with additional ancillas. The equalized number of ancillas is . We could also take , but we want to minimize the number of ancilla qubits. From Proposition 3.7 we get , a -BE of , making 1 query to and 1 query to and each .
Step 5. Finally, we construct a block-encoding for . At this stage, we have a -BE of . Lemma 4.2 gives a -BE of (thus ), where . It remains to make judicious choices for (note that the at this step need not be the same as the one in Step 2) and in order to ensure the overall block-encoding error is less than , i.e.
(4.1)
Now given a sufficiently small such that , choose and
These choices ensure Equation 4.1 is satisfied. Note that , so as . The degree of the approximating polynomial, and thus the number of queries to required, is . Recall that constructing itself makes 1 query to and each . Lastly, observe that , so is a valid subnormalization factor.
Overall complexity: makes queries to . queries and each exactly once, and in turn makes queries to . Accordingly, the implementation of makes
queries to and queries to each , thus
queries to , the constraint operators collectively considered. ∎
4.3 Further discussion
If the positive definite is full rank, the condition number is since the eigenvalue lower bound must be . Then the -query complexity grows at least linearly with . Hence, our Esscher transform is most relevant for low-rank cases. Assume we have non-zero eigenvalues . As a consequence holds. While the condition number can still be exponential if the smallest eigenvalue is exponentially small, when the smallest eigenvalue is , we obtain a well-behaved query complexity. In addition we can allow for smaller eigenvalues, especially when we are interested only in low-rank approximations of the Esscher transform. Let , with the effective condition number . With slight adaptations, our method can implement the Esscher transform on the effectively well-conditioned subspace, while leaving the other part undefined. This incurs an error compared to the full Esscher transform proportional to the importance of the neglected eigenvalues, but may be acceptable in many practical situations. Recall that low-rank approximations are frequently performed in statistics and machine learning.
If the desired output model is a normalized state, one can apply similar techniques for Gibbs sampling to extract the normalized Esscher-transformed state from the output of Algorithm 1. We briefly describe this procedure and the overhead cost it incurs. More details can be found in Chapter 3 of [Gil19]. Let denote the desired precision in trace distance between our approximate output and the ideal state. First, we prepare a maximally entangled state on two registers. Use Algorithm 1 to construct a -block-encoding of where , with block-encoding error . Then apply to the second register to obtain a state , so that tracing out the first register yields an approximate subnormalized state with trace distance error of . That is,
With , this state, when postselected after steps of fixed-point amplitude amplification (refer to Theorem 27 in [GSLW19]), results in a density operator -close to the normalized Esscher-transformed state
in trace distance. Taking this overhead cost into account and assuming is sufficiently small (such that the block-encoding error satisfies ), the total query complexity of preparing the approximate Esscher-transformed state is
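The state-preparation step described above can be mimicked classically. The sketch below applies a subnormalized square root of the unnormalized Esscher operator to one half of a maximally entangled state, traces out the other half, and renormalizes. Explicit renormalization replaces amplitude amplification and postselection, the prior is assumed full rank, and all names and numbers are hypothetical.

```python
import numpy as np

def herm_fun(A, f):
    """Apply a scalar function f to a Hermitian matrix A via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * f(vals)) @ vecs.conj().T

n = 2
rho = np.array([[0.6, 0.2], [0.2, 0.4]])   # full-rank prior
X = np.diag([1.0, -1.0])
theta = 0.8

M = herm_fun(rho, np.log) + theta * X
B = herm_fun(M, lambda v: np.exp(v / 2.0))  # exp(M / 2)
B = B / np.linalg.norm(B, 2)                # subnormalise so that ||B|| <= 1

# Maximally entangled state on two n-dimensional registers.
phi = np.eye(n).reshape(n * n) / np.sqrt(n)

# Apply B to the second register; the result is an unnormalised vector.
out = np.kron(np.eye(n), B) @ phi
success_prob = np.vdot(out, out).real

# Partial trace over the first register: sum_i Psi[i, j] conj(Psi[i, j']).
Psi = out.reshape(n, n)
reduced = Psi.T @ Psi.conj()
sigma = reduced / success_prob

# Direct evaluation of the normalised Esscher-transformed state.
E = herm_fun(M, np.exp)
sigma_direct = E / np.trace(E).real
```

The squared norm `success_prob` plays the role of the postselection probability, whose inverse square root governs the number of amplitude amplification rounds.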
5 Conclusion
In this paper, we considered a minimum relative entropy problem for the density operator subject to equality constraints. We formally solved this problem and the solution form inspired us to define the Quantum Esscher Transform (QUEST), a generalization of the classical Esscher transform to the quantum setting. We discussed its implementation on fault-tolerant quantum computers, leveraging techniques based on the QSVT framework. Given as inputs block-encodings of the initial quantum state and the constraint operators, the algorithm outputs an -approximate block-encoding of the Esscher-transformed state with -query complexity
and -query complexity
Several avenues remain open for future work:
- Is there a quantum algorithmic framework that can fully solve the minimum relative entropy problem? Our current approach only presents the formal solution for the optimal parameter. Approaches such as Newton’s algorithm with backtracking were suggested in [ZTF13], and quantized versions of these could be studied. Additionally, [AAKS20] demonstrated that the optimal parameter can, in principle, be found with a convex optimization program. Can we design a quantum algorithm to effectively address this problem?
- One could explore strategies for alternative input models. Our current work exclusively considered the purified access model, wherein the preparation of the purification of the input state was assumed. In contrast, the sampling access model, which assumes multiple independent copies of the input state, is another commonly used model. Gilyén and Poremba [GP22] have proposed an approach to implement approximate block-encodings of the input density operator, starting with sample access. This approach is based on a combination of density matrix exponentiation [LMR14, KLL17] and QSVT, and allows us to implement the quantum Esscher transform in the sampling access model. We leave the total cost of this procedure for further analysis.
- In Section 2.2.3, we noted potential connections between the quantum Esscher transform and imaginary-time evolution. To give these substance, further investigation is required.
- Various applications could be envisioned for the quantum Esscher transform. Its classical version has found usage for numerous problems in domains such as statistics, machine learning, and finance. These problems have quantum analogues, which could benefit from the quantum Esscher transform and its implementation on quantum computers.
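Regarding the first open question, the structure of the parameter search is already visible in the classical (commuting) special case. The sketch below is a purely classical illustration and not the quantum procedure of [ZTF13]: it applies Newton's method with backtracking to the convex dual function log Z(θ) − θ·c of a finite distribution. The function name `esscher_newton`, the tolerances, and the die example are hypothetical choices.

```python
import numpy as np

def esscher_newton(p, X, c, tol=1e-9, max_iter=100):
    """Find theta such that the tilted distribution q ~ p * exp(X @ theta)
    has feature means E_q[X] = c. Returns (theta, q)."""
    p = np.asarray(p, dtype=float)
    X = np.asarray(X, dtype=float)      # shape (n_outcomes, n_constraints)
    c = np.asarray(c, dtype=float)
    theta = np.zeros(X.shape[1])

    def dual(th):
        # Convex dual objective: log Z(th) - th . c
        return np.log(p @ np.exp(X @ th)) - th @ c

    for _ in range(max_iter):
        w = p * np.exp(X @ theta)
        q = w / w.sum()
        mean = q @ X
        grad = mean - c                          # gradient of the dual
        if np.linalg.norm(grad) < tol:
            break
        Xc = X - mean
        hess = (Xc * q[:, None]).T @ Xc          # covariance of X under q
        step = np.linalg.solve(hess, -grad)      # Newton direction
        t = 1.0
        # Backtracking line search (Armijo condition), with a floor on t.
        while t > 1e-12 and dual(theta + t * step) > dual(theta) + 1e-4 * t * (grad @ step):
            t *= 0.5
        theta = theta + t * step

    w = p * np.exp(X @ theta)
    return theta, w / w.sum()

# Example: tilt a fair die so that its mean becomes 4.5.
faces = np.arange(1, 7, dtype=float)
theta_opt, q_opt = esscher_newton(np.full(6, 1 / 6), faces[:, None], [4.5])
print("theta:", theta_opt, "tilted mean:", q_opt @ faces)
```

The gradient of the dual is the constraint violation and its Hessian is a covariance matrix, so each Newton step only needs expectation values under the current tilted distribution.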
Acknowledgments
The authors would like to thank Po-Wei Huang, Xiufan Li, Zhan Yu, Serge Massar and Roberto Rubboli for helpful discussions. This work is supported by the National Research Foundation, Singapore, and A*STAR under its CQT Bridging Grant and its Quantum Engineering Programme under grant NRF2021-QEP2-02-P05. KK acknowledges support from Leong Chuan Kwek, under project grant R-710-000-007-135.
Appendix A Proof of Theorem 2.2
Before delving into the proof, we introduce some notation and state a lemma to facilitate its presentation. The exponential family of with respect to the random variable is the set of measures
Also, let
Lemma A.1.
(Proposition 3.24 – [FS11]) Let be a probability measure on and be a random variable on . Fix . Then for any probability measure on satisfying , we have
(A.1)
Moreover the inequality is saturated if for some :
(A.2)
Proof.
Each gives rise to a corresponding (note that need not be in ). Then for any arbitrary , we have
Since this holds for all , we conclude that . Furthermore, if is such that , then letting and repeating the argument above gives
∎
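For orientation, the standard form of this argument can be written out explicitly. The symbols below ($P$, $Q$, $X$, $\theta$, $c$, $Z(\theta)$, $P_\theta$, and $H(\cdot\,|\,\cdot)$ for relative entropy) are notation assumed here for illustration and may differ from the notation of the lemma. The sketch assumes $Q \ll P$, since otherwise the relative entropy is infinite and the bound is trivial.

```latex
\begin{align*}
\frac{dP_\theta}{dP} &= \frac{e^{\theta X}}{Z(\theta)},
\qquad Z(\theta) = \mathbb{E}_P\!\left[e^{\theta X}\right], \\
H(Q\,|\,P)
&= \mathbb{E}_Q\!\left[\log\frac{dQ}{dP_\theta}\right]
 + \mathbb{E}_Q\!\left[\log\frac{dP_\theta}{dP}\right] \\
&= H(Q\,|\,P_\theta) + \theta\,\mathbb{E}_Q[X] - \log Z(\theta) \\
&\ge \theta c - \log Z(\theta)
\qquad \text{whenever } \mathbb{E}_Q[X] = c .
\end{align*}
```

The inequality uses only $H(Q\,|\,P_\theta) \ge 0$, with equality exactly when $Q = P_\theta$. Taking the supremum over $\theta$ gives the lower bound, and choosing $Q = P_{\theta^*}$ with $\mathbb{E}_{P_{\theta^*}}[X] = c$ shows that the bound is attained.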
Proof of Theorem 2.2.
First, we have required because otherwise the constraints cannot be satisfied. The Lagrangian function is
Setting the first-order derivatives of with respect to to zero gives
where is to be determined from the constraints :
(A.4)
The last equivalence holds because and share the same minimum/maximum points, provided at those points. It remains to show indeed minimizes , subject to the constraints . But this follows easily from Lemma A.1. Furthermore, since is a strictly convex function, is a strictly convex functional of and so it can have at most one minimizer in the convex set , thereby showing the uniqueness of . Finally, again using Lemma A.1 we have . ∎
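The optimality claim can be checked numerically on a finite sample space. The following sketch assumes the setting is the minimization of relative entropy to a reference distribution under a single mean constraint, with the minimizer given by an exponential tilt. All names (`tilt`, `solve_theta`, `kl`) and the die example are hypothetical, and bisection is used only for simplicity.

```python
import numpy as np

def tilt(p, x, theta):
    """Exponentially tilted distribution q ~ p * exp(theta * x)."""
    w = p * np.exp(theta * x)
    return w / w.sum()

def solve_theta(p, x, c, lo=-50.0, hi=50.0, iters=200):
    """Bisection on theta: the tilted mean is increasing in theta."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tilt(p, x, mid) @ x < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def kl(q, p):
    """Relative entropy H(q | p) for finite distributions."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

p = np.full(6, 1 / 6)
x = np.arange(1, 7, dtype=float)
c = 4.5

theta = solve_theta(p, x, c)
q_star = tilt(p, x, theta)
dual_value = theta * c - np.log(p @ np.exp(theta * x))

# Feasible competitors: perturb q_star along directions that preserve
# both the normalization and the mean constraint.
rng = np.random.default_rng(0)
_, _, Vt = np.linalg.svd(np.vstack([np.ones(6), x]))
null_basis = Vt[2:]                      # orthogonal to the two constraint rows
gaps = []
for _ in range(1000):
    d = rng.normal(size=4) @ null_basis
    d *= 0.5 * q_star.min() / np.abs(d).max()   # keep all entries positive
    gaps.append(kl(q_star + d, p) - kl(q_star, p))

print("H(q*|p) =", kl(q_star, p), " dual value =", dual_value)
print("smallest gap over competitors:", min(gaps))
```

Every feasible competitor has strictly larger relative entropy than the tilted distribution, and the minimum value coincides with the dual expression θc − log Z(θ), in line with Lemma A.1.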
Appendix B Wirtinger Calculus
The ‘Wirtinger Calculus’ provides a methodology for optimization problems involving complex matrices. It enables ‘differentiation as usual’ with respect to complex matrices. In this appendix, we state only the main definitions and results needed to solve Problem 2.4. For a more thorough exposition of this framework, we direct the reader to [KQKR23, Hjø11, KD09].
Consider functions of the form . Since is endowed with the multiplication operation , we can view
For regard as functions from to , where and . (Footnote 3: The notations may raise questions on independence. This is irrelevant; one may simply write if one wishes. We emphasize that (for each ) the fundamental input variables are the two real numbers and .) Then we have a function such that
(B.1)
Partially differentiating with respect to each and , and then rearranging terms, we have for
(B.2)
To preserve the matrix structure of the parameters and we use the standard notation
(B.9)
and similarly for and . Then Equation B.2 is concisely stated as
(B.10)
and are the matrix Wirtinger derivatives of . Often, we abuse notation and write both and , so we can write
(B.11)
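In the scalar (1×1) case these definitions can be verified by finite differences. The sketch below assumes the standard convention ∂/∂z = ½(∂/∂x − i ∂/∂y) and ∂/∂z* = ½(∂/∂x + i ∂/∂y), where x and y are the real and imaginary parts of z. The helper name `wirtinger` and the test functions are hypothetical.

```python
import numpy as np

def wirtinger(f, z, h=1e-6):
    """Central-difference Wirtinger derivatives (df/dz, df/dz*) of f at z."""
    df_dx = (f(z + h) - f(z - h)) / (2 * h)            # derivative along the real part
    df_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # derivative along the imaginary part
    d_dz = 0.5 * (df_dx - 1j * df_dy)
    d_dzbar = 0.5 * (df_dx + 1j * df_dy)
    return d_dz, d_dzbar

z0 = 1.5 - 0.5j

# Real-valued, non-holomorphic function |z|^2 = z z*.
dz, dzbar = wirtinger(lambda z: abs(z) ** 2, z0)
print(np.isclose(dz, np.conj(z0)), np.isclose(dzbar, z0))

# Holomorphic function z^3: the conjugate derivative vanishes.
hz, hzbar = wirtinger(lambda z: z ** 3, z0)
print(np.isclose(hz, 3 * z0 ** 2), np.isclose(hzbar, 0))
```

For |z|² the conjugate derivative equals z, which vanishes only at the origin; this matches the expectation that a real-valued function is stationary where its conjugate Wirtinger derivative is zero.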
The following three propositions are all we need in this paper. We omit their proofs, which can all be found in [KQKR23].
Proposition B.1.
Let be a real-valued function of complex matrices. Then has a stationary point at if and only if
Whether the solution of the above equation actually gives a minimum/maximum/saddle point has to be checked via additional considerations or by inspecting higher-order derivatives.
Proposition B.2.
Let be a complex, unstructured (see below) matrix and be analytic. Define the scalar function . Then
where is the complex derivative of .
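A trace-function identity of this type can be sanity-checked by finite differences. The sketch below uses the analytic function f(z) = z³ and assumes the layout convention in which entry (i, j) of the derivative is ∂g/∂Z_ij; under that convention the derivative of Tr f(Z) is the transpose of f′(Z). The layout convention, the helper names, and the choice of f are assumptions made for illustration and may differ from the convention of the proposition.

```python
import numpy as np

def trace_cube(Z):
    """g(Z) = Tr f(Z) with f(z) = z^3."""
    return np.trace(Z @ Z @ Z)

def numeric_gradient(g, Z, h=1e-6):
    """Matrix G with G[i, j] = d g / d Z[i, j], by central differences.
    A real step suffices because g is holomorphic in every entry of Z."""
    G = np.zeros_like(Z, dtype=complex)
    for i in range(Z.shape[0]):
        for j in range(Z.shape[1]):
            E = np.zeros_like(Z, dtype=complex)
            E[i, j] = h
            G[i, j] = (g(Z + E) - g(Z - E)) / (2 * h)
    return G

rng = np.random.default_rng(1)
Z = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))   # unstructured matrix

G = numeric_gradient(trace_cube, Z)
expected = (3 * Z @ Z).T          # f'(Z)^T with f'(z) = 3 z^2

print("max deviation:", np.abs(G - expected).max())
```

The entries of Z are perturbed independently here, which is exactly the unstructured setting; for Hermitian inputs the entries are tied together and the chain rule discussed below is needed.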
So far, by writing we have implicitly assumed the input matrices have independent components (we call such matrices ‘unstructured’). This condition often does not hold, e.g. when our matrices of interest are symmetric/Hermitian etc. To obtain the correct Wirtinger derivatives with respect to structured matrices, we resort to the chain rule.
Proposition B.3 (Wirtinger derivatives with respect to Hermitian matrices).
Let be a function of complex Hermitian matrices. Then the Wirtinger derivatives of with respect to are given by
Here, the tildes above indicate that they are unstructured matrices. Thus, to derive the Wirtinger derivatives with respect to Hermitian matrices, first obtain the Wirtinger derivative of , assuming the inputs are unstructured. Then form the correct expressions given above and reinstate the structured matrices as the arguments.
References
- [AAKS20] Anurag Anshu, Srinivasan Arunachalam, Tomotaka Kuwahara, and Mehdi Soleimanifar. Sample-efficient learning of quantum many-body systems. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 685–691. IEEE, 2020.
- [BCC15] Dominic W Berry, Andrew M Childs, Richard Cleve, Robin Kothari, and Rolando D Somma. Simulating Hamiltonian dynamics with a truncated Taylor series. Physical Review Letters, 114(9):090502, 2015.
- [BK91] Michael Berman and Ronnie Kosloff. Time-dependent solution of the Liouville-von Neumann equation: Non-dissipative evolution. Computer Physics Communications, 63(1-3):1–20, 1991.
- [Bra96] Samuel L Braunstein. Geometry of quantum inference. Physics Letters A, 219(3-4):169–174, 1996.
- [BSS23] Ahmad Beirami, Maziar Sanjabi, and Virginia Smith. On tilted losses in machine learning: Theory and applications. Journal of Machine Learning Research, 24:1–79, 2023.
- [CGJ18] Shantanav Chakraborty, András Gilyén, and Stacey Jeffery. The power of block-encoded matrix powers: improved regression techniques via faster Hamiltonian simulation. arXiv preprint arXiv:1804.01973, 2018.
- [DMB23] Alexander M Dalzell, Sam McArdle, Mario Berta, Przemyslaw Bienias, Chi-Fang Chen, András Gilyén, Connor T Hann, Michael J Kastoryano, Emil T Khabiboulline, Aleksander Kubica, et al. Quantum algorithms: A survey of applications and end-to-end complexities. arXiv preprint arXiv:2310.03011, 2023.
- [Esc32] F Esscher. On the probability function in the collective theory of risk. Skand. Aktuarie Tidskr., 15:175–195, 1932.
- [FS11] Hans Föllmer and Alexander Schied. Stochastic finance: an introduction in discrete time. Walter de Gruyter, 2011.
- [Gil19] András Gilyén. Quantum singular value transformation & its algorithmic applications. PhD thesis, University of Amsterdam, 2019.
- [GL19] András Gilyén and Tongyang Li. Distributional property testing in a quantum world. arXiv preprint arXiv:1902.00814, 2019.
- [GLM08] Vittorio Giovannetti, Seth Lloyd, and Lorenzo Maccone. Quantum random access memory. Physical Review Letters, 100(16):160501, 2008.
- [GP22] András Gilyén and Alexander Poremba. Improved quantum algorithms for fidelity estimation. arXiv preprint arXiv:2203.15993, 2022.
- [GS93] Hans U Gerber, Elias SW Shiu, et al. Option pricing by Esscher transforms. HEC Ecole des hautes études commerciales, 1993.
- [GSLW19] András Gilyén, Yuan Su, Guang Hao Low, and Nathan Wiebe. Quantum singular value transformation and beyond: exponential improvements for quantum matrix arithmetics. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 193–204, 2019.
- [Hjø11] Are Hjørungnes. Complex-valued matrix derivatives: with applications in signal processing and communications. Cambridge University Press, 2011.
- [HS06] Friedrich Hubalek and Carlo Sgarra. Esscher transforms and the minimal entropy martingale measure for exponential Lévy models. Quantitative Finance, 6(02):125–145, 2006.
- [Jay57] Edwin T Jaynes. Information theory and statistical mechanics. Physical Review, 106(4):620, 1957.
- [KD09] Ken Kreutz-Delgado. The complex gradient operator and the CR-calculus. arXiv preprint arXiv:0906.4835, 2009.
- [KLL17] Shelby Kimmel, Cedric Yen-Yu Lin, Guang Hao Low, Maris Ozols, and Theodore J Yoder. Hamiltonian simulation with optimal sample complexity. npj Quantum Information, 3(1):13, 2017.
- [KQKR23] Kelvin Koor, Yixian Qiu, Leong Chuan Kwek, and Patrick Rebentrost. A short tutorial on Wirtinger Calculus with applications in quantum information. arXiv preprint arXiv:2312.04858, 2023.
- [LC17] Guang Hao Low and Isaac L Chuang. Optimal Hamiltonian simulation by quantum signal processing. Physical Review Letters, 118(1):010501, 2017.
- [LC19] Guang Hao Low and Isaac L Chuang. Hamiltonian simulation by qubitization. Quantum, 3:163, 2019.
- [LMR14] Seth Lloyd, Masoud Mohseni, and Patrick Rebentrost. Quantum principal component analysis. Nature Physics, 10(9):631–633, 2014.
- [LYC16] Guang Hao Low, Theodore J Yoder, and Isaac L Chuang. Methodology of resonant equiangular composite quantum gates. Physical Review X, 6(4):041067, 2016.
- [MJE19] Sam McArdle, Tyson Jones, Suguru Endo, Ying Li, Simon C Benjamin, and Xiao Yuan. Variational ansatz-based quantum simulation of imaginary time evolution. npj Quantum Information, 5(1):75, 2019.
- [MRTC21] John M Martyn, Zane M Rossi, Andrew K Tan, and Isaac L Chuang. Grand unification of quantum algorithms. PRX Quantum, 2(4):040203, 2021.
- [MST20] Mario Motta, Chong Sun, Adrian TK Tan, Matthew J O’Rourke, Erika Ye, Austin J Minnich, Fernando GSL Brandao, and Garnet Kin-Lic Chan. Determining eigenstates and thermal states on a quantum computer using quantum imaginary time evolution. Nature Physics, 16(2):205–210, 2020.
- [OP07] Stefano Olivares and Matteo GA Paris. Quantum estimation via the minimum Kullback entropy principle. Physical Review A, 76(4):042120, 2007.
- [RF23] Patrick Rall and Bryce Fuller. Amplitude estimation from quantum signal processing. Quantum, 7:937, 2023.
- [Sie76] David Siegmund. Importance sampling in the Monte Carlo study of sequential tests. The Annals of Statistics, pages 673–684, 1976.
- [SJ80] John Shore and Rodney Johnson. Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Transactions on Information Theory, 26(1):26–37, 1980.
- [vAG18] Joran van Apeldoorn and András Gilyén. Improvements in quantum SDP-solving with applications. arXiv preprint arXiv:1804.05058, 2018.
- [Wil13] Mark M Wilde. Quantum information theory. Cambridge University Press, 2013.
- [ZTF13] Mattia Zorzi, Francesco Ticozzi, and Augusto Ferrante. Minimum relative entropy for quantum estimation: Feasibility and general solution. IEEE Transactions on Information Theory, 60(1):357–367, 2013.