Classical Bandit Algorithms for Entanglement Detection in Parameterized Qubit States
Abstract
Entanglement is a key resource for a wide range of tasks in quantum information and computing. Thus, verifying the availability of this quantum resource is essential. Extensive research on entanglement detection has led to no-go theorems [1] that highlight the need for full state tomography (FST) in the absence of adaptive or joint measurements. Recent advancements, as proposed by [2], introduce a single-parameter family of entanglement witness measurements that can conclusively detect certain entangled states and only resort to FST when all witness measurements are inconclusive. We find a variety of realistic noisy two-qubit quantum states $\mathcal{Q}$ that yield conclusive results under this witness family. We solve the problem of detecting entanglement among $K$ quantum states in $\mathcal{Q}$, of which $m$ states are entangled, with $m$ potentially unknown. We recognize a structural connection of this problem to the Bad Arm Identification problem in stochastic Multi-Armed Bandits (MAB). In contrast to existing quantum bandit frameworks, we establish a new correspondence tailored for entanglement detection and term it the $(m,K)$-quantum Multi-Armed Bandit. We implement two well-known MAB policies for arbitrary states derived from $\mathcal{Q}$, present theoretical guarantees on the measurement/sample complexity, and demonstrate the practicality of the policies through numerical simulations. More broadly, this paper highlights the potential for employing classical machine learning techniques for quantum entanglement detection.
Index Terms:
quantum computing, quantum states, entanglement detection, FST, entanglement witness, multi-armed bandit, bad arm identification
I Introduction
The emergence of quantum information theory has changed our understanding of quantum entanglement, transforming it from a property of quantum states into a vital resource. Entanglement allows us to perform non-classical tasks such as quantum communication, quantum teleportation, and quantum information processing, to name a few [3, 4, 5]. However, checking whether a given unknown state is entangled can be highly non-trivial. The first issue is theoretical: even if one completely determines an unknown state via full state tomography (FST), checking whether a known state is entangled can be hard. The second issue is practical: real-world laboratory conditions introduce imperfections and noise, which make it difficult to carry out FST or to directly test whether an unknown state is entangled or separable.
There is a vast literature dedicated to FST (see [6, 7, 8, 9, 10, 11, 12] and references therein, and see [13, 14, 15, 16, 17] for machine learning based approaches). Using entangled measurements, one can carry out FST with almost optimal copy complexity [9, 18]. In practice, entangled measurements are harder to implement, so one typically performs single-copy measurements. From data generated by single-copy measurements, one can recover the measured state using a variety of techniques such as linear inversion, maximum likelihood estimation, and maximum a posteriori estimation [19, 20]. From the reconstructed state, it is possible to ascertain whether the state is entangled or separable using well-known criteria (some are outlined in Sec. II-B). However, this FST approach becomes impractical as the number of qubits in the system increases, due to computational challenges and the exponential scaling in the number of measurements required. If one is only interested in testing for entanglement, carrying out FST may not be necessary in practice. Furthermore, the sample complexity of FST does not directly translate into a measurement/sample complexity for entanglement detection.
Entanglement can be assessed by measuring entanglement witnesses [21, 22, 23, 24]. These observables certify the presence of entanglement for some, but not all, entangled states. Although no single witness can detect all entangled states, each witness measurement contributes information about the state. If entanglement is not detected by any of the witnesses, the information gathered from the witness measurements can eventually facilitate FST, and the reconstructed state can then be checked for entanglement using standard tests. This insight has been effectively explored in [2], which constructs a set of measurements that simultaneously serve as entanglement witnesses and enable FST. For bipartite qubit systems, [2] proposes a measurement scheme that requires six witness operator measurements. Rather than merely determining the expectation value of a witness operator, one measures in the eigenbasis of a single-parameter family of witnesses. Based on the frequencies of these witness measurement outcomes, the authors formulate a separability criterion that yields non-negative values for all separable states and negative values for some entangled states. For entangled states that cannot be detected by this witness family, a tomographic reconstruction of the state can be performed (see Sections II-A and II-B for further details).
Given a witness eigenbasis, achieving a high-precision estimate of the separability criterion is pivotal but requires measuring numerous copies of the state, imposing a significant resource constraint. This challenge is further compounded in scenarios involving multiple (say $K$) states, among which $m$ states may be entangled. Performing FST on all $K$ states may be unnecessary for entanglement detection. When resource and time efficiency is paramount, the need for a large number of measurements to estimate every parameter accurately can be circumvented by identifying 'winning' trends in the sample estimates and adaptively choosing when and how measurements are made. This fits naturally into the well-studied Multi-Armed Bandit (MAB) framework of classical machine learning.
The MAB setting tackles sequential decision-making problems with a finite set of options (arms), each yielding stochastic rewards with an unknown mean. Arm selections unfold iteratively in rounds, with a learner choosing arms according to a predefined policy. After each selection, the learner receives a reward from the chosen arm, which influences subsequent decisions and possible policy adjustments. The MAB framework has two main objectives. The first balances exploration (finding high-reward arms) and exploitation (selecting the arm with the highest observed reward) to maximize cumulative reward [25]. The second involves pure exploration, with the goal of identifying the arm with the highest expected reward, i.e., Best Arm Identification (BAI) [26]. A variant of BAI called the $m$-Good Arm Identification (GAI) problem ($m$ unknown) aims to identify the $m$ 'good' arms (out of $K$) whose expected rewards lie above a specified threshold [27]. Equivalently, $m$-Bad Arm Identification aims to identify the $m$ 'bad' arms (out of $K$) whose expected rewards lie below a specified threshold. Two competing quantities govern the performance of BAI policies: the sample complexity and the probability of error in identifying the best arm. More details on MAB and BAI policies are given in Section II-C.
The overarching goal of this paper is to use stochastic MAB policies to address the problem of entanglement detection and to characterise the sample complexity of such an approach. The organisation and key contributions of this paper are summarised below:
- In Section III, we highlight the key contribution of our paper: a structural connection between the separability criterion outlined in [2] and the Best Arm Identification (BAI) problem of stochastic Multi-Armed Bandits (MAB). Specifically, the $m$-Bad Arm Identification problem corresponds to the $(m,K)$-quantum Multi-Armed Bandit problem ($m$ potentially unknown), whose goals are to identify $m$ 'bad' arms and $m$ entangled states derived from $\mathcal{Q}$, respectively.
- Another significant contribution of our paper is achieving conclusive entanglement detection without explicit FST for commonly encountered noisy two-qubit states, which we find to be in $\mathcal{Q}$. In Section IV, we discuss two distinct MAB policies for entanglement detection based on Successive Elimination and the Hybrid algorithm for the Dilemma of Confidence (HDoC; see Section II-C). With well-defined confidence intervals, we demonstrate the correctness and characterise the sample complexity of these policies.
- In Section V-A, we present numerical results on the performance of the MAB policies for depolarised Bell states.
- In Section V-B, we demonstrate the efficiency of the MAB policies and the WBMs in identifying general two-qubit entangled states and present numerical examples of pure and mixed two-qubit entangled states for which the single-parameter family of witnesses fails to provide conclusive results, thus necessitating FST.
II Preliminaries
Let $\mathcal{H}$ be a finite-dimensional Hilbert space with dimension $d$. A pure quantum state is represented by a unit-norm vector $|\psi\rangle \in \mathcal{H}$. Let $\mathcal{L}(\mathcal{H})$ be the space of linear operators on $\mathcal{H}$, equipped with the Frobenius inner product $\langle A, B \rangle = \mathrm{Tr}(A^\dagger B)$ for any $A, B \in \mathcal{L}(\mathcal{H})$, where $\dagger$ represents the conjugate transpose. A Hermitian operator $A$ satisfies $A = A^\dagger$. A density operator $\rho$ is Hermitian, positive semi-definite, $\rho \succeq 0$, and has unit trace, $\mathrm{Tr}(\rho) = 1$; it can represent both pure and mixed states. A positive operator-valued measure (POVM) is a collection of positive operators $\{E_i\}$ that sum to the identity, $\sum_i E_i = I$. A POVM represents a measurement where $E_i$ corresponds to measurement outcome $i$, but sometimes we compress this and simply say that $E_i$ is a measurement outcome.
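Let $\mathcal{H}_A$ and $\mathcal{H}_B$ be finite-dimensional Hilbert spaces with dimensions $d_A$ and $d_B$, respectively, and let $\mathcal{H}_{AB} = \mathcal{H}_A \otimes \mathcal{H}_B$, where $\otimes$ represents the tensor product, be a bipartite Hilbert space with dimension $d_A d_B$. A density operator $\rho$ on $\mathcal{H}_{AB}$ is called separable if it can be written as a convex combination of product states, that is,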
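$$\rho = \sum_i p_i \, |\alpha_i\rangle\langle\alpha_i| \otimes |\beta_i\rangle\langle\beta_i|, \tag{1}$$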
where $p_i \ge 0$ such that $\sum_i p_i = 1$, and $|\alpha_i\rangle \otimes |\beta_i\rangle$ is a product of two pure states. We denote the set of all separable density operators by $\mathcal{S}$. Conversely, $\rho$ is entangled if it cannot be written in the form (1). We discuss preliminaries on entanglement witnesses and witness-based measurements in Section II-A, various separability criteria for entanglement detection in Section II-B, and the framework and background on stochastic multi-armed bandit problems in Section II-C.
II-A Entanglement Witnesses and Witness Operator Measurements
Entanglement can be detected by measuring entanglement witnesses, which are defined as follows:
Definition 1 (Entanglement Witness).
An entanglement witness, denoted as $W$, is a Hermitian operator that detects some entangled state $\rho$ such that,
$$\mathrm{Tr}(W\sigma) \ge 0 \quad \forall\, \sigma \in \mathcal{S}, \tag{2}$$
$$\mathrm{Tr}(W\rho) < 0. \tag{3}$$
Conceptually, a witness defines a hyperplane that separates the set of entangled states it can detect from all other states. Let $D_W$ denote the set of entangled states detected by a witness $W$. When comparing two witnesses $W_1$ and $W_2$, if $D_{W_1}$ is contained within $D_{W_2}$, then $W_2$ is considered finer than $W_1$. Further insights into this ordering are detailed in [28, Lemma 1]. A witness is said to be optimal when no other witness is finer, which implies that its hyperplane touches the boundary of the convex set of separable states [29].
To improve the efficacy of identifying entangled states, [2] proposes a method to construct a set of measurements called Witness Operator Measurements (WOM), which we briefly discuss here. Let us consider the rank-one projector onto a pure entangled state denoted by , where . Here, the Schmidt coefficients and are arranged in non-increasing order as . Consequently, is chosen to adhere to this order.
In this paper, we consider the specific form of the witnesses from [2], namely, . That is, consider a rank-one POVM with outcomes such that and ’s are projectors onto pure states with outcomes. We can construct a WOM with outcomes where , where signifies a transpose operation on the second subsystem and is the largest eigenvalue across all s.
II-B Separability criteria for entanglement detection
Using the FST techniques briefly outlined earlier, one can reconstruct the state tomographically and subsequently determine its entanglement status using well-known separability criteria. For bipartite qubit systems, the Peres-Horodecki criterion [30, 31] establishes that a density operator is separable if and only if the eigenvalues of its partial transpose are non-negative. This criterion remains necessary and sufficient when $d_A = 2$ and $d_B = 3$ (or vice versa), but it is no longer sufficient in higher dimensions, where there exist entangled states with a positive partial transpose. Other criteria include the range criterion [32], the matrix realignment criterion [33], the covariance matrix (CM) criterion [34], and additional methods discussed in [35, 36].
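As a minimal illustration of the Peres-Horodecki test for two qubits, the sketch below computes the partial transpose on the second qubit with NumPy and checks the sign of its smallest eigenvalue. The helper names (`partial_transpose_b`, `is_entangled_ppt`, `isotropic`) and the isotropic-state parameterization $p|\Phi^+\rangle\langle\Phi^+| + (1-p)I/4$ are our own assumptions for illustration; the numerical tolerance guards against round-off at the boundary $p = 1/3$.

```python
import numpy as np

def partial_transpose_b(rho: np.ndarray) -> np.ndarray:
    """Partial transpose on the second qubit of a 4x4 two-qubit density matrix."""
    # Index as rho[a, b, a', b'] with (a, a') for qubit A and (b, b') for qubit B.
    r = rho.reshape(2, 2, 2, 2)
    # Swap b and b' (axes 1 and 3) to transpose subsystem B only.
    return r.transpose(0, 3, 2, 1).reshape(4, 4)

def is_entangled_ppt(rho: np.ndarray, tol: float = 1e-12) -> bool:
    """Peres-Horodecki test: necessary and sufficient for two qubits."""
    eigenvalues = np.linalg.eigvalsh(partial_transpose_b(rho))
    return bool(eigenvalues.min() < -tol)

def isotropic(p: float) -> np.ndarray:
    """Hypothetical helper: p |Phi+><Phi+| + (1 - p) I/4."""
    phi_plus = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
    return p * np.outer(phi_plus, phi_plus) + (1 - p) * np.eye(4) / 4

if __name__ == "__main__":
    for p in (0.2, 1 / 3, 0.5, 1.0):
        min_eig = np.linalg.eigvalsh(partial_transpose_b(isotropic(p))).min()
        print(f"p={p:.3f}  min PT eigenvalue={min_eig:+.4f}  "
              f"entangled={is_entangled_ppt(isotropic(p))}")
```

For this family the smallest partial-transpose eigenvalue is $(1-3p)/4$, so the test switches from separable to entangled at $p = 1/3$.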
Another criterion for separability is obtained from the Witness Operator Measurements (WOMs) described in Section II-A, which are highly efficient for entanglement detection. We review this criterion from [2] next. Specifically, let us consider two-qubit witnesses of the form:
(4) |
where such that and . We denote the projectors onto the eigenstates of the witness (4) by $\{P_i\}_{i=1}^{4}$. Each operator satisfies $P_i \succeq 0$, $P_i P_j = \delta_{ij} P_i$, and $\sum_i P_i = I$, so together they form a Positive Operator-Valued Measure (POVM). Throughout the paper, we refer to this POVM as a Witness Basis Measurement (WBM).
Consider a quantum state $\rho$. Let $q_i = \mathrm{Tr}(P_i \rho)$ be the probability of obtaining outcome $i$ when $\rho$ is measured using the WBM $\{P_i\}$. The expected value of the witness can be expressed in terms of the probabilities $q_i$. If this expected value is less than a certain threshold (in our case, 0), we conclude that $\rho$ is entangled; otherwise, the test is inconclusive. When the test is inconclusive, we pick the witnesses in Table I sequentially. These subsequent witnesses are obtained by applying local unitary transformations to each of the qubits, which changes the measured eigenbasis, as shown in (5).
(5) |
Witness | ||
---|---|---|
1 | ||
2 | ||
3 | ||
4 | ||
5 | ||
6 |
Expressing the eigenstates of the first witness (4) in terms of Pauli operators yields three observables: , , and . Estimates for these three observables come from measuring the first witness. Similarly, the second witness listed in Table I yields estimates for , , and . Thus, a pair of witnesses yields estimates for five distinct observables, and each of the other two witness pairs, obtained by applying suitable unitary transformations, provides another five expectation values. In total, we obtain estimates for 15 expectation values, which suffice to determine the two-qubit state. This reduction in the number of witnesses from sixteen to six offers significant practical benefits. Instead of relying solely on comparing the expected value of the witness against a threshold, the authors of [2] suggest adopting a more stringent criterion:
(6) |
which holds for all separable states and is violated by the set of entangled states that can be detected by this family of witnesses. The above optimisation leads to the following quadratic WBM criterion,
(7) |
In essence, measuring the linear entanglement witnesses amounts to measuring the projectors onto their eigenbasis. Note that the value of (7) depends on the underlying WBM. Thus, for a WBM $M$ and state $\rho$, we denote (7) as $\mathcal{W}_M(\rho)$.
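The statement that 15 expectation values determine a two-qubit state can be checked with linear inversion: $\rho = \frac{1}{4}\sum_{i,j}\langle\sigma_i\otimes\sigma_j\rangle\,\sigma_i\otimes\sigma_j$ with $\sigma_0 = I$ and $\langle I\otimes I\rangle = 1$. The sketch below assumes noiseless (exact) expectation values computed directly from a random state; with finite samples the estimates are noisy and the reconstruction may need to be projected back onto valid density operators. The function names are hypothetical.

```python
import itertools
import numpy as np

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_expectations(rho):
    """All 16 two-qubit Pauli expectations; <I x I> = 1, the other 15 carry the information."""
    return {
        (a, b): float(np.real(np.trace(np.kron(PAULI[a], PAULI[b]) @ rho)))
        for a, b in itertools.product("IXYZ", repeat=2)
    }

def linear_inversion(expvals):
    """Rebuild rho = (1/4) * sum <s_a x s_b> s_a x s_b from Pauli expectation values."""
    rho = np.zeros((4, 4), dtype=complex)
    for (a, b), value in expvals.items():
        rho += value * np.kron(PAULI[a], PAULI[b])
    return rho / 4

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    rho = g @ g.conj().T
    rho /= np.trace(rho)  # random full-rank density matrix
    rho_hat = linear_inversion(pauli_expectations(rho))
    print("exact reconstruction:", np.allclose(rho, rho_hat))
```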
II-C Stochastic Multi-Armed Bandits
The stochastic Multi-Armed Bandit (MAB) framework is an archetype for many sequential decision-making problems. Within this framework, a bandit instance (problem instance) comprises $K$ arms (or actions) situated in an environment that yields stochastic rewards when an arm is selected (termed pulling) or an action is executed. Each arm $i$ is described by a probability distribution $\nu_i$ over $\mathbb{R}$ with known support and unknown expectation $\mu_i$. We denote the problem instance by $\nu = (\nu_1, \dots, \nu_K)$. Arm selection occurs iteratively in rounds: in each round $t$, a learner (or agent) selects an arm $a_t$ according to a specified policy and then receives a stochastic reward $X_t \sim \nu_{a_t}$. Upon receiving the reward, the learner can either terminate the process or continue by updating its policy to pursue a specific objective.
In the MAB literature, two objectives have been focal points of study. The first involves maximizing the cumulative reward accumulated over multiple rounds, which necessitates a trade-off between exploration (discovering arms with potentially higher rewards) and exploitation (repeatedly pulling the arm with the highest observed reward). The second, termed the best arm identification (BAI) problem, focuses on pure exploration, where the learner aims to identify the arm with the highest expected reward $\mu_{i^*}$, i.e., $i^* = \arg\max_i \mu_i$ (known as the best arm). A BAI policy (or algorithm) consists of a sampling rule for arm selection, a stopping rule to determine the end of exploration, and a recommendation rule to output the best arm. The BAI problem has been studied in two distinct settings: fixed confidence and fixed budget. In the fixed-confidence setting, the acceptance error $\delta$ is fixed, and the aim is to identify the best arm with probability at least $1-\delta$ while minimizing the number of arm pulls. In the fixed-budget setting, the number of arm pulls (budget) is fixed, and the goal is to minimize the probability of misidentifying the best arm within the allotted budget. Our paper concentrates on the BAI problem and one of its variants, good arm identification (GAI), in the fixed-confidence setting. Below, we summarise relevant findings from prior research.
II-C1 Fixed Confidence Best Arm Identification
Consider a problem instance denoted by $\nu$. Without loss of generality, we enumerate the arms by their expected rewards such that $\mu_1 > \mu_2 \ge \dots \ge \mu_K$. We assume the existence of a unique best arm, $i^* = 1$. We denote the sub-optimality gaps by $\Delta_i = \mu_1 - \mu_i$ for $i \ge 2$. The learner’s objective is to identify the best arm correctly while minimizing the number of samples used. Policies that achieve this task are called $\delta$-PAC policies, as defined below.
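Definition 2 ($\delta$-PAC).
Let $\hat{i}_\tau$ be the estimate of the best arm at the stopping time $\tau$. Then, an algorithm is said to be $\delta$-PAC if it satisfies,
$$\mathbb{P}\left(\tau < \infty,\ \hat{i}_\tau = i^*\right) \ge 1 - \delta. \tag{8}$$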
The primary objective is to characterize the expected stopping time of the BAI policy. Various works provide upper and lower bounds on this quantity. For instance, the successive elimination procedure identifies the best arm with $O\big(\sum_{i \ge 2} \Delta_i^{-2} \log(K/(\delta\Delta_i))\big)$ samples [37]. In comparison, the Lower-Upper Confidence Bound algorithm (LUCB) improves upon this by requiring $O\big(\sum_{i \ge 2} \Delta_i^{-2} \log(\sum_{j \ge 2} \Delta_j^{-2}/\delta)\big)$ samples [38]. Additionally, the exponential-gap elimination algorithm achieves a sample complexity of $O\big(\sum_{i \ge 2} \Delta_i^{-2} \log(\log(\Delta_i^{-2})/\delta)\big)$, which is the best known among elimination-style policies for BAI in the fixed-confidence setting [39]. These upper bounds are close to the lower bound $\Omega\big(\sum_{i \ge 2} \Delta_i^{-2} \log(1/\delta)\big)$ postulated in [40], typically within a factor of $\log K$ or $\log\log(\Delta_i^{-2})$. Notably, the seminal findings of [41], which use the Law of the Iterated Logarithm (LIL), bridge this gap by establishing that on the order of $\Delta^{-2}\log\log\Delta^{-2}$ samples are necessary and sufficient to identify the best arm within a specified error margin $\delta$. Building on this insight, [42] proposes lil’UCB, which leverages concentration bounds based on a finite version of the LIL and achieves order-optimal sample complexity, like exponential-gap elimination.
II-C2 Fixed Confidence Good Arm Identification
Consider a problem instance $\nu$. Alongside the acceptance error $\delta$ described in Section II-C1, we introduce a threshold $\xi$ and define the set of 'good' arms as $\{i \in [K] : \mu_i \ge \xi\}$. In simpler terms, the good arms are those whose means are greater than or equal to $\xi$. The number of good arms, $m$, is unknown to the agent, leading to what we term the $m$-GAI problem. Notably, the -GAI reduces to the BAI problem discussed earlier. Without loss of generality, we enumerate the arms by their expected rewards: $\mu_1 \ge \mu_2 \ge \dots \ge \mu_K$. Importantly, the agent is unaware of this indexing. For , and . The sample complexity is expressed in terms of .
At each time instant $t$, the learner samples an arm $a_t$ and receives a corresponding (random) reward $X_t$. The agent either outputs an arm that it identifies as 'good' or stops when no good arms remain. We denote the stopping time of the GAI policy by $\tau$. Specifically, the agent outputs $\hat{a}_1, \hat{a}_2, \dots, \hat{a}_{\hat{m}}$ as good arms at rounds $\tau_1, \tau_2, \dots, \tau_{\hat{m}}$, respectively, where $\hat{m}$ denotes the number of arms identified as good. The learner’s objective is to identify these good arms accurately and rapidly while minimizing the number of samples used. As elaborated below, this is achieved by policies in the class of $\delta$-PAC policies.
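Definition 3 ($(\lambda,\delta)$-PAC).
Let $\hat{m}$ denote the number of good arms identified by the agent, and let $\lambda \in [K]$. A $(\lambda,\delta)$-PAC algorithm satisfies the following conditions:
1. If there are at least $\lambda$ good arms, then $\mathbb{P}\left(\{\hat{m} < \lambda\} \cup \bigcup_{i \in \{\hat{a}_1, \dots, \hat{a}_\lambda\}} \{\mu_i < \xi\}\right) \le \delta$.
2. If there are fewer than $\lambda$ good arms, then $\mathbb{P}\left(\hat{m} \ge \lambda\right) \le \delta$.
An algorithm is called $\delta$-PAC if it is $(\lambda,\delta)$-PAC for all $\lambda \in [K]$.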
Just as in the BAI context (see Section II-C1), the objective in GAI is to characterize the expected stopping time $\mathbb{E}[\tau]$. A GAI algorithm consists of two key components: a sampling rule and an identification rule. The former dictates the arm selection process, while the latter guides the agent in distinguishing between good and bad arms. GAI confronts a novel challenge called the exploration-exploitation dilemma of confidence. Here, exploration involves pulling arms other than the empirical best arm in order to identify potentially 'good' arms with fewer pulls, while exploitation entails pulling the empirical best arm to increase confidence in its classification as a good arm. To address this challenge, [27] proposed a hybrid algorithm for the dilemma of confidence (HDoC). In HDoC, the sampling rule is derived from the UCB algorithm for cumulative regret minimization [25], while the identification rule is based on the LUCB algorithm for BAI [38] and the APT algorithm for the thresholding bandit problem [43]. The proposed HDoC algorithm (LUCB-G) requires samples. However, a drawback of the LUCB-G algorithm is its impracticality when is very small. To address this issue and achieve faster convergence in the identification phase, [44] proposes confidence widths derived from the finite LIL bound, akin to the approach in the lil’UCB algorithm [42], and demonstrates a reduction in the required number of samples, achieving a sample complexity of . The specific connections between BAI/GAI and entanglement detection are elaborated in Sections III and IV.
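The sampling and identification rules of HDoC described above can be sketched for Bernoulli rewards in $[0,1]$. This is a minimal illustration under stated assumptions, not the implementation of [27]: the identification width $\sqrt{\log(4KN^2/\delta)/(2N)}$ and the UCB index bonus $\sqrt{\log t/(2N)}$ reflect our reading of HDoC, and the function name `hdoc_good_arms` is hypothetical.

```python
import math
import numpy as np

def hdoc_good_arms(means, xi, delta, rng, max_pulls=200_000):
    """HDoC-style good arm identification on Bernoulli arms (illustrative sketch).

    Sampling rule: UCB index mu_hat + sqrt(log t / (2 N)) over unclassified arms.
    Identification rule: an arm is good if mu_hat - w >= xi and bad if
    mu_hat + w < xi, with w = sqrt(log(4 K N^2 / delta) / (2 N)).
    """
    K = len(means)
    counts = np.zeros(K, dtype=int)
    sums = np.zeros(K)

    def pull(i):
        sums[i] += float(rng.random() < means[i])  # Bernoulli(means[i]) reward
        counts[i] += 1

    for i in range(K):  # initialisation: one pull per arm
        pull(i)
    t = K
    active, good = set(range(K)), []
    while active and t < max_pulls:
        # Identification rule applied to every still-unclassified arm.
        for i in sorted(active):
            mean, n = sums[i] / counts[i], counts[i]
            width = math.sqrt(math.log(4 * K * n * n / delta) / (2 * n))
            if mean - width >= xi:
                good.append(i)
                active.discard(i)
            elif mean + width < xi:
                active.discard(i)
        if not active:
            break
        # Sampling rule: UCB index for regret minimisation, restricted to active arms.
        t += 1
        arm = max(active, key=lambda i: sums[i] / counts[i]
                  + math.sqrt(math.log(t) / (2 * counts[i])))
        pull(arm)
    return good, t
```

For example, with arm means (0.9, 0.8, 0.2, 0.1) and $\xi = 0.5$, the sketch should report arms 0 and 1 as good with probability at least $1-\delta$. For bad arm identification, the same loop can be run on negated rewards with a negated threshold.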
III The Quantum MAB Framework For Entanglement Detection
In this section, we introduce the quantum Multi-Armed Bandit (MAB) framework for entanglement detection. First, we highlight the structural similarity between this framework and the stochastic MAB model. In the stochastic MAB, pulling arm $i$ corresponds to sampling from a probability distribution $\nu_i$ with known support and unknown mean $\mu_i$. When an arm is pulled, a reward $x$ is obtained with probability (w.p.) $\nu_i(x)$. In each round, different arms can be pulled, and repeated pulls of the same arm yield independent and identically distributed (i.i.d.) rewards. Analogously, in the quantum setting, each arm represents an unknown quantum state $\rho_i$. When $\rho_i$ is measured, the probability distribution of the rewards is determined by the measurement. Specifically, if a Witness Basis Measurement (WBM) $M = \{P_j\}$ is chosen, measuring $\rho_i$ yields reward (outcome) $j$ with probability $\mathrm{Tr}(P_j \rho_i)$. Once the measurement is fixed, the rewards obtained from measuring copies of $\rho_i$ are i.i.d. The subtle difference between the two models lies in the source of the rewards: in the stochastic MAB model, rewards are sampled from fixed distributions, whereas in the quantum MAB model, the reward distribution of each arm depends on the chosen WBM.
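Pulling a quantum arm, as described above, amounts to sampling measurement outcomes from the Born rule. The sketch below is illustrative only: it uses projectors onto the Bell basis as a stand-in measurement (an assumption; the WBMs actually used are those of Table I), and the name `pull_quantum_arm` is hypothetical.

```python
import numpy as np

SQRT_HALF = 1 / np.sqrt(2)
# Stand-in measurement: projectors onto the Bell basis (not the paper's WBMs).
BELL_BASIS = [
    SQRT_HALF * np.array([1, 0, 0, 1]),   # |Phi+>
    SQRT_HALF * np.array([1, 0, 0, -1]),  # |Phi->
    SQRT_HALF * np.array([0, 1, 1, 0]),   # |Psi+>
    SQRT_HALF * np.array([0, 1, -1, 0]),  # |Psi->
]
PROJECTORS = [np.outer(v, v.conj()) for v in BELL_BASIS]

def pull_quantum_arm(rho, projectors, n_shots, rng):
    """Measure n_shots copies of rho; outcome j occurs with probability Tr(P_j rho)."""
    probs = np.array([np.real(np.trace(P @ rho)) for P in projectors])
    probs = np.clip(probs, 0.0, None)
    probs /= probs.sum()  # remove round-off so the probabilities sum to one
    return rng.choice(len(projectors), size=n_shots, p=probs)
```

Because the measurement is fixed across pulls, the returned outcomes are i.i.d., which is what lets classical concentration bounds be applied to the empirical outcome frequencies.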
In the Best Arm Identification (BAI) setting of stochastic MAB, the primary parameters of interest are the mean rewards $\mu_i$. Similarly, in the quantum analogue, the value of the WBM criterion (7) for each state is the parameter of interest. As discussed in Section II-B, for a given state and WBM, the value of this criterion determines whether the state is detected as entangled. The specific problem we consider involves $K$ arms (states), of which $m$ are bad (entangled), and our goal is to identify these entangled states. We summarize this correspondence in Table II.
Attributes | Stochastic MAB | Quantum MAB |
---|---|---|
Arms | Probability distributions | Density operators |
Measurement | WBM | |
Measurement Data | ||
Parameters to estimate | ||
Objective | Identify | Identify |
More formally, the objective of the learner is to identify the set of entangled states accurately while minimizing the number of measurements. This aligns with the goal of $m$-Bad Arm Identification, which aims to identify all arms whose means fall below a specified threshold. In essence, solving $m$-Bad Arm Identification is tantamount to solving the $(m,K)$-quantum MAB problem. We define the $(m,K)$-quantum MAB setting as follows,
Definition 4.
The $(m,K)$-quantum Multi-Armed Bandit (MAB) setting for entanglement detection is fully characterized by the tuple $(\mathcal{A}, M)$. Here, $\mathcal{A}$ denotes a finite action set with $|\mathcal{A}| = K$, consisting of $K - m$ two-qubit separable states and $m$ two-qubit entangled states. The term $M$ corresponds to a suitable Witness Basis Measurement (WBM).
Remark 1.
The $d$-dimensional discrete multi-armed quantum bandit model [45] differs from our formulation. There, the arms are a finite set of observables, and the environment is an unknown quantum state $\rho$. The objective is to learn about the unknown quantum state through an exploration-exploitation tradeoff. Given sequential oracle access to copies of $\rho$, each round involves selecting an observable so as to maximize its expectation value (reward). The information from previous rounds (the history) aids in refining the choice of action, thereby minimizing the regret, i.e., the difference between the obtained and maximal rewards. The authors also exploit the inherent linear structure of the measurement outcomes and map the problem to the linear bandit setting. Specifically, let $\{\sigma_k\}_{k=1}^{d^2}$ be a set of orthonormal Hermitian matrices. The unknown environment is $\rho = \sum_k \theta_k \sigma_k$ and an arm is $O = \sum_k a_k \sigma_k$. Then $\mathrm{Tr}(O\rho) = \langle \theta, a \rangle$, where $\theta = (\theta_k)_k$ and $a = (a_k)_k$. In round $t$, pulling arm $O_t$ provides a reward $X_t = \langle \theta, a_t \rangle + \eta_t$, where $\eta_t$ is 1-subgaussian.
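The linear structure mentioned in Remark 1 can be seen in the single-qubit case ($d = 2$): in the orthonormal basis $\{I, X, Y, Z\}/\sqrt{2}$, both the state and the observable become real coordinate vectors, and the expectation value is their dot product. The sketch below assumes standard Gaussian noise as one example of 1-subgaussian noise; the function names are hypothetical and not taken from [45].

```python
import numpy as np

# Orthonormal Hermitian basis of 2x2 matrices under <A, B> = Tr(A^dagger B).
BASIS = [M / np.sqrt(2) for M in (
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
)]

def coordinates(A):
    """Real coordinates of a Hermitian matrix A in the orthonormal basis."""
    return np.array([np.real(np.trace(B.conj().T @ A)) for B in BASIS])

def noisy_reward(observable, rho, rng):
    """Linear-bandit reward <theta, a> plus standard Gaussian (1-subgaussian) noise."""
    return float(coordinates(observable) @ coordinates(rho)) + rng.standard_normal()
```

For instance, with $\rho = \begin{pmatrix}0.7 & 0.2-0.1i\\ 0.2+0.1i & 0.3\end{pmatrix}$ and $O = X + 0.5Z$, the dot product of the coordinates equals $\mathrm{Tr}(O\rho) = 0.6$.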
To demonstrate the functionality of MAB policies, we identify suitable WBMs for families of parameterized two-qubit states, denoted by $\mathcal{Q}$.
III-A Two-qubit Depolarized Bell States
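For $p \in [0, 1]$, a two-qubit Depolarized Bell state is given by,
$$\rho_p = p\,|\beta\rangle\langle\beta| + (1 - p)\,\frac{I}{4}. \tag{9}$$
Here, $|\beta\rangle$ represents any one of the four Bell states $|\Phi^\pm\rangle = (|00\rangle \pm |11\rangle)/\sqrt{2}$ and $|\Psi^\pm\rangle = (|01\rangle \pm |10\rangle)/\sqrt{2}$. When $|\beta\rangle = |\Psi^-\rangle$, (9) is called a Werner state, and when $|\beta\rangle = |\Phi^+\rangle$, it is called an Isotropic state. The Peres-Horodecki criterion guarantees that $\rho_p$ is separable when $p \le 1/3$ and entangled when $p > 1/3$. Table III outlines the specific choices of WBM for the mixture of the maximally mixed state with each of the four Bell states. When measured with the corresponding WBMs, the entangled Depolarized Bell states are conclusively detected: the value of (7) is non-negative for $p \le 1/3$ and negative for $p > 1/3$.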
Depolarized State | Pauli Basis | WBM |
---|---|---|
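To see where the threshold $1/3$ comes from, write the Depolarized Bell state as $\rho_p = p|\beta\rangle\langle\beta| + (1-p)I/4$ (the parameterization assumed here for (9)). For every Bell state, the partial transpose $(|\beta\rangle\langle\beta|)^{T_B}$ is unitarily equivalent to $\tfrac{1}{2}\,\mathrm{SWAP}$, whose eigenvalues are $\tfrac12$ (three times) and $-\tfrac12$ (once), while $I^{T_B} = I$. Hence

$$\operatorname{spec}\big(\rho_p^{T_B}\big) = \left\{ \tfrac{p}{2} + \tfrac{1-p}{4} = \tfrac{1+p}{4}\ (\text{multiplicity } 3),\ \ -\tfrac{p}{2} + \tfrac{1-p}{4} = \tfrac{1-3p}{4} \right\},$$

so $\rho_p^{T_B}$ has a negative eigenvalue, and $\rho_p$ is entangled, exactly when $(1-3p)/4 < 0$, i.e., $p > 1/3$.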
III-B Two-qubit Bell diagonal States
Bell diagonal states are probabilistic mixtures of the four Bell states and are more general than the states in (9). Given parameters $p_1, p_2, p_3, p_4 \ge 0$ such that $\sum_{i=1}^{4} p_i = 1$, the Bell diagonal state is defined as
$$\rho_{BD} = p_1 |\Phi^+\rangle\langle\Phi^+| + p_2 |\Phi^-\rangle\langle\Phi^-| + p_3 |\Psi^+\rangle\langle\Psi^+| + p_4 |\Psi^-\rangle\langle\Psi^-|. \tag{10}$$
The eigenvalues of the partial transpose $\rho_{BD}^{T_B}$ are $\tfrac12 - p_1$, $\tfrac12 - p_2$, $\tfrac12 - p_3$ and $\tfrac12 - p_4$. Consequently, a Bell diagonal state is entangled if any one of the probabilities $p_i$ exceeds $1/2$ (so that the other three sum to less than $1/2$). Conversely, a Bell diagonal state is separable if all probabilities are less than or equal to $1/2$. Expressing (10) in the Pauli basis yields
$$\rho_{BD} = \frac{1}{4}\left(I \otimes I + a\, X \otimes X + b\, Y \otimes Y + c\, Z \otimes Z\right),$$
where $a = p_1 - p_2 + p_3 - p_4$, $b = -p_1 + p_2 + p_3 - p_4$ and $c = p_1 + p_2 - p_3 - p_4$. When $\rho_{BD}$ is entangled, the index $i$ for which $p_i > 1/2$ determines the signs of $a$, $b$ and $c$; see Table IV. Notably, the signs of $a$, $b$ and $c$ follow a pattern similar to the Pauli basis expansions of the Depolarized Bell states listed in Table III. For suitable combinations of $a$, $b$ and $c$, the Bell diagonal state reduces to one of the Depolarized Bell states, and such states can be detected using the same WBMs as in Table III. Specifically, the value of (7) under the two WBMs in Table IV is equal to and , respectively. Depending on the probabilistic mixture, one of the two WBMs conclusively yields a negative value.
Probabilistic mixture | a | b | c | WBM |
---|---|---|---|---|
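For the Bell diagonal state (10), with $p_1, p_2, p_3, p_4$ assumed to weight $|\Phi^+\rangle, |\Phi^-\rangle, |\Psi^+\rangle, |\Psi^-\rangle$ in that order, the Pauli coefficients are $a = p_1 - p_2 + p_3 - p_4$, $b = -p_1 + p_2 + p_3 - p_4$, $c = p_1 + p_2 - p_3 - p_4$, and the partial transpose has eigenvalues $\tfrac12 - p_i$. The sketch below checks both facts numerically; the labeling and the function names are our own assumptions.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
S = 1 / np.sqrt(2)
# Assumed ordering of Bell states matching p_1, ..., p_4: Phi+, Phi-, Psi+, Psi-.
BELL = [S * np.array(v, dtype=complex)
        for v in ([1, 0, 0, 1], [1, 0, 0, -1], [0, 1, 1, 0], [0, 1, -1, 0])]

def bell_diagonal(p):
    """rho_BD = sum_i p_i |B_i><B_i| for probabilities p = (p1, p2, p3, p4)."""
    return sum(pi * np.outer(v, v.conj()) for pi, v in zip(p, BELL))

def correlations(rho):
    """(a, b, c) = (<X x X>, <Y x Y>, <Z x Z>)."""
    return tuple(float(np.real(np.trace(np.kron(P, P) @ rho))) for P in (X, Y, Z))

def predicted_abc(p):
    p1, p2, p3, p4 = p
    return (p1 - p2 + p3 - p4, -p1 + p2 + p3 - p4, p1 + p2 - p3 - p4)

def pt_eigenvalues(rho):
    """Sorted eigenvalues of the partial transpose on the second qubit."""
    pt = rho.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)
    return np.sort(np.linalg.eigvalsh(pt))
```

With $p = (0.6, 0.2, 0.1, 0.1)$, for example, $(a, b, c) = (0.4, -0.4, 0.6)$ and the smallest partial-transpose eigenvalue is $0.5 - 0.6 = -0.1$, so the state is entangled.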
III-C Two-qubit Amplitude Damping on Depolarized Bell States
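A qubit amplitude damping channel models a source of noise in superconducting circuit-based quantum computing and thus serves as a realistic channel model for simulating lossy processes in these systems. Mathematically, it can be obtained from an isometry $J$,
$$J : \mathcal{H}_a \to \mathcal{H}_b \otimes \mathcal{H}_c, \tag{11}$$
where $\mathcal{H}_a$ denotes the Hilbert space of the channel’s input, and $\mathcal{H}_b$ and $\mathcal{H}_c$ represent the Hilbert spaces of the direct and complementary channel outputs, respectively. An isometry of the form
$$J|0\rangle = |0\rangle_b |0\rangle_c, \qquad J|1\rangle = \sqrt{1-\gamma}\,|1\rangle_b |0\rangle_c + \sqrt{\gamma}\,|0\rangle_b |1\rangle_c, \tag{12}$$
where $\gamma \in [0, 1]$, defines a pair of channels, $\mathcal{B}(\rho) = \mathrm{Tr}_c(J\rho J^\dagger)$ and $\mathcal{C}(\rho) = \mathrm{Tr}_b(J\rho J^\dagger)$. Here, $\mathcal{B}$ is an amplitude damping channel with damping probability $\gamma$ for the state $|1\rangle$ to decay to the output state $|0\rangle$. The isometry can be written as $J = K_0 \otimes |0\rangle_c + K_1 \otimes |1\rangle_c$, where $K_0 = |0\rangle\langle 0| + \sqrt{1-\gamma}\,|1\rangle\langle 1|$ and $K_1 = \sqrt{\gamma}\,|0\rangle\langle 1|$ are the (Kraus) damping operators, satisfying $K_0^\dagger K_0 + K_1^\dagger K_1 = I$. For a single qubit in the state $\rho = \sum_{j,k} \rho_{jk} |j\rangle\langle k|$, the amplitude damped output is given by
$$\mathcal{B}(\rho) = \begin{pmatrix} \rho_{00} + \gamma \rho_{11} & \sqrt{1-\gamma}\,\rho_{01} \\ \sqrt{1-\gamma}\,\rho_{10} & (1-\gamma)\,\rho_{11} \end{pmatrix}. \tag{13}$$
We can extend (13) to two-qubit states with damping probabilities $\gamma_1$ and $\gamma_2$ for the first and second qubit, respectively, by applying the channel to each qubit. Assuming that $\gamma_1 = \gamma_2 = \gamma$, we consider Depolarized Bell states (9) with amplitude damping.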
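Applying the channel (13) independently to each qubit only requires tensor products of the single-qubit Kraus operators. The sketch below does this for an isotropic input, assumed to be $p|\Phi^+\rangle\langle\Phi^+| + (1-p)I/4$, and reads off the $|00\rangle$ and $|11\rangle$ populations that are relevant to Proposition 5; the helper names are hypothetical.

```python
import numpy as np

def damping_kraus(gamma):
    """Single-qubit amplitude damping Kraus operators K0, K1."""
    k0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]])
    k1 = np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])
    return k0, k1

def damp_two_qubits(rho, gamma1, gamma2):
    """Apply independent amplitude damping to each qubit of a 4x4 density matrix."""
    out = np.zeros_like(rho)
    for ka in damping_kraus(gamma1):
        for kb in damping_kraus(gamma2):
            k = np.kron(ka, kb)  # Kraus operator of the product channel
            out += k @ rho @ k.conj().T
    return out

def isotropic(p):
    """Hypothetical helper: p |Phi+><Phi+| + (1 - p) I/4."""
    phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
    return p * np.outer(phi, phi) + (1 - p) * np.eye(4) / 4

if __name__ == "__main__":
    damped = damp_two_qubits(isotropic(0.6), 0.3, 0.3)
    print("<00|rho|00> =", damped[0, 0], " <11|rho|11> =", damped[3, 3])
```

With $p = 0.6$ and $\gamma = 0.3$, the two populations come out as $0.496$ and $0.196$; they coincide only when $\gamma = 0$, which is incompatible with the Bell diagonal form.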
Proposition 5.
For any damping probability $\gamma \in (0, 1]$, a Depolarized Bell state with amplitude damping cannot be expressed as a Bell diagonal state (10).
This fact can be verified by a straightforward calculation. Consider the Isotropic state, which can be represented by the Bell diagonal state with weight $(1+3p)/4$ on $|\Phi^+\rangle$ and weight $(1-p)/4$ on each of the other three Bell states. In a Bell diagonal state, the diagonal elements corresponding to $|00\rangle$ and $|11\rangle$ are identical. In the case of an amplitude damped Isotropic state, we observe that,
However, obtaining closed-form expressions for and when is cumbersome. Specifically, the diagonal entries corresponding to $|00\rangle$ and $|11\rangle$ are given by and , respectively. These expressions are equal only when .
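For the case $\gamma_1 = \gamma_2 = \gamma$, the two diagonal entries can be written down directly. Assuming the isotropic parameterization $\rho = p|\Phi^+\rangle\langle\Phi^+| + (1-p)I/4$, the input populations are $\rho_{00,00} = \rho_{11,11} = (1+p)/4$ and $\rho_{01,01} = \rho_{10,10} = (1-p)/4$. Under local amplitude damping, $|11\rangle$ survives with probability $(1-\gamma)^2$, while $|00\rangle$ collects decays from $|01\rangle$ and $|10\rangle$ (each with probability $\gamma$) and from $|11\rangle$ (with probability $\gamma^2$); coherences do not feed into the populations. Hence

$$\langle 00|\rho'|00\rangle = \frac{(1+p)(1+\gamma^2)}{4} + \frac{\gamma(1-p)}{2}, \qquad \langle 11|\rho'|11\rangle = \frac{(1-\gamma)^2(1+p)}{4},$$

$$\langle 00|\rho'|00\rangle - \langle 11|\rho'|11\rangle = \frac{2\gamma(1+p)}{4} + \frac{\gamma(1-p)}{2} = \gamma,$$

so, under these assumptions, the two populations are equal, as a Bell diagonal state requires, only when $\gamma = 0$.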
Proposition 6.
For every , there exists such that an amplitude damped Depolarized Bell state becomes separable.
The PPT criterion asserts that a two-qubit state is entangled if and only if its partial transpose has at least one negative eigenvalue. For Bell states that are both amplitude damped and depolarized, we evaluate the eigenvalues of the partial transpose and observe that one of them can be either positive or negative, depending on the range of . Detailed findings are presented in Table V and depicted graphically in Fig. 1(a) and Fig. 1(b). Furthermore, the WBMs for amplitude damped Depolarized Bell states coincide with those for Depolarized Bell states, as outlined in Table III.
State with | State with | Sign of eigenvalue |
---|---|---|
Always positive | ||
Always positive | ||
Always positive | ||
Positive and Negative |
[Fig. 1(a) and Fig. 1(b): sign of the partial-transpose eigenvalue of amplitude damped Depolarized Bell states (see Table V); images not reproduced.]
IV Stochastic MAB policies for Entanglement Detection
In this section, we discuss stochastic MAB-based algorithms for entanglement detection in the parameterized states of $\mathcal{Q}$ outlined in Section III. We use stochastic MAB terminology in alignment with its quantum counterparts, as shown in Table II. We consider a set of $K$ unknown arms, denoted by $\{\rho_1, \dots, \rho_K\}$. To perform measurements on the arms, the learner requires knowledge of the underlying WBM. Therefore, we assume knowledge of the specific forms of the arms in $\mathcal{Q}$, as they are detectable under the WBMs $M_1$ or $M_2$. Here, $M_1$ and $M_2$ correspond to the WBMs of the first two witnesses in Table I, respectively. For example, consider $\rho_i = p_i |\Phi^+\rangle\langle\Phi^+| + (1 - p_i)\, I/4$ for all $i \in [K]$, where $p_i$ is unknown. These are isotropic states, which are probabilistic mixtures of the maximally mixed state and the Bell state $|\Phi^+\rangle$, and they can be detected using one of these two WBMs. With this assumption, we describe the template for the MAB problem as follows. In each round $t$,
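- The learner selects an arm $a_t \in [K]$.
- The learner measures $\rho_{a_t}$ with the WBM $M = \{P_j\}$ and obtains outcome $j$ with probability $\mathrm{Tr}(P_j \rho_{a_t})$.
- The learner updates its estimate of the criterion value (7) for arm $a_t$ and either identifies the entangled arm(s) or continues.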
For a given WBM , the values of are bounded in . We can use concentration inequalities applicable to 1-subgaussian random variables. We apply the law of iterated logarithm [42] for a finite sum of 1-subgaussian random variables:
Lemma 7.
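Let $X_1, X_2, \dots$ be i.i.d. sub-gaussian random variables with mean $\mu$ and scale parameter $\sigma$. For any $\epsilon \in (0, 1)$ and $\delta \in (0, \log(1+\epsilon)/e)$, one has with probability at least for all $t \ge 1$,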
(14) |
where is the confidence width and .
Proof.
See [42, Lemma 1]. ∎
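Following [42, Lemma 1], dividing the bound on the sum by $t$ gives a per-sample confidence width $U(t,\delta) = (1+\sqrt{\varepsilon})\sqrt{2\sigma^2(1+\varepsilon)\log(\log((1+\varepsilon)t)/\delta)/t}$. This width holds simultaneously for all $t$ with probability at least $1 - c_\varepsilon\delta^{1+\varepsilon}$, where $c_\varepsilon = \frac{2+\varepsilon}{\varepsilon}\left(\frac{1}{\log(1+\varepsilon)}\right)^{1+\varepsilon}$. The sketch below evaluates both quantities. The function names are hypothetical, $\sigma = 1$ matches the 1-sub-Gaussian setting above, and $\varepsilon = 0.01$ is only an illustrative default.

```python
import math


def lil_width(n: int, delta: float, eps: float = 0.01, sigma: float = 1.0) -> float:
    """Per-sample LIL confidence width: the sum bound of [42, Lemma 1] divided by n."""
    if not (0 < eps < 1 and 0 < delta < math.log(1 + eps) / math.e):
        raise ValueError("requires eps in (0, 1) and delta in (0, log(1 + eps)/e)")
    return (1 + math.sqrt(eps)) * math.sqrt(
        2 * sigma**2 * (1 + eps) * math.log(math.log((1 + eps) * n) / delta) / n)


def c_eps(eps: float) -> float:
    """Constant c_eps that multiplies delta^(1 + eps) in the failure probability."""
    return (2 + eps) / eps * (1 / math.log(1 + eps)) ** (1 + eps)


for n in (10, 100, 1000):
    print(n, round(lil_width(n, delta=1e-3), 3))
print(round(c_eps(0.01)))
```

Note that a small ε makes $c_\varepsilon$ large (about $2 \times 10^4$ for ε = 0.01). The δ inside the width is therefore not the overall error probability by itself; a policy's error budget has to be translated through $c_\varepsilon$.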
In the following subsections, we discuss two MAB policies: successive elimination, for scenarios where there is a promise of one entangled arm among arms, and the lil’HDoC policy, for cases where there are entangled arms among arms, with being unknown.
IV-A Modified Successive Elimination Algorithm
We consider the -quantum MAB problem and characterise the expected stopping time of a modified version of the Successive Elimination algorithm [37], outlined as Algorithm 1. We are presented with arms such that . The algorithm takes as input the set of arms , the threshold value , and the error probability , and outputs the arm . Let denote the number of times arm has been sampled in rounds, and let be the estimate of obtained from pulling arm up to time . The algorithm maintains an active set and samples every arm in it. The estimates and the Lower Confidence Bounds (LCBs) of the active arms are then updated. To identify , the policy eliminates arms whose LCB exceeds the threshold and halts when only one arm remains in the active set.
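The following is a minimal sketch of the elimination loop described above. It relies on assumptions that do not come from Algorithm 1 itself. Outcomes are hypothetical ±1 variables whose expectation is the arm's mean. The entangled arm is taken to be the one whose mean lies below the threshold `tau`, which is consistent with eliminating arms whose LCB exceeds it. The confidence width is a standard anytime Hoeffding width with a union bound over arms and rounds, not necessarily the paper's choice.

```python
import math
import random


def hoeffding_width(t: int, num_arms: int, delta: float) -> float:
    """Anytime width for means of 1-sub-Gaussian samples, union-bounded over arms and rounds."""
    return math.sqrt(2 * math.log(4 * num_arms * t * t / delta) / t)


def measure(mu: float, rng: random.Random) -> float:
    """Hypothetical +/-1 outcome with expectation mu."""
    return 1.0 if rng.random() < (1 + mu) / 2 else -1.0


def successive_elimination(means, tau, delta, rng, max_rounds=10**6):
    """Sample every active arm once per round; drop arms whose LCB exceeds tau."""
    num_arms = len(means)
    active = set(range(num_arms))
    sums = [0.0] * num_arms
    t, pulls = 0, 0
    while len(active) > 1 and t < max_rounds:
        t += 1
        for i in active:
            sums[i] += measure(means[i], rng)
            pulls += 1
        w = hoeffding_width(t, num_arms, delta)
        # Every active arm has exactly t samples; an LCB above tau rules the arm out
        active = {i for i in active if sums[i] / t - w <= tau}
    return active, pulls


rng = random.Random(0)
means = [-0.5, 0.3, 0.4, 0.5, 0.6]   # hypothetical; arm 0 is the only one below tau
remaining, pulls = successive_elimination(means, tau=0.0, delta=0.05, rng=rng)
print(remaining, pulls)
```

An empty returned set means every arm was eliminated, which signals a violated promise or a failure event. The guarantees in Theorems 9 and 10 are stated for Algorithm 1's own confidence width, not for this stand-in.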
Lemma 8.
Algorithm 1 is -PC.
Proof.
The proof is presented in Appendix VII-A1. ∎
The correctness of Algorithm 1 and the sample complexity of identifying the entangled arm are presented below.
Theorem 9.
With probability at least , the arm remains in the active set till termination.
Proof.
The proof is presented in Appendix VII-A2. ∎
Theorem 10.
Algorithm 1 successfully identifies the arm with probability and terminates after samples, where is the sub-optimal gap with respect to the threshold .
Proof.
The proof is presented in Appendix VII-A3. ∎
IV-B Modified lil’HDoC Algorithm
The lil’HDoC algorithm, introduced in [44], is a variant of the HDoC algorithm proposed by [27]. It integrates a sampling rule based on the UCB algorithm for regret minimization [25] with an identification rule based on the confidence bound of Lemma 7. In contrast to the LCB-based identification rule used in the HDoC algorithm [38], the LIL-based concentration in lil’HDoC yields a notable improvement in sample complexity. This improvement stems from the observation that the LIL bound exhibits a higher growth rate than the LCB bound , which reduces the required number of samples. In other words, there exists a value such that for all , ,
Consequently, by ensuring that each arm is sampled at least times initially, lil’HDoC not only accelerates the growth of its confidence bound but also attains adequate confidence in identifying the good arms. Recall that the confidence bound of HDoC is . Straightforward calculations show that the smallest integer such that the confidence bound of lil’HDoC grows faster than is
(15) |
Thus, if each arm is initially sampled times, lil’HDoC achieves identification capabilities comparable to HDoC and has a sample complexity of samples on each arm. We now map the lil’HDoC algorithm, outlined as Algorithm 2, onto the -quantum MAB problem and characterise the expected stopping time.
Consider arms such that , with being unknown. The algorithm takes as input the set of arms , the threshold value , and the error probability , and outputs the set of arms . First, every arm is sampled at least times, as given by (15). While the arm set , the algorithm keeps track of the active arms and applies the sampling and identification rules explained earlier.
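The loop can be sketched as follows. This is our own mirrored adaptation, not a transcription of Algorithm 2. HDoC-style rules target arms above a threshold, so here the UCB sampling index is applied to negated outcomes, which means the arm with the smallest LCB is pulled. An arm is declared entangled once its LIL upper bound falls below `tau`, and it is set aside once its LIL lower bound rises above `tau`. The forced-exploration count `n0` stands in for (15), splitting `delta` evenly across arms is an assumption, and outcomes are hypothetical ±1 variables.

```python
import math
import random


def lil_width(n, delta, eps=0.01, sigma=1.0):
    """Per-sample LIL width (bound of Lemma 7 divided by n); needs delta < log(1+eps)/e."""
    return (1 + math.sqrt(eps)) * math.sqrt(
        2 * sigma**2 * (1 + eps) * math.log(math.log((1 + eps) * n) / delta) / n)


def measure(mu, rng):
    """Hypothetical +/-1 outcome with expectation mu."""
    return 1.0 if rng.random() < (1 + mu) / 2 else -1.0


def lil_hdoc_bad_arms(means, tau, delta, n0, rng, max_pulls=10**6):
    """Classify every arm as below tau ('bad', i.e. entangled) or above tau; m is not needed."""
    k = len(means)
    sums, counts = [0.0] * k, [0] * k
    for i in range(k):                      # forced exploration: n0 >= 1 pulls per arm
        for _ in range(n0):
            sums[i] += measure(means[i], rng)
            counts[i] += 1
    active, bad, t = set(range(k)), set(), k * n0
    while active and t < max_pulls:
        # Sampling rule: UCB index on negated outcomes, i.e. pull the arm with the smallest LCB
        i = min(active, key=lambda a: sums[a] / counts[a]
                - math.sqrt(math.log(t) / (2 * counts[a])))
        sums[i] += measure(means[i], rng)
        counts[i] += 1
        t += 1
        mean, w = sums[i] / counts[i], lil_width(counts[i], delta / k)
        if mean + w < tau:                  # upper bound below tau: declare entangled
            bad.add(i)
            active.discard(i)
        elif mean - w > tau:                # lower bound above tau: stop sampling this arm
            active.discard(i)
    return bad, t


rng = random.Random(1)
means = [-0.4, -0.3, 0.3, 0.4, 0.5]        # hypothetical; arms 0 and 1 lie below tau
bad, pulls = lil_hdoc_bad_arms(means, tau=0.0, delta=1e-3, n0=10, rng=rng)
print(sorted(bad), pulls)
```

Because m is never passed to the function, the same code handles an unknown number of entangled arms. An arm whose mean sits exactly at `tau` would never be resolved, which is why `max_pulls` caps the run.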
To demonstrate the correctness of Algorithm 2, we first show that the algorithm is -PAC for all and then characterise the sample complexity of identifying bad arms (entangled states).
Theorem 12.
With probability at least , the algorithm identifies all the arms in .
Proof.
The proof is presented in Appendix VII-B2. ∎
V Workflow for Entanglement Detection
In this section, we present a workflow for entanglement detection in scenarios where the arms in are detectable under distinct WBMs. This routine uses the stochastic MAB policies discussed in the previous section. Specifically, we relax the assumption that the learner must know the specific WBM in advance and instead adapt the WBMs sequentially through suitable unitary transformations. We evaluate this methodology on depolarized Bell states and on arbitrary quantum states. In particular, we select states, of which are entangled, and report numerical results for the corresponding quantum MAB problems.
V-A Entanglement Detection in Depolarized Bell states
We present numerical results on the sample complexity of entanglement detection for the -quantum MAB problem, specifically for depolarized Bell states. These states are known to be detectable under the witnesses and , outlined in Table I. The procedure for entanglement detection is detailed in Algorithm 3. The algorithm takes as input a threshold of , an accepted error rate , a set of depolarized Bell states (of which are entangled), and two WBMs. The WBMs in Algorithm 3 are applied in the order and then . This order is arbitrary, since the algorithm does not involve state estimation during the process. For the -quantum MAB problem, there is a promise that one arm is entangled, so the value of is known to the policy. We consider the following two experiments with arms.
- In the first experiment, we generate five isotropic states (defined below (9)), exactly one of which is entangled and detectable under WBM . As described earlier, we randomly generate the values of , and under WBM we compute . For , Algorithm 2 is iterated over 500 runs with WBM , confidence width and .
- In the second experiment, we generate five depolarized Bell states, each formed from one of the four Bell states. We randomly generate the values of such that one of these states is entangled. Under WBMs and , we get and . Here, the WBM under which the entangled state is detectable is unknown to the learner. Since there is a promise () that one arm is entangled, the learner must measure with at least one of the two WBMs. For , Algorithm 3 is iterated over runs. For both experiments, we plot the average number of samples until stopping on the y-axis against on the x-axis, as shown in Fig. 2.
![Fig. 2](extracted/5697427/plot_1K.jpg)
We present numerical results on the sample complexity of entanglement detection for the -quantum MAB problem for Depolarized Bell states, with being unknown to the policy. We consider the following two experiments with arms.
- In the first experiment, we generate five isotropic states as described earlier. Under WBM , we get . For , Algorithm 2 is iterated over 500 runs with WBM . Here, and is unknown to the policy.
- In the second experiment, we generate five depolarized Bell states, each formed from one of the four Bell states. Under WBMs and , the parameters are and , respectively. Although the states are detectable under WBM , this is unknown to the learner. Thus, we need to run at least one iteration of Algorithm 2. In the first iteration, the inputs are arms and WBM (or ), and the policy returns entangled arms. In the second iteration, Algorithm 2 is executed with arms and WBM (or ) as inputs. This routine is summarised in Algorithm 3 and iterated for 500 runs. We plot the average number of samples until stopping on the y-axis against on the x-axis, as shown in Fig. 3.
For the instances considered above, the sample complexity scales with . Note that when the sub-optimal gaps are very small, the sample complexity increases significantly and may not scale with . Since the bandit policy is re-run at most once, the worst-case sample complexity for entanglement detection in depolarized Bell states is at most twice that of a single run.
![Fig. 3](extracted/5697427/plot_mK.jpg)
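The WBM-adaptation routine of Algorithm 3 can be sketched as a loop that reruns a bandit policy on the arms not yet certified as entangled. Everything in the sketch is an illustrative assumption. Each WBM is represented only by a hypothetical list of arm means, and the unitary transformations that switch WBMs are not modelled. The per-WBM policy is a simple round-robin threshold classifier standing in for Algorithms 1 and 2. The `promise_one` flag mimics the promise setting, in which finding one entangled arm is enough to stop.

```python
import math
import random


def measure(mu, rng):
    """Hypothetical +/-1 outcome with expectation mu."""
    return 1.0 if rng.random() < (1 + mu) / 2 else -1.0


def classify(means, arms, tau, delta, rng, max_rounds=10**5):
    """Stand-in bandit policy: round-robin sampling with an anytime Hoeffding width.

    Returns the arms confidently below tau (entangled) and the number of pulls used.
    """
    k = max(len(arms), 1)
    sums = {i: 0.0 for i in arms}
    active, bad, t, pulls = set(arms), set(), 0, 0
    while active and t < max_rounds:
        t += 1
        w = math.sqrt(2 * math.log(4 * k * t * t / delta) / t)
        for i in list(active):
            sums[i] += measure(means[i], rng)
            pulls += 1
            if sums[i] / t + w < tau:        # certified entangled under this WBM
                bad.add(i)
                active.discard(i)
            elif sums[i] / t - w > tau:      # inconclusive under this WBM
                active.discard(i)
    return bad, pulls


def detect_with_wbm_sequence(means_per_wbm, tau, delta, rng, promise_one=False):
    """Run the policy once per WBM, in the given (arbitrary) order, on undetected arms."""
    remaining = set(range(len(means_per_wbm[0])))
    found, pulls = set(), 0
    for means in means_per_wbm:
        if not remaining or (promise_one and found):
            break
        bad, used = classify(means, remaining, tau, delta, rng)
        found |= bad
        remaining -= bad
        pulls += used
    return found, remaining, pulls


# Hypothetical means of five arms under two WBMs: arm 0 is detectable only
# under the first WBM, arm 1 only under the second
means_per_wbm = [[-0.4, 0.3, 0.3, 0.4, 0.5],
                 [0.5, -0.5, 0.3, 0.4, 0.6]]
found, undetected, pulls = detect_with_wbm_sequence(means_per_wbm, 0.0, 0.05, random.Random(3))
print(sorted(found), sorted(undetected), pulls)
```

The total pull count is the sum over the WBMs actually tried, which is why the worst case grows linearly with the number of WBMs. Arms still undetected after every WBM are the inconclusive cases that would be passed on to FST.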
V-B Entanglement Detection in Arbitrary Quantum States
In this section, we present a routine for detecting entanglement in arbitrary quantum states and provide numerical results for the -quantum Multi-Armed Bandit (MAB) problem. To generate random density matrices, we follow the method described in [46]. Specifically, we start by generating a complex matrix , where the real and imaginary parts of each element are independently sampled from a normal distribution. We then compute the density matrix by normalizing , resulting in . This procedure ensures that is a valid density matrix. On the generated states, we run Algorithm 4, which takes as input the error threshold , the set of arms , and a permutation of that defines the order in which the six WBMs should be adapted. Since this is a promise problem, the algorithm stops as soon as one entangled arm is identified, without needing to measure with all six WBMs.
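The generation step can be sketched with NumPy, assuming the standard Ginibre construction from [46]. In this construction, G has independent standard normal real and imaginary parts and ρ = GG†/Tr(GG†), which is Hermitian, positive semidefinite, and of unit trace. The rejection step, which keeps exactly one PPT-entangled arm among `num_arms`, is our own illustrative way of building a promise instance, and the function names are hypothetical.

```python
import numpy as np


def random_density_matrix(rng: np.random.Generator, dim: int = 4) -> np.ndarray:
    """rho = G G^dagger / Tr(G G^dagger), with G a complex Ginibre matrix."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    m = g @ g.conj().T                      # Hermitian and positive semidefinite
    return m / np.trace(m).real             # unit trace


def is_entangled_ppt(rho: np.ndarray, tol: float = 1e-12) -> bool:
    """Two-qubit PPT test: negative eigenvalue of the partial transpose on qubit B."""
    pt = rho.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)
    return bool(np.linalg.eigvalsh(pt).min() < -tol)


def make_instance(num_arms: int, rng: np.random.Generator) -> list:
    """Rejection-sample one PPT-entangled state and num_arms - 1 PPT-separable states."""
    entangled, separable = None, []
    while entangled is None or len(separable) < num_arms - 1:
        rho = random_density_matrix(rng)
        if is_entangled_ppt(rho):
            if entangled is None:
                entangled = rho
        elif len(separable) < num_arms - 1:
            separable.append(rho)
    arms = [entangled] + separable
    return [arms[j] for j in rng.permutation(num_arms)]   # hide the entangled arm's index


rng = np.random.default_rng(2024)
arms = make_instance(5, rng)
print(sum(is_entangled_ppt(r) for r in arms))  # exactly one entangled arm, by construction
```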
![Fig. 4](extracted/5697427/plot2.jpg)
Since the bandit policy is re-run at most five times, the worst-case sample complexity for entanglement detection is at most six times that of a single run. With this in mind, we conduct the following experiment, generating 500 instances of arbitrary states following the procedure described earlier. We ensure that each instance includes one entangled arm, and we verify each instance with the PPT criterion. The objective of this experiment is to test the efficacy of the single-parameter family of witnesses (4) for detecting entanglement in arbitrary states. For , we report the fraction of times the entangled arm is correctly identified, as shown in Fig. 4.
Pure entangled states and Values under
From the above experiment, we report several noteworthy observations. First, we encountered instances of pure states where the value of equaled , which is the threshold value provided to the algorithm. In such cases, the algorithm took a very long time to converge and, even then, estimated the value of incorrectly. Consequently, we adjusted the threshold to and capped the sample complexity at to better reflect the real-time performance of this policy. Second, we came across instances of entangled states, verified by the PPT test, that yielded positive values of under all six WBMs. Interestingly, the mixed entangled state , where are defined in Table VI, with has under the six witnesses, indicating that this state cannot be detected by the witness family described in (4).
We make an observation about the nature of such states, focusing in particular on the eigenstate , which corresponds to the largest eigenvalue of . This eigenstate has a Schmidt coefficient close to, but not equal to, 1, suggesting that it lies near the boundary of the separable states yet remains entangled. The pure state produces . Thus, we have identified examples of pure and mixed entangled states that can yield inconclusive results when measured using this particular witness family. In these instances, it is essential to measure all six witnesses a sufficient number of times to accurately estimate the expected values of the corresponding observables. Subsequently, performing FST can help determine whether these states are entangled using other separability criteria.
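To make the Schmidt-coefficient observation concrete, the sketch below extracts the eigenvector of the largest eigenvalue of a density matrix. It then computes that eigenvector's Schmidt coefficients as the singular values of the 2×2 reshaped amplitude vector. We report the coefficients themselves rather than their squares, which is a convention choice. The example states are hypothetical and are not the states of Table VI.

```python
import numpy as np


def leading_eigenstate(rho: np.ndarray) -> np.ndarray:
    """Eigenvector of rho for its largest eigenvalue (eigh sorts eigenvalues ascending)."""
    _, vecs = np.linalg.eigh(rho)
    return vecs[:, -1]


def schmidt_coefficients(psi: np.ndarray) -> np.ndarray:
    """Schmidt coefficients of a two-qubit pure state: singular values of its 2x2 reshape."""
    psi = psi / np.linalg.norm(psi)
    return np.linalg.svd(psi.reshape(2, 2), compute_uv=False)


bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
product = np.kron([1.0, 0.0], [np.cos(0.1), np.sin(0.1)])
near_product = np.array([np.cos(0.05), 0.0, 0.0, np.sin(0.05)])   # entangled, but barely
rho = 0.9 * np.outer(bell, bell) + 0.1 * np.eye(4) / 4
for name, psi in [("bell", bell), ("product", product),
                  ("near_product", near_product), ("leading(rho)", leading_eigenstate(rho))]:
    print(name, np.round(schmidt_coefficients(psi), 4))
```

A largest coefficient strictly between 1/√2 and 1, as for `near_product`, indicates a pure state that is entangled but close to the product states.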
VI Future Works And Conclusion
We established a novel correspondence between the problem of entanglement detection and the Bad Arm Identification problem in stochastic Multi-Armed Bandits (MAB), and proposed the -quantum Multi-Armed Bandit framework. This framework focuses on identifying entangled states out of states, where is potentially unknown. We apply this framework to two-qubit states using two key ingredients: a specialized set of six measurements for two-qubit states called Witness Basis Measurements (WBM) , and a separability criterion , which is based on the data obtained from these measurements and serves as the parameter to be estimated. We present theoretical guarantees and numerical simulations to demonstrate how this parameter can be estimated quickly and accurately using MAB policies. First, we show that entangled states belonging to a class of parameterised two-qubit states can be detected by measuring a subset of the six WBMs. With knowledge of the WBM, suitable MAB policies can be applied directly. Second, for the same parameterised states, we present a routine for entanglement detection when the WBM is not known, which enables arbitrary sequential adaptation of the WBMs. We extend this routine to arbitrary two-qubit quantum states and provide numerical results on the efficacy of using these measurements for detecting entanglement.
A promising future direction is identifying WBMs for higher-dimensional bipartite systems. The authors of [2] propose a minimal tomographic scheme for two qutrits, requiring only eleven witnesses instead of the traditional 81. Recent explorations in data-driven machine learning techniques have utilized SVMs to construct linear entanglement witnesses requiring only local measurements [47]. This approach offers promising avenues for extending these methods to the -quantum MAB problem by constructing a minimal number of witnesses that accurately detect all states. Entanglement detection can be viewed as a membership problem, where a state belongs to a set if it has a specific property (such as entanglement). This problem has also been explored along the lines of the partition identification problem [48], where the goal is to determine the partition to which a data point belongs, given the form of a hyperplane. Extending this concept to the -quantum MAB problem presents an exciting avenue for future research.
Acknowledgement
B.K. sincerely acknowledges the support from the Ministry of Education, Government of India, through the Prime Minister’s Research Fellowship (PMRF) Scheme. V.S. is supported by the U.S. Department of Energy, Office of Science, National Quantum Information Science Research Centers, Co-design Center for Quantum Advantage (C2QA) contract (DE-SC0012704). K.J. gratefully acknowledges a grant from Mphasis to the Centre for Quantum Information, Communication, and Computing (CQuICC) at IIT Madras.
References
- [1] D. Lu, T. Xin, N. Yu, Z. Ji, J. Chen, G. Long, J. Baugh, X. Peng, B. Zeng, and R. Laflamme, “Tomography is necessary for universal entanglement detection with single-copy observables,” Phys. Rev. Lett., vol. 116, p. 230501, Jun 2016. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevLett.116.230501
- [2] H. Zhu, Y. S. Teo, and B.-G. Englert, “Minimal tomography with entanglement witnesses,” Phys. Rev. A, vol. 81, p. 052339, May 2010. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevA.81.052339
- [3] C. H. Bennett, G. Brassard, C. Crépeau, R. Jozsa, A. Peres, and W. K. Wootters, “Teleporting an unknown quantum state via dual classical and Einstein-Podolsky-Rosen channels,” Phys. Rev. Lett., vol. 70, pp. 1895–1899, Mar 1993.
- [4] H. Buhrman, R. Cleve, and W. van Dam, “Quantum entanglement and communication complexity,” SIAM Journal on Computing, vol. 30, no. 6, pp. 1829–1841, 2001.
- [5] R. Horodecki, P. Horodecki, M. Horodecki, and K. Horodecki, “Quantum entanglement,” Reviews of Modern Physics, vol. 81, no. 2, pp. 865–942, Jun. 2009.
- [6] R. Kueng, H. Rauhut, and U. Terstiege, “Low rank matrix recovery from rank one measurements,” Applied and Computational Harmonic Analysis, vol. 42, no. 1, pp. 88–116, 2017. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1063520315001037
- [7] J. Wang, V. B. Scholz, and R. Renner, “Confidence polytopes in quantum state tomography,” Physical Review Letters, vol. 122, no. 19, May 2019. [Online]. Available: http://dx.doi.org/10.1103/PhysRevLett.122.190401
- [8] R. O’Donnell and J. Wright, “Efficient quantum tomography,” Proceedings of the forty-eighth annual ACM symposium on Theory of Computing, 2015. [Online]. Available: https://api.semanticscholar.org/CorpusID:769062
- [9] R. O’Donnell and J. Wright, “Efficient quantum tomography II,” Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, 2015. [Online]. Available: https://api.semanticscholar.org/CorpusID:5245926
- [10] K. Banaszek, M. Cramer, and D. Gross, “Focus on quantum tomography,” New Journal of Physics, vol. 15, no. 12, p. 125020, dec 2013. [Online]. Available: https://dx.doi.org/10.1088/1367-2630/15/12/125020
- [11] S. T. Flammia, D. Gross, Y.-K. Liu, and J. Eisert, “Quantum tomography via compressed sensing: error bounds, sample complexity and efficient estimators,” New Journal of Physics, vol. 14, no. 9, p. 095022, sep 2012. [Online]. Available: https://dx.doi.org/10.1088/1367-2630/14/9/095022
- [12] M. Guta, J. Kahn, R. Kueng, and J. A. Tropp, “Fast state tomography with optimal error bounds,” Journal of Physics A: Mathematical and Theoretical, vol. 53, no. 20, p. 204001, apr 2020. [Online]. Available: https://dx.doi.org/10.1088/1751-8121/ab8111
- [13] G. Torlai, G. Mazzola, J. Carrasquilla, M. Troyer, R. Melko, and G. Carleo, “Neural-network quantum state tomography,” Nature Physics, vol. 14, no. 5, p. 447–450, Feb. 2018. [Online]. Available: http://dx.doi.org/10.1038/s41567-018-0048-5
- [14] Y. Quek, S. Fort, and H. K. Ng, “Adaptive quantum state tomography with neural networks,” 2018.
- [15] D. Koutný, L. Motka, Z. Hradil, J. Řeháček, and L. L. Sánchez-Soto, “Neural-network quantum state tomography,” Physical Review A, vol. 106, no. 1, Jul. 2022. [Online]. Available: http://dx.doi.org/10.1103/PhysRevA.106.012409
- [16] T. Schmale, M. Reh, and M. Gärttner, “Efficient quantum state tomography with convolutional neural networks,” npj Quantum Information, vol. 8, no. 1, Sep. 2022. [Online]. Available: http://dx.doi.org/10.1038/s41534-022-00621-4
- [17] D. S. França, F. G. L. Brandão, and R. Kueng, “Fast and Robust Quantum State Tomography from Few Basis Measurements,” in 16th Conference on the Theory of Quantum Computation, Communication and Cryptography (TQC 2021), ser. Leibniz International Proceedings in Informatics (LIPIcs), M.-H. Hsieh, Ed., vol. 197. Dagstuhl, Germany: Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021, pp. 7:1–7:13. [Online]. Available: https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.TQC.2021.7
- [18] J. Haah, A. W. Harrow, Z. Ji, X. Wu, and N. Yu, “Sample-optimal tomography of quantum states,” IEEE Transactions on Information Theory, p. 1–1, 2017. [Online]. Available: http://dx.doi.org/10.1109/TIT.2017.2719044
- [19] Y. S. Teo, H. Zhu, B.-G. Englert, J. Řeháček, and Z. c. v. Hradil, “Quantum-state reconstruction by maximizing likelihood and entropy,” Phys. Rev. Lett., vol. 107, p. 020404, Jul 2011. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevLett.107.020404
- [20] V. Siddhu, “Maximum a posteriori probability estimates for quantum tomography,” Physical Review A, vol. 99, no. 1, Jan. 2019. [Online]. Available: http://dx.doi.org/10.1103/PhysRevA.99.012342
- [21] M. Horodecki, P. Horodecki, and R. Horodecki, “Separability of mixed states: necessary and sufficient conditions,” Physics Letters A, vol. 223, no. 1, pp. 1–8, 1996. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0375960196007062
- [22] B. M. Terhal, “Bell inequalities and the separability criterion,” Physics Letters A, vol. 271, no. 5, pp. 319–326, 2000. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0375960100004011
- [23] M. Lewenstein, B. Kraus, J. I. Cirac, and P. Horodecki, “Optimization of entanglement witnesses,” Phys. Rev. A, vol. 62, p. 052310, Oct 2000. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevA.62.052310
- [24] D. Chruściński and G. Sarbicki, “Entanglement witnesses: construction, analysis and classification,” Journal of Physics A: Mathematical and Theoretical, vol. 47, no. 48, p. 483001, Nov. 2014. [Online]. Available: http://dx.doi.org/10.1088/1751-8113/47/48/483001
- [25] P. Auer, N. Cesa-Bianchi, and P. Fischer, “Finite-time analysis of the multiarmed bandit problem,” Machine Learning, vol. 47, pp. 235–256, 05 2002.
- [26] J.-Y. Audibert, S. Bubeck, and R. Munos, “Best arm identification in multi-armed bandits.” in COLT, 2010, pp. 41–53.
- [27] H. Kano, J. Honda, K. Sakamaki, K. Matsuura, A. Nakamura, and M. Sugiyama, “Good arm identification via bandit feedback,” 2018.
- [28] M. Lewenstein, B. Kraus, J. I. Cirac, and P. Horodecki, “Optimization of entanglement witnesses,” Phys. Rev. A, vol. 62, p. 052310, Oct 2000. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevA.62.052310
- [29] I. Bengtsson and K. Zyczkowski, Geometry of Quantum States: An Introduction to Quantum Entanglement. Cambridge University Press, 2006.
- [30] M. Horodecki, P. Horodecki, and R. Horodecki, “Separability of mixed states: necessary and sufficient conditions,” Physics Letters A, vol. 223, no. 1, pp. 1–8, 1996. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0375960196007062
- [31] A. Peres, “Separability criterion for density matrices,” Phys. Rev. Lett., vol. 77, pp. 1413–1415, Aug 1996. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevLett.77.1413
- [32] P. Horodecki, “Separability criterion and inseparable mixed states with positive partial transposition,” Physics Letters A, vol. 232, no. 5, pp. 333–339, 1997. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0375960197004167
- [33] O. Rudolph, “A separability criterion for density operators,” Journal of Physics A: Mathematical and General, vol. 33, no. 21, p. 3951–3955, May 2000. [Online]. Available: http://dx.doi.org/10.1088/0305-4470/33/21/308
- [34] O. Gühne, P. Hyllus, O. Gittsovich, and J. Eisert, “Covariance matrices and the separability problem,” Physical Review Letters, vol. 99, no. 13, Sep. 2007. [Online]. Available: http://dx.doi.org/10.1103/PhysRevLett.99.130504
- [35] L. Gurvits, “Classical deterministic complexity of Edmonds’ problem and quantum entanglement,” 2003.
- [36] A. C. Doherty, P. A. Parrilo, and F. M. Spedalieri, “Complete family of separability criteria,” Phys. Rev. A, vol. 69, p. 022308, Feb 2004. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevA.69.022308
- [37] E. Even-Dar, S. Mannor, and Y. Mansour, “PAC bounds for multi-armed bandit and Markov decision processes,” ser. COLT ’02. Berlin, Heidelberg: Springer-Verlag, 2002, pp. 255–270.
- [38] S. Kalyanakrishnan, A. Tewari, P. Auer, and P. Stone, “PAC subset selection in stochastic multi-armed bandits,” in Proceedings of the 29th International Conference on Machine Learning, ser. ICML’12. Madison, WI, USA: Omnipress, 2012, pp. 227–234.
- [39] Z. Karnin, T. Koren, and O. Somekh, “Almost optimal exploration in multi-armed bandits,” in Proceedings of the 30th International Conference on Machine Learning, 2013, pp. 1238–1246.
- [40] S. Mannor and J. N. Tsitsiklis, “The sample complexity of exploration in the multi-armed bandit problem,” J. Mach. Learn. Res., vol. 5, p. 623–648, dec 2004.
- [41] R. H. Farrell, “Asymptotic Behavior of Expected Sample Size in Certain One Sided Tests,” The Annals of Mathematical Statistics, vol. 35, no. 1, pp. 36 – 72, 1964. [Online]. Available: https://doi.org/10.1214/aoms/1177703731
- [42] K. Jamieson, M. Malloy, R. Nowak, and S. Bubeck, “lil’ UCB: An optimal exploration algorithm for multi-armed bandits,” in Proceedings of The 27th Conference on Learning Theory, ser. Proceedings of Machine Learning Research, M. F. Balcan, V. Feldman, and C. Szepesvári, Eds., vol. 35. Barcelona, Spain: PMLR, 13–15 Jun 2014, pp. 423–439. [Online]. Available: https://proceedings.mlr.press/v35/jamieson14.html
- [43] A. Locatelli, M. Gutzeit, and A. Carpentier, “An optimal algorithm for the thresholding bandit problem,” in Proceedings of The 33rd International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, M. F. Balcan and K. Q. Weinberger, Eds., vol. 48. New York, New York, USA: PMLR, 20–22 Jun 2016, pp. 1690–1698. [Online]. Available: https://proceedings.mlr.press/v48/locatelli16.html
- [44] T.-H. Tsai, Y.-D. Tsai, and S.-D. Lin, “lil’HDoC: An algorithm for good arm identification under small threshold gap,” 2024.
- [45] J. Lumbreras, E. Haapasalo, and M. Tomamichel, “Multi-armed quantum bandits: Exploration versus exploitation when learning properties of quantum states,” Quantum, vol. 6, p. 749, Jun. 2022. [Online]. Available: http://dx.doi.org/10.22331/q-2022-06-29-749
- [46] K. Zyczkowski and H.-J. Sommers, “Induced measures in the space of mixed quantum states,” Journal of Physics A: Mathematical and General, vol. 34, no. 35, p. 7111–7125, Aug. 2001. [Online]. Available: http://dx.doi.org/10.1088/0305-4470/34/35/335
- [47] A. C. Greenwood, L. T. Wu, E. Y. Zhu, B. T. Kirby, and L. Qian, “Machine-learning-derived entanglement witnesses,” Phys. Rev. Appl., vol. 19, p. 034058, Mar 2023. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevApplied.19.034058
- [48] S. Juneja and S. Krishnasamy, “Sample complexity of partition identification using multi-armed bandits,” 2019.
VII Supplementary Material
The following lemma is useful for some calculations.
Lemma 14.
For ,
(16) |
VII-A Proof for Section IV-A
VII-A1 Proof of Lemma 8
VII-A2 Proof of Theorem 9
Proof.
Recall that the threshold and problem instance are such that . Let us consider the case where the event described in Lemma 8 holds. As outlined in Algorithm 1, the arm will be dropped from the active set if . That is,
This contradicts the assumption about the problem instance because , and so the arm will not be dropped from the active set as long as event holds. ∎
VII-A3 Proof of Theorem 10
Proof.
Let us consider the case where holds. By the elimination rule of Algorithm 1, an arm is removed from the active set if . We have that,
(18) |
Let denote the number of samples of arm , that is, . The minimum value of can be obtained by solving
(19) |
From Lemma 16, we get that,
(20) |
Thus, the total number of samples required to identify the arm with a probability of at least is . ∎
VII-B Proof for Section IV-B
VII-B1 Proof of Lemma 11
Proof.
Firstly, we show that Algorithm 2 is -PAC for arbitrary . In the case where there are arms greater than or equal to , we show that where is the number of good arms identified by the agent. Since we are now considering the case when , the event implies that at least one good arm is identified as a bad arm by the agent. That is, for some and , the upper confidence bound . Thus, we have that,
(21) |
The event considers all those outcomes where a bad arm is identified to be a good one. Thus, for some bad arm such that , we have,
(22) |
Thus, putting Eq. 21 and Eq. 22 together, we get that . Next, we consider the case when the number of good arms is less than and show that . Since there are at most good arms, the event implies that one of the output arms is such that there exists some index such that is a bad arm. Thus, we have that,
(23) |
We see that the algorithm is -PAC for all such , thereby giving us that the algorithm is -PAC. ∎
VII-B2 Proof of Theorem 12
Proof.
Recall that the threshold and problem instance are such that , with being unknown. Let us consider the case where the event described in Lemma 8 holds. As outlined in Algorithm 2, an arm will be dropped if . That is,
Thus, as long as event holds, none of the arms that have will be dropped. Hence, the lil’HDoC algorithm identifies all such arms correctly. ∎