Classical Bandit Algorithms for Entanglement Detection in Parameterized Qubit States

Bharati K Department of Electrical Engineering, IIT Madras, Chennai, India Vikesh Siddhu IBM Quantum, IBM Research India Krishna Jagannathan Department of Electrical Engineering, IIT Madras, Chennai, India

Abstract

Entanglement is a key resource for a wide range of tasks in quantum information and computing. Thus, verifying availability of this quantum resource is essential. Extensive research on entanglement detection has led to no-go theorems [1] that highlight the need for full state tomography (FST) in the absence of adaptive or joint measurements. Recent advancements, as proposed by [2], introduce a single-parameter family of entanglement witness measurements which are capable of conclusively detecting certain entangled states and only resort to FST when all witness measurements are inconclusive. We find a variety of realistic noisy two-qubit quantum states $\mathcal{F}$ that yield conclusive results under this witness family. We solve the problem of detecting entanglement among $K$ quantum states in $\mathcal{F}$ , of which $m$ states are entangled, with $m$ potentially unknown. We recognize a structural connection of this problem to the Bad Arm Identification problem in stochastic Multi-Armed Bandits (MAB). In contrast to existing quantum bandit frameworks, we establish a new correspondence tailored for entanglement detection and term it the $(m,K)$ -quantum Multi-Armed Bandit. We implement two well-known MAB policies for arbitrary states derived from $\mathcal{F}$ , present theoretical guarantees on the measurement/sample complexity and demonstrate the practicality of the policies through numerical simulations. More broadly, this paper highlights the potential for employing classical machine learning techniques for quantum entanglement detection.

Index Terms:

quantum computing, quantum states, entanglement detection, FST, entanglement witness, multi-armed bandit, bad arm identification

I Introduction

The emergence of quantum information theory has changed our understanding of quantum entanglement, transforming it from a property of quantum states to a vital resource. Entanglement allows us to perform non-classical tasks, such as quantum communication, quantum teleportation, and quantum information processing, to name a few [3, 4, 5]. However, checking if a given unknown state is entangled can be highly non-trivial. The first issue is theoretical: even if one completely determines an unknown state via full state tomography (FST), checking whether the resulting known state is entangled can be hard. The second issue is practical: real-world laboratory conditions introduce imperfections and noise, which make it difficult to carry out FST or to directly test whether an unknown state is entangled or separable.

There is a vast literature dedicated to FST (see [6, 7, 8, 9, 10, 11, 12] and references therein and also see [13, 14, 15, 16, 17] for machine learning based approaches). Using entangled measurements, one can carry out FST with almost optimal copy complexity [9, 18]. In practice, entangled measurements are harder to carry out and one does single copy measurements. From data generated by single copy measurements, one can recover the state being measured using a variety of techniques such as linear inversion, maximum likelihood estimation, and maximum a posteriori estimation [19, 20]. From the reconstructed state it is possible to ascertain whether the state is entangled or separable using well-known criteria (some are outlined in Sec. II-B). However, this FST method becomes impractical as the number of qubits in the quantum system increases, owing to computational challenges and the exponential scaling of the number of measurements required. If one is interested in testing for entanglement, it may not be necessary in practice to carry out FST. Furthermore, the sample complexity of FST does not provide an obvious measurement/sample complexity for entanglement detection.

Entanglement can be assessed by measuring entanglement witnesses [21, 22, 23, 24]. These observables indicate the presence of some entangled states. Although no single witness can detect all entangled states, it is important to note that each witness measurement contributes information about the state. If entanglement is not detected by any of the witnesses, the information given by the witness can eventually facilitate FST. This FST can then be used to check for entanglement using standard tests. This insight has been effectively explored in [2], which constructs a set of measurements that serve simultaneously as entanglement witnesses, and also enable FST. For bipartite qubit systems, [2] proposes a measurement scheme that requires six witness operator measurements. Rather than merely determining the expectation value of the witness operator, one can measure the eigenbasis $\mathcal{E}$ of a single-parameter family of witnesses. Based on the frequencies of these witness measurement outcomes, the authors formulate a criterion for separability $S_{\mathcal{E}}$ , that yields non-negative values for all separable states and negative values for some entangled states. For entangled states that cannot be detected by this witness family, a tomographic reconstruction of the state can be performed (see Sections II-A and II-B for further details).

Given an eigenbasis $\mathcal{E}$ , achieving high-precision estimation of $S_{\mathcal{E}}$ is pivotal but requires measurements of numerous copies of the state, imposing a significant resource constraint. This challenge is further compounded in scenarios involving multiple (say $K>1$ ) states, among which $m<K$ states may be entangled. We see that performing FST for all $K$ states may be unnecessary for entanglement detection. In such instances where resource and time efficiency is paramount, the necessity for a large number of measurements for accurate estimation of parameters can be circumvented by identifying certain ‘winning’ trends dictated by sample data estimates and choosing when and how measurements need to be made. This fits neatly into the well-studied Multi-Armed Bandits (MAB) framework in classical machine learning.

The MAB setting tackles sequential decision-making problems faced with a finite set of options (arms), with each arm yielding stochastic rewards with unknown average rewards. Arm selections unfold iteratively in rounds, with a learner choosing arms based on a predefined policy. Following each selection, the learner receives a reward corresponding to the chosen arm, influencing subsequent decisions and possible policy adjustments. There are two main objectives of the MAB framework. The first balances exploration (finding high-reward arms) and exploitation (selecting the arm with the highest observed reward) to maximize cumulative rewards [25]. The second objective involves pure exploration with the goal of identifying the arm with the highest expected reward, i.e., Best Arm Identification (BAI) [26]. A variant of BAI called the $(m,K)$ -Good Arm Identification (GAI) problem ( $m$ unknown) has a goal of identifying $m$ ‘good’ arms (out of $K$ ) whose expected rewards lie above a specified threshold $\zeta$ [27]. Equivalently, $(m,K)$ -Bad Arm Identification aims to identify $m$ ‘bad’ arms (out of $K$ ) whose expected rewards lie below a specified threshold. Two orthogonal parameters influence the performance of BAI policies: sample complexity and the probability of error in identifying the best arm. More details on MAB and BAI policies are in Section II-C.

The overarching goal of this paper is to utilize stochastic MAB policies to address the problem of entanglement detection, and to characterise the sample complexity for such an approach. The organisation and key contributions in this paper are as summarised below:

•

We identify a well-motivated class of parameterized two-qubit states $\mathcal{F},$ and a corresponding measurement $\mathcal{E}$ such that $S_{\mathcal{E}}\geq 0$ for all separable states in $\mathcal{F}$ and $S_{\mathcal{E}}<0$ for all entangled states. This is detailed in Sections III-A, III-B and III-C.
•

In Section III, we highlight the key contribution of our paper, recognizing a structural connection between the separability criterion $S_{\mathcal{E}}$ outlined in [2] and the Best Arm Identification (BAI) problem of stochastic Multi-Armed Bandits (MAB). Specifically, the $(m,K)$ -Bad Arm Identification problem corresponds to the $(m,K)$ -quantum Multi Armed Bandit problem ( $m$ potentially unknown) with the goal of identifying $m$ ‘bad’ arms and $m$ entangled states derived from $\mathcal{F}$ , respectively.
•

Another significant contribution of our paper lies in achieving conclusive entanglement detection without the explicit need for FST for commonly seen noisy two-qubit states which we find to be in $\mathcal{F}$ . In Section IV, we discuss two distinct MAB policies for entanglement detection based on Successive Elimination and the Hybrid algorithm for the Dilemma of Confidence (see Section II-C). With well-defined confidence intervals, we demonstrate the correctness and characterise the sample complexity of these policies.
•

In Section V-A, we present numerical results on the performance of the MAB policies for depolarised Bell states.
•

In Section V-B, we demonstrate the efficiency of the MAB policies and the WBMs in identifying general two-qubit entangled states and present numerical examples of pure and mixed two-qubit entangled states for which the single-parameter family of witnesses fails to provide conclusive results, thus necessitating FST.

II Preliminaries

Let $\mathcal{H}$ be a finite dimensional Hilbert space with dimension $d$ . A pure quantum state is represented by a unit norm vector $\ket{\psi}\in\mathcal{H}$ . Let $\mathcal{L}(\mathcal{H})$ be the space of linear operators on $\mathcal{H}$ . The Frobenius inner product of any $A,B\in\mathcal{L}(\mathcal{H})$ is $\langle A,B\rangle\coloneqq\Tr(A^{{\dagger}}B)$ , where ${\dagger}$ represents the conjugate transpose. A Hermitian operator satisfies $H=H^{{\dagger}}$ . A density operator $\rho\in\mathcal{L}(\mathcal{H})$ is Hermitian, positive semi-definite, $\rho\geq 0$ , and has unit trace, $\Tr(\rho)=1$ ; it can represent both pure and mixed states. A positive operator-valued measure (POVM) is a collection of positive operators $\{E_{i}\geq 0\}$ that sum to the identity, $\sum_{i}E_{i}=\mathbf{1}$ . A POVM represents a measurement where $E_{i}$ corresponds to measurement outcome $i$ , but sometimes we compress this and just say $E_{i}$ is a measurement outcome.

Let $\mathcal{H}_{a}$ and $\mathcal{H}_{b}$ be finite-dimensional Hilbert spaces with dimensions $d_{a}$ and $d_{b}$ , respectively, and $\mathcal{H}_{ab}\coloneqq\mathcal{H}_{a}\otimes\mathcal{H}_{b}$ , where $\otimes$ represents tensor product, be a bipartite Hilbert space with dimension $d=d_{a}d_{b}$ . A density operator $\rho_{ab}\in\mathcal{L}(\mathcal{H}_{ab})$ is called separable if it can be written as a convex combination of product states, that is,

\rho_{ab}=\sum_{i}p_{i}\ket{\phi^{i}_{a},\chi^{i}_{b}}\bra{\phi^{i}_{a},\chi^{i}_{b}},

(1)

where $p_{i}\geq 0$ such that $\sum_{i}p_{i}=1$ and $\ket{\phi^{i}_{a},\chi^{i}_{b}}\coloneqq\ket{\phi}^{i}_{a}\otimes\ket{\chi}^{i}_{b}$ is a product of two pure states. We denote the set of all separable density operators by $S_{ab}$ . Conversely, $\rho_{ab}$ is entangled if it cannot be written in the form (1). We discuss some preliminaries on entanglement witnesses and witness-based measurements in Section II-A, the various separability criteria for entanglement detection in Section II-B and the framework and background on stochastic multi-armed bandit problems in Section II-C.

II-A Entanglement Witnesses and Witness Operator Measurements

Entanglement can be detected by measuring entanglement witnesses, which are defined as follows:

Definition 1 (Entanglement Witness).

An entanglement witness, denoted as $W\in\mathcal{L}(\mathcal{H}_{ab})$ , is a Hermitian operator that detects some entangled state $\rho_{ent}\in\mathcal{L}(\mathcal{H}_{ab})$ such that,

	$\displaystyle\langle\rho_{ent},W\rangle=\Tr(\rho_{ent}W)<0,$		(2)
	$\displaystyle\langle\rho,W\rangle=\Tr(\rho W)\geq 0,\ \forall\rho\in S_{ab}.$		(3)

Conceptually, a witness $W$ defines a hyperplane that delineates a set of entangled states it can detect $\left(D_{W}=\{\rho\ \text{s.t.}\ \Tr(\rho W)<0\}\right)$ from all other states. When comparing two arbitrary witnesses $W_{1}$ and $W_{2}$ , if $D_{W_{1}}$ is contained within $D_{W_{2}}$ , then $W_{2}$ is considered finer than $W_{1}$ . Further insights into this topology are detailed in [28, Lemma 1]. A witness is said to be optimal when no other witness is finer, suggesting that it touches the boundary of the convex set of separable states [29].

To improve the efficacy of identifying entangled states, [2] proposes a method to construct a set of measurements called Witness Operator Measurements (WOM), which we briefly discuss here. Let us consider the rank-one projector onto a pure entangled state denoted by $\rho(\alpha)=\ket{\psi}\bra{\psi}$ , where $\ket{\psi}=\cos{\alpha}\ket{00}+\sin{\alpha}\ket{11}$ . Here, the Schmidt coefficients $\cos{\alpha}$ and $\sin{\alpha}$ are arranged in non-increasing order as $1>\cos{\alpha}^{2}\geq\sin{\alpha}^{2}>0$ . Consequently, $\alpha\in[0,\pi/4]$ is chosen to adhere to this order.

In this paper, we consider the specific form of the witnesses from [2], namely, $W=\rho_{w}(\alpha)=\cos{\alpha}^{2}\mathbf{1}-\rho(\alpha)^{\top_{2}}$ . The general construction is as follows. Consider a rank-one POVM $\sum_{i}w_{i}\rho_{i}=\mathbf{1}$ with outcomes $w_{i}\rho_{i}$ such that $w_{i}>0$ and the $\rho_{i}$ are projectors onto pure states. We can construct a WOM with outcomes $w_{i}\rho_{iw}$ , where $\rho_{iw}=\lambda_{\text{max}}\mathbf{1}-\rho_{i}^{\top_{2}}$ , $\top_{2}$ signifies a transpose operation on the second subsystem and $\lambda_{\text{max}}$ is the largest eigenvalue across all the $\rho_{i}^{\top_{2}}$ .

II-B Separability criteria for entanglement detection

Using FST techniques, briefly outlined earlier, one can do a tomographic reconstruction of the state and subsequently determine its entanglement status using well-known separability criteria. For bipartite qubit systems, the Peres-Horodecki criterion [30, 31] establishes that a density operator $\rho_{ab}$ is separable if and only if the eigenvalues of its partial transpose $\rho_{ab}^{\top_{2}}$ are non-negative. This criterion remains necessary and sufficient when $d_{a}=2$ and $d_{b}=3$ , but in higher dimensions it is only necessary: there exist entangled states whose partial transpose is non-negative. Other criteria include the range criterion [32], the matrix realignment criterion [33], the covariance matrix (CM) criterion [34], and additional methods discussed in [35, 36].
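As an illustration of the Peres-Horodecki test for two qubits, the following is a minimal Python sketch, assuming NumPy and a density matrix written in the computational basis $\ket{00},\ket{01},\ket{10},\ket{11}$ . The helper names `partial_transpose_b` and `is_entangled_ppt` are ours and do not come from the cited references.

```python
import numpy as np

def partial_transpose_b(rho):
    """Transpose the second qubit of a 4x4 two-qubit density matrix."""
    r = rho.reshape(2, 2, 2, 2)                    # indices (a, b, a', b')
    return r.transpose(0, 3, 2, 1).reshape(4, 4)   # swap b <-> b'

def is_entangled_ppt(rho, tol=1e-12):
    """Two-qubit Peres-Horodecki test: entangled iff the partial
    transpose has a negative eigenvalue (up to a numerical tolerance)."""
    return np.linalg.eigvalsh(partial_transpose_b(rho)).min() < -tol

# Illustrative inputs: a Bell state, a product state, the maximally mixed state.
phi_plus = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
bell = np.outer(phi_plus, phi_plus)
product = np.diag([1.0, 0.0, 0.0, 0.0])            # |00><00|
mixed = np.eye(4) / 4

results = [is_entangled_ppt(s) for s in (bell, product, mixed)]
```

Here `results` flags only the Bell state as entangled. The test requires the full density matrix, which is exactly what FST would have to supply.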

Another criterion for separability is obtained from the Witness Operator Measurements (WOMs) described in Section II-A, which are highly efficient for entanglement detection. We review this criterion from [2] next. Specifically, let us consider two-qubit witnesses of the form:

	$\displaystyle\rho_{w}(\alpha)$	$\displaystyle=\cos{\alpha}^{2}\mathbf{1}-\left(\ket{\psi}\bra{\psi}\right)^{\top_{2}}$
		$\displaystyle=\cos{\alpha}^{2}\mathbf{1}-\left(\frac{1+\cos{2\alpha}}{2}\ket{00}\bra{00}+\frac{1-\cos{2\alpha}}{2}\ket{11}\bra{11}+\frac{\sin{2\alpha}}{2}\left\{\ket{\Psi^{+}}\bra{\Psi^{+}}-\ket{\Psi^{-}}\bra{\Psi^{-}}\right\}\right),$		(4)

where $\ket{\psi}=\cos{\alpha}\ket{00}+\sin{\alpha}\ket{11}$ with $\alpha\in[0,\pi/4]$ and $\ket{\Psi^{\pm}}=\left(\ket{01}\pm\ket{10}\right)/\sqrt{2}$ . We denote the projectors onto the set of eigenstates of $\rho(\alpha)^{\top_{2}}=\left(\ket{\psi}\bra{\psi}\right)^{\top_{2}}$ by $\mathcal{E}=\{\ket{00}\bra{00},\ket{11}\bra{11},\ket{\Psi^{+}}\bra{\Psi^{+}},\ket{\Psi^{-}}\bra{\Psi^{-}}\}$ . Each operator $E_{i}\in\mathcal{E}$ satisfies $E_{i}=E_{i}^{{\dagger}}$ and $E_{i}\geq 0$ , and together they satisfy $\sum_{i}E_{i}=\mathbf{1}$ , so $\mathcal{E}$ forms a Positive Operator-Valued Measure (POVM). Throughout the paper, we refer to this POVM as a Witness Basis Measurement (WBM).
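The decomposition in (4) and the POVM property of $\mathcal{E}$ can be checked numerically. The sketch below assumes NumPy; the function names (`witness`, `witness_from_wbm`, and the small helpers) are illustrative choices of ours. It builds $\rho_{w}(\alpha)$ once directly from the partial transpose and once from the four WBM projectors, and compares the two.

```python
import numpy as np

def ket(bits):
    """Computational basis vector |b1 b2> of two qubits."""
    v = np.zeros(4)
    v[int(bits, 2)] = 1.0
    return v

def proj(v):
    return np.outer(v, v.conj())

def partial_transpose_b(rho):
    """Transpose on the second qubit."""
    return rho.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)

psi_p = (ket("01") + ket("10")) / np.sqrt(2)
psi_m = (ket("01") - ket("10")) / np.sqrt(2)
WBM = [proj(ket("00")), proj(ket("11")), proj(psi_p), proj(psi_m)]

def witness(alpha):
    """First line of (4): cos^2(alpha) 1 - (|psi><psi|)^{T_2}."""
    psi = np.cos(alpha) * ket("00") + np.sin(alpha) * ket("11")
    return np.cos(alpha) ** 2 * np.eye(4) - partial_transpose_b(proj(psi))

def witness_from_wbm(alpha):
    """Second line of (4): the same operator expanded in the WBM projectors."""
    c2, s2 = np.cos(2 * alpha), np.sin(2 * alpha)
    coeffs = [(1 + c2) / 2, (1 - c2) / 2, s2 / 2, -s2 / 2]
    return np.cos(alpha) ** 2 * np.eye(4) - sum(c * E for c, E in zip(coeffs, WBM))

alpha = 0.3                                   # any angle in [0, pi/4]
same = np.allclose(witness(alpha), witness_from_wbm(alpha))
is_povm = np.allclose(sum(WBM), np.eye(4))
```

Because the witness is diagonal in the WBM basis for every $\alpha$ , a single measurement setting yields data for the whole single-parameter family.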

Let us consider a quantum state $\rho$ . Let $f_{i}\coloneqq\Tr{E_{i}\rho}$ be the probability of obtaining outcome $i$ when the state $\rho$ is measured using WBM $\mathcal{E}$ . The expected value of the witness $\Tr(\rho_{w}(\alpha)\rho)$ can be expressed in terms of $f_{i}$ . If this expected value is less than a certain threshold (in our case, 0), we can conclude that $\rho$ is entangled; otherwise, the test is inconclusive. When the test is inconclusive, we pick the witnesses in Table I sequentially. These subsequent witnesses are obtained by applying unitary transformations $U_{1}$ and $U_{2}$ on each of the qubits to change the eigenbasis of the underlying state, as shown in (5).

\rho_{w}(\alpha)\longrightarrow(U_{1}\otimes U_{2})^{\dagger}\rho_{w}(\alpha)(U_{1}\otimes U_{2}).

(5)

Witness	$U_{1}$	$U_{2}$
1	$\mathbf{1}$	$\mathbf{1}$
2	$\mathbf{1}$	$X$
3	$C^{\dagger}$	$C$
4	$C^{\dagger}$	$XC$
5	$C$	$C^{\dagger}$
6	$C$	$XC^{\dagger}$

Table I: Changing the eigenbasis of (4)

Expressing the eigenstates of the first witness (4) in terms of Pauli operators yields three observables: $Z\mathbf{1}+\mathbf{1}Z$ , $ZZ$ , and $XX+YY$ . Estimates for these three observables come from measuring the first witness. Similarly, the second witness listed in Table I yields estimates for $Z\mathbf{1}-\mathbf{1}Z$ , $ZZ$ , and $XX+YY$ . Thus, for a pair of witnesses, we obtain estimates for five observables by applying suitable unitary transformations, and each of the other two witness pairs provides another five expectation values. In total, we obtain estimates for 15 expectation values, providing sufficient information about the two-qubit state. This reduction of the number of witnesses from sixteen to six offers significant practical benefits. Instead of relying solely on comparing the expected value of the witness $\rho_{w}(\alpha)$ against a threshold, the authors of [2] suggest adopting a more stringent criterion:

\min_{\alpha}\Tr{\rho_{\text{sep}}\left(\cos{\alpha}^{2}\mathbf{1}-\rho_{w}(\alpha)\right)}\geq 0,\ \ \forall\rho_{\text{sep}}\in S_{ab},

(6)

which holds for all separable states and is violated by the set of entangled states that can be detected by this family of witnesses. The above optimisation leads to the following quadratic WBM criterion:

S=4f_{1}f_{2}-(f_{3}-f_{4})^{2}\geq 0,\ \ \forall\rho_{\text{sep}}\in S_{ab}.

(7)
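One way to see how (7) can arise from (6) is sketched below. The sketch assumes that the minimisation effectively runs over all real angles $\alpha$ , so that $t=\tan\alpha$ takes every real value; this assumption is ours, made to keep the argument short, and the complete derivation is given in [2]. It uses $\cos{\alpha}^{2}\mathbf{1}-\rho_{w}(\alpha)=\rho(\alpha)^{\top_{2}}$ together with the expansion in (4).

```latex
\begin{align*}
\Tr\!\left(\rho\,\rho(\alpha)^{\top_{2}}\right)
  &= \cos^{2}\!\alpha\, f_{1} + \sin^{2}\!\alpha\, f_{2}
     + \sin\alpha\cos\alpha\,(f_{3}-f_{4}) \\
  &= \cos^{2}\!\alpha\,\bigl[\, f_{1} + t\,(f_{3}-f_{4}) + t^{2} f_{2} \,\bigr],
     \qquad t=\tan\alpha .
\end{align*}
% The bracket is a quadratic in t with f_1, f_2 >= 0. It is non-negative
% for every real t exactly when its discriminant is non-positive:
\[
(f_{3}-f_{4})^{2} - 4 f_{1} f_{2} \le 0
\quad\Longleftrightarrow\quad
S = 4 f_{1} f_{2} - (f_{3}-f_{4})^{2} \ge 0 .
\]
```

Under this reading, $S<0$ means that at least one member of the witness family has a negative expectation value, which certifies entanglement.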

In essence, the process of measuring the linear entanglement witnesses $\rho_{w}(\alpha)$ corresponds to measuring the projectors onto the eigenstate basis. It is important to note that the value of $S$ (7) depends on the underlying WBM. Thus, for a WBM $\mathcal{E}$ and state $\rho$ , we denote (7) as $S_{\mathcal{E}}(\rho)$ .
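To make the dependence of $S_{\mathcal{E}}(\rho)$ on the outcome probabilities concrete, here is a minimal Python sketch, assuming NumPy and the WBM $\mathcal{E}$ of (4). The function names are ours. It evaluates $f_{i}=\Tr(E_{i}\rho)$ exactly (no sampling) and then applies (7).

```python
import numpy as np

def wbm_projectors():
    """Projectors onto |00>, |11>, |Psi+>, |Psi-> (the WBM associated with (4))."""
    e = np.eye(4)
    psi_p = (e[1] + e[2]) / np.sqrt(2)
    psi_m = (e[1] - e[2]) / np.sqrt(2)
    return [np.outer(v, v) for v in (e[0], e[3], psi_p, psi_m)]

def outcome_probabilities(rho, wbm):
    """f_i = Tr(E_i rho) for each WBM element E_i."""
    return np.array([np.trace(E @ rho).real for E in wbm])

def wbm_criterion(f):
    """S = 4 f1 f2 - (f3 - f4)^2, as in (7)."""
    f1, f2, f3, f4 = f
    return 4 * f1 * f2 - (f3 - f4) ** 2

wbm = wbm_projectors()
singlet = wbm[3]            # |Psi-><Psi-|, maximally entangled
mixed = np.eye(4) / 4       # maximally mixed, separable

S_singlet = wbm_criterion(outcome_probabilities(singlet, wbm))  # negative
S_mixed = wbm_criterion(outcome_probabilities(mixed, wbm))      # non-negative
```

In an experiment the $f_{i}$ are not known and must be estimated from measurement counts, which is where the sample complexity question studied in this paper enters.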

II-C Stochastic Multi-Armed Bandits

The stochastic Multi-Armed Bandit (MAB) framework is an archetype for many sequential decision-making problems. Within this framework, a bandit instance (problem instance) encompasses $K$ arms (or actions) situated in an environment where stochastic rewards are yielded upon the selection of an arm (termed pulling) or the execution of an action. We note that each arm $i\in[K]=\{1,2,\ldots,K\}$ is described by a probability distribution $\nu_{i}$ over $\mathbb{R}$ , with known support and an unknown expectation $\mu_{i}$ . We denote the problem instance by $\boldsymbol{\mu}=(\mu_{1},\mu_{2},\ldots\mu_{K})$ . Arm selection occurs iteratively in rounds, where during each round $t$ , a learner (or agent) selects an arm $X_{t}\in[K]$ according to a specified policy. Subsequently, the learner receives a stochastic reward $Z_{t}\sim\nu_{X_{t}}$ corresponding to the selected arm. Upon receiving the reward, the learner can terminate the process or continue by updating its policy to pursue a specific objective.

In the MAB literature, two objectives have been focal points of study. The first objective involves maximizing the cumulative reward accumulated over multiple game rounds, necessitating a trade-off between exploration (discovering arms with potentially higher rewards) and exploitation (repeatedly pulling the arm with the highest observed reward). The second objective, termed the best arm identification (BAI) problem, focuses on pure exploration, where the learner aims to identify the arm with the highest expected reward, i.e., $i^{\star}=\arg\max_{i}\mu_{i}$ (known as the best arm). A BAI policy (or algorithm) consists of a sampling rule for arm selection, a stopping rule to determine the end of exploration and a recommendation rule to output the best arm. The BAI problem has been explored in two distinct settings: fixed confidence and fixed budget. In the fixed confidence setting, the acceptance error $\delta$ is fixed, aiming to identify the best arm with a probability of at least $1-\delta$ while minimizing arm pulls. In the fixed-budget setting, the number of arm pulls (budget) $T\in\mathbb{N}$ is fixed, and the goal is to minimize the mis-identification probability of the best arm within the allotted budget. Our paper concentrates on the BAI problem and one of its variants, called good arm identification (GAI), in the fixed confidence setting. Below, we summarise some relevant findings from prior research.

II-C1 Fixed Confidence Best Arm Identification

Consider a problem instance denoted by $\boldsymbol{\mu}$ . Without loss of generality, we can enumerate the arms based on their expected rewards, such that $\mu_{1}>\mu_{2}\geq\mu_{3}\ldots\geq\mu_{K}$ . We assume the existence of a unique best arm, denoted as $i^{\star}=1$ . Here, we denote the sub-optimality gaps between the arms as $\Delta_{i}=\mu_{i^{\star}}-\mu_{i}$ . The learner’s objective is to accurately identify the best arm $i^{\star}$ while minimizing the number of samples used. Policies that achieve this task are classified as $\delta$ -PAC policies, as defined below.

Definition 2 ( $\delta$ -PAC).

Let $\hat{i}_{\tau}$ be the estimate of the best arm at stoppage $\tau$ . Then, an algorithm is said to be $\delta$ -PAC if it satisfies,

\mathbb{P}_{\boldsymbol{\mu}}(\hat{i}_{\tau}\neq i^{\star})\leq\delta,\ \mathbb{P}_{\boldsymbol{\mu}}(\tau<\infty)=1.

(8)

The primary objective is to characterize the expected stopping time $\mathbb{E}_{\boldsymbol{\mu}}[\tau]$ of the BAI policy. Various research works have attempted to provide upper and lower bounds for this objective. For instance, the successive elimination procedure has been proposed to identify the best arm in $\mathcal{O}(\Delta^{-2}\log(n\Delta^{-2}))$ samples [37]. In comparison, the Lower-Upper Confidence Bound algorithm (LUCB1) improves upon this by requiring $\mathcal{O}(\Delta^{-2}\log(\Delta^{-2}))$ samples [38]. Additionally, the exponential-gap elimination algorithm achieves a sample complexity of $\mathcal{O}(\Delta^{-2}\log(\log(\Delta^{-2})))$ , which is the best-known in the class of elimination-style policies for BAI under the fixed confidence setting [39]. These upper bounds are close to the lower bound $\Omega(\Delta^{-2})$ postulated in [40], typically within a factor of $\log$ or $\log\log$ . Notably, the seminal findings of [41], which use the principles of the Law of the Iterated Logarithm (LIL), bridge this gap by establishing the necessity and sufficiency of $\mathcal{O}(\Delta^{-2}\log(\log(\Delta^{-2})))$ samples for accurately identifying the best arm within a specified error margin of $\delta$ . Building upon this insight, [42] proposes lil’UCB, which leverages concentration bounds based on a finite version of the LIL, achieving order optimality in sample complexity akin to exponential-gap elimination.
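The elimination idea mentioned above can be sketched in a few lines of Python. This is a simplified illustration rather than the exact procedure of [37]: it assumes rewards in $[0,1]$ , uses a Hoeffding-style confidence radius with a union bound of our choosing, and the function name `successive_elimination`, the callback `pull` and the cap `max_rounds` are hypothetical.

```python
import numpy as np

def successive_elimination(pull, K, delta=0.05, max_rounds=100_000):
    """Pull every surviving arm once per round and drop arms whose upper
    confidence bound falls below the best lower confidence bound."""
    active = list(range(K))
    sums = np.zeros(K)
    for t in range(1, max_rounds + 1):
        for i in active:
            sums[i] += pull(i)
        means = sums[active] / t          # each active arm has exactly t pulls
        # Hoeffding radius for rewards in [0, 1], union bound over arms and rounds.
        radius = np.sqrt(np.log(4 * K * t**2 / delta) / (2 * t))
        best = means.max()
        active = [i for i, m in zip(active, means) if m + radius >= best - radius]
        if len(active) == 1:
            return active[0], t
    # Budget exhausted: fall back to the empirically best surviving arm.
    return active[int(np.argmax(sums[active]))], max_rounds

# Illustrative Bernoulli instance (means chosen arbitrarily for the example).
rng = np.random.default_rng(0)
mu = [0.9, 0.5, 0.3]
best_arm, rounds = successive_elimination(lambda i: float(rng.random() < mu[i]), K=3)
```

The number of rounds needed grows roughly like the inverse square of the smallest gap, which is the $\Delta^{-2}$ dependence appearing in the bounds quoted above.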

II-C2 Fixed Confidence Good Arm Identification

Consider a problem instance $\boldsymbol{\mu}$ . Alongside the acceptance error $\delta$ described in Section II-C1, we introduce a threshold $\zeta\in(0,1)$ and define the set of “good” arms as $\mathcal{G}=\{i\in[K]\text{ such that }\mu_{i}\geq\zeta\}$ . In simpler terms, the good arms are those whose means are greater than or equal to $\zeta$ . The number of good arms $|\mathcal{G}|=m$ remains unknown to the agent, leading to what we term as the $(m,K)$ -GAI problem. Notably, the $(1,K)$ -GAI reduces to the BAI problem discussed earlier. Without loss of generality, we enumerate the arms based on their expected rewards: $\mu_{1}>\mu_{2}\geq\ldots\geq\mu_{m}\geq\zeta\geq\mu_{m+1}\ldots\geq\mu_{K}$ . Importantly, the agent is unaware of this indexing. For $i\in[K]$ , $\Delta_{i}\coloneqq|\mu_{i}-\zeta|$ and $\Delta_{i,j}=\mu_{i}-\mu_{j}$ . The sample complexity is expressed in terms of $\Delta=\min(\min_{i\in[K]}\Delta_{i},\min_{j\in[K-1]}\frac{\Delta_{j,j+1}}{2})$ .

At each time instant $t$ , the learner samples an arm $X_{t}\in[K]$ and receives a corresponding (random) reward $Z_{t}\sim\nu_{X_{t}}$ . The agent either outputs an arm that it identifies as “good” or stops when no good arms remain. We denote the stopping time of the GAI policy as $\tau_{\text{stop}}$ . Specifically, the agent outputs $\hat{X}_{1},\hat{X}_{2},\ldots\hat{X}_{\hat{m}}$ as good arms at rounds $\tau_{1},\tau_{2},\ldots\tau_{\hat{m}}$ respectively, where $\hat{m}$ denotes the estimate of the number of arms identified as good ones. The learner’s objective is to accurately and rapidly identify these good arms while minimizing the number of samples used. As elaborated below, this is achieved through policies falling within the class of $(\lambda,\delta)$ -PAC policies.

Definition 3 ( $(\lambda,\delta)$ -PAC).

Let $\hat{m}$ denote the number of good arms identified by the agent. A $(\lambda,\delta)$ -PAC algorithm satisfies the following conditions:

If there are at least $\lambda$ good arms, then

\mathbb{P}_{\boldsymbol{\mu}}\left[\{\hat{m}<\lambda\}\cup\bigcup_{i\in\{\hat{X}_{1},\hat{X}_{2},\ldots\hat{X}_{\lambda}\}}\{\mu_{i}<\zeta\}\right]\leq\delta,

If there are fewer than $\lambda$ good arms,

\mathbb{P}_{\boldsymbol{\mu}}\left[\hat{m}\geq\lambda\right]\leq\delta,

An algorithm is called $\delta$ -PAC if it is $(\lambda,\delta)$ -PAC for all $\lambda\in[K]$ .

Just like in the BAI context (refer to Section II-C1), the objective in GAI is to determine the expected stopping time $\mathbb{E}_{\boldsymbol{\mu}}[\tau_{\text{stop}}]$ . The GAI algorithm consists of two key components: a sampling rule and an identification rule. The former dictates the arm selection process, while the latter guides the agent in distinguishing between good and bad arms. GAI confronts a novel challenge called the exploration-exploitation dilemma of confidence. Here, exploration involves the agent pulling arms other than the empirical best arm to identify potentially ‘good’ arms with fewer pulls. At the same time, exploitation entails pulling the empirical best arm to increase confidence in its classification as a good arm. To address this challenge, [27] proposed a hybrid algorithm for the dilemma of confidence (HDoC). In HDoC, the sampling rule is derived from the UCB algorithm for cumulative regret minimization [25], while the identification rule is based on the LUCB algorithm for BAI [38] and the APT algorithm for the thresholding bandits problem [43]. The proposed HDoC algorithm (LUCB-G) requires $\mathcal{O}\left(\Delta^{-2}\left(K\log\frac{1}{\delta}+K\log K+K\log\frac{1}{\Delta}\right)\right)$ samples. However, a drawback of the LUCB-G algorithm is its impracticality when $\Delta$ is very small. To address this issue and achieve faster convergence in the identification phase, [44] propose utilizing confidence widths derived from the finite LIL bound, akin to the approach in the lil’UCB algorithm [42]. They demonstrate a reduction in the required number of samples, achieving a sample complexity of $\mathcal{O}\left(\Delta^{-2}\left(K\log\frac{1}{\delta}+K\log K+K\log\log\frac{1}{\Delta}\right)\right)$ . The specific connections between BAI/GAI and entanglement detection are elaborated in Sections III and IV.
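The split between a sampling rule and an identification rule can be illustrated with the Python sketch below. It is written in the spirit of the hybrid approach described above but is not a faithful reproduction of the algorithms in [27] or [44]: the index and the confidence width are Hoeffding-style choices made by us for rewards in $[0,1]$ , and the names `classify_arms`, `pull` and `max_pulls` are hypothetical.

```python
import numpy as np

def classify_arms(pull, K, zeta, delta=0.05, max_pulls=200_000):
    """Label arms as good (mean >= zeta) or bad (mean < zeta)."""
    counts = np.zeros(K, dtype=int)
    sums = np.zeros(K)
    for i in range(K):                       # one initial pull per arm
        sums[i] += pull(i)
        counts[i] += 1
    t = K
    active, good, bad = set(range(K)), [], []
    while active and t < max_pulls:
        # Sampling rule: UCB-style index over the unresolved arms.
        idx = max(active, key=lambda i: sums[i] / counts[i]
                  + np.sqrt(np.log(t) / (2 * counts[i])))
        sums[idx] += pull(idx)
        counts[idx] += 1
        t += 1
        # Identification rule: compare a confidence interval with zeta.
        mean = sums[idx] / counts[idx]
        width = np.sqrt(np.log(4 * K * counts[idx] ** 2 / delta) / (2 * counts[idx]))
        if mean - width >= zeta:
            good.append(idx)
            active.remove(idx)
        elif mean + width < zeta:
            bad.append(idx)
            active.remove(idx)
    return good, bad, t

# Illustrative Bernoulli instance with two arms on each side of zeta = 0.5.
rng = np.random.default_rng(1)
mu = [0.9, 0.8, 0.2, 0.1]
good, bad, pulls = classify_arms(lambda i: float(rng.random() < mu[i]), K=4, zeta=0.5)
```

Arms whose means lie close to $\zeta$ need many pulls before their interval clears the threshold, which is why the gaps $\Delta_{i}=|\mu_{i}-\zeta|$ govern the sample complexity.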

III The Quantum MAB Framework For Entanglement Detection

In this section, we introduce the quantum Multi-Armed Bandit (MAB) framework for entanglement detection. First, we highlight the structural similarity between this framework and the stochastic MAB model. In stochastic MAB, pulling an arm $i$ corresponds to sampling from a probability distribution $p_{i}(\cdot)$ with known support and unknown mean $\mu_{i}$ . When an arm is pulled, a reward $j$ is obtained with probability (w.p.) $p_{i}(j)$ . In each round, different arms can be pulled, yielding independent and identically distributed (i.i.d.) rewards. Analogously, in the quantum setting, each arm represents an unknown quantum state $\rho$ . When $\rho$ is measured, the underlying probability distribution of the rewards is determined by the measurement $\mathcal{E}$ . Specifically, if a Witness Basis Measurement (WBM) $\mathcal{E}$ is chosen, measuring a state $\rho$ with $\mathcal{E}$ will result in a reward $j\in\{1,2,3,4\}$ with probability $\Tr(\rho E_{j})$ . Once the measurement is fixed, the rewards obtained from measuring $\rho$ are i.i.d. The subtle difference between the two models lies in the source of the rewards. In the stochastic MAB model, rewards are obtained by sampling from i.i.d. distributions, whereas in the quantum MAB model, the rewards depend on the chosen WBM.
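A single "arm pull" in the quantum setting can be simulated classically as sampling an outcome $j$ with probability $\Tr(\rho E_{j})$ . The Python sketch below assumes NumPy and the WBM of (4); the state used (a singlet mixed with white noise at weight $p=0.8$ ), the shot count and all function names are illustrative choices of ours. It also forms a plug-in estimate of $S$ from the observed frequencies via (7).

```python
import numpy as np

def wbm_projectors():
    """Projectors onto |00>, |11>, |Psi+>, |Psi->."""
    e = np.eye(4)
    psi_p = (e[1] + e[2]) / np.sqrt(2)
    psi_m = (e[1] - e[2]) / np.sqrt(2)
    return [np.outer(v, v) for v in (e[0], e[3], psi_p, psi_m)]

def pull(rho, wbm, rng, shots):
    """Measure `shots` copies of rho; return outcomes j in {1, 2, 3, 4}."""
    probs = np.array([np.trace(E @ rho).real for E in wbm])
    probs = np.clip(probs, 0.0, None)
    probs /= probs.sum()                     # guard against rounding error
    return rng.choice(4, size=shots, p=probs) + 1

def plug_in_S(outcomes):
    """Estimate S = 4 f1 f2 - (f3 - f4)^2 from empirical frequencies."""
    f = np.bincount(outcomes - 1, minlength=4) / len(outcomes)
    return 4 * f[0] * f[1] - (f[2] - f[3]) ** 2

rng = np.random.default_rng(0)
wbm = wbm_projectors()
p = 0.8
noisy_singlet = p * wbm[3] + (1 - p) * np.eye(4) / 4
outcomes = pull(noisy_singlet, wbm, rng, shots=5000)
S_hat = plug_in_S(outcomes)                  # negative for this entangled state
```

The plug-in estimate fluctuates from run to run, so a policy needs confidence intervals around it before declaring a state entangled.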

In the Best Arm Identification (BAI) setting of stochastic MAB, the primary parameters of interest are the means of the rewards. Similarly, in the quantum analogue, $S_{\mathcal{E}}(\rho)$ is the parameter of interest. As discussed in Section II-B, for a given state $\rho$ and WBM $\mathcal{E}$ , the value of $S_{\mathcal{E}}(\rho)$ determines whether the state is entangled. The specific problem we consider involves $K$ arms (states), of which $m$ are bad (entangled), and our goal is to identify these entangled states. We summarize this correspondence concisely in Table II.

Table II: Stochastic-Quantum MAB

Attributes	Stochastic MAB	Quantum MAB
Arms	Probability distributions $(p_{1},p_{2},\ldots p_{K})$	Density operators $\{\rho_{1},\rho_{2},\ldots,\rho_{K}\}$
Measurement	$-$	WBM $\mathcal{E}$
Measurement Data	$j\ \text{w.p.}\ p_{i}(j),\forall i\in[K]$	$j\ \text{w.p.}\ \Tr(E_{j}\ \rho_{i}),\ \forall j\in[4],\forall i\in[K]$
Parameters to estimate	$\boldsymbol{\mu}=(\mu_{1},\mu_{2},\ldots\mu_{K})$	$\boldsymbol{S}_{\mathcal{E}}=(S_{\mathcal{E}}(\rho_{1}),S_{\mathcal{E}}(\rho_{2}),\ldots,S_{\mathcal{E}}(\rho_{K}))$
Objective	Identify $\mathcal{G}^{C}=\{i\in[K]\ \text{such that}\ \mu_{i}\leq\zeta\}$	Identify $\mathcal{A}_{\text{ent}}=\{i\in[K]\ \text{such that}\ S_{\mathcal{E}}(\rho_{i})<0\}$

More formally, the objective of the learner is to accurately identify $\mathcal{A}_{\text{ent}}=\{i\in[K]\ \text{such that}\ S_{\mathcal{E}}(\rho_{i})<0\}$ , while minimizing the number of measurements. This aligns with the goal of the $(m,K)$ -Bad Arm Identification problem, which aims to identify all those arms $\mathcal{G}^{C}=\{i\in[K]\ \text{such that}\ \mu_{i}\leq\zeta\}$ whose means $\mu_{i}$ fall below a specified threshold $\zeta$ . In essence, solving the $(m,K)$ -Bad Arm Identification problem is tantamount to addressing the $(m,K)$ -quantum MAB problem. We define the $(m,K)$ -quantum MAB setting as follows.

Definition 4.

The $(m,K)$ -quantum Multi-Armed Bandit (MAB) setting for entanglement detection is fully characterized by the tuple $(\mathcal{A},\mathcal{E})$ . Here, $\mathcal{A}$ denotes a finite action set with $|\mathcal{A}|=K$ , consisting of $(K-m)$ two-qubit separable states and $m$ two-qubit entangled states. The term $\mathcal{E}$ corresponds to a suitable Witness Basis Measurement (WBM).

Remark 1.

The $d$ -dimensional discrete multi-armed quantum bandit model [45] is different from our formulation. The authors consider the arms to be a finite set of observables and the environment to be an unknown quantum state $\rho$ . The objective is to learn the unknown quantum state $\rho$ through an exploration-exploitation tradeoff. Given sequential oracle access to copies of $\rho$ , each round involves selecting an observable to maximize its expectation value (reward). The information from previous rounds (history) aids in refining the action choice, thereby minimizing the regret, which is the difference between the obtained and maximal rewards. The authors also exploit the inherent linear structure in measurement outcomes and map it to the linear bandit setting. Specifically, let $\{\sigma_{i}\}_{i=1}^{d^{2}}$ be a set of orthogonal Hermitian matrices. The unknown environment is $\rho=\sum_{i=1}^{d^{2}}\Tr(\rho\sigma_{i})\sigma_{i}=\sum_{i=1}^{d^{2}}\theta_{i}\sigma_{i}$ and the arm is $\mathcal{O}_{t}=\sum_{i=1}^{d^{2}}\Tr(\mathcal{O}_{t}\sigma_{i})\sigma_{i}=\sum_{i=1}^{d^{2}}A_{t,i}\sigma_{i}$ . Then, $\Tr(\rho\mathcal{O}_{t})=\boldsymbol{\theta}^{\top}\mathbf{A}_{t}$ where $\boldsymbol{\theta}=(\theta_{1},\theta_{2},\ldots\theta_{d^{2}})$ and $\mathbf{A}_{t}=(A_{t,1},A_{t,2},\ldots A_{t,d^{2}})$ . In round $t$ , pulling arm $\mathcal{O}_{t}$ provides a reward $X_{t}=\boldsymbol{\theta}^{\top}\mathbf{A}_{t}+\eta_{t}$ , where $\eta_{t}$ is 1-subgaussian.

To demonstrate the functionality of MAB policies, we identify suitable WBMs for families of parameterized two-qubit states denoted by $\mathcal{F}$ .

III-A Two-qubit Depolarized Bell States

For $p\in\mathbb{R},\frac{-1}{3}\leq p\leq 1$ , a two-qubit Depolarized Bell state $\rho(p)$ is given by,

\rho(p)=p\ket{\Upsilon}\bra{\Upsilon}+(1-p)\frac{\mathbf{1}}{4}.

(9)

Here, $\ket{\Upsilon}$ represents any one of the four Bell states $\ket{\Psi^{\pm}}=\left(\ket{01}\pm\ket{10}\right)/\sqrt{2}$, $\ket{\Phi^{\pm}}=\left(\ket{00}\pm\ket{11}\right)/\sqrt{2}$. When $\ket{\Upsilon}=\ket{\Psi^{-}}$, (9) is called a Werner state, and when $\ket{\Upsilon}=\ket{\Phi^{+}}$, it is called an Isotropic state. The Peres-Horodecki criterion guarantees that $\rho(p)$ is separable when $\frac{-1}{3}\leq p\leq\frac{1}{3}$ and entangled when $\frac{1}{3}<p\leq 1$. Table III outlines the specific choices of WBM for the combination of the maximally mixed state with each of the four Bell states. When measured with the corresponding WBM, the entangled Depolarized Bell states are conclusively detected through the value $S=(p-1)^{2}/4-p^{2}=(1-3p)(1+p)/4$, which is strictly positive for $-1/3\leq p<1/3$, vanishes at $p=1/3$, and is negative for $p>1/3$.

Table III: WBM for Depolarized Bell States

Depolarized State	Pauli Basis	WBM
$p\ket{\Phi^{+}}\bra{\Phi^{+}}+(1-p)\mathbf{1}/4$	$\big{[}\mathbf{1}+p(XX-YY+ZZ)\big{]}/4$	$\{\ket{01}\bra{01},\ket{10}\bra{10},\ket{\Phi^{+}}\bra{\Phi^{+}},\ket{\Phi^{-}}\bra{\Phi^{-}}\}$
$p\ket{\Psi^{+}}\bra{\Psi^{+}}+(1-p)\mathbf{1}/4$	$\big{[}\mathbf{1}+p(XX+YY-ZZ)\big{]}/4$	$\{\ket{00}\bra{00},\ket{11}\bra{11},\ket{\Psi^{+}}\bra{\Psi^{+}},\ket{\Psi^{-}}\bra{\Psi^{-}}\}$
$p\ket{\Psi^{-}}\bra{\Psi^{-}}+(1-p)\mathbf{1}/4$	$\big{[}\mathbf{1}+p(-XX-YY-ZZ)\big{]}/4$	$\{\ket{00}\bra{00},\ket{11}\bra{11},\ket{\Psi^{+}}\bra{\Psi^{+}},\ket{\Psi^{-}}\bra{\Psi^{-}}\}$
$p\ket{\Phi^{-}}\bra{\Phi^{-}}+(1-p)\mathbf{1}/4$	$\big{[}\mathbf{1}+p(-XX+YY+ZZ)\big{]}/4$	$\{\ket{01}\bra{01},\ket{10}\bra{10},\ket{\Phi^{+}}\bra{\Phi^{+}},\ket{\Phi^{-}}\bra{\Phi^{-}}\}$
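As a quick numerical illustration of the sign pattern of $S$ for Depolarized Bell states, the following Python sketch evaluates the closed form $S=(p-1)^{2}/4-p^{2}$ quoted above. The function names `s_depolarized` and `is_detected` are hypothetical and introduced only for this example.

```python
def s_depolarized(p: float) -> float:
    """Closed form S = (p - 1)^2 / 4 - p^2 quoted in the text for rho(p)."""
    if not -1.0 / 3.0 <= p <= 1.0:
        raise ValueError("p must lie in [-1/3, 1]")
    return (p - 1.0) ** 2 / 4.0 - p ** 2


def is_detected(p: float) -> bool:
    """A state is flagged as entangled when S is strictly negative."""
    return s_depolarized(p) < 0.0


if __name__ == "__main__":
    for p in (-1.0 / 3.0, 0.0, 0.3, 0.4, 1.0):
        print(f"p = {p:+.3f}  S = {s_depolarized(p):+.4f}  detected = {is_detected(p)}")
```

The sign change of $S$ occurs at $p=1/3$, which is the same boundary given by the Peres-Horodecki criterion.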

III-B Two-qubit Bell diagonal States

Bell diagonal states are a probabilistic mixture of the four Bell states. These states are more general than the ones in (9). Given parameters $p_{1}$, $p_{2}$, $p_{3}$ and $p_{4}$ such that $p_{i}\geq 0,\sum_{i}p_{i}=1$, the Bell diagonal state is defined as,

\rho_{\text{Bell}}=p_{1}\ket{\Phi^{+}}\bra{\Phi^{+}}+p_{2}\ket{\Psi^{+}}\bra{\Psi^{+}}+p_{3}\ket{\Psi^{-}}\bra{\Psi^{-}}+p_{4}\ket{\Phi^{-}}\bra{\Phi^{-}}.

(10)

The eigenvalues of $\rho_{\text{Bell}}^{\top_{2}}$ are calculated to be $\frac{1}{2}-p_{1}$, $\frac{1}{2}-p_{2}$, $\frac{1}{2}-p_{3}$ and $\frac{1}{2}-p_{4}$. Consequently, a Bell diagonal state is entangled if one of these probabilities exceeds $1/2$, in which case the other three probabilities sum to less than $1/2$. Conversely, a Bell diagonal state is separable if all probabilities are less than or equal to $1/2$. Expressing (10) in the Pauli basis yields,

\rho_{\text{Bell}}=\frac{1}{4}\left[\mathbf{1}+aXX+bYY+cZZ\right],

where $a=p_{1}+p_{2}-p_{3}-p_{4}$, $b=-p_{1}+p_{2}-p_{3}+p_{4}$ and $c=p_{1}-p_{2}-p_{3}+p_{4}$. When $\rho_{\text{Bell}}$ is entangled, the index $i$ for which $p_{i}>1/2$ determines the signs of $a$, $b$ and $c$; see Table IV. Notably, these signs follow the same pattern as the Pauli basis expansions of the Depolarized Bell states listed in Table III. We observe that, for suitable combinations of $a,b$ and $c\in\{+1,-1\}$, the Bell diagonal state reduces to one of the Depolarized Bell states, and such states can be detected using the same WBMs as in Table III. Specifically, the value of $S$ under the two WBMs in Table IV is equal to $(1-p_{1}-p_{4})^{2}-(p_{1}-p_{4})^{2}$ and $(1-p_{2}-p_{3})^{2}-(p_{2}-p_{3})^{2}$, respectively. For the Isotropic state, where $p_{1}-p_{4}=p$ and $1-p_{1}-p_{4}=(1-p)/2$, the first expression reduces to $(p-1)^{2}/4-p^{2}$, in agreement with Section III-A. Depending on the probabilistic mixture, one of the two WBMs will conclusively result in $S<0$.

Table IV: WBM for Bell Diagonal States

Probabilistic mixture	a	b	c	WBM
$p_{1}>0.5,\ p_{2}+p_{3}+p_{4}<0.5$	$+$	$-$	$+$	$\{\ket{01}\bra{01},\ket{10}\bra{10},\ket{\Phi^{+}}\bra{\Phi^{+}},\ket{\Phi^{-}}\bra{\Phi^{-}}\}$
$p_{2}>0.5,\ p_{1}+p_{3}+p_{4}<0.5$	$+$	$+$	$-$	$\{\ket{00}\bra{00},\ket{11}\bra{11},\ket{\Psi^{+}}\bra{\Psi^{+}},\ket{\Psi^{-}}\bra{\Psi^{-}}\}$
$p_{3}>0.5,\ p_{1}+p_{2}+p_{4}<0.5$	$-$	$-$	$-$	$\{\ket{00}\bra{00},\ket{11}\bra{11},\ket{\Psi^{+}}\bra{\Psi^{+}},\ket{\Psi^{-}}\bra{\Psi^{-}}\}$
$p_{4}>0.5,\ p_{1}+p_{2}+p_{3}<0.5$	$-$	$+$	$+$	$\{\ket{01}\bra{01},\ket{10}\bra{10},\ket{\Phi^{+}}\bra{\Phi^{+}},\ket{\Phi^{-}}\bra{\Phi^{-}}\}$
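The relations between the mixture weights, the Pauli coefficients and the partial-transpose eigenvalues can be checked with a short Python sketch. It uses only the formulas for $a$, $b$, $c$ and the eigenvalues $\frac{1}{2}-p_{i}$ stated above; the function names are hypothetical.

```python
from typing import Sequence, Tuple


def pauli_coefficients(p: Sequence[float]) -> Tuple[float, float, float]:
    """Return (a, b, c) for weights p = (p1, p2, p3, p4) on (Phi+, Psi+, Psi-, Phi-)."""
    p1, p2, p3, p4 = p
    a = p1 + p2 - p3 - p4
    b = -p1 + p2 - p3 + p4
    c = p1 - p2 - p3 + p4
    return a, b, c


def partial_transpose_eigenvalues(p: Sequence[float]) -> Tuple[float, ...]:
    """Eigenvalues 1/2 - p_i of the partially transposed Bell diagonal state."""
    return tuple(0.5 - pi for pi in p)


def is_entangled(p: Sequence[float]) -> bool:
    """PPT test: entangled exactly when some eigenvalue 1/2 - p_i is negative."""
    return min(partial_transpose_eigenvalues(p)) < 0.0


def sign_pattern(values: Sequence[float]) -> str:
    """Render the signs of (a, b, c) as a string such as '+-+'."""
    return "".join("+" if v > 0 else "-" if v < 0 else "0" for v in values)


if __name__ == "__main__":
    weights = (0.7, 0.1, 0.1, 0.1)  # illustrative mixture dominated by Phi+
    print(sign_pattern(pauli_coefficients(weights)), is_entangled(weights))
```

For a mixture dominated by $\ket{\Phi^{+}}$, the sign pattern matches the $(XX-YY+ZZ)$ expansion of the Isotropic state in Table III.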

III-C Two-qubit Amplitude Damping on Depolarized Bell States

A qubit amplitude damping channel is a source of noise in superconducting circuit-based quantum computing and thus serves as a realistic channel model for simulating lossy processes in these systems. Mathematically, it can be obtained from an isometry $J$,

J:\mathcal{H}_{a}\mapsto\mathcal{H}_{b}\otimes\mathcal{H}_{c};\ \ J^{\dagger}J=\mathbf{1}_{a}

(11)

where $\mathcal{H}_{a}$ denotes the Hilbert space for the channel’s input, and $\mathcal{H}_{b}$ and $\mathcal{H}_{c}$ represent the Hilbert spaces for the direct and complementary channel outputs, respectively. An isometry of the form,

\begin{aligned}J_{1}\ket{0}_{a}&=\ket{0}_{b}\ket{1}_{c},\\ J_{1}\ket{1}_{a}&=\sqrt{1-r}\ket{1}_{b}\ket{1}_{c}+\sqrt{r}\ket{0}_{b}\ket{0}_{c},\end{aligned}

(12)

where $0\leq r\leq 1$, defines a pair of channels, $\mathcal{B}(A)=\Tr_{c}(J_{1}AJ_{1}^{\dagger})$ and $\mathcal{C}(A)=\Tr_{b}(J_{1}AJ_{1}^{\dagger})$. Here, $\mathcal{B}$ is an amplitude damping channel with damping probability $r$ for the state $\ket{1}_{a}$ to decay to the output state $\ket{0}_{b}$. The isometry can be written as $J_{1}=K_{0}\otimes\ket{0}_{c}+K_{1}\otimes\ket{1}_{c}$, where $K_{0}$ and $K_{1}$ are the (Kraus) damping operators $K_{0}=[0,\sqrt{r};0,0]$ and $K_{1}=[1,0;0,\sqrt{1-r}]$. For a single qubit in state $\rho$, the amplitude damped output is given by,

\mathcal{B}(\rho)=K_{0}\rho K_{0}^{\dagger}+K_{1}\rho K_{1}^{\dagger}.

(13)

We can extend (13) to two-qubit states with damping probabilities $r$ and $q$ for the first and second qubit, respectively. Assuming that $r=q$, we consider Depolarized Bell states (9) with amplitude damping.

Proposition 5.

For any damping probability $r>0$, a Depolarized Bell state with amplitude damping cannot be expressed as a Bell diagonal state (10).

This fact can be readily demonstrated through a straightforward calculation. Consider the Isotropic state $\rho(p)=p\ket{\Phi^{+}}\bra{\Phi^{+}}+(1-p)\frac{\mathbf{1}}{4}$ , which can be represented by the Bell diagonal state formed with probability distribution $(p_{1},p_{2},p_{3},p_{4})=\left((3p+1)/4,(1-p)/4,(1-p)/4,(1-p)/4\right)$ . In a Bell diagonal state, the diagonal elements corresponding to $\ket{00}\bra{00}$ and $\ket{11}\bra{11}$ are identical. In the case of an amplitude damped Isotropic state, we observe that,

p_{2}=p_{3}=\frac{1-r}{4}\left(1+r-p+pr\right).

However, obtaining closed-form expressions for $p_{1}$ and $p_{4}$ when $r>0$ is cumbersome. Specifically, the diagonal entries corresponding to $\ket{00}\bra{00}$ and $\ket{11}\bra{11}$ are given by $\frac{p}{2}(r^{2}+1)-\frac{p-1}{4}(r+1)^{2}$ and $\frac{p}{2}(r-1)^{2}-\frac{p-1}{4}(r-1)^{2}$, respectively. Their difference simplifies to $r$, so the two entries are equal only when $r=0$.
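The diagonal entries quoted above can be reproduced numerically. The sketch below is a minimal check, assuming the Kraus operators $K_{0}$, $K_{1}$ given earlier act identically on both qubits ($r=q$) and that the computational basis is ordered $\ket{00},\ket{01},\ket{10},\ket{11}$. All helper names are hypothetical, and only the Python standard library is used.

```python
import math
from typing import List

Matrix = List[List[float]]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product of two matrices."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    """Transpose; it equals the adjoint here because every entry is real."""
    return [list(col) for col in zip(*a)]


def isotropic_state(p: float) -> Matrix:
    """rho(p) = p |Phi+><Phi+| + (1 - p) 1/4 in the basis 00, 01, 10, 11."""
    rho = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        rho[i][i] = (1.0 - p) / 4.0
    for i in (0, 3):
        for j in (0, 3):
            rho[i][j] += p / 2.0
    return rho


def damp_two_qubits(rho: Matrix, r: float) -> Matrix:
    """Apply the amplitude damping channel with probability r to both qubits."""
    k0 = [[0.0, math.sqrt(r)], [0.0, 0.0]]
    k1 = [[1.0, 0.0], [0.0, math.sqrt(1.0 - r)]]
    out = [[0.0] * 4 for _ in range(4)]
    for ka in (k0, k1):
        for kb in (k0, k1):
            k = kron(ka, kb)
            term = matmul(matmul(k, rho), transpose(k))
            for i in range(4):
                for j in range(4):
                    out[i][j] += term[i][j]
    return out


def closed_form_00(p: float, r: float) -> float:
    """Diagonal entry for |00><00| as quoted in the text."""
    return p / 2.0 * (r ** 2 + 1.0) - (p - 1.0) / 4.0 * (r + 1.0) ** 2


def closed_form_11(p: float, r: float) -> float:
    """Diagonal entry for |11><11| as quoted in the text."""
    return p / 2.0 * (r - 1.0) ** 2 - (p - 1.0) / 4.0 * (r - 1.0) ** 2


if __name__ == "__main__":
    p, r = 0.6, 0.3  # illustrative values
    damped = damp_two_qubits(isotropic_state(p), r)
    print(damped[0][0], closed_form_00(p, r))
    print(damped[3][3], closed_form_11(p, r))
    print("difference:", damped[0][0] - damped[3][3])
```

The gap between the two diagonal entries grows linearly with $r$, which is why the damped state leaves the Bell diagonal family as soon as $r>0$.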

Proposition 6.

For every $p\in[\frac{1}{3},1]$, there exists a range of damping probabilities $\tilde{r}\subset[0,1]$ for which an amplitude damped Depolarized Bell state becomes separable.

The PPT criterion asserts that a two-qubit state is entangled if and only if its partial transpose has at least one negative eigenvalue. For Bell states that are both amplitude damped and depolarized, we evaluate the eigenvalues and observe that one of them can take either positive or negative values depending on the range of $r$. Detailed findings are presented in Table V and depicted graphically in Fig. 1(a) and Fig. 1(b). Furthermore, the WBM for amplitude damped Depolarized Bell states coincides with that of Depolarized Bell states, as outlined in Table III.

Table V: The four eigenvalues of amplitude damped Depolarized Bell states

State with $\ket{\Phi^{\pm}}$	State with $\ket{\Psi^{\pm}}$	Sign of eigenvalue
$\frac{(p+1)(1-r^{2})}{4}$	$\frac{(1-r)(1+r+p-pr)}{4}$	Always positive
$\frac{(p+1)(1-r)^{2}}{4}$	$\frac{(1-r)(1+r+p-pr)}{4}$	Always positive
$\frac{p(r-1)^{2}+(r+1)^{2}}{4}$	$\frac{r^{2}+1-p(1-r)^{2}+2\sqrt{p^{2}(1-r)^{2}+r^{2}}}{4}$	Always positive
$\frac{-r^{2}(p+1)+4pr+(1-3p)}{4}$	$\frac{r^{2}+1-p(1-r)^{2}-2\sqrt{p^{2}(1-r)^{2}+r^{2}}}{4}$	Positive and Negative

Figure 1(a): Range of $r$ for the eigenvalue corresponding to $\ket{\Phi^{\pm}}$.
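Proposition 6 can be explored numerically with the sign-changing eigenvalue listed in Table V for states built from $\ket{\Psi^{\pm}}$. The following sketch locates, by bisection, the damping probability at which that eigenvalue crosses zero. The function names, the bracketing interval $[0,0.9]$ and the tolerance are assumptions made for illustration.

```python
import math
from typing import Optional


def psi_eigenvalue(p: float, r: float) -> float:
    """Sign-changing eigenvalue from Table V for damped depolarized |Psi+-> states."""
    root = math.sqrt(p ** 2 * (1.0 - r) ** 2 + r ** 2)
    return (r ** 2 + 1.0 - p * (1.0 - r) ** 2 - 2.0 * root) / 4.0


def separability_threshold(p: float, r_max: float = 0.9,
                           tol: float = 1e-10) -> Optional[float]:
    """Bisect for the r at which the eigenvalue changes sign on [0, r_max].

    Returns None when the interval does not bracket a sign change.
    """
    lo, hi = 0.0, r_max
    if psi_eigenvalue(p, lo) >= 0.0 or psi_eigenvalue(p, hi) <= 0.0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi_eigenvalue(p, mid) < 0.0:
            lo = mid  # still entangled: move the lower end up
        else:
            hi = mid  # already PPT: move the upper end down
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for p in (0.4, 0.5, 0.6, 1.0):
        print(p, separability_threshold(p))
```

When no crossing is bracketed, as for $p=1$ on this interval, the state stays entangled for all damping probabilities up to `r_max`.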

IV Stochastic MAB policies for Entanglement Detection

In this section, we discuss stochastic MAB-based algorithms for entanglement detection in parameterized states within $\mathcal{F}$, as outlined in Section III. We will use stochastic MAB terminology in alignment with its quantum counterparts, as shown in Table II. We consider a set of $K$ unknown arms, denoted by $\mathcal{A}=\{\rho_{1},\rho_{2},\ldots,\rho_{K}\}\subset\mathcal{F}$. To perform measurements on the arms, the learner requires knowledge of the underlying WBM. Therefore, we assume familiarity with the specific forms of the arms in $\mathcal{A}$, as they are detectable under the WBMs $\mathcal{E}_{1}=\{\ket{00}\bra{00},\ket{11}\bra{11},\ket{\Psi^{+}}\bra{\Psi^{+}},\ket{\Psi^{-}}\bra{\Psi^{-}}\}$ or $\mathcal{E}_{2}=\{\ket{01}\bra{01},\ket{10}\bra{10},\ket{\Phi^{+}}\bra{\Phi^{+}},\ket{\Phi^{-}}\bra{\Phi^{-}}\}$. Here, $\mathcal{E}_{1}$ and $\mathcal{E}_{2}$ correspond to the WBMs of the first two witnesses in Table I, respectively. For example, consider $\rho_{i}=p_{i}\ket{\Phi^{+}}\bra{\Phi^{+}}+(1-p_{i})\frac{\mathbf{1}}{4}$, for all $i\in[K]$, where $p_{i}$ is unknown. These are isotropic states, which are probabilistic mixtures of the maximally mixed state and the Bell state $\ket{\Phi^{+}}$, and can be detected using WBM $\mathcal{E}_{2}$. With this assumption, we describe the template for the MAB problem as follows: In each round $t\in\mathbb{N}$,

• The learner selects an arm $i\in[K]$.
• The learner performs a measurement $\mathcal{E}$ and obtains outcome $j$ with probability $\Tr(\rho_{i}E_{j})$, where $j\in\{1,2,3,4\}$.
• The learner updates the values of $\boldsymbol{\hat{S}}_{\mathcal{E}}$ and either identifies the entangled arm(s) or continues.
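One round of this template can be sketched in Python for an isotropic arm measured with $\mathcal{E}_{2}$. The outcome probabilities follow from $\Tr(\rho_{i}E_{j})$. The plug-in form of $\hat{S}$ used below, $4f_{1}f_{2}-(f_{3}-f_{4})^{2}$ with observed frequencies $f_{j}$, is an assumption chosen because it reproduces $S=(p-1)^{2}/4-p^{2}$ for these states; the definition of $S_{\mathcal{E}}$ given earlier in the paper should be used in practice. All function names are hypothetical.

```python
import random
from typing import List, Sequence


def outcome_probabilities_isotropic(p: float) -> List[float]:
    """Born probabilities of an isotropic state under E2 = {|01>, |10>, |Phi+>, |Phi->}."""
    q = (1.0 - p) / 4.0  # weight contributed by the maximally mixed part
    return [q, q, p + q, q]


def sample_counts(probs: Sequence[float], shots: int, rng: random.Random) -> List[int]:
    """Simulate `shots` independent measurements and count the four outcomes."""
    counts = [0] * len(probs)
    for outcome in rng.choices(range(len(probs)), weights=probs, k=shots):
        counts[outcome] += 1
    return counts


def plug_in_s(counts: Sequence[int]) -> float:
    """ASSUMED plug-in estimate S = 4 f1 f2 - (f3 - f4)^2 from observed frequencies."""
    total = sum(counts)
    f1, f2, f3, f4 = (c / total for c in counts)
    return 4.0 * f1 * f2 - (f3 - f4) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    p = 0.6  # illustrative, unknown to the learner in the actual problem
    counts = sample_counts(outcome_probabilities_isotropic(p), 20_000, rng)
    print("counts:", counts, "estimate of S:", plug_in_s(counts))
```

With exact frequencies the estimate equals $(p-1)^{2}/4-p^{2}$; with finite samples it fluctuates around that value, which is what the confidence bounds below control.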

For a given WBM $\mathcal{E}$, the values of $S_{\mathcal{E}}$ are bounded in $[-1,1]$. Since a random variable supported on $[-1,1]$ is 1-subgaussian, we can use concentration inequalities applicable to 1-subgaussian random variables. We apply the law of the iterated logarithm (LIL) [42] for a finite sum of 1-subgaussian random variables:

Lemma 7.

Let $X_{1},X_{2},\ldots,X_{t}$ be i.i.d. zero-mean sub-gaussian random variables with scale parameter $\sigma=1$. For any $\varepsilon\in(0,1)$ and $\delta\in\left(0,\frac{\log(1+\varepsilon)}{e}\right)$, one has with probability at least $1-c_{\varepsilon}\delta^{(1+\varepsilon)}$, for all $t\geq 1$,

\frac{1}{t}\sum_{s=1}^{t}X_{s}\leq U(t,\delta),

(14)

where $U(t,\delta)=(1+\sqrt{\varepsilon})\sqrt{\frac{2(1+\varepsilon)}{t}\log\left(\frac{\log\left((1+\varepsilon)t\right)}{\delta}\right)}$ is the confidence width and $c_{\varepsilon}=\frac{2+\varepsilon}{\varepsilon}\left(\frac{1}{\log(1+\varepsilon)}\right)^{1+\varepsilon}$.

Proof.

See [42, Lemma 1]. ∎
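For concreteness, the confidence width $U(t,\delta)$ and the constant $c_{\varepsilon}$ of Lemma 7 can be evaluated as in the sketch below. The default $\varepsilon=0.01$ is an assumption made for illustration, and the function names are hypothetical.

```python
import math


def c_eps(eps: float) -> float:
    """Constant c_eps = (2 + eps) / eps * (1 / log(1 + eps))^(1 + eps) from Lemma 7."""
    return (2.0 + eps) / eps * (1.0 / math.log(1.0 + eps)) ** (1.0 + eps)


def lil_width(t: int, delta: float, eps: float = 0.01) -> float:
    """Confidence width U(t, delta) of Lemma 7 for 1-subgaussian samples."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    if not 0.0 < delta < math.log(1.0 + eps) / math.e:
        raise ValueError("delta must lie in (0, log(1 + eps) / e)")
    if t < 1:
        raise ValueError("t must be at least 1")
    # The admissible range of delta keeps the argument of the outer log above e.
    inner = math.log(math.log((1.0 + eps) * t) / delta)
    return (1.0 + math.sqrt(eps)) * math.sqrt(2.0 * (1.0 + eps) / t * inner)


if __name__ == "__main__":
    for t in (10, 100, 1000, 10_000):
        print(t, lil_width(t, delta=1e-3))
```

The width shrinks roughly like $\sqrt{\log\log t/t}$, so an estimate separated from the threshold by a gap $\Delta$ is resolved once the width falls below $\Delta$.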

In the subsequent sections, we discuss two MAB policies: successive elimination for scenarios where there is a promise of one entangled arm among $K$ arms, and the HDoC policy for cases where there are $m$ entangled arms among $K$ arms, with $m$ being unknown.

IV-A Modified Successive Elimination Algorithm

We consider the $(1,K)$-quantum MAB problem and characterise the expected stopping time for a modified version of the Successive Elimination algorithm [37], outlined as Algorithm 1. We are presented with $K$ arms such that $S_{\mathcal{E}}(\rho_{1})\geq S_{\mathcal{E}}(\rho_{2})\geq S_{\mathcal{E}}(\rho_{3})\ldots>S_{\mathcal{E}}(\rho_{K-1})>0>S_{\mathcal{E}}(\rho_{K})$. The algorithm takes the set of arms $[K]$, the threshold value $0$, and the error probability $\delta$ as input and outputs the arm $i^{\star}=\arg\min_{i\in[K]}S_{\mathcal{E}}(\rho_{i})$. Let $N_{i}(t)$ denote the number of times arm $i$ has been sampled in $t$ rounds, and let $\hat{S}_{i,N_{i}(t)}$ denote the estimate of $S_{\mathcal{E}}(\rho_{i})$ obtained from the pulls of arm $i$ up to time $t$. The algorithm maintains an active set $\Omega$ and samples every arm in it. Subsequently, the estimates and Lower Confidence Bounds (LCB) of the active arms are updated. In order to identify $i^{\star}$, the policy eliminates arms whose LCB exceeds the threshold and halts when only one arm remains in the active set.

Algorithm 1 Modified Successive Elimination Algorithm

Input: threshold $\zeta=0$, acceptance error rate $\delta$, arms $\mathcal{A}\leftarrow[K]$
Output: $\Omega$
Active set $\Omega\leftarrow[K]$, $\hat{S}_{i,N_{i}(t)}=0,\ \forall i\in\Omega$
for $t=1,2,3,\ldots$ do
    Sample every arm $i\in\Omega$
    Update confidence width $U\left(N_{i}(t),\frac{\delta}{c_{\varepsilon}K}\right)\leftarrow(1+\sqrt{\varepsilon})\sqrt{\frac{2(1+\varepsilon)}{N_{i}(t)}\log\left(\frac{c_{\varepsilon}K\log\left((1+\varepsilon)N_{i}(t)\right)}{\delta}\right)}$
    Update $\hat{S}_{i,N_{i}(t)}$, $\text{LCB}_{i}(t)\leftarrow\hat{S}_{i,N_{i}(t)}-U\left(N_{i}(t),\frac{\delta}{c_{\varepsilon}K}\right)$
    if $\text{LCB}_{i}(t)>0,i\in\Omega$ then
        $\Omega\leftarrow\Omega-\{i\}$
    end if
    if $|\Omega|=1$ then
        Return $\Omega$
    end if
end for
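Algorithm 1 can be prototyped in a few lines of Python. The sketch below is not the authors' implementation: it assumes that each pull returns a bounded reward in $\{-1,+1\}$ whose mean plays the role of $S_{\mathcal{E}}(\rho_{i})$, uses the running sample mean as $\hat{S}$, fixes $\varepsilon=0.01$, and adds a cap `max_rounds` for safety. The arm means in the usage example are hypothetical.

```python
import math
import random
from typing import Callable, List, Optional


def c_eps(eps: float) -> float:
    """Constant c_eps from Lemma 7."""
    return (2.0 + eps) / eps * (1.0 / math.log(1.0 + eps)) ** (1.0 + eps)


def lil_width(n: int, delta: float, eps: float) -> float:
    """Confidence width U(n, delta) from Lemma 7."""
    inner = math.log(math.log((1.0 + eps) * n) / delta)
    return (1.0 + math.sqrt(eps)) * math.sqrt(2.0 * (1.0 + eps) / n * inner)


def make_two_point_arms(means: List[float], rng: random.Random) -> Callable[[int], float]:
    """Hypothetical reward model: arm i returns +1 or -1 with mean means[i]."""
    def pull(i: int) -> float:
        return 1.0 if rng.random() < (1.0 + means[i]) / 2.0 else -1.0
    return pull


def successive_elimination(pull: Callable[[int], float], num_arms: int, delta: float,
                           eps: float = 0.01, max_rounds: int = 100_000) -> Optional[int]:
    """Drop arms whose LCB exceeds the threshold 0; stop when one arm remains."""
    delta_eff = delta / (c_eps(eps) * num_arms)  # delta / (c_eps K), as in Algorithm 1
    active = set(range(num_arms))
    sums = [0.0] * num_arms
    counts = [0] * num_arms
    for _ in range(max_rounds):
        for i in sorted(active):  # sample every active arm once per round
            sums[i] += pull(i)
            counts[i] += 1
        eliminated = {i for i in active
                      if sums[i] / counts[i] - lil_width(counts[i], delta_eff, eps) > 0.0}
        active -= eliminated
        if len(active) == 1:
            return next(iter(active))
        if not active:  # every arm eliminated: the delta-probability failure event
            return None
    return None  # round cap reached without a decision


if __name__ == "__main__":
    pull = make_two_point_arms([0.6, 0.5, 0.7, -0.5], random.Random(1))
    print("entangled arm index:", successive_elimination(pull, num_arms=4, delta=0.1))
```

Arms with positive means are removed once their lower confidence bound clears zero, so the single arm with a negative mean is the one left in the active set.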

Lemma 8.

Algorithm 1 is $\delta$ -PC.

Proof.

The proof is presented in Appendix VII-A1. ∎

The correctness of Algorithm 1 and the sample complexity of identifying the entangled arm are presented below.

Theorem 9.

With probability at least $1-\delta$ , the arm $i^{\star}=K=\arg\min_{i\in[K]}S_{\mathcal{E}}(\rho_{i})$ remains in the active set $\Omega$ till termination.

Proof.

The proof is presented in Appendix VII-A2. ∎

Theorem 10.

Algorithm 1 successfully identifies the arm $i^{\star}$ with probability at least $1-\delta$ and terminates after $\sum_{i\in[K]}\mathcal{O}\left(\Delta_{i}^{-2}\log\left(\frac{K\log\Delta_{i}^{-2}}{\delta}\right)\right)$ samples, where $\Delta_{i}=|S_{\mathcal{E}}(\rho_{i})-\zeta|$ is the sub-optimality gap with respect to the threshold $\zeta$.

Proof.

The proof is presented in Appendix VII-A3. ∎

We see that the sample complexity achieved in Theorem 10 is within a $\log(K)$ factor of the optimum as proven in Theorem 1 in [42]. Thus, given a $(1,K)$ -quantum MAB framework prescribed by $(\mathcal{A},\mathcal{E})$ , we use the recipe provided in Algorithm 1 to identify the entangled arm.

IV-B Modified lil’HDoC Algorithm

The lil’HDoC algorithm, introduced in [44], is a variant of the HDoC algorithm proposed by [27]. It combines the sampling rule of the UCB algorithm for regret minimization, as detailed in [25], with an identification rule based on the confidence bound outlined in Lemma 7. In contrast to the LCB-based identification rule utilized in the HDoC algorithm [38], the LIL-based concentration in lil’HDoC yields a notable improvement in sample complexity. This improvement stems from the observation that the LIL bound $\sqrt{\frac{\log\log t}{t}}$ is asymptotically smaller than the LCB bound $\sqrt{\frac{\log t}{t}}$, consequently leading to a reduction in the required number of samples. In other words, for any $c_{1},c_{2}\in\mathbb{R}^{+}$ there exists a value $T$ such that for all $t>T$,

c_{1}\sqrt{\frac{\log t}{t}}>c_{2}\sqrt{\frac{\log\log t}{t}}.
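To see why such a $T$ exists for any pair of positive constants, the inequality can be rearranged as follows (a short worked step, valid for $t>e$ so that $\log\log t>0$):

```latex
c_{1}\sqrt{\frac{\log t}{t}} > c_{2}\sqrt{\frac{\log\log t}{t}}
\iff
\frac{\log t}{\log\log t} > \left(\frac{c_{2}}{c_{1}}\right)^{2},
\qquad t>e .
```

The map $x\mapsto x/\log x$ is increasing for $x>e$ and unbounded, so with $x=\log t$ the left-hand side eventually exceeds any fixed value of $(c_{2}/c_{1})^{2}$ and stays above it.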

Consequently, by ensuring that each arm is sampled at least $T$ times initially, lil’HDoC not only operates in the regime where its confidence bound tightens faster but also attains adequate confidence in identifying the good arms. The confidence bound of HDoC is $\alpha(t)=\sqrt{\frac{\ln(\frac{4Kt^{2}}{\delta})}{2t}}$. Through straightforward calculations, we see that the number of initial samples $T$ beyond which the confidence bound of lil’HDoC, $U\left(T,\frac{\delta}{c_{\varepsilon}K}\right)$, tightens faster than $\alpha(T)$ satisfies,

T\geq\frac{1}{4}\log(K+1)\log\left(\max\left(\frac{1}{\delta},2\right)\right)c_{\varepsilon}^{3/2}.

(15)

Thus, if each arm is initially sampled $T$ times, lil’HDoC achieves comparable identification capabilities to HDoC and possesses a sample complexity of $\mathcal{O}\left(\log(K+1)\log\left(\max\left(\frac{1}{\delta},2\right)\right)\right)$ samples on each arm. Now, let us map the lil’HDoC algorithm outlined as Algorithm 2 onto the $(m,K)$-quantum MAB problem and characterise the expected stopping time.

Consider $K$ arms such that $S_{\mathcal{E}}(\rho_{1})\geq S_{\mathcal{E}}(\rho_{2})\ldots>S_{\mathcal{E}}(\rho_{K-m})>0>S_{\mathcal{E}}(\rho_{K-m+1})\ldots>S_{\mathcal{E}}(\rho_{K})$, with $m$ being unknown. The algorithm takes the set of arms $[K]$, the threshold value $0$, and the error probability $\delta$ as input and outputs the set of arms $\Omega=\{i\in[K]\ \text{such that}\ S_{\mathcal{E}}(\rho_{i})<0\}$. Firstly, every arm is sampled a minimum of $T$ times (15). While the arm set $\mathcal{A}\neq\emptyset$, the algorithm keeps track of the active arms and employs the sampling rule and identification rule explained earlier.

Algorithm 2 lil’HDoC

Input: threshold $\zeta=0$, acceptance error rate $\delta$, arms $\mathcal{A}\leftarrow[K]$
Output: $\mathcal{A}_{\text{ent}}$
$t\leftarrow 0,\mathcal{A}_{\text{ent}}\leftarrow\emptyset$
for each arm $i\in\mathcal{A}$ do
    Pull arm $i$ for $T$ times
    $N_{i}(t)\leftarrow T$
end for
while $\mathcal{A}\neq\emptyset$ do
    Pull arm $h_{t}=\arg\max_{i\in\mathcal{A}}\ \hat{S}_{i,N_{i}(t)}+\sqrt{\frac{\log t}{2N_{i}(t)}}$
    if $\hat{S}_{h_{t},N_{h_{t}}(t)}-U\left(N_{h_{t}}(t),\frac{\delta}{c_{\varepsilon}K}\right)\geq\zeta$ then
        Remove $h_{t}$ from $\mathcal{A}$
    else if $\hat{S}_{h_{t},N_{h_{t}}(t)}+U\left(N_{h_{t}}(t),\frac{\delta}{c_{\varepsilon}K}\right)<\zeta$ then
        Add $h_{t}$ to $\mathcal{A}_{\text{ent}}$
        Remove $h_{t}$ from $\mathcal{A}$
    end if
end while
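A compact simulation of Algorithm 2 is sketched below. It is an illustration under stated assumptions rather than the authors' code: pulls return rewards in $\{-1,+1\}$ whose mean stands in for $S_{\mathcal{E}}(\rho_{i})$, the round counter $t$ and the sample means are updated after every pull (the listing leaves these updates implicit), the initial number of pulls is a free parameter `initial_pulls` instead of the value from (15), and $\varepsilon=0.01$. The arm means are hypothetical.

```python
import math
import random
from typing import Callable, List, Set


def c_eps(eps: float) -> float:
    """Constant c_eps from Lemma 7."""
    return (2.0 + eps) / eps * (1.0 / math.log(1.0 + eps)) ** (1.0 + eps)


def lil_width(n: int, delta: float, eps: float) -> float:
    """Confidence width U(n, delta) from Lemma 7."""
    inner = math.log(math.log((1.0 + eps) * n) / delta)
    return (1.0 + math.sqrt(eps)) * math.sqrt(2.0 * (1.0 + eps) / n * inner)


def make_two_point_arms(means: List[float], rng: random.Random) -> Callable[[int], float]:
    """Hypothetical reward model: arm i returns +1 or -1 with mean means[i]."""
    def pull(i: int) -> float:
        return 1.0 if rng.random() < (1.0 + means[i]) / 2.0 else -1.0
    return pull


def lil_hdoc(pull: Callable[[int], float], num_arms: int, delta: float,
             initial_pulls: int = 10, eps: float = 0.01,
             max_pulls: int = 1_000_000) -> Set[int]:
    """Return the arms declared entangled (mean below the threshold 0)."""
    delta_eff = delta / (c_eps(eps) * num_arms)
    sums = [0.0] * num_arms
    counts = [0] * num_arms
    for i in range(num_arms):  # initial exploration: T pulls per arm
        for _ in range(initial_pulls):
            sums[i] += pull(i)
            counts[i] += 1
    t = num_arms * initial_pulls
    active = set(range(num_arms))
    entangled: Set[int] = set()
    while active and t < max_pulls:
        t += 1
        # UCB-style sampling rule over the arms that are still undecided.
        h = max(sorted(active),
                key=lambda i: sums[i] / counts[i] + math.sqrt(math.log(t) / (2 * counts[i])))
        sums[h] += pull(h)
        counts[h] += 1
        mean = sums[h] / counts[h]
        width = lil_width(counts[h], delta_eff, eps)
        if mean - width >= 0.0:  # confidently above the threshold: separable
            active.discard(h)
        elif mean + width < 0.0:  # confidently below the threshold: entangled
            entangled.add(h)
            active.discard(h)
    return entangled


if __name__ == "__main__":
    pull = make_two_point_arms([0.6, 0.5, -0.5, -0.7, 0.7], random.Random(2))
    print("entangled arms:", sorted(lil_hdoc(pull, num_arms=5, delta=0.1)))
```

Because the sampling rule favours the largest index, arms with positive means tend to be resolved first and the arms with negative means are classified afterwards; the number of entangled arms is never supplied to the routine.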

To demonstrate the correctness of Algorithm 2, we first show that the algorithm is $(\lambda,\delta)$ -PAC for all $\lambda\in[K]$ and then characterise the sample complexity of identifying $m$ bad arms (entangled states).

Lemma 11.

Algorithm 2 is $\delta$ -PAC.

Proof.

The proof is presented in Appendix VII-B1. ∎

Theorem 12.

With probability at least $1-\delta$ , the algorithm identifies all the arms in $\mathcal{A}_{\text{ent}}$ .

Proof.

The proof is presented in Appendix VII-B2. ∎

With $T=1$ in (15), it can be seen from Theorem 10 that the number of samples required to identify an entangled arm $i\in[K]$ is $\mathcal{O}\left(\Delta_{i}^{-2}\log\left(\frac{K\log\Delta_{i}^{-2}}{\delta}\right)\right)$. However, in practice, $T$ is chosen to be larger than 1, and the total sample complexity is expressed in terms of $\Delta=\min_{i\in[K]}\Delta_{i}$.

Theorem 13.

With probability at least $1-\delta$ and $T$ as given in (15), the total sample complexity of Algorithm 2 is $\mathcal{O}\left(\Delta^{-2}\left(K\log\frac{1}{\delta}+K\log K+K\log\log\frac{1}{\Delta}\right)\right)+\mathcal{O}\left(K\log(K+1)\log\left(\max\left(\frac{1}{\delta},e\right)\right)\right)$.

Proof.

The first term in the sample complexity is derived in Appendix VII-A3 and the second term follows from (15). ∎

V Workflow for Entanglement Detection

In this section, we present a workflow for entanglement detection in scenarios where the arms in $\mathcal{A}$ are detectable under distinct WBMs. In this routine, we suitably utilize the stochastic MAB policies discussed in the previous section. Specifically, we relax the assumption that the learner must have prior knowledge of the specific WBM, thereby enabling the sequential adaptation of WBMs through suitable unitary transformations. We evaluate the performance of this methodology on Depolarized Bell states and arbitrary quantum states. In particular, we select $K$ states, with $m$ of them being entangled, and investigate the numerical results of the $(m,K)$ quantum MAB problems.

V-A Entanglement Detection in Depolarised Bell states

We present numerical results on the sample complexity of entanglement detection for the $(m,K)$ -quantum MAB problem, specifically addressing Depolarized Bell states. These states are known to be detectable under the witnesses $\mathcal{E}_{1}$ and $\mathcal{E}_{2}$ , outlined in Table I. The procedure for entanglement detection is detailed in Algorithm 3. The algorithm operates with an input threshold of $\zeta=0$ , an accepted error rate $\delta$ , a set of $K$ Depolarized Bell states—of which $m$ are entangled—and two WBMs. The sequence of WBMs in Algorithm 3 follows the order $\mathcal{E}_{1}$ and then $\mathcal{E}_{2}$ . It is important to note that the sequence in which the WBMs are selected is arbitrary, as the algorithm does not involve state estimation during the process. We note that for the $(1,K)$ -quantum MAB problem, there is a promise that one arm is entangled so the value of $m=1$ is known to the policy. Let us consider the following two experiments for $K=5$ arms.

• In the first experiment, we generate five isotropic states (defined below (9)), such that exactly one of them is entangled and can be detected under WBM $\mathcal{E}_{1}$. As described earlier, we randomly generate the values of $p$ and, under WBM $\mathcal{E}_{1}$, we compute $\boldsymbol{S}_{\mathcal{E}_{1}}=\left[0.3329,0.0577,0.3110,0.1870,-0.2401\right]$. For $\delta\in(0,1)$, Algorithm 2 is iterated over 500 runs with WBM $\mathcal{E}_{1}$, confidence width $U(t,\delta)=\sqrt{\frac{\log(\frac{4Kt^{2}}{\delta})}{2t}}$ and $m=1$.
• In the second experiment, we generate five depolarized Bell states formed with any of the Bell states. We randomly generate the values of $p$ such that one of these states is entangled. Under WBM $\mathcal{E}_{1}$ and $\mathcal{E}_{2}$, we get $\boldsymbol{S}_{\mathcal{E}_{1}}=\left[0.3333,0.2138,0.3252,0.1484,0.4706\right]$ and $\boldsymbol{S}_{\mathcal{E}_{2}}=\left[0.1547,0.2839,0.1484,0.3252,-0.0398\right]$. Here, the WBM is unknown to the learner. Since there is a promise ($m=1$) that one arm is entangled, the learner should measure with at least one of the two WBMs. For $\delta\in(0,1)$, Algorithm 3 is iterated over $500$ runs. For both these experiments, we plot the average number of samples until stoppage on the y-axis and $\delta$ on the x-axis, as shown in Fig. 2.

Algorithm 3 $(m,K)$-quantum MAB policy for states in $\mathcal{F}$

Input: threshold $\zeta=0$, acceptance error rate $\delta$, arms $\mathcal{A}\leftarrow[K]$, WBMs $\{\mathcal{E}_{1},\mathcal{E}_{2}\}$
Output: $A_{\text{ent}}$
With $\mathcal{E}\leftarrow\mathcal{E}_{1}$, run Algorithm 2 for $K$ arms with $U(t,\delta)\leftarrow\sqrt{\frac{\log(\frac{4Kt^{2}}{\delta})}{2t}}$ and return stopping time $\tau_{1}$ and entangled arms $A_{\text{ent}}^{(1)}$
if ($|A_{\text{ent}}^{(1)}|==1$ and $m==1$) or ($|A_{\text{ent}}^{(1)}|==K$) then
    $A_{\text{ent}}^{(2)}\leftarrow\emptyset$
else if $|A_{\text{ent}}^{(1)}|<K$ then
    With $\mathcal{E}\leftarrow\mathcal{E}_{2}$, run Algorithm 2 for $K-|A_{\text{ent}}^{(1)}|$ arms with $U(t,\delta)\leftarrow\sqrt{\frac{\log(\frac{4Kt^{2}}{\delta})}{2t}}$ and return stopping time $\tau_{2}$ and entangled arms $A_{\text{ent}}^{(2)}$
end if
$A_{\text{ent}}\leftarrow A_{\text{ent}}^{(1)}+A_{\text{ent}}^{(2)}$
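The control flow of Algorithm 3 can be illustrated with the $S$ values reported for the second experiment of each setting. In the sketch below, the bandit subroutine is replaced by a noiseless stand-in that flags an arm whenever its exact $S$ value is negative; this stand-in, like the function names, is an assumption made purely to expose the two-stage logic and is not part of the paper's method.

```python
from typing import Callable, Dict, Optional, Sequence, Set

Detector = Callable[[Sequence[int], int], Set[int]]


def noiseless_detector(s_values: Dict[int, Sequence[float]]) -> Detector:
    """Stand-in for Algorithm 2: flag arm i under WBM w when the exact S value is negative."""
    def detect(arms: Sequence[int], wbm: int) -> Set[int]:
        return {i for i in arms if s_values[wbm][i] < 0.0}
    return detect


def two_wbm_workflow(num_arms: int, detect: Detector, m: Optional[int] = None) -> Set[int]:
    """Run the detector with WBM 1, then with WBM 2 on the arms not yet flagged."""
    arms = list(range(num_arms))
    found_first = detect(arms, 1)
    if (len(found_first) == 1 and m == 1) or len(found_first) == num_arms:
        found_second: Set[int] = set()  # the promise is met, or nothing is left to test
    else:
        remaining = [i for i in arms if i not in found_first]
        found_second = detect(remaining, 2)
    return found_first | found_second


if __name__ == "__main__":
    # S values reported in the text for the second experiment with m = 1.
    promise = {1: [0.3333, 0.2138, 0.3252, 0.1484, 0.4706],
               2: [0.1547, 0.2839, 0.1484, 0.3252, -0.0398]}
    print(sorted(two_wbm_workflow(5, noiseless_detector(promise), m=1)))

    # S values reported in the text for the second experiment with m unknown.
    unknown = {1: [0.4598, 0.3191, 0.3694, 0.5965, 0.9670],
               2: [-0.0233, 0.1724, 0.1073, -0.2449, -0.9344]}
    print(sorted(two_wbm_workflow(5, noiseless_detector(unknown))))
```

In both instances the first WBM flags nothing, so the second WBM is applied to all five arms; this is the case in which the sample complexity doubles.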

We present numerical results on the sample complexity of entanglement detection for the $(m,K)$ -quantum MAB problem for Depolarized Bell states, with $m$ being unknown to the policy. We consider the following two experiments with $K=5$ arms.

• In the first experiment, we generate five isotropic states as described earlier. Under WBM $\mathcal{E}_{1}$, we get that $\boldsymbol{S}_{\mathcal{E}_{1}}=\left[0.0391,0.0664,-0.5177,-0.8978,-0.0616\right]$. For $\delta\in(0,1)$, Algorithm 2 is iterated over 500 runs with WBM $\mathcal{E}_{1}$. Here, $m=3$ and is unknown to the policy.
• In the second experiment, we generate five depolarized Bell states formed with any of the Bell states. Under WBM $\mathcal{E}_{1}$ and $\mathcal{E}_{2}$, the parameters are $\boldsymbol{S}_{\mathcal{E}_{1}}=\left[0.4598,0.3191,0.3694,0.5965,0.9670\right]$ and $\boldsymbol{S}_{\mathcal{E}_{2}}=\left[-0.0233,0.1724,0.1073,-0.2449,-0.9344\right]$, respectively. Although the states are detectable under WBM $\mathcal{E}_{2}$, it is unknown to the learner. Thus, we need to run at least one iteration of Algorithm 2. In the first iteration, the inputs are $K$ arms and WBM $\mathcal{E}_{1}$ (or $\mathcal{E}_{2}$), and the policy returns $\tilde{m}<K$ entangled arms. In the second iteration, Algorithm 2 is executed with $K-\tilde{m}$ arms and WBM $\mathcal{E}_{2}$ (or $\mathcal{E}_{1}$) as the inputs. This routine is summarised in Algorithm 3 and iterated for 500 runs. We plot the average number of samples until stoppage on the y-axis and $\delta$ on the x-axis, as shown in Fig. 3.

For the instances considered above, the sample complexity scales with $m$. It is noteworthy that when the sub-optimality gaps are very small, the sample complexity increases significantly and may not scale with $m$. Since the bandit policy is re-run at most once (two runs in total), the worst-case sample complexity for entanglement detection in Depolarized Bell states scales by a factor of two.

V-B Entanglement Detection in Arbitrary Quantum States

In this section, we present a routine for detecting entanglement in arbitrary quantum states and provide numerical results for the $(1,K)$ -quantum Multi-Armed Bandit (MAB) problem. To generate random density matrices, we follow the method described in [46]. Specifically, we start by generating a complex matrix $A\in\mathbb{C}^{4\times 4}$ , where the real and imaginary parts of each element are independently sampled from a normal distribution. We then compute the density matrix $\rho$ by normalizing $AA^{\dagger}$ , resulting in $\rho=\frac{AA^{\dagger}}{\text{Tr}(AA^{\dagger})}$ . This procedure ensures that $\rho$ is a valid density matrix. On the generated states, we run Algorithm 4, which takes as input the error threshold $\delta$ , the set of arms $\mathcal{A}$ , and a permutation of $\{1,2,3,4,5,6\}$ that defines the order in which the six WBMs should be adapted. Since this is a promise problem, the algorithm stops as soon as one entangled arm is identified, without needing to measure with all six WBMs.
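The state-generation step described above can be sketched with the Python standard library alone. The sketch assumes that the real and imaginary parts are drawn from a standard normal distribution (the text specifies only a normal distribution). The function names are hypothetical, and the positivity check is a randomized spot check rather than a proof.

```python
import random
from typing import List, Optional

Matrix = List[List[complex]]


def random_density_matrix(dim: int = 4, rng: Optional[random.Random] = None) -> Matrix:
    """Return rho = A A^dagger / Tr(A A^dagger) for a complex Gaussian matrix A."""
    if rng is None:
        rng = random.Random()
    a = [[complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(dim)]
         for _ in range(dim)]
    # (A A^dagger)[i][j] = sum_k A[i][k] * conj(A[j][k])
    aa = [[sum(a[i][k] * a[j][k].conjugate() for k in range(dim)) for j in range(dim)]
          for i in range(dim)]
    trace = sum(aa[i][i].real for i in range(dim))
    return [[aa[i][j] / trace for j in range(dim)] for i in range(dim)]


def is_valid_density_matrix(rho: Matrix, trials: int = 200, tol: float = 1e-9,
                            rng: Optional[random.Random] = None) -> bool:
    """Check unit trace, Hermiticity, and <v|rho|v> >= 0 on random test vectors."""
    if rng is None:
        rng = random.Random()
    dim = len(rho)
    if abs(sum(rho[i][i] for i in range(dim)) - 1.0) > tol:
        return False
    if any(abs(rho[i][j] - rho[j][i].conjugate()) > tol
           for i in range(dim) for j in range(dim)):
        return False
    for _ in range(trials):
        v = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(dim)]
        value = sum(v[i].conjugate() * rho[i][j] * v[j]
                    for i in range(dim) for j in range(dim))
        if value.real < -tol or abs(value.imag) > tol * (1.0 + abs(value)):
            return False
    return True


if __name__ == "__main__":
    rho = random_density_matrix(4, random.Random(0))
    print(is_valid_density_matrix(rho, rng=random.Random(1)))
```

Positivity holds by construction because $AA^{\dagger}$ is a Gram matrix, so the check mainly guards against implementation slips.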

Algorithm 4 $(1,K)$-quantum MAB policy for arbitrary quantum states

Input: threshold $\zeta=0$, acceptance error rate $\delta$, arms $\mathcal{A}\leftarrow[K]$, $P=\text{perm}(1,2,3,4,5,6)$
Output: $A_{\text{ent}}$
flag $\leftarrow 1$, $I\leftarrow 1$
while flag do
    With $\mathcal{E}\leftarrow\mathcal{E}_{P(I)}$, run Algorithm 2 for $K$ arms with $U(t,\delta)\leftarrow\sqrt{\frac{\log(\frac{4Kt^{2}}{\delta})}{2t}}$ and return entangled arm $A_{\text{ent}}^{(I)}$
    if $|A_{\text{ent}}^{(I)}|==1$ then
        flag $\leftarrow 0$
    else
        $I\leftarrow I+1$
    end if
end while
$A_{\text{ent}}\leftarrow A_{\text{ent}}^{(I)}$
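The sequential adaptation in Algorithm 4 amounts to a search over the permutation $P$. The sketch below illustrates it with a noiseless stand-in for the bandit subroutine, as an assumption for exposition. Unlike the listing, the loop is bounded by the length of $P$ so that it also terminates when no WBM is conclusive. The first arm uses the six $S$ values listed for the second state in Table VI; the remaining arms carry hypothetical positive values.

```python
from typing import Callable, Sequence, Set, Tuple

Detector = Callable[[Sequence[int], int], Set[int]]


def noiseless_detector(s_values: Sequence[Sequence[float]]) -> Detector:
    """Stand-in for Algorithm 2: s_values[i][w - 1] is the exact S of arm i under WBM w."""
    def detect(arms: Sequence[int], wbm: int) -> Set[int]:
        return {i for i in arms if s_values[i][wbm - 1] < 0.0}
    return detect


def sequential_wbm_search(num_arms: int, detect: Detector,
                          order: Sequence[int]) -> Tuple[Set[int], int]:
    """Try the WBMs in the given order; stop once exactly one arm is flagged.

    Returns the flagged set and the number of WBMs used.
    """
    arms = list(range(num_arms))
    for step, wbm in enumerate(order, start=1):
        found = detect(arms, wbm)
        if len(found) == 1:
            return found, step
    return set(), len(order)  # inconclusive under every WBM in the list


if __name__ == "__main__":
    s_values = [
        (0.1562, -0.0280, -0.1135, 0.1832, -0.0779, 0.1373),  # second state in Table VI
        (0.2, 0.2, 0.2, 0.2, 0.2, 0.2),                       # hypothetical separable arm
        (0.3, 0.1, 0.4, 0.1, 0.5, 0.2),                       # hypothetical separable arm
    ]
    detect = noiseless_detector(s_values)
    print(sequential_wbm_search(3, detect, order=(1, 2, 3, 4, 5, 6)))
```

The number of WBMs used before stopping depends on the chosen permutation, which is why the worst case multiplies the sample complexity by the number of WBMs.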

The bandit policy is re-run at most five times (six runs in total), so the worst-case sample complexity for entanglement detection is scaled by a factor of six. To this end, we conduct the following experiment: we generate 500 different instances of $K=5$ arbitrary states following the procedure described earlier. We ensure that each instance includes one entangled arm, and we note that these are valid instances verified by the PPT criterion. The objective of this experiment is to test the efficacy of using the single-parameter family of witnesses (4) to detect entanglement in arbitrary states. For $\delta\in(0,1)$, we report the fraction of times the entangled arm is accurately identified; this is shown in Fig. 4.

Table VI: Examples of arbitrary pure entangled states detected by the family of witnesses (4)

Pure entangled states $\ket{\psi_{1}},\ket{\psi_{2}}$ and $\ket{\psi_{3}}$	Values under $(S_{\mathcal{E}_{i}})_{i=1}^{6}$
$[0.2687+0.0375i;0.2406+0.4090i;0.0502+0.6162i;0.2413+0.5107i]$	$(-0.1851,0.3160,0.1598,-0.0058,0.2177,-0.1947)$
$[0.0565+0.3355i;0.0508+0.0686i;0.4885+0.5191i;0.5689+0.2125i]$	$(0.1562,-0.0280,-0.1135,0.1832,-0.0779,0.1373)$
$[0.1953+0.4438i;0.4958+0.4009i;0.0069+0.3495i;0.0322+0.4848i]$	$(-0.1851,0.3160,0.1598,-0.0058,0.2177,-0.1947)$

From the above experiment, we report several noteworthy observations. Firstly, we encountered instances of pure states $\rho$ where the value of $S_{\mathcal{E}}(\rho)$ equaled $0$ , which is the threshold value provided to the algorithm. In such cases, the algorithm required a significantly long time to converge and, despite this, incorrectly estimated the value of $S_{\mathcal{E}}(\rho)$ . Consequently, we adjusted the threshold to $-1\times 10^{-3}$ and imposed a cutoff on the sample complexity at $1\times 10^{12}$ to better reflect the real-time performance of this policy. Secondly, we came across instances of entangled states verified by the PPT test that yielded positive values of $S_{\mathcal{E}}(\rho)$ under all six WBMs. Interestingly, the mixed entangled state $\rho=\sum_{i=1}^{3}p_{i}\ket{\psi_{i}}\bra{\psi_{i}}$ , where $\ket{\psi_{i}}$ are defined in Table VI, with $(p_{i})_{i=1}^{3}=(0.2936,0.0655,0.6409)$ has $(S_{\mathcal{E}})=(0.0732,0.1727,0.1257,0.1139,0.0736,0.0296)$ under the six witnesses, indicating that this state cannot be detected by the witness family described in (4).

We derive an observation on the nature of such states, particularly focusing on the eigenstate $\ket{\lambda}_{\text{max}}=[0.3773-0.1445i,0.4768-0.3244i,0.4598+0.0809i,0.5351]$ , which corresponds to the largest eigenvalue of $\rho$ . This eigenstate has a Schmidt coefficient close to, but not equal to, 1, suggesting that it lies near the boundary of the separable states yet remains entangled. The pure state $\ket{\lambda}_{\text{max}}\bra{\lambda}_{\text{max}}$ produces $(S_{\mathcal{E}})=(0.0380,0.1269,0.0401,0.1054,0.0221,0.0074)$ . Thus, we have identified examples of pure and mixed entangled states that can yield inconclusive results when measured using this particular witness family. In these instances, it is essential to measure all six witnesses a sufficient number of times to accurately obtain the expected values of the corresponding observables. Subsequently, performing FST can help determine the entanglement of these states using other separability criteria.

VI Future Works And Conclusion

We established a novel correspondence between the problem of entanglement detection and the Bad Arm Identification problem in stochastic Multi-Armed Bandits (MAB). We propose the $(m,K)$-quantum Multi-Armed Bandit framework. The focus of this framework is on identifying $m$ entangled states out of $K$ states, where $m$ is potentially unknown. We apply this framework to two-qubit states using two key ingredients: a specialized set of six measurements for two-qubit states called Witness Basis Measurements (WBM) $\mathcal{E}$, and a separability criterion $S_{\mathcal{E}}$, which is based on the data obtained from these measurements and serves as the parameter that needs to be estimated. We present theoretical guarantees and numerical simulations to demonstrate how this parameter can be estimated quickly and accurately using MAB policies. First, we show that entangled states belonging to a class of parameterised two-qubit states $\mathcal{F}$ can be detected by measuring a subset of the six WBMs. With the knowledge of the WBM, we show that we can directly apply some suitable MAB policies. Second, for the same parameterised states, we present a routine for entanglement detection when the WBM is not known by enabling arbitrary sequential adaptation of the WBMs. We extend this to arbitrary two-qubit quantum states and provide numerical results on the efficacy of using these measurements for detecting entanglement.

A promising future direction is identifying WBMs for higher-dimensional bipartite systems. The authors of [2] propose a minimal tomographic scheme for two qutrits, requiring only eleven witnesses instead of the traditional 81. Recent explorations in data-driven machine learning techniques have used SVMs to construct linear entanglement witnesses requiring only local measurements [47]. This approach could be extended to the $(m,K)$-quantum MAB problem by constructing a minimal number of witnesses that accurately detect all $m$ states. Entanglement detection can also be viewed as a membership problem, where a state belongs to a set if it has a specific property (such as entanglement). This problem has been explored along the lines of the partition identification problem [48], where the goal is to determine the partition to which a data point belongs, given the form of a hyperplane. Extending this concept to the $(m,K)$-quantum MAB problem is an exciting avenue for future research.

Acknowledgement

B.K. sincerely acknowledges the support from the Ministry of Education, Government of India, through the Prime Minister’s Research Fellowship (PMRF) Scheme. V.S. is supported by the U.S. Department of Energy, Office of Science, National Quantum Information Science Research Centers, Co-design Center for Quantum Advantage (C2QA) contract (DE-SC0012704). K.J. gratefully acknowledges a grant from Mphasis to the Centre for Quantum Information, Communication, and Computing (CQuICC) at IIT Madras.

References

[1] D. Lu, T. Xin, N. Yu, Z. Ji, J. Chen, G. Long, J. Baugh, X. Peng, B. Zeng, and R. Laflamme, “Tomography is necessary for universal entanglement detection with single-copy observables,” Phys. Rev. Lett., vol. 116, p. 230501, Jun 2016. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevLett.116.230501
[2] H. Zhu, Y. S. Teo, and B.-G. Englert, “Minimal tomography with entanglement witnesses,” Phys. Rev. A, vol. 81, p. 052339, May 2010. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevA.81.052339
[3] C. H. Bennett, G. Brassard, C. Crépeau, R. Jozsa, A. Peres, and W. K. Wootters, “Teleporting an unknown quantum state via dual classical and Einstein-Podolsky-Rosen channels,” Phys. Rev. Lett., vol. 70, pp. 1895–1899, Mar 1993.
[4] H. Buhrman, R. Cleve, and W. van Dam, “Quantum entanglement and communication complexity,” SIAM Journal on Computing, vol. 30, no. 6, pp. 1829–1841, 2001.
[5] R. Horodecki, P. Horodecki, M. Horodecki, and K. Horodecki, “Quantum entanglement,” Reviews of Modern Physics, vol. 81, no. 2, pp. 865–942, Jun. 2009.
[6] R. Kueng, H. Rauhut, and U. Terstiege, “Low rank matrix recovery from rank one measurements,” Applied and Computational Harmonic Analysis, vol. 42, no. 1, pp. 88–116, 2017. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1063520315001037
[7] J. Wang, V. B. Scholz, and R. Renner, “Confidence polytopes in quantum state tomography,” Physical Review Letters, vol. 122, no. 19, May 2019. [Online]. Available: http://dx.doi.org/10.1103/PhysRevLett.122.190401
[8] R. O’Donnell and J. Wright, “Efficient quantum tomography,” Proceedings of the forty-eighth annual ACM symposium on Theory of Computing, 2015. [Online]. Available: https://api.semanticscholar.org/CorpusID:769062
[9] ——, “Efficient quantum tomography ii,” Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, 2015. [Online]. Available: https://api.semanticscholar.org/CorpusID:5245926
[10] K. Banaszek, M. Cramer, and D. Gross, “Focus on quantum tomography,” New Journal of Physics, vol. 15, no. 12, p. 125020, dec 2013. [Online]. Available: https://dx.doi.org/10.1088/1367-2630/15/12/125020
[11] S. T. Flammia, D. Gross, Y.-K. Liu, and J. Eisert, “Quantum tomography via compressed sensing: error bounds, sample complexity and efficient estimators,” New Journal of Physics, vol. 14, no. 9, p. 095022, sep 2012. [Online]. Available: https://dx.doi.org/10.1088/1367-2630/14/9/095022
[12] M. Guta, J. Kahn, R. Kueng, and J. A. Tropp, “Fast state tomography with optimal error bounds,” Journal of Physics A: Mathematical and Theoretical, vol. 53, no. 20, p. 204001, apr 2020. [Online]. Available: https://dx.doi.org/10.1088/1751-8121/ab8111
[13] G. Torlai, G. Mazzola, J. Carrasquilla, M. Troyer, R. Melko, and G. Carleo, “Neural-network quantum state tomography,” Nature Physics, vol. 14, no. 5, p. 447–450, Feb. 2018. [Online]. Available: http://dx.doi.org/10.1038/s41567-018-0048-5
[14] Y. Quek, S. Fort, and H. K. Ng, “Adaptive quantum state tomography with neural networks,” 2018.
[15] D. Koutný, L. Motka, Z. Hradil, J. Řeháček, and L. L. Sánchez-Soto, “Neural-network quantum state tomography,” Physical Review A, vol. 106, no. 1, Jul. 2022. [Online]. Available: http://dx.doi.org/10.1103/PhysRevA.106.012409
[16] T. Schmale, M. Reh, and M. Gärttner, “Efficient quantum state tomography with convolutional neural networks,” npj Quantum Information, vol. 8, no. 1, Sep. 2022. [Online]. Available: http://dx.doi.org/10.1038/s41534-022-00621-4
[17] D. S. França, F. G. L. Brandão, and R. Kueng, “Fast and Robust Quantum State Tomography from Few Basis Measurements,” in 16th Conference on the Theory of Quantum Computation, Communication and Cryptography (TQC 2021), ser. Leibniz International Proceedings in Informatics (LIPIcs), M.-H. Hsieh, Ed., vol. 197. Dagstuhl, Germany: Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021, pp. 7:1–7:13. [Online]. Available: https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.TQC.2021.7
[18] J. Haah, A. W. Harrow, Z. Ji, X. Wu, and N. Yu, “Sample-optimal tomography of quantum states,” IEEE Transactions on Information Theory, p. 1–1, 2017. [Online]. Available: http://dx.doi.org/10.1109/TIT.2017.2719044
[19] Y. S. Teo, H. Zhu, B.-G. Englert, J. Řeháček, and Z. Hradil, “Quantum-state reconstruction by maximizing likelihood and entropy,” Phys. Rev. Lett., vol. 107, p. 020404, Jul 2011. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevLett.107.020404
[20] V. Siddhu, “Maximum a posteriori probability estimates for quantum tomography,” Physical Review A, vol. 99, no. 1, Jan. 2019. [Online]. Available: http://dx.doi.org/10.1103/PhysRevA.99.012342
[21] M. Horodecki, P. Horodecki, and R. Horodecki, “Separability of mixed states: necessary and sufficient conditions,” Physics Letters A, vol. 223, no. 1, pp. 1–8, 1996. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0375960196007062
[22] B. M. Terhal, “Bell inequalities and the separability criterion,” Physics Letters A, vol. 271, no. 5, pp. 319–326, 2000. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0375960100004011
[23] M. Lewenstein, B. Kraus, J. I. Cirac, and P. Horodecki, “Optimization of entanglement witnesses,” Phys. Rev. A, vol. 62, p. 052310, Oct 2000. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevA.62.052310
[24] D. Chruściński and G. Sarbicki, “Entanglement witnesses: construction, analysis and classification,” Journal of Physics A: Mathematical and Theoretical, vol. 47, no. 48, p. 483001, Nov. 2014. [Online]. Available: http://dx.doi.org/10.1088/1751-8113/47/48/483001
[25] P. Auer, N. Cesa-Bianchi, and P. Fischer, “Finite-time analysis of the multiarmed bandit problem,” Machine Learning, vol. 47, pp. 235–256, 05 2002.
[26] J.-Y. Audibert, S. Bubeck, and R. Munos, “Best arm identification in multi-armed bandits.” in COLT, 2010, pp. 41–53.
[27] H. Kano, J. Honda, K. Sakamaki, K. Matsuura, A. Nakamura, and M. Sugiyama, “Good arm identification via bandit feedback,” 2018.
[28] M. Lewenstein, B. Kraus, J. I. Cirac, and P. Horodecki, “Optimization of entanglement witnesses,” Phys. Rev. A, vol. 62, p. 052310, Oct 2000. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevA.62.052310
[29] I. Bengtsson and K. Zyczkowski, Geometry of Quantum States: An Introduction to Quantum Entanglement. Cambridge University Press, 2006.
[30] M. Horodecki, P. Horodecki, and R. Horodecki, “Separability of mixed states: necessary and sufficient conditions,” Physics Letters A, vol. 223, no. 1, pp. 1–8, 1996. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0375960196007062
[31] A. Peres, “Separability criterion for density matrices,” Phys. Rev. Lett., vol. 77, pp. 1413–1415, Aug 1996. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevLett.77.1413
[32] P. Horodecki, “Separability criterion and inseparable mixed states with positive partial transposition,” Physics Letters A, vol. 232, no. 5, pp. 333–339, 1997. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0375960197004167
[33] O. Rudolph, “A separability criterion for density operators,” Journal of Physics A: Mathematical and General, vol. 33, no. 21, p. 3951–3955, May 2000. [Online]. Available: http://dx.doi.org/10.1088/0305-4470/33/21/308
[34] O. Gühne, P. Hyllus, O. Gittsovich, and J. Eisert, “Covariance matrices and the separability problem,” Physical Review Letters, vol. 99, no. 13, Sep. 2007. [Online]. Available: http://dx.doi.org/10.1103/PhysRevLett.99.130504
[35] L. Gurvits, “Classical deterministic complexity of Edmonds’ problem and quantum entanglement,” 2003.
[36] A. C. Doherty, P. A. Parrilo, and F. M. Spedalieri, “Complete family of separability criteria,” Phys. Rev. A, vol. 69, p. 022308, Feb 2004. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevA.69.022308
[37] E. Even-Dar, S. Mannor, and Y. Mansour, “PAC bounds for multi-armed bandit and Markov decision processes,” ser. COLT ’02. Berlin, Heidelberg: Springer-Verlag, 2002, p. 255–270.
[38] S. Kalyanakrishnan, A. Tewari, P. Auer, and P. Stone, “PAC subset selection in stochastic multi-armed bandits,” in Proceedings of the 29th International Conference on Machine Learning, ser. ICML’12. Madison, WI, USA: Omnipress, 2012, p. 227–234.
[39] Z. Karnin, T. Koren, and O. Somekh, “Almost optimal exploration in multi-armed bandits,” in Proceedings of the 30th International Conference on Machine Learning, 2013, pp. 1238–1246.
[40] S. Mannor and J. N. Tsitsiklis, “The sample complexity of exploration in the multi-armed bandit problem,” J. Mach. Learn. Res., vol. 5, p. 623–648, dec 2004.
[41] R. H. Farrell, “Asymptotic Behavior of Expected Sample Size in Certain One Sided Tests,” The Annals of Mathematical Statistics, vol. 35, no. 1, pp. 36 – 72, 1964. [Online]. Available: https://doi.org/10.1214/aoms/1177703731
[42] K. Jamieson, M. Malloy, R. Nowak, and S. Bubeck, “lil’ UCB: An optimal exploration algorithm for multi-armed bandits,” in Proceedings of The 27th Conference on Learning Theory, ser. Proceedings of Machine Learning Research, M. F. Balcan, V. Feldman, and C. Szepesvári, Eds., vol. 35. Barcelona, Spain: PMLR, 13–15 Jun 2014, pp. 423–439. [Online]. Available: https://proceedings.mlr.press/v35/jamieson14.html
[43] A. Locatelli, M. Gutzeit, and A. Carpentier, “An optimal algorithm for the thresholding bandit problem,” in Proceedings of The 33rd International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, M. F. Balcan and K. Q. Weinberger, Eds., vol. 48. New York, New York, USA: PMLR, 20–22 Jun 2016, pp. 1690–1698. [Online]. Available: https://proceedings.mlr.press/v48/locatelli16.html
[44] T.-H. Tsai, Y.-D. Tsai, and S.-D. Lin, “lil’HDoC: An algorithm for good arm identification under small threshold gap,” 2024.
[45] J. Lumbreras, E. Haapasalo, and M. Tomamichel, “Multi-armed quantum bandits: Exploration versus exploitation when learning properties of quantum states,” Quantum, vol. 6, p. 749, Jun. 2022. [Online]. Available: http://dx.doi.org/10.22331/q-2022-06-29-749
[46] K. Zyczkowski and H.-J. Sommers, “Induced measures in the space of mixed quantum states,” Journal of Physics A: Mathematical and General, vol. 34, no. 35, p. 7111–7125, Aug. 2001. [Online]. Available: http://dx.doi.org/10.1088/0305-4470/34/35/335
[47] A. C. Greenwood, L. T. Wu, E. Y. Zhu, B. T. Kirby, and L. Qian, “Machine-learning-derived entanglement witnesses,” Phys. Rev. Appl., vol. 19, p. 034058, Mar 2023. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevApplied.19.034058
[48] S. Juneja and S. Krishnasamy, “Sample complexity of partition identification using multi-armed bandits,” 2019.

VII Supplementary Material

The following lemma is useful for some calculations.

Lemma 14.

For $t\geq 1$, $c>0$, $\varepsilon\in(0,1)$, and $0<w\leq 1$,

$$\frac{1}{t}\log\left(\frac{\log\left((1+\varepsilon)t\right)}{w}\right)\geq c\implies t\leq\frac{1}{c}\log\left(\frac{2\log\left(\frac{1+\varepsilon}{cw}\right)}{w}\right).\tag{16}$$

VII-A Proof for Section IV-A

VII-A1 Proof of Lemma 8

Proof.

Let $\mathcal{B}$ denote the "good" event that, at any time $t>0$ and for all arms $i\in[K]$, the true value $S_{i}\coloneqq S_{\mathcal{E}}(\rho_{i})$ is well concentrated around its estimate $\hat{S}_{i,N_{i}(t)}$:

$$\mathcal{B}\coloneqq\bigcap_{i=1}^{K}\bigcap_{t=1}^{\infty}\left\{|\hat{S}_{i,N_{i}(t)}-S_{i}|\leq U\left(N_{i}(t),\frac{\delta}{c_{\varepsilon}K}\right)\right\}.$$

Applying Lemma 7 to each arm and taking a union bound over the $K$ arms on the complement of $\mathcal{B}$, we get

$$\mathbb{P}\left[\mathcal{B}\right]\geq 1-c_{\varepsilon}K\left(\frac{\delta}{c_{\varepsilon}K}\right)^{1+\varepsilon}\geq 1-\delta,\tag{17}$$

where the last inequality in Eq. 17 holds because $\varepsilon\in(0,1)$ and $c_{\varepsilon}\geq 1$. ∎
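For completeness, the last inequality in Eq. 17 can be unpacked in one line. This is a routine step rather than part of the original argument, and it assumes $\delta\in(0,1)$ (as for any failure probability) and $K\geq 1$:

```latex
c_{\varepsilon}K\left(\frac{\delta}{c_{\varepsilon}K}\right)^{1+\varepsilon}
=\delta\left(\frac{\delta}{c_{\varepsilon}K}\right)^{\varepsilon}
\leq\delta,
\qquad\text{since } 0<\frac{\delta}{c_{\varepsilon}K}\leq 1
\text{ and } \varepsilon>0 .
```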

VII-A2 Proof of Theorem 9

Proof.

Recall that the threshold is $\zeta=0$ and that the problem instance $\boldsymbol{S}_{\mathcal{E}}$ is such that $S_{\mathcal{E}}(\rho_{1})\geq S_{\mathcal{E}}(\rho_{2})\geq S_{\mathcal{E}}(\rho_{3})\ldots>S_{\mathcal{E}}(\rho_{K-1})>0>S_{\mathcal{E}}(\rho_{K})$. Suppose that the event $\mathcal{B}$ described in Lemma 8 holds. As outlined in Algorithm 1, the arm $i^{\star}$ is dropped from the active set $\Omega$ only if $\text{LCB}_{i^{\star}}(t)>0$. Since $|\hat{S}_{i^{\star},N_{i^{\star}}(t)}-S_{i^{\star}}|\leq U\left(N_{i^{\star}}(t),\frac{\delta}{c_{\varepsilon}K}\right)$ on $\mathcal{B}$, this would give

$$\begin{aligned}
&\hat{S}_{i^{\star},N_{i^{\star}}(t)}-U\left(N_{i^{\star}}(t),\frac{\delta}{c_{\varepsilon}K}\right)>0\\
\implies\ &\hat{S}_{i^{\star},N_{i^{\star}}(t)}-|\hat{S}_{i^{\star},N_{i^{\star}}(t)}-S_{i^{\star}}|>0\\
\implies\ &S_{i^{\star}}>0.
\end{aligned}$$

This contradicts the assumption on the problem instance $\boldsymbol{S}_{\mathcal{E}}$, because $S_{i^{\star}}=S_{\mathcal{E}}(\rho_{K})<0$. Hence the arm $i^{\star}$ is not dropped from the active set $\Omega$ as long as event $\mathcal{B}$ holds. ∎

VII-A3 Proof of Theorem 10

Proof.

Let us consider the case where $\mathcal{B}$ holds. By the elimination rule of Algorithm 1, an arm $i$ is removed from the active set $\Omega$ if $\text{LCB}_{i}(t)>0$, that is, once its lower confidence bound clears the threshold $\zeta=0$. We have that

$$\begin{aligned}
&\hat{S}_{i,N_{i}(t)}-U\left(N_{i}(t),\frac{\delta}{c_{\varepsilon}K}\right)\geq\zeta\\
\iff\ &\hat{S}_{i,N_{i}(t)}-S_{i}+\Delta_{i}\geq U\left(N_{i}(t),\frac{\delta}{c_{\varepsilon}K}\right)\\
\impliedby\ &\Delta_{i}\geq 2U\left(N_{i}(t),\frac{\delta}{c_{\varepsilon}K}\right),
\end{aligned}\tag{18}$$

where the second line reads $\Delta_{i}$ as the gap $S_{i}-\zeta$, and the last line is sufficient because $\hat{S}_{i,N_{i}(t)}-S_{i}\geq-U\left(N_{i}(t),\frac{\delta}{c_{\varepsilon}K}\right)$ on $\mathcal{B}$.

Let $N_{i}$ denote the number of samples of arm $i$ after which this condition holds, that is, $N_{i}=\inf\{t:U\left(N_{i}(t),\frac{\delta}{c_{\varepsilon}K}\right)\leq\frac{\Delta_{i}}{2}\}$. The minimum value of $N_{i}$ can be obtained by solving

$$\begin{aligned}
U\left(N_{i},\frac{\delta}{c_{\varepsilon}K}\right)&=\frac{\Delta_{i}}{2}\\
(1+\sqrt{\varepsilon})\sqrt{\frac{2(1+\varepsilon)}{N_{i}}\log\left(\frac{\log\left((1+\varepsilon)N_{i}\right)}{\delta/(c_{\varepsilon}K)}\right)}&=\frac{\Delta_{i}}{2}\\
\frac{1}{N_{i}}\log\left(\frac{\log\left((1+\varepsilon)N_{i}\right)}{\delta/(c_{\varepsilon}K)}\right)&=\frac{\Delta_{i}^{2}}{8(1+\varepsilon)(1+\sqrt{\varepsilon})^{2}}.
\end{aligned}\tag{19}$$

Applying Lemma 14 with $c=\frac{\Delta_{i}^{2}}{8(1+\varepsilon)(1+\sqrt{\varepsilon})^{2}}$ and $w=\frac{\delta}{c_{\varepsilon}K}$, we get

$$N_{i}\leq\frac{8(1+\varepsilon)(1+\sqrt{\varepsilon})^{2}}{\Delta_{i}^{2}}\log\left(\frac{2c_{\varepsilon}K\log\left(\frac{8c_{\varepsilon}(1+\varepsilon)^{2}(1+\sqrt{\varepsilon})^{2}}{\delta}\frac{K}{\Delta_{i}^{2}}\right)}{\delta}\right).\tag{20}$$

Thus, the total number of samples required to identify the arm $i^{\star}$ with a probability of at least $1-\delta$ is $N\leq\sum_{i=1}^{K}N_{i}$ . ∎
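As a numerical sanity check on Eq. 19 and Eq. 20, the sketch below compares the smallest $N_{i}$ satisfying $U\left(N_{i},\frac{\delta}{c_{\varepsilon}K}\right)\leq\frac{\Delta_{i}}{2}$, found by direct search, against the closed-form expression of Eq. 20. The function names and the parameter values are illustrative only. The constant $c_{\varepsilon}$ is not restated in this section, so the code assumes the lil'UCB form $c_{\varepsilon}=\frac{2+\varepsilon}{\varepsilon}\left(\frac{1}{\log(1+\varepsilon)}\right)^{1+\varepsilon}$; any other value with $c_{\varepsilon}\geq 1$ can be passed instead.

```python
import math

def conf_width(n, w, eps):
    """U(n, w) in the form used in the step leading to Eq. 19."""
    inner = math.log((1 + eps) * n) / w
    if inner <= 1:  # the outer log would be non-positive: no useful bound yet
        return math.inf
    return (1 + math.sqrt(eps)) * math.sqrt(2 * (1 + eps) / n * math.log(inner))

def samples_needed(gap, w, eps, n_max=10**7):
    """Smallest n >= 1 with U(n, w) <= gap / 2, found by linear scan."""
    for n in range(1, n_max + 1):
        if conf_width(n, w, eps) <= gap / 2:
            return n
    raise ValueError("n_max too small for this gap")

def eq20_bound(gap, delta, K, eps, c_eps):
    """Right-hand side of Eq. 20."""
    a = 8 * (1 + eps) * (1 + math.sqrt(eps)) ** 2
    inner = math.log(a * (1 + eps) * c_eps * K / (delta * gap ** 2))
    return a / gap ** 2 * math.log(2 * c_eps * K * inner / delta)

# Illustrative values (not taken from the paper's experiments).
eps, delta, K, gap = 0.5, 0.1, 5, 0.2
# ASSUMPTION: lil'UCB-style constant; replace if Lemma 7 defines it differently.
c_eps = (2 + eps) / eps * (1 / math.log(1 + eps)) ** (1 + eps)
w = delta / (c_eps * K)

n_exact = samples_needed(gap, w, eps)
bound = eq20_bound(gap, delta, K, eps, c_eps)
print(f"direct search: {n_exact} samples; Eq. 20: {bound:.0f} samples")
```

The closed form is looser than the direct search because Lemma 14 trades tightness for an explicit expression. Both scale as $\Delta_{i}^{-2}$ up to logarithmic factors.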

VII-B Proof for Section IV-B

VII-B1 Proof of Lemma 11

Proof.

First, we show that Algorithm 2 is $(\lambda,\delta)$-PAC for arbitrary $\lambda\in[K]$. In the case where the number of good arms $m$ is greater than or equal to $\lambda$, we show that $\mathbb{P}\left[\{\hat{m}<\lambda\}\cup\bigcup_{i\in\mathcal{A}_{\text{ent}}}\{S_{i}<\zeta\}\right]\leq\delta$, where $\hat{m}$ is the number of good arms identified by the agent. Since $m\geq\lambda$, the event $\{\hat{m}<\lambda\}$ implies that at least one good arm $j\in[m]$ is identified as a bad arm by the agent. That is, for some $j\in[m]$ and $t\in\mathbb{N}$, the upper confidence bound satisfies $\hat{S}_{j,N_{j}(t)}+U\left(N_{j}(t),\frac{\delta}{c_{\varepsilon}K}\right)<\zeta$. Thus, we have that

$$\begin{aligned}
\mathbb{P}\left[\hat{m}<\lambda\right]&\leq\sum_{j\in[m]}\mathbb{P}\left[\bigcup_{t\in\mathbb{N}}\left\{\hat{S}_{j,N_{j}(t)}+U\left(N_{j}(t),\frac{\delta}{c_{\varepsilon}K}\right)<\zeta\right\}\right]\\
&\leq\sum_{j\in[m]}c_{\varepsilon}\left(\frac{\delta}{c_{\varepsilon}K}\right)^{1+\varepsilon}\qquad\text{(by Lemma 7)}\\
&\leq mc_{\varepsilon}\left(\frac{\delta}{c_{\varepsilon}K}\right).
\end{aligned}\tag{21}$$

The event $\bigcup_{i\in\{\hat{X}_{1},\hat{X}_{2},\ldots\hat{X}_{\lambda}\}}\{S_{i}<\zeta\}$ collects all outcomes in which a bad arm is identified as a good one. Thus, for some bad arm $j\in\{\hat{X}_{1},\hat{X}_{2},\ldots\hat{X}_{\hat{m}}\}$ such that $j\in[K]\setminus[m]$, we have

$$\begin{aligned}
\mathbb{P}\left[\bigcup_{i\in\{\hat{X}_{1},\hat{X}_{2},\ldots\hat{X}_{\lambda}\}}\{S_{i}<\zeta\}\right]&\leq\sum_{j\in[K]\setminus[m]}\mathbb{P}\left[\bigcup_{t\in\mathbb{N}}\left\{\hat{S}_{j,N_{j}(t)}-U\left(N_{j}(t),\frac{\delta}{c_{\varepsilon}K}\right)>\zeta\right\}\right]\\
&\leq(K-m)c_{\varepsilon}\left(\frac{\delta}{c_{\varepsilon}K}\right).
\end{aligned}\tag{22}$$

Combining Eq. 21 and Eq. 22, and noting that $mc_{\varepsilon}\left(\frac{\delta}{c_{\varepsilon}K}\right)+(K-m)c_{\varepsilon}\left(\frac{\delta}{c_{\varepsilon}K}\right)=\delta$, we get that $\mathbb{P}\left[\{\hat{m}<\lambda\}\cup\bigcup_{i\in\{\hat{X}_{1},\hat{X}_{2},\ldots\hat{X}_{\hat{m}}\}}\{S_{i}<\zeta\}\right]\leq\delta$. Next, we consider the case when the number of good arms $m$ is less than $\lambda$ and show that $\mathbb{P}\left[\hat{m}\geq\lambda\right]\leq\delta$. Since there are fewer than $\lambda$ good arms, the event $\{\hat{m}\geq\lambda\}$ implies that at least one of the output arms $\hat{X}_{j}$, $j\in\{1,\ldots,\lambda\}$, is a bad arm. Thus, we have that

$$\begin{aligned}
\mathbb{P}\left[\hat{m}\geq\lambda\right]&\leq\sum_{j\in[K]\setminus[m]}\mathbb{P}\left[j\in\{\hat{X}_{1},\hat{X}_{2},\ldots\hat{X}_{\lambda}\}\right]\\
&\leq(K-m)c_{\varepsilon}\left(\frac{\delta}{c_{\varepsilon}K}\right)^{1+\varepsilon}\\
&\leq\frac{K-m}{K}c_{\varepsilon}\left(\frac{\delta}{c_{\varepsilon}}\right)\\
&\leq\delta.
\end{aligned}\tag{23}$$

Hence the algorithm is $(\lambda,\delta)$-PAC for every $\lambda\in[K]$, and is therefore $\delta$-PAC. ∎

VII-B2 Proof of Theorem 12

Proof.

Recall that the threshold is $\zeta=0$ and that the problem instance $\boldsymbol{S}_{\mathcal{E}}$ is such that $S_{\mathcal{E}}(\rho_{1})\geq S_{\mathcal{E}}(\rho_{2})\ldots>S_{\mathcal{E}}(\rho_{K-m})>0>S_{\mathcal{E}}(\rho_{K-m+1})\ldots>S_{\mathcal{E}}(\rho_{K})$, with $m$ being unknown. Suppose that the event $\mathcal{B}$ described in Lemma 8 holds. As outlined in Algorithm 2, an arm $i$ is dropped only if $\text{LCB}_{i}(t)>0$. Since $|\hat{S}_{i,N_{i}(t)}-S_{i}|\leq U\left(N_{i}(t),\frac{\delta}{c_{\varepsilon}K}\right)$ on $\mathcal{B}$, this gives

$$\begin{aligned}
&\hat{S}_{i,N_{i}(t)}-U\left(N_{i}(t),\frac{\delta}{c_{\varepsilon}K}\right)>0\\
\implies\ &\hat{S}_{i,N_{i}(t)}-|\hat{S}_{i,N_{i}(t)}-S_{i}|>0\\
\implies\ &S_{i}>0.
\end{aligned}$$

Thus, as long as event $\mathcal{B}$ holds, none of the arms with $S_{\mathcal{E}}<0$ is dropped, and the lil’HDoC algorithm identifies all the arms correctly. ∎
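The argument in the proofs of Theorems 9 and 12 rests on a single decision rule: on $\mathcal{B}$, a lower confidence bound above the threshold certifies $S_{i}>\zeta$, and, symmetrically, an upper confidence bound below it certifies $S_{i}<\zeta$. The sketch below illustrates that rule in isolation. It is not a reproduction of Algorithm 1 or Algorithm 2; the function names, the verdict labels, and the numerical values are hypothetical, and $c_{\varepsilon}$ is again assumed to take the lil'UCB form.

```python
import math

def conf_width(n, w, eps):
    """U(n, w) in the form used in the proof of Theorem 10."""
    inner = math.log((1 + eps) * n) / w
    if inner <= 1:  # bound is vacuous for such small n
        return math.inf
    return (1 + math.sqrt(eps)) * math.sqrt(2 * (1 + eps) / n * math.log(inner))

def verdict(s_hat, n, delta, K, eps, c_eps, zeta=0.0):
    """Classify one arm from its running estimate s_hat after n samples.

    Returns "above" if LCB > zeta, "below" if UCB < zeta, else "undecided".
    On the good event B, "above" implies S_i > zeta and "below" implies
    S_i < zeta.
    """
    u = conf_width(n, delta / (c_eps * K), eps)
    if s_hat - u > zeta:
        return "above"
    if s_hat + u < zeta:
        return "below"
    return "undecided"

# Illustrative values (hypothetical, not from the paper's simulations).
eps, delta, K = 0.5, 0.1, 5
# ASSUMPTION: lil'UCB-style constant c_eps.
c_eps = (2 + eps) / eps * (1 / math.log(1 + eps)) ** (1 + eps)

for s_hat in (0.5, -0.5, 0.01):
    print(s_hat, verdict(s_hat, n=10_000, delta=delta, K=K, eps=eps, c_eps=c_eps))
```

With the sign convention of the problem instance above, an "above" verdict corresponds to an arm that is dropped, and an arm whose true value is negative can never receive that verdict while $\mathcal{B}$ holds.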
