Classical Bandit Algorithms for Entanglement Detection in Parameterized Qubit States

Bharati K (Department of Electrical Engineering, IIT Madras, Chennai, India), Vikesh Siddhu (IBM Quantum, IBM Research India), Krishna Jagannathan (Department of Electrical Engineering, IIT Madras, Chennai, India)
Abstract

Entanglement is a key resource for a wide range of tasks in quantum information and computing. Thus, verifying the availability of this quantum resource is essential. Extensive research on entanglement detection has led to no-go theorems [1] that highlight the need for full state tomography (FST) in the absence of adaptive or joint measurements. Recent work [2] introduces a single-parameter family of entanglement witness measurements that can conclusively detect certain entangled states and resort to FST only when all witness measurements are inconclusive. We find a variety of realistic noisy two-qubit quantum states, denoted $\mathcal{F}$, that yield conclusive results under this witness family. We solve the problem of detecting entanglement among $K$ quantum states in $\mathcal{F}$, of which $m$ states are entangled, with $m$ potentially unknown. We recognize a structural connection between this problem and the Bad Arm Identification problem in stochastic Multi-Armed Bandits (MAB). In contrast to existing quantum bandit frameworks, we establish a new correspondence tailored for entanglement detection and term it the $(m,K)$-quantum Multi-Armed Bandit. We implement two well-known MAB policies for arbitrary states derived from $\mathcal{F}$, present theoretical guarantees on the measurement/sample complexity, and demonstrate the practicality of the policies through numerical simulations. More broadly, this paper highlights the potential for employing classical machine learning techniques for quantum entanglement detection.

Index Terms:
quantum computing, quantum states, entanglement detection, FST, entanglement witness, multi-armed bandit, bad arm identification

I Introduction

The emergence of quantum information theory has changed our understanding of quantum entanglement, transforming it from a property of quantum states into a vital resource. Entanglement allows us to perform non-classical tasks such as quantum communication, quantum teleportation, and quantum information processing, to name a few [3, 4, 5]. However, checking whether a given unknown state is entangled can be highly non-trivial. The first issue is theoretical: even if one completely determines an unknown state via full state tomography (FST), checking whether the resulting known state is entangled can be hard. The second issue is practical: real-world laboratory conditions introduce imperfections and noise, which make it difficult to carry out FST or to test directly whether an unknown state is entangled or separable.

There is a vast literature dedicated to FST (see [6, 7, 8, 9, 10, 11, 12] and references therein, and see [13, 14, 15, 16, 17] for machine-learning-based approaches). Using entangled measurements, one can carry out FST with almost optimal copy complexity [9, 18]. In practice, entangled measurements are harder to carry out, and one performs single-copy measurements instead. From data generated by single-copy measurements, one can recover the state being measured using a variety of techniques such as linear inversion, maximum likelihood estimation, and maximum a posteriori estimation [19, 20]. From the reconstructed state it is possible to ascertain whether the state is entangled or separable using well-known criteria (some are outlined in Sec. II-B). However, this FST method becomes impractical as the number of qubits in the quantum system increases, owing to computational challenges and the exponential scaling of the number of measurements required. If one is only interested in testing for entanglement, it may not be necessary in practice to carry out FST. Furthermore, the sample complexity of FST does not directly yield a measurement/sample complexity for entanglement detection.

Entanglement can be assessed by measuring entanglement witnesses [21, 22, 23, 24]. These observables indicate the presence of some entangled states. Although no single witness can detect all entangled states, each witness measurement contributes information about the state. If entanglement is not detected by any of the witnesses, the information gathered from the witness measurements can eventually facilitate FST, which can then be used to check for entanglement using standard tests. This insight has been effectively explored in [2], which constructs a set of measurements that serve simultaneously as entanglement witnesses and as a means of FST. For bipartite qubit systems, [2] proposes a measurement scheme that requires six witness operator measurements. Rather than merely determining the expectation value of the witness operator, one can measure the eigenbasis $\mathcal{E}$ of a single-parameter family of witnesses. Based on the frequencies of these witness measurement outcomes, the authors formulate a separability criterion $S_{\mathcal{E}}$ that yields non-negative values for all separable states and negative values for some entangled states. For entangled states that cannot be detected by this witness family, a tomographic reconstruction of the state can be performed (see Sections II-A and II-B for further details).

Given an eigenbasis $\mathcal{E}$, achieving high-precision estimation of $S_{\mathcal{E}}$ is pivotal but requires measurements on numerous copies of the state, imposing a significant resource constraint. This challenge is compounded in scenarios involving multiple (say $K>1$) states, among which $m<K$ states may be entangled. Performing FST for all $K$ states may be unnecessary for entanglement detection. In such instances, where resource and time efficiency is paramount, the need for a large number of measurements for accurate parameter estimation can be circumvented by identifying certain ‘winning’ trends in the sample estimates and by choosing when and how measurements are made. This fits neatly into the well-studied Multi-Armed Bandits (MAB) framework in classical machine learning.

The MAB setting tackles sequential decision-making problems over a finite set of options (arms), where each arm yields stochastic rewards with an unknown mean. Arm selections unfold iteratively in rounds, with a learner choosing arms based on a predefined policy. Following each selection, the learner receives a reward corresponding to the chosen arm, which influences subsequent decisions and possible policy adjustments. There are two main objectives in the MAB framework. The first balances exploration (finding high-reward arms) and exploitation (selecting the arm with the highest observed reward) to maximize cumulative rewards [25]. The second involves pure exploration with the goal of identifying the arm with the highest expected reward, i.e., Best Arm Identification (BAI) [26]. A variant of BAI called the $(m,K)$-Good Arm Identification (GAI) problem ($m$ unknown) aims to identify $m$ ‘good’ arms (out of $K$) whose expected rewards lie above a specified threshold $\zeta$ [27]. Equivalently, $(m,K)$-Bad Arm Identification aims to identify $m$ ‘bad’ arms (out of $K$) whose expected rewards lie below a specified threshold. Two orthogonal parameters influence the performance of BAI policies: sample complexity and the probability of error in identifying the best arm. More details on MAB and BAI policies are in Section II-C.

The overarching goal of this paper is to use stochastic MAB policies to address the problem of entanglement detection and to characterise the sample complexity of such an approach. The organisation and key contributions of this paper are summarised below:

  • We identify a well-motivated class of parameterized two-qubit states, $\mathcal{F}$, and a corresponding measurement $\mathcal{E}$ such that $S_{\mathcal{E}} \geq 0$ for all separable states in $\mathcal{F}$ and $S_{\mathcal{E}} < 0$ for all entangled states. This is detailed in Sections III-A, III-B and III-C.

  • In Section III, we highlight the key contribution of our paper: recognizing a structural connection between the separability criterion $S_{\mathcal{E}}$ outlined in [2] and the Best Arm Identification (BAI) problem of stochastic Multi-Armed Bandits (MAB). Specifically, the $(m,K)$-Bad Arm Identification problem, whose goal is to identify $m$ ‘bad’ arms, corresponds to the $(m,K)$-quantum Multi-Armed Bandit problem ($m$ potentially unknown), whose goal is to identify $m$ entangled states derived from $\mathcal{F}$.

  • Another significant contribution of our paper lies in achieving conclusive entanglement detection without the explicit need for FST for commonly seen noisy two-qubit states, which we find to be in $\mathcal{F}$. In Section IV, we discuss two distinct MAB policies for entanglement detection based on Successive Elimination and Hybrid Dilemma of Confidence (see Section II-C). With well-defined confidence intervals, we demonstrate the correctness and characterise the sample complexity of these policies.

  • In Section V-A, we present numerical results on the performance of the MAB policies for depolarised Bell states.

  • In Section V-B, we demonstrate the efficiency of the MAB policies and the WBMs in identifying general two-qubit entangled states, and present numerical examples of pure and mixed two-qubit entangled states for which the single-parameter family of witnesses fails to provide conclusive results, thus necessitating FST.

II Preliminaries

Let $\mathcal{H}$ be a finite-dimensional Hilbert space with dimension $d$. A pure quantum state is represented by a unit-norm vector $\ket{\psi} \in \mathcal{H}$. Let $\mathcal{L}(\mathcal{H})$ be the space of linear operators on $\mathcal{H}$. The Frobenius inner product of any $A, B \in \mathcal{L}(\mathcal{H})$ is $\langle A, B \rangle \coloneqq \Tr(A^{\dagger} B)$, where $\dagger$ denotes the conjugate transpose. A Hermitian operator satisfies $H = H^{\dagger}$. A density operator $\rho \in \mathcal{L}(\mathcal{H})$ is Hermitian, positive semi-definite, $\rho \geq 0$, and has unit trace, $\Tr(\rho) = 1$; it can represent both pure and mixed states. A positive operator-valued measure (POVM) is a collection of positive operators $\{E_i \geq 0\}$ that sum to the identity, $\sum_i E_i = \mathbf{1}$. A POVM represents a measurement in which $E_i$ corresponds to measurement outcome $i$, but we sometimes abbreviate this and simply say that $E_i$ is a measurement outcome.

Let $\mathcal{H}_a$ and $\mathcal{H}_b$ be finite-dimensional Hilbert spaces with dimensions $d_a$ and $d_b$, respectively, and let $\mathcal{H}_{ab} \coloneqq \mathcal{H}_a \otimes \mathcal{H}_b$, where $\otimes$ denotes the tensor product, be a bipartite Hilbert space with dimension $d = d_a d_b$. A density operator $\rho_{ab} \in \mathcal{L}(\mathcal{H}_{ab})$ is called separable if it can be written as a convex combination of product states, that is,

$$\rho_{ab} = \sum_i p_i \ket{\phi^i_a, \chi^i_b}\bra{\phi^i_a, \chi^i_b}, \tag{1}$$

where $p_i \geq 0$ such that $\sum_i p_i = 1$ and $\ket{\phi^i_a, \chi^i_b} \coloneqq \ket{\phi}^i_a \otimes \ket{\chi}^i_b$ is a product of two pure states. We denote the set of all separable density operators by $S_{ab}$. Conversely, $\rho_{ab}$ is entangled if it cannot be written in the form (1). We discuss some preliminaries on entanglement witnesses and witness-based measurements in Section II-A, the various separability criteria for entanglement detection in Section II-B, and the framework and background on stochastic multi-armed bandit problems in Section II-C.

II-A Entanglement Witnesses and Witness Operator Measurements

Entanglement can be detected by measuring entanglement witnesses, which are defined as follows:

Definition 1 (Entanglement Witness).

An entanglement witness, denoted as $W \in \mathcal{L}(\mathcal{H}_{ab})$, is a Hermitian operator that detects some entangled state $\rho_{ent} \in \mathcal{L}(\mathcal{H}_{ab})$ such that

$$\langle \rho_{ent}, W \rangle = \Tr(\rho_{ent} W) < 0, \tag{2}$$
$$\langle \rho, W \rangle = \Tr(\rho W) \geq 0, \quad \forall \rho \in S_{ab}. \tag{3}$$

Conceptually, a witness $W$ defines a hyperplane that separates the set of entangled states it can detect, $D_W = \{\rho \ \text{s.t.}\ \Tr(\rho W) < 0\}$, from all other states. When comparing two arbitrary witnesses $W_1$ and $W_2$, if $D_{W_1}$ is contained within $D_{W_2}$, then $W_2$ is considered finer than $W_1$. Further insights into this topology are detailed in [28, Lemma 1]. A witness is said to be optimal when no other witness is finer, suggesting that it touches the boundary of the convex set of separable states [29].

To improve the efficacy of identifying entangled states, [2] proposes a method to construct a set of measurements called Witness Operator Measurements (WOM), which we briefly discuss here. Let us consider the rank-one projector onto a pure entangled state, denoted by $\rho(\alpha) = \ket{\psi}\bra{\psi}$, where $\ket{\psi} = \cos\alpha \ket{00} + \sin\alpha \ket{11}$. Here, the Schmidt coefficients $\cos\alpha$ and $\sin\alpha$ are arranged in non-increasing order as $1 > \cos^2\alpha \geq \sin^2\alpha > 0$. Consequently, $\alpha \in [0, \pi/4]$ is chosen to adhere to this order.

In this paper, we consider the specific form of the witnesses from [2], namely, $W = \rho_w(\alpha) = \cos^2\alpha\,\mathbf{1} - \rho(\alpha)^{\top_2}$. More generally, consider a rank-one POVM $\sum_i w_i \rho_i = \mathbf{1}$ with outcomes $w_i \rho_i$, where $w_i > 0$ and the $\rho_i$ are projectors onto pure states. We can construct a WOM with outcomes $w_i \rho_{iw}$, where $\rho_{iw} = \lambda_{\text{max}} \mathbf{1} - \rho_i^{\top_2}$, $\top_2$ denotes transposition on the second subsystem, and $\lambda_{\text{max}}$ is the largest eigenvalue across all $\rho_i^{\top_2}$.
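The construction above can be checked numerically. The following is a minimal NumPy sketch, not code from [2]; the helper names `partial_transpose_b`, `pure_state`, and `witness` are illustrative assumptions. It builds $\rho(\alpha)^{\top_2}$ and $\rho_w(\alpha)$ for one value of $\alpha$ and prints their eigenvalues, so that one can see the largest eigenvalue of the partial transpose equals $\cos^2\alpha$ and that subtracting from $\cos^2\alpha\,\mathbf{1}$ leaves a positive semi-definite operator.

```python
import numpy as np


def partial_transpose_b(op):
    """Transpose the second qubit of a 4x4 two-qubit operator."""
    t = op.reshape(2, 2, 2, 2)  # indices (a, b, a', b')
    return t.transpose(0, 3, 2, 1).reshape(4, 4)  # swap b <-> b'


def pure_state(alpha):
    """Projector onto cos(alpha)|00> + sin(alpha)|11>."""
    psi = np.zeros(4)
    psi[0], psi[3] = np.cos(alpha), np.sin(alpha)
    return np.outer(psi, psi)


def witness(alpha):
    """rho_w(alpha) = cos^2(alpha) * 1 - rho(alpha)^{T_2}."""
    return np.cos(alpha) ** 2 * np.eye(4) - partial_transpose_b(pure_state(alpha))


alpha = np.pi / 6  # illustrative choice inside [0, pi/4]
pt_eigs = np.linalg.eigvalsh(partial_transpose_b(pure_state(alpha)))
w_eigs = np.linalg.eigvalsh(witness(alpha))
print("eigenvalues of rho^{T_2}:", np.round(pt_eigs, 4))
print("eigenvalues of rho_w    :", np.round(w_eigs, 4))
```

The reshape-and-transpose idiom is one common way to implement a partial transpose for small systems; it assumes the usual ordering of the computational basis, $\ket{00}, \ket{01}, \ket{10}, \ket{11}$.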

II-B Separability criteria for entanglement detection

Using the FST techniques briefly outlined earlier, one can perform a tomographic reconstruction of the state and subsequently determine whether it is entangled using well-known separability criteria. For bipartite qubit systems, the Peres-Horodecki criterion [30, 31] establishes that a density operator $\rho_{ab}$ is separable if and only if the eigenvalues of its partial transpose $\rho_{ab}^{\top_2}$ are non-negative. This criterion remains necessary and sufficient when $d_a = 2$ and $d_b = 3$, but it is no longer sufficient in higher dimensions, where there exists a class of entangled states with non-negative partial transposition. Other criteria include the range criterion [32], the matrix realignment criterion [33], the covariance matrix (CM) criterion [34], and additional methods discussed in [35, 36].
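For two qubits, the Peres-Horodecki test reduces to an eigenvalue check on the partial transpose. The sketch below is a hedged illustration in NumPy; the function names and the tolerance `tol` are assumptions made here, and the example states (a Bell state and a product state) are chosen only for demonstration.

```python
import numpy as np


def partial_transpose_b(rho):
    """Transpose the second qubit of a 4x4 two-qubit density matrix."""
    t = rho.reshape(2, 2, 2, 2)  # indices (a, b, a', b')
    return t.transpose(0, 3, 2, 1).reshape(4, 4)


def is_ppt(rho, tol=1e-12):
    """True if the partial transpose has no eigenvalue below -tol."""
    return bool(np.linalg.eigvalsh(partial_transpose_b(rho)).min() >= -tol)


# Bell state (|00> + |11>)/sqrt(2): entangled, so the test should fail.
phi_plus = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
bell = np.outer(phi_plus, phi_plus)

# Product state |00><00|: separable, so the test should pass.
product = np.diag([1.0, 0.0, 0.0, 0.0])

print("Bell state PPT?   ", is_ppt(bell))
print("Product state PPT?", is_ppt(product))
```

For a two-qubit state, a failed check certifies entanglement and a passed check certifies separability; in higher dimensions a passed check is inconclusive, as noted above.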

Another criterion for separability is obtained from the Witness Operator Measurements (WOMs) described in Section II-A, which are highly efficient for entanglement detection. We review this criterion from [2] next. Specifically, let us consider two-qubit witnesses of the form:

$$\begin{aligned}
\rho_w(\alpha) &= \cos^2\alpha\,\mathbf{1} - \left(\ket{\psi}\bra{\psi}\right)^{\top_2} \\
&= \cos^2\alpha\,\mathbf{1} - \left(\frac{1+\cos 2\alpha}{2}\ket{00}\bra{00} + \frac{1-\cos 2\alpha}{2}\ket{11}\bra{11} + \frac{\sin 2\alpha}{2}\left\{\ket{\Psi^{+}}\bra{\Psi^{+}} - \ket{\Psi^{-}}\bra{\Psi^{-}}\right\}\right),
\end{aligned} \tag{4}$$

where $\ket{\psi} = \cos\alpha\ket{00} + \sin\alpha\ket{11}$ with $\alpha \in [0, \pi/4]$, and $\ket{\Psi^{\pm}} = \left(\ket{01} \pm \ket{10}\right)/\sqrt{2}$. We denote the projectors onto the set of eigenstates of $\rho(\alpha)^{\top_2} = \left(\ket{\psi}\bra{\psi}\right)^{\top_2}$ by $\mathcal{E} = \{\ket{00}\bra{00}, \ket{11}\bra{11}, \ket{\Psi^{+}}\bra{\Psi^{+}}, \ket{\Psi^{-}}\bra{\Psi^{-}}\}$. Each operator $E_i \in \mathcal{E}$ satisfies $E_i = E_i^{\dagger}$ and $E_i \geq 0$, and together they satisfy $\sum_i E_i = \mathbf{1}$, forming a Positive Operator-Valued Measure (POVM). Throughout the paper, we refer to this POVM as a Witness Basis Measurement (WBM).
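As a quick sanity check on the WBM, the following NumPy sketch builds the four projectors in the order listed above and evaluates the Born-rule outcome probabilities $\Tr(E_i \rho)$ for an example state. The variable and function names are illustrative assumptions, and the singlet state is used only as a test input.

```python
import numpy as np

# Computational basis vectors of two qubits.
ket00 = np.array([1.0, 0.0, 0.0, 0.0])
ket01 = np.array([0.0, 1.0, 0.0, 0.0])
ket10 = np.array([0.0, 0.0, 1.0, 0.0])
ket11 = np.array([0.0, 0.0, 0.0, 1.0])
psi_plus = (ket01 + ket10) / np.sqrt(2)
psi_minus = (ket01 - ket10) / np.sqrt(2)

# WBM projectors E_1..E_4, in the order used in the text.
WBM = [np.outer(v, v) for v in (ket00, ket11, psi_plus, psi_minus)]


def outcome_probabilities(rho):
    """Born-rule probabilities Tr(E_i rho) for each WBM outcome."""
    return np.array([np.trace(E @ rho).real for E in WBM])


# The projectors should resolve the identity.
print("sum of E_i equals identity:", np.allclose(sum(WBM), np.eye(4)))

# Example input: the singlet state |Psi^-><Psi^-|.
singlet = np.outer(psi_minus, psi_minus)
print("outcome probabilities:", outcome_probabilities(singlet))
```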

Let us consider a quantum state $\rho$. Let $f_i \coloneqq \Tr(E_i \rho)$ be the probability of obtaining outcome $i$ when the state $\rho$ is measured using the WBM $\mathcal{E}$. The expected value of the witness, $\Tr(\rho_w(\alpha)\rho)$, can be expressed in terms of the $f_i$. If this expected value is less than a certain threshold (in our case, 0), we can conclude that $\rho$ is entangled; otherwise, the test is inconclusive. When the test is inconclusive, we pick the witnesses in Table I sequentially. These subsequent witnesses are obtained by applying unitary transformations $U_1$ and $U_2$ to the respective qubits, which changes the eigenbasis of the underlying state, as shown in (5).

$$\rho_w(\alpha) \longrightarrow (U_1 \otimes U_2)^{\dagger}\, \rho_w(\alpha)\, (U_1 \otimes U_2). \tag{5}$$

| Witness | $U_1$ | $U_2$ |
|---|---|---|
| 1 | $\mathbf{1}$ | $\mathbf{1}$ |
| 2 | $\mathbf{1}$ | $X$ |
| 3 | $C^{\dagger}$ | $C$ |
| 4 | $C^{\dagger}$ | $XC$ |
| 5 | $C$ | $C^{\dagger}$ |
| 6 | $C$ | $XC^{\dagger}$ |

Table I: Changing the eigenbasis of (4)

Expressing the eigenstates of the first witness (4) in terms of Pauli operators yields three observables: $Z\mathbf{1} + \mathbf{1}Z$, $ZZ$, and $XX + YY$. Estimates for these three observables come from measuring the first witness. Similarly, the second witness listed in Table I yields estimates for $Z\mathbf{1} - \mathbf{1}Z$, $ZZ$, and $XX - YY$. Thus, a pair of witnesses related by suitable unitary transformations yields estimates for five observables, and each of the other two witness pairs provides another five expectation values. In total, we obtain estimates for 15 expectation values, which provide sufficient information about the two-qubit state. This reduction of the number of witnesses from sixteen to six offers significant practical benefits. Instead of relying solely on comparing the expected value of the witness $\rho_w(\alpha)$ against a threshold, the authors of [2] suggest adopting a more stringent criterion:
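The Pauli decomposition of the first two witness bases can be verified directly. The sketch below is an illustration under stated assumptions: it uses the standard Pauli matrices, the helper names `two_qubit` and `conjugate` are hypothetical, and only witnesses 1 and 2 of Table I are covered because they involve just $\mathbf{1}$ and $X$. It checks that the four projectors of the first basis are combinations of $Z\mathbf{1} + \mathbf{1}Z$, $ZZ$, and $XX + YY$, and that conjugating with $\mathbf{1} \otimes X$ as in (5) maps these to $Z\mathbf{1} - \mathbf{1}Z$, $-ZZ$, and $XX - YY$.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def two_qubit(a, b):
    """Tensor product a (x) b of two single-qubit operators."""
    return np.kron(a, b)


ID = np.eye(4, dtype=complex)
ZI_plus_IZ = two_qubit(Z, I2) + two_qubit(I2, Z)
ZZ = two_qubit(Z, Z)
XX_plus_YY = two_qubit(X, X) + two_qubit(Y, Y)

# Projectors of the first witness basis written with the three observables.
P00 = (ID + ZI_plus_IZ + ZZ) / 4
P11 = (ID - ZI_plus_IZ + ZZ) / 4
P_psi_plus = (ID - ZZ + XX_plus_YY) / 4
P_psi_minus = (ID - ZZ - XX_plus_YY) / 4

# Witness 2 of Table I: U_1 = 1, U_2 = X.
U = two_qubit(I2, X)


def conjugate(op):
    """Apply the basis change of Eq. (5): U^dagger op U."""
    return U.conj().T @ op @ U


print(np.allclose(conjugate(ZI_plus_IZ), two_qubit(Z, I2) - two_qubit(I2, Z)))
print(np.allclose(conjugate(ZZ), -ZZ))
print(np.allclose(conjugate(XX_plus_YY), two_qubit(X, X) - two_qubit(Y, Y)))
```

Counting distinct observables across the two bases gives five: $Z\mathbf{1} \pm \mathbf{1}Z$, $ZZ$, and $XX \pm YY$.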

$$\min_{\alpha} \Tr\left(\rho_{\text{sep}}\left(\cos^2\alpha\,\mathbf{1} - \rho_w(\alpha)\right)\right) \geq 0, \quad \forall \rho_{\text{sep}} \in S_{ab}, \tag{6}$$

which holds for all separable states and is violated by the set of entangled states that can be detected by this family of witnesses. The above optimisation leads to the following quadratic WBM criterion:

$$S=4f_{1}f_{2}-(f_{3}-f_{4})^{2}\geq 0,\quad \forall\rho_{\text{sep}}\in S_{ab}. \tag{7}$$

In essence, the process of measuring the linear entanglement witnesses $\rho_{w}(\alpha)$ corresponds to measuring the projectors onto the eigenstate basis. It is important to note that the value of $S$ in (7) depends on the underlying WBM. Thus, for a WBM $\mathcal{E}$ and state $\rho$, we denote (7) as $S_{\mathcal{E}}(\rho)$.
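As a minimal sketch of how the quantity in (7) could be evaluated from measurement records, the snippet below turns a list of WBM outcome labels into relative frequencies and plugs them into $S$. The function names and the toy outcome list are hypothetical, and the outcome labels $1,\ldots,4$ are assumed to follow the same ordering as $f_{1},\ldots,f_{4}$ in (7).

```python
from collections import Counter

def witness_frequencies(outcomes):
    """Relative frequencies (f1, f2, f3, f4) of WBM outcomes labelled 1..4."""
    if not outcomes:
        raise ValueError("need at least one measurement outcome")
    counts = Counter(outcomes)
    n = len(outcomes)
    return tuple(counts.get(j, 0) / n for j in (1, 2, 3, 4))

def quadratic_witness(f1, f2, f3, f4):
    """Plug-in value of S = 4 f1 f2 - (f3 - f4)^2 from criterion (7)."""
    return 4.0 * f1 * f2 - (f3 - f4) ** 2

# Toy record of eight outcomes (made-up data, for illustration only).
freqs = witness_frequencies([1, 2, 3, 3, 3, 3, 4, 1])
s_hat = quadratic_witness(*freqs)
```

A negative plug-in value only hints at a violation of (7); with finitely many samples, a conclusion requires a confidence interval around the estimate that excludes zero.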

II-C Stochastic Multi-Armed Bandits

The stochastic Multi-Armed Bandit (MAB) framework is an archetype for many sequential decision-making problems. Within this framework, a bandit instance (problem instance) comprises $K$ arms (or actions) situated in an environment that yields a stochastic reward whenever an arm is selected (termed pulling) or an action is executed. Each arm $i\in[K]=\{1,2,\ldots,K\}$ is described by a probability distribution $\nu_{i}$ over $\mathbb{R}$, with known support and an unknown expectation $\mu_{i}$. We denote the problem instance by $\boldsymbol{\mu}=(\mu_{1},\mu_{2},\ldots,\mu_{K})$. Arm selection occurs iteratively in rounds: in each round $t$, a learner (or agent) selects an arm $X_{t}\in[K]$ according to a specified policy. Subsequently, the learner receives a stochastic reward $Z_{t}\sim\nu_{X_{t}}$ corresponding to the selected arm. Upon receiving the reward, the learner can terminate the process or continue by updating its policy to pursue a specific objective.

In the MAB literature, two objectives have been focal points of study. The first involves maximizing the cumulative reward accumulated over multiple game rounds, necessitating a trade-off between exploration (discovering arms with potentially higher rewards) and exploitation (repeatedly pulling the arm with the highest observed reward). The second, termed the best arm identification (BAI) problem, focuses on pure exploration, where the learner aims to identify the arm with the highest expected reward, i.e., $i^{\star}=\arg\max_{i}\mu_{i}$ (known as the best arm). A BAI policy (or algorithm) consists of a sampling rule for arm selection, a stopping rule to determine the end of exploration, and a recommendation rule to output the best arm. The BAI problem has been explored in two distinct settings: fixed confidence and fixed budget. In the fixed-confidence setting, the acceptance error $\delta$ is fixed, and the aim is to identify the best arm with probability at least $1-\delta$ while minimizing the number of arm pulls. In the fixed-budget setting, the number of arm pulls (budget) $T\in\mathbb{N}$ is fixed, and the goal is to minimize the probability of misidentifying the best arm within the allotted budget. Our paper concentrates on the BAI problem and one of its variants, called good arm identification (GAI), in the fixed-confidence setting. Below, we summarise some relevant findings from prior research.

II-C1 Fixed Confidence Best Arm Identification

Consider a problem instance denoted by $\boldsymbol{\mu}$. Without loss of generality, we can enumerate the arms based on their expected rewards, such that $\mu_{1}>\mu_{2}\geq\mu_{3}\geq\ldots\geq\mu_{K}$. We assume the existence of a unique best arm, denoted as $i^{\star}=1$. Here, we denote the sub-optimality gaps between the arms as $\Delta_{i}=\mu_{i^{\star}}-\mu_{i}$. The learner’s objective is to accurately identify the best arm $i^{\star}$ while minimizing the number of samples used. Policies that achieve this task are classified as $\delta$-PAC policies, as defined below.

Definition 2 ($\delta$-PAC).

Let $\hat{i}_{\tau}$ be the estimate of the best arm at the stopping time $\tau$. Then, an algorithm is said to be $\delta$-PAC if it satisfies

$$\mathbb{P}_{\boldsymbol{\mu}}(\hat{i}_{\tau}\neq i^{\star})\leq\delta,\quad \mathbb{P}_{\boldsymbol{\mu}}(\tau<\infty)=1. \tag{8}$$

The primary objective is to characterize the expected stopping time $\mathbb{E}_{\boldsymbol{\mu}}[\tau]$ of the BAI policy. Various research works have attempted to provide upper and lower bounds for this objective. For instance, the successive elimination procedure has been proposed to identify the best arm in $\mathcal{O}(\Delta^{-2}\log(n\Delta^{-2}))$ samples [37]. In comparison, the Lower-Upper Confidence Bound algorithm (LUCB1) improves upon this by requiring $\mathcal{O}(\Delta^{-2}\log(\Delta^{-2}))$ samples [38]. Additionally, the exponential-gap elimination algorithm achieves a sample complexity of $\mathcal{O}(\Delta^{-2}\log(\log(\Delta^{-2})))$, which is the best known in the class of elimination-style policies for BAI under the fixed-confidence setting [39]. These upper bounds are close to the lower bound $\mathcal{O}(\Delta^{-2})$ postulated in [40], typically within a factor of $\log$ or $\log\log$.
Notably, the seminal findings of [41], which use the principles of the Law of the Iterated Logarithm (LIL), bridge this gap by establishing the necessity and sufficiency of $\mathcal{O}(\Delta^{-2}\log(\log(\Delta^{-2})))$ samples for accurately identifying the best arm within a specified error margin of $\delta$. Building upon this insight, [42] proposes lil’UCB, which leverages concentration bounds based on a finite version of the LIL, achieving order optimality in sample complexity akin to exponential-gap elimination.
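To make the elimination-style policies mentioned above concrete, the following is a minimal sketch of successive elimination for rewards in $[0,1]$. It is not a transcription of [37]: the confidence radius is a plain Hoeffding bound with a union bound over arms and rounds, and the constants, the function name, and the Bernoulli test instance are all assumptions chosen for illustration.

```python
import math
import random

def successive_elimination(pull, num_arms, delta, max_rounds=100_000):
    """Return (arm, rounds): the surviving arm and the number of rounds used.

    `pull(i)` must return a reward in [0, 1]. Every active arm is sampled once
    per round, so all active arms share the same sample count t.
    """
    active = list(range(num_arms))
    sums = [0.0] * num_arms
    for t in range(1, max_rounds + 1):
        for i in active:
            sums[i] += pull(i)
        # Hoeffding radius; summing 2*exp(-2*t*r^2) over arms and rounds stays below delta.
        radius = math.sqrt(math.log(4.0 * num_arms * t * t / delta) / (2.0 * t))
        best = max(sums[i] / t for i in active)
        # Keep only arms whose upper bound reaches the leader's lower bound.
        active = [i for i in active if sums[i] / t + radius >= best - radius]
        if len(active) == 1:
            return active[0], t
    # Budget exhausted: fall back to the empirically best surviving arm.
    return max(active, key=lambda i: sums[i]), max_rounds

rng = random.Random(0)
means = [0.8, 0.5, 0.4]  # hypothetical Bernoulli means, arm 0 is best
best_arm, rounds = successive_elimination(
    lambda i: 1.0 if rng.random() < means[i] else 0.0, len(means), delta=0.05)
```

The stopping rule here is "one arm left", and the recommendation rule returns that arm; the number of rounds grows as the smallest gap shrinks, in line with the $\Delta^{-2}$ scaling quoted above.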

II-C2 Fixed Confidence Good Arm Identification

Consider a problem instance $\boldsymbol{\mu}$. Alongside the acceptance error $\delta$ described in Section II-C1, we introduce a threshold $\zeta\in(0,1)$ and define the set of “good” arms as $\mathcal{G}=\{i\in[K]\text{ such that }\mu_{i}\geq\zeta\}$. In simpler terms, the good arms are those whose means are greater than or equal to $\zeta$. The number of good arms $|\mathcal{G}|=m$ remains unknown to the agent, leading to what we term the $(m,K)$-GAI problem. Notably, the $(1,K)$-GAI reduces to the BAI problem discussed earlier. Without loss of generality, we enumerate the arms based on their expected rewards: $\mu_{1}>\mu_{2}\geq\ldots\geq\mu_{m}\geq\zeta\geq\mu_{m+1}\geq\ldots\geq\mu_{K}$. Importantly, the agent is unaware of this indexing. For $i,j\in[K]$, we define $\Delta_{i}\coloneqq|\mu_{i}-\zeta|$ and $\Delta_{i,j}=\mu_{i}-\mu_{j}$.
The sample complexity is expressed in terms of $\Delta=\min\left(\min_{i\in[K]}\Delta_{i},\min_{j\in[K-1]}\frac{\Delta_{j,j+1}}{2}\right)$.

At each time instant $t$, the learner samples an arm $X_{t}\in[K]$ and receives a corresponding (random) reward $Z_{t}\sim\nu_{X_{t}}$. The agent either outputs an arm that it identifies as “good” or stops when no good arms remain. We denote the stopping time of the GAI policy as $\tau_{\text{stop}}$. Specifically, the agent outputs $\hat{X}_{1},\hat{X}_{2},\ldots,\hat{X}_{\hat{m}}$ as good arms at rounds $\tau_{1},\tau_{2},\ldots,\tau_{\hat{m}}$ respectively, where $\hat{m}$ denotes the number of arms identified as good. The learner’s objective is to accurately and rapidly identify these good arms while minimizing the number of samples used. As elaborated below, this is achieved through policies falling within the class of $(\lambda,\delta)$-PAC policies.

Definition 3 ($(\lambda,\delta)$-PAC).

Let $\hat{m}$ denote the number of good arms identified by the agent. A $(\lambda,\delta)$-PAC algorithm satisfies the following conditions:

  1. If there are at least $\lambda$ good arms, then

    $$\mathbb{P}_{\boldsymbol{\mu}}\left[\{\hat{m}<\lambda\}\cup\bigcup_{i\in\{\hat{X}_{1},\hat{X}_{2},\ldots,\hat{X}_{\lambda}\}}\{\mu_{i}<\zeta\}\right]\leq\delta.$$

  2. If there are fewer than $\lambda$ good arms, then

    $$\mathbb{P}_{\boldsymbol{\mu}}\left[\hat{m}\geq\lambda\right]\leq\delta.$$

An algorithm is called $\delta$-PAC if it is $(\lambda,\delta)$-PAC for all $\lambda\in[K]$.

Just like in the BAI context (refer to Section II-C1), the objective in GAI is to characterize the expected stopping time $\mathbb{E}_{\boldsymbol{\mu}}[\tau_{\text{stop}}]$. The GAI algorithm consists of two key components: a sampling rule and an identification rule. The former dictates the arm selection process, while the latter guides the agent in distinguishing between good and bad arms. GAI confronts a novel challenge called the exploration-exploitation dilemma of confidence. Here, exploration involves the agent pulling arms other than the empirical best arm to identify potentially ‘good’ arms with fewer pulls. At the same time, exploitation entails pulling the empirical best arm to increase confidence in its classification as a good arm. To address this challenge, [27] proposed a hybrid algorithm for the dilemma of confidence (HDoC). In HDoC, the sampling rule is derived from the UCB algorithm for cumulative regret minimization [25], while the identification rule is based on the LUCB algorithm for BAI [38] and the APT algorithm for the thresholding bandits problem [43]. The proposed HDoC algorithm (LUCB-G) requires $\mathcal{O}\left(\Delta^{-2}\left(K\log\frac{1}{\delta}+K\log K+K\log\frac{1}{\Delta}\right)\right)$ samples. However, a drawback of the LUCB-G algorithm is its impracticality when $\Delta$ is very small.
To address this issue and achieve faster convergence in the identification phase, [44] propose utilizing confidence widths derived from the finite LIL bound, akin to the approach in the lil’UCB algorithm [42]. They demonstrate a reduction in the required number of samples, achieving a sample complexity of $\mathcal{O}\left(\Delta^{-2}\left(K\log\frac{1}{\delta}+K\log K+K\log\log\frac{1}{\Delta}\right)\right)$. The specific connections between BAI/GAI and entanglement detection are elaborated in Sections III and IV.
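The split between a sampling rule and an identification rule can be sketched as follows. This is a simplified illustration in the spirit of HDoC [27], not a faithful reproduction: the UCB index, the Hoeffding-type identification width, the function name `hdoc_sketch`, and the Bernoulli test instance are assumptions of this sketch, and rewards are assumed to lie in $[0,1]$.

```python
import math
import random

def hdoc_sketch(pull, num_arms, threshold, delta, max_pulls=200_000):
    """Return (good_arms, total_pulls) for rewards in [0, 1]."""
    active = set(range(num_arms))
    counts = [0] * num_arms
    sums = [0.0] * num_arms
    good = []
    t = 0

    def sample(i):
        nonlocal t
        sums[i] += pull(i)
        counts[i] += 1
        t += 1

    for i in range(num_arms):  # initialise with one pull per arm
        sample(i)

    while active and t < max_pulls:
        # Sampling rule: UCB-style index over the arms still undecided.
        i = max(active, key=lambda a: sums[a] / counts[a]
                + math.sqrt(math.log(t) / (2.0 * counts[a])))
        sample(i)
        mean = sums[i] / counts[i]
        # Identification rule: anytime Hoeffding width around the empirical mean.
        width = math.sqrt(math.log(4.0 * num_arms * counts[i] ** 2 / delta)
                          / (2.0 * counts[i]))
        if mean - width >= threshold:
            good.append(i)        # declared good, in order of identification
            active.discard(i)
        elif mean + width < threshold:
            active.discard(i)     # declared bad
    return good, t

rng = random.Random(0)
means = [0.9, 0.8, 0.3, 0.2]  # hypothetical Bernoulli means
good, total = hdoc_sketch(lambda i: float(rng.random() < means[i]),
                          len(means), threshold=0.5, delta=0.05)
```

Because the sampling rule favours arms with high indices, good arms tend to be output early, while the identification width alone controls the error probability; replacing that width with a finite-LIL width is the kind of change proposed in [44].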

III The Quantum MAB Framework For Entanglement Detection

In this section, we introduce the quantum Multi-Armed Bandit (MAB) framework for entanglement detection. First, we highlight the structural similarity between this framework and the stochastic MAB model. In stochastic MAB, pulling an arm $i$ corresponds to sampling from a probability distribution $p_{i}(\cdot)$ with known support and unknown mean $\mu_{i}$. When an arm is pulled, a reward $j$ is obtained with probability (w.p.) $p_{i}(j)$. In each round, different arms can be pulled, yielding independent and identically distributed (i.i.d.) rewards. Analogously, in the quantum setting, each arm represents an unknown quantum state $\rho$. When $\rho$ is measured, the underlying probability distribution of the rewards is determined by the measurement $\mathcal{E}$. Specifically, if a Witness Basis Measurement (WBM) $\mathcal{E}$ is chosen, measuring a state $\rho$ with $\mathcal{E}$ will result in a reward $j\in\{1,2,3,4\}$ with probability $\mathrm{Tr}(\rho E_{j})$. Once the measurement is fixed, the rewards obtained from measuring $\rho$ are i.i.d. The subtle difference between the two models lies in the source of the rewards. In the stochastic MAB model, rewards are i.i.d. draws from the fixed distribution of each arm, whereas in the quantum MAB model, the reward distribution also depends on the chosen WBM.

In the Best Arm Identification (BAI) setting of stochastic MAB, the primary parameters of interest are the means of the rewards. Similarly, in the quantum analogue, $S_{\mathcal{E}}(\rho)$ is the parameter of interest. As discussed in Section II-B, for a given state $\rho$ and WBM $\mathcal{E}$, the value of $S_{\mathcal{E}}(\rho)$ determines whether the state is entangled. The specific problem we consider involves $K$ arms (states), of which $m$ are bad (entangled), and our goal is to identify these entangled states. We summarize this correspondence concisely in Table II.

Table II: Stochastic-Quantum MAB

| Attributes | Stochastic MAB | Quantum MAB |
| --- | --- | --- |
| Arms | Probability distributions $(p_{1},p_{2},\ldots,p_{K})$ | Density operators $\{\rho_{1},\rho_{2},\ldots,\rho_{K}\}$ |
| Measurement | -- | WBM $\mathcal{E}$ |
| Measurement data | $j$ w.p. $p_{i}(j)$, $\forall i\in[K]$ | $j$ w.p. $\mathrm{Tr}(E_{j}\rho_{i})$, $\forall j\in[4]$, $\forall i\in[K]$ |
| Parameters to estimate | $\boldsymbol{\mu}=(\mu_{1},\mu_{2},\ldots,\mu_{K})$ | $\boldsymbol{S}_{\mathcal{E}}=(S_{\mathcal{E}}(\rho_{1}),S_{\mathcal{E}}(\rho_{2}),\ldots,S_{\mathcal{E}}(\rho_{K}))$ |
| Objective | Identify $\mathcal{G}^{C}=\{i\in[K]\ \text{such that}\ \mu_{i}\leq\zeta\}$ | Identify $\mathcal{A}_{\text{ent}}=\{i\in[K]\ \text{such that}\ S_{\mathcal{E}}(\rho_{i})<0\}$ |

More formally, the objective of the learner is to accurately identify $\mathcal{A}_{\text{ent}}=\{i\in[K]\ \text{such that}\ S_{\mathcal{E}}(\rho_{i})<0\}$ while minimizing the number of measurements. This aligns with the goal of $(m,K)$-Bad Arm Identification, which aims to identify all arms $\mathcal{G}^{C}=\{i\in[K]\ \text{such that}\ \mu_{i}\leq\zeta\}$ whose means $\mu_{i}$ fall below a specified threshold $\zeta$. In essence, solving the $(m,K)$-Bad Arm Identification problem is tantamount to addressing the $(m,K)$-quantum MAB problem. We define the $(m,K)$-quantum MAB setting as follows.

Definition 4.

The $(m,K)$-quantum Multi-Armed Bandit (MAB) setting for entanglement detection is fully characterized by the tuple $(\mathcal{A},\mathcal{E})$. Here, $\mathcal{A}$ denotes a finite action set with $|\mathcal{A}|=K$, consisting of $(K-m)$ two-qubit separable states and $m$ two-qubit entangled states. The term $\mathcal{E}$ corresponds to a suitable Witness Basis Measurement (WBM).
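One way to picture an instance $(\mathcal{A},\mathcal{E})$ in simulation is sketched below. As a simplifying assumption, each arm is specified directly by its four outcome probabilities $\mathrm{Tr}(\rho_{i}E_{j})$ rather than by $\rho_{i}$ and the projectors, and the class and function names are hypothetical. The two example distributions are illustrative: the first is uniform, and the second concentrates weight on the third outcome.

```python
import random

class QuantumArm:
    """An arm whose pulls are WBM outcomes j in {1, 2, 3, 4}.

    probs[j-1] stands in for Tr(rho E_j); supplying these numbers directly
    is an assumption made to keep the sketch short.
    """
    def __init__(self, probs, rng):
        if len(probs) != 4 or abs(sum(probs) - 1.0) > 1e-9:
            raise ValueError("probs must be a distribution over 4 outcomes")
        self.probs = list(probs)
        self.rng = rng

    def pull(self):
        """Measure one fresh copy of the state and return the outcome label."""
        return self.rng.choices((1, 2, 3, 4), weights=self.probs)[0]

    def true_S(self):
        """S = 4 f1 f2 - (f3 - f4)^2 evaluated at the true probabilities."""
        f1, f2, f3, f4 = self.probs
        return 4.0 * f1 * f2 - (f3 - f4) ** 2

def entangled_set(arms):
    """Indices i with S(rho_i) < 0, i.e. the target set A_ent."""
    return {i for i, arm in enumerate(arms) if arm.true_S() < 0}

rng = random.Random(1)
arms = [QuantumArm([0.25, 0.25, 0.25, 0.25], rng),  # S = 0.25, criterion (7) holds
        QuantumArm([0.05, 0.05, 0.85, 0.05], rng)]  # S = 0.01 - 0.64 < 0
```

A learner only sees the outputs of `pull`; `true_S` and `entangled_set` play the role of ground truth against which a policy's output can be checked.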

Remark 1.

The $d$-dimensional discrete multi-armed quantum bandit model [45] is different from our formulation. The authors consider the arms to be a finite set of observables and the environment to be an unknown quantum state $\rho$. The objective is to learn the unknown quantum state $\rho$ through an exploration-exploitation tradeoff. Given sequential oracle access to copies of $\rho$, each round involves selecting an observable to maximize its expectation value (reward). The information from previous rounds (history) aids in refining the action choice, thereby minimizing the regret, which is the difference between the obtained and maximal rewards. The authors also exploit the inherent linear structure in measurement outcomes and map it to the linear bandit setting. Specifically, let $\{\sigma_{i}\}_{i=1}^{d^{2}}$ be a set of orthonormal Hermitian matrices.
The unknown environment is $\rho=\sum_{i=1}^{d^{2}}\mathrm{Tr}(\rho\sigma_{i})\sigma_{i}=\sum_{i=1}^{d^{2}}\theta_{i}\sigma_{i}$ and the arm is $\mathcal{O}_{t}=\sum_{i=1}^{d^{2}}\mathrm{Tr}(\mathcal{O}_{t}\sigma_{i})\sigma_{i}=\sum_{i=1}^{d^{2}}A_{t,i}\sigma_{i}$.
Then, $\mathrm{Tr}(\rho\mathcal{O}_{t})=\boldsymbol{\theta}^{\top}\mathbf{A}_{t}$, where $\boldsymbol{\theta}=(\theta_{1},\theta_{2},\ldots,\theta_{d^{2}})$ and $\mathbf{A}_{t}=(A_{t,1},A_{t,2},\ldots,A_{t,d^{2}})$. In round $t$, pulling arm $\mathcal{O}_{t}$ provides a reward $X_{t}=\boldsymbol{\theta}^{\top}\mathbf{A}_{t}+\eta_{t}$, where $\eta_{t}$ is 1-subgaussian.
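The linear structure in this remark can be checked by hand in the smallest case. The single-qubit instance below ($d=2$, Bloch vector $(r_{x},r_{y},r_{z})$, observable $Z$) is an illustrative choice and is not taken from [45]; the basis is the Pauli set rescaled so that it is orthonormal under the trace inner product.

```latex
\begin{aligned}
(\sigma_{1},\sigma_{2},\sigma_{3},\sigma_{4}) &= \tfrac{1}{\sqrt{2}}\,(\mathbf{1},\,X,\,Y,\,Z),
  \qquad \mathrm{Tr}(\sigma_{i}\sigma_{j})=\delta_{ij},\\
\rho &= \tfrac{1}{2}\left(\mathbf{1}+r_{x}X+r_{y}Y+r_{z}Z\right)
  \ \Rightarrow\ \boldsymbol{\theta}=\tfrac{1}{\sqrt{2}}\,(1,\,r_{x},\,r_{y},\,r_{z}),\\
\mathcal{O}_{t} &= Z
  \ \Rightarrow\ \mathbf{A}_{t}=(0,\,0,\,0,\,\sqrt{2}),\\
\boldsymbol{\theta}^{\top}\mathbf{A}_{t} &= \tfrac{r_{z}}{\sqrt{2}}\cdot\sqrt{2}=r_{z}=\mathrm{Tr}(\rho Z).
\end{aligned}
```

The expected reward is thus linear in the unknown vector $\boldsymbol{\theta}$, which is what allows the mapping to a linear bandit; in our setting, by contrast, the quantity of interest $S_{\mathcal{E}}(\rho)$ is quadratic in the outcome probabilities.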

To demonstrate the functionality of MAB policies, we identify suitable WBMs for families of parameterized two-qubit states denoted by $\mathcal{F}$.

III-A Two-qubit Depolarized Bell States

For $p\in\mathbb{R}$ with $-\frac{1}{3}\leq p\leq 1$, a two-qubit depolarized Bell state $\rho(p)$ is given by

$$\rho(p)=p\ket{\Upsilon}\bra{\Upsilon}+(1-p)\frac{\mathbf{1}}{4}. \tag{9}$$

Here, $\ket{\Upsilon}$ represents any one of the four Bell states $\ket{\Psi^{\pm}}=\left(\ket{01}\pm\ket{10}\right)/\sqrt{2}$, $\ket{\Phi^{\pm}}=\left(\ket{00}\pm\ket{11}\right)/\sqrt{2}$. When $\ket{\Upsilon}=\ket{\Psi^{-}}$, (9) is called a Werner state, and when $\ket{\Upsilon}=\ket{\Phi^{+}}$, it is called an isotropic state. The Peres-Horodecki criterion guarantees that $\rho(p)$ is separable when $-\frac{1}{3}\leq p\leq\frac{1}{3}$ and entangled when $\frac{1}{3}<p\leq 1$. Table III outlines the specific choices of WBM for the combination of the maximally mixed state with each of the four Bell states. When measured with the corresponding WBM, the entangled depolarized Bell states are conclusively detected by the value of $S=(p-1)^{2}/4-p^{2}=(1-3p)(1+p)/4$, which is non-negative for $-1/3\leq p\leq 1/3$ (vanishing only at $p=1/3$) and strictly negative for $1/3<p\leq 1$.
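The closed form for $S$ can be checked numerically from the definition (9). The sketch below does so for the $\ket{\Phi^{+}}$ and $\ket{\Psi^{+}}$ rows of Table III, using plain Python lists for state vectors. It assumes that $f_{1},f_{2}$ in (7) are the probabilities of the two product-state projectors and $f_{3},f_{4}$ those of the two Bell projectors, in the order listed in Table III; the helper names are hypothetical.

```python
import math

R = 1.0 / math.sqrt(2.0)
# Real amplitudes in the computational basis order |00>, |01>, |10>, |11>.
BELL = {"Phi+": [R, 0, 0, R], "Phi-": [R, 0, 0, -R],
        "Psi+": [0, R, R, 0], "Psi-": [0, R, -R, 0]}
# WBM basis vectors per depolarized state: (product, product, Bell, Bell).
WBM = {"Phi+": [[0, 1, 0, 0], [0, 0, 1, 0], BELL["Phi+"], BELL["Phi-"]],
       "Psi+": [[1, 0, 0, 0], [0, 0, 0, 1], BELL["Psi+"], BELL["Psi-"]]}

def outcome_probs(bell, p):
    """Tr(rho(p) E_j) for rho(p) = p |bell><bell| + (1 - p) 1/4, rank-one E_j."""
    psi = BELL[bell]
    return [p * sum(a * b for a, b in zip(e, psi)) ** 2 + (1.0 - p) / 4.0
            for e in WBM[bell]]

def S_exact(bell, p):
    """Quadratic criterion (7) evaluated at the exact outcome probabilities."""
    f1, f2, f3, f4 = outcome_probs(bell, p)
    return 4.0 * f1 * f2 - (f3 - f4) ** 2

def S_closed_form(p):
    """Closed form S = (p - 1)^2 / 4 - p^2 quoted in the text."""
    return (p - 1.0) ** 2 / 4.0 - p ** 2

grid = [-1.0 / 3.0 + k * (4.0 / 3.0) / 20 for k in range(21)]  # p in [-1/3, 1]
max_gap = max(abs(S_exact(b, p) - S_closed_form(p))
              for b in ("Phi+", "Psi+") for p in grid)
```

On this grid `max_gap` should be at floating-point level, and the sign of `S_exact` changes at $p=1/3$, matching the Peres-Horodecki boundary stated above.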

Table III: WBM for Depolarized Bell States

| Depolarized State | Pauli Basis | WBM |
|---|---|---|
| $p\ket{\Phi^{+}}\bra{\Phi^{+}} + (1-p)\mathbf{1}/4$ | $[\mathbf{1} + \alpha(XX - YY + ZZ)]/4$ | $\{\ket{01}\bra{01}, \ket{10}\bra{10}, \ket{\Phi^{+}}\bra{\Phi^{+}}, \ket{\Phi^{-}}\bra{\Phi^{-}}\}$ |
| $p\ket{\Psi^{+}}\bra{\Psi^{+}} + (1-p)\mathbf{1}/4$ | $[\mathbf{1} + \alpha(XX + YY - ZZ)]/4$ | $\{\ket{00}\bra{00}, \ket{11}\bra{11}, \ket{\Psi^{+}}\bra{\Psi^{+}}, \ket{\Psi^{-}}\bra{\Psi^{-}}\}$ |
| $p\ket{\Psi^{-}}\bra{\Psi^{-}} + (1-p)\mathbf{1}/4$ | $[\mathbf{1} + \alpha(-XX - YY - ZZ)]/4$ | $\{\ket{00}\bra{00}, \ket{11}\bra{11}, \ket{\Psi^{+}}\bra{\Psi^{+}}, \ket{\Psi^{-}}\bra{\Psi^{-}}\}$ |
| $p\ket{\Phi^{-}}\bra{\Phi^{-}} + (1-p)\mathbf{1}/4$ | $[\mathbf{1} + \alpha(-XX + YY + ZZ)]/4$ | $\{\ket{01}\bra{01}, \ket{10}\bra{10}, \ket{\Phi^{+}}\bra{\Phi^{+}}, \ket{\Phi^{-}}\bra{\Phi^{-}}\}$ |

III-B Two-qubit Bell diagonal States

Bell diagonal states are a probabilistic mixture of the four Bell states. These states are more general than the ones in (9). Given parameters $p_{1}$, $p_{2}$, $p_{3}$ and $p_{4}$ such that $p_{i} \geq 0$ and $\sum_{i} p_{i} = 1$, the Bell diagonal state is defined as

$$\rho_{\text{Bell}} = p_{1}\ket{\Phi^{+}}\bra{\Phi^{+}} + p_{2}\ket{\Psi^{+}}\bra{\Psi^{+}} + p_{3}\ket{\Psi^{-}}\bra{\Psi^{-}} + p_{4}\ket{\Phi^{-}}\bra{\Phi^{-}}. \tag{10}$$

The eigenvalues of $\rho_{\text{Bell}}^{\top_{2}}$ are calculated to be $\frac{1}{2} - p_{1}$, $\frac{1}{2} - p_{2}$, $\frac{1}{2} - p_{3}$ and $\frac{1}{2} - p_{4}$. Consequently, a Bell diagonal state is entangled if any one of these probabilities exceeds $1/2$, in which case the sum of the other three probabilities is less than $1/2$. Conversely, a Bell diagonal state is separable if all probabilities are less than or equal to $1/2$. Expressing (10) in the Pauli basis yields

$$\rho_{\text{Bell}} = \frac{1}{4}\left[\mathbf{1} + aXX + bYY + cZZ\right],$$

where $a = p_{1} + p_{2} - p_{3} - p_{4}$, $b = -p_{1} + p_{2} - p_{3} + p_{4}$ and $c = p_{1} - p_{2} - p_{3} + p_{4}$. When $\rho_{\text{Bell}}$ is entangled, the index for which $p_{i} > 1/2$ determines the signs of $a$, $b$ and $c$; see Table IV. It is notable that the signs of $a$, $b$ and $c$ follow a similar pattern to the Pauli basis expansion of the various Depolarized Bell states listed in Table III. We observe that, for suitable combinations of $a$, $b$ and $c \in \{+1, -1\}$, the Bell diagonal state reduces to one of the Depolarized Bell states, and that these states can be detected using the same WBMs as in Table III. Specifically, the value of $S$ under the two WBMs in Table IV is equal to $(1 - p_{1} - p_{4})^{2} - 4(p_{1} - p_{4})^{2}$ and $(1 - p_{2} - p_{3})^{2} - 4(p_{2} - p_{3})^{2}$, respectively. Depending on the probabilistic mixture, one of the two WBMs will conclusively result in $S < 0$.

Table IV: WBM for Bell Diagonal States

| Probabilistic mixture | $a$ | $b$ | $c$ | WBM |
|---|---|---|---|---|
| $p_{1} > 0.5,\ p_{2} + p_{3} + p_{4} < 0.5$ | $+$ | $-$ | $+$ | $\{\ket{01}\bra{01}, \ket{10}\bra{10}, \ket{\Phi^{+}}\bra{\Phi^{+}}, \ket{\Phi^{-}}\bra{\Phi^{-}}\}$ |
| $p_{2} > 0.5,\ p_{1} + p_{3} + p_{4} < 0.5$ | $+$ | $+$ | $-$ | $\{\ket{00}\bra{00}, \ket{11}\bra{11}, \ket{\Psi^{+}}\bra{\Psi^{+}}, \ket{\Psi^{-}}\bra{\Psi^{-}}\}$ |
| $p_{3} > 0.5,\ p_{1} + p_{2} + p_{4} < 0.5$ | $-$ | $-$ | $-$ | $\{\ket{00}\bra{00}, \ket{11}\bra{11}, \ket{\Psi^{+}}\bra{\Psi^{+}}, \ket{\Psi^{-}}\bra{\Psi^{-}}\}$ |
| $p_{4} > 0.5,\ p_{1} + p_{2} + p_{3} < 0.5$ | $-$ | $+$ | $+$ | $\{\ket{01}\bra{01}, \ket{10}\bra{10}, \ket{\Phi^{+}}\bra{\Phi^{+}}, \ket{\Phi^{-}}\bra{\Phi^{-}}\}$ |
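The relation between the mixture weights and the Pauli coefficients can be explored with a short script. This is only a sketch, assuming NumPy; the function names `pauli_coefficients`, `bell_diagonal` and `is_entangled` are hypothetical. It uses the ordering $(\Phi^{+}, \Psi^{+}, \Psi^{-}, \Phi^{-})$ of (10), assembles the state from its Pauli expansion, and applies the partial-transpose eigenvalues $\frac{1}{2} - p_{i}$ stated above.

```python
import numpy as np

def pauli_coefficients(p1, p2, p3, p4):
    """Coefficients (a, b, c) of XX, YY, ZZ for the weights of Eq. (10)."""
    a = p1 + p2 - p3 - p4
    b = -p1 + p2 - p3 + p4
    c = p1 - p2 - p3 + p4
    return a, b, c

def bell_diagonal(p1, p2, p3, p4):
    """Assemble rho_Bell = (1 + a XX + b YY + c ZZ) / 4."""
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]])
    Z = np.array([[1, 0], [0, -1]], dtype=complex)
    a, b, c = pauli_coefficients(p1, p2, p3, p4)
    return (np.eye(4) + a * np.kron(X, X) + b * np.kron(Y, Y) + c * np.kron(Z, Z)) / 4

def is_entangled(p1, p2, p3, p4):
    """PPT test: the partial-transpose eigenvalues are 1/2 - p_i."""
    return min(0.5 - p for p in (p1, p2, p3, p4)) < 0

weights = (0.6, 0.2, 0.1, 0.1)
print(np.sign(pauli_coefficients(*weights)), is_entangled(*weights))
```

Assembling the state from $(a, b, c)$ rather than from the Bell projectors gives an independent check: the eigenvalues of the resulting matrix should reproduce the weights $p_{i}$.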

III-C Two-qubit Amplitude Damping on Depolarized Bell States

A qubit amplitude damping channel is a source of noise in superconducting circuit-based quantum computing and thus serves as a realistic channel model for simulating lossy processes in these systems. Mathematically, it can be obtained from an isometry $J$,

$$J: \mathcal{H}_{a} \mapsto \mathcal{H}_{b} \otimes \mathcal{H}_{c}; \quad J^{\dagger}J = \mathbf{1}_{a}, \tag{11}$$

where $\mathcal{H}_{a}$ denotes the Hilbert space for the channel's input, and $\mathcal{H}_{b}$ and $\mathcal{H}_{c}$ represent the Hilbert spaces for the direct and complementary channel outputs, respectively. An isometry of the form

$$\begin{aligned}
J_{1}\ket{0}_{a} &= \ket{0}_{b}\ket{1}_{c}, \\
J_{1}\ket{1}_{a} &= \sqrt{1-r}\,\ket{1}_{b}\ket{1}_{c} + \sqrt{r}\,\ket{0}_{b}\ket{0}_{c},
\end{aligned} \tag{12}$$

where $0 \leq r \leq 1$, defines a pair of channels, $\mathcal{B}(A) = \operatorname{Tr}_{c}(JAJ^{\dagger})$ and $\mathcal{C}(A) = \operatorname{Tr}_{b}(JAJ^{\dagger})$. Here, $\mathcal{B}$ is an amplitude damping channel with damping probability $r$ for the state $\ket{1}_{a}$ to decay to the output state $\ket{0}_{b}$. The isometry can be written as $J_{1} = K_{0} \otimes \ket{0} + K_{1} \otimes \ket{1}$, where $K_{0}$ and $K_{1}$ are the (Kraus) damping operators $K_{0} = \begin{pmatrix} 0 & \sqrt{r} \\ 0 & 0 \end{pmatrix}$ and $K_{1} = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1-r} \end{pmatrix}$. For a single qubit represented by state $\rho$, the amplitude damped output is given by

$$\mathcal{B}(\rho) = K_{0}\rho K_{0}^{\dagger} + K_{1}\rho K_{1}^{\dagger}. \tag{13}$$

We can extend (13) to two-qubit states with damping probabilities $r$ and $q$ for the first and second qubit, respectively. Assuming that $r = q$, we consider Depolarized Bell states (9) with amplitude damping.
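The two-qubit extension of (13) amounts to applying the tensor products of the single-qubit Kraus operators. The sketch below, assuming NumPy, illustrates this for an Isotropic state; the function names `damping_kraus` and `damp_two_qubits` and the parameter values $p = 0.8$, $r = 0.3$ are hypothetical choices made for illustration.

```python
import numpy as np

def damping_kraus(r):
    """Kraus operators K0, K1 of the single-qubit amplitude damping channel."""
    K0 = np.array([[0.0, np.sqrt(r)], [0.0, 0.0]])
    K1 = np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - r)]])
    return K0, K1

def damp_two_qubits(rho, r, q=None):
    """Apply damping with probability r to qubit 1 and q to qubit 2 (q = r by default)."""
    q = r if q is None else q
    out = np.zeros_like(rho, dtype=complex)
    for A in damping_kraus(r):
        for B in damping_kraus(q):
            K = np.kron(A, B)
            out += K @ rho @ K.conj().T
    return out

phi_plus = np.array([1, 0, 0, 1]) / np.sqrt(2)
rho = 0.8 * np.outer(phi_plus, phi_plus) + 0.2 * np.eye(4) / 4
damped = damp_two_qubits(rho, r=0.3)
print(np.trace(damped).real)   # the channel is trace preserving
print(np.diag(damped).real)    # populations on |00>, |01>, |10>, |11>
```

Summing over all four products $K_{i} \otimes K_{j}$ keeps the map trace preserving, since $K_{0}^{\dagger}K_{0} + K_{1}^{\dagger}K_{1} = \mathbf{1}$ for each qubit.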

Proposition 5.

For any damping probability $r > 0$, a Depolarized Bell state with amplitude damping cannot be expressed as a Bell diagonal state (10).

This fact can be readily demonstrated through a straightforward calculation. Consider the Isotropic state $\rho(p) = p\ket{\Phi^{+}}\bra{\Phi^{+}} + (1-p)\frac{\mathbf{1}}{4}$, which can be represented by the Bell diagonal state formed with probability distribution $(p_{1}, p_{2}, p_{3}, p_{4}) = \left((3p+1)/4, (1-p)/4, (1-p)/4, (1-p)/4\right)$. In a Bell diagonal state, the diagonal elements corresponding to $\ket{00}\bra{00}$ and $\ket{11}\bra{11}$ are identical. In the case of an amplitude damped Isotropic state, we observe that

$$p_{2} = p_{3} = \frac{1-r}{4}\left(1 + r - p + pr\right),$$

which reduces to $(1-p)/4$ at $r = 0$. However, obtaining closed-form expressions for $p_{1}$ and $p_{4}$ when $r > 0$ is cumbersome. Specifically, the values on the diagonal corresponding to $\ket{00}\bra{00}$ and $\ket{11}\bra{11}$ are given by $\frac{p}{2}(r^{2}+1) - \frac{p-1}{4}(r+1)^{2}$ and $\frac{p}{2}(r-1)^{2} - \frac{p-1}{4}(r-1)^{2}$, respectively. These expressions are equal only when $r = 0$.
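Subtracting the two diagonal entries quoted above makes the last claim explicit. The short calculation below uses only those two expressions together with the identities $(r^{2}+1) - (r-1)^{2} = 2r$ and $(r+1)^{2} - (r-1)^{2} = 4r$:

```latex
\left[\frac{p}{2}(r^{2}+1) - \frac{p-1}{4}(r+1)^{2}\right]
- \left[\frac{p}{2}(r-1)^{2} - \frac{p-1}{4}(r-1)^{2}\right]
= \frac{p}{2}\,(2r) - \frac{p-1}{4}\,(4r)
= pr - (p-1)\,r
= r .
```

The two entries therefore differ by exactly $r$, independently of $p$, and coincide only when $r = 0$.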

Proposition 6.

For every $p \in [\frac{1}{3}, 1]$, there exists a range $\tilde{r} \subset [0,1]$ of damping probabilities for which an amplitude damped Depolarized Bell state becomes separable.

The PPT criterion asserts that a two-qubit state is entangled if and only if its partial transpose has at least one negative eigenvalue. For Bell states that are both amplitude damped and depolarized, we evaluate the eigenvalues of the partial transpose and observe that one of them can be either positive or negative, depending on the value of $r$. Detailed findings are presented in Table V and depicted graphically in Fig. 1(a) and Fig. 1(b). Furthermore, the WBM for amplitude damped Depolarized Bell states coincides with that of the Depolarized Bell states, as outlined in Table III.

Table V: The four eigenvalues of the partial transpose of amplitude damped Depolarized Bell states

| State with $\ket{\Phi^{\pm}}$ | State with $\ket{\Psi^{\pm}}$ | Sign of eigenvalue |
|---|---|---|
| $\frac{(p+1)(1-r^{2})}{4}$ | $\frac{(1-r)(1+r+p-pr)}{4}$ | Always positive |
| $\frac{(p+1)(1-r)^{2}}{4}$ | $\frac{(1-r)(1+r+p-pr)}{4}$ | Always positive |
| $\frac{p(r-1)^{2}+(r+1)^{2}}{4}$ | $\frac{r^{2}+1-p(1-r)^{2}+2\sqrt{p^{2}(1-r)^{2}+r^{2}}}{4}$ | Always positive |
| $\frac{-r^{2}(p+1)+4pr+(1-3p)}{4}$ | $\frac{r^{2}+1-p(1-r)^{2}-2\sqrt{p^{2}(1-r)^{2}+r^{2}}}{4}$ | Positive or negative |
Figure 1: A phase diagram representing the region of damping and depolarizing parameters, $r$ and $p$, respectively, where the damped-depolarized Bell state has negative or positive partial transpose. (a) Range of $r$ for the eigenvalue corresponding to $\ket{\Phi^{\pm}}$. (b) Range of $r$ for the eigenvalue corresponding to $\ket{\Psi^{\pm}}$.
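The boundary between the two regions can also be located numerically, without relying on closed-form eigenvalues. The following sketch, assuming NumPy, does this for the amplitude damped Isotropic state with equal damping on both qubits. The function names are hypothetical, and the bisection additionally assumes that, for fixed $p$, the state has a negative partial transpose only on an interval $[0, r^{\star})$ of damping probabilities.

```python
import numpy as np

def damped_isotropic(p, r):
    """Isotropic state with amplitude damping (probability r) on both qubits."""
    phi = np.array([1, 0, 0, 1]) / np.sqrt(2)
    rho = p * np.outer(phi, phi) + (1 - p) * np.eye(4) / 4
    K0 = np.array([[0, np.sqrt(r)], [0, 0]])
    K1 = np.array([[1, 0], [0, np.sqrt(1 - r)]])
    # The Kraus operators are real, so the adjoint is the plain transpose.
    return sum(np.kron(A, B) @ rho @ np.kron(A, B).T
               for A in (K0, K1) for B in (K0, K1))

def min_pt_eigenvalue(rho):
    """Smallest eigenvalue of the partial transpose over the second qubit."""
    pt = rho.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)
    return np.linalg.eigvalsh(pt).min()

def separability_threshold(p, tol=1e-10):
    """Bisection for the smallest r at which the partial transpose turns positive."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if min_pt_eigenvalue(damped_isotropic(p, mid)) < -1e-12:
            lo = mid   # still NPT, hence entangled
        else:
            hi = mid   # PPT, hence separable for two qubits
    return hi

for p in (0.4, 0.6, 0.8):
    print(p, separability_threshold(p))
```

The small negative tolerance in the comparison guards against rounding noise in eigenvalues that are exactly zero.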

IV Stochastic MAB policies for Entanglement Detection

In this section, we discuss stochastic MAB-based algorithms for entanglement detection in parameterized states within $\mathcal{F}$, as outlined in Section III. We will use stochastic MAB terminology in alignment with its quantum counterparts, as shown in Table II. We consider a set of $K$ unknown arms, denoted by $\mathcal{A} = \{\rho_{1}, \rho_{2}, \ldots, \rho_{K}\} \subset \mathcal{F}$. To perform measurements on the arms, the learner requires knowledge of the underlying WBM. Therefore, we assume that the specific form of the arms in $\mathcal{A}$ is known, so that they are detectable under the WBMs $\mathcal{E}_{1} = \{\ket{00}\bra{00}, \ket{11}\bra{11}, \ket{\Psi^{+}}\bra{\Psi^{+}}, \ket{\Psi^{-}}\bra{\Psi^{-}}\}$ or $\mathcal{E}_{2} = \{\ket{01}\bra{01}, \ket{10}\bra{10}, \ket{\Phi^{+}}\bra{\Phi^{+}}, \ket{\Phi^{-}}\bra{\Phi^{-}}\}$. Here, $\mathcal{E}_{1}$ and $\mathcal{E}_{2}$ correspond to the WBMs of the first two witnesses in Table I, respectively. For example, consider $\rho_{i} = p_{i}\ket{\Phi^{+}}\bra{\Phi^{+}} + (1-p_{i})\frac{\mathbf{1}}{4}$ for all $i \in [K]$, where $p_{i}$ is unknown. These are Isotropic states, which are probabilistic mixtures of the maximally mixed state and the Bell state $\ket{\Phi^{+}}$, and can be detected using WBM $\mathcal{E}_{2}$. With this assumption, we describe the template for the MAB problem as follows. In each round $t \in \mathbb{N}$,

  • The learner selects an arm $i \in [K]$.

  • The learner performs a measurement $\mathcal{E}$ and obtains outcome $j$ with probability $\operatorname{Tr}(\rho_{i}E_{j})$, where $j \in \{1, 2, 3, 4\}$.

  • The learner updates the values of $\boldsymbol{\hat{S}}_{\mathcal{E}}$ and identifies the entangled arm(s) or continues.
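The measurement step of this template can be simulated classically. The sketch below, assuming NumPy, samples a single outcome from the WBM $\mathcal{E}_{2}$ according to the Born rule; the function names `wbm_E2`, `outcome_probabilities` and `pull_arm` are hypothetical, and the update of $\boldsymbol{\hat{S}}_{\mathcal{E}}$ from the outcomes is deliberately left out because it depends on the definition of the witness given earlier in the paper.

```python
import numpy as np

def wbm_E2():
    """Projectors of E_2 = {|01><01|, |10><10|, |Phi+><Phi+|, |Phi-><Phi-|}."""
    e01 = np.array([0.0, 1.0, 0.0, 0.0])
    e10 = np.array([0.0, 0.0, 1.0, 0.0])
    phi_p = np.array([1, 0, 0, 1]) / np.sqrt(2)
    phi_m = np.array([1, 0, 0, -1]) / np.sqrt(2)
    return [np.outer(v, v) for v in (e01, e10, phi_p, phi_m)]

def outcome_probabilities(rho, projectors):
    """Born rule: Pr(j) = Tr(rho E_j), renormalized against rounding error."""
    probs = np.array([np.trace(rho @ E).real for E in projectors])
    return probs / probs.sum()

def pull_arm(rho, projectors, rng):
    """One round: measure the selected arm once and return outcome j in {1, 2, 3, 4}."""
    return int(rng.choice(4, p=outcome_probabilities(rho, projectors))) + 1

rng = np.random.default_rng(0)
phi_plus = np.array([1, 0, 0, 1]) / np.sqrt(2)
arm = 0.5 * np.outer(phi_plus, phi_plus) + 0.5 * np.eye(4) / 4  # Isotropic arm, p = 0.5
outcomes = [pull_arm(arm, wbm_E2(), rng) for _ in range(1000)]
print(np.bincount(outcomes, minlength=5)[1:] / 1000)  # empirical outcome frequencies
```

Each call to `pull_arm` consumes one copy of the state, so the number of calls is the sample complexity that the bandit policies aim to keep small.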

For a given WBM $\mathcal{E}$, the values of $S_{\mathcal{E}}$ are bounded in $[-1, 1]$. We can therefore use concentration inequalities applicable to 1-subgaussian random variables. We apply the law of the iterated logarithm [42] for a finite sum of 1-subgaussian random variables:

Lemma 7.

Let $X_{1}, X_{2}, \ldots, X_{t}$ be i.i.d. centered sub-Gaussian random variables with scale parameter $\sigma = 1$. For any $\varepsilon \in (0,1)$ and $\delta \in \left(0, \frac{\log(1+\varepsilon)}{e}\right)$, one has with probability at least $1 - c_{\varepsilon}\delta^{(1+\varepsilon)}$, for all $t \geq 1$,

$$\frac{1}{t}\sum_{s=1}^{t} X_{s} \leq U(t, \delta), \tag{14}$$

where $U(t, \delta) = (1+\sqrt{\varepsilon})\sqrt{\frac{2(1+\varepsilon)}{t}\log\left(\frac{\log\left((1+\varepsilon)t\right)}{\delta}\right)}$ is the confidence width and $c_{\varepsilon} = \frac{2+\varepsilon}{\varepsilon}\left(\frac{1}{\log(1+\varepsilon)}\right)^{1+\varepsilon}$.

Proof.

See [42, Lemma 1]. ∎
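To get a feel for the quantities in Lemma 7, the confidence width $U(t,\delta)$ and the constant $c_\varepsilon$ can be evaluated directly. The snippet below is a minimal sketch; the function names `c_eps` and `lil_width` and the sample values of $\varepsilon$ and $\delta$ are illustrative choices, not taken from the paper.

```python
import math


def c_eps(eps):
    """c_eps = ((2 + eps) / eps) * (1 / log(1 + eps)) ** (1 + eps)."""
    return (2 + eps) / eps * (1 / math.log(1 + eps)) ** (1 + eps)


def lil_width(t, delta, eps):
    """Confidence width U(t, delta) of Lemma 7 for scale parameter sigma = 1."""
    if not 0 < eps < 1:
        raise ValueError("eps must lie in (0, 1)")
    if not 0 < delta < math.log(1 + eps) / math.e:
        raise ValueError("delta must lie in (0, log(1 + eps) / e)")
    if t < 1:
        raise ValueError("t must be at least 1")
    # Under the conditions above, log((1 + eps) t) / delta > e, so the outer log is > 1.
    inner = math.log(math.log((1 + eps) * t) / delta)
    return (1 + math.sqrt(eps)) * math.sqrt(2 * (1 + eps) / t * inner)


eps, delta = 0.5, 0.05  # illustrative values satisfying the lemma's conditions
for t in (10, 100, 1000, 10000):
    print(t, round(lil_width(t, delta, eps), 4))
print("failure probability bound:", c_eps(eps) * delta ** (1 + eps))
```

The range check on $\delta$ mirrors the condition $\delta < \log(1+\varepsilon)/e$ in the lemma, which also guarantees that the argument of the outer logarithm exceeds $e$ for every $t \geq 1$.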

In the subsequent sections, we discuss two MAB policies: successive elimination, for scenarios where exactly one of the $K$ arms is promised to be entangled, and the HDoC policy, for cases where $m$ of the $K$ arms are entangled, with $m$ being unknown.

IV-A Modified Successive Elimination Algorithm

We consider the $(1,K)$-quantum MAB problem and characterise the expected stopping time for a modified version of the Successive Elimination algorithm [37], outlined as Algorithm 1. We are presented with $K$ arms such that $S_{\mathcal{E}}(\rho_1) \geq S_{\mathcal{E}}(\rho_2) \geq S_{\mathcal{E}}(\rho_3) \ldots > S_{\mathcal{E}}(\rho_{K-1}) > 0 > S_{\mathcal{E}}(\rho_K)$. The algorithm takes the set of arms $[K]$, the threshold value $0$, and the error probability $\delta$ as input and outputs the arm $i^\star = \arg\min_{i \in [K]} S_{\mathcal{E}}(\rho_i)$.
Let $N_i(t)$ denote the number of times arm $i$ has been sampled in $t$ rounds, and let $\hat{S}_{i,N_i(t)}$ denote the estimate of $S_{\mathcal{E}}(\rho_i)$ obtained from the pulls of arm $i$ up to time $t$. The algorithm maintains an active set $\Omega$ and samples every arm in it. Subsequently, the estimates and the Lower Confidence Bounds (LCBs) of the active arms are updated. To identify $i^\star$, the policy eliminates arms whose LCB exceeds the threshold and halts when only one arm remains in the active set.

Algorithm 1 Modified Successive Elimination Algorithm
Input: threshold $\zeta = 0$, acceptance error rate $\delta$, arms $\mathcal{A} \leftarrow [K]$
Output: $\Omega$
  Active set $\Omega \leftarrow [K]$
  $\hat{S}_{i,N_i(t)} = 0, \ \forall i \in \Omega$
  for $t = 1, 2, 3, \ldots$ do
     Sample every arm $i \in \Omega$
     Update confidence width $U\left(N_i(t), \frac{\delta}{c_\varepsilon K}\right) \leftarrow (1+\sqrt{\varepsilon})\sqrt{\frac{2(1+\varepsilon)}{N_i(t)}\log\left(\frac{c_\varepsilon K \log\left((1+\varepsilon)N_i(t)\right)}{\delta}\right)}$
     Update $\hat{S}_{i,N_i(t)}$, $\text{LCB}_i(t) \leftarrow \hat{S}_{i,N_i(t)} - U\left(N_i(t), \frac{\delta}{c_\varepsilon K}\right)$
     if $\text{LCB}_i(t) > 0$, $i \in \Omega$ then
        $\Omega \leftarrow \Omega \setminus \{i\}$
     end if
     if $|\Omega| = 1$ then
        Return $\Omega$
     end if
  end for
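The elimination loop of Algorithm 1 can be simulated classically, as in the following minimal sketch. It assumes, purely for illustration, that each pull of arm $i$ yields a $\pm 1$ reward with mean $S_{\mathcal{E}}(\rho_i)$ (one possible 1-sub-Gaussian reward model, not necessarily the estimator used in the paper). The function names, the default $\varepsilon$, and the `max_rounds` guard are likewise assumptions of the sketch.

```python
import math
import random


def width(n, delta, K, eps):
    """U(n, delta / (c_eps K)) as written in Algorithm 1."""
    c = (2 + eps) / eps * (1 / math.log(1 + eps)) ** (1 + eps)
    arg = c * K * math.log((1 + eps) * n) / delta
    return (1 + math.sqrt(eps)) * math.sqrt(2 * (1 + eps) / n * math.log(arg))


def successive_elimination(means, delta, eps=0.5, rng=None, max_rounds=10**6):
    """Return (remaining active set, rounds used) for simulated arms.

    means[i] plays the role of S_E(rho_i); rewards are +1/-1 with that mean
    (an assumption of this sketch).
    """
    rng = rng or random.Random()
    K = len(means)
    active = set(range(K))
    sums = [0.0] * K
    for n in range(1, max_rounds + 1):
        # Sample every active arm once; all active arms then have n samples.
        for i in active:
            sums[i] += 1.0 if rng.random() < (1 + means[i]) / 2 else -1.0
        w = width(n, delta, K, eps)
        # Eliminate arms whose lower confidence bound exceeds the threshold 0.
        active -= {i for i in active if sums[i] / n - w > 0}
        if len(active) <= 1:  # "<=" guards against an empty set (sketch-only safeguard)
            return active, n
    raise RuntimeError("no decision within max_rounds")


remaining, rounds = successive_elimination(
    [0.6, 0.5, 0.7, -0.5], delta=0.1, rng=random.Random(0))
print(remaining, rounds)
```

In this simulation the three arms with positive means are eliminated once their lower confidence bounds rise above zero, leaving the arm with the negative mean as the candidate entangled arm.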
Lemma 8.

Algorithm 1 is $\delta$-PC.

Proof.

The proof is presented in Appendix VII-A1. ∎

The correctness of Algorithm 1 and the sample complexity of identifying the entangled arm are presented below.

Theorem 9.

With probability at least $1-\delta$, the arm $i^\star = K = \arg\min_{i \in [K]} S_{\mathcal{E}}(\rho_i)$ remains in the active set $\Omega$ until termination.

Proof.

The proof is presented in Appendix VII-A2. ∎

Theorem 10.

Algorithm 1 successfully identifies the arm $i^\star$ with probability at least $1-\delta$ and terminates after $\sum_{i \in [K]} \mathcal{O}\left(\Delta_i^{-2}\log\left(\frac{K\log\Delta_i^{-2}}{\delta}\right)\right)$ samples, where $\Delta_i = |S_{\mathcal{E}}(\rho_i) - \zeta|$ is the sub-optimality gap with respect to the threshold $\zeta$.

Proof.

The proof is presented in Appendix VII-A3. ∎

We see that the sample complexity achieved in Theorem 10 is within a $\log(K)$ factor of the optimum proven in [42, Theorem 1]. Thus, given a $(1,K)$-quantum MAB framework prescribed by $(\mathcal{A}, \mathcal{E})$, we use the recipe provided in Algorithm 1 to identify the entangled arm.

IV-B Modified lil’HDoC Algorithm

The lil’HDoC algorithm, introduced in [44], is a variant of the HDoC algorithm proposed in [27]. It combines the sampling rule of the UCB algorithm for regret minimization, detailed in [25], with an identification rule based on the confidence bound in Lemma 7. In contrast to the LCB-based identification rule used in the HDoC algorithm [38], the LIL-based concentration bound in lil’HDoC improves the sample complexity. This improvement stems from the observation that the LIL bound $\sqrt{\frac{\log\log t}{t}}$ decays faster than the LCB bound $\sqrt{\frac{\log t}{t}}$, which reduces the required number of samples. In other words, for any $c_1, c_2 \in \mathbb{R}^+$, there exists a value $T$ such that for all $t > T$,

$$c_1\sqrt{\frac{\log t}{t}} > c_2\sqrt{\frac{\log\log t}{t}}.$$

Consequently, by ensuring that each arm is initially sampled at least $T$ times, lil’HDoC not only accelerates the pace at which its confidence bound tightens but also attains adequate confidence in identifying the good arms. The confidence bound of HDoC is $\alpha(t) = \sqrt{\frac{\ln\left(\frac{4Kt^2}{\delta}\right)}{2t}}$. Through straightforward calculations, we see that the smallest integer $T$ such that the confidence bound of lil’HDoC, $U\left(T, \frac{\delta}{c_\varepsilon K}\right)$, tightens faster than $\alpha(T)$ satisfies

$$T \geq \frac{1}{4}\log(K+1)\log\left(\max\left(\frac{1}{\delta}, 2\right)\right)c_\varepsilon^{3/2}. \tag{15}$$

Thus, if each arm is initially sampled $T$ times, lil’HDoC achieves identification capabilities comparable to those of HDoC, at a cost of $\mathcal{O}\left(\log(K+1)\log\left(\max\left(\frac{1}{\delta}, 2\right)\right)\right)$ samples on each arm. We now map the lil’HDoC algorithm, outlined as Algorithm 2, onto the $(m,K)$-quantum MAB problem and characterise the expected stopping time.
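As a rough numerical illustration of (15), the number of initial pulls per arm can be computed from $K$, $\delta$, and $\varepsilon$. The helper name `initial_pulls` and the parameter values below are illustrative assumptions; the formula itself is the right-hand side of (15), rounded up to an integer.

```python
import math


def c_eps(eps):
    """c_eps = ((2 + eps) / eps) * (1 / log(1 + eps)) ** (1 + eps), as in Lemma 7."""
    return (2 + eps) / eps * (1 / math.log(1 + eps)) ** (1 + eps)


def initial_pulls(K, delta, eps):
    """Smallest integer T satisfying the inequality (15)."""
    bound = (0.25 * math.log(K + 1)
             * math.log(max(1 / delta, 2))
             * c_eps(eps) ** 1.5)
    return max(1, math.ceil(bound))


for K in (5, 10, 50):
    for delta in (0.5, 0.1, 0.01):
        print(K, delta, initial_pulls(K, delta, eps=0.5))
```

Because of the $\max(1/\delta, 2)$ term, the value of $T$ stops changing once $\delta \geq 1/2$; it grows logarithmically in both $K$ and $1/\delta$ otherwise.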

Consider $K$ arms such that $S_{\mathcal{E}}(\rho_1) \geq S_{\mathcal{E}}(\rho_2) \ldots > S_{\mathcal{E}}(\rho_{K-m}) > 0 > S_{\mathcal{E}}(\rho_{K-m+1}) \ldots > S_{\mathcal{E}}(\rho_K)$, with $m$ being unknown. The algorithm takes the set of arms $[K]$, the threshold value $0$, and the error probability $\delta$ as input and outputs the set of arms $\Omega = \{i \in [K] \ \text{such that} \ S_{\mathcal{E}}(\rho_i) < 0\}$. First, every arm is sampled at least $T$ times, with $T$ as in (15). While the arm set $\mathcal{A} \neq \emptyset$, the algorithm keeps track of the active arms and applies the sampling rule and identification rule explained earlier.

Algorithm 2 lil’HDoC
Input: threshold $\zeta = 0$, acceptance error rate $\delta$, arms $\mathcal{A} \leftarrow [K]$
Output: $\mathcal{A}_{\text{ent}}$
  $t \leftarrow 0$, $\mathcal{A}_{\text{ent}} \leftarrow \emptyset$
  for each arm $i \in \mathcal{A}$ do
     Pull arm $i$ $T$ times
     $N_i(t) \leftarrow T$
  end for
  while $\mathcal{A} \neq \emptyset$ do
     Pull arm $h_t = \arg\max_{i \in \mathcal{A}} \ \hat{S}_{i,N_i(t)} + \sqrt{\frac{\log t}{2N_i(t)}}$
     if $\hat{S}_{h_t,N_{h_t}(t)} - U\left(N_{h_t}(t), \frac{\delta}{c_\varepsilon K}\right) \geq \zeta$ then
        Remove $h_t$ from $\mathcal{A}$
     else if $\hat{S}_{h_t,N_{h_t}(t)} + U\left(N_{h_t}(t), \frac{\delta}{c_\varepsilon K}\right) < \zeta$ then
        Add $h_t$ to $\mathcal{A}_{\text{ent}}$
        Remove $h_t$ from $\mathcal{A}$
     end if
  end while
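A classical simulation of Algorithm 2 might look as follows. As before, this is a sketch under assumptions: pulls are modelled as $\pm 1$ rewards with mean $S_{\mathcal{E}}(\rho_i)$, the round counter $t$ is taken to be the total number of pulls so far, and the names `lil_hdoc`, `lil_width`, and the `max_pulls` guard are hypothetical. The initial number of pulls $T$ is left as a parameter.

```python
import math
import random


def lil_width(n, delta, K, eps):
    """U(n, delta / (c_eps K)) with the constants of Lemma 7."""
    c = (2 + eps) / eps * (1 / math.log(1 + eps)) ** (1 + eps)
    arg = c * K * math.log((1 + eps) * n) / delta
    return (1 + math.sqrt(eps)) * math.sqrt(2 * (1 + eps) / n * math.log(arg))


def lil_hdoc(means, delta, T=1, eps=0.5, zeta=0.0, rng=None, max_pulls=10**7):
    """Return (set of arms declared entangled, total number of pulls)."""
    rng = rng or random.Random()
    K = len(means)

    def pull(i):
        # +1/-1 reward with mean means[i] (reward model assumed for this sketch).
        return 1.0 if rng.random() < (1 + means[i]) / 2 else -1.0

    # Initial phase: every arm is pulled T times.
    counts = [T] * K
    sums = [sum(pull(i) for _ in range(T)) for i in range(K)]
    t = K * T
    active, entangled = set(range(K)), set()

    while active and t < max_pulls:
        # Sampling rule: UCB index over the arms that are still active.
        h = max(active, key=lambda i: sums[i] / counts[i]
                + math.sqrt(math.log(t) / (2 * counts[i])))
        sums[h] += pull(h)
        counts[h] += 1
        t += 1
        # Identification rule: compare the LIL confidence interval with zeta.
        est, w = sums[h] / counts[h], lil_width(counts[h], delta, K, eps)
        if est - w >= zeta:
            active.discard(h)      # confidently above the threshold
        elif est + w < zeta:
            entangled.add(h)       # confidently below the threshold
            active.discard(h)
    return entangled, t


found, pulls = lil_hdoc([0.6, 0.7, -0.6, -0.7], delta=0.1, rng=random.Random(1))
print(found, pulls)
```

Note that the number of entangled arms is never passed to `lil_hdoc`; the loop simply runs until every arm has been classified on one side of the threshold.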

To demonstrate the correctness of Algorithm 2, we first show that the algorithm is $(\lambda,\delta)$-PAC for all $\lambda \in [K]$ and then characterise the sample complexity of identifying the $m$ bad arms (entangled states).

Lemma 11.

Algorithm 2 is $\delta$-PAC.

Proof.

The proof is presented in Appendix VII-B1. ∎

Theorem 12.

With probability at least $1-\delta$, the algorithm identifies all the arms in $\mathcal{A}_{\text{ent}}$.

Proof.

The proof is presented in Appendix VII-B2. ∎

With $T = 1$ in (15), it can be seen from Theorem 10 that the number of samples required to identify an entangled arm $i \in [K]$ is $\mathcal{O}\left(\Delta_i^{-2}\log\left(\frac{K\log\Delta_i^{-2}}{\delta}\right)\right)$. In practice, however, $T$ is chosen to be larger than 1, and the total sample complexity is expressed in terms of $\Delta = \min_{i \in [K]} \Delta_i$.

Theorem 13.

With probability $1-\delta$ and $T$ as given in (15), the total sample complexity of Algorithm 2 is $\mathcal{O}\left(\Delta^{-2}\left(K\log\frac{1}{\delta} + K\log K + K\log\log\frac{1}{\Delta}\right)\right) + \mathcal{O}\left(K\log(K+1)\log\left(\max\left(\frac{1}{\delta}, e\right)\right)\right)$.

Proof.

The first term in the sample complexity is derived in Appendix VII-A3 and the second term follows from (15). ∎

V Workflow for Entanglement Detection

In this section, we present a workflow for entanglement detection in scenarios where the arms in $\mathcal{A}$ are detectable under distinct WBMs. In this routine, we use the stochastic MAB policies discussed in the previous section. Specifically, we relax the assumption that the learner knows the specific WBM in advance, and instead adapt the WBMs sequentially through suitable unitary transformations. We evaluate the performance of this methodology on depolarized Bell states and arbitrary quantum states. In particular, we select $K$ states, $m$ of which are entangled, and investigate the numerical results of the $(m,K)$-quantum MAB problems.

V-A Entanglement Detection in Depolarised Bell states

We present numerical results on the sample complexity of entanglement detection for the $(m,K)$-quantum MAB problem, specifically addressing depolarized Bell states. These states are known to be detectable under the witnesses $\mathcal{E}_1$ and $\mathcal{E}_2$ outlined in Table I. The procedure for entanglement detection is detailed in Algorithm 3. The algorithm operates with an input threshold of $\zeta = 0$, an accepted error rate $\delta$, a set of $K$ depolarized Bell states, of which $m$ are entangled, and two WBMs. The sequence of WBMs in Algorithm 3 follows the order $\mathcal{E}_1$ and then $\mathcal{E}_2$. It is important to note that the sequence in which the WBMs are selected is arbitrary, as the algorithm does not involve state estimation during the process. We note that for the $(1,K)$-quantum MAB problem, there is a promise that one arm is entangled, so the value $m = 1$ is known to the policy. Let us consider the following two experiments for $K = 5$ arms.

  • In the first experiment, we generate five isotropic states (defined below (9)) such that exactly one of them is entangled and can be detected under WBM $\mathcal{E}_1$. As described earlier, we randomly generate the values of $p$ and, under WBM $\mathcal{E}_1$, compute $\boldsymbol{S}_{\mathcal{E}_1} = [0.3329, 0.0577, 0.3110, 0.1870, -0.2401]$. For $\delta \in (0,1)$, Algorithm 2 is iterated over 500 runs with WBM $\mathcal{E}_1$, confidence width $U(t,\delta) = \sqrt{\frac{\log\left(\frac{4Kt^2}{\delta}\right)}{2t}}$, and $m = 1$.

  • In the second experiment, we generate five depolarized Bell states formed with any of the Bell states. We randomly generate the values of $p$ such that one of these states is entangled. Under WBMs $\mathcal{E}_1$ and $\mathcal{E}_2$, we get $\boldsymbol{S}_{\mathcal{E}_1} = [0.3333, 0.2138, 0.3252, 0.1484, 0.4706]$ and $\boldsymbol{S}_{\mathcal{E}_2} = [0.1547, 0.2839, 0.1484, 0.3252, -0.0398]$. Here, the WBM is unknown to the learner. Since there is a promise ($m = 1$) that one arm is entangled, the learner should measure with at least one of the two WBMs. For $\delta \in (0,1)$, Algorithm 3 is iterated over 500 runs. For both experiments, we plot the average number of samples until stoppage on the y-axis and $\delta$ on the x-axis, as shown in Fig. 2.

Figure 2: Average number of samples vs. $\delta$ for the $(1,K)$-quantum MAB problem
Algorithm 3 $(m,K)$-quantum MAB policy for states in $\mathcal{F}$
Input: threshold $\zeta = 0$, acceptance error rate $\delta$, arms $\mathcal{A} \leftarrow [K]$, WBMs $\{\mathcal{E}_1, \mathcal{E}_2\}$
Output: $A_{\text{ent}}$
  With $\mathcal{E} \leftarrow \mathcal{E}_1$, run Algorithm 2 for $K$ arms with $U(t,\delta) \leftarrow \sqrt{\frac{\log\left(\frac{4Kt^2}{\delta}\right)}{2t}}$ and return stopping time $\tau_1$ and entangled arms $A_{\text{ent}}^{(1)}$
  if ($|A_{\text{ent}}^{(1)}| == 1$ and $m == 1$) or ($|A_{\text{ent}}^{(1)}| == K$) then
     $A_{\text{ent}}^{(2)} \leftarrow \emptyset$
  else if $|A_{\text{ent}}^{(1)}| < K$ then
     With $\mathcal{E} \leftarrow \mathcal{E}_2$, run Algorithm 2 for $K - |A_{\text{ent}}^{(1)}|$ arms with $U(t,\delta) \leftarrow \sqrt{\frac{\log\left(\frac{4Kt^2}{\delta}\right)}{2t}}$ and return stopping time $\tau_2$ and entangled arms $A_{\text{ent}}^{(2)}$
  end if
  $A_{\text{ent}} \leftarrow A_{\text{ent}}^{(1)} + A_{\text{ent}}^{(2)}$
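The two-stage structure of Algorithm 3 can be sketched as a simulation in which the same bandit routine is run first with $\mathcal{E}_1$ and then, on the remaining arms, with $\mathcal{E}_2$. Everything below is an illustrative assumption rather than the authors' implementation: arms are represented only by their (in practice unknown) values $S_{\mathcal{E}_1}$ and $S_{\mathcal{E}_2}$, pulls are $\pm 1$ rewards with those means, each arm is pulled once initially, and the names `identify_entangled` and `two_stage_policy` are hypothetical. The confidence width is the one stated in Algorithm 3.

```python
import math
import random


def hdoc_width(n, K, delta):
    """U(n, delta) = sqrt(log(4 K n^2 / delta) / (2 n)), as stated in Algorithm 3."""
    return math.sqrt(math.log(4 * K * n * n / delta) / (2 * n))


def identify_entangled(arms, means, K, delta, rng, zeta=0.0, max_pulls=10**7):
    """Algorithm-2-style loop on `arms`; returns (entangled arms, pulls used)."""
    def pull(i):
        # +1/-1 reward with mean means[i] (reward model assumed for this sketch).
        return 1.0 if rng.random() < (1 + means[i]) / 2 else -1.0

    counts = {i: 1 for i in arms}
    sums = {i: pull(i) for i in arms}
    t = len(counts)
    active, entangled = set(counts), set()
    while active and t < max_pulls:
        h = max(active, key=lambda i: sums[i] / counts[i]
                + math.sqrt(math.log(t) / (2 * counts[i])))
        sums[h] += pull(h)
        counts[h] += 1
        t += 1
        est, w = sums[h] / counts[h], hdoc_width(counts[h], K, delta)
        if est - w >= zeta:
            active.discard(h)
        elif est + w < zeta:
            entangled.add(h)
            active.discard(h)
    return entangled, t


def two_stage_policy(S1, S2, delta, m=None, rng=None):
    """Run with E_1 on all arms, then with E_2 on the arms not yet flagged."""
    rng = rng or random.Random()
    K = len(S1)
    ent1, tau1 = identify_entangled(range(K), S1, K, delta, rng)
    ent2, tau2 = set(), 0
    # Skip the second stage if the promise m = 1 is met or every arm is flagged.
    if not ((len(ent1) == 1 and m == 1) or len(ent1) == K):
        rest = [i for i in range(K) if i not in ent1]
        ent2, tau2 = identify_entangled(rest, S2, K, delta, rng)
    return ent1 | ent2, tau1 + tau2


ent, total = two_stage_policy([0.9, 0.9, 0.9], [0.9, 0.9, -0.9],
                              delta=0.1, m=1, rng=random.Random(0))
print(ent, total)
```

In this toy instance no arm is flagged under the first measurement, so the second stage runs on all three arms and flags the arm whose value under the second measurement is negative.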

We present numerical results on the sample complexity of entanglement detection for the $(m,K)$-quantum MAB problem for depolarized Bell states, with $m$ being unknown to the policy. We consider the following two experiments with $K = 5$ arms.

  • In the first experiment, we generate five isotropic states as described earlier. Under WBM 1subscript1\mathcal{E}_{1}caligraphic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, we get that 𝑺1=[0.0391,0.0664,0.5177,0.8978,0.0616]subscript𝑺subscript10.03910.06640.51770.89780.0616\boldsymbol{S}_{\mathcal{E}_{1}}=\left[0.0391,0.0664,-0.5177,-0.8978,-0.0616\right]bold_italic_S start_POSTSUBSCRIPT caligraphic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT = [ 0.0391 , 0.0664 , - 0.5177 , - 0.8978 , - 0.0616 ]. For δ(0,1)𝛿01\delta\in(0,1)italic_δ ∈ ( 0 , 1 ), Algorithm 2 is iterated over 500 runs with WBM 1subscript1\mathcal{E}_{1}caligraphic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT. Here, m=3𝑚3m=3italic_m = 3 and is unknown to the policy.

  • In the second experiment, we generate five depolarized Bell states formed with any of the Bell states. Under WBM 1subscript1\mathcal{E}_{1}caligraphic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and 2subscript2\mathcal{E}_{2}caligraphic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, the parameters are 𝑺1=[0.4598,0.3191,0.3694,0.5965,0.9670]subscript𝑺subscript10.45980.31910.36940.59650.9670\boldsymbol{S}_{\mathcal{E}_{1}}=\left[0.4598,0.3191,0.3694,0.5965,0.9670\right]bold_italic_S start_POSTSUBSCRIPT caligraphic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT = [ 0.4598 , 0.3191 , 0.3694 , 0.5965 , 0.9670 ] and 𝑺2=[0.0233,0.1724,0.1073,0.2449,0.9344]subscript𝑺subscript20.02330.17240.10730.24490.9344\boldsymbol{S}_{\mathcal{E}_{2}}=\left[-0.0233,0.1724,0.1073,-0.2449,-0.9344\right]bold_italic_S start_POSTSUBSCRIPT caligraphic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT = [ - 0.0233 , 0.1724 , 0.1073 , - 0.2449 , - 0.9344 ] respectively. Although the states are detectable under WBM 2subscript2\mathcal{E}_{2}caligraphic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, it is unknown to the learner. Thus, we need to run at least one iteration of Algorithm 2. In the first iteration, the inputs are K𝐾Kitalic_K arms and WBM 1subscript1\mathcal{E}_{1}caligraphic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT (or 2subscript2\mathcal{E}_{2}caligraphic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT), and the policy returns m~<K~𝑚𝐾\tilde{m}<Kover~ start_ARG italic_m end_ARG < italic_K entangled arms. In the second iteration, Algorithm 2 is executed with Km~𝐾~𝑚K-\tilde{m}italic_K - over~ start_ARG italic_m end_ARG arms and WBM 2subscript2\mathcal{E}_{2}caligraphic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT (or 1subscript1\mathcal{E}_{1}caligraphic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT) as the inputs. This routine is summarised in Algorithm 3 and iterated for 500 runs. 
We plot the average number of samples until stopping on the y-axis and $\delta$ on the x-axis, as shown in Fig. 3.

For the instances considered above, the sample complexity scales with $m$. It is noteworthy that when the sub-optimality gaps are very small, the sample complexity increases significantly and may not scale with $m$. Since the bandit policy is rerun at most once, the worst-case sample complexity for entanglement detection in depolarized Bell states is scaled by a factor of two.
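As a rough illustration of the two-pass routine on the second experiment, the sketch below replaces the bandit policy with a noise-free stand-in that simply compares the reported parameters against the threshold $\zeta=0$. It is not Algorithm 2 or Algorithm 3; the names `ideal_policy` and `two_stage` are hypothetical, and arm indices are zero-based.

```python
import numpy as np

# Parameters reported for the second experiment (depolarized Bell states).
S_E1 = np.array([0.4598, 0.3191, 0.3694, 0.5965, 0.9670])
S_E2 = np.array([-0.0233, 0.1724, 0.1073, -0.2449, -0.9344])

def ideal_policy(values, zeta=0.0):
    """Noise-free stand-in for the bandit policy (assumption):
    flag every arm whose true parameter lies below the threshold zeta."""
    return {i for i, s in enumerate(values) if s < zeta}

def two_stage(first, second):
    """Run the stand-in on all arms with one WBM, then on the
    arms that were not flagged with the other WBM."""
    found = ideal_policy(first)
    remaining = [i for i in range(len(first)) if i not in found]
    # Map positions in the reduced problem back to original arm indices.
    found |= {remaining[j] for j in ideal_policy(second[remaining])}
    return sorted(found)

detected_12 = two_stage(S_E1, S_E2)  # WBM 1 first, then WBM 2
detected_21 = two_stage(S_E2, S_E1)  # WBM 2 first, then WBM 1
print(detected_12, detected_21)
```

Both orderings flag the same arms here; what changes is how many arms are passed to the second pass, which is the source of the factor-of-two worst case.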

Figure 3: Average stopping time vs. $\delta$ for the $(m,K)$-quantum MAB problem

V-B Entanglement Detection in Arbitrary Quantum States

In this section, we present a routine for detecting entanglement in arbitrary quantum states and provide numerical results for the $(1,K)$-quantum Multi-Armed Bandit (MAB) problem. To generate random density matrices, we follow the method described in [46]. Specifically, we start by generating a complex matrix $A\in\mathbb{C}^{4\times 4}$, where the real and imaginary parts of each element are independently sampled from a normal distribution. We then compute the density matrix $\rho$ by normalizing $AA^{\dagger}$, resulting in $\rho=\frac{AA^{\dagger}}{\text{Tr}(AA^{\dagger})}$. This procedure ensures that $\rho$ is a valid density matrix. On the generated states, we run Algorithm 4, which takes as input the error threshold $\delta$, the set of arms $\mathcal{A}$, and a permutation of $\{1,2,3,4,5,6\}$ that defines the order in which the six WBMs should be adapted. Since this is a promise problem, the algorithm stops as soon as one entangled arm is identified, without needing to measure with all six WBMs.
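The sampling step described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming standard normal entries and a seeded generator for reproducibility; the function name `random_density_matrix` is hypothetical.

```python
import numpy as np

def random_density_matrix(dim=4, rng=None):
    """Sample rho = A A^dagger / Tr(A A^dagger) with i.i.d. normal
    real and imaginary parts in A (standard normal is an assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    M = A @ A.conj().T               # Hermitian and positive semidefinite
    return M / np.trace(M).real      # normalize to unit trace

rho = random_density_matrix(rng=np.random.default_rng(0))
print(np.trace(rho).real, np.linalg.eigvalsh(rho))
```

Because $AA^{\dagger}$ is Hermitian and positive semidefinite by construction, dividing by its trace is all that is needed to obtain a valid density matrix.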

Algorithm 4 $(1,K)$-quantum MAB policy for arbitrary quantum states
Require: threshold $\zeta=0$, acceptance error rate $\delta$, arms $\mathcal{A}\leftarrow[K]$, $P=\text{perm}(1,2,3,4,5,6)$
Ensure: $A_{\text{ent}}$
  flag $\leftarrow 1$, $I\leftarrow 1$
  while flag do
     With $\mathcal{E}\leftarrow\mathcal{E}_{P(I)}$, run Algorithm 2 for $K$ arms with $U(t,\delta)\leftarrow\sqrt{\frac{\log\left(\frac{4Kt^{2}}{\delta}\right)}{2t}}$ and return entangled arm $A_{\text{ent}}^{(I)}$
     if $|A_{\text{ent}}^{(I)}| == 1$ then
        flag $\leftarrow 0$
     else
        $I\leftarrow I+1$
     end if
  end while
  $A_{\text{ent}}\leftarrow A_{\text{ent}}^{(I)}$
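The control flow of Algorithm 4 can be sketched as follows. This is an illustrative outline only: `run_algorithm2` is a hypothetical callable standing in for Algorithm 2, and `toy_algorithm2` is a made-up stub used to exercise the loop. Unlike the pseudocode, the sketch stops after six WBMs and returns an empty result if the promise of one detectable entangled arm is violated.

```python
import math

def confidence_radius(t, delta, K):
    """U(t, delta) = sqrt(log(4 K t^2 / delta) / (2 t)), as in Algorithm 4."""
    return math.sqrt(math.log(4 * K * t**2 / delta) / (2 * t))

def sequential_wbm_search(run_algorithm2, K, delta, order=(1, 2, 3, 4, 5, 6)):
    """Try the WBMs in the given order and stop at the first one
    for which exactly one arm is declared entangled.

    run_algorithm2(wbm_index, K, delta, U) is a hypothetical stand-in for
    Algorithm 2; it must return the set of arms it declares entangled.
    """
    for wbm_index in order:
        flagged = run_algorithm2(wbm_index, K, delta, confidence_radius)
        if len(flagged) == 1:
            return wbm_index, flagged
    return None, set()  # every WBM was inconclusive (outside the promise)

def toy_algorithm2(wbm_index, K, delta, U):
    """Made-up stub: pretend only WBM 5 detects the entangled arm 2."""
    return {2} if wbm_index == 5 else set()

wbm, arm = sequential_wbm_search(toy_algorithm2, K=5, delta=0.1,
                                 order=(3, 1, 5, 2, 4, 6))
print(wbm, arm)
```

With the stub above, the search tries WBMs 3 and 1 without success and stops at WBM 5, so the remaining three WBMs are never measured.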
Figure 4: Entanglement detection ratio vs. $\delta$ for the $(1,K)$-quantum MAB problem for arbitrary quantum states

We rerun the bandit policy at most five times, so the worst-case sample complexity for entanglement detection is scaled by a factor of six. To this end, we conduct the following experiment: we generate 500 different instances of $K=5$ arbitrary states following the procedure described earlier, ensuring that each instance includes one entangled arm. We note that these are valid instances verified by the PPT criterion. The objective of this experiment is to test the efficacy of using the single-parameter family of witnesses (4) to detect entanglement in arbitrary states. For $\delta\in(0,1)$, we report the fraction of times the entangled arm is accurately identified; this is shown in Fig. 4.
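For two-qubit states the PPT criterion is both necessary and sufficient for separability [30], [31], so instances can be labelled by checking the spectrum of the partial transpose. The sketch below shows one way to do this with NumPy; the function names and the numerical tolerance `tol` are assumptions made for illustration.

```python
import numpy as np

def partial_transpose(rho):
    """Transpose the second qubit of a 4x4 two-qubit density matrix."""
    r = rho.reshape(2, 2, 2, 2)                   # indices (a, b, a', b')
    return r.transpose(0, 3, 2, 1).reshape(4, 4)  # swap b and b'

def is_entangled_ppt(rho, tol=1e-12):
    """True if the partial transpose has a negative eigenvalue."""
    return bool(np.linalg.eigvalsh(partial_transpose(rho)).min() < -tol)

# Sanity checks on states whose status is known.
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho_bell = np.outer(bell, bell.conj())        # maximally entangled
rho_mixed = np.eye(4, dtype=complex) / 4      # maximally mixed, separable
print(is_entangled_ppt(rho_bell), is_entangled_ppt(rho_mixed))
```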

Table VI: Examples of arbitrary pure entangled states detected by the family of witnesses (4)

| Pure entangled states $\ket{\psi_{1}}, \ket{\psi_{2}}$ and $\ket{\psi_{3}}$ | Values under $(S_{\mathcal{E}_{i}})_{i=1}^{6}$ |
| --- | --- |
| $[0.2687+0.0375i;\ 0.2406+0.4090i;\ 0.0502+0.6162i;\ 0.2413+0.5107i]$ | $(-0.1851, 0.3160, 0.1598, -0.0058, 0.2177, -0.1947)$ |
| $[0.0565+0.3355i;\ 0.0508+0.0686i;\ 0.4885+0.5191i;\ 0.5689+0.2125i]$ | $(0.1562, -0.0280, -0.1135, 0.1832, -0.0779, 0.1373)$ |
| $[0.1953+0.4438i;\ 0.4958+0.4009i;\ 0.0069+0.3495i;\ 0.0322+0.4848i]$ | $(-0.1851, 0.3160, 0.1598, -0.0058, 0.2177, -0.1947)$ |

From the above experiment, we report several noteworthy observations. Firstly, we encountered instances of pure states $\rho$ where the value of $S_{\mathcal{E}}(\rho)$ equaled $0$, which is the threshold value provided to the algorithm. In such cases, the algorithm required a significantly long time to converge and, despite this, incorrectly estimated the value of $S_{\mathcal{E}}(\rho)$. Consequently, we adjusted the threshold to $-1\times 10^{-3}$ and imposed a cutoff on the sample complexity at $1\times 10^{12}$ to better reflect the real-time performance of this policy. Secondly, we came across instances of entangled states verified by the PPT test that yielded positive values of $S_{\mathcal{E}}(\rho)$ under all six WBMs.
Interestingly, the mixed entangled state $\rho=\sum_{i=1}^{3}p_{i}\ket{\psi_{i}}\bra{\psi_{i}}$, where $\ket{\psi_{i}}$ are defined in Table VI, with $(p_{i})_{i=1}^{3}=(0.2936, 0.0655, 0.6409)$ has $(S_{\mathcal{E}})=(0.0732, 0.1727, 0.1257, 0.1139, 0.0736, 0.0296)$ under the six witnesses, indicating that this state cannot be detected by the witness family described in (4).

We derive an observation on the nature of such states, particularly focusing on the eigenstate $\ket{\lambda}_{\text{max}}=[0.3773-0.1445i, 0.4768-0.3244i, 0.4598+0.0809i, 0.5351]$, which corresponds to the largest eigenvalue of $\rho$. This eigenstate has a Schmidt coefficient close to, but not equal to, 1, suggesting that it lies near the boundary of the separable states yet remains entangled. The pure state $\ket{\lambda}_{\text{max}}\bra{\lambda}_{\text{max}}$ produces $(S_{\mathcal{E}})=(0.0380, 0.1269, 0.0401, 0.1054, 0.0221, 0.0074)$. Thus, we have identified examples of pure and mixed entangled states that can yield inconclusive results when measured using this particular witness family. In these instances, it is essential to measure all six witnesses a sufficient number of times to accurately obtain the expected values of the corresponding observables. Subsequently, performing FST can help determine the entanglement of these states using other separability criteria.
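The Schmidt coefficients of a pure two-qubit state are the singular values of its $2\times 2$ coefficient matrix, so the remark about $\ket{\lambda}_{\text{max}}$ can be checked numerically. The sketch below assumes the amplitudes are ordered as $(\ket{00},\ket{01},\ket{10},\ket{11})$ and renormalizes the four-decimal values quoted in the text; `schmidt_coefficients` is a hypothetical helper name.

```python
import numpy as np

# Eigenstate quoted in the text, rounded to four decimals.
lam_max = np.array([0.3773 - 0.1445j, 0.4768 - 0.3244j,
                    0.4598 + 0.0809j, 0.5351 + 0.0j])
lam_max = lam_max / np.linalg.norm(lam_max)   # remove rounding error in the norm

def schmidt_coefficients(psi):
    """Singular values of the 2x2 coefficient matrix, in descending order."""
    return np.linalg.svd(psi.reshape(2, 2), compute_uv=False)

coeffs = schmidt_coefficients(lam_max)
print(coeffs)
```

A product state would give coefficients exactly $(1, 0)$; a second coefficient that is small but nonzero indicates a weakly entangled state.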

VI Future Works And Conclusion

We established a novel correspondence between the problem of entanglement detection and the Bad Arm Identification problem in stochastic Multi-Armed Bandits (MAB). We propose the $(m,K)$-quantum Multi-Armed Bandit framework. This framework focuses on identifying $m$ entangled states out of $K$ states, where $m$ is potentially unknown. We apply this framework to two-qubit states using two key ingredients: a specialized set of six measurements for two-qubit states called Witness Basis Measurements (WBM) $\mathcal{E}$, and a separability criterion $S_{\mathcal{E}}$, which is based on the data obtained from these measurements and serves as the parameter that needs to be estimated. We present theoretical guarantees and numerical simulations to demonstrate how this parameter can be estimated quickly and accurately using MAB policies. First, we show that entangled states belonging to a class of parameterised two-qubit states $\mathcal{F}$ can be detected by measuring a subset of the six WBMs. With the knowledge of the WBM, we show that we can directly apply some suitable MAB policies. Second, for the same parameterised states, we present a routine for entanglement detection when the WBM is not known by enabling arbitrary sequential adaptation of the WBMs. We extend this to arbitrary two-qubit quantum states and provide numerical results on the efficacy of using these measurements for detecting entanglement.

A promising future direction is identifying WBMs for higher-dimensional bipartite systems. The authors of [2] propose a minimal tomographic scheme for two-qutrits, requiring only eleven witnesses instead of the traditional 81. Recent explorations in data-driven machine learning techniques have utilized SVMs to construct linear entanglement witnesses requiring only local measurements [47]. This approach offers promising avenues for extending these methods to address the $(m,K)$-quantum MAB problem by constructing a minimal number of witnesses to accurately detect all $m$ states. Entanglement detection can be viewed as a membership problem, where a state belongs to a set if it has a specific property (such as entanglement). This problem has also been explored along the lines of the partition identification problem [48], where the goal is to determine the partition to which a data point belongs, given the form of a hyperplane. Extending this concept to the $(m,K)$-quantum MAB problem presents an exciting avenue for future research.

Acknowledgement

B.K. sincerely acknowledges the support from the Ministry of Education, Government of India, through the Prime Minister’s Research Fellowship (PMRF) Scheme. V.S. is supported by the U.S. Department of Energy, Office of Science, National Quantum Information Science Research Centers, Co-design Center for Quantum Advantage (C2QA) contract (DE-SC0012704). K.J. gratefully acknowledges a grant from Mphasis to the Centre for Quantum Information, Communication, and Computing (CQuICC) at IIT Madras.

References

  • [1] D. Lu, T. Xin, N. Yu, Z. Ji, J. Chen, G. Long, J. Baugh, X. Peng, B. Zeng, and R. Laflamme, “Tomography is necessary for universal entanglement detection with single-copy observables,” Phys. Rev. Lett., vol. 116, p. 230501, Jun 2016. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevLett.116.230501
  • [2] H. Zhu, Y. S. Teo, and B.-G. Englert, “Minimal tomography with entanglement witnesses,” Phys. Rev. A, vol. 81, p. 052339, May 2010. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevA.81.052339
  • [3] C. H. Bennett, G. Brassard, C. Crépeau, R. Jozsa, A. Peres, and W. K. Wootters, “Teleporting an unknown quantum state via dual classical and Einstein-Podolsky-Rosen channels,” Phys. Rev. Lett., vol. 70, pp. 1895–1899, Mar 1993.
  • [4] H. Buhrman, R. Cleve, and W. van Dam, “Quantum entanglement and communication complexity,” SIAM Journal on Computing, vol. 30, no. 6, pp. 1829–1841, 2001.
  • [5] R. Horodecki, P. Horodecki, M. Horodecki, and K. Horodecki, “Quantum entanglement,” Reviews of Modern Physics, vol. 81, no. 2, pp. 865–942, Jun. 2009.
  • [6] R. Kueng, H. Rauhut, and U. Terstiege, “Low rank matrix recovery from rank one measurements,” Applied and Computational Harmonic Analysis, vol. 42, no. 1, pp. 88–116, 2017. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1063520315001037
  • [7] J. Wang, V. B. Scholz, and R. Renner, “Confidence polytopes in quantum state tomography,” Physical Review Letters, vol. 122, no. 19, May 2019. [Online]. Available: http://dx.doi.org/10.1103/PhysRevLett.122.190401
  • [8] R. O’Donnell and J. Wright, “Efficient quantum tomography,” Proceedings of the forty-eighth annual ACM symposium on Theory of Computing, 2015. [Online]. Available: https://api.semanticscholar.org/CorpusID:769062
  • [9] ——, “Efficient quantum tomography ii,” Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, 2015. [Online]. Available: https://api.semanticscholar.org/CorpusID:5245926
  • [10] K. Banaszek, M. Cramer, and D. Gross, “Focus on quantum tomography,” New Journal of Physics, vol. 15, no. 12, p. 125020, dec 2013. [Online]. Available: https://dx.doi.org/10.1088/1367-2630/15/12/125020
  • [11] S. T. Flammia, D. Gross, Y.-K. Liu, and J. Eisert, “Quantum tomography via compressed sensing: error bounds, sample complexity and efficient estimators,” New Journal of Physics, vol. 14, no. 9, p. 095022, sep 2012. [Online]. Available: https://dx.doi.org/10.1088/1367-2630/14/9/095022
  • [12] M. Guta, J. Kahn, R. Kueng, and J. A. Tropp, “Fast state tomography with optimal error bounds,” Journal of Physics A: Mathematical and Theoretical, vol. 53, no. 20, p. 204001, apr 2020. [Online]. Available: https://dx.doi.org/10.1088/1751-8121/ab8111
  • [13] G. Torlai, G. Mazzola, J. Carrasquilla, M. Troyer, R. Melko, and G. Carleo, “Neural-network quantum state tomography,” Nature Physics, vol. 14, no. 5, p. 447–450, Feb. 2018. [Online]. Available: http://dx.doi.org/10.1038/s41567-018-0048-5
  • [14] Y. Quek, S. Fort, and H. K. Ng, “Adaptive quantum state tomography with neural networks,” 2018.
  • [15] D. Koutný, L. Motka, Z. Hradil, J. Řeháček, and L. L. Sánchez-Soto, “Neural-network quantum state tomography,” Physical Review A, vol. 106, no. 1, Jul. 2022. [Online]. Available: http://dx.doi.org/10.1103/PhysRevA.106.012409
  • [16] T. Schmale, M. Reh, and M. Gärttner, “Efficient quantum state tomography with convolutional neural networks,” npj Quantum Information, vol. 8, no. 1, Sep. 2022. [Online]. Available: http://dx.doi.org/10.1038/s41534-022-00621-4
  • [17] D. S. França, F. G. L. Brandão, and R. Kueng, “Fast and Robust Quantum State Tomography from Few Basis Measurements,” in 16th Conference on the Theory of Quantum Computation, Communication and Cryptography (TQC 2021), ser. Leibniz International Proceedings in Informatics (LIPIcs), M.-H. Hsieh, Ed., vol. 197.   Dagstuhl, Germany: Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021, pp. 7:1–7:13. [Online]. Available: https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.TQC.2021.7
  • [18] J. Haah, A. W. Harrow, Z. Ji, X. Wu, and N. Yu, “Sample-optimal tomography of quantum states,” IEEE Transactions on Information Theory, p. 1–1, 2017. [Online]. Available: http://dx.doi.org/10.1109/TIT.2017.2719044
  • [19] Y. S. Teo, H. Zhu, B.-G. Englert, J. Řeháček, and Z. Hradil, “Quantum-state reconstruction by maximizing likelihood and entropy,” Phys. Rev. Lett., vol. 107, p. 020404, Jul 2011. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevLett.107.020404
  • [20] V. Siddhu, “Maximum a posteriori probability estimates for quantum tomography,” Physical Review A, vol. 99, no. 1, Jan. 2019. [Online]. Available: http://dx.doi.org/10.1103/PhysRevA.99.012342
  • [21] M. Horodecki, P. Horodecki, and R. Horodecki, “Separability of mixed states: necessary and sufficient conditions,” Physics Letters A, vol. 223, no. 1, pp. 1–8, 1996. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0375960196007062
  • [22] B. M. Terhal, “Bell inequalities and the separability criterion,” Physics Letters A, vol. 271, no. 5, pp. 319–326, 2000. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0375960100004011
  • [23] M. Lewenstein, B. Kraus, J. I. Cirac, and P. Horodecki, “Optimization of entanglement witnesses,” Phys. Rev. A, vol. 62, p. 052310, Oct 2000. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevA.62.052310
  • [24] D. Chruściński and G. Sarbicki, “Entanglement witnesses: construction, analysis and classification,” Journal of Physics A: Mathematical and Theoretical, vol. 47, no. 48, p. 483001, Nov. 2014. [Online]. Available: http://dx.doi.org/10.1088/1751-8113/47/48/483001
  • [25] P. Auer, N. Cesa-Bianchi, and P. Fischer, “Finite-time analysis of the multiarmed bandit problem,” Machine Learning, vol. 47, pp. 235–256, 05 2002.
  • [26] J.-Y. Audibert, S. Bubeck, and R. Munos, “Best arm identification in multi-armed bandits.” in COLT, 2010, pp. 41–53.
  • [27] H. Kano, J. Honda, K. Sakamaki, K. Matsuura, A. Nakamura, and M. Sugiyama, “Good arm identification via bandit feedback,” 2018.
  • [28] M. Lewenstein, B. Kraus, J. I. Cirac, and P. Horodecki, “Optimization of entanglement witnesses,” Phys. Rev. A, vol. 62, p. 052310, Oct 2000. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevA.62.052310
  • [29] I. Bengtsson and K. Zyczkowski, Geometry of Quantum States: An Introduction to Quantum Entanglement.   Cambridge University Press, 2006.
  • [30] M. Horodecki, P. Horodecki, and R. Horodecki, “Separability of mixed states: necessary and sufficient conditions,” Physics Letters A, vol. 223, no. 1, pp. 1–8, 1996. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0375960196007062
  • [31] A. Peres, “Separability criterion for density matrices,” Phys. Rev. Lett., vol. 77, pp. 1413–1415, Aug 1996. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevLett.77.1413
  • [32] P. Horodecki, “Separability criterion and inseparable mixed states with positive partial transposition,” Physics Letters A, vol. 232, no. 5, pp. 333–339, 1997. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0375960197004167
  • [33] O. Rudolph, “A separability criterion for density operators,” Journal of Physics A: Mathematical and General, vol. 33, no. 21, p. 3951–3955, May 2000. [Online]. Available: http://dx.doi.org/10.1088/0305-4470/33/21/308
  • [34] O. Gühne, P. Hyllus, O. Gittsovich, and J. Eisert, “Covariance matrices and the separability problem,” Physical Review Letters, vol. 99, no. 13, Sep. 2007. [Online]. Available: http://dx.doi.org/10.1103/PhysRevLett.99.130504
  • [35] L. Gurvits, “Classical deterministic complexity of Edmonds’ problem and quantum entanglement,” 2003.
  • [36] A. C. Doherty, P. A. Parrilo, and F. M. Spedalieri, “Complete family of separability criteria,” Phys. Rev. A, vol. 69, p. 022308, Feb 2004. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevA.69.022308
  • [37] E. Even-Dar, S. Mannor, and Y. Mansour, “PAC bounds for multi-armed bandit and Markov decision processes,” ser. COLT ’02.   Berlin, Heidelberg: Springer-Verlag, 2002, pp. 255–270.
  • [38] S. Kalyanakrishnan, A. Tewari, P. Auer, and P. Stone, “PAC subset selection in stochastic multi-armed bandits,” in Proceedings of the 29th International Conference on Machine Learning, ser. ICML’12.   Madison, WI, USA: Omnipress, 2012, pp. 227–234.
  • [39] Z. Karnin, T. Koren, and O. Somekh, “Almost optimal exploration in multi-armed bandits,” in Proceedings of the 30th International Conference on Machine Learning, 2013, pp. 1238–1246.
  • [40] S. Mannor and J. N. Tsitsiklis, “The sample complexity of exploration in the multi-armed bandit problem,” J. Mach. Learn. Res., vol. 5, p. 623–648, dec 2004.
  • [41] R. H. Farrell, “Asymptotic Behavior of Expected Sample Size in Certain One Sided Tests,” The Annals of Mathematical Statistics, vol. 35, no. 1, pp. 36 – 72, 1964. [Online]. Available: https://doi.org/10.1214/aoms/1177703731
  • [42] K. Jamieson, M. Malloy, R. Nowak, and S. Bubeck, “lil’ ucb : An optimal exploration algorithm for multi-armed bandits,” in Proceedings of The 27th Conference on Learning Theory, ser. Proceedings of Machine Learning Research, M. F. Balcan, V. Feldman, and C. Szepesvári, Eds., vol. 35.   Barcelona, Spain: PMLR, 13–15 Jun 2014, pp. 423–439. [Online]. Available: https://proceedings.mlr.press/v35/jamieson14.html
  • [43] A. Locatelli, M. Gutzeit, and A. Carpentier, “An optimal algorithm for the thresholding bandit problem,” in Proceedings of The 33rd International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, M. F. Balcan and K. Q. Weinberger, Eds., vol. 48.   New York, New York, USA: PMLR, 20–22 Jun 2016, pp. 1690–1698. [Online]. Available: https://proceedings.mlr.press/v48/locatelli16.html
  • [44] T.-H. Tsai, Y.-D. Tsai, and S.-D. Lin, “lil’hdoc: An algorithm for good arm identification under small threshold gap,” 2024.
  • [45] J. Lumbreras, E. Haapasalo, and M. Tomamichel, “Multi-armed quantum bandits: Exploration versus exploitation when learning properties of quantum states,” Quantum, vol. 6, p. 749, Jun. 2022. [Online]. Available: http://dx.doi.org/10.22331/q-2022-06-29-749
  • [46] K. Zyczkowski and H.-J. Sommers, “Induced measures in the space of mixed quantum states,” Journal of Physics A: Mathematical and General, vol. 34, no. 35, p. 7111–7125, Aug. 2001. [Online]. Available: http://dx.doi.org/10.1088/0305-4470/34/35/335
  • [47] A. C. Greenwood, L. T. Wu, E. Y. Zhu, B. T. Kirby, and L. Qian, “Machine-learning-derived entanglement witnesses,” Phys. Rev. Appl., vol. 19, p. 034058, Mar 2023. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevApplied.19.034058
  • [48] S. Juneja and S. Krishnasamy, “Sample complexity of partition identification using multi-armed bandits,” 2019.

VII Supplementary Material

The following lemma is useful for some calculations.

Lemma 14.

For $t\geq 1$, $c>0$, $\varepsilon\in(0,1)$, $0<w\leq 1$,

\[
\frac{1}{t}\log\left(\frac{\log\left((1+\varepsilon)t\right)}{w}\right)\geq c\implies t\leq\frac{1}{c}\log\left(\frac{2\log\left(\frac{(1+\varepsilon)}{cw}\right)}{w}\right). \tag{16}
\]

VII-A Proof for Section IV-A

VII-A1 Proof of Lemma 8

Proof.

Let $\mathcal{B}$ denote the "good" event that at any time $t>0$ and for all arms $i\in[K]$, the true value $S_{\mathcal{E}}(\rho_{i})$ is well concentrated around its estimate $\hat{S}_{i,N_{i}(t)}$. Writing $S_{i}$ for $S_{\mathcal{E}}(\rho_{i})$,

\[
\mathcal{B}\coloneqq\bigcap_{i=1}^{K}\bigcap_{t=1}^{\infty}\left\{|\hat{S}_{i,N_{i}(t)}-S_{i}|\leq U\left(N_{i}(t),\frac{\delta}{c_{\varepsilon}K}\right)\right\}
\]

From Lemma 7 and by applying the union bound, we get that

\[
\mathbb{P}\left[\mathcal{B}\right]\geq 1-c_{\varepsilon}K\left(\frac{\delta}{c_{\varepsilon}K}\right)^{1+\varepsilon}\geq 1-\delta \tag{17}
\]

where Eq. 17 holds because $\varepsilon\in(0,1)$ and $c_{\varepsilon}\geq 1$. ∎
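The second inequality in Eq. 17 can be checked in one line. The step below is a worked sketch, assuming $K\geq 1$ and $\delta\in(0,1)$ in addition to the stated conditions on $\varepsilon$ and $c_{\varepsilon}$:

```latex
\[
c_{\varepsilon}K\left(\frac{\delta}{c_{\varepsilon}K}\right)^{1+\varepsilon}
=\delta\left(\frac{\delta}{c_{\varepsilon}K}\right)^{\varepsilon}
\leq\delta,
\qquad\text{since } 0<\frac{\delta}{c_{\varepsilon}K}<1 \text{ and } \varepsilon>0 .
\]
```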

VII-A2 Proof of Theorem 9

Proof.

Recall that the threshold $\zeta=0$ and the problem instance $\boldsymbol{S}_{\mathcal{E}}$ is such that $S_{\mathcal{E}}(\rho_{1})\geq S_{\mathcal{E}}(\rho_{2})\geq S_{\mathcal{E}}(\rho_{3})\ldots>S_{\mathcal{E}}(\rho_{K-1})>0>S_{\mathcal{E}}(\rho_{K})$. Let us consider the case that the event $\mathcal{B}$ described in Lemma 8 holds. As outlined in Algorithm 1, the arm $i^{\star}$ will be dropped from the active set $\Omega$ if $\text{LCB}_{i^{\star}}(t)>0$. That is,

\[
\begin{aligned}
\hat{S}_{i^{\star},N_{i^{\star}}(t)}-U\left(N_{i^{\star}}(t),\frac{\delta}{c_{\varepsilon}K}\right)&>0\\
\implies\hat{S}_{i^{\star},N_{i^{\star}}(t)}-|\hat{S}_{i^{\star},N_{i^{\star}}(t)}-S_{i^{\star}}|&>0\\
\implies S_{i^{\star}}&>0
\end{aligned}
\]

The second line uses $|\hat{S}_{i^{\star},N_{i^{\star}}(t)}-S_{i^{\star}}|\leq U\left(N_{i^{\star}}(t),\frac{\delta}{c_{\varepsilon}K}\right)$ on $\mathcal{B}$, and the third uses $S_{i^{\star}}\geq\hat{S}_{i^{\star},N_{i^{\star}}(t)}-|\hat{S}_{i^{\star},N_{i^{\star}}(t)}-S_{i^{\star}}|$.

This contradicts the assumption about the problem instance $\boldsymbol{S}_{\mathcal{E}}$, because $S_{i^{\star}}=S_{\mathcal{E}}(\rho_{K})<0$. Hence, the arm $i^{\star}$ will not be dropped from the active set $\Omega$ as long as the event $\mathcal{B}$ holds. ∎
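The implication used in the proof above can be checked numerically. The following is a minimal Python sketch and not part of the paper's algorithms. It assumes the confidence width $U(n,\omega)$ has the form written out in the proof of Theorem 10, uses an arbitrary illustrative value for $\omega=\delta/(c_{\varepsilon}K)$, and enforces the event $\mathcal{B}$ by construction, taking it to mean $|\hat{S}-S|\leq U$. The function names `confidence_width` and `lcb_implies_positive` are hypothetical.

```python
import math
import random


def confidence_width(n, omega, eps=0.01):
    """Width U(n, omega), in the form written out in the proof of Theorem 10.

    omega stands for delta / (c_eps * K); n >= 2 and a small omega keep
    both logarithms positive.
    """
    inner = math.log((1 + eps) * n) / omega
    return (1 + math.sqrt(eps)) * math.sqrt(2 * (1 + eps) / n * math.log(inner))


def lcb_implies_positive(trials=10_000, omega=1e-3, seed=0):
    """Return True if LCB > 0 never occurs together with S <= 0 on event B."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(2, 1000)
        width = confidence_width(n, omega)
        s_true = rng.uniform(-1.0, 1.0)
        # Assumption: event B means the estimate lies within `width` of the truth.
        s_hat = s_true + rng.uniform(-width, width)
        if s_hat - width > 0 and s_true <= 0:
            return False
    return True


print(lcb_implies_positive())
```

Because the estimate is forced to stay within the confidence width of the true value, a positive lower confidence bound can only occur for an arm whose true value is positive, which is the contradiction the proof relies on.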

VII-A3 Proof of Theorem 10

Proof.

Let us consider the case where $\mathcal{B}$ holds. By the elimination rule of Algorithm 1, an arm $i$ is removed from the active set $\Omega$ if $\text{LCB}_{i}(t)>0$. We have that,

$$
\begin{aligned}
\hat{S}_{i,N_{i}(t)}-U\left(N_{i}(t),\frac{\delta}{c_{\varepsilon}K}\right)&\geq\zeta\\
\hat{S}_{i,N_{i}(t)}-S_{i}+\Delta_{i}&\geq U\left(N_{i}(t),\frac{\delta}{c_{\varepsilon}K}\right)\\
\implies\Delta_{i}&\geq 2U\left(N_{i}(t),\frac{\delta}{c_{\varepsilon}K}\right)
\end{aligned}
\tag{18}
$$

Let $N_{i}$ denote the number of samples of arm $i$ after which the condition in Eq. 18 holds, that is, $N_{i}=\inf\left\{t:U\left(N_{i}(t),\frac{\delta}{c_{\varepsilon}K}\right)\leq\frac{\Delta_{i}}{2}\right\}$. The minimum value of $N_{i}$ can be obtained by solving,

$$
\begin{aligned}
U\left(N_{i},\frac{\delta}{c_{\varepsilon}K}\right)&=\frac{\Delta_{i}}{2}\\
(1+\sqrt{\varepsilon})\sqrt{\frac{2(1+\varepsilon)}{N_{i}}\log\left(\frac{\log\left((1+\varepsilon)N_{i}\right)}{\delta/c_{\varepsilon}K}\right)}&=\frac{\Delta_{i}}{2}\\
\frac{1}{N_{i}}\log\left(\frac{\log\left((1+\varepsilon)N_{i}\right)}{\delta/c_{\varepsilon}K}\right)&=\frac{\Delta_{i}^{2}}{8(1+\varepsilon)(1+\sqrt{\varepsilon})^{2}}
\end{aligned}
\tag{19}
$$

From Lemma 16, we get that,

$$
N_{i}=\frac{8(1+\varepsilon)(1+\sqrt{\varepsilon})^{2}}{\Delta_{i}^{2}}\log\left(\frac{2c_{\varepsilon}K\log\left(\frac{8c_{\varepsilon}(1+\varepsilon)^{2}(1+\sqrt{\varepsilon})^{2}}{\delta}\frac{K}{\Delta_{i}^{2}}\right)}{\delta}\right)
\tag{20}
$$

Thus, the total number of samples required to identify the arm $i^{\star}$ with probability at least $1-\delta$ is $N\leq\sum_{i=1}^{K}N_{i}$. ∎
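Eq. 19 defines $N_{i}$ only implicitly, so it can be instructive to compare a numerical root with the expression in Eq. 20. The Python sketch below is illustrative only: `omega` stands for $\delta/(c_{\varepsilon}K)$ and is given an arbitrary value, `gap` plays the role of $\Delta_{i}$, and all function names are hypothetical. The bisection assumes that $U$ is decreasing in $n$ over the search range, which holds for the parameters used here.

```python
import math


def confidence_width(n, omega, eps):
    """Width U(n, omega) in the form written out before Eq. 19."""
    inner = math.log((1 + eps) * n) / omega
    return (1 + math.sqrt(eps)) * math.sqrt(2 * (1 + eps) / n * math.log(inner))


def closed_form_samples(gap, omega, eps):
    """Evaluate Eq. 20, with c_eps * K / delta written as 1 / omega."""
    a = 8 * (1 + eps) * (1 + math.sqrt(eps)) ** 2 / gap ** 2
    return a * math.log(2 / omega * math.log((1 + eps) * a / omega))


def numeric_samples(gap, omega, eps, lo=2.0):
    """Solve U(n, omega) = gap / 2 for real-valued n by bisection."""
    hi = lo
    while confidence_width(hi, omega, eps) > gap / 2:
        hi *= 2  # grow the bracket until the width falls below gap / 2
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if confidence_width(mid, omega, eps) > gap / 2:
            lo = mid
        else:
            hi = mid
    return hi


gap, omega, eps = 0.5, 1e-3, 0.01
print("numerical root of Eq. 19:", numeric_samples(gap, omega, eps))
print("closed form of Eq. 20:  ", closed_form_samples(gap, omega, eps))
```

For these illustrative parameters the closed-form value is larger than the numerical root, which is consistent with reading Eq. 20 as an upper bound on the solution of Eq. 19.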

VII-B Proof for Section IV-B

VII-B1 Proof of Lemma 11

Proof.

Firstly, we show that Algorithm 2 is $(\lambda,\delta)$-PAC for arbitrary $\lambda\in[K]$. In the case where the number of good arms $m$ is greater than or equal to $\lambda$, we show that $\mathbb{P}\left[\{\hat{m}<\lambda\}\cup\bigcup_{i\in\mathcal{A}_{\text{ent}}}\{S_{i}<\zeta\}\right]\leq\delta$, where $\hat{m}$ is the number of good arms identified by the agent. Since we are now considering the case $m\geq\lambda$, the event $\{\hat{m}<\lambda\}$ implies that at least one good arm $j\in[m]$ is identified as a bad arm by the agent. That is, for some $j\in[m]$ and $t\in\mathbb{N}$, the upper confidence bound satisfies $\hat{S}_{j,N_{j}(t)}+U\left(N_{j}(t),\frac{\delta}{c_{\varepsilon}K}\right)<\zeta$. Thus, we have that,

$$
\begin{aligned}
\mathbb{P}\left[\hat{m}<\lambda\right]&\leq\sum_{j\in[m]}\mathbb{P}\left[\bigcup_{t\in\mathbb{N}}\left\{\hat{S}_{j,N_{j}(t)}+U\left(N_{j}(t),\frac{\delta}{c_{\varepsilon}K}\right)<\zeta\right\}\right]\\
&\leq\sum_{j\in[m]}c_{\varepsilon}\left(\frac{\delta}{c_{\varepsilon}K}\right)^{1+\varepsilon}\qquad\text{(By Lemma \ref{lemma:FLIL})}\\
&\leq mc_{\varepsilon}\left(\frac{\delta}{c_{\varepsilon}K}\right)
\end{aligned}
\tag{21}
$$

The event $\bigcup_{i\in\{\hat{X}_{1},\hat{X}_{2},\ldots\hat{X}_{\lambda}\}}\{S_{i}<\zeta\}$ considers all those outcomes where a bad arm is identified to be a good one. Thus, for some bad arm $j\in\{\hat{X}_{1},\hat{X}_{2},\ldots\hat{X}_{\hat{m}}\}$ such that $j\in[K]\setminus[m]$, we have,

$$
\begin{aligned}
\mathbb{P}\left[\bigcup_{i\in\{\hat{X}_{1},\hat{X}_{2},\ldots\hat{X}_{\lambda}\}}\{S_{i}<\zeta\}\right]&\leq\sum_{j\in[K]\setminus[m]}\mathbb{P}\left[\bigcup_{t\in\mathbb{N}}\left\{\hat{S}_{j,N_{j}(t)}-U\left(N_{j}(t),\frac{\delta}{c_{\varepsilon}K}\right)>\zeta\right\}\right]\\
&\leq(K-m)c_{\varepsilon}\left(\frac{\delta}{c_{\varepsilon}K}\right)
\end{aligned}
\tag{22}
$$

Thus, putting Eq. 21 and Eq. 22 together, we get that $\mathbb{P}\left[\{\hat{m}<\lambda\}\cup\bigcup_{i\in\{\hat{X}_{1},\hat{X}_{2},\ldots\hat{X}_{\hat{m}}\}}\{S_{i}<\zeta\}\right]\leq\delta$. Next, we consider the case when the number of good arms $m$ is less than $\lambda$ and show that $\mathbb{P}\left[\hat{m}\geq\lambda\right]\leq\delta$. Since there are fewer than $\lambda$ good arms, the event $\{\hat{m}\geq\lambda\}$ implies that at least one of the output arms in $\{\hat{X}_{1},\hat{X}_{2},\ldots\hat{X}_{\lambda}\}$ is a bad arm. Thus, we have that,

$$
\begin{aligned}
\mathbb{P}\left[\hat{m}\geq\lambda\right]&\leq\sum_{j\in[K]\setminus[m]}\mathbb{P}\left[j\in\{\hat{X}_{1},\hat{X}_{2},\ldots\hat{X}_{\lambda}\}\right]\\
&\leq(K-m)c_{\varepsilon}\left(\frac{\delta}{c_{\varepsilon}K}\right)^{1+\varepsilon}\\
&\leq\frac{K-m}{K}c_{\varepsilon}\left(\frac{\delta}{c_{\varepsilon}}\right)\\
&\leq\delta
\end{aligned}
\tag{23}
$$

We see that the algorithm is $(\lambda,\delta)$-PAC for all such $\lambda\in[K]$, thereby giving us that the algorithm is $\delta$-PAC. ∎
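As a worked version of the step that combines Eq. 21 and Eq. 22 in the proof above, the union bound can be written out explicitly. This is a sketch of the arithmetic only. The bounds in Eq. 21 and Eq. 22 use the elementary fact that $x^{1+\varepsilon}\leq x$ for $x\in[0,1]$ and $\varepsilon>0$, which applies here under the assumption $\frac{\delta}{c_{\varepsilon}K}\leq 1$.

$$
\begin{aligned}
\mathbb{P}\left[\{\hat{m}<\lambda\}\cup\bigcup_{i\in\{\hat{X}_{1},\ldots,\hat{X}_{\hat{m}}\}}\{S_{i}<\zeta\}\right]
&\leq mc_{\varepsilon}\left(\frac{\delta}{c_{\varepsilon}K}\right)+(K-m)c_{\varepsilon}\left(\frac{\delta}{c_{\varepsilon}K}\right)\\
&=\frac{m\delta}{K}+\frac{(K-m)\delta}{K}=\delta
\end{aligned}
$$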

VII-B2 Proof of Theorem 12

Proof.

Recall that the threshold is $\zeta=0$ and that the problem instance $\boldsymbol{S}_{\mathcal{E}}$ is such that $S_{\mathcal{E}}(\rho_{1})\geq S_{\mathcal{E}}(\rho_{2})\ldots>S_{\mathcal{E}}(\rho_{K-m})>0>S_{\mathcal{E}}(\rho_{K-m+1})\ldots>S_{\mathcal{E}}(\rho_{K})$, with $m$ being unknown. Let us consider the case that the event $\mathcal{B}$ described in Lemma 8 holds. As outlined in Algorithm 2, an arm $i$ will be dropped if $\text{LCB}_{i}(t)>0$. That is,

$$
\begin{aligned}
\hat{S}_{i,N_{i}(t)}-U\left(N_{i}(t),\frac{\delta}{c_{\varepsilon}K}\right)&>0\\
\hat{S}_{i,N_{i}(t)}-\left|\hat{S}_{i,N_{i}(t)}-S_{i}\right|&>0\\
\implies S_{i}&>0
\end{aligned}
$$

Thus, as long as the event $\mathcal{B}$ holds, none of the arms with $S_{\mathcal{E}}<0$ will be dropped. Hence, the lil'HDoC algorithm identifies all the arms correctly. ∎
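The stopping rules that appear in the proofs above (label an arm once $\text{LCB}_{i}(t)>\zeta$ or once $\text{UCB}_{i}(t)<\zeta$) can be illustrated with a small simulation. This is a simplified round-robin Python sketch and not the lil'HDoC policy of Algorithm 2, whose arm-selection rule is not reproduced here. The bounded uniform noise model, the value of `omega` (standing in for $\delta/(c_{\varepsilon}K)$), the witness values in `means`, and the function names are all assumptions made for illustration.

```python
import math
import random


def confidence_width(n, omega, eps=0.01):
    """Width U(n, omega); returns infinity where the formula is undefined."""
    arg = math.log((1 + eps) * n) / omega
    if arg <= 1.0:
        return math.inf
    return (1 + math.sqrt(eps)) * math.sqrt(2 * (1 + eps) / n * math.log(arg))


def classify_arms(means, omega, zeta=0.0, eps=0.01, max_rounds=100_000, seed=0):
    """Sample every unlabeled arm once per round until each one is labeled."""
    rng = random.Random(seed)
    num_arms = len(means)
    sums = [0.0] * num_arms
    labels = {}
    for n in range(1, max_rounds + 1):
        # Every unlabeled arm has exactly n samples after this round.
        width = confidence_width(n, omega, eps)
        for i in range(num_arms):
            if i in labels:
                continue
            # Assumed reward model: true value plus bounded uniform noise.
            sums[i] += means[i] + rng.uniform(-0.5, 0.5)
            s_hat = sums[i] / n
            if s_hat - width > zeta:
                labels[i] = "above"  # lower confidence bound clears zeta
            elif s_hat + width < zeta:
                labels[i] = "below"  # upper confidence bound falls under zeta
        if len(labels) == num_arms:
            break
    return labels


print(classify_arms([0.6, 0.4, 0.3, -0.5], omega=1e-3))
```

In this sketch the single arm with a negative value is the only one that can be labeled `"below"`, mirroring the argument that, while the confidence bounds hold, an arm with $S_{\mathcal{E}}<0$ never has a positive lower confidence bound.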