
Beyond boundaries: Gary Lorden’s groundbreaking contributions to sequential analysis

Jay Bartroff$^{a}$ and Alexander G. Tartakovsky$^{b}$

CONTACT: Jay Bartroff. Email: [email protected]

$^{a}$University of Texas at Austin, Austin, Texas, USA; $^{b}$AGT StatConsult, Los Angeles, California, USA

Abstract

Gary Lorden provided several fundamental and novel insights into sequential hypothesis testing and changepoint detection. In this article, we provide an overview of Lorden’s contributions in the context of existing results in those areas, and some extensions made possible by Lorden’s work. We also mention some of Lorden’s significant consulting work, including as an expert witness and for NASA, the entertainment industry, and Major League Baseball.

keywords:
Sequential hypothesis testing; changepoint detection; CUSUM; multihypothesis sequential tests; Caltech.

1 Introduction

The purpose of this article is to provide an overview of Gary Lorden’s significant contributions to the field of sequential analysis. But first, to give a sense of Lorden’s remarkable life and personality, in Sections 1.1-1.2 we give a brief biography of Lorden and highlight some of his extra-academic work.

Figure 1: Counter-clockwise from top: Gary Lorden as an undergraduate at Caltech in 1959; lecturing in 1987; and lecturing circa 2010. Photos courtesy of Caltech.

1.1 Biographical Sketch

Gary Allen Lorden was born in Los Angeles, California, on June 10, 1941. He entered Caltech as a freshman in 1958; his undergraduate contemporaries there included several others who would go on to become notable statisticians, among them Larry Brown, Peter Bickel, Brad Efron, and Carl Morris. Lorden received a BS from Caltech in 1962 and a PhD from Cornell University in 1966 under the supervision of Jack Kiefer. After a faculty position at Northwestern University, Lorden rejoined Caltech in 1968, where he stayed until his retirement in 2009.

Beyond research and teaching, Lorden was known for his leadership roles at Caltech. He served as dean of students from 1984 to 1988, vice president for student affairs from 1989 to 1998, and acting vice president for student affairs in 2002. He was executive officer (Caltech’s version of department chair) for the mathematics department from 2003 to 2006.

Gary and his wife, Louise, were both accomplished pianists and enjoyed playing and singing duets for guests, often students, at their home in Pasadena. They also enjoyed showing students how well they danced together. Lorden also liked to act and regularly participated in plays put on at Caltech.

Lorden passed away on October 25, 2023 at the age of 82. He is survived by his children Lisa and Diana; his wife Louise passed away in 2015.

1.2 Consulting, Hollywood, and Major League Baseball

Lorden was known for his creativity and generosity of ideas, so it is not surprising that throughout his career he was in demand as an academic collaborator and consultant, and much of his early work on changepoint detection arose from collaborations with researchers at the Jet Propulsion Laboratory (JPL) in Pasadena, California, which is managed for NASA by Lorden’s home institution of Caltech.

But Lorden was also an uncommonly effective, engaging, and entertaining communicator, and this led many outside academia to seek him out as a consultant too. Lorden would routinely be interviewed by reporters and appear on the evening news, explaining things like the odds of winning the latest lottery jackpot. Lorden also “moonlighted” as an expert witness in court cases involving statistics and math.

Eventually Hollywood came calling, and in 2005 Lorden was asked to be the math consultant for a (then) new CBS television show called NUMB3RS. In a bit of art imitating life, the show was about a Caltech professor who used math to help the FBI solve crimes, and Lorden worked with the show’s writers to accurately incorporate mathematical topics into the storylines. An aspect of the relationship that Lorden particularly relished, as he related to us, was that the show’s creators initially considered setting the show at MIT (Caltech’s rival), but decided to change the venue after learning more about Caltech and Lorden. The show would go on to be a hit, running for 118 episodes over 6 seasons. Viewers who knew something of Lorden’s work could see some of his favorite topics (and many topics discussed below in this article) woven into the episodes’ plot lines, including changepoint detection, hypothesis testing, Bayesian methods, gambling math, cryptography, and sports statistics, among others. With Keith Devlin, Lorden wrote a popular general audience book (Devlin and Lorden 2007) on the mathematical topics appearing in the show. For example, Devlin and Lorden (2007, Chapter 4) explain the basic concept of changepoint detection:

“the determination that a definite change has occurred, as opposed to normal fluctuations,”

and go on to discuss this method’s importance for responding quickly to potential bioterrorist attacks and for designing efficient algorithms to pinpoint various kinds of criminal activity, such as detecting an increase in crime rates in certain geographical areas and tracking changes in financial transactions that could be criminal.

Another high-profile consulting project came in 2018 when Lorden, Bartroff, and others were chosen by the Commissioner of Major League Baseball (MLB) to be statisticians on a committee with physicists, engineers, and baseball experts studying MLB’s then-recent surge in home runs. The committee studied vast amounts of Statcast game data, as well as laboratory tests on the properties of the baseball that can affect home run production; Lorden also traveled to Costa Rica to inspect baseball manufacturer Rawlings’ production plant there. The committee made recommendations (Nathan et al., 2018) to MLB and Rawlings for future monitoring, testing, and storage of baseballs.

1.3 The Remainder of this Article

The remainder of this article is devoted to describing Lorden’s fundamental and far-reaching results in sequential hypothesis testing and changepoint detection, which we aim to present within the context of those areas. Beginning with hypothesis testing in Section 2, after describing the testing setup and optimality of the sequential probability ratio test (SPRT) in Section 2.1.1, Lorden’s findings on multi-parameter testing and their application to near-optimality of the multihypothesis SPRT are covered in Section 2.1.2. We then cover Lorden’s fundamental inequality for excess over the boundary in Section 2.2. Additionally, we explore Lorden’s contributions to the Kiefer-Weiss problem of testing while minimizing the expected sample size at a parameter value between the hypotheses and other results stemming from his work (Section 2.3), optimal testing of composite hypotheses (Section 2.4), and optimal multistage testing (Section 2.5). In Section 3 we cover Lorden’s fundamental minimax changepoint detection theory and related advancements in the field.

2 Sequential Hypothesis Testing

In this section, we delve into Lorden’s contributions to hypothesis testing, encompassing the challenges of excess over the boundaries of random walks, the formulation of nearly optimal multi-decision sequential rules (including multihypothesis tests and multistage tests), and the modified Kiefer–Weiss problem. These topics were extensively explored in Lorden’s seminal papers (Lorden 1967, 1970, 1972, 1973, 1976, 1977a, 1980, 1983).

2.1 Multihypothesis Testing

2.1.1 The General Multihypothesis Testing Problem

One of Lorden’s major fundamental contributions is the proposal of a multihypothesis sequential test that achieves third-order asymptotic optimality in the i.i.d. case, akin to Wald’s SPRT when error probabilities are small. Further details will be provided in Subsection 2.1.2. Additionally, extensions to a more general non-i.i.d. case, where observations may exhibit dependency and non-identical distributions, have been explored in Lai (1981); Tartakovsky (1998, 2020, 2024); Tartakovsky, Nikiforov, and Basseville (2015).

We begin with formulating the following multihypothesis testing problem as addressed by Lorden (1977a). Let $(\Omega,\mathscr{F},\mathscr{F}_n,\mathsf{P})$, $n\in\mathbb{Z}_+=\{0,1,2,\ldots\}$, be a filtered probability space, where the sub-$\sigma$-algebra $\mathscr{F}_n=\sigma(\mathbf{X}^n)$ of $\mathscr{F}$ is generated by the sequence of random variables $\mathbf{X}^n=\{X_t,\ 1\leqslant t\leqslant n\}$ observed up to time $n$, which is defined on the space $(\Omega,\mathscr{F})$ ($\mathscr{F}_0$ is trivial).
The focus lies on the $N$-decision problem of testing the hypotheses $\mathcal{H}_i:\ \mathsf{P}=\mathsf{P}_i$, $i=1,\dots,N$, where $\mathsf{P}_1,\dots,\mathsf{P}_N$ are given probability measures assumed to be locally mutually absolutely continuous, i.e., their restrictions $\mathsf{P}_i^n=\mathsf{P}_i|_{\mathscr{F}_n}$ and $\mathsf{P}_j^n=\mathsf{P}_j|_{\mathscr{F}_n}$ to $\mathscr{F}_n$ are equivalent for all $1\leqslant n<\infty$ and all $i,j=1,\dots,N$, $i\neq j$. Let $\mathsf{Q}^n$ be the restriction to $\mathscr{F}_n$ of a non-degenerate $\sigma$-finite measure $\mathsf{Q}$ on $(\Omega,\mathscr{F})$.

Assume that the observed random variables $X_1,X_2,\dots$ are independent and identically distributed (i.i.d.), so that under $\mathsf{P}_i$ the sample $\mathbf{X}^n=(X_1,\dots,X_n)$ has joint density $p_i(\mathbf{X}^n)$ with respect to the dominating measure $\mathsf{Q}^n$ for all $n\in\mathbb{N}$, which can be expressed as

\[
p_i(\mathbf{X}^n)=\prod_{t=1}^{n}f_i(X_t),\quad i=1,\dots,N, \tag{1}
\]

where $f_i(X_t)$ represents the respective density for $X_t$ under the hypothesis $\mathcal{H}_i$. Hence, the interest lies in determining which of $N$ given densities $f_1,\dots,f_N$ is true.

A multihypothesis sequential test is a pair $\delta=(T,d)$, where $T$ is a stopping time with respect to the filtration $\{\mathscr{F}_n\}_{n\in\mathbb{Z}_+}$ and $d=d(\mathbf{X}^T)$ is an $\mathscr{F}_T$-measurable terminal decision function with values in the set $\{1,\dots,N\}$. Specifically, $d=i$ means that the hypothesis $\mathcal{H}_i$ is accepted upon stopping, $\{d=i\}=\{T<\infty,\ \delta\ \text{accepts}\ \mathcal{H}_i\}$. Let $\alpha_{ij}(\delta)=\mathsf{P}_i(d=j)$, $i\neq j$, $i,j=1,\dots,N$, denote the error probabilities of the test $\delta$, i.e., the probabilities of accepting the hypothesis $\mathcal{H}_j$ when $\mathcal{H}_i$ is true.

Introduce the class of tests with probabilities of errors $\alpha_{ij}(\delta)$ that do not exceed the prespecified numbers $0<\alpha_{ij}<1$:

\[
\mathbb{C}({\bm{\alpha}})=\left\{\delta:\alpha_{ij}(\delta)\leqslant\alpha_{ij}\ \text{for}\ i,j=1,\dots,N,\ i\neq j\right\}, \tag{2}
\]

where ${\bm{\alpha}}=(\alpha_{ij})$ is a matrix of given error probabilities that are positive numbers less than $1$ (the diagonal entries $\alpha_{ii}$ are immaterial).

Let $\mathsf{E}_i$ denote the expectation under the hypothesis $\mathcal{H}_i$ (i.e., under the measure $\mathsf{P}_i$). The objective is to find a sequential test that minimizes, at least approximately, the expected sample sizes $\mathsf{E}_i[T]$ for all hypotheses $\mathcal{H}_i$, $i=1,\dots,N$.

For $n\in\mathbb{N}$, define the likelihood ratio (LR) and the log-likelihood ratio (LLR) processes between the hypotheses $\mathcal{H}_i$ and $\mathcal{H}_j$ as

\begin{align*}
\Lambda_{ij}(n) &= \frac{\mathrm{d}\mathsf{P}_i^n}{\mathrm{d}\mathsf{P}_j^n}(\mathbf{X}^n)=\frac{p_i(\mathbf{X}^n)}{p_j(\mathbf{X}^n)}=\prod_{t=1}^{n}\frac{f_i(X_t)}{f_j(X_t)},\\
\lambda_{ij}(n) &= \log\Lambda_{ij}(n)=\sum_{t=1}^{n}\log\left[\frac{f_i(X_t)}{f_j(X_t)}\right].
\end{align*}
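To make the LLR processes concrete, the following minimal sketch computes the whole matrix of pairwise LLRs $\lambda_{ij}(n)$ along a sample path. It assumes, purely for illustration, that the $N$ densities are Gaussian with a common standard deviation; the function names `gaussian_logpdf` and `pairwise_llr_paths` are hypothetical and not taken from Lorden's papers.

```python
import math


def gaussian_logpdf(x, mean, sd=1.0):
    """Log of the N(mean, sd^2) density at x (illustrative choice of f_i)."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)


def pairwise_llr_paths(xs, means, sd=1.0):
    """Return a list whose entry n-1 is the N x N matrix [lambda_ij(n)].

    lambda_ij(n) is the difference of the running log-likelihoods under
    hypotheses i and j, so the matrix is antisymmetric with a zero diagonal.
    Indices are 0-based here, whereas the text uses 1..N.
    """
    num_hyp = len(means)
    cum = [0.0] * num_hyp  # running sum of log f_i(X_t) for each hypothesis
    paths = []
    for x in xs:
        for i in range(num_hyp):
            cum[i] += gaussian_logpdf(x, means[i], sd)
        paths.append([[cum[i] - cum[j] for j in range(num_hyp)] for i in range(num_hyp)])
    return paths
```

Because each $\lambda_{ij}(n)$ is a sum of i.i.d. increments, every off-diagonal entry is a random walk, which is the structure exploited throughout the rest of this section.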

In the particular case of two hypotheses $\mathcal{H}_1$ and $\mathcal{H}_2$ ($N=2$), Wald (1945, 1947) introduced the Sequential Probability Ratio Test (SPRT). Let $Z_t=\log[f_1(X_t)/f_2(X_t)]$ be the LLR for the observation $X_t$, so the LLR for the sample $\mathbf{X}^n$ is the sum

\[
\lambda_{12}(n)=\lambda_n=\sum_{t=1}^{n}Z_t,\quad n=1,2,\dots \tag{3}
\]

Letting $a_0<0$ and $a_1>0$ be thresholds, Wald’s SPRT $\delta_*(a_0,a_1)=(T_*,d_*)$ is

\[
T_*(a_0,a_1)=\inf\left\{n\geqslant 1:\lambda_n\notin(a_0,a_1)\right\},\quad
d_*(a_0,a_1)=\begin{cases}1&\text{if}\ \lambda_{T_*}\geqslant a_1\\ 2&\text{if}\ \lambda_{T_*}\leqslant a_0.\end{cases} \tag{4}
\]
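The stopping rule (4) can be sketched in a few lines. The code below is an illustration under stated assumptions rather than a reference implementation: the names `sprt` and `gaussian_llr_increment` are hypothetical, the Gaussian mean-shift model is chosen only as an example, and the function returns `(None, None)` if the supplied data are exhausted before a threshold is crossed.

```python
def sprt(observations, llr_increment, a0, a1):
    """Run Wald's SPRT as in (4) on an iterable of observations.

    llr_increment(x) must return Z_t = log[f_1(x) / f_2(x)].
    Returns (T, d): the stopping time and the accepted hypothesis (1 or 2),
    or (None, None) if the data run out before lambda_n leaves (a0, a1).
    """
    if not (a0 < 0.0 < a1):
        raise ValueError("thresholds must satisfy a0 < 0 < a1")
    lam = 0.0  # lambda_n, the cumulative LLR in (3)
    for n, x in enumerate(observations, start=1):
        lam += llr_increment(x)
        if lam >= a1:
            return n, 1  # accept H_1
        if lam <= a0:
            return n, 2  # accept H_2
    return None, None


def gaussian_llr_increment(mu1, mu2, sd=1.0):
    """Z_t for N(mu1, sd^2) versus N(mu2, sd^2); an assumed example model."""
    def z(x):
        return ((mu1 - mu2) * x - 0.5 * (mu1 ** 2 - mu2 ** 2)) / (sd * sd)
    return z
```

For instance, `sprt([1.0] * 10, gaussian_llr_increment(1.0, 0.0), a0=-2.0, a1=2.0)` adds $0.5$ to $\lambda_n$ at each step and therefore stops at $n=4$, accepting $\mathcal{H}_1$.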

In the case of two hypotheses, the class of tests (2) is defined as

\[
\mathbb{C}(\alpha_0,\alpha_1)=\left\{\delta:\alpha_0(\delta)\leqslant\alpha_0\ \text{and}\ \alpha_1(\delta)\leqslant\alpha_1\right\},
\]

where $\alpha_0$ represents the upper bound on the Type I error (false positive) probability $\alpha_0(\delta)=\alpha_{12}(\delta)$, and $\alpha_1$ represents the upper bound on the Type II error (false negative) probability $\alpha_1(\delta)=\alpha_{21}(\delta)$.

Wald’s SPRT possesses an extraordinary optimality property: it minimizes both expected sample sizes $\mathsf{E}_1[T]$ and $\mathsf{E}_2[T]$ within the class $\mathbb{C}(\alpha_0,\alpha_1)$ of sequential (and non-sequential) tests as long as the observations are i.i.d. under both hypotheses. In other words, $\mathsf{E}_i[T_*]=\inf_{\delta\in\mathbb{C}(\alpha_0,\alpha_1)}\mathsf{E}_i[T]$ for $i=1,2$, as established by Wald and Wolfowitz (1948) through a Bayesian approach.

Lai (1981) proved that the SPRT is also first-order asymptotically optimal as $\max(\alpha_0,\alpha_1)\to 0$ for general non-i.i.d. models with dependent and non-identically distributed observations when the normalized log-likelihood ratio $n^{-1}\lambda_n$ converges $1$-quickly to finite numbers $I_i$ under $\mathsf{P}_i$.

The central idea of Lorden’s investigation, elaborated in detail in Section 2.1.2, is that, similar to how the SPRT is strictly optimal in the class $\mathbb{C}(\alpha_0,\alpha_1)$ for any error probabilities $\alpha_0$ and $\alpha_1$, combinations of SPRTs exhibit third-order asymptotic optimality for multihypothesis testing problems involving any finite number of densities when probabilities of errors are small.

2.1.2 Near Optimality of the Multihypothesis SPRT

The problem of sequentially testing many hypotheses is substantially more complex than that of testing two hypotheses. Identifying an optimal test in the class (2) that minimizes expected sample sizes for all hypotheses $\mathcal{H}_1,\dots,\mathcal{H}_N$ is daunting. Hence, a significant portion of the development of sequential multihypothesis testing in the 20th century has focused on the exploration of certain combinations of one-sided SPRTs. See Armitage (1950); Chernoff (1959); Kiefer and Sacks (1963); Lorden (1967, 1977a).

The results of Lorden’s ingenious paper (Lorden 1977a) are of fundamental importance as they establish third-order asymptotic optimality of the accepting multihypothesis test that he proposed. More specifically, Lorden established that just as the SPRT is optimal in the class $\mathbb{C}(\alpha_0,\alpha_1)$ for testing two hypotheses, certain combinations of one-sided SPRTs are nearly optimal in a third-order sense in the class $\mathbb{C}({\bm{\alpha}})$, i.e., subject to error probability constraints, expected sample sizes are minimized to within a negligible additive $o(1)$ term:

\[
\inf_{\delta\in\mathbb{C}({\bm{\alpha}})}\mathsf{E}_i[T]=\mathsf{E}_i[T_*]+o(1)\quad\text{as}\ \alpha_{\max}\to 0\quad\text{for all}\ i=1,\dots,N, \tag{5}
\]

where $\alpha_{\max}=\max_{1\leqslant i,j\leqslant N,\,i\neq j}\alpha_{ij}$ and $T_*$ is the stopping time of the multihypothesis test $\delta_*$, which is defined below.

We now define a test proposed by Lorden, which we will refer to as the accepting Matrix SPRT. Write $\mathscr{N}=\{1,\dots,N\}$. For a threshold matrix $\mathbf{A}=(A_{ij})_{i,j\in\mathscr{N}}$, with $A_{ij}>0$ for $i\neq j$ and the $A_{ii}$ immaterial ($0$, say), define the Matrix SPRT (MSPRT) $\delta_*=(T_*,d_*)$, built on one-sided SPRTs between the hypotheses $\mathcal{H}_i$ and $\mathcal{H}_j$, as follows:

\[
\text{Stop at the first $n\geqslant 1$ such that, for some $i$,}\quad\Lambda_{ij}(n)\geqslant A_{ji}\ \text{for all $j\neq i$}, \tag{6}
\]

and accept the unique $\mathcal{H}_i$ that satisfies these inequalities. Note that for $N=2$ the MSPRT coincides with Wald’s SPRT.

Let $a_{ji}=\log A_{ji}$. Introducing the Markov accepting times for the hypotheses $\mathcal{H}_{i}$ as

\[
T_{i}=\inf\Bigl\{n\geqslant 1:\min_{\substack{1\leqslant j\leqslant N\\ j\neq i}}\left[\lambda_{ij}(n)-a_{ji}\right]\geqslant 0\Bigr\},\quad i=1,\dots,N, \tag{7}
\]

the test in (6) can also be written in the following form:

\[
T_{*}=\min_{1\leqslant j\leqslant N}T_{j},\qquad d_{*}=i\quad\mbox{if}\quad T_{*}=T_{i}. \tag{8}
\]

Thus, in the MSPRT, each component SPRT is extended until, for some $i\in{\mathscr{N}}$, all $N-1$ SPRTs involving $\mathcal{H}_{i}$ accept $\mathcal{H}_{i}$.
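The stopping rule (6)-(8) can be sketched in a few lines of Python. This is only an illustration under assumptions not made in the text: the observations are taken to be i.i.d. unit-variance Gaussian with mean $\theta_i$ under $\mathcal{H}_i$, hypotheses are indexed from 0 rather than 1, and the function name `msprt` and the argument layout `a[j][i]` $=a_{ji}$ are hypothetical.

```python
def msprt(observations, thetas, a):
    """Accepting MSPRT sketch for i.i.d. N(theta_i, 1) hypotheses (an assumed model).

    a[j][i] holds the log-threshold a_ji = log A_ji (diagonal entries are ignored).
    Returns (T, d): the stopping time and the 0-based index of the accepted
    hypothesis, or (None, None) if the data run out before the rule stops.
    """
    n_hyp = len(thetas)
    # Log-likelihood of each hypothesis, up to a constant common to all of them,
    # so that loglik[i] - loglik[j] equals the log-likelihood ratio lambda_ij(n).
    loglik = [0.0] * n_hyp
    for n, x in enumerate(observations, start=1):
        for i, theta in enumerate(thetas):
            loglik[i] += -0.5 * (x - theta) ** 2
        for i in range(n_hyp):
            # Accept H_i once lambda_ij(n) >= a_ji for every j != i, as in (6).
            if all(loglik[i] - loglik[j] >= a[j][i]
                   for j in range(n_hyp) if j != i):
                return n, i
    return None, None
```

Because all off-diagonal thresholds are positive, at most one hypothesis can satisfy the acceptance condition at a given time, so returning the first index found is consistent with the uniqueness noted above.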

The MSPRT is not strictly optimal for $N>2$ but it is a good approximation to the optimal multihypothesis test. Under certain conditions and with some choice of the threshold matrix $\mathbf{A}$, it minimizes the expected sample sizes ${\mathsf{E}}_{i}[T]$ for all $i=1,\dots,N$ to within a vanishing $o(1)$ term for small error probabilities; see (5).

Consider first the first-order asymptotic criterion: Find a multihypothesis test $\delta_{*}({\bm{\alpha}})=(T_{*}({\bm{\alpha}}),d_{*}({\bm{\alpha}}))$ such that

\[
\lim_{{\alpha_{\rm max}}\to 0}\frac{\inf_{\delta\in{\mathbb{C}}({\bm{\alpha}})}{\mathsf{E}}_{i}[T]}{{\mathsf{E}}_{i}[T_{*}({\bm{\alpha}})]}=1\quad\text{for all}~i=1,\dots,N. \tag{9}
\]

Using Wald’s likelihood ratio identity, it is easily shown that $\alpha_{ij}(\delta_{*})\leqslant\exp(-a_{ij})$ for $i,j=1,\dots,N$, $i\neq j$, so selecting $a_{ji}=|\log\alpha_{ji}|$ implies $\delta_{*}\in{\mathbb{C}}({\bm{\alpha}})$. These inequalities are similar to Wald’s in the binary hypothesis case and are very imprecise. Using Wald’s approach it is rather easy to prove that the MSPRT with boundaries $a_{ji}=|\log\alpha_{ji}|$ is first-order asymptotically optimal, minimizing expected sample sizes as long as the Kullback-Leibler information numbers $I_{ij}={\mathsf{E}}_{i}[\lambda_{ij}(1)]$ are positive and finite; see Tartakovsky, Nikiforov, and Basseville (2015, Section 4.3.1).
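For completeness, the error probability bound can be sketched as follows. The sketch assumes that $\lambda_{ij}(n)=\log\Lambda_{ij}(n)$ is the log-likelihood ratio of $\mathcal{H}_{i}$ against $\mathcal{H}_{j}$ based on the first $n$ observations, so that $\lambda_{ij}(n)=-\lambda_{ji}(n)$; this is the usual convention, but its definition lies outside the present passage. On the event $\{d_{*}=j\}$, the stopping rule (6) gives $\lambda_{jk}(T_{*})\geqslant a_{kj}$ for every $k\neq j$, in particular for $k=i$, and a change of measure from ${\mathsf{P}}_{i}$ to ${\mathsf{P}}_{j}$ yields

\begin{align*}
\alpha_{ij}(\delta_{*})={\mathsf{P}}_{i}(d_{*}=j)
&={\mathsf{E}}_{j}\bigl[e^{\lambda_{ij}(T_{*})}\,\mathbf{1}\{d_{*}=j\}\bigr]
={\mathsf{E}}_{j}\bigl[e^{-\lambda_{ji}(T_{*})}\,\mathbf{1}\{d_{*}=j\}\bigr]\\
&\leqslant e^{-a_{ij}}\,{\mathsf{P}}_{j}(d_{*}=j)\leqslant e^{-a_{ij}}.
\end{align*}

The slack in the first inequality is exactly the overshoot of $\lambda_{ji}(T_{*})$ over $a_{ij}$, which is why the bound is crude when overshoots are not negligible.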

In his ingenious paper, Lorden (1977a) substantially improved this result, showing that, with a sophisticated design that includes accurate estimation of thresholds accounting for overshoots, the MSPRT is nearly optimal in the third-order sense (5).

Specifically, assume the second-moment condition

\[
{\mathsf{E}}_{i}[\lambda_{ij}(1)]^{2}<\infty,\quad i,j=1,\dots,N \tag{10}
\]

and define the numbers

\[
\mathcal{L}_{ij}=\exp\Bigl\{-\sum_{n=1}^{\infty}\frac{1}{n}\bigl[{\mathsf{P}}_{j}(\lambda_{ij}(n)>0)+{\mathsf{P}}_{i}(\lambda_{ij}(n)\leqslant 0)\bigr]\Bigr\},\quad i,j=1,\dots,N. \tag{11}
\]

These numbers are symmetric, $\mathcal{L}_{ij}=\mathcal{L}_{ji}$, and $0<\mathcal{L}_{ij}\leqslant 1$ ($\mathcal{L}_{ii}\equiv 1$). Furthermore, $\mathcal{L}_{ij}=1$ only if the measures ${\mathsf{P}}_{i}^{n}$ and ${\mathsf{P}}_{j}^{n}$ are singular so that the absolute continuity assumption is violated.

For $i,j\in{\mathscr{N}}$ ($i\neq j$) and $a>0$, define one-sided SPRTs

\[
\tau_{ij}(a)=\inf\left\{n\geqslant 0:\lambda_{ij}(n)\geqslant a\right\}. \tag{12}
\]

A renewal-theoretic argument shows that the numbers $\mathcal{L}_{ij}$ are closely related to the overshoots in the one-sided tests. If the LLR $\lambda_{ij}(1)$ is non-arithmetic under $\mathcal{H}_{i}$, then

\[
\mathcal{L}_{ij}=\zeta_{ij}I_{ij},\quad\zeta_{ij}=\lim_{a\to\infty}{\mathsf{E}}_{i}\left\{\exp\left[-(\lambda_{ij}(\tau_{ij}(a))-a)\right]\right\} \tag{13}
\]

(see, e.g., Theorem 3.1.3 in Tartakovsky, Nikiforov, and Basseville (2015)).
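To make (11) and (13) concrete, the series can be evaluated numerically in a simple special case. The sketch below assumes i.i.d. unit-variance Gaussian observations whose means under $\mathcal{H}_{i}$ and $\mathcal{H}_{j}$ differ by $\Delta>0$; this model is an illustrative assumption, not part of the text. Then $\lambda_{ij}(n)$ is normal with mean $\pm n\Delta^{2}/2$ and variance $n\Delta^{2}$, both probabilities in (11) equal $\Phi(-\Delta\sqrt{n}/2)$, and $I_{ij}=\Delta^{2}/2$. The function names are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def l_number_gaussian(delta, tol=1e-12, max_terms=1_000_000):
    """L-number (11) for a unit-variance Gaussian mean shift delta > 0 (assumed model).

    Each term of the series is (2/n) * Phi(-delta * sqrt(n) / 2); the terms decay
    exponentially in n, so the sum is truncated once a term drops below tol.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    total = 0.0
    for n in range(1, max_terms + 1):
        term = 2.0 * std_normal_cdf(-0.5 * delta * math.sqrt(n)) / n
        total += term
        if term < tol:
            break
    return math.exp(-total)


delta = 1.0
kl_info = 0.5 * delta ** 2           # Kullback-Leibler number I_ij in this model
l_number = l_number_gaussian(delta)  # L_ij from (11)
zeta = l_number / kl_info            # overshoot factor zeta_ij from (13)
print(f"L = {l_number:.4f}, zeta = {zeta:.4f}")
```

In this model the value depends on the two hypotheses only through $|\Delta|$, which makes the symmetry $\mathcal{L}_{ij}=\mathcal{L}_{ji}$ visible directly.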

It turns out that the $\mathcal{L}$-numbers play a significant role both in the Bayes and the frequentist frameworks. They facilitate the adjustment of boundaries necessary to achieve optimality.

Consider the Bayes multihypothesis problem with the prior distribution of hypotheses ${\bm{\pi}}=(\pi_{0}(1),\dots,\pi_{0}(N))$, where $\pi_{0}(i)={\mathsf{P}}(\mathcal{H}_{i})$, and the loss incurred when stopping at time $T=n$ and making the decision $d=j$ while the hypothesis $\mathcal{H}_{i}$ is true is $L_{n}(\mathcal{H}_{i},d=j,{\mathbf{X}}^{n})=L_{ij}+cn$, where $c>0$ is the cost of making one observation or sampling cost and where $0<L_{ij}<\infty$ for $i\neq j$ and $0$ if $i=j$.

The average (integrated) risk of the test $\delta=(T,d)$ is

\[
\rho_{c}^{\pi}(\delta)=\sum_{i=1}^{N}\pi_{0}(i)\Bigl[\sum_{j=1}^{N}L_{ij}{\mathsf{P}}_{i}(d=j)+c\,{\mathsf{E}}_{i}[T]\Bigr].
\]

It follows from Theorem 1 of Lorden (1977a) that, as $c\to 0$, the MSPRT $\delta_{*}$ defined in (6) with the thresholds $A_{ji}(c)=(\pi_{0}(j)/\pi_{0}(i))L_{ji}\mathcal{L}_{ij}/c$ is asymptotically third-order optimal (i.e., to within $o(c)$) under the second moment condition (10):

\[
\rho_{c}^{\pi}(\delta^{*})=\inf_{\delta}~\rho_{c}^{\pi}(\delta)+o(c)\quad\text{as}~~c\to 0,
\]

where the infimum is taken over all sequential or non-sequential tests.
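As an illustration of how the Bayes thresholds are assembled, the following sketch builds the matrix $A_{ji}(c)=(\pi_{0}(j)/\pi_{0}(i))L_{ji}\mathcal{L}_{ij}/c$ from a prior, a loss matrix, and a matrix of $\mathcal{L}$-numbers. The function name `bayes_thresholds`, the 0-based indexing, and the nested-list layout are assumptions made for the example; the $\mathcal{L}$-numbers are taken as given inputs.

```python
def bayes_thresholds(prior, loss, l_numbers, c):
    """Threshold matrix A with A[j][i] = (prior[j] / prior[i]) * loss[j][i] * l_numbers[i][j] / c.

    prior[i]        prior probability pi_0(i) of hypothesis i (must be positive)
    loss[j][i]      loss L_ji for deciding i when hypothesis j is true
    l_numbers[i][j] L-number for the pair (i, j), symmetric in i and j
    c               cost per observation (c > 0)
    Diagonal entries are immaterial and left at 0.
    """
    n_hyp = len(prior)
    thresholds = [[0.0] * n_hyp for _ in range(n_hyp)]
    for j in range(n_hyp):
        for i in range(n_hyp):
            if i != j:
                thresholds[j][i] = (prior[j] / prior[i]) * loss[j][i] * l_numbers[i][j] / c
    return thresholds
```

The log-thresholds $a_{ji}=\log A_{ji}(c)$ grow like $|\log c|$ as $c\to 0$, with the prior ratio, the loss, and the $\mathcal{L}$-number contributing only a bounded additive correction.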

Using this Bayes asymptotic optimality result, it can be proven that the MSPRT is also nearly optimal to within $o(1)$ with respect to the expected sample sizes ${\mathsf{E}}_{i}[T]$ for all hypotheses among all tests with constrained error probabilities. In other words, the MSPRT has an asymptotic property similar to the exact optimality of the SPRT for two hypotheses. This result is more practical than the above Bayes optimality.

The following theorem provides detailed specifications, resembling Theorem 4 and its corollary in Lorden (1977a). Recall that $\alpha_{ij}(\delta)={\mathsf{P}}_{i}(d=j)$ represents the probability of erroneously accepting the hypothesis $\mathcal{H}_{j}$ when $\mathcal{H}_{i}$ is true. In addition, denote by $\tilde{\alpha}_{i}(\delta)={\mathsf{P}}_{i}(d\neq i)$ the probability of erroneously rejecting $\mathcal{H}_{i}$ when it is true, and by $\beta_{j}(\delta)=\sum_{i=1}^{N}w_{ij}{\mathsf{P}}_{i}(d=j)$ the weighted probability of accepting $\mathcal{H}_{j}$, where $(w_{ij})_{i,j\in{\mathscr{N}}}$ is a given matrix of positive weights.
Recall the definition of the class of tests (2) for which the probabilities of errors ${\mathsf{P}}_{i}(d=j)$ do not exceed prescribed values $\alpha_{ij}$, and introduce two more classes that upper-bound the weighted probabilities of errors $\beta_{j}(\delta)$ and the probabilities of errors $\tilde{\alpha}_{i}(\delta)$, respectively,

\begin{align}
\overline{{\mathbb{C}}}({\bm{\beta}}) &=\left\{\delta:\beta_{j}(\delta)\leqslant\beta_{j}~~\text{for}~j=1,\dots,N\right\}, \tag{14}\\
\tilde{{\mathbb{C}}}(\tilde{{\bm{\alpha}}}) &=\left\{\delta:\tilde{\alpha}_{i}(\delta)\leqslant\tilde{\alpha}_{i}~~\text{for}~i=1,\dots,N\right\}. \tag{15}
\end{align}

If $A_{ij}=A_{ij}(c)$ is a function of the small parameter $c$, then the error probabilities $\alpha_{ij}^{*}(c)$, $\tilde{\alpha}^{*}_{i}(c)$ and $\beta_{j}^{*}(c)$ of the MSPRT $\delta_{*}(c)$ are also functions of this parameter, and if $A_{ji}(c)\to\infty$, then $\alpha_{ij}^{*}(c),\beta^{*}_{j}(c)\to 0$ as $c\to 0$. Note that $\tilde{\alpha}^{*}_{i}(c)=\sum_{j\neq i}\alpha^{*}_{ij}(c)$, so it also goes to zero as $c\to 0$.
We denote by ${\bm{\beta}}^{*}(c)$ the vector $(\beta_{1}^{*}(c),\dots,\beta_{N}^{*}(c))$, by $\tilde{{\bm{\alpha}}}^{*}(c)$ the vector $(\tilde{\alpha}_{1}^{*}(c),\dots,\tilde{\alpha}_{N}^{*}(c))$, and by ${\bm{\alpha}}^{*}(c)$ the matrix $(\alpha_{ij}^{*}(c))_{i,j\in{\mathscr{N}}}$.

Theorem 1 (MSPRT near optimality).

Assume that the second moment condition (10) holds.

(i)

If the thresholds in the MSPRT are selected as $A_{ji}(c)=w_{ji}\mathcal{L}_{ij}/c$, $i,j=1,\dots,N$, then

\[
{\mathsf{E}}_{i}[T^{*}(c)]=\inf_{\delta\in\overline{{\mathbb{C}}}({\bm{\beta}}^{*}(c))}{\mathsf{E}}_{i}[T]+o(1)\quad\text{as}~c\to 0~~\text{for all}~~i=1,\dots,N, \tag{16}
\]

i.e., the MSPRT minimizes to within $o(1)$ the expected sample sizes among all tests whose weighted error probabilities are less than or equal to those of $\delta^{*}(c)$.

(ii)

For any matrix $\mathbf{B}=(B_{ij})$ ($B_{ij}>0$, $i\neq j$), let $A_{ji}=B_{ji}/c$. The MSPRT $\delta^{*}(c)$ asymptotically minimizes the expected sample sizes for all hypotheses to within $o(1)$ as $c\to 0$ among all tests whose error probabilities $\alpha_{ij}(\delta)$ are less than or equal to those of $\delta^{*}(c)$ as well as whose error probabilities $\tilde{\alpha}_{i}(\delta)$ are less than or equal to those of $\delta^{*}(c)$, i.e.,

\[
{\mathsf{E}}_{i}[T^{*}(c)]=\inf_{\delta\in{\mathbb{C}}({\bm{\alpha}}^{*}(c))}{\mathsf{E}}_{i}[T]+o(1)\quad\text{as}~~c\to 0\quad\text{for all}~~i=1,\dots,N \tag{17}
\]

and

\[
{\mathsf{E}}_{i}[T^{*}(c)]=\inf_{\delta\in\tilde{{\mathbb{C}}}(\tilde{{\bm{\alpha}}}^{*}(c))}{\mathsf{E}}_{i}[T]+o(1)\quad\text{as}~~c\to 0\quad\text{for all}~~i=1,\dots,N. \tag{18}
\]

The intuition behind these results is that since the MSPRT is a combination of one-sided SPRTs $\tau_{ij}(a_{ji})$ defined in (12) and since the $\zeta_{ij}=\mathcal{L}_{ij}/I_{ij}$ are correction factors to the error probability bound ${\mathsf{P}}_{j}(\tau_{ij}(a_{ji})<\infty)\leqslant e^{-a_{ji}}$, the asymptotic approximation

\[
{\mathsf{P}}_{j}(\tau_{ij}(a_{ji})<\infty)=\zeta_{ij}e^{-a_{ji}}(1+o(1))\quad\text{as}~a_{ji}\to\infty,
\]

works well even for moderate values of $a_{ji}$. So taking $a_{ji}=\log(I_{ij}/\mathcal{L}_{ij}\alpha)$ allows one to attain a nearly optimal solution in the frequentist problem. The proofs of these results are extremely tedious and require many non-standard and sophisticated mathematical tools developed by Lorden.
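The quality of the approximation ${\mathsf{P}}_{j}(\tau_{ij}(a)<\infty)\approx\zeta_{ij}e^{-a}$ at a moderate threshold can be probed by simulation. The sketch below again assumes the illustrative unit-variance Gaussian mean-shift model with shift $\Delta$, which is not part of the text. It uses the change-of-measure identity $e^{a}\,{\mathsf{P}}_{j}(\tau_{ij}(a)<\infty)={\mathsf{E}}_{i}\{\exp[-(\lambda_{ij}(\tau_{ij}(a))-a)]\}$, so that sampling is done under $\mathcal{H}_{i}$, where the random walk has positive drift and crosses the threshold with probability one. The function name is hypothetical.

```python
import math
import random


def overshoot_factor_mc(delta, a, n_paths, rng):
    """Monte Carlo estimate of E_i[exp(-(lambda(tau) - a))] = exp(a) * P_j(tau < infinity).

    Under H_i in the assumed Gaussian model, the LLR increments are i.i.d.
    N(delta^2 / 2, delta^2), so the walk is simulated until it first reaches a.
    """
    drift = 0.5 * delta ** 2
    total = 0.0
    for _ in range(n_paths):
        llr = 0.0
        while llr < a:
            llr += rng.gauss(drift, delta)
        total += math.exp(-(llr - a))  # exponential of minus the overshoot
    return total / n_paths


rng = random.Random(0)
estimate = overshoot_factor_mc(delta=1.0, a=4.0, n_paths=20000, rng=rng)
print(f"estimated exp(a) * P_j(tau < infinity) at a = 4: {estimate:.3f}")
```

Comparing this estimate with the limiting value $\zeta_{ij}=\mathcal{L}_{ij}/I_{ij}$ obtained from (11) indicates how close a finite threshold already is to the asymptotic regime, and how much the uncorrected bound $e^{-a}$ overstates the error probability.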

Notice that Theorem 1 only addresses the asymptotically symmetric case where

\[
\lim_{c\to 0}~\frac{\log\beta_{j}^{*}(c)}{\log\beta_{k}^{*}(c)}=1,\quad\lim_{c\to 0}~\frac{\log\tilde{\alpha}_{i}^{*}(c)}{\log\tilde{\alpha}_{k}^{*}(c)}=1\quad\text{and}\quad\lim_{c\to 0}~\frac{\log\alpha_{ij}^{*}(c)}{\log\alpha_{ks}^{*}(c)}=1. \tag{19}
\]

By introducing for the hypotheses $\mathcal{H}_{i}$ different observation costs $c_{i}$ that may go to $0$ at different rates, i.e., setting $A_{ji}=B_{ji}/c_{i}$, the results of Theorem 1 can be extended to the asymmetric case where the ratios in (19) are bounded away from zero and infinity. This generalization is important for certain applications.

The results and methodologies of Lorden outlined above hold significant potential for application across various problems and domains. For instance, they are relevant to the multistream (or multichannel) problem with two decisions and multiple data streams, explored by Fellouris and Tartakovsky (2017) and discussed in Tartakovsky (2020, Chapter 1). Sequential hypothesis testing in multiple data streams, such as sensors, populations, or multichannel systems, has numerous practical applications.

Suppose observations are sequentially acquired over time in $N$ streams. The observations in the $i$th data stream correspond to a realization of a stochastic process $X(i)=\{X_n(i)\}_{n\in\mathbb{N}}$, where $i\in\mathscr{N}:=\{1,\ldots,N\}$ and $\mathbb{N}=\{1,2,\dots\}$. Let $\mathcal{H}_0$ be the null hypothesis according to which none of the $N$ streams is affected, i.e., there are no “signals” in any stream. For any given non-empty subset of components, $\mathscr{B}\subset\mathscr{N}$, let $\mathcal{H}_{\mathscr{B}}$ be the hypothesis according to which only the components $X(i)$ with $i$ in $\mathscr{B}$ contain signals. Denote by $\mathsf{P}_0$ and $\mathsf{P}_{\mathscr{B}}$ the distributions of $\mathbf{X}$, the observations from all $N$ streams taken jointly, under hypotheses $\mathcal{H}_0$ and $\mathcal{H}_{\mathscr{B}}$, respectively. Next, let $\mathcal{P}$ be a class of subsets of $\mathscr{N}$ that incorporates a priori information that may be available regarding the subset of affected streams. Denote by $|\mathscr{B}|$ the size of a subset $\mathscr{B}$, i.e., the number of signals under $\mathcal{H}_{\mathscr{B}}$, and by $|\mathcal{P}|$ the size of class $\mathcal{P}$, i.e., the number of possible alternatives in $\mathcal{P}$. For example, if we know upper and lower bounds $\overline{K}\leqslant N$ and $\underline{K}\geqslant 1$ on the size of the affected subset, then $\mathcal{P}=\mathcal{P}_{\underline{K},\overline{K}}=\{\mathscr{B}\subset\mathscr{N}:\underline{K}\leqslant|\mathscr{B}|\leqslant\overline{K}\}$; if we know only that at most $K$ streams can be affected, then $\mathcal{P}=\mathcal{P}_{K}=\{\mathscr{B}\subset\mathscr{N}:1\leqslant|\mathscr{B}|\leqslant K\}$.
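To make the class $\mathcal{P}_{\underline{K},\overline{K}}$ concrete, the following Python sketch enumerates it for a small number of streams. The function name `subset_class` and the representation of subsets as frozensets are illustrative assumptions, not notation from the works cited above.

```python
from itertools import combinations

def subset_class(n_streams, k_min, k_max):
    """Return all subsets B of {1, ..., n_streams} with k_min <= |B| <= k_max."""
    streams = range(1, n_streams + 1)
    return [frozenset(c)
            for k in range(k_min, k_max + 1)
            for c in combinations(streams, k)]

# N = 4 streams, between 1 and 2 of them affected.
p_class = subset_class(4, 1, 2)
print(len(p_class))  # C(4,1) + C(4,2) = 4 + 6 = 10
```

The size $|\mathcal{P}|$ grows combinatorially in $N$, so explicit enumeration of this kind is only practical for small classes.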

We aim to test $\mathcal{H}_0$, the simple null hypothesis indicating no signals in any data stream, against the composite alternative $\mathcal{H}_1$, according to which the subset of streams with signals belongs to $\mathcal{P}$. We denote $\mathsf{P}_0^n=\mathsf{P}_0|_{\mathscr{F}_n}$ and $\mathsf{P}_{\mathscr{B}}^n=\mathsf{P}_{\mathscr{B}}|_{\mathscr{F}_n}$ as restrictions of probability measures $\mathsf{P}_0$ and $\mathsf{P}_{\mathscr{B}}$ to the $\sigma$-algebra $\mathscr{F}_n$, and let $p_0(\mathbf{X}^n)$ and $p_{\mathscr{B}}(\mathbf{X}^n)$ denote the corresponding probability densities of these measures with respect to some non-degenerate $\sigma$-finite measure, where $\mathbf{X}^n=(\mathbf{X}_1,\dots,\mathbf{X}_n)$ denotes the concatenation of the first $n$ observations from all data streams.

In what follows, we confine ourselves to the i.i.d. scenario: observations are independent across streams and, within each stream, independent and identically distributed, with density $g_i(x)$ if the $i$-th stream is unaffected and density $f_i(x)$ if it contains a signal. Hence, the hypothesis testing problem can be formulated as

\begin{align*}
\mathcal{H}_0: &\quad p(\mathbf{X}^n)=p_0(\mathbf{X}^n)=\prod_{i=1}^{N}\prod_{t=1}^{n}g_i(X_t(i));\\
\mathcal{H}_1=\bigcup_{\mathscr{B}\in\mathcal{P}}\mathcal{H}_{\mathscr{B}}: &\quad p_{\mathscr{B}}(\mathbf{X}^n)=\prod_{i\in\mathscr{B}}\prod_{t=1}^{n}f_i(X_t(i))\times\prod_{i\in\mathscr{N}\setminus\mathscr{B}}\prod_{t=1}^{n}g_i(X_t(i)).
\end{align*}

Since the hypothesis testing problem is binary, the terminal decision $d$ takes two values, $0$ and $1$, so $d\in\{0,1\}$ is an $\mathscr{F}_T$-measurable random variable such that $\{d=j\}=\{T<\infty,\ \mathcal{H}_j\ \text{is selected}\}$, $j=0,1$.

A sequential test should be designed in such a way that the type-I (false alarm) and type-II (missed detection) error probabilities are controlled, i.e., do not exceed given, user-specified levels. Denote by $\mathbb{C}_{\mathcal{P}}(\alpha_0,\alpha_1)$ the class of sequential tests with the probability of false alarm below $\alpha_0\in(0,1)$ and the probability of missed detection below $\alpha_1\in(0,1)$, i.e.,

\[
\mathbb{C}_{\mathcal{P}}(\alpha_0,\alpha_1)=\left\{\delta:\mathsf{P}_0(d=1)\leqslant\alpha_0~~\text{and}~~\max_{\mathscr{B}\in\mathcal{P}}\mathsf{P}_{\mathscr{B}}(d=0)\leqslant\alpha_1\right\}. \tag{20}
\]

In general, it is not possible to design tests that are third-order (to within $o(1)$) or even second-order (to within a constant term $O(1)$) asymptotically optimal as $\alpha_{\max}=\max(\alpha_0,\alpha_1)\to 0$. It is only possible to find a test $T_*$ that minimizes the expected sample sizes $\mathsf{E}_0[T]$ and $\mathsf{E}_{\mathscr{B}}[T]$ for every $\mathscr{B}\in\mathcal{P}$ to first order, that is,

\begin{align*}
\mathsf{E}_0[T_*] &\sim \inf_{\delta\in\mathbb{C}_{\mathcal{P}}(\alpha_0,\alpha_1)}\mathsf{E}_0[T],\\
\mathsf{E}_{\mathscr{B}}[T_*] &\sim \inf_{\delta\in\mathbb{C}_{\mathcal{P}}(\alpha_0,\alpha_1)}\mathsf{E}_{\mathscr{B}}[T]\quad\text{for all}~\mathscr{B}\in\mathcal{P},
\end{align*}

where $\mathsf{E}_0$ and $\mathsf{E}_{\mathscr{B}}$ are expectations under $\mathsf{P}_0$ and $\mathsf{P}_{\mathscr{B}}$, respectively.

Hereafter we use the notation $x_\alpha\sim y_\alpha$ as $\alpha\to 0$ when $\lim_{\alpha\to 0}(x_\alpha/y_\alpha)=1$.

Let $\mathcal{P}$ be an arbitrary class of subsets of $\mathscr{N}$. For any $\mathscr{B}\in\mathcal{P}$, let $\Lambda_{\mathscr{B}}(n)$ be the likelihood ratio of $\mathcal{H}_{\mathscr{B}}$ against $\mathcal{H}_0$ given the observations from all streams up to time $n$, and let $\lambda_{\mathscr{B}}(n)$ be the corresponding log-likelihood ratio (LLR),

\begin{align*}
\Lambda_{\mathscr{B}}(n) &=\frac{\mathrm{d}\mathsf{P}_{\mathscr{B}}^{n}}{\mathrm{d}\mathsf{P}_{0}^{n}}=\prod_{i\in\mathscr{B}}\prod_{t=1}^{n}\frac{f_i(X_t(i))}{g_i(X_t(i))},\\
\lambda_{\mathscr{B}}(n) &=\log\Lambda_{\mathscr{B}}(n)=\sum_{i\in\mathscr{B}}\sum_{t=1}^{n}\log\left[\frac{f_i(X_t(i))}{g_i(X_t(i))}\right].
\end{align*}

A natural and popular statistic for testing $\mathcal{H}_0$ against $\mathcal{H}_1$ at time $n$ is the maximum (generalized) likelihood ratio (GLR) statistic $\widehat{\Lambda}(n)=\max_{\mathscr{B}\in\mathcal{P}}\Lambda_{\mathscr{B}}(n)$. However, applying the conventional GLR statistic leads only to a first-order asymptotically optimal test. In order to obtain second- and third-order optimality, we need to modify the GLR statistic into the weighted GLR $\widehat{\Lambda}(n;\bm{\pi})=\max_{\mathscr{B}\in\mathcal{P}}\pi_{\mathscr{B}}\Lambda_{\mathscr{B}}(n)$, where $\bm{\pi}=\{\pi_{\mathscr{B}},\mathscr{B}\in\mathcal{P}\}$ is a probability mass function on subsets of $\mathscr{N}$ that is fully supported on $\mathcal{P}$, i.e., $\pi_{\mathscr{B}}>0$ for all $\mathscr{B}\in\mathcal{P}$ and $\sum_{\mathscr{B}\in\mathcal{P}}\pi_{\mathscr{B}}=1$. The corresponding weighted generalized log-likelihood ratio (GLLR) statistic is $\widehat{\lambda}(n;\bm{\pi})=\max_{\mathscr{B}\in\mathcal{P}}\left(\lambda_{\mathscr{B}}(n)+\log\pi_{\mathscr{B}}\right)$.
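As a small illustration of the weighted GLLR statistic, the Python sketch below evaluates $\max_{\mathscr{B}\in\mathcal{P}}(\lambda_{\mathscr{B}}(n)+\log\pi_{\mathscr{B}})$ from per-stream LLR values, using the fact that under the independence assumption $\lambda_{\mathscr{B}}(n)$ is the sum over $i\in\mathscr{B}$ of the per-stream LLRs. The function name `weighted_gllr`, the dictionary-based inputs, and the numerical values are hypothetical choices made only for this example.

```python
import math

def weighted_gllr(stream_llrs, weights):
    """Weighted GLLR: max over B of (sum of per-stream LLRs in B) + log(pi_B).

    stream_llrs: dict mapping stream index i to its LLR value at time n.
    weights:     dict mapping frozenset B to its weight pi_B > 0.
    """
    return max(sum(stream_llrs[i] for i in subset) + math.log(pi)
               for subset, pi in weights.items())

# Two streams, uniform weights on the class {{1}, {2}, {1, 2}}.
llrs = {1: 2.0, 2: -1.0}
pi = {frozenset({1}): 1 / 3, frozenset({2}): 1 / 3, frozenset({1, 2}): 1 / 3}
value = weighted_gllr(llrs, pi)  # equals 2 - log(3), attained at B = {1}
```

With uniform weights the maximizing subset is the same as for the unweighted GLR; non-uniform weights shift the comparison between subsets by $\log\pi_{\mathscr{B}}$.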

The Generalized Sequential Likelihood Ratio Test (GSLRT) $\widehat{\delta}=(\widehat{T},\widehat{d})$ is defined as

\[
\widehat{T}=\inf\{n\geqslant 1:\widehat{\lambda}(n;\bm{\pi}_1)\geqslant a_1\;\text{or}\;\widehat{\lambda}(n;\bm{\pi}_0)\leqslant -a_0\},\quad \widehat{d}=\begin{cases}1 & \text{if}\quad\widehat{\lambda}(\widehat{T};\bm{\pi}_1)\geqslant a_1\\ 0 & \text{if}\quad\widehat{\lambda}(\widehat{T};\bm{\pi}_0)\leqslant -a_0\end{cases},
\]

where $\bm{\pi}_j=\{\pi_{j,\mathscr{B}},\mathscr{B}\in\mathcal{P}\}$, $j=0,1$, are not necessarily identical weights and $a_0,a_1>0$ are thresholds that should be selected appropriately in order to guarantee the desired error probabilities, i.e., so that $\widehat{T}$ belongs to class $\mathbb{C}_{\mathcal{P}}(\alpha_0,\alpha_1)$ for given $\alpha_0$ and $\alpha_1$ with approximate equalities. The LLR in the $i$-th stream is $\lambda_i(n)=\sum_{t=1}^{n}\log[f_i(X_t(i))/g_i(X_t(i))]$, so that $\lambda_{\mathscr{B}}(n)=\sum_{i\in\mathscr{B}}\lambda_i(n)$.
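The stopping rule of the GSLRT can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the cited works: the function `run_gslrt` is hypothetical, it takes the per-stream LLR increments $\log[f_i(X_t(i))/g_i(X_t(i))]$ as already computed, and, since the definition does not specify what happens if both stopping conditions hold at the same time, it checks the upper threshold first.

```python
import math

def _gllr(cumulative, weights):
    """Weighted GLLR from cumulative per-stream LLRs and weights {B: pi_B}."""
    return max(sum(cumulative[i] for i in subset) + math.log(pi)
               for subset, pi in weights.items())

def run_gslrt(llr_increments, weights1, weights0, a1, a0):
    """Run the GSLRT on a sequence of per-stream LLR increments.

    llr_increments: iterable of dicts {stream index: LLR increment at time t};
                    every dict must contain all streams used in the weights.
    Returns (T, d), or (None, None) if the data end before a threshold is hit.
    """
    cumulative = {}
    for n, step in enumerate(llr_increments, start=1):
        for i, z in step.items():
            cumulative[i] = cumulative.get(i, 0.0) + z
        # Assumption: the upper boundary is checked first if both conditions hold.
        if _gllr(cumulative, weights1) >= a1:
            return n, 1
        if _gllr(cumulative, weights0) <= -a0:
            return n, 0
    return None, None

# Deterministic toy input: stream 1 drifts up, stream 2 drifts down.
uniform = {frozenset({1}): 0.5, frozenset({2}): 0.5}
steps = [{1: 1.0, 2: -0.5}] * 10
print(run_gslrt(steps, uniform, uniform, a1=3.0, a0=3.0))  # (4, 1)
```

In the toy run the statistic for $\mathscr{B}=\{1\}$ is $n+\log 0.5$, which first reaches $3$ at $n=4$, so the test stops there and accepts $\mathcal{H}_1$.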

The $\mathcal{L}$-number is

\[
\mathcal{L}_{\mathscr{B}}=\exp\left\{-\sum_{n=1}^{\infty}\frac{1}{n}\Bigl[\mathsf{P}_0(\lambda_{\mathscr{B}}(n)>0)+\mathsf{P}_{\mathscr{B}}(\lambda_{\mathscr{B}}(n)\leqslant 0)\Bigr]\right\}, \tag{21}
\]

which takes into account the overshoot; compare with Lorden’s $\mathcal{L}$-numbers (11).
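For intuition, the series in (21) can be evaluated numerically in a simple special case. The sketch below assumes, purely for illustration, that $g_i$ is the standard normal density and $f_i$ is normal with mean $\theta_i$ and unit variance. Then $\lambda_{\mathscr{B}}(n)$ is normal with variance $nq$ and mean $-nq/2$ under $\mathsf{P}_0$ and $+nq/2$ under $\mathsf{P}_{\mathscr{B}}$, where $q=\sum_{i\in\mathscr{B}}\theta_i^2$, so both probabilities in (21) equal $\Phi(-\sqrt{nq}/2)$. The function names and the truncation level `n_terms` are assumptions of this example.

```python
import math

def std_normal_cdf(x):
    """Standard normal distribution function, via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def l_number_gaussian(q, n_terms=10_000):
    """Truncated evaluation of (21) in the Gaussian mean-shift model (an assumption).

    q is the sum of squared mean shifts over the streams in B. Both probabilities
    in (21) equal Phi(-sqrt(n*q)/2), so the bracket is twice that value.
    """
    series = sum(std_normal_cdf(-0.5 * math.sqrt(n * q)) / n
                 for n in range(1, n_terms + 1))
    return math.exp(-2.0 * series)

for q in (0.25, 1.0, 4.0):
    print(q, l_number_gaussian(q))
```

The terms decay roughly like $e^{-nq/8}/n$, so the fixed truncation is adequate for moderate $q$ but would need to be increased for very small $q$. In this model the value lies in $(0,1)$ and increases with $q$.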

Denote by $\widehat{\delta}_*(\bm{\pi})=(\widehat{T}_*(\bm{\pi}),\widehat{d}_*(\bm{\pi}))$ the GSLRT with weights

\[
\pi_{1,\mathscr{B}}=\frac{\pi_{\mathscr{B}}}{\mathcal{L}_{\mathscr{B}}\sum_{\mathscr{B}\in\mathcal{P}}(\pi_{\mathscr{B}}/\mathcal{L}_{\mathscr{B}})}\quad\text{and}\quad \pi_{0,\mathscr{B}}=\frac{\pi_{\mathscr{B}}\,\mathcal{L}_{\mathscr{B}}}{\sum_{\mathscr{B}\in\mathcal{P}}(\pi_{\mathscr{B}}\,\mathcal{L}_{\mathscr{B}})},\quad \mathscr{B}\in\mathcal{P}. \tag{22}
\]
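The reweighting in (22) amounts to dividing (respectively multiplying) each prior weight by its $\mathcal{L}$-number and renormalizing. A minimal Python sketch follows; the function name `corrected_weights` and the numerical values of the weights and $\mathcal{L}$-numbers are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def corrected_weights(pi, l_numbers):
    """Compute the weights of (22) from prior weights pi_B and L-numbers L_B.

    pi, l_numbers: dicts keyed by the same subsets B.
    Returns (pi_1, pi_0), each a dict of weights summing to one.
    """
    ratio = {b: pi[b] / l_numbers[b] for b in pi}      # pi_B / L_B
    product = {b: pi[b] * l_numbers[b] for b in pi}    # pi_B * L_B
    ratio_total = sum(ratio.values())
    product_total = sum(product.values())
    pi_1 = {b: ratio[b] / ratio_total for b in pi}
    pi_0 = {b: product[b] / product_total for b in pi}
    return pi_1, pi_0

# Two alternatives with equal prior weight and illustrative L-numbers.
A, B = frozenset({1}), frozenset({2})
pi_1, pi_0 = corrected_weights({A: 0.5, B: 0.5}, {A: 0.5, B: 0.25})
# pi_1 is approximately {A: 1/3, B: 2/3}; pi_0 is approximately {A: 2/3, B: 1/3}.
```

The alternative with the smaller $\mathcal{L}$-number receives more weight in $\bm{\pi}_1$ and less in $\bm{\pi}_0$.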

The next theorem states that $\widehat{\delta}_*(\bm{\pi})$ is third-order asymptotically optimal, minimizing the weighted expected sample size $\mathsf{E}^{\bm{\pi}}[T]$ to within an $o(1)$ term, where $\mathsf{E}^{\bm{\pi}}$ is expectation with respect to the probability measure $\mathsf{P}^{\bm{\pi}}=\sum_{\mathscr{B}\in\mathcal{P}}\pi_{\mathscr{B}}\,\mathsf{P}_{\mathscr{B}}$, i.e., the weighted expectation $\mathsf{E}^{\bm{\pi}}[\cdot]=\sum_{\mathscr{B}\in\mathcal{P}}\pi_{\mathscr{B}}\,\mathsf{E}_{\mathscr{B}}[\cdot]$.

Theorem 2.

Assume the second moment conditions for LLRs $\mathsf{E}_i|\lambda_i(1)|^2<\infty$ and $\mathsf{E}_0|\lambda_i(1)|^2<\infty$, $i=1,\dots,N$. Let $\alpha_0$ and $\alpha_1$ approach $0$ so that $|\log\alpha_0|/|\log\alpha_1|\to 1$. If thresholds $a_0$ and $a_1$ are selected so that $\widehat{\delta}_*(\bm{\pi})$ belongs to $\mathbb{C}_{\mathcal{P}}(\alpha_0,\alpha_1)$, $\mathsf{P}_0(\widehat{d}_*(\bm{\pi})=1)\sim\alpha_0$, and $\mathsf{P}_1(\widehat{d}_*(\bm{\pi})=0)\sim\alpha_1$, then the GSLRT is asymptotically optimal to third order in the class $\mathbb{C}_{\mathcal{P}}(\alpha_0,\alpha_1)$:

\[
\inf_{\delta\in\mathbb{C}_{\mathcal{P}}(\alpha_0,\alpha_1)}\mathsf{E}^{\bm{\pi}}[T]=\mathsf{E}^{\bm{\pi}}[\widehat{T}_*(\bm{\pi})]+o(1)\quad\text{as}~\alpha_{\max}\to 0.
\]

The central idea of the proof of this result is to consider a purely Bayesian sequential testing problem with the $1+|\mathcal{P}|$ states “$\mathcal{H}_0:\text{density}~g_i$ for all $i=1,\dots,N$” and “$\mathcal{H}_1^{\mathscr{B}}:\text{density}~f_{\mathscr{B}}$ for $\mathscr{B}\in\mathcal{P}$”, and two terminal decisions $d=0$ (accept $\mathcal{H}_0$) and $d=1$ (accept $\mathcal{H}_1=\bigcup_{\mathscr{B}\in\mathcal{P}}\mathcal{H}_1^{\mathscr{B}}$). Lorden’s methods and results can then be exploited to obtain the proof, which would not be possible without Lorden’s (1977a) paper. Moreover, the whole idea of using $\mathcal{L}$-numbers for corrections is based on Lorden’s fundamental contribution to the field.

2.2 Lorden’s (1970) Inequality for the Excess Over the Boundary

Partially motivated by seeking improved estimates of the error probabilities and other operating characteristics of Wald’s SPRT discussed above, Lorden (1970) considered an upper bound for estimating a random walk’s “worst case” expected overshoot

\[
\sup_{a\geqslant 0}\mathsf{E}[R_a], \tag{23}
\]

where $a\geqslant 0$ is the boundary,

\[
R_a=S_{T(a)}-a\quad\mbox{is the overshoot,} \tag{24}
\]

$S_n=\sum_{t=1}^{n}Z_t$ is the random walk, $T(a)=\inf\{n\geqslant 1:\;S_n>a\}$ is the stopping time, and, relaxing slightly our notation from Section 2.1.1, here the $Z_n$ are i.i.d. random variables with positive mean $m$; let $Z$ denote a variate with the same distribution as the $Z_n$. Wald’s (1946) equation tells us that, whenever the following quantities are finite, $m\mathsf{E}[T(a)]=\mathsf{E}[S_{T(a)}]=a+\mathsf{E}[R_a]$, so an upper bound on $\mathsf{E}[R_a]$ provides an upper bound on the expected stopping time $\mathsf{E}[T(a)]$ for the random walk $S_n$ to cross the boundary $a$. This is closely related to estimates of the expected stopping time $\mathsf{E}[T_*]$ of the SPRT in (4), as we shall see below.

Wald (1947) provided the upper bound for (23) of $\sup_{a\geqslant 0}\mathsf{E}[Z-a\mid Z>a]$, which is exact for the exponential distribution and provides reasonable bounds in some other cases, but has serious deficiencies in general: it can be difficult to calculate, is overly conservative in cases like when the distribution of $Z$ has large “gaps,” and may be infinite even when $\mathsf{E}[(Z^+)^2]<\infty$, a sufficient condition for finiteness of (23). Here and throughout this section, $z^+=\max\{z,0\}$ is the positive part of $z$.

For nonnegative $Z$, results from renewal theory (see Feller 1966) provide estimates of $\mathsf{E}[R_a]$ close to $\mathsf{E}[Z^2]/m=\mathsf{E}[(Z^+)^2]/m$ for both $a=0$ and as $a\rightarrow\infty$. Lorden showed that this is indeed an upper bound for (23) more generally, for arbitrary i.i.d. $Z_n$ that may be discrete or continuous and may take both positive and negative values. This generalization of the renewal theory results is necessary for applications to sequential testing and changepoint detection, in which the $Z_n$ are log-likelihood summands or other terms of a sequential test statistic.
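As a small worked illustration (a standard calculation added here, not taken from Lorden’s paper), suppose $Z$ is exponential with rate $\lambda$, so that $m=1/\lambda$ and $\mathsf{E}[Z^2]=2/\lambda^2$. By the memoryless property the overshoot $R_a$ is again exponential with rate $\lambda$ for every $a\geqslant 0$, so Wald’s bound is attained, while the renewal-type quantity $\mathsf{E}[(Z^+)^2]/m$ exceeds the true expected overshoot by a factor of two:

```latex
\[
\mathsf{E}[R_a]=\sup_{a\geqslant 0}\mathsf{E}[Z-a\mid Z>a]=\frac{1}{\lambda},
\qquad
\frac{\mathsf{E}[(Z^+)^2]}{m}=\frac{2/\lambda^2}{1/\lambda}=\frac{2}{\lambda}.
\]
```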

Theorem 3 (Lorden (1970), Theorem 1).

If $Z,Z_1,Z_2,\ldots$ are i.i.d. random variables with mean $\mathsf{E}[Z]>0$ and $\mathsf{E}[(Z^+)^2]<\infty$, then $R_a$ as defined in (24) satisfies

\[
\sup_{a\geqslant 0}\mathsf{E}[R_a]\leqslant\frac{\mathsf{E}[(Z^+)^2]}{\mathsf{E}[Z]}. \tag{25}
\]
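The inequality (25) can be checked numerically. The following is a minimal Monte Carlo sketch, assuming Gaussian increments $Z_n\sim N(\mu,\sigma^2)$ with $\mu>0$; the function names, the parameter values, and the closed form used for $\mathsf{E}[(Z^+)^2]$ are illustrative choices and do not come from Lorden (1970).

```python
import numpy as np
from math import erf, exp, pi, sqrt


def lorden_bound_gaussian(mu, sigma):
    """Right-hand side of (25), E[(Z^+)^2] / E[Z], for Z ~ N(mu, sigma^2), mu > 0."""
    r = mu / sigma
    Phi = 0.5 * (1.0 + erf(r / sqrt(2.0)))      # standard normal cdf at r
    phi = exp(-0.5 * r * r) / sqrt(2.0 * pi)    # standard normal pdf at r
    second_moment_plus = (mu**2 + sigma**2) * Phi + mu * sigma * phi
    return second_moment_plus / mu


def mean_overshoot(a, mu, sigma, n_paths, rng, max_steps=10_000):
    """Monte Carlo estimate of E[R_a], with R_a = S_{T(a)} - a and
    T(a) = inf{n >= 1 : S_n > a}."""
    s = np.zeros(n_paths)                  # current value of each random walk
    active = np.ones(n_paths, dtype=bool)  # paths that have not yet crossed a
    for _ in range(max_steps):
        if not active.any():
            break
        s[active] += rng.normal(mu, sigma, size=active.sum())
        active &= s <= a                   # crossed paths are frozen at S_{T(a)}
    if active.any():
        raise RuntimeError("some paths did not cross; increase max_steps")
    return float(np.mean(s - a))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu, sigma = 0.5, 1.0
    print("Lorden bound:", lorden_bound_gaussian(mu, sigma))
    for a in (0.0, 1.0, 5.0):
        print("a =", a, "estimated E[R_a] =", mean_overshoot(a, mu, sigma, 20_000, rng))
```

Up to Monte Carlo error, each estimated mean overshoot should fall below the printed bound, whatever boundary $a\geqslant 0$ is chosen.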

Lorden’s proof of this theorem involves a number of characteristically clever techniques, of which we highlight a few here. First, he considers the stochastic process $a\mapsto R_a$, noting that (w.p. $1$) it is piecewise-linear, each “piece” having slope $-1$. Next, since $a\mapsto R_a$ and even $a\mapsto\mathsf{E}[R_a]$ can behave erratically and be resistant to estimation and bounding, Lorden uses the smoothing technique of instead estimating $\int_0^b\mathsf{E}[R_a]\,\mathrm{d}a$ for $b\geqslant 0$, which is more regularly behaved, as Lorden shows. Finally, the smoothed expected overshoot $\int_0^b\mathsf{E}[R_a]\,\mathrm{d}a$ is bounded from above using properties of the process $R_a$, and then bounded from below using the following sub-additivity property of the integrand $a\mapsto\mathsf{E}[R_a]$ established from the sub-additivity of $a\mapsto\mathsf{E}[T(a)]$ and Wald’s equation: For any $0\leqslant a\leqslant b$,

\begin{align*}
\mathsf{E}[R_a]+\mathsf{E}[R_{b-a}] &=\mathsf{E}[S_{T(a)}]-a+\mathsf{E}[S_{T(b-a)}]-(b-a)\\
&=m\mathsf{E}[T(a)]+m\mathsf{E}[T(b-a)]-b\quad\mbox{(Wald's equation)}\\
&\geqslant m\mathsf{E}[T(b)]-b\quad\mbox{(sub-additivity of $\mathsf{E}[T(b)]$)}\\
&=\mathsf{E}[S_{T(b)}]-b\quad\mbox{(Wald's equation)}\\
&=\mathsf{E}[R_b].
\end{align*}

Returning to the stopping time $T_*$ of the SPRT in (4), now let the $Z_n$ be the log-likelihood ratio terms as in (3), let $a_0$ and $a_1$ be the boundaries in (4), and take expectation and probability under the alternative hypothesis density $f_1$. The random walk $S_n$ now coincides with the log-likelihood ratio statistic $\lambda_n$ in (3), although we continue to use the $S$ notation here for clarity. In order to relate $T_*$ to $T(a_1)$, Lorden observes that

\[
S_{T_*}\leqslant\min\{S_{T_*},a_1\}+(S_{T(a_1)}-a_1), \tag{26}
\]

and then applying (25) to the latter term gives the upper bound

\[
\mathsf{E}[T_*]\leqslant\frac{(1-\alpha_1)a_1-\alpha_1a_0}{m}+\frac{\mathsf{E}[(Z^+)^2]}{m^2} \tag{27}
\]

on the expected stopping time of the SPRT under the alternative hypothesis, with a bound under the null hypothesis obtained analogously. Wald (1947) provides a well-known upper bound on the type II error probability $\alpha_1$, but in order to apply (27) what is needed is clearly a lower bound on $\alpha_1$, and a lower bound on $\alpha_0$ for the corresponding bound under the null. Both of these can be obtained by another application of Lorden’s theorem, as follows. Wald’s argument gives that

\[
\frac{\alpha_0}{1-\alpha_1}=\mathsf{E}[\exp(-S_{T_*})\mid S_{T_*}>a_1].
\]

Using the conditional Jensen’s inequality with a bound like (26) after multiplying by the indicator of the event $\{S_{T_*}>a_1\}$, Lorden obtains

\[
\frac{\alpha_0}{1-\alpha_1}\geqslant\exp\left[-\mathsf{E}[S_{T_*}\mid S_{T_*}>a_1]\right]\geqslant\exp\left[-\left(a_1+\frac{\mathsf{E}[(Z^+)^2]}{(1-\alpha_1)m}\right)\right].
\]

Using the standard upper bound $\alpha_1\leqslant e^{-a_1}$, this gives

\[
\alpha_0\geqslant(1-e^{-a_1})\exp\left[-\left(a_1+\frac{\mathsf{E}[(Z^+)^2]}{(1-e^{-a_1})m}\right)\right],
\]

with an analogous lower bound for $\alpha_1$.
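The bounds above can be compared with simulation. The sketch below makes several assumptions that are not in the text: the hypotheses are $f_0=N(0,1)$ and $f_1=N(\theta,1)$; the SPRT in (4) is taken to stop at the first exit of $S_n$ from $(-a_0,a_1)$; and the boundaries are chosen equal, $a_0=a_1$, so that by symmetry of the Gaussian problem $\alpha_0=\alpha_1$ and a single simulation under $f_1$ serves for both error probabilities. All function names are hypothetical.

```python
import numpy as np
from math import erf, exp, log, pi, sqrt


def second_moment_plus(mu, sigma):
    """E[(Z^+)^2] for Z ~ N(mu, sigma^2) (closed form)."""
    r = mu / sigma
    Phi = 0.5 * (1.0 + erf(r / sqrt(2.0)))
    phi = exp(-0.5 * r * r) / sqrt(2.0 * pi)
    return (mu**2 + sigma**2) * Phi + mu * sigma * phi


def simulate_sprt_under_f1(theta, a0, a1, n_paths, rng, max_steps=100_000):
    """Simulate the SPRT of N(0,1) vs N(theta,1) on data drawn from N(theta,1).

    Returns (mean stopping time, fraction of paths that exit at the lower
    boundary, i.e. an estimate of alpha_1).
    """
    s = np.zeros(n_paths)                  # log-likelihood ratio random walks
    t = np.zeros(n_paths, dtype=int)       # number of observations used so far
    active = np.ones(n_paths, dtype=bool)
    for _ in range(max_steps):
        if not active.any():
            break
        x = rng.normal(theta, 1.0, size=active.sum())
        s[active] += theta * x - 0.5 * theta**2   # increment log f1(x)/f0(x)
        t[active] += 1
        active &= (s > -a0) & (s < a1)
    if active.any():
        raise RuntimeError("some paths did not stop; increase max_steps")
    return float(t.mean()), float(np.mean(s <= -a0))


def ess_upper_bound(a0, a1, alpha1, m, z2p):
    """Right-hand side of (27)."""
    return ((1.0 - alpha1) * a1 - alpha1 * a0) / m + z2p / m**2


def alpha0_lower_bound(a1, m, z2p):
    """Lower bound on alpha_0 displayed after (27)."""
    c = 1.0 - exp(-a1)
    return c * exp(-(a1 + z2p / (c * m)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    theta, a = 1.0, log(99.0)
    m = 0.5 * theta**2                     # mean of Z under f1
    z2p = second_moment_plus(m, theta)     # Z ~ N(theta^2/2, theta^2) under f1
    mean_t, alpha1_hat = simulate_sprt_under_f1(theta, a, a, 20_000, rng)
    print("simulated E[T*]:", mean_t)
    print("bound (27):     ", ess_upper_bound(a, a, alpha1_hat, m, z2p))
    print("alpha_0 bounds: ", alpha0_lower_bound(a, m, z2p), "<=", alpha1_hat, "<=", exp(-a))
```

Here the simulated error rate is substituted for $\alpha_1$ in (27), which is a convenience of this sketch; in practice one would insert a lower bound on $\alpha_1$, as the text explains.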

Lorden (1970, Section 2) also obtains generalizations of (25) to cases in which the variates $Z_n$ are not necessarily i.i.d. The key property is the sub-additivity of $T(a)$, for which Lorden assumes the sufficient condition

\[
\mathsf{E}[(Z_n^+)^2\mid T(a)\geqslant n]\leqslant r\cdot\mathsf{E}[Z_n\mid T(a)\geqslant n]
\]

for some factor $r$. Under this condition Lorden obtains analogous bounds on $\mathsf{E}[R_a]$ and bounds on the moments $\sup_{a\geqslant 0}\mathsf{E}[(R_a)^p]$ for non-i.i.d. observations $Z_n$ (Lorden 1970, Theorems 2 and 3), as well as bounds on the tail probability $\mathsf{P}(R_a>x)$ for i.i.d. observations (Lorden 1970, Theorem 4).

Other than his seminal 1971 paper on changepoint detection, Lorden (1970) is his most highly cited paper. In addition to its uses in sequential testing, changepoint detection, and renewal theory, it has found applications in reliability theory (Rausand and Hoyland 2003), clinical trial design (Whitehead 1997), finance (Novak 2011), and queuing theory (Kalashnikov 2013), among other applications. Perhaps reflecting its fundamental nature and wealth of applications, Lorden’s Inequality – as (25) has become known – even has its own Wikipedia entry (https://en.wikipedia.org/wiki/Lorden%27s_inequality).

2.3 Lorden’s 2-SPRT and the Kiefer–Weiss Minimax Optimality

Suppose that based on a sequence of independent observations $\{X_n\}_{n\geqslant 1}$ with common parametric density $f_\theta$ one wishes to test the hypothesis $\mathcal{H}_0:\theta=\theta_0$ versus $\mathcal{H}_1:\theta=\theta_1$ ($\theta_0<\theta_1$) with error probabilities at most $\alpha_0$ and $\alpha_1$. Even though the SPRT has the remarkable optimality property of minimizing the expected sample size $\mathsf{E}_{\theta_i}[T]$ under both hypotheses, $i=0,1$, its performance may be poor when the true parameter value $\theta=\vartheta\in(\theta_0,\theta_1)$ differs from the putative values $\theta_0$ and $\theta_1$. Its expected sample size $\mathsf{E}_{\vartheta}[T]$ can even be much larger than the fixed sample size of the Neyman-Pearson test. See, e.g., Section 5.2 in Tartakovsky, Nikiforov, and Basseville (2015). Much work has been directed toward finding sequential tests that reduce the expected sample size of the SPRT for parameter values between the hypotheses.

Let $\mathbb{C}(\alpha_0,\alpha_1)=\{\delta:\alpha_i(\delta)\leqslant\alpha_i,~i=0,1\}$ denote the class of tests with error probabilities at most $\alpha_0$ and $\alpha_1$ and let

\[
\mathsf{ESS}(\alpha_0,\alpha_1)=\inf_{\delta\in\mathbb{C}(\alpha_0,\alpha_1)}\sup_{\theta}\mathsf{E}_{\theta}[T]
\]

denote the expected sample size of an optimal test in the class $\mathbb{C}(\alpha_0,\alpha_1)$ in the worst-case scenario. The problem of finding a test $\delta_0=(T_0,d_0)$ such that $\sup_{\theta}\mathsf{E}_{\theta}[T_0]=\mathsf{ESS}(\alpha_0,\alpha_1)$ subject to the error probability constraints $\alpha_0$ and $\alpha_1$ is known as the Kiefer–Weiss problem. No strictly optimal test has been found so far. Kiefer and Weiss (1957) presented structural results about tests which minimize the expected sample size $\mathsf{E}_{\theta}[T]$ at a selected point $\theta=\vartheta\in(\theta_0,\theta_1)$, which is referred to as the modified Kiefer–Weiss problem. Weiss (1962) proved that the Kiefer–Weiss problem reduces to the modified problem in symmetric cases for normal and binomial distributions. Lorden (1976) made a valuable contribution to the modified Kiefer–Weiss problem for two not necessarily parametric hypotheses $\mathcal{H}_i:\mathsf{P}=\mathsf{P}_i$, $i=0,1$, when the observations $X_1,X_2,\dots$ are i.i.d. and their true distribution $\mathsf{P}_2$ may be different from $\mathsf{P}_0$ and $\mathsf{P}_1$. Lorden (1976) introduced a simple combination of one-sided SPRTs, called the 2-SPRT, and proved that it is third-order asymptotically optimal. Later, Lorden (1980) proved theorems that characterize the basic structure of optimal sequential tests for the modified Kiefer–Weiss problem. His work has inspired several studies of both the modified Kiefer–Weiss problem and the original Kiefer–Weiss problem of minimizing the maximal expected sample size; see, e.g., Huffman (1983), Dragalin and Novikov (1987), and Tartakovsky, Nikiforov, and Basseville (2015, Section 5.3).

Consider the following modified Kiefer–Weiss problem. Let $(\Omega,\mathscr{F},\mathscr{F}_n,\mathsf{P})$, $n\in\mathbb{Z}_+$, be a filtered probability space where the sub-$\sigma$-algebra $\mathscr{F}_n=\sigma(\mathbf{X}^n)$ of $\mathscr{F}$ is generated by the observations $\mathbf{X}^n=\{X_t,\,1\leqslant t\leqslant n\}$. The goal is to test the hypotheses $\mathcal{H}_i:~\mathsf{P}=\mathsf{P}_i$, $i=0,1$, where $\mathsf{P}_0,\mathsf{P}_1$ are given probability measures which are locally mutually absolutely continuous. The true probability measure is either one of the $\mathsf{P}_i$ or an “intermediate” measure $\mathsf{P}_2$ which is also locally absolutely continuous with respect to $\mathsf{P}_i$. Let $\mathsf{Q}^n$ be a dominating measure. The observations are i.i.d. under $\mathsf{P}_0,\mathsf{P}_1,\mathsf{P}_2$ so the sample $\mathbf{X}^n=(X_1,\dots,X_n)$ has joint densities $p_i(\mathbf{X}^n)=\prod_{t=1}^{n}f_i(X_t)$ for $i=0,1,2$ with respect to $\mathsf{Q}^n$, where $f_i(X_t)$, $t\geqslant 1$, are densities for the $t$-th observation.

For $n\in\mathbb{N}$ and $i=0,1$, define the LR and LLR processes

\[
\Lambda_i(n)=\frac{\mathrm{d}\mathsf{P}_2^n}{\mathrm{d}\mathsf{P}_i^n}(\mathbf{X}^n)=\prod_{t=1}^{n}\frac{f_2(X_t)}{f_i(X_t)},\quad\lambda_i(n)=\log\Lambda_i(n)=\sum_{t=1}^{n}\log\left[\frac{f_2(X_t)}{f_i(X_t)}\right],
\]

with $\Lambda_i(0)=1$ and $\lambda_i(0)=0$.

Define two parallel one-sided SPRTs

\[
T_0=\inf\left\{n\geqslant 1:\lambda_1(n)\geqslant a_1\right\},\quad T_1=\inf\left\{n\geqslant 1:\lambda_0(n)\geqslant a_0\right\}. \tag{28}
\]

The stopping time of Lorden’s 2-SPRT (Lorden 1976) is $T^\star=\min(T_0,T_1)$ and the terminal decision is $d^\star=\arg\min_{i=0,1}T_i$. If $a_i=\log(1/\alpha_i)$, $i=0,1$, then $\alpha_i(\delta^\star)=\mathsf{P}_i(d^\star\neq i)\leqslant\alpha_i$, i.e., this test belongs to the class $\mathbb{C}(\alpha_0,\alpha_1)=\{\delta:\alpha_0(\delta)\leqslant\alpha_0,\ \alpha_1(\delta)\leqslant\alpha_1\}$. These upper bounds may be rather conservative.
For example, in the symmetric case $\mathsf{P}_2(d^\star=1)=\mathsf{P}_2(d^\star=0)=1/2$, we have $\alpha_i(\delta^\star)\leqslant\alpha_i/2$.
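The stopping rule (28) and the decision $d^\star$ can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the observations are taken to be unit-variance Gaussian, the function names `gaussian_llr_increment` and `two_sprt` are hypothetical, and a tie (both thresholds crossed at the same $n$) is resolved in favor of $d^\star=0$, a convention not specified in the text.

```python
from typing import Iterable, Optional, Tuple


def gaussian_llr_increment(x: float, theta_num: float, theta_den: float) -> float:
    """log[f_num(x) / f_den(x)] for N(theta, 1) densities (assumed model)."""
    return (theta_num - theta_den) * x - 0.5 * (theta_num**2 - theta_den**2)


def two_sprt(
    xs: Iterable[float],
    theta0: float,
    theta1: float,
    theta2: float,
    a0: float,
    a1: float,
) -> Optional[Tuple[int, int]]:
    """Run the 2-SPRT on the data xs.

    Returns (T_star, d_star), or None if the data run out before stopping.
    lam0 and lam1 are the log-likelihood ratios of P_2 against P_0 and P_1.
    """
    lam0 = 0.0  # lambda_0(n) = log dP_2^n / dP_0^n
    lam1 = 0.0  # lambda_1(n) = log dP_2^n / dP_1^n
    for n, x in enumerate(xs, start=1):
        lam0 += gaussian_llr_increment(x, theta2, theta0)
        lam1 += gaussian_llr_increment(x, theta2, theta1)
        hit_t0 = lam1 >= a1  # T_0 stops: H_1 is rejected, decide d = 0
        hit_t1 = lam0 >= a0  # T_1 stops: H_0 is rejected, decide d = 1
        if hit_t0:
            return n, 0  # assumption: ties go to d = 0
        if hit_t1:
            return n, 1
    return None


if __name__ == "__main__":
    # Data far above theta1 quickly reject H_0.
    print(two_sprt([1.5] * 10, theta0=0.0, theta1=1.0, theta2=0.5, a0=2.0, a1=2.0))
```

Feeding the sketch data that sit exactly at the intermediate mean makes both log-likelihood ratios grow at the same rate, so both thresholds are reached at the same step; this shows that the sample size is bounded even in the least informative case.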

Let $\mathsf{E}_2$ denote expectation under $\mathsf{P}_2$, and let $I_i=\mathsf{E}_2[\lambda_i(1)]$, $i=0,1$, denote the Kullback–Leibler information numbers. The following theorem, proved by Lorden (1976), establishes third-order asymptotic optimality of Lorden’s 2-SPRT for small error probabilities $\alpha_i$. Its proof is based on Bayesian arguments. The theorem also follows from Theorem 1 in Lorden (1977a), which was proved a year later.

Theorem 4.

Let the observations $\{X_n\}_{n\geqslant 1}$ be i.i.d. under $\mathsf{P}_i$, $i=0,1,2$. Assume that the Kullback–Leibler information numbers $I_0$ and $I_1$ are positive and, in addition, that the second-moment conditions $\mathsf{E}_2|\lambda_i(1)|^2<\infty$, $i=0,1$, hold. Let $\alpha^\star_0(a_0,a_1)$ and $\alpha^\star_1(a_0,a_1)$ denote the error probabilities of the 2-SPRT $\delta^\star(a_0,a_1)=(T^\star(a_0,a_1),d^\star(a_0,a_1))$. Let $\mathsf{ESS}(a_0,a_1)$ denote the infimum of the expected sample size $\mathsf{E}_2[T]$ over all tests with $\mathsf{P}_0(d=1)\leqslant\alpha_0^\star(a_0,a_1)$ and $\mathsf{P}_1(d=0)\leqslant\alpha_1^\star(a_0,a_1)$. Then

\[
\mathsf{ESS}(a_0,a_1)=\mathsf{E}_2[T^\star(a_0,a_1)]+o(1)\quad\text{as}~\min(a_0,a_1)\to\infty, \tag{29}
\]

where $o(1)\to 0$ as $\min(a_0,a_1)\to\infty$.

This theorem implies that if the thresholds $a_0$ and $a_1$ in the 2-SPRT are selected so that the error probabilities $\alpha^\star_0(a_0,a_1)$ and $\alpha^\star_1(a_0,a_1)$ are exactly equal to the given values $\alpha_0$ and $\alpha_1$, then the 2-SPRT is third-order asymptotically optimal as $\alpha_{\max}=\max(\alpha_0,\alpha_1)\to 0$ in the class $\mathbb{C}(\alpha_0,\alpha_1)$.
The requirement of exact error probabilities can also be relaxed to the asymptotic equalities $\alpha^\star_i(a_0,a_1)=\alpha_i(1+o(1))$, $i=0,1$.

The significance of this result cannot be overstated, as Lorden’s simple test is nearly optimal. The optimal test itself can be computed using Bellman’s backward induction algorithm, since the optimal sequential test is truncated, meaning it has a bounded maximal sample size, as demonstrated by Kiefer and Weiss (1957). For one-parameter exponential families $\{\mathsf{P}_\theta,\theta\in\Theta\}$, the optimal boundaries are curved in the $(S_n,n)$ plane, where $S_n=\sum_{t=1}^{n}X_t$, and determining them typically entails substantial computation. In contrast, Lorden’s 2-SPRT approximates the optimal curved boundaries with simple linear ones, resulting in a continuation region shaped like a triangle, as illustrated in Figure 2.

Figure 2: The boundaries $h_1^\theta(n)$ and $h_0^\theta(n)$ of the 2-SPRT (solid) and optimal boundaries (dashed) as functions of $n$. Here $S_n^\theta=S_n-n\mathsf{E}_\theta[X_1]$.

Lorden (1976) performed an extensive performance analysis for testing the mean $\theta$ of the Gaussian distribution $X_n\sim\mathcal{N}(\theta,1)$ with the hypotheses $\mathcal{H}_i:\theta=\theta_i$, $i=0,1$, in the symmetric case where $\alpha_0=\alpha_1$ and $\mathsf{P}_2=\mathcal{N}(\theta_\star,1)$, $\theta_\star=(\theta_0+\theta_1)/2$. The conclusion is that the 2-SPRT performs very close to the optimal test with its curved boundaries obtained using backward induction. The efficiency depends on the error probabilities, but it was over $99\%$ in all of his experiments. Similar results were obtained by Huffman (1983) for the exponential example $f_\theta(x)=\theta e^{-\theta x}$, $x\geqslant 0$, $\theta>0$. Here the 2-SPRT has efficiency over $98\%$, and almost always over $99\%$, for a broad range of error probabilities and parameter values.
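To see why the continuation region is a triangle in this Gaussian example, the log-likelihood ratios can be written out explicitly. The following is a worked sketch under the assumptions of unit variance, equal thresholds $a_0=a_1=a$, and $S_n=\sum_{t=1}^n X_t$; the symbol $\Delta=\theta_1-\theta_0>0$ is introduced here only for illustration.

\begin{align*}
\lambda_0(n) &= (\theta_\star-\theta_0)S_n-\tfrac{n}{2}\left(\theta_\star^2-\theta_0^2\right)=\tfrac{\Delta}{2}\,S_n^{\theta_\star}+\tfrac{n\Delta^2}{8},\\
\lambda_1(n) &= (\theta_\star-\theta_1)S_n-\tfrac{n}{2}\left(\theta_\star^2-\theta_1^2\right)=-\tfrac{\Delta}{2}\,S_n^{\theta_\star}+\tfrac{n\Delta^2}{8},
\end{align*}

where $S_n^{\theta_\star}=S_n-n\theta_\star$. Sampling therefore continues as long as

\[
-\frac{2a}{\Delta}+\frac{n\Delta}{4}<S_n^{\theta_\star}<\frac{2a}{\Delta}-\frac{n\Delta}{4}.
\]

The two linear boundaries meet at $n=8a/\Delta^2$, so that $T^\star\leqslant\lceil 8a/\Delta^2\rceil$, and the information numbers are $I_0=I_1=\Delta^2/8$.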

The results of Theorem 4 can be extended to the multiple hypothesis case with $N+1$ hypotheses, $N>1$. Specifically, the modified matrix SPRT is also third-order asymptotically optimal as $\alpha_{\max}=\max_{0\leqslant i\leqslant N}\alpha_i\to 0$ in the class of tests

\[
\mathbb{C}(\bm{\alpha})=\left\{\delta:\alpha_i(\delta)\leqslant\alpha_i~\text{for}~i=0,1,\dots,N\right\},
\]

where $\alpha_i(\delta)=\mathsf{P}_i(d\neq i)$ and $\bm{\alpha}=(\alpha_0,\alpha_1,\dots,\alpha_N)$ is a vector of given error probabilities; cf. Tartakovsky, Nikiforov, and Basseville (2015, Theorem 5.3.3, page 240).

Lorden’s results have sparked further research into the minimax Kiefer–Weiss problem, which aims to establish near-optimal solutions for the least favorable intermediate distribution $\mathsf{P}_2$ within the single-parameter exponential family. Consider the parametric case $\mathsf{P}_2=\mathsf{P}_\theta$, $\mathsf{P}_i=\mathsf{P}_{\theta_i}$, where the hypotheses are $\mathcal{H}_0:\mathsf{P}_0=\mathsf{P}_{\theta_0}$ and $\mathcal{H}_1:\mathsf{P}_1=\mathsf{P}_{\theta_1}$, $\theta_0<\theta_1$.
Let $\theta$ be an arbitrary point in the interval $(\theta_0,\theta_1)$ and let $\delta^\star(\theta)=(d^\star(\theta),T^\star(\theta))$ denote the 2-SPRT tuned to $\theta$. In other words, $T^\star(\theta)=\min(T_0^\theta,T_1^\theta)$, where the $T_i^\theta$’s are defined by (28) with the LLRs $\lambda_i^\theta(n)=\log[\mathrm{d}\mathsf{P}_\theta^n/\mathrm{d}\mathsf{P}_{\theta_i}^n](\mathbf{X}^n)$, $i=0,1$, tuned to $\theta$.

Theorem 4 implies that the 2-SPRT $\delta^\star(\theta)$ is third-order asymptotically optimal for minimizing $\mathsf{E}_\theta[T]$ at the intermediate point $\theta\in(\theta_0,\theta_1)$ when the second moments $\mathsf{E}_\theta|\lambda_0^\theta(1)|^2$ and $\mathsf{E}_\theta|\lambda_1^\theta(1)|^2$ are finite and the thresholds $a_i$ are selected so that the error probabilities are exactly equal, or at least close, to the given numbers $\alpha_i$. Selecting such thresholds is a challenging task.
However, setting $a_i=|\log\alpha_i|$ embeds the 2-SPRT into the class $\mathbb{C}(\alpha_0,\alpha_1)$, and Theorem 4 suggests that if one can find a nearly least favorable point $\theta^*$, i.e., a point such that $\sup_\theta\mathsf{E}_\theta[T^\star(\theta^*)]\approx\mathsf{E}_{\theta^*}[T^\star(\theta^*)]$, then $\delta^\star(\theta^*)$ is an approximate solution to the Kiefer–Weiss problem of minimizing $\sup_\theta\mathsf{E}_\theta[T]$.

For the single-parameter exponential family with densities $\{f_\theta(x),\theta\in\Theta\}$, where

\[
\frac{f_\theta(x)}{f_{\tilde\theta}(x)}=\exp\left\{(\theta-\tilde\theta)x-(b(\theta)-b(\tilde\theta))\right\}, \tag{30}
\]

with $b(\theta)$ being a convex and infinitely differentiable function on $\widetilde{\Theta}\subset\Theta$, it is feasible to identify a nearly least favorable point $\theta^*(\alpha_0,\alpha_1,\theta_0,\theta_1)$ such that the 2-SPRT with thresholds $a_i=\log(1/\alpha_i)$ achieves second-order asymptotic minimaxity, meaning that the residual term in the difference between the expected sample sizes of the optimal test and the 2-SPRT is of order $O(1)$ for small $\alpha_i$. This problem was first addressed by Huffman (1983), who proposed a choice of $\theta^*(\alpha_0,\alpha_1,\theta_0,\theta_1)$ leading to a residual term of order $o(|\log\alpha|^{1/2})$. It was further advanced by Dragalin and Novikov (1987), who demonstrated the second-order optimality of Huffman’s version of the 2-SPRT.

As previously noted, the formulas $a_i=|\log\alpha_i|$, which ensure the inequalities $\alpha_i(\delta^\star(\theta))\leqslant\alpha_i$, tend to be overly conservative. A refinement can be achieved by observing that

\begin{align*}
\alpha_1(\delta^\star(\theta)) &=\mathsf{P}_\theta(T^\star=T_0^\theta)\,e^{-a_1}\,\mathsf{E}_\theta\left\{e^{-(\lambda_1^\theta(T_0^\theta)-a_1)}\,\middle|\,T^\star=T_0^\theta\right\},\\
\alpha_0(\delta^\star(\theta)) &=\mathsf{P}_\theta(T^\star=T_1^\theta)\,e^{-a_0}\,\mathsf{E}_\theta\left\{e^{-(\lambda_0^\theta(T_1^\theta)-a_0)}\,\middle|\,T^\star=T_1^\theta\right\},
\end{align*}

where the exponents involve the nonnegative overshoots $\lambda_i^\theta-a_i$ of the log-likelihood ratios over the thresholds at stopping and, asymptotically as $a_i\to\infty$,

\[
\mathsf{E}_\theta\left\{e^{-(\lambda_0^\theta(T_1^\theta)-a_0)}\,\middle|\,T^\star=T_1^\theta\right\}\to\zeta_0^\theta,\quad \mathsf{E}_\theta\left\{e^{-(\lambda_1^\theta(T_0^\theta)-a_1)}\,\middle|\,T^\star=T_0^\theta\right\}\to\zeta_1^\theta.
\]

For the non-arithmetic case, ζiθsuperscriptsubscript𝜁𝑖𝜃\zeta_{i}^{\theta}italic_ζ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_θ end_POSTSUPERSCRIPT can be computed using the renewal-theoretic argument similar to (11) and (13):

ζiθ=1I(θ,θi)exp{n=11n[𝖯θ(λiθ(n)>0)+𝖯i(λiθ(n)0)]},superscriptsubscript𝜁𝑖𝜃1𝐼𝜃subscript𝜃𝑖superscriptsubscript𝑛11𝑛delimited-[]subscript𝖯𝜃superscriptsubscript𝜆𝑖𝜃𝑛0subscript𝖯𝑖superscriptsubscript𝜆𝑖𝜃𝑛0\zeta_{i}^{\theta}=\frac{1}{I(\theta,\theta_{i})}\exp\left\{-\sum_{n=1}^{% \infty}\frac{1}{n}\left[{\mathsf{P}}_{\theta}(\lambda_{i}^{\theta}(n)>0)+{% \mathsf{P}}_{i}(\lambda_{i}^{\theta}(n)\leqslant 0)\right]\right\},italic_ζ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_θ end_POSTSUPERSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_I ( italic_θ , italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) end_ARG roman_exp { - ∑ start_POSTSUBSCRIPT italic_n = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG italic_n end_ARG [ sansserif_P start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_θ end_POSTSUPERSCRIPT ( italic_n ) > 0 ) + sansserif_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_θ end_POSTSUPERSCRIPT ( italic_n ) ⩽ 0 ) ] } ,

where $I(\theta,\theta_{i})=(\theta-\theta_{i})\dot{b}(\theta)-(b(\theta)-b(\theta_{i}))$ are the Kullback–Leibler numbers.
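As a small numerical illustration of the Kullback–Leibler numbers, the following sketch evaluates $I(\theta,\theta_{i})$ directly from the formula above. It assumes the exponential-family density has the form $f_{\theta}(x)=\exp\{\theta x-b(\theta)\}$ with respect to a dominating measure; the function names and the two example families (unit-variance Gaussian and Bernoulli in its natural parameter) are illustrative choices, not part of the original text.

```python
import math


def kl_number(theta, theta_i, b, b_dot):
    """I(theta, theta_i) = (theta - theta_i) * b'(theta) - (b(theta) - b(theta_i))."""
    return (theta - theta_i) * b_dot(theta) - (b(theta) - b(theta_i))


# Unit-variance Gaussian mean (assumed example): b(theta) = theta^2 / 2.
def b_gauss(theta):
    return 0.5 * theta * theta


def b_dot_gauss(theta):
    return theta


# Bernoulli in the natural (log-odds) parameter (assumed example):
# b(theta) = log(1 + e^theta), b'(theta) = success probability.
def b_bern(theta):
    return math.log1p(math.exp(theta))


def b_dot_bern(theta):
    return 1.0 / (1.0 + math.exp(-theta))


if __name__ == "__main__":
    # Gaussian case reduces to (theta - theta_i)^2 / 2.
    print(kl_number(1.0, 0.0, b_gauss, b_dot_gauss))
    # Bernoulli case: p = 0.5 versus p_i = 0.25.
    print(kl_number(0.0, math.log(0.25 / 0.75), b_bern, b_dot_bern))
```

In the Gaussian case the formula collapses to $(\theta-\theta_{i})^{2}/2$, which gives a quick sanity check on any implementation.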

The efficiency of Lorden’s 2-SPRT in stopping to both reject and accept the null was appealing in clinical trial designs, where these actions are known as efficacy and futility stopping. Lai and Shih (2004) extended the 2-SPRT from the fully sequential setting to the group sequential setting while maintaining its efficiency and balancing the tradeoff between efficacy and futility stopping.

2.4 Near Uniform Optimality of the GLR SPRT for Composite Hypotheses

For practical purposes, it is considerably more important to devise tests that minimize the expected sample size ${\mathsf{E}}_{\theta}[T]$ for all possible parameter values (i.e., uniformly optimal tests) than to address the minimax Kiefer–Weiss problem of minimizing ${\mathsf{E}}_{\theta}[T]$ at a least favorable point. In this section, our primary objective is to explore the design of sequential tests for composite hypotheses that are at least approximately uniformly optimal for small error probabilities or asymptotically Bayesian for a small cost of observations.

Consider a sequence of i.i.d. observations $X_{1},X_{2},\dots$ from a common distribution ${\mathsf{P}}_{\theta}$ with density $f_{\theta}(x)$ with respect to some non-degenerate sigma-finite measure, where the $\ell$-dimensional parameter $\theta=(\theta_{1},\dots,\theta_{\ell})$ belongs to a subset $\Theta$ of the Euclidean space $\mathbb{R}^{\ell}$. The parameter space $\Theta$ is partitioned into three disjoint sets $\Theta_{0}$, $\Theta_{1}$, and ${\mathsf{I}_{\mathrm{in}}}$, i.e., $\Theta=\Theta_{0}\cup\Theta_{1}\cup{\mathsf{I}_{\mathrm{in}}}$. The objective is to test the two composite hypotheses $\mathcal{H}_{0}:\theta\in\Theta_{0}$ against $\mathcal{H}_{1}:\theta\in\Theta_{1}$.
The subset ${\mathsf{I}_{\mathrm{in}}}$ of $\Theta$ denotes an indifference zone where the loss $L(\theta,d)$ associated with correct or incorrect decisions $d$ is zero, i.e., no constraints on the probabilities ${\mathsf{P}}_{\theta}(d=i)$ are imposed if $\theta\in{\mathsf{I}_{\mathrm{in}}}$. The introduction of an indifference zone is typically motivated by the recognition that in many applications, the correct action is not crucial and often not even feasible when the hypotheses are very close. However, in principle ${\mathsf{I}_{\mathrm{in}}}$ may be an empty set.

We aim to find a sequential test $\delta=(T,d)$ that minimizes the expected sample size ${\mathsf{E}}_{\theta}[T]$ uniformly for all $\theta\in\Theta$ in the class of tests ${\mathbb{C}}(\alpha_{0},\alpha_{1})$ in which the maximal error probabilities $\sup_{\theta\in\Theta_{i}}{\mathsf{P}}_{\theta}(d\neq i)$ are upper-bounded by the given values:

\[
{\mathbb{C}}(\alpha_{0},\alpha_{1})=\left\{\delta:\sup_{\theta\in\Theta_{0}}{\mathsf{P}}_{\theta}(d=1)\leqslant\alpha_{0}~~\text{and}~~\sup_{\theta\in\Theta_{1}}{\mathsf{P}}_{\theta}(d=0)\leqslant\alpha_{1}\right\}. \tag{31}
\]

Thus, we are interested in the frequentist problem of finding a test $\delta_{\rm opt}$ such that

\[
\inf_{\delta\in{\mathbb{C}}(\alpha_{0},\alpha_{1})}{\mathsf{E}}_{\theta}[T]={\mathsf{E}}_{\theta}[T_{\rm opt}]\quad\text{uniformly in}~~\theta\in\Theta. \tag{32}
\]

Unfortunately, such a uniformly optimal solution does not exist, and one has to resort to finding asymptotic approximations for small error probabilities. In the frequentist setting, it is possible to find first-order asymptotically optimal tests that satisfy

\[
\lim_{{\alpha_{\rm max}}\to 0}\frac{\inf_{\delta\in{\mathbb{C}}(\alpha_{0},\alpha_{1})}{\mathsf{E}}_{\theta}[T]}{{\mathsf{E}}_{\theta}[T]}=1\quad\text{for all}~~\theta\in\Theta. \tag{33}
\]

In addition to the frequentist problems (32)-(33), it is of interest to consider a Bayesian approach that puts an a priori distribution $W(\theta)$ on $\Theta$, with a cost $c$ per observation and a loss function $L(\theta)$ at the point $\theta$ associated with accepting the incorrect hypothesis, and to find asymptotically optimal tests when the cost $c$ is small. The Bayes average (integrated) risk of a sequential test $\delta=(T,d)$ is

\[
\rho_{c}^{W}(\delta)=\int_{\theta\leqslant\theta_{0}}L(\theta){\mathsf{P}}_{\theta}(d=1)\,W({\mathrm{d}}\theta)+\int_{\theta\geqslant\theta_{1}}L(\theta){\mathsf{P}}_{\theta}(d=0)\,W({\mathrm{d}}\theta)+c\int_{\Theta}{\mathsf{E}}_{\theta}[T]\,W({\mathrm{d}}\theta).
\]
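To make the three terms of the integrated risk concrete, here is a minimal sketch for a prior $W$ supported on finitely many points, so that the integrals become weighted sums. The function name, the argument layout, and the numerical inputs are all hypothetical; in practice the error probabilities and expected sample sizes of a given test would come from analysis or simulation.

```python
def integrated_risk(thetas, weights, loss, prob_d1, prob_d0, ess, theta0, theta1, c):
    """Bayes integrated risk for a discrete prior (illustrative sketch).

    thetas   : support points of the prior W
    weights  : prior masses W({theta})
    loss     : L(theta) for a wrong terminal decision
    prob_d1  : P_theta(d = 1) at each support point
    prob_d0  : P_theta(d = 0) at each support point
    ess      : E_theta[T] at each support point
    """
    risk = 0.0
    for th, w, l, p1, p0, n in zip(thetas, weights, loss, prob_d1, prob_d0, ess):
        if th <= theta0:
            risk += l * p1 * w  # wrongly accepting H1 when theta <= theta0
        elif th >= theta1:
            risk += l * p0 * w  # wrongly accepting H0 when theta >= theta1
        risk += c * n * w  # sampling cost is paid everywhere, including the indifference zone
    return risk


if __name__ == "__main__":
    # Made-up inputs: three support points, the middle one in the indifference zone.
    r = integrated_risk(
        thetas=[-1.0, 0.0, 1.0],
        weights=[0.25, 0.5, 0.25],
        loss=[1.0, 1.0, 1.0],
        prob_d1=[0.01, 0.5, 0.99],
        prob_d0=[0.99, 0.5, 0.01],
        ess=[10.0, 40.0, 10.0],
        theta0=-0.5,
        theta1=0.5,
        c=0.001,
    )
    print(r)
```

Note that points in the indifference zone contribute only through the sampling-cost term, which matches the integration limits in the display above.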

It turns out that in the Bayesian context, it is possible to find tests that are not only asymptotically (as $c\to 0$) first-order optimal, i.e., $\inf_{\delta}\rho_{c}^{W}(\delta)=\rho_{c}^{W}(\delta)(1+o(1))$, but also second-order optimal, i.e., $\inf_{\delta}\rho_{c}^{W}(\delta)=\rho_{c}^{W}(\delta)+O(c)$, and even third-order optimal, i.e., $\inf_{\delta}\rho_{c}^{W}(\delta)=\rho_{c}^{W}(\delta)+o(c)$.

In the case of the one-parameter exponential family (30), using optimal stopping theory, it can be shown that the optimal Bayesian test $\delta_{\rm opt}=(T_{\rm opt},d_{\rm opt})$ is

\[
T_{\rm opt}=\inf\left\{n\geqslant 1:(S_{n},n)\in\mathcal{B}_{c}\right\},\quad d_{\rm opt}=j~~\text{if}~(S_{n},n)\in\mathcal{B}_{c}^{j},\quad j=0,1,
\]

where $S_{n}=X_{1}+\cdots+X_{n}$ and $\mathcal{B}_{c}=\mathcal{B}_{c}^{0}\cup\mathcal{B}_{c}^{1}$ is a set that can be found numerically.

Schwarz (1962) derived the test $\delta^{\star}(\hat{\theta})$, with $\hat{\theta}=\{\hat{\theta}_{n}\}$ being the maximum likelihood estimator (MLE) of $\theta$, as an asymptotic solution as $c\to 0$ to the Bayesian problem with the $0-1$ loss function. Specifically, the a posteriori risk of stopping is

\[
R_{n}^{\rm st}(S_{n})=\min_{i=0,1}\left\{\frac{\int_{\Theta_{i}}\exp\left\{\theta S_{n}-nb(\theta)\right\}\,W({\mathrm{d}}\theta)}{\int_{\Theta}\exp\left\{\theta S_{n}-nb(\theta)\right\}\,W({\mathrm{d}}\theta)}\right\}, \tag{34}
\]

where $\Theta_{0}=\{\theta\leqslant\theta_{0}\}$, $\Theta_{1}=\{\theta\geqslant\theta_{1}\}$. Schwarz showed that $\mathcal{B}_{c}/|\log c|\to\mathcal{B}_{0}$ as $c\to 0$ and proposed a simple procedure: continue sampling until $R_{n}^{\rm st}(S_{n})$ is less than $c$ and upon stopping accept the hypothesis for which the minimum is attained in (34). Denote this procedure by $\widetilde{\delta}(c)=(\widetilde{T}(c),\widetilde{d}(c))$. Applying Laplace’s asymptotic integration method to evaluate the integrals in (34) leads to the likelihood ratio test where the true parameter is replaced by the MLE $\hat{\theta}_{n}$.
This approximation prescribes stopping sampling at the time $\widehat{T}(\hat{\theta})=\min(\widehat{T}_{0}(\hat{\theta}),\widehat{T}_{1}(\hat{\theta}))$, where

\[
\widehat{T}_{i}(\hat{\theta})=\inf\left\{n:\sup_{\theta\in\Theta}[\theta S_{n}-nb(\theta)]-[\theta_{i}S_{n}-nb(\theta_{i})]\geqslant|\log c|\right\}. \tag{35}
\]

The terminal decision rule $\hat{d}(\hat{\theta})$ of the test $\hat{\delta}(\hat{\theta})=(\widehat{T}(\hat{\theta}),\hat{d}(\hat{\theta}))$ accepts $\mathcal{H}_{0}$ if $\hat{\theta}_{\widehat{T}}<\theta^{*}$, where $\theta^{*}$ is such that $I(\theta^{*},\theta_{0})=I(\theta^{*},\theta_{1})$. Note also that

\[
\widehat{T}=\inf\left\{n\geqslant 1:n\max[I(\hat{\theta}_{n},\theta_{0}),I(\hat{\theta}_{n},\theta_{1})]\geqslant|\log c|\right\}. \tag{36}
\]
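The step from (35) to (36) can be sketched as follows, under the assumption (not stated explicitly above) that the supremum in (35) is attained at an MLE $\hat{\theta}_{n}$ in the interior of $\Theta$, so that $\dot{b}(\hat{\theta}_{n})=S_{n}/n$:

```latex
\begin{aligned}
\sup_{\theta\in\Theta}[\theta S_{n}-nb(\theta)]-[\theta_{i}S_{n}-nb(\theta_{i})]
 &=(\hat{\theta}_{n}-\theta_{i})S_{n}-n[b(\hat{\theta}_{n})-b(\theta_{i})]\\
 &=n\left[(\hat{\theta}_{n}-\theta_{i})\dot{b}(\hat{\theta}_{n})-(b(\hat{\theta}_{n})-b(\theta_{i}))\right]\\
 &=nI(\hat{\theta}_{n},\theta_{i}).
\end{aligned}
```

Hence $\widehat{T}_{i}$ is the first $n$ with $nI(\hat{\theta}_{n},\theta_{i})\geqslant|\log c|$, and taking the minimum over $i=0,1$ amounts to replacing $I(\hat{\theta}_{n},\theta_{i})$ by the larger of the two Kullback–Leibler numbers.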

The tests which use the maximum likelihood estimators of unknown parameters are usually referred to as the Generalized Sequential Likelihood Ratio Tests (GSLRT).
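The stopping rule (36) is straightforward to run on data. The following is a minimal sketch for the unit-variance Gaussian mean, an assumed example in which $b(\theta)=\theta^{2}/2$, the unrestricted MLE is the sample mean, $I(\theta,\theta_{i})=(\theta-\theta_{i})^{2}/2$, and $\theta^{*}=(\theta_{0}+\theta_{1})/2$. The function name and the input values are hypothetical.

```python
import math


def gslrt_gaussian(xs, theta0, theta1, c):
    """Apply rule (36) to unit-variance Gaussian observations (illustrative sketch).

    Returns (stopping time, decision) with decision 0 meaning "accept H0" and
    1 meaning "accept H1", or None if the data run out before the rule stops.
    """
    threshold = abs(math.log(c))
    theta_star = 0.5 * (theta0 + theta1)  # equalizes the two KL numbers in the Gaussian case
    s = 0.0
    for n, x in enumerate(xs, start=1):
        s += x
        mle = s / n  # unrestricted MLE of the mean
        i0 = 0.5 * (mle - theta0) ** 2  # I(mle, theta0)
        i1 = 0.5 * (mle - theta1) ** 2  # I(mle, theta1)
        if n * max(i0, i1) >= threshold:
            decision = 0 if mle < theta_star else 1
            return n, decision
    return None


if __name__ == "__main__":
    # Deterministic toy data so the outcome is easy to verify by hand.
    print(gslrt_gaussian([1.0] * 50, theta0=0.0, theta1=0.5, c=0.01))
```

With constant observations equal to 1, the statistic grows as $n/2$, so the rule stops at the first $n$ with $n/2\geqslant|\log 0.01|\approx 4.61$, i.e., at $n=10$, and accepts $\mathcal{H}_{1}$.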

Wong (1968) showed that the GSLRT $\hat{\delta}$ is first-order asymptotically Bayes as $c\to 0$:

\[
\rho_{c}^{W}(\hat{\delta})\sim\inf_{\delta}\rho_{c}^{W}(\delta)\sim c|\log c|\int_{\Theta}\frac{W({\mathrm{d}}\theta)}{I_{\max}(\theta)},\quad{\mathsf{E}}_{\theta}[\widehat{T}]\sim\frac{|\log c|}{I_{\max}(\theta)}\quad\text{for every}~~\theta\in\Theta,
\]

where $I_{\max}(\theta)=\max\left\{I(\theta,\theta_{0}),I(\theta,\theta_{1})\right\}$.
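To get a feel for the first-order approximation ${\mathsf{E}}_{\theta}[\widehat{T}]\approx|\log c|/I_{\max}(\theta)$, the sketch below evaluates it for the unit-variance Gaussian mean, where $I(\theta,\theta_{i})=(\theta-\theta_{i})^{2}/2$. The Gaussian model, the function name, and the numerical values are assumptions made for illustration; the approximation is only asymptotic and may be rough for moderate $c$.

```python
import math


def first_order_ess(theta, theta0, theta1, c):
    """First-order approximation |log c| / I_max(theta), unit-variance Gaussian case."""
    i_max = max(0.5 * (theta - theta0) ** 2, 0.5 * (theta - theta1) ** 2)
    return abs(math.log(c)) / i_max


if __name__ == "__main__":
    theta0, theta1, c = 0.0, 0.5, 0.01
    for theta in (-0.5, 0.0, 0.25, 0.5, 1.0):
        print(theta, first_order_ess(theta, theta0, theta1, c))
```

In this example the approximation is largest at the midpoint $\theta=0.25$, where the two Kullback–Leibler numbers coincide and $I_{\max}$ is smallest.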

Kiefer and Sacks (1963) showed that the procedure $\widetilde{\delta}(c)=(\widetilde{T}(c),\widetilde{d}(c))$ with the stopping time $\widetilde{T}(c)=\inf\left\{n\geqslant 1:R_{n}^{\rm st}(S_{n})\leqslant c\right\}$, proposed by Schwarz (1962), is also first-order asymptotically Bayes. In other words, for any prior distribution $W$, $\rho_{c}^{W}(\widetilde{\delta}(c))$ behaves asymptotically like $\inf_{\delta}\rho_{c}^{W}(\delta)$ as $c\to 0$.
Lorden (1967) refined this result by stopping at the first $n$ such that $R_{n}^{\rm st}(S_{n})\leqslant Qc$, where $Q$ is a positive constant, and demonstrated that the resulting test can be made second-order asymptotically optimal, i.e., $\inf_{\delta}\rho_{c}^{W}(\delta)=\rho_{c}^{W}(\widetilde{\delta}(Qc))+O(c)$ as $c\to 0$, while $\inf_{\delta}\rho_{c}^{W}(\delta)=O(c|\log c|)$. Notably, the problem addressed by Lorden (1967) is more general than the one discussed here, since it encompasses general i.i.d. models, not limited to exponential families, and multiple-decision cases. See also Lorden (1972) for multiple hypotheses in one-parameter exponential families.
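The posterior-risk stopping rule can be sketched in a few lines when the prior is supported on a finite grid, so that the integrals in (34) become sums. The code below is an illustration under several assumptions: a discrete prior with strictly positive weights, a unit-variance Gaussian model by default ($b(\theta)=\theta^{2}/2$), and hypothetical function and parameter names. On stopping, it accepts the hypothesis whose competitor carries the smaller posterior mass, which is one reading of the terminal decision described above.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def b_gauss(theta):
    """Log-partition function of the unit-variance Gaussian mean (assumed default)."""
    return 0.5 * theta * theta


def posterior_risk_rule(xs, grid, weights, theta0, theta1, c, q=1.0, b=b_gauss):
    """Stop at the first n with R_n^st(S_n) <= q * c (illustrative sketch).

    Returns (stopping time, decision, risk at stopping) or None if the data run out.
    """
    s = 0.0
    for n, x in enumerate(xs, start=1):
        s += x
        # Log of prior weight times likelihood kernel exp{theta * S_n - n * b(theta)}.
        log_terms = [math.log(w) + t * s - n * b(t) for t, w in zip(grid, weights)]
        log_total = log_sum_exp(log_terms)
        part0 = [lt for t, lt in zip(grid, log_terms) if t <= theta0]
        part1 = [lt for t, lt in zip(grid, log_terms) if t >= theta1]
        mass0 = math.exp(log_sum_exp(part0) - log_total) if part0 else 0.0
        mass1 = math.exp(log_sum_exp(part1) - log_total) if part1 else 0.0
        risk = min(mass0, mass1)
        if risk <= q * c:
            # Small posterior mass on Theta_0 means accepting H1 is the low-risk decision.
            decision = 1 if mass0 <= mass1 else 0
            return n, decision, risk
    return None


if __name__ == "__main__":
    data = [1.0] * 20  # deterministic toy data
    print(posterior_risk_rule(data, [-0.5, 0.5], [0.5, 0.5], -0.5, 0.5, c=0.01))
    print(posterior_risk_rule(data, [-0.5, 0.5], [0.5, 0.5], -0.5, 0.5, c=0.01, q=2.0))
```

Setting `q=1.0` gives the rule with threshold $c$, while other values of `q` correspond to the modified threshold $Qc$; the choice of $Q$ that yields second-order optimality is not reproduced here.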

A significant advancement in Bayesian theory for testing separated hypotheses about the parameter of the one-parameter exponential family (30) was made by Lorden (1977b) (an unpublished manuscript). In this work, Lorden demonstrated that the family of GSLRTs can be devised to ensure third-order asymptotic optimality. This implies that they achieve the Bayes risk to within $o(c)$ as $c\to 0$.

Lorden provided sufficient conditions for families of tests to be third-order asymptotically Bayes and presented examples of such procedures based not only on the Generalized Likelihood Ratio (GLR) approach but also on mixtures of likelihood ratios. Furthermore, the error probabilities of the GSLRTs were evaluated asymptotically as a consequence of a general theorem on boundary-crossing probabilities.

Given the significance of this work, we give a more detailed overview of Lorden’s theory. The paper by Lorden (1977b) extends the results obtained by Lorden (1977a) for multiple discrete cases, discussed in Subsection 2.1.2, to the continuous parameter case.

The hypotheses to be tested are $\mathcal{H}_{0}:\underline{\theta}\leqslant\theta\leqslant\theta_{0}$ and $\mathcal{H}_{1}:\overline{\theta}\geqslant\theta\geqslant\theta_{1}$, where $\underline{\theta}$ and $\overline{\theta}$ are interior points of the natural parameter space $\Theta$. Let $\hat{\theta}_{n}\in[\underline{\theta},\overline{\theta}]$ be the MLE that maximizes the likelihood over $\theta$ in $[\underline{\theta},\overline{\theta}]$. Lorden’s GSLRT stops at $\widehat{T}$, which is the minimum of the Markov times $\widehat{T}_{0},\widehat{T}_{1}$ defined as

\[
\begin{aligned}
\widehat{T}_{0}(\hat{\theta})&=\inf\left\{n\geqslant 1:\sum_{k=1}^{n}\log\left[\frac{f_{\hat{\theta}_{n}}(X_{k})}{f_{\theta_{0}}(X_{k})}~h_{0}(\hat{\theta}_{n})\right]\geqslant a~\text{and}~\hat{\theta}_{n}\geqslant\theta^{*}\right\},\\
\widehat{T}_{1}(\hat{\theta})&=\inf\left\{n\geqslant 1:\sum_{k=1}^{n}\log\left[\frac{f_{\hat{\theta}_{n}}(X_{k})}{f_{\theta_{1}}(X_{k})}~h_{1}(\hat{\theta}_{n})\right]\geqslant a~\text{and}~\hat{\theta}_{n}\leqslant\theta^{*}\right\},
\end{aligned} \tag{37}
\]

where $a$ is a threshold, $\theta^{*}$ satisfies $I(\theta^{*},\theta_{0})=I(\theta^{*},\theta_{1})$, and $h_{0},h_{1}$ are positive continuous functions on $[\theta^{*},\overline{\theta}]$ and $[\underline{\theta},\theta^{*}]$, respectively. The hypothesis $\mathcal{H}_{i}$ is rejected when $\widehat{T}=\widehat{T}_{i}$. To summarize, Lorden’s family of GSLRTs is defined as

T^(θ^)=min{T^0(θ^),T^1(θ^)},d^={0ifT^(θ^)=T^1(θ^)1ifT^(θ^)=T^0(θ^),formulae-sequence^𝑇^𝜃subscript^𝑇0^𝜃subscript^𝑇1^𝜃^𝑑cases0if^𝑇^𝜃subscript^𝑇1^𝜃1if^𝑇^𝜃subscript^𝑇0^𝜃\widehat{T}(\hat{\theta})=\min\left\{\widehat{T}_{0}(\hat{\theta}),\widehat{T}% _{1}(\hat{\theta})\right\},\quad\hat{d}=\begin{cases}0&\text{if}~{}~{}\widehat% {T}(\hat{\theta})=\widehat{T}_{1}(\hat{\theta})\\ 1&\text{if}~{}\widehat{T}(\hat{\theta})=\widehat{T}_{0}(\hat{\theta})\end{% cases},over^ start_ARG italic_T end_ARG ( over^ start_ARG italic_θ end_ARG ) = roman_min { over^ start_ARG italic_T end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( over^ start_ARG italic_θ end_ARG ) , over^ start_ARG italic_T end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( over^ start_ARG italic_θ end_ARG ) } , over^ start_ARG italic_d end_ARG = { start_ROW start_CELL 0 end_CELL start_CELL if over^ start_ARG italic_T end_ARG ( over^ start_ARG italic_θ end_ARG ) = over^ start_ARG italic_T end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( over^ start_ARG italic_θ end_ARG ) end_CELL end_ROW start_ROW start_CELL 1 end_CELL start_CELL if over^ start_ARG italic_T end_ARG ( over^ start_ARG italic_θ end_ARG ) = over^ start_ARG italic_T end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( over^ start_ARG italic_θ end_ARG ) end_CELL end_ROW , (38)

with the T^i(θ^)subscript^𝑇𝑖^𝜃\widehat{T}_{i}(\hat{\theta})over^ start_ARG italic_T end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( over^ start_ARG italic_θ end_ARG )’s in (37).

Denote by

$$\lambda_{n}(\theta,\theta_{i})=\sum_{k=1}^{n}\log\left[\frac{f_{\theta}(X_{k})}{f_{\theta_{i}}(X_{k})}\right]=(\theta-\theta_{i})S_{n}-[b(\theta)-b(\theta_{i})]\,n$$

the LLR between the points $\theta$ and $\theta_{i}$.
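The two expressions for the LLR above can be checked numerically. The following is a minimal sketch, assuming for illustration the unit-variance Gaussian family $N(\theta,1)$, for which $b(\theta)=\theta^{2}/2$; the function names and the sample values are hypothetical and not taken from Lorden’s paper.

```python
import math


def llr_sum(xs, theta, theta_i, log_density):
    """LLR computed as a sum of per-observation log density ratios."""
    return sum(log_density(x, theta) - log_density(x, theta_i) for x in xs)


def llr_closed_form(xs, theta, theta_i, b):
    """LLR computed as (theta - theta_i) * S_n - [b(theta) - b(theta_i)] * n."""
    n = len(xs)
    s_n = sum(xs)
    return (theta - theta_i) * s_n - (b(theta) - b(theta_i)) * n


# Illustrative assumption: N(theta, 1) observations, so b(theta) = theta^2 / 2.
def gaussian_b(theta):
    return 0.5 * theta * theta


def gaussian_log_density(x, theta):
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (x - theta) ** 2


xs = [0.3, -1.2, 0.8, 1.5, 0.1]  # made-up sample, S_n = 1.5
lam_sum = llr_sum(xs, 0.7, -0.4, gaussian_log_density)
lam_closed = llr_closed_form(xs, 0.7, -0.4, gaussian_b)
```

Both routes should agree up to floating-point rounding, which reflects the fact that in an exponential family the LLR depends on the data only through $S_{n}$ and $n$.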

Lorden assumes that the prior distribution $W(\theta)$ has a continuous density $w(\theta)$ that is positive on $[\underline{\theta},\overline{\theta}]$, and that the loss $L(\theta)$ equals zero in the indifference zone $(\theta_{0},\theta_{1})$, is continuous and positive elsewhere, and is bounded away from $0$ on $[\underline{\theta},\theta_{0}]\cup[\theta_{1},\overline{\theta}]$. The main results in (Lorden 1977b, Theorem 1) can be briefly outlined as follows.

(i)

Under these assumptions, the family of GSLRTs defined by (37)–(38) with $a=|\log c|-\tfrac{1}{2}\log|\log c|$ is second-order asymptotically optimal, i.e.,

$$\rho_{c}^{w}(\hat{\delta})=\inf_{\delta}\rho_{c}^{w}(\delta)+O(c)\quad\text{as}~c\to 0,$$

where $\rho_{c}^{w}(\delta)$ is the average risk of the test $\delta=(T,d)$:

$$\rho_{c}^{w}(\delta)=\int_{\underline{\theta}}^{\theta_{0}}L(\theta)\,\mathsf{P}_{\theta}(d=1)\,w(\theta)\,\mathrm{d}\theta+\int_{\theta_{1}}^{\overline{\theta}}L(\theta)\,\mathsf{P}_{\theta}(d=0)\,w(\theta)\,\mathrm{d}\theta+c\int_{\underline{\theta}}^{\overline{\theta}}\mathsf{E}_{\theta}[T]\,w(\theta)\,\mathrm{d}\theta.$$
(ii)

This result can be improved from $O(c)$ to $o(c)$, i.e., to third-order asymptotic optimality,

$$\rho_{c}^{w}(\hat{\delta})=\inf_{\delta}\rho_{c}^{w}(\delta)+o(c)\quad\text{as}~c\to 0,$$

by choosing the functions $h_{0}$ and $h_{1}$ as

$$h_{i}(\theta)=\sqrt{\frac{2\pi}{I^{3}(\theta,\theta_{i})\,\ddot{b}(\theta)}}~\frac{w(\theta)\,|\dot{b}(\theta)-\dot{b}(\theta_{i})|}{w(\theta_{i})\,L(\theta_{i})\,\zeta(\theta,\theta_{i})},\quad i=0,1,$$

where $\zeta(\theta,\theta_{i})=\mathcal{L}(\theta,\theta_{i})/I(\theta,\theta_{i})$ is a correction for the overshoot over the boundary, a quantity studied in renewal theory. Specifically,

$$\zeta(\theta,\theta_{i})=\lim_{a\to\infty}\mathsf{E}_{\theta}\exp\left\{-[\lambda_{\tau_{a}}(\theta,\theta_{i})-a]\right\},\quad\tau_{a}=\inf\left\{n:\lambda_{n}(\theta,\theta_{i})\geqslant a\right\},\tag{39}$$

where in the non-arithmetic case $\zeta(\theta,\theta_{i})$ can be computed as

$$\zeta(\theta,\theta_{i})=\frac{1}{I(\theta,\theta_{i})}\exp\left\{-\sum_{n=1}^{\infty}\frac{1}{n}\left[\mathsf{P}_{\theta}(\lambda_{n}(\theta,\theta_{i})\leqslant 0)+\mathsf{P}_{\theta_{i}}(\lambda_{n}(\theta,\theta_{i})>0)\right]\right\}.\tag{40}$$
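To make the series (40) concrete, here is a minimal numerical sketch under an illustrative assumption: observations are $N(\theta,1)$, so that $I(\theta,\theta_{i})=\Delta^{2}/2$ with $\Delta=|\theta-\theta_{i}|$, and $\lambda_{n}(\theta,\theta_{i})$ is normal with mean $\pm nI$ and variance $2nI$ under $\theta$ and $\theta_{i}$, respectively. Both probabilities in (40) then equal $\Phi(-\Delta\sqrt{n}/2)$. The function names, the truncation tolerance, and the term cap are hypothetical choices.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def zeta_gaussian(delta, tol=1e-15, max_terms=10_000_000):
    """Evaluate the series (40) for N(theta, 1), with delta = |theta - theta_i|.

    Assumption: unit-variance Gaussian observations, so that both
    probabilities in (40) reduce to Phi(-delta * sqrt(n) / 2).
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    info = 0.5 * delta * delta  # Kullback-Leibler information I(theta, theta_i)
    series = 0.0
    for n in range(1, max_terms + 1):
        term = 2.0 * norm_cdf(-0.5 * delta * math.sqrt(n)) / n
        series += term
        if term < tol:  # terms decay like Gaussian tails, so stop early
            break
    return math.exp(-series) / info


zeta_half = zeta_gaussian(0.5)
zeta_one = zeta_gaussian(1.0)
zeta_two = zeta_gaussian(2.0)
```

In this Gaussian sketch the values lie in $(0,1)$ and decrease as $\Delta$ grows, consistent with reading $\zeta$ as the limiting expected value of $e^{-\text{overshoot}}$ in (39): larger steps produce larger overshoots.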

Since the Bayes average risk $\inf_{\delta}\rho_{c}^{w}(\delta)$ is of order $c|\log c|$, this implies that the relative excess risk $[\rho_{c}^{w}(\hat{\delta})-\inf_{\delta}\rho_{c}^{w}(\delta)]/\rho_{c}^{w}(\hat{\delta})$ of Lorden’s test is $o(1/|\log c|)$, so its asymptotic relative efficiency $\mathcal{E}_{c}=\inf_{\delta}\rho_{c}^{w}(\delta)/\rho_{c}^{w}(\hat{\delta})$ equals $1-o(1/|\log c|)$ as $c\to 0$.

Note the crucial difference between Schwarz’s GSLRT (35) and Lorden’s GSLRT (38). In the Schwarz test, $h_{i}\equiv 1$ and the threshold is set as $a=|\log c|$. In the Lorden test, two innovations emerge: first, the threshold is reduced by $\tfrac{1}{2}\log|\log c|$, and second, adaptive weights $h_{i}(\hat{\theta}_{n})$ are incorporated into the GLR statistic. Since the stopping times $\widehat{T}_{i}$ can be written as

$$\begin{aligned}\widehat{T}_{0}(\hat{\theta})&=\inf\left\{n\geqslant 1:\lambda_{n}(\hat{\theta}_{n},\theta_{0})\geqslant a-\log h_{0}(\hat{\theta}_{n})~\text{and}~\hat{\theta}_{n}\geqslant\theta^{*}\right\},\\ \widehat{T}_{1}(\hat{\theta})&=\inf\left\{n\geqslant 1:\lambda_{n}(\hat{\theta}_{n},\theta_{1})\geqslant a-\log h_{1}(\hat{\theta}_{n})~\text{and}~\hat{\theta}_{n}\leqslant\theta^{*}\right\},\end{aligned}\tag{41}$$

Lorden’s GSLRT can alternatively be viewed as the GSLRT with curved adaptive boundaries

$$a_{i}(\hat{\theta}_{n})=|\log c|-\tfrac{1}{2}\log|\log c|-\log h_{i}(\hat{\theta}_{n}),\quad i=0,1,$$

which depend on the behavior of the MLE $\hat{\theta}_{n}$. These two innovations render this modification of the GLR test nearly optimal.
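The stopping rule (41) can be sketched in a few lines. The following is an illustration under stated assumptions rather than Lorden’s procedure in full: observations are $N(\theta,1)$ (so $b(\theta)=\theta^{2}/2$, $\theta^{*}=(\theta_{0}+\theta_{1})/2$, and the MLE is the sample mean clipped to $[\underline{\theta},\overline{\theta}]$), the weights default to $h_{i}\equiv 1$, and ties at $\hat{\theta}_{n}=\theta^{*}$ are resolved by checking $\widehat{T}_{0}$ first. All names are hypothetical.

```python
import math


def lorden_threshold(c):
    """Reduced threshold a = |log c| - (1/2) log|log c| for small c > 0."""
    abs_log_c = abs(math.log(c))
    return abs_log_c - 0.5 * math.log(abs_log_c)


def gaussian_llr(mle, theta_i, s_n, n):
    """lambda_n(mle, theta_i) for N(theta, 1), where b(theta) = theta^2 / 2."""
    return (mle - theta_i) * s_n - 0.5 * (mle ** 2 - theta_i ** 2) * n


def gslrt_gaussian(xs, theta0, theta1, theta_lo, theta_hi, a,
                   h0=lambda t: 1.0, h1=lambda t: 1.0):
    """Apply the stopping rule (41) to the sequence xs.

    Returns (n, d): the stopping time and the terminal decision, with
    d = 1 when T_0 stops first (H_0 rejected) and d = 0 when T_1 stops
    first. Returns None if the data run out before either boundary is hit.
    """
    theta_star = 0.5 * (theta0 + theta1)  # I(t, theta0) = I(t, theta1) here
    s_n = 0.0
    for n, x in enumerate(xs, start=1):
        s_n += x
        mle = min(max(s_n / n, theta_lo), theta_hi)  # constrained MLE
        if mle >= theta_star and \
                gaussian_llr(mle, theta0, s_n, n) >= a - math.log(h0(mle)):
            return n, 1
        if mle <= theta_star and \
                gaussian_llr(mle, theta1, s_n, n) >= a - math.log(h1(mle)):
            return n, 0
    return None


a_schwarz = abs(math.log(1e-3))   # Schwarz: a = |log c|
a_lorden = lorden_threshold(1e-3)  # Lorden: reduced by (1/2) log|log c|
up = gslrt_gaussian([1.0] * 50, -0.5, 0.5, -2.0, 2.0, 5.0)
down = gslrt_gaussian([-1.0] * 50, -0.5, 0.5, -2.0, 2.0, 5.0)
flat = gslrt_gaussian([0.0] * 10, -0.5, 0.5, -2.0, 2.0, 5.0)
```

Passing non-constant functions for `h0` and `h1` turns the flat boundary into the curved adaptive boundaries $a_{i}(\hat{\theta}_{n})$; with the defaults and $a=|\log c|$, the sketch reduces to a Schwarz-type test.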

Given the complexity of Lorden’s formal mathematical proof, we offer a heuristic sketch that captures the main ideas of the approach. The Bayesian perspective naturally guides us toward the mixture LR statistics

$$\bar{\Lambda}_{n}^{i}=\frac{\int_{\underline{\theta}}^{\overline{\theta}}e^{\theta S_{n}-nb(\theta)}w(\theta)\,\mathrm{d}\theta}{\int_{\Theta_{i}}L(\theta)e^{\theta S_{n}-nb(\theta)}w(\theta)\,\mathrm{d}\theta},\quad i=0,1,$$

where $\Theta_{0}=[\underline{\theta},\theta_{0}]$, $\Theta_{1}=[\theta_{1},\overline{\theta}]$, and $L(\theta)=1$ for the simple $0$-$1$ loss function. Indeed, the a posteriori stopping risk is given by

$$R_{n}^{\rm st}(S_{n})=\min_{i=0,1}\left\{\frac{\int_{\Theta_{i}}L(\theta)e^{\theta S_{n}-nb(\theta)}w(\theta)\,\mathrm{d}\theta}{\int_{\underline{\theta}}^{\overline{\theta}}e^{\theta S_{n}-nb(\theta)}w(\theta)\,\mathrm{d}\theta}\right\}.\tag{42}$$

A candidate for the approximate optimum is the procedure that stops as soon as $R_{n}^{\rm st}(S_{n})\leqslant A_{c}$ for some $A_{c}\approx c$. This is equivalent to stopping as soon as $\max_{i=0,1}\bar{\Lambda}_{n}^{i}\geqslant 1/A_{c}$. The GLR statistics are approximated as

$$\hat{\Lambda}_{n}^{i}=\frac{\max_{\theta\in[\underline{\theta},\overline{\theta}]}e^{\theta S_{n}-nb(\theta)}}{\max_{\theta\in\Theta_{i}}e^{\theta S_{n}-nb(\theta)}}\approx\frac{\max_{\theta\in[\underline{\theta},\overline{\theta}]}e^{\theta S_{n}-nb(\theta)}}{e^{\theta_{i}S_{n}-nb(\theta_{i})}},\quad i=0,1,$$

and the stopping posterior risk (42) is approximated as

$$R_{n}^{\rm st}(S_{n})\approx\min_{i=0,1}\frac{w(\theta_{i})L(\theta_{i})\,[\ddot{b}(\hat{\theta}_{n})/2\pi n]^{1/2}}{w(\hat{\theta}_{n})\,|\dot{b}(\theta_{i})-\dot{b}(\hat{\theta}_{n})|}~e^{-\lambda_{n}(\hat{\theta}_{n},\theta_{i})},\tag{43}$$

where the minimum is attained at $i=0$ if $\hat{\theta}_{n}\geqslant\theta^{*}$ and at $i=1$ otherwise, in agreement with the constraints on $\hat{\theta}_{n}$ in (41). These approximations stem from Laplace’s method for asymptotic integral expansions, and its variations.
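The quality of the Laplace-type approximation (43) can be examined numerically in a simple case. The sketch below compares the $i=0$ terms of (42) and (43) under illustrative assumptions: $N(\theta,1)$ observations, a uniform prior on $[\underline{\theta},\overline{\theta}]$, the $0$-$1$ loss, and a sample mean $\bar{x}$ lying inside the parameter interval and well above $\theta_{0}$, so that $\hat{\theta}_{n}=\bar{x}$. In this setting the exact posterior probability of $\Theta_{0}$ has a closed form in terms of the normal CDF. Function names and numerical values are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def exact_risk_term(n, xbar, theta0, theta_lo, theta_hi):
    """i = 0 term of (42): posterior probability of [theta_lo, theta0].

    Assumes N(theta, 1) data and a uniform prior, so the posterior is a
    N(xbar, 1/n) density truncated to [theta_lo, theta_hi].
    """
    r = math.sqrt(n)
    num = norm_cdf(r * (theta0 - xbar)) - norm_cdf(r * (theta_lo - xbar))
    den = norm_cdf(r * (theta_hi - xbar)) - norm_cdf(r * (theta_lo - xbar))
    return num / den


def laplace_risk_term(n, xbar, theta0):
    """i = 0 term of (43) with constant w, L = 1, and b(theta) = theta^2 / 2.

    Assumes the MLE equals xbar, i.e. xbar lies inside the parameter interval.
    """
    llr = 0.5 * n * (xbar - theta0) ** 2  # lambda_n(mle, theta0)
    return math.exp(-llr) / (math.sqrt(2.0 * math.pi * n) * abs(xbar - theta0))


def relative_error(n, xbar=0.5, theta0=-0.5, theta_lo=-2.0, theta_hi=2.0):
    exact = exact_risk_term(n, xbar, theta0, theta_lo, theta_hi)
    approx = laplace_risk_term(n, xbar, theta0)
    return abs(approx - exact) / exact


err_40 = relative_error(40)
err_160 = relative_error(160)
```

In this Gaussian case the approximation coincides with the classical Mills-ratio approximation to a normal tail probability, so the relative error should shrink roughly like $1/n$ as the sample size grows.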

Subsequently, Lorden demonstrated the existence of $Q>1$ such that if the stopping risk exceeds $Qc$, then the continuation risk becomes smaller than the stopping risk. Therefore, it is approximately optimal to stop at the first instance such that $R_{n}^{\rm st}$ falls below $Qc$. This finding, coupled with the approximation (43), results in $T_{\rm opt}\approx\min(\tau_{0},\tau_{1})$, where

$$\tau_{i}=\inf\left\{n:\frac{e^{-\lambda_{n}(\hat{\theta}_{n},\theta_{i})}}{\widetilde{h}_{i}(\hat{\theta}_{n})\,n^{1/2}}\leqslant Qc\right\}=\inf\left\{n:\lambda_{n}(\hat{\theta}_{n},\theta_{i})\geqslant-\log[n^{1/2}\,\widetilde{h}_{i}(\hat{\theta}_{n})\,Qc]\right\}$$

with $\widetilde{h}_{i}(\hat{\theta}_{n})$ given by

$$\widetilde{h}_{i}(\hat{\theta}_{n})=\sqrt{\frac{2\pi}{\ddot{b}(\hat{\theta}_{n})}}~\frac{w(\hat{\theta}_{n})\,|\dot{b}(\hat{\theta}_{n})-\dot{b}(\theta_{i})|}{w(\theta_{i})\,L(\theta_{i})}.$$

For small $c$, the expectation $\mathsf{E}_{\theta}[\tau_{i}]$ is of order $|\log c|$, so $n^{1/2}$ can be replaced by $|\log c|^{1/2}$, which yields

$$\tau_{i}\approx\widehat{T}_{i}=\inf\left\{n:\lambda_{n}(\hat{\theta}_{n},\theta_{i})\geqslant-\log[c\,|\log c|^{1/2}\,Q\,\widetilde{h}_{i}(\hat{\theta}_{n})]\right\}.$$

Note that these stopping times have exactly the form of those defined in (41), with the stopping boundaries

$$a_{i}(\hat{\theta}_{n})=|\log c|-\tfrac{1}{2}\log|\log c|-\log[Q\,\widetilde{h}_{i}(\hat{\theta}_{n})],\quad i=0,1.$$

The test based on these stop** times is already optimal to the second order. However, to achieve third-order optimality, one must carefully choose the constant Q𝑄Qitalic_Q to address the overshoots. Specifically, leveraging this result, Lorden demonstrates that the risks of an optimal rule and of the GSLRT are both linked to the risks of the family of one-sided tests τa(θ)=inf{n:λn(θ,θi)a}subscript𝜏𝑎𝜃infimumconditional-set𝑛subscript𝜆𝑛𝜃subscript𝜃𝑖𝑎\tau_{a}(\theta)=\inf\{n:\lambda_{n}(\theta,\theta_{i})\geqslant a\}italic_τ start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( italic_θ ) = roman_inf { italic_n : italic_λ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_θ , italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ⩾ italic_a }, which are strictly optimal in the problem:

\[
\rho(\theta,v)=\inf_T\left\{\mathsf{E}_\theta T+v\,\mathsf{P}_{\theta_i}(T<\infty)\right\}=\inf_T\mathsf{E}_\theta\left\{T+v\prod_{n=1}^{T}\frac{p_{\theta_i}(X_n)}{p_\theta(X_n)}\right\}.
\]

If we set $a=\log[v\mathcal{L}(\theta,\theta_i)]$, then by taking $Q=1/\mathcal{L}(\theta,\theta_i)$ the resulting test will be nearly optimal to within $o(c)$. Since $\theta$ is unknown, we need to replace it with the estimate $\hat{\theta}_n$ to obtain

\[
a_i(\hat{\theta}_n)=|\log c|-\tfrac{1}{2}\log|\log c|-\log[\widetilde{h}_i(\hat{\theta}_n)/\mathcal{L}(\hat{\theta}_n,\theta_i)]=|\log c|-\tfrac{1}{2}\log|\log c|-\log[h_i(\hat{\theta}_n)].
\]

It’s intriguing to compare Lorden’s approach with the Kiefer–Sacks test that stops the first time $R_n^{\rm st}$ becomes smaller than $c$. Lorden’s approach allows us to show that the test with the stopping time

\[
\widehat{T}=\inf\left\{n:R_n^{\rm st}(S_n)\leqslant c/\mathcal{L}(\hat{\theta}_n)\right\},
\]

where $\mathcal{L}(\hat{\theta}_n)=\mathcal{L}(\hat{\theta}_n,\theta_1)$ if $\hat{\theta}_n<\theta^*$ and $\mathcal{L}(\hat{\theta}_n)=\mathcal{L}(\hat{\theta}_n,\theta_0)$ otherwise, is nearly optimal to within $o(c)$. It’s worth recalling that the factor $\zeta(\theta,\theta_i)=I(\theta,\theta_i)^{-1}\mathcal{L}(\theta,\theta_i)$ provides a necessary correction for the excess over the thresholds at stopping; see (39). This offers a significant enhancement over the Kiefer–Sacks test, which disregards the overshoots. Notably, this improvement is not limited to testing close hypotheses when $\mathcal{L}(\theta,\theta_i)\ll 1$.
Even in cases where the parameter values are well-separated, this correction could be crucial. For instance, in the binomial case with the success probabilities $\theta_1=0.6$ and $\theta_0=0.4$, we have $\mathcal{L}(\theta_1,\theta_0)\approx 1/15$, so Lorden’s test will terminate much earlier.
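To make the notion of overshoot concrete, the following Python sketch computes a one-sided stopping time of the form $\tau_a(\theta)=\inf\{n:\lambda_n(\theta,\theta_i)\geqslant a\}$ for Bernoulli observations with $\theta=0.6$ and $\theta_i=0.4$. It assumes that $\lambda_n(\theta,\theta_i)$ is the log-likelihood ratio of $\theta$ against $\theta_i$; the data sequence, the threshold $a=1.5$, and the function names are illustrative choices, not taken from Lorden's paper.

```python
import math


def bernoulli_llr_increment(x, theta, theta_i):
    """Log-likelihood ratio log[p_theta(x) / p_theta_i(x)] for one Bernoulli outcome x in {0, 1}."""
    if x == 1:
        return math.log(theta / theta_i)
    return math.log((1.0 - theta) / (1.0 - theta_i))


def one_sided_stopping_time(xs, theta, theta_i, a):
    """Return (n, lambda_n) at the first n with lambda_n >= a.

    If the threshold is never reached on the supplied data, return
    (None, lambda_last) so the caller can tell the test did not stop.
    """
    llr = 0.0
    for n, x in enumerate(xs, start=1):
        llr += bernoulli_llr_increment(x, theta, theta_i)
        if llr >= a:
            return n, llr
    return None, llr


# Hypothetical data and threshold, chosen only for illustration.
xs = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
a = 1.5
n_stop, llr_stop = one_sided_stopping_time(xs, theta=0.6, theta_i=0.4, a=a)
overshoot = None if n_stop is None else llr_stop - a
```

With these inputs the log-likelihood ratio moves on a lattice with step $\log 1.5\approx 0.405$, so it first reaches $a=1.5$ at $n=6$ with $\lambda_6=4\log 1.5\approx 1.622$, an excess of about $0.12$ over the threshold. It is this excess that the factor $\zeta$ accounts for on average.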

Implementing Lorden’s fully optimized GSLRT can be difficult in practice. This is primarily because computing the numbers $\zeta(\theta,\theta_i)$ analytically is often not feasible, except for specific models such as the exponential. For instance, when testing the mean in the Gaussian case, these numbers can only be computed numerically. While Siegmund’s (1985) corrected Brownian motion approximations can be utilized, they are sufficiently accurate only when the difference between $\theta$ and $\theta_i$ is relatively small. Hence, for practical purposes, only partially optimized solutions, which provide $O(c)$-optimality, are typically feasible. A workaround involves discretizing the parameter space.

Let $\hat{\alpha}_0(\theta)=\mathsf{P}_\theta(\hat{d}=1)$, $\theta\in\Theta_0=[\underline{\theta},\theta_0]$ and $\hat{\alpha}_1(\theta)=\mathsf{P}_\theta(\hat{d}=0)$, $\theta\in\Theta_1=[\theta_1,\overline{\theta}]$ denote the error probabilities of the GSLRT $\hat{\delta}_a$. Note that due to the monotonicity of $\hat{\alpha}_i(\theta)$, $\sup_{\theta\in\Theta_i}\hat{\alpha}_i(\theta)=\hat{\alpha}_i(\theta_i)$.
In addition to the Bayesian third-order optimality property, Lorden established asymptotic approximations to the error probabilities of the GSLRT. Specifically, by Theorem 2 of Lorden (1977b),

\[
\hat{\alpha}_i(\theta_i)=\sqrt{a}\,e^{-a}C_i(\theta_i)(1+o(1)),\quad i=0,1\quad\text{as}~a\to\infty, \tag{44}
\]

where

\begin{align*}
C_0(\theta_0)&=\int_{\theta^*}^{\overline{\theta}}\zeta(\theta,\theta_0)h_0(\theta)\sqrt{\frac{\ddot{b}(\theta)}{2\pi I(\theta,\theta_0)}}\,\mathrm{d}\theta,\\
C_1(\theta_1)&=\int_{\underline{\theta}}^{\theta^*}\zeta(\theta,\theta_1)h_1(\theta)\sqrt{\frac{\ddot{b}(\theta)}{2\pi I(\theta,\theta_1)}}\,\mathrm{d}\theta
\end{align*}

and where $\zeta(\theta,\theta_i)$, $i=0,1$, are defined in (39)–(40). These approximations hold significance for frequentist problems, which are typically of primary interest in most applications. While there are no strict upper bounds on the error probabilities, leading to no specific prescription on how to embed the GSLRT into class $\mathbb{C}(\alpha_0,\alpha_1)$, the asymptotic approximations (44) enable us to select thresholds $a_i$ in the stopping times $\widehat{T}_i$ so that $\hat{\alpha}_i(\theta_i)\approx\alpha_i$, $i=0,1$, at least for sufficiently small $\alpha_i$. Note that in this latter case, the threshold $a$ in (41) should be replaced with $a_i$, the roots of the transcendental equations

\[
a_i-\frac{1}{2}\log a_i=\log[C_i(\theta_i)/\alpha_i],\quad i=0,1.
\]

With this choice, the GSLRT is asymptotically uniformly first-order optimal with respect to the expected sample size, i.e.,

\[
\inf_{\delta\in\mathbb{C}(\alpha_0,\alpha_1)}\mathsf{E}_\theta[T]=\mathsf{E}_\theta[\widehat{T}](1+o(1))\quad\text{as}~\alpha_{\max}\to 0\quad\text{for all}~\theta\in[\underline{\theta},\overline{\theta}],
\]

where the $o(1)$ term is of order $O(\log|\log\alpha_{\max}|/|\log\alpha_{\max}|)$. It’s noteworthy that this result holds true not only in the asymptotically symmetric case where $\log\alpha_0\sim\log\alpha_1$ and $a_0\sim a_1$ as $\alpha_{\max}\to 0$, but also in the asymmetric case where $a_0$ and $a_1$ diverge with different rates, as long as $a_1e^{-a_0}\to 0$.
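The threshold equations $a_i-\tfrac{1}{2}\log a_i=\log[C_i(\theta_i)/\alpha_i]$ above have no closed-form solution, but they are easy to solve numerically. The Python sketch below uses a simple fixed-point iteration; it is one possible approach, not a method prescribed by Lorden. The constant `C = 1.0` is a hypothetical stand-in, since in practice $C_i(\theta_i)$ must be obtained from the integrals above, and the restriction $\log(C/\alpha)\geqslant 1$ is a simplifying assumption that keeps the iteration a contraction.

```python
import math


def solve_threshold(C, alpha, tol=1e-12, max_iter=200):
    """Solve a - 0.5*log(a) = log(C/alpha) for the larger root a.

    Uses the fixed-point iteration a <- rhs + 0.5*log(a), started at a = rhs.
    For rhs >= 1 the iterates stay >= 1, where the map has slope 0.5/a <= 0.5,
    so the iteration converges.
    """
    rhs = math.log(C / alpha)
    if rhs < 1.0:
        raise ValueError("this sketch assumes log(C/alpha) >= 1")
    a = rhs
    for _ in range(max_iter):
        a_new = rhs + 0.5 * math.log(a)
        if abs(a_new - a) < tol:
            return a_new
        a = a_new
    raise RuntimeError("fixed-point iteration did not converge")


def approx_error_probability(a, C):
    """Leading term sqrt(a) * exp(-a) * C of the approximation (44)."""
    return math.sqrt(a) * math.exp(-a) * C


# Hypothetical inputs: C = 1.0 is a placeholder for C_i(theta_i).
a_i = solve_threshold(C=1.0, alpha=0.01)
check = approx_error_probability(a_i, C=1.0)  # should reproduce alpha
```

By construction, plugging the root back into the leading term of (44) returns the target $\alpha_i$, which gives a quick sanity check on the solver. The resulting threshold exceeds the naive choice $\log(C_i/\alpha_i)$ by roughly $\tfrac{1}{2}\log a_i$.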

Note that the Schwarz–Lorden asymptotic theory operates under the assumption of a fixed indifference zone that does not permit local alternatives, meaning that $\theta_1$ cannot approach $\theta_0$ as $c\to 0$. In simpler terms, this theory is confined to scenarios where the width of the indifference zone $\theta_1-\theta_0$ is considerably larger than $c^{1/2}$.

We conclude this section by mentioning Lorden’s (1973) paper on the properties of the one-sided (open-ended) GSLRTs for the one-parameter exponential family. These tests reject a null hypothesis $\theta=\theta_0$ in favor of $\theta>\theta_0$ within the class of stopping times satisfying $\mathsf{P}_{\theta_0}(T<\infty)\leqslant\alpha$ for a prescribed $0<\alpha<1/3$.

2.5 Optimal Multistage Testing

What is the smallest number of stages for which a multistage hypothesis test can be asymptotically equivalent to an optimal fully sequential test? Lorden (1983) took up this question and reached the definitive answer of needing 3 stages in general, except in a special symmetric situation in which 2 stages are possible, described in the next section. Here, “needing 3 stages” means allowing the possibility of 3 stages, although Lorden’s optimal procedures can (and do, with probability approaching $1$) terminate earlier; see Section 2.5.3. Lorden (1983) shows this first in the simple vs. simple testing setup, and then for testing separated composite hypotheses in an exponential family. In this area again, Lorden’s work was groundbreaking and formed the foundation for later, more general theoretical investigations in optimal multistage testing (e.g., Bartroff 2006a, b, 2007; Xing and Fellouris 2023) and in applications to clinical trial designs where the problem is sometimes known as “sample size adjustment” or “re-estimation” (e.g., Bartroff and Lai 2008a, b; Bartroff, Lai, and Shih 2013). In this literature especially, multistage procedures are often referred to as group sequential. Throughout this section, i.i.d. observations are assumed.

2.5.1 Simple vs. Simple Testing: Multistage Competitors of the SPRT

Beginning with the simple vs. simple testing setup of Section 2.1.1 and adopting the notation there, some of Lorden’s main ideas can be seen by first considering the symmetric case where the error probabilities $\alpha_0,\alpha_1\to 0$ in such a way that

\[
\frac{\log\alpha_1^{-1}}{I_0}\sim\frac{\log\alpha_0^{-1}}{I_1}. \tag{45}
\]

Letting $\lambda_n$ be the log-likelihood ratio statistic in (3) and $t\to\infty$ an argument parameterizing $\alpha_0,\alpha_1\to 0$, Lorden (1983) begins by arguing that there is a sample size $n=n(t)\geqslant t$ such that $n=t+o(t)$,

\[
\mathsf{P}_0(-\lambda_n<tI_0)\to 0,\quad\mbox{and}\quad\mathsf{P}_1(\lambda_n<tI_1)\to 0. \tag{46}
\]

More explicitly, this is achievable by taking $n=t+\delta_t$ with

\[
\sqrt{t}\ll\delta_t\ll t \tag{47}
\]

since, assuming finite second moments $\mathsf{E}_i[\lambda_1^2]<\infty$, Chebyshev’s inequality gives

\[
\mathsf{P}_1(\lambda_n<tI_1)\leqslant\frac{\mathsf{var}_1(\lambda_1)/n}{I_1^2(1-t/n)^2}
\]

which, ignoring constants, under (47) is

\[
\frac{1/n}{(1-t/n)^2}=\frac{n}{\delta_t^2}=\frac{t+\delta_t}{\delta_t^2}=o(1)\quad\mbox{as $t\to\infty$.}
\]

A similar argument shows that the other probability in (46) approaches $0$ as well.
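The rate at which the Chebyshev bound above vanishes can be checked numerically. The Python sketch below evaluates the bound for the particular choice $\delta_t=t^{3/4}$, which satisfies (47); this choice, and the values $\mathsf{var}_1(\lambda_1)=1$ and $I_1=1/2$ (those of testing $N(0,1)$ against $N(1,1)$), are illustrative assumptions rather than part of Lorden's argument.

```python
def chebyshev_bound(t, delta, var1, info1):
    """Chebyshev bound var1/n / (info1^2 * (1 - t/n)^2) on P_1(lambda_n < t*I_1), with n = t + delta."""
    n = t + delta
    return (var1 / n) / (info1 ** 2 * (1.0 - t / n) ** 2)


# Illustrative assumption: delta_t = t**0.75, so sqrt(t) << delta_t << t.
# var1 = 1.0 and info1 = 0.5 correspond to N(0,1) vs. N(1,1).
bounds = [chebyshev_bound(t, t ** 0.75, var1=1.0, info1=0.5)
          for t in (1e2, 1e4, 1e6)]
```

For $\delta_t=t^{3/4}$ the bound behaves like a constant times $t^{-1/2}$, so it decays slowly: in this example it is about $0.53$ at $t=10^2$, $0.044$ at $t=10^4$, and $0.004$ at $t=10^6$. A larger $\delta_t$ gives a faster decay at the price of a larger first stage.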

In this symmetric situation, an optimal 2-stage competitor to the SPRT can be described in terms of $n(t)$, which is the size of the first stage with $t$ taken to be the larger of the two sides of (45). Note that, for either $i=0$ or $1$, we have $tI_i\sim\log\alpha_{1-i}^{-1}$ so that $t$ is asymptotically the same as the expected stopping time of the SPRT (under either hypothesis) and the first stage $n(t)$ is of the same order but slightly larger. The procedure stops after the first stage if

\[
\lambda_{n(t)}\not\in(\log\alpha_1,\log\alpha_0^{-1}), \tag{48}
\]

making the appropriate terminal decision. Using (46), the probability under the null of terminating and making the correct terminal decision after this first stage is

\[
\mathsf{P}_0(\lambda_{n(t)}\leqslant\log\alpha_1)=\mathsf{P}_0(-\lambda_{n(t)}\geqslant\log\alpha_1^{-1})\geqslant\mathsf{P}_0(-\lambda_{n(t)}\geqslant tI_0)\to 1, \tag{49}
\]

with a similar argument showing that

\[
\mathsf{P}_1(\lambda_{n(t)}\geqslant\log\alpha_0^{-1})\to 1. \tag{50}
\]

Otherwise, the test continues to a total sample size $n_2$ which is that of the fixed-sample size test with error probabilities $\alpha_0,\alpha_1$ and uses that terminal decision rule. This can be accomplished in at most $n_2\leqslant Ct\leqslant Cn(t)$ total observations for some constant $C$. Thus, under the null, the total expected sample size is at most

\[
n(t)+Cn(t)\mathsf{P}_0(\lambda_{n(t)}>\log\alpha_1)=n(t)[1+o(1)]\sim t\sim\frac{\log\alpha_1^{-1}}{I_0},
\]

and is of the same order $I_1^{-1}\log\alpha_0^{-1}$ under the alternative by a similar argument. By definition of the 2 stages, the procedure has type I error probability at most $2\alpha_0$, and type II error probability at most $2\alpha_1$, so repeating the construction with $\alpha_i/2$ replacing $\alpha_i$ ($i=0,1$) controls the error probabilities at the nominal levels and does not affect the asymptotic estimates above. Thus, this 2-stage procedure is asymptotically as efficient as the SPRT in this symmetric case.

If the asymptotic equivalence (45) does not hold but we assume that

\[
\frac{\log\alpha_0^{-1}}{\log\alpha_1^{-1}}\quad\mbox{is bounded away from $0$ and $\infty$,} \tag{51}
\]

Lorden shows that no 2-stage test can be asymptotically optimal, itself a nontrivial result that we discuss in the next section. For this case Lorden gives a 3-stage procedure that is a slight modification of the one above. Letting $t_1$ and $t_2$ be the left- and right-hand sides of (45), respectively, the first stage of the procedure is of size $\min\{n(t_1),n(t_2)\}$, and the second stage (if needed) brings the total sample size to $\max\{n(t_1),n(t_2)\}$, both using the stopping rule (48) and corresponding decision rule. If not stopped by the second stage, a third stage brings the total sample size to that of the fixed-sample-size test with error probabilities $\alpha_i$, which is $\leqslant C\max\{n(t_1),n(t_2)\}$ as above, and uses that terminal decision rule.
Since $n(t_{i+1})\sim t_{i+1}\sim\log\alpha_{1-i}^{-1}/I_i$ for both $i=0,1$, and (49) and (50) hold for $n(t_1)$ and $n(t_2)$, respectively, the expected sample size of this 3-stage procedure is asymptotically equal to the corresponding side of (45), and is thus minimized under both the null and alternative.
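The design of the first two stages can be sketched in a few lines of Python. The sketch below assumes the specific choice $n(t)=\lceil t+t^{3/4}\rceil$, which is one way to satisfy (47) but is not prescribed by Lorden (1983); the function names and the numerical inputs are hypothetical, and the third-stage (fixed-sample) size is left out because it depends on the model.

```python
import math


def first_stage_size(t, power=0.75):
    """n(t) = ceil(t + t**power); any 1/2 < power < 1 gives sqrt(t) << delta_t << t."""
    return math.ceil(t + t ** power)


def stage_sizes(alpha0, alpha1, info0, info1):
    """Cumulative sample sizes after stages 1 and 2 of the 3-stage procedure.

    t1 and t2 are the left- and right-hand sides of (45).
    """
    t1 = math.log(1.0 / alpha1) / info0
    t2 = math.log(1.0 / alpha0) / info1
    n1, n2 = first_stage_size(t1), first_stage_size(t2)
    return min(n1, n2), max(n1, n2)


def stop_decision(llr, alpha0, alpha1):
    """Stopping rule (48): 0 = accept the null, 1 = reject it, None = continue."""
    if llr <= math.log(alpha1):
        return 0
    if llr >= math.log(1.0 / alpha0):
        return 1
    return None


# Hypothetical design inputs, for illustration only.
sizes = stage_sizes(alpha0=1e-3, alpha1=1e-2, info0=0.5, info1=0.5)
```

Here `sizes` holds the cumulative sample sizes at the first two looks, and `stop_decision` is applied to the log-likelihood ratio at each of them. The inputs are far too small for the asymptotics to be meaningful; they only show how unequal $\alpha_0$ and $\alpha_1$ separate the two looks.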

2.5.2 The Necessity of 3 Stages

Continuing with the simple vs. simple testing setup of the previous section, Lorden’s (1983, Corollary 1) result mentioned above, that in the absence of symmetry (45) 3 stages are necessary for asymptotic optimality, is far from obvious, since it may seem that the first 2 stages of the 3-stage procedure defined above would suffice. That is, why is it that a first stage of $\min\{n(t_1),n(t_2)\}$ and (if needed) a second stage giving total sample size $\max\{n(t_1),n(t_2)\}$ would not be optimal? One clue may be that, if that were true, then the same reasoning would seem to imply that a single-stage test could be optimal under symmetry (45), which is known not to hold. More generally, Lorden provides the following result about asymptotically optimal $k$-stage ($k\geqslant 2$) tests: their expected sample size after $k-1$ stages must be asymptotically the same as after $k$ stages. In other words, the final stage of an asymptotically optimal multistage test is asymptotically negligible in size, but necessary. In what follows let $I(f,g)$ denote the information number for arbitrary densities $f,g$.

Theorem 5 (Lorden (1983), Theorem 3).

For testing $f_0$ vs. $f_1$ in the setup of Section 2.1.1, let $N$ denote the sample size of a $k$-stage ($k\geqslant 2$) test with error probabilities $\alpha_0$ and $\alpha_1$, and let $M$ be the total sample size of this test after $k-1$ stages. If $N$ is asymptotically optimal as $\alpha_0,\alpha_1\to 0$ and $g$ is a density distinct from $f_0$ such that

$$\frac{\log\alpha_1^{-1}}{\log\alpha_0^{-1}}\geqslant Q>\frac{I(g,f_1)}{I(g,f_0)}$$

for some $Q>0$ as $\alpha_0,\alpha_1\to 0$, then

$$M\to\frac{\log\alpha_0^{-1}}{I(g,f_0)}\quad\text{in $g$-probability, and}\quad\mathsf{E}_g[M]\sim\frac{\log\alpha_0^{-1}}{I(g,f_0)}\sim\mathsf{E}_g[N]\quad\text{as $\alpha_0\to 0$.}$$

Lorden’s proof of this theorem is technical and requires detailed upper bounds on the conditional error probabilities after the $(k-1)$st stage; that is, the probabilities of test error given the first $M$ observations. Roughly speaking, showing that these error probabilities are small shows that their corresponding sample size $M$ must be large, so large in fact that it is asymptotically equivalent to its maximum value $N$.

Lorden (1983, Corollary 1) then uses Theorem 5 to show that there is an asymptotically optimal 2-stage test if and only if the symmetry condition (45) holds, with the construction of the 2-stage test above providing the “if” argument. For the converse, applying Theorem 5 with $g=f_1$ shows that the first stage of an optimal 2-stage test must be asymptotic to $(\log\alpha_0^{-1})/I_1$. After reversing the roles of $f_0$ and $f_1$ in the theorem and applying it again with $g=f_0$, it also shows that the first stage must be asymptotic to $(\log\alpha_1^{-1})/I_0$, establishing symmetry (45).

2.5.3 Composite Hypotheses

For testing separated hypotheses $\theta\leqslant\theta_0$ vs. $\theta\geqslant\theta_1>\theta_0$ about the 1-dimensional parameter $\theta$ of an exponential family, Lorden (1983, Section 3) constructs an asymptotically optimal 3-stage test utilizing a description of the optimal stopping boundary related to Schwarz’s (1962) study of Bayes asymptotic shapes for fully sequential tests, described in Section 2.4. Let $n(\theta)$ denote the expected sample size to Schwarz’s boundary under $\theta$. Lorden’s test utilizes the “worst case” competing parameter value $\theta^*\in(\theta_0,\theta_1)$ which maximizes the expected sample size $n(\theta^*)=\max_\theta n(\theta)\equiv n^\star$. The first stage size of Lorden’s procedure is a fixed fraction of $n^\star$.
If the procedure does not stop after the first stage, utilizing Schwarz’s boundary, the second stage brings the total sample size to $\min\{n^\star,(1+\varepsilon)n(\widehat{\theta})\}$, where $\widehat{\theta}$ is the MLE of $\theta$ from the first stage data and $\varepsilon\searrow 0$ is a chosen sequence. Finally, if needed, the third stage brings the total sample size up to $n^\star$. Under (51), Lorden (1983, Theorem 1) proves that this test asymptotically minimizes the expected sample size to first order, not just for $\theta$ in the hypotheses but uniformly in $\theta$ over any interval in the parameter space containing $[\theta_0,\theta_1]$. The first order term is of order $\log\alpha_i^{-1}$, as above, and the second order term is of order $O(((\log\alpha_i^{-1})\log\log\alpha_i^{-1})^{1/2})$, $i=0,1$.

These results were extended to asymptotically optimal 3-stage tests of multidimensional parameters in Bartroff (2006a) and Bartroff and Lai (2008a), and more general multidimensional composite hypotheses in Bartroff and Lai (2008b). On the other hand, Lorden’s procedures were generalized to optimal $k$-stage tests, for arbitrary $k\geqslant 3$, in Bartroff (2006b, 2007).

Regarding the necessity of 3 stages in this composite hypothesis setting, Lorden (1983, Corollary 2) proves that, under (51), 3 stages are necessary (and sufficient, by his own procedure) for asymptotic optimality at more than 3 values of $\theta$, and so certainly for asymptotic optimality over an interval of $\theta$ values, as in Lorden’s result. An interesting detail that shows this result to be best possible is that an optimal 2-stage test can be constructed at 3 values of $\theta$ if the special symmetry condition $I(\theta',\theta_0)I(\theta_0,\theta_1)=I(\theta',\theta_1)I(\theta_1,\theta_0)$ holds for some $\theta'\neq\theta_0,\theta_1$.
Then a 2-stage procedure similar to the one described in Section 2.5.1 that uses second stage total sample size of $\log\alpha_0^{-1}/I(\theta',\theta_0)$ will be optimal at the 3 values $\theta=\theta'$, $\theta_0$, and $\theta_1$.

3 Sequential Changepoint Detection: Lorden’s Minimax Change Detection Theory

In numerous practical applications, the observed process undergoes an abrupt change in statistical properties at an unknown point in time. Examples encompass aerospace navigation and flight systems integrity monitoring, cyber-security, identification of terrorist activity, industrial monitoring, air pollution monitoring, radar, sonar, and electrooptics surveillance systems. Consequently, this problem has garnered interest from many practitioners for some time.

In classical quickest changepoint detection, the objective is to detect changes in the distribution as swiftly as possible, thereby minimizing the expected delay to detection assuming the change is in effect.

More specifically, the changepoint problem posits that one obtains a series of observations $X_1,X_2,\dots$ such that, for some value $\nu$, $\nu\in\mathbb{Z}_+=\{0,1,2,\dots\}$ (the changepoint), $X_1,X_2,\dots,X_\nu$ have one distribution and $X_{\nu+1},X_{\nu+2},\dots$ have another distribution. The changepoint $\nu$ is unknown, and the sequence $\{X_n\}_{n\geqslant 1}$ is being monitored for detecting a change. A sequential detection procedure is a stopping time $T$ with respect to the $X$s, so that after observing $X_1,X_2,\dots,X_T$ it is declared that a change is in effect.
That is, $T$ is an integer-valued random variable, such that the event $\{T=n\}$ belongs to the sigma-algebra $\mathscr{F}_n=\sigma(X_1,\dots,X_n)$ generated by observations $X_1,\dots,X_n$.

Historically, the field of changepoint detection began to take shape in the 1920s to 1930s, spurred by considerations in quality control. Shewhart’s charts were particularly influential during this period (Shewhart 1931). However, optimal and nearly optimal sequential detection procedures didn’t come into prominence until much later, in the 1950s to 1970s, following the advent of Sequential Analysis (Wald 1947). The concepts initiated by Shewhart and Wald laid the foundation for extensive research into sequential changepoint detection.

The desire to detect the change quickly often leads to being “trigger-happy,” which, on one hand, results in an unacceptably high false alarm rate – terminating the process prematurely before a real change has occurred. On the other hand, attempting to avoid false alarms too strenuously causes a long delay between the true change point and its detection. Thus, the essence of the problem lies in achieving a tradeoff between two conflicting performance measures – the loss associated with the delay in detecting a true change and that associated with raising a false alarm. An efficient detection procedure is expected to minimize the average loss associated with the detection delay, while subject to a constraint on the loss associated with false alarms, or vice versa.

Let $p_\nu(\mathbf{X}^n)=p(X_1,\dots,X_n\mid\nu)$ denote the joint probability density of the sample $\mathbf{X}^n=(X_1,\dots,X_n)$ when the changepoint $\nu$ is fixed ($0\leqslant\nu<\infty$) and $p_\infty(\mathbf{X}^n)$ the joint density when $\nu=\infty$, i.e., when there is never a change. Let $\mathsf{P}_\nu,\mathsf{P}_\infty$ and $\mathsf{E}_\nu,\mathsf{E}_\infty$ denote the corresponding probability measures and expectations.
Assume that the observations $\{X_n\}_{n\geqslant 1}$ are independent and such that $X_1,\ldots,X_\nu$ are each distributed according to a common (pre-change) density $f_0(x)$, while $X_{\nu+1},X_{\nu+2},\ldots$ each follows a common (post-change) density $f_1(x)$. Hence, the model can be represented as

$$p_\nu(\mathbf{X}^n)=\begin{cases}\prod_{t=1}^{\nu}f_0(X_t)\times\prod_{t=\nu+1}^{n}f_1(X_t)&\text{for}~n\geqslant\nu+1\\ p_\infty(\mathbf{X}^n)=\prod_{t=1}^{n}f_0(X_t)&\text{for}~1\leqslant n\leqslant\nu\end{cases}. \tag{52}$$

Note that we assume that $X_\nu$ is the last pre-change observation, which is different from many publications (including Lorden’s) where it is assumed that $X_\nu$ is the first post-change observation. The diagram below illustrates this case:

$$\underbrace{X_1,\cdots,X_\nu}_{\text{i.i.d., $f_0$}},~\underbrace{X_{\nu+1},X_{\nu+2},\cdots}_{\text{i.i.d., $f_1$}}.$$

Denote by $\mathcal{H}_\infty:\nu=\infty$ the hypothesis that the change never occurs and by $\mathcal{H}_\nu$ the hypothesis that the change occurs at time $0\leqslant\nu<\infty$. Let $Z_t=\log[f_1(X_t)/f_0(X_t)]$ denote the LLR for the $t$-th observation $X_t$.

We now introduce the cumulative sum (CUSUM) detection procedure, which was first proposed by Page (1954). The changepoint detection problem can be viewed as a problem of testing two hypotheses: $\mathcal{H}_\nu$ that the change occurs at a fixed point $0\leqslant\nu<\infty$ against the alternative $\mathcal{H}_\infty:\nu=\infty$ that the change never occurs. The LLR between these hypotheses is $\lambda_n^\nu=\sum_{t=\nu+1}^{n}Z_t$ for $\nu<n$ and $0$ for $\nu\geqslant n$. Since the hypothesis $\mathcal{H}_\nu$ is composite, we may employ the GLR approach, maximizing the LLR $\lambda_n^\nu$ over $\nu$, to obtain the log-GLR statistic:

$$W_n=\max_{\nu\geqslant 0}\sum_{t=\nu+1}^{n}Z_t, \tag{53}$$

which follows the recursion

$$W_n=\left(W_{n-1}+Z_n\right)^+,\quad n\geqslant 1,~~W_0=0. \tag{54}$$

This statistic is called the CUSUM statistic. Page’s CUSUM procedure stops at the first time $n\geqslant 1$ at which the CUSUM statistic $W_n$ reaches or exceeds a positive threshold $a$:

$$T_a=\inf\{n\geqslant 1:W_n\geqslant a\}. \tag{55}$$
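As an illustration of (53)–(55), the following minimal Python sketch computes the CUSUM statistic both from the definition (53) and from the recursion (54), and then the stopping time (55). The Gaussian mean-shift LLR, the function names, and the sample values are assumptions made only for this example; they do not come from Page (1954) or Lorden (1971).

```python
from typing import List, Optional, Sequence


def gaussian_llr(x: float, mu0: float = 0.0, mu1: float = 1.0, sigma: float = 1.0) -> float:
    """Z_t = log[f1(x)/f0(x)] for N(mu0, sigma^2) vs. N(mu1, sigma^2).

    The Gaussian model is an illustrative assumption, not part of the source.
    """
    return (mu1 - mu0) * (x - 0.5 * (mu0 + mu1)) / sigma**2


def cusum_path(llrs: Sequence[float]) -> List[float]:
    """W_1, ..., W_n from recursion (54): W_n = max(W_{n-1} + Z_n, 0), W_0 = 0."""
    w, path = 0.0, []
    for z in llrs:
        w = max(w + z, 0.0)
        path.append(w)
    return path


def cusum_path_bruteforce(llrs: Sequence[float]) -> List[float]:
    """W_1, ..., W_n from definition (53); nu >= n contributes the empty sum 0."""
    path = []
    for n in range(1, len(llrs) + 1):
        best = 0.0
        for nu in range(n):
            best = max(best, sum(llrs[nu:n]))  # sum of Z_{nu+1}, ..., Z_n
        path.append(best)
    return path


def cusum_stopping_time(llrs: Sequence[float], a: float) -> Optional[int]:
    """T_a from (55): first n >= 1 with W_n >= a; None if the record ends first."""
    for n, w in enumerate(cusum_path(llrs), start=1):
        if w >= a:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical observations whose mean shifts upward after the fourth one.
    xs = [0.1, -0.4, 0.3, -0.2, 1.8, 2.1, 1.6, 2.4]
    zs = [gaussian_llr(x) for x in xs]
    print(cusum_path(zs))
    print(cusum_stopping_time(zs, a=3.5))
```

The recursion needs only the previous value $W_{n-1}$, so each update costs $O(1)$, whereas evaluating (53) directly at time $n$ requires a maximum over $n$ partial sums.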

Page (1954) proposed measuring the risk due to a false alarm by the mean time to false alarm $\mathsf{E}_\infty[T]$ and the risk associated with a true change detection by the mean time to detection $\mathsf{E}_0[T]$ when the change occurs at the very beginning. These are commonly known as the Average Run Length (ARL). Page also analyzed the CUSUM procedure defined by equations (53)–(55) using these operating characteristics.

While it is reasonable to measure the false alarm rate by the ARL to false alarm $\mathsf{ARLFA}(T)=\mathsf{E}_\infty[T]$, the risk due to a true change detection is better measured by the conditional expected delay to detection $\mathsf{E}_\nu[T-\nu\mid T>\nu]$ for any possible change point $\nu\in\mathbb{Z}_+$, rather than by the ARL to detection $\mathsf{E}_0[T]$. Ideally, a good detection procedure should guarantee small values of the expected detection delay for all change points $\nu\in\mathbb{Z}_+$ when $\mathsf{ARLFA}(T)$ is set at a certain level. However, if the false alarm risk is measured in terms of the ARL to false alarm, i.e., it is required that $\mathsf{ARLFA}(T)\geqslant\gamma$ for some $\gamma\geqslant 1$, then a procedure that minimizes the conditional expected delay to detection $\mathsf{E}_\nu[T-\nu\mid T>\nu]$ uniformly over all $\nu$ does not exist. For this reason, we must resort to different optimality criteria, such as Bayesian and minimax criteria.

The minimax approach posits that the changepoint is an unknown, not necessarily random, number; even if it is random, its distribution is unknown.

Lorden (1971) was the first to address the minimax change detection problem and developed the first minimax theory. He proposed to measure the false alarm risk by the ARL to false alarm $\mathsf{ARLFA}(T)=\mathsf{E}_\infty[T]$, i.e., to consider the class of change detection procedures $\mathbb{C}(\gamma)=\{T:\mathsf{ARLFA}(T)\geqslant\gamma\}$ for some $\gamma\geqslant 1$, and the risk associated with detection delay by the worst-case expected detection delay

$$\mathsf{ESEDD}(T)=\sup_{0\leqslant\nu<\infty}\Bigl\{\operatorname*{ess\,sup}\mathsf{E}_\nu[(T-\nu)^+\mid X_1,\dots,X_\nu]\Bigr\}. \tag{56}$$

In other words, the conditional expected detection delay is maximized over all possible trajectories $(X_1,\dots,X_\nu)$ up to the changepoint and then over the changepoint $\nu$.

Lorden’s minimax criterion is

$$\inf_T\sup_{\nu\geqslant 0}\operatorname*{ess\,sup}_{\omega}\mathsf{E}_\nu[T-\nu\mid T>\nu,\mathscr{F}_\nu]\quad\text{subject to}~\mathsf{ARLFA}(T)\geqslant\gamma,$$

i.e., Lorden’s minimax optimization problem seeks to

$$\text{Find $T_{\mathrm{opt}}\in\mathbb{C}(\gamma)$ such that $\mathsf{ESEDD}(T_{\mathrm{opt}})=\inf_{T\in\mathbb{C}(\gamma)}\mathsf{ESEDD}(T)$ for every $\gamma\geqslant 1$}. \tag{57}$$

Lorden (1971) demonstrated that Page’s CUSUM procedure achieves first-order asymptotic minimax optimality as $\gamma$ approaches infinity. This groundbreaking finding was the first optimality result in the minimax change detection problem. Given the significance of this result and the widespread adoption of Lorden’s minimax criterion, not only within statistical circles but also across various practical domains, we describe it in more detail.

To establish the asymptotic optimality of Page’s CUSUM procedure, Lorden employs an intriguing method that permits the utilization of one-sided hypothesis tests to assess a collection of change detection procedures, among them Page’s method. Let τ=τ(α)𝜏𝜏𝛼\tau=\tau(\alpha)italic_τ = italic_τ ( italic_α ) be a stop** time with respect to X1,X2,subscript𝑋1subscript𝑋2X_{1},X_{2},\ldotsitalic_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … such that

\[ \mathsf{P}_{\infty}(\tau<\infty)\leqslant\alpha, \tag{58} \]

where $\alpha\in(0,1)$. For $k=0,1,2,\dots$ let $\tau_k$ denote the stopping time obtained by applying $\tau$ to the sequence $X_{k+1},X_{k+2},\dots$, and let $\tau^{*}=\min_{k\geqslant 0}(\tau_k+k)$.

The following theorem, which is similar to Theorem 2 in Lorden (1971), allows one to construct nearly optimal change detection procedures and to prove the near optimality of the CUSUM procedure. Recall that $\mathsf{P}_{\infty}$ denotes the distribution with density $f_0(x)$, while $\mathsf{P}_{0}$ denotes the distribution with density $f_1(x)$.

Theorem 6.

The random variable $\tau^{*}$ is a stopping time with respect to $X_1,X_2,\dots$, and if condition (58) is satisfied, then the following two inequalities hold:

\[ \mathsf{E}_{\infty}[\tau^{*}]\geqslant 1/\alpha \tag{59} \]

and

\[ \mathsf{E}_{0}[\tau^{*}]\leqslant\mathsf{E}_{0}[\tau]. \tag{60} \]

The cumulative LLR for the sample $(X_{k+1},\dots,X_n)$ is $\lambda_n^k=\sum_{t=k+1}^{n}Z_t$. Let $\tau(\alpha)=\inf\{n\geqslant 1:\lambda_n^0\geqslant|\log\alpha|\}$ denote the stopping time of the one-sided SPRT for testing $f_0$ versus $f_1$ with threshold $|\log\alpha|$. Then $\mathsf{P}_{\infty}(\tau(\alpha)<\infty)\leqslant\alpha$, so condition (58) holds. If the Kullback–Leibler information number $I=\mathsf{E}_0[Z_1]$ is positive and finite, then it is well known that

\[ \mathsf{E}_{0}[\tau(\alpha)]=\frac{|\log\alpha|}{I}(1+o(1))\quad\text{as}~\alpha\to 0. \]
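For completeness, the bound $\mathsf{P}_{\infty}(\tau(\alpha)<\infty)\leqslant\alpha$ can be checked by the standard change-of-measure argument (Wald’s likelihood ratio identity). The following is a sketch, writing $a=|\log\alpha|$ and using only that $\lambda_{n}^{0}\geqslant a$ on the event $\{\tau(\alpha)=n\}$:

```latex
\begin{align*}
\mathsf{P}_{\infty}(\tau(\alpha)<\infty)
 &= \sum_{n=1}^{\infty}\mathsf{P}_{\infty}(\tau(\alpha)=n)
  = \sum_{n=1}^{\infty}\mathsf{E}_{0}\!\left[e^{-\lambda_{n}^{0}}\,\mathbf{1}\{\tau(\alpha)=n\}\right] \\
 &\leqslant e^{-a}\sum_{n=1}^{\infty}\mathsf{P}_{0}(\tau(\alpha)=n)
  = e^{-a}\,\mathsf{P}_{0}(\tau(\alpha)<\infty)
  \leqslant e^{-a}=\alpha .
\end{align*}
```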

Next, note that the CUSUM statistic defined in (53) is the maximum of $\lambda_n^k$ over $k\geqslant 0$, so the stopping time of the CUSUM procedure (54) can be written as $T_a=\min_{k\geqslant 0}\{\tau_k(\alpha)+k\}\equiv\tau^{*}$ for $a=a_{\alpha}=|\log\alpha|$, where

\[ \tau_k(\alpha)=\inf\left\{n\geqslant 1:\lambda_{k+n}^{k}\geqslant|\log\alpha|\right\}. \]
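The representation of the CUSUM stopping time as a minimum over restarted one-sided SPRTs can be checked numerically on a finite sample. The sketch below is illustrative only: it assumes a Gaussian mean shift from $N(0,1)$ to $N(\mu,1)$, so that $Z_t=\mu X_t-\mu^2/2$, and all function names are hypothetical. It computes the stopping time once through the familiar recursion $W_n=\max(W_{n-1},0)+Z_n$ and once through $\min_{k\geqslant 0}(\tau_k+k)$.

```python
import random


def llr_increments(xs, mu=1.0):
    """Z_t = log f1(x)/f0(x) for the assumed N(0,1) -> N(mu,1) mean shift."""
    return [mu * x - 0.5 * mu * mu for x in xs]


def cusum_stopping_time(zs, a):
    """First n with max_{0<=k<n} lambda_n^k >= a, via W_n = max(W_{n-1}, 0) + Z_n."""
    w = 0.0
    for n, z in enumerate(zs, start=1):
        w = max(w, 0.0) + z
        if w >= a:
            return n
    return None  # no alarm within the finite sample


def one_sided_sprt(zs, a):
    """First n with Z_1 + ... + Z_n >= a, or None if the threshold is never reached."""
    s = 0.0
    for n, z in enumerate(zs, start=1):
        s += z
        if s >= a:
            return n
    return None


def min_over_restarts(zs, a):
    """tau* = min_k (tau_k + k), where tau_k is the SPRT applied to Z_{k+1}, Z_{k+2}, ..."""
    best = None
    for k in range(len(zs)):
        tau_k = one_sided_sprt(zs[k:], a)
        if tau_k is not None and (best is None or tau_k + k < best):
            best = tau_k + k
    return best


if __name__ == "__main__":
    random.seed(1)
    # 100 pre-change observations followed by 100 post-change observations.
    xs = [random.gauss(0.0, 1.0) for _ in range(100)]
    xs += [random.gauss(1.0, 1.0) for _ in range(100)]
    zs = llr_increments(xs, mu=1.0)
    print(cusum_stopping_time(zs, 4.0), min_over_restarts(zs, 4.0))
```

The recursion costs $O(n)$ operations while the explicit minimum costs $O(n^2)$, which is why the recursive form is the one used in practice; the two should return the same value on any sample.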

It follows from Theorem 6 that setting $\alpha=\gamma^{-1}$ gives $\mathsf{E}_{\infty}[T_{a_\gamma}]\geqslant\gamma$, so $T_{a_\gamma}\in\mathbb{C}(\gamma)$, and

\[ \mathsf{ESEDD}(T_{a_\gamma})\equiv\mathsf{E}_{0}[T_{a_\gamma}]=\frac{\log\gamma}{I}(1+o(1))\quad\text{as}~\gamma\to\infty. \]

To complete the proof of the first-order asymptotic optimality of the CUSUM procedure with threshold $a=a_\gamma=\log\gamma$, it suffices to establish that this is the best one can do, i.e., to prove the asymptotic lower bound

\[ \inf_{T\in\mathbb{C}(\gamma)}\mathsf{ESEDD}(T)\geqslant\frac{\log\gamma}{I}(1+o(1))\quad\text{as}~\gamma\to\infty, \tag{61} \]

which also yields

\[ \inf_{T\in\mathbb{C}(\gamma)}\mathsf{ESEDD}(T)\sim\frac{\log\gamma}{I}\sim\mathsf{ESEDD}(T_{a_\gamma})\quad\text{as}~\gamma\to\infty. \]

Theorem 3 of Lorden (1971) establishes this fact using a rather sophisticated argument. Note, however, that Lai (1998) established the lower bound (61) in a general non-i.i.d. case, assuming that $n^{-1}\lambda_{\nu+n}^{\nu}$ converges to a positive and finite number $I$ as $n\to\infty$, under a certain additional condition. In the i.i.d. case, by the SLLN, $n^{-1}\lambda_{\nu+n}^{\nu}$ converges to the Kullback–Leibler information number $I$ almost surely under $\mathsf{P}_{\nu}$. This implies that, for all $\varepsilon>0$, as $M\to\infty$

\[ \sup_{\nu\geqslant 0}\mathsf{P}_{\nu}\left\{\frac{1}{M}\max_{0\leqslant n\leqslant M}\lambda_{\nu+n}^{\nu}\geqslant(1+\varepsilon)I\right\}=\mathsf{P}_{0}\left\{\frac{1}{M}\max_{0\leqslant n\leqslant M}\lambda_{n}^{0}\geqslant(1+\varepsilon)I\right\}\to 0. \tag{62} \]

Using (62), the lower bound (61) can be obtained from Theorem 1 in Lai (1998).
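To give a sense of scale for the first-order term $\log\gamma/I$, the following sketch evaluates it in an assumed Gaussian mean-shift model, where $I=\mu^2/(2\sigma^2)$. The model and the function names are illustrative assumptions, and the $(1+o(1))$ factor is ignored, so the numbers are leading-order approximations only.

```python
import math


def gaussian_kl(mu, sigma=1.0):
    """Kullback-Leibler number I = E_0[Z_1] for a mean shift 0 -> mu in N(., sigma^2)."""
    return mu * mu / (2.0 * sigma * sigma)


def first_order_delay(gamma, info):
    """Leading term log(gamma) / I of the minimax detection delay."""
    if gamma <= 1.0 or info <= 0.0:
        raise ValueError("need gamma > 1 and a positive information number")
    return math.log(gamma) / info


if __name__ == "__main__":
    for mu in (0.5, 1.0, 2.0):
        delay = first_order_delay(1000.0, gaussian_kl(mu))
        print(f"mu={mu}: log(gamma)/I is about {delay:.1f} observations")
```

In this model, doubling the size of the shift divides the leading-order delay by four, while multiplying $\gamma$ by ten only adds $\log 10/I$ observations.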

To handle a composite parametric post-change hypothesis, which is typical in many applications, let $f_\theta(x)$ be the post-change density, where $\theta\in\Theta$. Denote $Z_n(\theta)=\log[f_\theta(X_n)/f_0(X_n)]$. Then inequality (60) in Theorem 6 holds for the expectation $\mathsf{E}_{\theta}[\tau^{*}]$. Additionally, if the Kullback–Leibler information number $I(\theta)=\mathsf{E}_{\theta}[Z_1(\theta)]$ is positive and finite, then the asymptotic lower bound (61) holds with $I(\theta)$, i.e.,

\[ \inf_{T\in\mathbb{C}(\gamma)}\mathsf{ESEDD}_{\theta}(T)\geqslant\frac{\log\gamma}{I(\theta)}(1+o(1))\quad\text{as}~\gamma\to\infty, \tag{63} \]

where $\mathsf{ESEDD}_{\theta}(T)=\sup_{0\leqslant\nu<\infty}\operatorname*{ess\,sup}\mathsf{E}_{\nu,\theta}[(T-\nu)^{+}\mid\mathscr{F}_{\nu}]$ and $\mathsf{E}_{\nu,\theta}$ is the expectation under $\mathsf{P}_{\nu,\theta}$ when the change occurs at $\nu$ with the post-change density $f_\theta$.

Lorden (1971) addressed the composite hypothesis for the exponential family (30) with $f_0=f_{\theta=0}$, i.e.,

\[ \frac{f_\theta(X_n)}{f_0(X_n)}=\exp\left\{\theta X_n-b(\theta)\right\},\quad\theta\in\Theta,\quad n=1,2,\dots, \]

where $b(\theta)$ is a convex and infinitely differentiable function on the natural parameter space $\Theta$ with $b(0)=0$. Let $\widetilde{\Theta}=\Theta\setminus\{0\}$.

In order to find asymptotically optimal procedures by applying Theorem 6 along with inequality (63), we need to determine stopping times $\tau(\gamma)\in\mathbb{C}(\gamma)$ such that

\[ \mathsf{P}_{0}(\tau(\gamma)<\infty)\leqslant 1/\gamma\quad\text{for}~\gamma>0 \tag{64} \]

and

\[ \mathsf{E}_{\theta}[\tau(\gamma)]=\frac{\log\gamma}{I(\theta)}(1+o(1))\quad\text{as}~\gamma\to\infty\quad\text{for all}~\theta\in\widetilde{\Theta}, \tag{65} \]

where $I(\theta)=\theta\dot{b}(\theta)-b(\theta)$.
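As a quick check of the formula $I(\theta)=\theta\dot{b}(\theta)-b(\theta)$, the sketch below evaluates it for two familiar families written in the form above: the Gaussian family $N(\theta,1)$ relative to $N(0,1)$, with $b(\theta)=\theta^2/2$, and the Poisson family with rate $e^{\theta}$ relative to rate $1$, with $b(\theta)=e^{\theta}-1$. These two examples and the function names are illustrative choices, not taken from Lorden (1971).

```python
import math


def kl_information(theta, b, b_dot):
    """I(theta) = theta * b'(theta) - b(theta) for a family with b(0) = 0."""
    return theta * b_dot(theta) - b(theta)


# Gaussian N(theta, 1) versus N(0, 1): f_theta/f_0 = exp(theta*x - theta^2/2).
def gauss_b(theta):
    return 0.5 * theta * theta


def gauss_b_dot(theta):
    return theta


# Poisson with rate exp(theta) versus rate 1: f_theta/f_0 = exp(theta*x - (exp(theta) - 1)).
def pois_b(theta):
    return math.exp(theta) - 1.0


def pois_b_dot(theta):
    return math.exp(theta)


if __name__ == "__main__":
    print(kl_information(1.0, gauss_b, gauss_b_dot))         # theta^2 / 2 at theta = 1
    print(kl_information(math.log(2.0), pois_b, pois_b_dot))  # rate 2 versus rate 1
```

In the Poisson case with rate $\lambda=e^{\theta}$ the formula reduces to $\lambda\log\lambda-\lambda+1$, the usual Kullback–Leibler divergence between Poisson distributions with rates $\lambda$ and $1$.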

The LLR for the sample $(X_{k+1},\dots,X_n)$ is

\[ \lambda_n^k(\theta):=\log\left[\prod_{t=k+1}^{n}\frac{f_\theta(X_t)}{f_0(X_t)}\right]=\theta S_n^k-(n-k)b(\theta), \]

where $S_n^k=X_{k+1}+\cdots+X_n$. Define the GLR one-sided test

\[ \tau(h)=\inf\left\{n\geqslant 1:\sup_{\theta\geqslant|\theta_1|}\left[\theta S_n^0-nb(\theta)\right]>h(\gamma)\right\}, \]

where $\theta_1$ may be either a fixed value, if the alternative hypothesis is $\theta\leqslant-\theta_1$ or $\theta\geqslant\theta_1$, or a value $\theta_1(\gamma)\to 0$ as $\gamma\to\infty$, if the hypothesis is $\theta\neq 0$. Lorden demonstrates that

\[ \mathsf{P}_{0}(\tau(h)<\infty)\leqslant\exp\left\{-h(\gamma)\right\}\left[1+\frac{h(\gamma)}{\min(I(\theta_1),I(-\theta_1))}\right], \tag{66} \]

so $h(\gamma)$ can be selected so that $h(\gamma)\sim\log\gamma$ as $\gamma\to\infty$. Hence, (64) and (65) hold. Applying $\tau(h)$ to $X_{k+1},X_{k+2},\dots$ we obtain the stopping time $\tau_k(h)$, so that $\tau^{*}(h)=\min_{k\geqslant 0}(\tau_k(h)+k)$ is the stopping time of the GLR CUSUM procedure,

\[ \tau^{*}(h)=\inf\left\{n\geqslant 1:\max_{0\leqslant\nu\leqslant n}\sup_{\theta\geqslant|\theta_1|}\left[\theta S_n^{\nu}-(n-\nu)b(\theta)\right]>h(\gamma)\right\}. \]

Thus, the GLR CUSUM procedure is asymptotically first-order minimax.
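A minimal sketch of the GLR CUSUM rule may help fix ideas. It assumes the Gaussian case $b(\theta)=\theta^2/2$ and a one-sided alternative $\theta\geqslant\theta_1>0$, for which the inner supremum has the closed form obtained by taking $\hat\theta=\max(S/m,\theta_1)$ over a window of length $m$ with sum $S$. The function names and the restriction to a finite sample are illustrative assumptions, not part of Lorden’s construction.

```python
def glr_window_stat(s, m, theta1):
    """sup over theta >= theta1 of theta*s - m*theta^2/2 (assumed Gaussian b).

    The objective is concave in theta with unconstrained maximizer s/m,
    so the constrained maximizer is max(s/m, theta1).
    """
    theta_hat = max(s / m, theta1)
    return theta_hat * s - 0.5 * m * theta_hat * theta_hat


def glr_cusum_stopping_time(xs, h, theta1):
    """First n at which the maximum over candidate change points nu exceeds h.

    The empty window nu = n contributes 0 and can never exceed h > 0,
    so only nu = 0, ..., n-1 are scanned.
    """
    prefix = [0.0]
    for x in xs:
        prefix.append(prefix[-1] + x)
    for n in range(1, len(xs) + 1):
        best = max(
            glr_window_stat(prefix[n] - prefix[nu], n - nu, theta1)
            for nu in range(n)
        )
        if best > h:
            return n
    return None  # no alarm within the finite sample


if __name__ == "__main__":
    data = [0.0] * 5 + [2.0] * 10  # toy sequence with a jump after time 5
    print(glr_cusum_stopping_time(data, h=4.0, theta1=0.5))
```

Because every candidate change point $\nu$ must be rescanned at each step, this direct implementation costs $O(n)$ work at time $n$, in contrast with the constant per-step cost of the simple CUSUM recursion.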

The inequality (66) is usually overly pessimistic. A much more accurate result is the approximation $\mathsf{P}_{0}(\tau(h)<\infty)\approx\sqrt{h(\gamma)}\exp\left\{-h(\gamma)\right\}C$, which follows from (44). However, this approximation does not guarantee the inequality $\mathsf{P}_{0}(\tau(h)<\infty)\leqslant\gamma^{-1}$, and therefore does not guarantee the inequality $\mathsf{E}_{\infty}[\tau^{*}(h)]\geqslant\gamma$.
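To see how conservative the threshold implied by (66) is, one can solve $e^{-h}[1+h/I_{\min}]=1/\gamma$ numerically, where $I_{\min}=\min(I(\theta_1),I(-\theta_1))$ for a fixed $\theta_1$. The sketch below does this by bisection; the function names, the tolerance, and the choice of bisection are illustrative assumptions.

```python
import math


def bound_66(h, i_min):
    """Right-hand side of (66): exp(-h) * (1 + h / I_min)."""
    return math.exp(-h) * (1.0 + h / i_min)


def threshold_from_bound(gamma, i_min, tol=1e-10):
    """Return h with bound_66(h) <= 1/gamma, located by bisection.

    Starts from lo = log(gamma), where the bound exceeds 1/gamma because
    the bracket [1 + h/I_min] is larger than 1.
    """
    if gamma <= 1.0 or i_min <= 0.0:
        raise ValueError("need gamma > 1 and I_min > 0")
    target = 1.0 / gamma
    lo = math.log(gamma)
    hi = lo + 1.0
    while bound_66(hi, i_min) > target:  # expand until the bound is met
        hi += 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bound_66(mid, i_min) > target:
            lo = mid
        else:
            hi = mid
    return hi  # bound_66(hi) <= target holds by construction


if __name__ == "__main__":
    gamma, i_min = 1000.0, 0.5
    h = threshold_from_bound(gamma, i_min)
    print(f"log(gamma) = {math.log(gamma):.3f}, h from (66) = {h:.3f}")
```

The resulting threshold exceeds $\log\gamma$ by a term of order $\log\log\gamma$, which is negligible to first order but noticeable for moderate $\gamma$.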

Later, Lorden and Pollak (2005, 2008) proposed adaptive Shiryaev–Roberts and CUSUM procedures that use one-step delayed estimators of the unknown post-change parameter $\theta$. In these procedures, an estimate $\hat{\theta}_{n-1}(X_1,\dots,X_{n-1})$ is used after observing the sample of size $n$, similar to the Robbins–Siegmund one-sided adaptive SPRT; see Robbins and Siegmund (1972, 1974). They compared the performance of these adaptive procedures with that of the mixture-based Shiryaev–Roberts procedure. Notably, these adaptive procedures are computationally simpler than the GLR CUSUM procedure.

We conclude with some remarks on later, related developments.

REMARKS

1. Fifteen years later, Moustakides (1986) advanced Lorden’s asymptotic theory by demonstrating, using optimal stopping theory, that the CUSUM procedure is strictly optimal for any ARL to false alarm $\gamma\geqslant 1$ if the threshold $a=a(\gamma)$ is chosen such that $\mathsf{ARLFA}(T_a)=\gamma$.

2. Shiryaev (1996) showed that the CUSUM procedure is strictly optimal in the continuous-time scenario for detecting the change in the mean of the Wiener process according to Lorden’s minimax criterion.

3. Pollak (1985) introduced a different minimax criterion aimed at minimizing the supremum expected detection delay $\sup_{\nu\geqslant 0}\mathsf{E}_{\nu}[T-\nu\mid T>\nu]$. Additionally, Pollak proposed a modification of the conventional Shiryaev–Roberts (SR) procedure, known as the SRP procedure, which starts from a random point distributed according to the quasi-stationary distribution of the SR statistic. He proved that this procedure is third-order asymptotically minimax, minimizing $\sup_{\nu\geqslant 0}\mathsf{E}_{\nu}[T-\nu\mid T>\nu]$ to within $o(1)$ as $\gamma\to\infty$ within the class $\mathbb{C}(\gamma)$.

4. Tartakovsky, Pollak, and Polunchenko (2012) proved that the specially designed SR-$r$ procedure that starts from a fixed point $r=r(\gamma)$ is third-order asymptotically optimal with respect to Pollak’s measure $\sup_{\nu\geqslant 0}\mathsf{E}_{\nu}[T-\nu\mid T>\nu]$ within the class $\mathbb{C}(\gamma)$ as $\gamma\to\infty$.

5. Polunchenko and Tartakovsky (2010) demonstrated that the specially designed SR-$r$ procedure, which starts from a predetermined point $r=r(\gamma)$, is strictly optimal with respect to Pollak’s measure $\sup_{\nu\geqslant 0}\mathsf{E}_{\nu}[T-\nu\mid T>\nu]$ within the class $\mathbb{C}(\gamma)$ for a specific model.

6. Pollak and Tartakovsky (2009) proved strict optimality of the repeated SR procedure that starts from zero in the problem of detecting distant changes.

7. Moustakides, Polunchenko, and Tartakovsky (2009) conducted a thorough comparison of CUSUM and SR procedures, demonstrating that CUSUM outperforms SR in terms of the conditional expected detection delay $\mathsf{E}_{\nu}[T-\nu\mid T>\nu]$ for relatively small values of the change point $\nu$. However, SR proves to be more effective than CUSUM for relatively large $\nu$.
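For readers less familiar with the Shiryaev–Roberts family mentioned in Remarks 3–7, the following sketch implements the standard recursion $R_n=(1+R_{n-1})e^{Z_n}$ with an alarm when $R_n$ reaches a threshold. Starting from $R_0=0$ gives the SR procedure and starting from $R_0=r$ gives SR-$r$; the randomized quasi-stationary start of the SRP procedure is not implemented. The recursion is standard but is not spelled out in this section, and the function name and threshold convention are illustrative assumptions.

```python
import math


def shiryaev_roberts_stopping_time(zs, threshold, r=0.0):
    """First n with R_n >= threshold, where R_n = (1 + R_{n-1}) * exp(Z_n), R_0 = r.

    zs holds the log-likelihood ratio increments Z_1, Z_2, ...;
    r = 0 corresponds to SR and r > 0 to SR-r.
    """
    stat = r
    for n, z in enumerate(zs, start=1):
        stat = (1.0 + stat) * math.exp(z)
        if stat >= threshold:
            return n
    return None  # no alarm within the finite sample


if __name__ == "__main__":
    zs = [math.log(2.0)] * 5  # toy increments: each observation doubles the likelihood ratio
    print(shiryaev_roberts_stopping_time(zs, 20.0))         # SR, started from 0
    print(shiryaev_roberts_stopping_time(zs, 20.0, r=4.0))  # SR-r, started from r = 4
```

A positive head start $r$ shortens the delay for changes that occur early, at the cost of a threshold adjustment needed to keep the same ARL to false alarm.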

Acknowledgements

We express our gratitude to the reviewers, the associate editor, and the editor for their valuable comments, which greatly enhanced the quality of the paper. We also thank Caltech for the use of the photos in Figure 1.

AT: I am grateful to Gary Lorden for multiple helpful and insightful conversations starting in 1993 and Gary’s many papers that we have discussed in this article. Gary’s work meaningfully influenced my research, from 1977 on.

JB: I am lucky to be able to call Gary Lorden not only my PhD advisor, but also a mentor and friend. Gary had a profound effect on my life, both within and “beyond the boundaries” of academics.

References

  • Armitage (1950) Armitage, P. 1950. “Sequential analysis with more than two alternative hypotheses, and its relation to discriminant function analysis.” Journal of the Royal Statistical Society - Series B Methodology 12 (1): 137–144.
  • Bartroff (2006a) Bartroff, J. 2006a. “Efficient three-stage $t$-tests.” In Recent Developments in Nonparametric Inference and Probability: Festschrift for Michael Woodroofe, edited by Michael Woodroofe and Jiayang Sun, Vol. 50 of IMS Lecture Notes Monograph Series, Hayward, 105–111. Institute of Mathematical Statistics.
  • Bartroff (2006b) Bartroff, J. 2006b. “Optimal multistage sampling in a boundary-crossing problem.” Sequential Analysis 25: 59–84.
  • Bartroff (2007) Bartroff, J. 2007. “Asymptotically optimal multistage tests of simple hypotheses.” The Annals of Statistics 35: 2075–2105.
  • Bartroff and Lai (2008a) Bartroff, J., and T. L. Lai. 2008a. “Efficient adaptive designs with mid-course sample size adjustment in clinical trials.” Statistics in Medicine 27: 1593–1611.
  • Bartroff and Lai (2008b) Bartroff, J., and T. L. Lai. 2008b. “Generalized likelihood ratio statistics and uncertainty adjustments in adaptive design of clinical trials.” Sequential Analysis 27: 254–276.
  • Bartroff, Lai, and Shih (2013) Bartroff, J., T. L. Lai, and M. Shih. 2013. Sequential Experimentation in Clinical Trials: Design and Analysis. New York: Springer.
  • Chernoff (1959) Chernoff, Herman. 1959. “Sequential design of experiments.” Annals of Mathematical Statistics 30 (3): 755–770.
  • Devlin and Lorden (2007) Devlin, Keith, and Gary Lorden. 2007. The Numbers Behind NUMB3RS: Solving Crime With Mathematics. New York: Penguin Books.
  • Dragalin and Novikov (1987) Dragalin, V. P., and A. A. Novikov. 1987. “Asymptotic solution of the Kiefer–Weiss problem for processes with independent increments.” Theory of Probability and its Applications 32 (4): 617–627.
  • Feller (1966) Feller, William. 1966. An Introduction to Probability Theory and Its Applications (2nd ed.). Vol. 2 of Series in Probability and Mathematical Statistics. John Wiley & Sons, Inc.
  • Fellouris and Tartakovsky (2017) Fellouris, Georgios, and Alexander G. Tartakovsky. 2017. “Multichannel sequential detection—Part I: Non-i.i.d. data.” IEEE Transactions on Information Theory 63 (7): 4551–4571.
  • Huffman (1983) Huffman, M. D. 1983. “An efficient approximate solution to the Kiefer–Weiss problem.” Annals of Statistics 11 (1): 306–316.
  • Kalashnikov (2013) Kalashnikov, Vladimir V. 2013. Mathematical Methods in Queuing Theory. Vol. 271. Springer Science & Business Media.
  • Kiefer and Sacks (1963) Kiefer, Jack, and J. Sacks. 1963. “Asymptotically optimal sequential inference and design.” Annals of Mathematical Statistics 34 (3): 705–750.
  • Kiefer and Weiss (1957) Kiefer, Jack, and Lionel Weiss. 1957. “Properties of generalized sequential probability ratio tests.” Annals of Mathematical Statistics 28 (1): 57–74.
  • Lai (1981) Lai, Tze Leung. 1981. “Asymptotic optimality of invariant sequential probability ratio tests.” Annals of Statistics 9 (2): 318–333.
  • Lai (1998) Lai, Tze Leung. 1998. “Information bounds and quick detection of parameter changes in stochastic systems.” IEEE Transactions on Information Theory 44 (7): 2917–2929.
  • Lai and Shih (2004) Lai, Tze Leung, and Mei-Chiung Shih. 2004. “Power, Sample Size and Adaptation Considerations in the Design of Group Sequential Clinical Trials.” Biometrika 91 (3): 507–528.
  • Lorden (1967) Lorden, Gary. 1967. “Integrated risk of asymptotically Bayes sequential tests.” Annals of Mathematical Statistics 38 (5): 1399–1422.
  • Lorden (1970) Lorden, Gary. 1970. “On excess over the boundary.” Annals of Mathematical Statistics 41 (2): 520–527.
  • Lorden (1971) Lorden, Gary. 1971. “Procedures for reacting to a change in distribution.” Annals of Mathematical Statistics 42 (6): 1897–1908.
  • Lorden (1972) Lorden, Gary. 1972. “Likelihood Ratio Tests for Sequential k-Decision Problems.” Annals of Mathematical Statistics 43 (5): 1412–1427.
  • Lorden (1973) Lorden, Gary. 1973. “Open-Ended Tests for Koopman-Darmois Families.” Annals of Statistics 1 (4): 633–643.
  • Lorden (1976) Lorden, Gary. 1976. “2-SPRT’s and the modified Kiefer-Weiss problem of minimizing an expected sample size.” Annals of Statistics 4 (2): 281–291.
  • Lorden (1977a) Lorden, Gary. 1977a. “Nearly-optimal sequential tests for finitely many parameter values.” Annals of Statistics 5 (1): 1–21.
  • Lorden (1977b) Lorden, Gary. 1977b. “Nearly optimal sequential tests for exponential families.” Unpublished Manuscript. Available from http://jaybartroff.com/research/gary.pdf
  • Lorden (1980) Lorden, Gary. 1980. “Structure of sequential tests minimizing an expected sample size.” Probability Theory and Related Fields 51 (2): 291–302.
  • Lorden (1983) Lorden, Gary. 1983. “Asymptotic efficiency of three-stage hypothesis tests.” Annals of Statistics 11: 129–140.
  • Lorden and Pollak (2005) Lorden, Gary, and Moshe Pollak. 2005. “Nonanticipating estimation applied to sequential analysis and changepoint detection.” Annals of Statistics 33 (3): 1422–1454.
  • Lorden and Pollak (2008) Lorden, Gary, and Moshe Pollak. 2008. “Sequential change-point detection procedures that are nearly optimal and computationally simple.” Sequential Analysis 27 (4): 476–512.
  • Moustakides (1986) Moustakides, George V. 1986. “Optimal stopping times for detecting changes in distributions.” Annals of Statistics 14 (4): 1379–1387.
  • Moustakides, Polunchenko, and Tartakovsky (2009) Moustakides, George V., Aleksey S. Polunchenko, and Alexander G. Tartakovsky. 2009. “Numerical comparison of CUSUM and Shiryaev–Roberts procedures for detecting changes in distributions.” Communications in Statistics - Theory and Methods 38 (16–17): 3225–3239.
  • Nathan et al. (2018) Nathan, A., J. Albert, J. Bartroff, R. Blandford, D. Brooks, J. Derenski, L. Goldstein, P. Hosoi, G. Lorden, and L. Smith. 2018. “Report of the committee studying home run rates in Major League Baseball.” Technical report, Commissioner of Major League Baseball.
  • Novak (2011) Novak, Serguei Y. 2011. Extreme Value Methods with Applications to Finance. CRC Press.
  • Page (1954) Page, E. S. 1954. “Continuous inspection schemes.” Biometrika 41 (1–2): 100–114.
  • Pollak and Tartakovsky (2009) Pollak, M., and A. G. Tartakovsky. 2009. “Optimality properties of the Shiryaev–Roberts procedure.” Statistica Sinica 19 (4): 1729–1739.
  • Pollak (1985) Pollak, Moshe. 1985. “Optimal detection of a change in distribution.” Annals of Statistics 13 (1): 206–227.
  • Polunchenko and Tartakovsky (2010) Polunchenko, A. S., and A. G. Tartakovsky. 2010. “On Optimality of the Shiryaev–Roberts Procedure for Detecting a Change in Distribution.” Annals of Statistics 38 (6): 3445–3457.
  • Rausand and Hoyland (2003) Rausand, Marvin, and Arnljot Hoyland. 2003. System Reliability Theory: Models, Statistical Methods, and Applications. Vol. 396. John Wiley & Sons.
  • Robbins and Siegmund (1972) Robbins, Herbert, and David Siegmund. 1972. “A class of stopping rules for testing parameter hypotheses.” In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, June 21–July 18, 1970, edited by L. M. Le Cam, J. Neyman, and E. L. Scott, Vol. 4: Biology and Health, 37–41. Berkeley, CA, USA: University of California Press.
  • Robbins and Siegmund (1974) Robbins, Herbert, and David Siegmund. 1974. “The expected sample size of some tests of power one.” Annals of Statistics 2 (3): 415–436.
  • Schwarz (1962) Schwarz, G. 1962. “Asymptotic shapes of Bayes sequential testing regions.” Annals of Mathematical Statistics 33 (1): 224–236.
  • Shewhart (1931) Shewhart, Walter Andrew. 1931. Economic Control of Quality of Manufactured Product. New York, USA: D. Van Nostrand Co.
  • Shiryaev (1996) Shiryaev, A. N. 1996. “Minimax optimality of the method of cumulative sums (CUSUM) in the case of continuous time.” Russian Mathematical Surveys 51 (4): 750–751.
  • Siegmund (1985) Siegmund, David. 1985. Sequential Analysis: Tests and Confidence Intervals. Series in Statistics. New York, USA: Springer-Verlag.
  • Siegmund (2013) Siegmund, David. 2013. “Change-points: From Sequential Detection to Biology and Back.” Sequential Analysis 32 (1): 2–14.
  • Tartakovsky (1998) Tartakovsky, A. G. 1998. “Asymptotic Optimality of Certain Multihypothesis Sequential Tests: Non-i.i.d. Case.” Statistical Inference for Stochastic Processes 1 (3): 265–295.
  • Tartakovsky (2020) Tartakovsky, A. G. 2020. Sequential Change Detection and Hypothesis Testing: General Non-i.i.d. Stochastic Models and Asymptotically Optimal Rules. Monographs on Statistics and Applied Probability 165. Boca Raton, London, New York: Chapman & Hall/CRC Press, Taylor & Francis Group.
  • Tartakovsky (2024) Tartakovsky, A. G. 2024. “Nearly Optimum Properties of Certain Multi-Decision Sequential Rules for General Non-i.i.d. Stochastic Models.” Annals of Mathematical Sciences and Applications (submitted). arXiv:2405.00928. Available at http://arxiv.longhoe.net/abs/2405.00928
  • Tartakovsky, Nikiforov, and Basseville (2015) Tartakovsky, A. G., I. V. Nikiforov, and M. Basseville. 2015. Sequential Analysis: Hypothesis Testing and Changepoint Detection. Monographs on Statistics and Applied Probability 136. Boca Raton, London, New York: Chapman & Hall/CRC Press, Taylor & Francis Group.
  • Tartakovsky, Pollak, and Polunchenko (2012) Tartakovsky, Alexander G., Moshe Pollak, and Aleksey S. Polunchenko. 2012. “Third-order Asymptotic Optimality of the Generalized Shiryaev–Roberts Changepoint Detection Procedures.” Theory of Probability and its Applications 56 (3): 457–484.
  • Wald (1945) Wald, Abraham. 1945. “Sequential tests of statistical hypotheses.” Annals of Mathematical Statistics 16 (2): 117–186.
  • Wald (1946) Wald, Abraham. 1946. “Differentiation Under the Expectation Sign in the Fundamental Identity of Sequential Analysis.” Annals of Mathematical Statistics 17 (4): 493–497.
  • Wald (1947) Wald, Abraham. 1947. Sequential Analysis. New York, USA: John Wiley & Sons, Inc.
  • Wald and Wolfowitz (1948) Wald, Abraham, and J. Wolfowitz. 1948. “Optimum Character of the Sequential Probability Ratio Test.” Annals of Mathematical Statistics 19 (3): 326–339.
  • Weiss (1962) Weiss, Lionel. 1962. “On Sequential Tests Which Minimize the Maximum Expected Sample Size.” Journal of the American Statistical Association 57 (299): 551–557.
  • Whitehead (1997) Whitehead, John. 1997. The Design and Analysis of Sequential Clinical Trials. John Wiley & Sons.
  • Wong (1968) Wong, Seok Pin. 1968. “Asymptotically Optimum Properties of Certain Sequential Tests.” Annals of Mathematical Statistics 39 (4): 1244–1263.
  • Xing and Fellouris (2023) Xing, Yiming, and Georgios Fellouris. 2023. “Signal recovery with multistage tests and without sparsity constraints.” IEEE Transactions on Information Theory 69 (11): 7220–7245.