Relative belief inferences from decision theory

Michael Evans
Department of Statistical Sciences, University of Toronto
Gun Ho Jang
Ontario Institute for Cancer Research
Abstract

Relative belief inferences are shown to arise as Bayes rules or limiting Bayes rules. These inferences are invariant under reparameterizations and possess a number of optimal properties. In particular, relative belief inferences are based on a direct measure of statistical evidence.

Key words and phrases: Bayesian inference, evidential inference, statistical evidence, relative belief, loss functions, Bayesian unbiasedness, Bayes rules, admissibility, limits of Bayes rules.

1 Introduction

Consider a sampling model for data $x$, given by a collection of densities $\{f(\cdot\,|\,\theta):\theta\in\Theta\}$ with respect to a support measure $\mu$ on sample space $\mathcal{X}$, and a proper prior, given by density $\pi$ with respect to support measure $\nu$ on $\Theta$. When the data $x\in\mathcal{X}$ is observed, these ingredients lead to the posterior distribution on $\Theta$ with density given by $\pi(\theta\,|\,x)=\pi(\theta)f(x\,|\,\theta)/m(x)$ with respect to support measure $\nu$, where $m(x)=\int_{\Theta}\pi(\theta)f(x\,|\,\theta)\,\nu(d\theta)$ is the prior predictive density of the data. In addition, there is a quantity of interest $\psi=\Psi(\theta)$, where $\Psi:\Theta\rightarrow\Psi(\Theta)$, for which inferences, such as an estimate $\psi(x)$ or an assessment of the hypothesis $H_{0}:\Psi(\theta)=\psi_{0}$, are required.
Let $\pi_{\Psi}$ denote the marginal prior density of $\psi$ and let $m(x\,|\,\psi)=\int_{\Theta}f(x\,|\,\theta)\,\Pi(d\theta\,|\,\psi)$ be the conditional prior predictive density of the data, obtained by integrating out the nuisance parameters via the conditional distribution of $\theta$ given $\Psi(\theta)=\psi$. Bayesian inferences for $\psi$ are then based on the ingredients $(\{m(\cdot\,|\,\psi):\psi\in\Psi(\Theta)\},\pi_{\Psi},x)$ alone, or on these together with a loss function $L$.
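To make these ingredients concrete, the following is a minimal sketch for a finite parameter space. Everything in it is a hypothetical choice made for illustration: $x\sim$ Binomial$(5,\theta)$ with $\theta$ on a four-point grid, an arbitrary prior, and $\psi=\Psi(\theta)$ the indicator that $\theta>1/2$. Here $\Pi(\cdot\,|\,\psi)$ is simply the prior restricted to $\Psi^{-1}\{\psi\}$ and renormalized, so $m(x\,|\,\psi)$ is a weighted average of $f(x\,|\,\theta)$ over that set.

```python
from math import comb

# Hypothetical finite example: x ~ Binomial(n, theta), theta on a 4-point grid.
n = 5
thetas = [0.2, 0.4, 0.6, 0.8]
prior = {0.2: 0.1, 0.4: 0.4, 0.6: 0.3, 0.8: 0.2}   # pi(theta), sums to 1


def f(x, theta):
    """Sampling density f(x | theta)."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)


def Psi(theta):
    """Hypothetical marginal parameter: indicator that theta exceeds 1/2."""
    return int(theta > 0.5)


def m(x):
    """Prior predictive density m(x) = sum over theta of pi(theta) f(x | theta)."""
    return sum(prior[t] * f(x, t) for t in thetas)


def pi_Psi(psi):
    """Marginal prior pi_Psi(psi) = Pi({theta : Psi(theta) = psi})."""
    return sum(prior[t] for t in thetas if Psi(t) == psi)


def m_given_psi(x, psi):
    """Conditional prior predictive m(x | psi), averaging f over Pi(. | psi)."""
    joint = sum(prior[t] * f(x, t) for t in thetas if Psi(t) == psi)
    return joint / pi_Psi(psi)


def pi_Psi_post(psi, x):
    """Marginal posterior pi_Psi(psi | x) = pi_Psi(psi) m(x | psi) / m(x)."""
    return pi_Psi(psi) * m_given_psi(x, psi) / m(x)


x_obs = 4
for psi in (0, 1):
    print(psi, pi_Psi(psi), m_given_psi(x_obs, psi), pi_Psi_post(psi, x_obs))
```

Only the reduced ingredients $(\{m(\cdot\,|\,\psi)\},\pi_{\Psi},x)$ are needed to obtain the marginal posterior of $\psi$; the nuisance structure has been absorbed into $m(x\,|\,\psi)$.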

There are several different general approaches to deriving inferences based on such ingredients. The two most commonly used are MAP-based inferences and Bayesian decision theory. MAP-based inferences are based implicitly on assuming that posterior probabilities can measure statistical evidence and do not use a loss function explicitly. Bayesian decision theory seeks inferences that are optimal with respect to risk, which is defined as the expected loss incurred by an inference under the joint distribution of $(\theta,x)\sim\pi(\theta)f(x\,|\,\theta)$. Such optimal inferences are referred to as Bayes rules and they generally exist. A concern with MAP-based inferences is that it is not clear that posterior probabilities do measure evidence in addition to measuring belief. A concern with decision-theory inferences is that, while the model and prior are checkable against the data through model checking and checking for prior-data conflict, it is not clear how to check the loss function, which can be viewed as a somewhat arbitrary choice. In both cases this renders such inferences of questionable validity for scientific applications.

Another approach to deriving Bayesian inferences is through relative belief. Relative belief refers to how beliefs change from a priori to a posteriori. This leads to a more natural approach to characterizing statistical evidence: since it is the data that leads to change in beliefs from a priori to a posteriori, it is this change that tells us whether evidence has been found in favor of or against some specific value $\psi$. In essence, in this approach it is the a posteriori beliefs relative to the a priori beliefs that determine inferences and not just a posteriori beliefs alone. Also, a loss function plays no role in determining the inferences. Using relative belief as the basis for deriving inferences produces statistical methodology with a number of attractive features.

A historical theme in statistical research has been to seek an acceptable definition of statistical evidence and, once found, use this to derive inferences. For example, this is the focus of much of the work of Alan Birnbaum, see Birnbaum (1962), who sought such a definition within the context of frequentist inference. Frequentist theory, as opposed to Bayesian theory, uses the ingredients $(\{f(\cdot\,|\,\theta):\theta\in\Theta\},x)$ together with the idea that inferences should be graded according to their behavior in hypothetical repeated sampling experiments and, it was hoped, this would lead to a prescription for the inferences. Despite some impressive accomplishments, it is fair to say that Birnbaum's program did not succeed, as there is still no generally acceptable definition of statistical evidence within the frequentist context. The pure likelihood theory of Royall (1997) is also concerned with basing inference on a definition of statistical evidence, and also uses just the ingredients $(\{f(\cdot\,|\,\theta):\theta\in\Theta\},x)$. Pure likelihood theory invokes the likelihood principle to assert that the likelihood function itself is the appropriate characterization of statistical evidence and bases all inferences on the likelihood with no appeal to repeated sampling. Frequency theory and likelihood theory have some appealing characteristics, but both leave gaps in their approach to statistical reasoning. In particular, inferences for marginal parameters $\psi=\Psi(\theta)$ can be problematic. These issues are discussed in more detail in Evans (2024).

While not concerned directly with statistical evidence, Bayesian decision theory has some obvious virtues. In particular, there is an axiomatization due to Savage (1971). These axioms suggest that to not follow this path in carrying out a statistical analysis is to commit an error. One may not find the specific axioms of Savage acceptable, but it is difficult to argue that the subject of Statistics does not need such an axiomatic formulation as otherwise almost any statistical analysis seems justifiable.

The addition of a prior is what leads to a clear definition of statistical evidence and so, provided one accepts the usage of priors, relative belief essentially solves Birnbaum's problem. The purpose of this paper is to review and extend results that show that relative belief inferences can be considered as arising within the context of Bayesian decision theory. This, of course, requires the use of a loss function, and it will be seen that the loss functions that are used are checkable against the data and so appropriate for scientific applications. So these inferences satisfy two of the great themes of statistical research over the years, namely, they are evidence based and yet justifiable within the context of decision theory.

It can be shown, for example, see Bernardo and Smith (2000), that MAP inferences arise as the limits of Bayes rules via a sequence of loss functions

$$L_{\lambda}(\theta,\psi)=I_{B_{\lambda}^{c}(\psi)}(\Psi(\theta)) \tag{1}$$

where $\lambda>0$ and $B_{\lambda}(\psi)$ is the ball of radius $\lambda$ centered at $\psi$, with the limit taken as $\lambda\rightarrow 0$. These inferences, however, are not invariant under reparameterizations. It is shown here that relative belief inferences also arise via a sequence of loss functions similar to (1), but based on the prior, and these inferences are invariant. In general, Bayes rules will also not be invariant under reparameterizations. Robert (1996) proposed using an intrinsic loss function based on a measure of distance between sampling distributions, as Bayes rules with respect to such losses are invariant. Bernardo (2005) proposed using the intrinsic loss function based on the Kullback-Leibler divergence $KL(f_{\theta},f_{\theta'})$ between $f(\cdot\,|\,\theta)$ and $f(\cdot\,|\,\theta')$. When $\psi=\theta$, the intrinsic loss function is given by $L(\theta,\theta')=\min\{KL(f(\cdot\,|\,\theta),f(\cdot\,|\,\theta')),KL(f(\cdot\,|\,\theta'),f(\cdot\,|\,\theta))\}$.
For a general marginal parameter $\psi$, the intrinsic loss function is $L(\theta,\psi)=\inf_{\theta'\in\Psi^{-1}\{\psi\}}L(\theta,\theta')$. These loss functions are intrinsic because they are based on the sampling model alone, and they are checkable via model checking.
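As an illustration of the intrinsic loss when $\psi=\theta$, here is a minimal sketch for a hypothetical model of $n$ i.i.d. Bernoulli$(\theta)$ observations, where the Kullback-Leibler divergence of the joint distributions is $n$ times that of a single observation. The function names are hypothetical. Because the loss depends only on the sampling distributions, expressing the same model in the parameter $\tau=\text{logit}(\theta)$ leaves it unchanged, which is the source of the invariance of the corresponding Bayes rules.

```python
import math


def kl_bernoulli(p, q):
    """KL(Ber(p) || Ber(q)) for 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def intrinsic_loss(theta, theta_prime, n=1):
    """Intrinsic loss for n iid Bernoulli observations: the smaller of the two
    directed KL divergences (KL is additive over independent observations)."""
    return n * min(kl_bernoulli(theta, theta_prime), kl_bernoulli(theta_prime, theta))


def logit(t):
    return math.log(t / (1 - t))


def expit(s):
    return 1.0 / (1.0 + math.exp(-s))


def intrinsic_loss_logit(tau, tau_prime, n=1):
    """The same loss written in the parameterization tau = logit(theta)."""
    return intrinsic_loss(expit(tau), expit(tau_prime), n)


print(intrinsic_loss(0.3, 0.5, n=10))
print(intrinsic_loss_logit(logit(0.3), logit(0.5), n=10))   # same value
```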

Another possibility for an intrinsic loss function is to base the loss function on the prior and this is in essence how the loss function arises in the relative belief context. It is to be stressed, however, that the essential ingredient of this approach is the clear characterization of what is meant by statistical evidence and the loss function approach is not essential for its justification. It is, however, a satisfying result that relative belief can be placed into the decision-theoretic context with the loss being checkable via checking for prior-data conflict. Furthermore, the loss function used has some direct appeal.

In some contexts relative belief inferences are Bayes rules, but in a general context they are seen to arise as the limits of Bayes rules. This approach has some historical antecedents. For example, in Le Cam (1953) it is shown that the MLE is asymptotically Bayes, but this is for a fixed loss function, with increasing amounts of data and a sequence of priors. In the context discussed here, the amount of data, the model and the prior are all fixed, and there is a sequence of loss functions, all based on the single fixed prior.

While it is preferable in many applications to state the inferences solely based on the evidence in the data, one can still consider inferences that possess some kind of optimality with respect to loss. Any discrepancy between the evidence-based and the loss-based inferences can then be justified based on particular characteristics of the application, e.g., evidence is obtained that a drug generally prevents the progression of a disease, but the expense and side effects are too great to warrant its usage. So it is not being suggested here that decision-theoretic inferences are not relevant, as indeed the concept of utility or loss is a significant component of many applications.

Section 2 is concerned with describing the general characteristics of the three approaches to deriving Bayesian inferences. Section 3 shows how relative belief estimation and prediction inferences can be seen to arise from decision theory, and Section 4 does this for credible regions and hypothesis assessment. Throughout the paper the probability measures associated with a density are denoted by the same letter but capitalized. All proofs of theorems and corollaries are in the Appendix, except for the case where $\Psi(\Theta)$ is finite, as these proofs are quite straightforward and supply motivation for the more complicated contexts. The overall goal of the paper is to show that relative belief inferences can arise through decision-theoretic considerations even though their primary motivation is through characterizing statistical evidence. In particular, it is shown here that relative belief estimators, as used in practice, are admissible. Some of the discussion here has appeared in the book Evans (2015) and is included to provide a complete exposition of this relationship.

2 Bayesian inference

The various approaches to deriving Bayesian inferences are now described in some detail.

2.1 MAP inferences

The highest posterior density (hpd), or MAP-based, approach to determining inferences constructs credible regions of the form

$$H_{\gamma}(x)=\{\psi:\pi_{\Psi}(\psi\,|\,x)\geq h_{\gamma}(x)\} \tag{2}$$

where $\pi_{\Psi}(\cdot\,|\,x)$ is the marginal posterior density with respect to a support measure $\nu_{\Psi}$ on $\Psi(\Theta)$, and $h_{\gamma}(x)=\sup\{k:\Pi_{\Psi}(\{\psi:\pi_{\Psi}(\psi\,|\,x)\geq k\}\,|\,x)\geq\gamma\}$. It follows from (2) that, to assess the hypothesis $H_{0}:\Psi(\theta)=\psi_{0}$, we can use the tail probability $1-\inf\{\gamma:\psi_{0}\in H_{\gamma}(x)\}$. Furthermore, the class of sets $H_{\gamma}(x)$ is naturally "centered" at the posterior mode (when it exists uniquely), as $H_{\gamma}(x)$ converges to this point as $\gamma\rightarrow 0$. The use of the posterior mode as an estimator is commonly referred to as MAP (maximum a posteriori) estimation.
We can think of the size of the set $H_{\gamma}(x)$, say for $\gamma=0.95$, as a measure of how accurate the MAP estimator is in a given context. Furthermore, when $\Psi(\Theta)$ is an open subset of a Euclidean space, $H_{\gamma}(x)$ minimizes volume among all $\gamma$-credible regions.
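The following is a minimal grid-based sketch of the hpd region (2) and the associated tail probability for a one-dimensional $\psi$. It assumes a hypothetical Beta$(8,4)$ posterior (for example, a uniform prior and 7 successes in 10 Bernoulli trials), whose mode is $0.7$, and approximates $\Pi_{\Psi}(\cdot\,|\,x)$ by the midpoint rule on $N$ cells; the function names are hypothetical. Cells are added in decreasing order of posterior density until their mass reaches $\gamma$, which mirrors the definition of $h_{\gamma}(x)$.

```python
import math


def beta_pdf(t, a, b):
    """Beta(a, b) density at t in (0, 1)."""
    return math.exp((a - 1) * math.log(t) + (b - 1) * math.log(1 - t)
                    + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))


def hpd_region(density, grid, width, gamma):
    """Grid approximation to H_gamma(x) = {psi : pi(psi | x) >= h_gamma(x)}."""
    dens = [density(t) for t in grid]
    order = sorted(range(len(grid)), key=lambda i: dens[i], reverse=True)
    mass = 0.0
    h = dens[order[-1]]            # fallback: whole grid if gamma is never reached
    for i in order:
        mass += dens[i] * width
        if mass >= gamma:
            h = dens[i]            # h_gamma(x): density of the last cell added
            break
    return h, [t for t, d in zip(grid, dens) if d >= h]


def hpd_tail_probability(density, grid, width, psi0):
    """1 - inf{gamma : psi0 in H_gamma(x)}: posterior mass of the points whose
    density is strictly below the density at psi0."""
    d0 = density(psi0)
    return sum(density(t) * width for t in grid if density(t) < d0)


def post(t):
    return beta_pdf(t, 8, 4)       # hypothetical posterior, mode 0.7


N = 10000
width = 1.0 / N
grid = [(i + 0.5) * width for i in range(N)]

h, region = hpd_region(post, grid, width, 0.95)
print(min(region), max(region))                      # 0.95-hpd interval
print(hpd_tail_probability(post, grid, width, 0.2))   # small: 0.2 is implausible
```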

It is well-known, however, that hpd inferences suffer from a defect. In particular, in the continuous case MAP inferences are not invariant under reparameterizations. For example, this means that if $\psi_{MAP}(x)$ is the MAP estimate of $\psi$, then it is not necessarily true that $\Upsilon(\psi_{MAP}(x))$ is the MAP estimate of $\tau=\Upsilon(\psi)$ when $\Upsilon$ is a 1-1, smooth transformation. The noninvariance of a statistical procedure seems very unnatural as it implies that the statistical analysis depends on the parameterization, and typically there does not seem to be a good reason for this. Note too that estimates based upon taking posterior expectations will also suffer from this lack of invariance. It is also the case that MAP inferences are not based on a direct characterization of statistical evidence. Both of these issues motivate the development of relative belief inferences.
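A small numerical check of this noninvariance, as a sketch under hypothetical assumptions: take a uniform prior for $\theta\in(0,1)$ and a Beta$(a,b)$ posterior with $a=8,b=4$, and let $\tau=\Upsilon(\theta)=\text{logit}(\theta)$. The posterior mode of $\theta$ is $(a-1)/(a+b-2)=0.7$, but after the change of variables the density of $\tau$ is proportional to $\theta^{a}(1-\theta)^{b}$, so the MAP estimate of $\tau$ maps back to $a/(a+b)\approx 0.667$. By contrast, the maximizer of the ratio of posterior to prior densities of $\tau$ (the relative belief estimate of Section 2.3) maps back to $0.7$, since the Jacobian cancels in the ratio.

```python
import math

a, b = 8.0, 4.0   # hypothetical Beta(a, b) posterior for theta (uniform prior)


def post_theta(t):
    """Unnormalized posterior density of theta."""
    return t ** (a - 1) * (1 - t) ** (b - 1)


def expit(s):
    return 1.0 / (1.0 + math.exp(-s))


def post_tau(s):
    """Unnormalized posterior density of tau = logit(theta); the factor
    t * (1 - t) is the Jacobian d theta / d tau."""
    t = expit(s)
    return post_theta(t) * t * (1 - t)


def prior_tau(s):
    """Uniform prior on theta, expressed as a density for tau."""
    t = expit(s)
    return t * (1 - t)


taus = [-6 + 12 * i / 200000 for i in range(200001)]
map_tau = max(taus, key=post_tau)
rb_tau = max(taus, key=lambda s: post_tau(s) / prior_tau(s))

map_theta = (a - 1) / (a + b - 2)      # closed-form Beta mode
print(map_theta, expit(map_tau))        # the two MAP estimates disagree
print(expit(rb_tau))                    # ratio maximizer agrees with 0.7
```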

2.2 Bayesian decision theory

An ingredient that is commonly added to $(\{f(\cdot\,|\,\theta):\theta\in\Theta\},\pi,x)$ is a loss function, namely, $L:\Theta\times\Psi(\Theta)\rightarrow[0,\infty)$ satisfying $L(\theta,\psi)=L(\theta',\psi)$ whenever $\Psi(\theta)=\Psi(\theta')$, and $L(\theta,\psi)=0$ only when $\psi=\Psi(\theta)$. The goal is to find a procedure, say $\delta(x)\in\Psi(\Theta)$, that in some sense minimizes the loss $L(\theta,\delta(x))$ based on the joint distribution of $(\theta,x)$. Given these assumptions, the loss function can instead be thought of as a map $L:\Psi(\Theta)\times\Psi(\Theta)\rightarrow[0,\infty)$ with $L(\psi,\psi')=0$ iff $\psi=\psi'$, and the ingredients can be represented as $(\{m(\cdot\,|\,\psi):\psi\in\Psi(\Theta)\},\pi_{\Psi},L,x)$.

The goal of a decision analysis is then to find a decision function $\delta:\mathcal{X}\rightarrow\Psi(\Theta)$ that minimizes the prior risk

$$r(\delta)=\int_{\Psi(\Theta)}\int_{\mathcal{X}}L(\psi,\delta(x))\,M(dx\,|\,\psi)\,\Pi_{\Psi}(d\psi)=\int_{\mathcal{X}}r(\delta\,|\,x)\,M(dx)$$

where $r(\delta\,|\,x)=\int_{\Psi(\Theta)}L(\psi,\delta(x))\,\Pi_{\Psi}(d\psi\,|\,x)$ is the posterior risk. Such a $\delta$ is called a Bayes rule, and clearly a $\delta$ that minimizes $r(\delta\,|\,x)$ for each $x$ is a Bayes rule. Further discussion of Bayesian decision theory can be found in Berger (1985).
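Minimizing the posterior risk pointwise in $x$ can be sketched directly when $\Psi(\Theta)$ is finite. In the sketch below the support and posterior probabilities are hypothetical, and the minimization is carried out over a grid of candidate decisions. It recovers the standard facts that quadratic loss leads to the posterior mean (here $2.85$) and absolute loss to a posterior median (here $3$).

```python
support = [0.0, 1.0, 2.0, 3.0, 4.0]          # hypothetical finite Psi(Theta)
posterior = [0.05, 0.10, 0.20, 0.25, 0.40]    # hypothetical Pi_Psi({psi} | x)


def posterior_risk(loss, d):
    """r(d | x) = sum over psi of L(psi, d) Pi_Psi({psi} | x)."""
    return sum(loss(psi, d) * p for psi, p in zip(support, posterior))


def bayes_rule(loss, candidates):
    """Value of a Bayes rule at the observed x: the candidate decision
    with the smallest posterior risk."""
    return min(candidates, key=lambda d: posterior_risk(loss, d))


def quadratic_loss(psi, d):
    return (psi - d) ** 2


def absolute_loss(psi, d):
    return abs(psi - d)


candidates = [i / 100 for i in range(401)]     # decisions 0.00, 0.01, ..., 4.00
post_mean = sum(psi * p for psi, p in zip(support, posterior))
print(bayes_rule(quadratic_loss, candidates), post_mean)
print(bayes_rule(absolute_loss, candidates))
```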

As noted in Bernardo (2005), a decision formulation also leads to credible regions for $\psi$; namely, a $\gamma$-lowest posterior loss credible region is defined by

$$D_{\gamma}(x)=\{\psi:r(\psi\,|\,x)\leq d_{\gamma}(x)\} \tag{3}$$

where $d_{\gamma}(x)=\inf\{k:\Pi_{\Psi}(\{\psi:r(\psi\,|\,x)\leq k\}\,|\,x)\geq\gamma\}$. Note that $\psi$ in (3) is interpreted as the decision function that takes the value $\psi$ for every $x$. Clearly, as $\gamma\rightarrow 0$ the set $D_{\gamma}(x)$ converges to the value of a Bayes rule at $x$. For example, with quadratic loss the Bayes rule is given by the posterior mean, and a $\gamma$-lowest posterior loss region is the smallest sphere centered at the mean containing at least $\gamma$ of the posterior probability.
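The quadratic-loss example can be checked numerically. With squared error, the posterior risk of the constant decision $\psi$ is $r(\psi\,|\,x)=\text{Var}(\psi\,|\,x)+(\psi-E(\psi\,|\,x))^{2}$, so the sets in (3) are intervals centered at the posterior mean. The sketch below assumes a hypothetical Beta$(8,4)$ posterior discretized on a midpoint grid (the mean and variance are grid approximations), adds cells in increasing order of posterior risk until their mass reaches $\gamma$, and returns the resulting region.

```python
import math

a, b = 8.0, 4.0                       # hypothetical Beta(a, b) posterior for psi
N = 20000
width = 1.0 / N
grid = [(i + 0.5) * width for i in range(N)]
log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
mass = [math.exp((a - 1) * math.log(t) + (b - 1) * math.log(1 - t) - log_beta) * width
        for t in grid]                # approximate posterior mass of each grid cell

post_mean = sum(t * w for t, w in zip(grid, mass))
post_var = sum((t - post_mean) ** 2 * w for t, w in zip(grid, mass))


def posterior_risk(psi):
    """Quadratic-loss posterior risk of the constant decision psi:
    r(psi | x) = Var(psi | x) + (psi - E(psi | x))**2."""
    return post_var + (psi - post_mean) ** 2


def lowest_posterior_loss_region(gamma):
    """Grid approximation to D_gamma(x) = {psi : r(psi | x) <= d_gamma(x)}."""
    order = sorted(range(N), key=lambda i: posterior_risk(grid[i]))
    total = 0.0
    d_gamma = posterior_risk(grid[order[-1]])   # fallback: whole grid
    for i in order:
        total += mass[i]
        if total >= gamma:
            d_gamma = posterior_risk(grid[i])
            break
    return [t for t in grid if posterior_risk(t) <= d_gamma]


region = lowest_posterior_loss_region(0.95)
print(min(region), max(region), post_mean)   # interval symmetric about the mean
```

Unlike the hpd interval for the same skewed posterior, this interval is symmetric about the mean rather than concentrated where the density is highest.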

2.3 Relative belief inferences

Relative belief inferences, like MAP inferences, are based on the ingredients $(\{m(\cdot\,|\,\psi):\psi\in\Psi(\Theta)\},\pi_{\Psi},x)$. Note that underlying both approaches is the principle (axiom) of conditional probability, which says that initial beliefs about $\psi$, as expressed by the prior $\pi_{\Psi}$, must be replaced by conditional beliefs as expressed by the posterior $\pi_{\Psi}(\cdot\,|\,x)$. In this approach, however, a measure of statistical evidence is used, given by the relative belief ratio

$$RB_{\Psi}(\psi\,|\,x)=\frac{\pi_{\Psi}(\psi\,|\,x)}{\pi_{\Psi}(\psi)}=\frac{m(x\,|\,\psi)}{m(x)}. \tag{4}$$

The relative belief ratio produces the following conclusions: if $RB_{\Psi}(\psi\,|\,x)>1$, then there is evidence in favor of $\psi$ being the true value; if $RB_{\Psi}(\psi\,|\,x)<1$, there is evidence against $\psi$ being the true value; and if $RB_{\Psi}(\psi\,|\,x)=1$, then there is no evidence either way. These implications follow from a very simple principle of inference.

Principle of Evidence: for a probability model $(\Omega,\mathcal{F},P)$, if $C\in\mathcal{F}$ is observed to be true, where $P(C)>0$, then there is evidence in favor of $A\in\mathcal{F}$ being true if $P(A\,|\,C)>P(A)$, evidence against $A\in\mathcal{F}$ being true if $P(A\,|\,C)<P(A)$, and no evidence either way if $P(A\,|\,C)=P(A)$.

This principle seems obvious when $\Pi_{\Psi}$ is a discrete probability measure. For the continuous case, where $\Pi_{\Psi}(\{\psi\})=0$, let $N_{\epsilon}(\psi)$ be a sequence of neighborhoods of $\psi$ converging nicely to $\psi$ as $\epsilon\rightarrow 0$ (see Rudin (1974)); then, under weak conditions (e.g., $\pi_{\Psi}$ is continuous and positive at $\psi$),

$$\lim_{\epsilon\rightarrow 0}RB_{\Psi}(N_{\epsilon}(\psi)\,|\,x)=\lim_{\epsilon\rightarrow 0}\frac{\Pi_{\Psi}(N_{\epsilon}(\psi)\,|\,x)}{\Pi_{\Psi}(N_{\epsilon}(\psi))}=\frac{\pi_{\Psi}(\psi\,|\,x)}{\pi_{\Psi}(\psi)}=RB_{\Psi}(\psi\,|\,x)$$

and this justifies the general interpretation of $RB_{\Psi}(\psi\,|\,x)$ as a measure of evidence. The relative belief ratio determines the inferences.
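The limit above can be illustrated numerically. The sketch assumes a hypothetical setting with $\psi=\theta$, a Beta$(2,2)$ prior and $x$ consisting of 7 successes in 10 Bernoulli trials, so the posterior is Beta$(9,5)$. Interval probabilities are computed with a composite Simpson rule, and the ratio $\Pi_{\Psi}(N_{\epsilon}(\psi)\,|\,x)/\Pi_{\Psi}(N_{\epsilon}(\psi))$ for $N_{\epsilon}(\psi)=(\psi-\epsilon,\psi+\epsilon)$ approaches the ratio of densities as $\epsilon$ shrinks. In this example $RB(0.6\,|\,x)>1$ (evidence in favor) while $RB(0.3\,|\,x)<1$ (evidence against).

```python
import math


def beta_pdf(t, a, b):
    """Beta(a, b) density at t in (0, 1)."""
    return math.exp((a - 1) * math.log(t) + (b - 1) * math.log(1 - t)
                    + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))


def integrate(f, lo, hi, m=2000):
    """Composite Simpson's rule with m (even) subintervals."""
    h = (hi - lo) / m
    s = f(lo) + f(hi)
    s += 4 * sum(f(lo + k * h) for k in range(1, m, 2))
    s += 2 * sum(f(lo + k * h) for k in range(2, m, 2))
    return s * h / 3


def prior(t):
    return beta_pdf(t, 2, 2)       # hypothetical prior density


def post(t):
    return beta_pdf(t, 9, 5)       # posterior after 7 successes in 10 trials


def rb_point(psi):
    """RB(psi | x) = pi(psi | x) / pi(psi)."""
    return post(psi) / prior(psi)


def rb_set(psi, eps):
    """RB(N_eps(psi) | x) = Pi(N_eps(psi) | x) / Pi(N_eps(psi))."""
    return integrate(post, psi - eps, psi + eps) / integrate(prior, psi - eps, psi + eps)


for eps in (0.1, 0.01, 0.001):
    print(eps, rb_set(0.6, eps), rb_point(0.6))
```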

A natural estimate of ψ𝜓\psiitalic_ψ is the relative belief estimate

$$\psi_{RB}(x)=\arg\sup_{\psi}RB_{\Psi}(\psi\,|\,x)$$

as it has the maximum evidence in favor. To assess the accuracy of $\psi_{RB}(x)$ there is the plausible region $Pl_{\Psi}(x)=\{\psi:RB_{\Psi}(\psi\,|\,x)>1\}$, the set of $\psi$ values having evidence in favor of being the true value. The size of $Pl_{\Psi}(x)$, together with its posterior content $\Pi_{\Psi}(Pl_{\Psi}(x)\,|\,x)$, which measures the belief that the true value is in $Pl_{\Psi}(x)$, provides the assessment of the accuracy. So, if $Pl_{\Psi}(x)$ is "small" and $\Pi_{\Psi}(Pl_{\Psi}(x)\,|\,x)\approx 1$, then $\psi_{RB}(x)$ is to be considered an accurate estimate of $\psi$, but not otherwise. A relative belief $\gamma$-credible region

$$C_{\Psi,\gamma}(x)=\{\psi:RB_{\Psi}(\psi\,|\,x)\geq c_{\gamma}(x)\},$$

where $c_{\gamma}(x)=\sup\{c:\Pi_{\Psi}(RB_{\Psi}(\psi\,|\,x)\geq c\,|\,x)\geq\gamma\}$, for $\psi$ can also be quoted, provided that $C_{\Psi,\gamma}(x)\subset Pl_{\Psi}(x)$. The containment is necessary, as otherwise $C_{\Psi,\gamma}(x)$ would contain a value $\psi$ for which there is evidence against $\psi$ being the true value.
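The estimate, the plausible region and the credible region can all be approximated on a grid. The sketch below reuses a hypothetical setting: $\psi=\theta$, a Beta$(2,2)$ prior and 7 successes in 10 Bernoulli trials, giving a Beta$(9,5)$ posterior. Since $\psi=\theta$ here, (4) shows that $RB_{\Psi}(\psi\,|\,x)$ is proportional to the likelihood, so $\psi_{RB}(x)$ is the MLE $0.7$. The credible region is built by adding cells in decreasing order of $RB_{\Psi}$ until the posterior mass reaches $\gamma$, and the containment $C_{\Psi,\gamma}(x)\subset Pl_{\Psi}(x)$ holds exactly when $c_{\gamma}(x)>1$. In this example it holds for $\gamma=0.5$ but fails for $\gamma=0.95$, since $\Pi_{\Psi}(Pl_{\Psi}(x)\,|\,x)<0.95$.

```python
import math


def beta_pdf(t, a, b):
    """Beta(a, b) density at t in (0, 1)."""
    return math.exp((a - 1) * math.log(t) + (b - 1) * math.log(1 - t)
                    + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))


def prior(t):
    return beta_pdf(t, 2, 2)       # hypothetical prior pi_Psi


def post(t):
    return beta_pdf(t, 9, 5)       # posterior after 7 successes in 10 trials


N = 100000
width = 1.0 / N
grid = [(i + 0.5) * width for i in range(N)]
rb = [post(t) / prior(t) for t in grid]           # RB(psi | x) on the grid
post_mass = [post(t) * width for t in grid]       # posterior mass of each cell

# Relative belief estimate and plausible region Pl(x) = {psi : RB(psi | x) > 1}.
psi_rb = grid[max(range(N), key=rb.__getitem__)]
plausible = [t for t, r in zip(grid, rb) if r > 1]
pl_content = sum(w for w, r in zip(post_mass, rb) if r > 1)


def rb_credible_region(gamma):
    """Grid approximation to C_gamma(x) = {psi : RB(psi | x) >= c_gamma(x)}."""
    order = sorted(range(N), key=rb.__getitem__, reverse=True)
    total = 0.0
    c = rb[order[-1]]                              # fallback: whole grid
    for i in order:
        total += post_mass[i]
        if total >= gamma:
            c = rb[i]
            break
    return c, [t for t, r in zip(grid, rb) if r >= c]


print(psi_rb, min(plausible), max(plausible), pl_content)
for gamma in (0.5, 0.95):
    c, region = rb_credible_region(gamma)
    print(gamma, min(region), max(region), "contained in Pl:", c > 1)
```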

For assessing the hypothesis $H_{0}:\Psi(\theta)=\psi_{0}$, the value $RB_{\Psi}(\psi_{0}\,|\,x)$ indicates whether there is evidence in favor of or against $H_{0}$. The strength of this evidence can be measured by the posterior probability $\Pi_{\Psi}(\{\psi_{0}\}\,|\,x)$, as this measures the belief in what the evidence says. So, if $RB_{\Psi}(\psi_{0}\,|\,x)>1$ and $\Pi_{\Psi}(\{\psi_{0}\}\,|\,x)\approx 1$, then there is strong evidence that $H_{0}$ is true. When $RB_{\Psi}(\psi_{0}\,|\,x)<1$ and $\Pi_{\Psi}(\{\psi_{0}\}\,|\,x)\approx 0$, there is strong evidence that $H_{0}$ is false. 
Since $\Pi_{\Psi}(\{\psi_{0}\})$ can be small, even 0 in the continuous case, it makes more sense to measure the strength of the evidence in such a case by

$$Str_{\Psi}(\psi_{0}\,|\,x)=\Pi_{\Psi}(RB_{\Psi}(\psi\,|\,x)\leq RB_{\Psi}(\psi_{0}\,|\,x)\,|\,x).$$

If $RB_{\Psi}(\psi_{0}\,|\,x)>1$ and $Str_{\Psi}(\psi_{0}\,|\,x)\approx 1$, then the evidence is strong that $\psi_{0}$ is the true value, as there is small belief that the true value of $\psi$ has more evidence in its favor than $\psi_{0}$. If $RB_{\Psi}(\psi_{0}\,|\,x)<1$ and $Str_{\Psi}(\psi_{0}\,|\,x)\approx 0$, then the evidence is strong that $\psi_{0}$ is not the true value, as there is large belief that the true value of $\psi$ has more evidence in its favor than $\psi_{0}$. Actually, there is no reason to quote a single number to measure the strength, and both $\Pi_{\Psi}(\{\psi_{0}\}\,|\,x)$ and $Str_{\Psi}(\psi_{0}\,|\,x)$ can be quoted when relevant.
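For a discrete $\psi$, $Str_{\Psi}(\psi_{0}\,|\,x)$ is a sum of posterior probabilities over the values whose relative belief ratio does not exceed that of $\psi_{0}$. The sketch below evaluates it, together with $\Pi_{\Psi}(\{\psi_{0}\}\,|\,x)$, for each value of a hypothetical three-point prior and posterior. The function name `strength` is illustrative.

```python
def strength(prior, posterior, psi0):
    """Str(psi0 | x) = Pi(RB(psi | x) <= RB(psi0 | x) | x) for a discrete psi."""
    rb = {psi: posterior[psi] / prior[psi] for psi in prior}
    return sum(posterior[psi] for psi in rb if rb[psi] <= rb[psi0])


# Hypothetical prior and posterior used only as a toy illustration.
prior = {"A": 0.5, "B": 0.3, "C": 0.2}
posterior = {"A": 0.15, "B": 0.35, "C": 0.5}

for psi0 in ("A", "B", "C"):
    rb0 = posterior[psi0] / prior[psi0]
    verdict = "in favor of" if rb0 > 1 else ("against" if rb0 < 1 else "neither")
    print(psi0, round(rb0, 3), verdict,
          round(posterior[psi0], 3),                     # Pi({psi0} | x)
          round(strength(prior, posterior, psi0), 3))    # Str(psi0 | x)
```

For $\psi_{0}=B$ there is evidence in favor ($RB\approx 1.17$), but $Str=0.5$. A posterior belief of 0.5 sits on $C$, which has more evidence in its favor, so the evidence for $B$ is only moderate. For $\psi_{0}=A$ there is evidence against, with $Str=0.15$.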

An important aspect of both $Str_{\Psi}(\psi_{0}\,|\,x)$ and $\Pi_{\Psi}(Pl_{\Psi}(x)\,|\,x)$ is what happens as the amount of data increases. Appropriate behavior means that $Str_{\Psi}(\psi_{0}\,|\,x)\rightarrow 0$ when $H_{0}$ is false, $Str_{\Psi}(\psi_{0}\,|\,x)\rightarrow 1$ when $H_{0}$ is true, and $\Pi_{\Psi}(Pl_{\Psi}(x)\,|\,x)\rightarrow 1$. To ensure this, it is necessary to take into account the difference that matters, $\delta$. By this we mean that there is a distance measure $d_{\Psi}$ on $\Psi(\Theta)\times\Psi(\Theta)$ such that, if $d_{\Psi}(\psi,\psi^{\prime})\leq\delta$, then these values are considered equivalent in terms of the application. Such a $\delta$ always exists because measurements are always taken to finite accuracy. 
For example, if $\psi$ is real-valued, then there is a grid of values $\ldots,\psi_{-2},\psi_{-1},\psi_{0},\psi_{1},\psi_{2},\ldots$ separated by $\delta$, and inferences are determined using the relative belief ratios of the intervals $[\psi_{i}-\delta/2,\psi_{i}+\delta/2)$. In effect, $H_{0}$ is now $H_{0}:\Psi(\theta)\in[\psi_{0}-\delta/2,\psi_{0}+\delta/2)$. When the computations are carried out in this way, $Str_{\Psi}(\psi_{0}\,|\,x)$ and $\Pi_{\Psi}(Pl_{\Psi}(x)\,|\,x)$ behave as required. As a particular instance of this, see the results in Section 4, where such a discretization plays a key role.
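The discretization can be sketched for a real-valued $\psi$ under a hypothetical normal model: $x_{1},\ldots,x_{n}$ i.i.d. $N(\psi,1)$ with prior $\psi\sim N(0,\tau^{2})$. In this model the posterior is normal, so the prior and posterior probabilities of each interval $[\psi_{i}-\delta/2,\psi_{i}+\delta/2)$ follow from the normal CDF. The values of $n$, $\bar{x}$, $\tau$, $\delta$ and the grid range are illustrative assumptions.

```python
import math


def normal_cdf(z, mean, sd):
    return 0.5 * (1.0 + math.erf((z - mean) / (sd * math.sqrt(2.0))))


def interval_probs(grid, delta, mean, sd):
    """Probability of each interval [psi_i - delta/2, psi_i + delta/2) under N(mean, sd^2)."""
    return {
        psi: normal_cdf(psi + delta / 2, mean, sd) - normal_cdf(psi - delta / 2, mean, sd)
        for psi in grid
    }


# Hypothetical set-up: x_1..x_n iid N(psi, 1), prior psi ~ N(0, tau^2).
n, xbar, tau, delta = 20, 0.3, 1.0, 0.1
post_prec = n + 1.0 / tau**2
post_mean, post_sd = n * xbar / post_prec, 1.0 / math.sqrt(post_prec)

grid = [round(i * delta, 10) for i in range(-50, 51)]   # psi_i = i * delta on [-5, 5]
prior_p = interval_probs(grid, delta, 0.0, tau)
post_p = interval_probs(grid, delta, post_mean, post_sd)
rb = {psi: post_p[psi] / prior_p[psi] for psi in grid}  # RB of each interval

psi_rb = max(rb, key=rb.get)
plausible = [psi for psi in grid if rb[psi] > 1]
content = sum(post_p[psi] for psi in plausible)          # Pi(Pl(x) | x)
psi0 = 0.0                                               # H0: psi in [-delta/2, delta/2)
str_psi0 = sum(post_p[psi] for psi in grid if rb[psi] <= rb[psi0])
print(psi_rb, plausible, round(content, 3), round(rb[psi0], 3), round(str_psi0, 3))
```

In this normal-normal case the continuous relative belief ratio is maximized at $\bar{x}$, and the discretized estimate lands on the grid point $0.3$. Shrinking $\delta$ only refines the grid.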

It is easy to see that the class of relative belief credible regions $\{C_{\Psi,\gamma}(x):\gamma\in[0,1]\}$ for $\psi$ is independent of the marginal prior $\pi_{\Psi}$. As a function of $\psi$, $RB_{\Psi}(\psi\,|\,x)=m(x\,|\,\psi)/m(x)$ depends on $\pi_{\Psi}$ only through the constant $m(x)$. When a value $\gamma\in[0,1]$ is specified, however, the set $C_{\Psi,\gamma}(x)$ does depend on $\pi_{\Psi}$ through $c_{\gamma}(x)$. So the form of relative belief inferences about $\psi$ is completely robust to the choice of $\pi_{\Psi}$, but the quantification of the uncertainty in the inferences is not. For example, when $\psi=\Psi(\theta)=\theta$, then $\theta_{RB}(x)$ is the MLE while, in general, $\psi_{RB}(x)$ is the maximizer of the integrated likelihood $m(x\,|\,\psi)$. Similarly, relative belief regions are likelihood regions in the case of the full parameter, and integrated likelihood regions generally. As such, likelihood regions can be seen as essentially Bayesian in character. They have a clear and precise characterization of evidence through the relative belief ratio, and they now have probability assignments through the posterior. 
It is the case, however, that a relative belief ratio $RB_{\Psi}(\psi\,|\,x)$, while proportional to an integrated likelihood, cannot be multiplied by an arbitrary positive constant, as a likelihood can, without losing its interpretation as a measure of statistical evidence. It has been established in Al-Labadi and Evans (2017) that relative belief inferences for $\psi$ are optimally robust to the prior $\pi_{\Psi}$.
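Take the case $\psi=\theta$ on a finite grid, with a hypothetical binomial model ($n=10$, $x=3$) and two arbitrary priors. The sketch below checks numerically that changing the prior rescales $RB(\theta\,|\,x)=f(x\,|\,\theta)/m(x)$ by a constant. The estimate and the ordering of $\theta$ values are therefore unchanged, although the posterior contents, and hence $c_{\gamma}(x)$, are not. The grid, data and priors are illustrative.

```python
import math

# Hypothetical finite grid for theta = psi and a binomial model with n = 10, x = 3.
grid = [k / 10 for k in range(1, 10)]
n, x = 10, 3


def likelihood(theta):
    return math.comb(n, x) * theta**x * (1 - theta) ** (n - x)


def rb_table(prior):
    m = sum(prior[t] * likelihood(t) for t in grid)   # prior predictive m(x)
    return {t: likelihood(t) / m for t in grid}       # RB(theta | x) = f(x | theta) / m(x)


uniform = {t: 1 / len(grid) for t in grid}
weights = {t: t**2 for t in grid}                     # an arbitrary second prior
tilted = {t: w / sum(weights.values()) for t, w in weights.items()}

rb_u, rb_t = rb_table(uniform), rb_table(tilted)
ratios = [rb_u[t] / rb_t[t] for t in grid]            # constant: m_tilted(x) / m_uniform(x)
print(max(rb_u, key=rb_u.get), max(rb_t, key=rb_t.get), max(ratios) - min(ratios))
```

Both priors give the same estimate, the grid MLE $0.3$. The spread of the ratios is zero up to round-off.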

As can be seen from (4), relative belief inferences are always invariant under smooth reparameterizations, and this is at least one reason why they are preferable to MAP inferences. It is the case, however, that any rule for measuring evidence which satisfies the principle of evidence also produces valid estimates, as these lie in $Pl_{\Psi}(x)$ and so will have the same "accuracy" as $\psi_{RB}(x)$. For example, suppose the difference $\pi_{\Psi}(\psi\,|\,x)-\pi_{\Psi}(\psi)$, with cut-off 0, were used as the measure of evidence instead of the relative belief ratio. This satisfies the principle of evidence, but the estimate is no longer necessarily invariant under reparameterizations. The Bayes factor with cut-off 1 is also a valid measure of evidence, but there are a number of reasons why the relative belief ratio is to be preferred to the Bayes factor for general inferences; see Al-Labadi, Alzaatreh and Evans (2024).
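The difference in invariance can be checked on a grid. Assume, hypothetically, a uniform prior on $\psi\in(0,1)$ and binomial data with $n=10$, $x=3$, so that the posterior is Beta$(4,8)$, and reparameterize by $\lambda=\psi^{2}$. Both densities pick up the same Jacobian, so the ratio's maximizer maps back to the same $\psi$. The difference of densities is rescaled by the Jacobian, and its maximizer moves. The choice of transformation and data is illustrative.

```python
import math


def beta_pdf(p, a, b):
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(p) + (b - 1) * math.log(1 - p) - log_beta)


def prior_psi(p):
    return 1.0                       # uniform prior density on (0, 1)


def post_psi(p):
    return beta_pdf(p, 4, 8)         # posterior after x = 3 successes in n = 10


def jacobian(lam):
    return 1.0 / (2.0 * math.sqrt(lam))   # |d psi / d lambda| for psi = sqrt(lambda)


def prior_lam(lam):
    return prior_psi(math.sqrt(lam)) * jacobian(lam)


def post_lam(lam):
    return post_psi(math.sqrt(lam)) * jacobian(lam)


grid = [k / 1000 for k in range(1, 1000)]   # used for both psi and lambda


def argmax(func):
    return max(grid, key=func)


ratio_psi = argmax(lambda p: post_psi(p) / prior_psi(p))
ratio_lam = math.sqrt(argmax(lambda l: post_lam(l) / prior_lam(l)))  # mapped back to psi
diff_psi = argmax(lambda p: post_psi(p) - prior_psi(p))
diff_lam = math.sqrt(argmax(lambda l: post_lam(l) - prior_lam(l)))   # mapped back to psi
print(ratio_psi, ratio_lam, diff_psi, diff_lam)
```

The ratio gives $0.3$ in both parameterizations. The difference gives $0.3$ in the $\psi$ scale but roughly $0.25$ when maximized in the $\lambda$ scale.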

3 Estimation: discrete parameter space

The following theorem presents the basic definition of the loss function when $\Psi(\Theta)$ is finite and establishes an important optimality result. The indicator function for the set $A$ is denoted $I_{A}$.

Theorem 1. Suppose that $\pi_{\Psi}(\psi)>0$ for every $\psi\in\Psi(\Theta)$, where $\Psi(\Theta)$ is finite with $\nu_{\Psi}$ equal to counting measure on $\Psi(\Theta)$. Then for the loss function

$$L_{RB}(\theta,\psi)=\frac{I_{\{\psi\}^{c}}(\Psi(\theta))}{\pi_{\Psi}(\Psi(\theta))},\tag{5}$$

the relative belief estimator $\psi_{RB}$ is a Bayes rule.

Proof: We have that

$$
\begin{aligned}
r(\delta\,|\,x) &= \int_{\Psi(\Theta)}\frac{I_{\{\delta(x)\}^{c}}(\psi)}{\pi_{\Psi}(\psi)}\,\Pi_{\Psi}(d\psi\,|\,x)\\
&= \int_{\Psi(\Theta)}RB_{\Psi}(\psi\,|\,x)\,\nu_{\Psi}(d\psi)-RB_{\Psi}(\delta(x)\,|\,x).
\end{aligned}
\tag{6}
$$

Since $\Psi(\Theta)$ is finite, the first term in (6) is finite and does not depend on $\delta$. A Bayes rule at $x$ is therefore given by the value $\delta(x)$ that maximizes the subtracted term $RB_{\Psi}(\delta(x)\,|\,x)$. Therefore, $\psi_{RB}$ is a Bayes rule. $\blacksquare$
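The identity (6) and the conclusion of Theorem 1 can be checked numerically for a hypothetical finite example. The sketch computes the posterior risk of each possible decision $d$ under $L_{RB}$ directly and compares it with "total relative belief minus $RB_{\Psi}(d\,|\,x)$". It then confirms that the minimizer is the maximizer of the relative belief ratio. The prior and posterior values are illustrative.

```python
# Hypothetical discrete prior and posterior on a finite Psi(Theta).
prior = {"A": 0.5, "B": 0.3, "C": 0.2}
posterior = {"A": 0.15, "B": 0.35, "C": 0.5}
rb = {psi: posterior[psi] / prior[psi] for psi in prior}


def posterior_risk(d):
    """r(d | x) under L_RB: sum over psi != d of posterior(psi) / prior(psi)."""
    return sum(posterior[psi] / prior[psi] for psi in prior if psi != d)


for d in prior:
    # Equation (6): total RB mass minus the RB of the chosen value.
    assert abs(posterior_risk(d) - (sum(rb.values()) - rb[d])) < 1e-12

bayes_rule = min(prior, key=posterior_risk)
print(bayes_rule, max(rb, key=rb.get))   # Bayes rule and psi_RB coincide
```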

The loss function $L_{RB}$ seems very natural. Beliefs about the true value of $\psi$ are expressed by the prior $\pi_{\Psi}$. If inferences pointed to a value $\psi$ where $\pi_{\Psi}(\psi)$ is very low, and $\psi$ is indeed a false value, this would be quite misleading. So it is appropriate for such values to bear large losses. In a sense, the statistician is acknowledging what such values are by the choice of prior. Of course, the prior may be wrong in the sense that the bulk of its mass is placed in a region where the true value of $\psi$ does not lie. This is why checking for prior-data conflict before inference is carried out is always recommended. Procedures for checking a prior are discussed in Evans and Moshonov (2006) and Nott et al. (2020), and an approach to replacing a prior found to be at fault is developed in Evans and Jang (2011). The loss $L_{RB}$ motivates the other losses for relative belief discussed here, so this comment applies to those losses as well.

The prior risk of $\delta$ satisfies

$$r(\delta)=\int_{\Psi(\Theta)}\int_{\mathcal{X}}\frac{I_{\{\delta(x)\}^{c}}(\psi)}{\pi_{\Psi}(\psi)}\,M(dx\,|\,\psi)\,\Pi_{\Psi}(d\psi)=\sum_{\psi\in\Psi(\Theta)}M(\delta(x)\neq\psi\,|\,\psi),\tag{7}$$

the sum of the conditional prior error probabilities over all $\psi$ values. If instead the loss function is taken to be $L_{MAP}(\theta,\psi)=I_{\{\psi\}^{c}}(\Psi(\theta))$, as in (1), then virtually the same proof as for Theorem 1 establishes that $\psi_{MAP}$ is a Bayes rule with respect to this loss, and the prior risk equals

$$\sum_{\psi}M(\delta(x)\neq\psi\,|\,\psi)\,\pi_{\Psi}(\psi),\tag{8}$$

the prior probability of making an error. Both $L_{MAP}$ and $L_{RB}$ are zero when the decision is correct. When an incorrect decision is made, however, the loss is constant in $\Psi(\theta)$ for $L_{MAP}$, while for $L_{RB}$ it equals the reciprocal of the prior probability of $\Psi(\theta)$. So $L_{RB}$ penalizes an incorrect decision much more severely when the true value of $\Psi(\theta)$ is in the tails of the prior. Note that $\psi_{MAP}=\psi_{RB}$ when $\Pi_{\Psi}$ is uniform. Since $\pi_{\Psi}(\psi)\leq 1$, (7) is also an upper bound on (8), so controlling losses based on $L_{RB}$ automatically controls the losses based on $L_{MAP}$.
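The two prior risks can be compared by exact enumeration in a small hypothetical problem. Take three classes $\theta\in\{0,1,2\}$ with $\Psi(\theta)=\theta$ and $x\,|\,\theta\sim$ Binomial$(5,p_{\theta})$. The sketch computes (7) and (8) for $\psi_{RB}$ and $\psi_{MAP}$. It checks that each rule wins under its own loss and that (8) never exceeds (7). The success probabilities and prior are illustrative assumptions.

```python
import math

# Hypothetical three-class problem: Psi(theta) = theta in {0, 1, 2},
# x | theta ~ Binomial(n, p[theta]).
n = 5
p = {0: 0.2, 1: 0.5, 2: 0.8}
prior = {0: 0.6, 1: 0.3, 2: 0.1}


def f(x, theta):
    return math.comb(n, x) * p[theta] ** x * (1 - p[theta]) ** (n - x)


def psi_rb(x):   # maximizes RB(theta | x), i.e. the likelihood f(x | theta)
    return max(prior, key=lambda t: f(x, t))


def psi_map(x):  # maximizes the posterior, i.e. prior(theta) * f(x | theta)
    return max(prior, key=lambda t: prior[t] * f(x, t))


def error_prob(rule, theta):
    """M(rule(x) != theta | theta)."""
    return sum(f(x, theta) for x in range(n + 1) if rule(x) != theta)


def risk7(rule):  # prior risk under L_RB: sum of the conditional error probabilities
    return sum(error_prob(rule, t) for t in prior)


def risk8(rule):  # prior risk under L_MAP: prior-weighted error probability
    return sum(prior[t] * error_prob(rule, t) for t in prior)


for name, rule in (("RB", psi_rb), ("MAP", psi_map)):
    print(name, [rule(x) for x in range(n + 1)], round(risk7(rule), 4), round(risk8(rule), 4))
```

Here the two rules disagree at $x=2$ and $x=4$, where the MAP rule leans toward the a priori more probable classes.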

As already noted, $RB_{\Psi}(\psi\,|\,x)$ is proportional to the integrated likelihood of $\psi$. So, under the conditions of Theorem 1, the maximum integrated likelihood estimator is a Bayes rule. Furthermore, the Bayes rule is the same for every choice of $\pi_{\Psi}$ and depends on the full prior only through the conditional prior $\Pi(\cdot\,|\,\psi)$ placed on the nuisance parameters. When $\psi=\theta$, $\theta_{RB}(x)$ is the MLE of $\theta$, and so the MLE of $\theta$ is a Bayes rule for every prior $\pi$.

Note that when $\Psi(\Theta)=\{\psi_{0},\psi_{1}\}$, then $RB_{\Psi}(\psi_{0}\,|\,x)>(<)\,1$ iff $RB_{\Psi}(\psi_{1}\,|\,x)<(>)\,1$. So $\psi_{RB}(x)=\psi_{0}$ when $RB_{\Psi}(\psi_{0}\,|\,x)>1$, and $\psi_{RB}(x)=\psi_{1}$ otherwise. 
This is the classical context for hypothesis testing. Here $\psi_{RB}(x)=\psi_{0}$ can be viewed as acceptance of the hypothesis $H_{0}:\theta\in\Psi^{-1}\{\psi_{0}\}$, and $\psi_{RB}(x)=\psi_{1}$ as rejection of $H_{0}$. Theorem 1 establishes that relative belief provides a Bayes rule for the hypothesis testing problem.
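The equivalence used in the two-point case holds because both the prior and the posterior of $\psi$ sum to one over $\{\psi_{0},\psi_{1}\}$ (with both prior probabilities positive, as in Theorem 1):

```latex
\begin{aligned}
RB_{\Psi}(\psi_{0}\,|\,x)>1
&\iff \pi_{\Psi}(\psi_{0}\,|\,x)>\pi_{\Psi}(\psi_{0})\\
&\iff 1-\pi_{\Psi}(\psi_{1}\,|\,x)>1-\pi_{\Psi}(\psi_{1})\\
&\iff \pi_{\Psi}(\psi_{1}\,|\,x)<\pi_{\Psi}(\psi_{1})
\iff RB_{\Psi}(\psi_{1}\,|\,x)<1 .
\end{aligned}
```

The same chain with the inequalities reversed gives the other case. In the boundary case $RB_{\Psi}(\psi_{0}\,|\,x)=1$, both ratios equal 1, and the convention stated above resolves the tie in favor of $\psi_{1}$.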

The loss function (5) does not provide meaningful results when $\Psi(\Theta)$ is infinite, as (7) shows that $r(\delta)$ will be infinite. So we modify (5) via a parameter $\eta>0$ and define the loss function

$$L_{RB,\eta}(\theta,\psi)=\frac{I_{\{\psi\}^{c}}(\Psi(\theta))}{\max(\eta,\pi_{\Psi}(\Psi(\theta)))}.\tag{9}$$

Note that $L_{RB,\eta}$ is bounded by $1/\eta$. This loss function is like (5) but does not allow for arbitrarily large losses. Without loss of generality, we can restrict $\eta$ to a sequence of values converging to 0.

Theorem 2. Suppose that $\pi_{\Psi}(\psi)>0$ for every $\psi\in\Psi(\Theta)$, that $\Psi(\Theta)$ is countable with $\nu_{\Psi}$ equal to counting measure, and that $\psi_{RB}(x)$ is the unique maximizer of $RB_{\Psi}(\psi\,|\,x)$ for all $x$. If $\delta_{\eta}$ is a Bayes rule for the loss function (9), then $\delta_{\eta}(x)\rightarrow\psi_{RB}(x)$ as $\eta\rightarrow 0$, for every $x\in\mathcal{X}$.

The proof of Theorem 2 also establishes the following result.

Corollary 3. For all sufficiently small $\eta$, the value of a Bayes rule at $x$ is given by $\psi_{RB}(x)$.
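Theorem 2 and Corollary 3 can be illustrated in a hypothetical countable example: $\psi\in\{1,2,3,\ldots\}$ with a geometric prior and $x\,|\,\psi\sim$ Poisson$(\psi)$. As in the proof of Theorem 1, the posterior risk under (9) is a constant minus $\pi_{\Psi}(d\,|\,x)/\max(\eta,\pi_{\Psi}(d))$, so the Bayes rule maximizes that term. For computation only, the support is truncated at $N=200$, which neglects prior mass $0.5^{200}$. The model, data and truncation are illustrative assumptions.

```python
import math

# Hypothetical countable example: psi in {1, 2, 3, ...} with geometric prior
# pi(psi) = (1 - rho) * rho ** (psi - 1) and x | psi ~ Poisson(psi).
# The support is truncated at N for computation; the neglected prior mass is rho ** N.
rho, N, x = 0.5, 200, 7
support = range(1, N + 1)
prior = {psi: (1 - rho) * rho ** (psi - 1) for psi in support}
lik = {psi: math.exp(x * math.log(psi) - psi - math.lgamma(x + 1)) for psi in support}
m = sum(prior[psi] * lik[psi] for psi in support)
post = {psi: prior[psi] * lik[psi] / m for psi in support}


def bayes_rule(eta):
    """Bayes rule at x under L_{RB,eta}.

    The posterior risk of choosing d is
        sum_psi post(psi) / max(eta, pi(psi))  -  post(d) / max(eta, pi(d)),
    and the first sum does not depend on d, so the rule maximizes the subtracted term.
    """
    return max(support, key=lambda d: post[d] / max(eta, prior[d]))


psi_rb = max(support, key=lambda psi: post[psi] / prior[psi])
for eta in (1.0, 0.1, 0.01, 1e-3, 1e-6):
    print(eta, bayes_rule(eta))
print("psi_RB =", psi_rb)
```

With $\eta=1$ the rule is the MAP value 4. Once $\eta\leq\pi_{\Psi}(7)$, the Bayes rule equals $\psi_{RB}(x)=7$, as Corollary 3 describes.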

The following is an immediate consequence of Theorem 1 and Corollary 3, as $\psi_{RB}$ is a Bayes rule.

Corollary 4. $\psi_{RB}$ is an admissible estimator with respect to the loss $L_{RB}$ when $\Psi(\Theta)$ is finite, and with respect to the loss $L=L_{RB,\eta}$, with $\eta$ sufficiently small, when $\Psi(\Theta)$ is countable.

In a general estimation problem, $\delta$ is risk unbiased with respect to a loss function $L$ if $E_{\theta}(L(\theta^{\prime},\delta(x)))\geq E_{\theta}(L(\theta,\delta(x)))$ for all $\theta^{\prime},\theta\in\Theta$. Interpreting $L(\theta,\delta(x))$ as a measure of distance between $\delta(x)$ and $\Psi(\theta)$, this says that on average $\delta(x)$ is closer to the true value than to any other value. A definition of Bayesian unbiasedness for $\delta$ with respect to $L$ is that

$$\int_{\Theta}\int_{\Theta}E_{\theta}(L(\theta^{\prime},\delta(x)))\,\Pi(d\theta)\,\Pi(d\theta^{\prime})\geq\int_{\Theta}E_{\theta}(L(\theta,\delta(x)))\,\Pi(d\theta)=r(\delta),$$

as this retains the idea of being closer on average to the true value than to a false value. Consider now a family of loss functions of the form

$$L(\theta,\psi)=I_{\{\psi\}^{c}}(\Psi(\theta))\,h(\Psi(\theta)),\tag{10}$$

where $h$ is a nonnegative function satisfying $\int_{\Theta}h(\Psi(\theta))\,\Pi(d\theta)<\infty$. Note that this family includes $L_{RB}$ and $L_{MAP}$ when $\Psi(\Theta)$ is finite, as well as $L_{RB,\eta}$.

Theorem 5. If $\Psi(\Theta)$ is finite or countable, then $\psi_{RB}(x)$ is Bayesian unbiased under the loss function (10).
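Both sides of the Bayesian unbiasedness inequality can be evaluated exactly in a small finite example. The sketch uses a hypothetical three-class binomial model with $\Psi(\theta)=\theta$ and the rule $\psi_{RB}$. On the left side, $\theta^{\prime}$ is drawn from the prior independently of the $\theta$ generating $x$, so $x$ is distributed according to the prior predictive $m$. The sketch tries the choices of $h$ giving $L_{RB}$ and $L_{MAP}$, plus one arbitrary nonnegative $h$. All numbers and the third $h$ are illustrative.

```python
import math

# Hypothetical finite example: Psi(theta) = theta in {0, 1, 2}, x | theta ~ Binomial(n, p[theta]).
n = 5
p = {0: 0.2, 1: 0.5, 2: 0.8}
prior = {0: 0.6, 1: 0.3, 2: 0.1}
xs = range(n + 1)


def f(x, t):
    return math.comb(n, x) * p[t] ** x * (1 - p[t]) ** (n - x)


def m(x):  # prior predictive probability of x
    return sum(prior[t] * f(x, t) for t in prior)


def psi_rb(x):  # maximizer of RB(theta | x) = f(x | theta) / m(x)
    return max(prior, key=lambda t: f(x, t))


def sides(rule, h):
    """Both sides of the Bayesian unbiasedness inequality for the loss (10)."""
    # theta' is independent of the theta generating x, so x is distributed as m.
    lhs = sum(prior[tp] * h(tp) * sum(m(x) for x in xs if rule(x) != tp) for tp in prior)
    # theta both generates x and is the value the decision is compared with.
    rhs = sum(prior[t] * h(t) * sum(f(x, t) for x in xs if rule(x) != t) for t in prior)
    return lhs, rhs


losses = {
    "L_RB": lambda t: 1 / prior[t],
    "L_MAP": lambda t: 1.0,
    "other h": lambda t: t + 1.0,   # an arbitrary nonnegative h
}
for name, h in losses.items():
    lhs, rhs = sides(psi_rb, h)
    print(name, round(lhs, 4), round(rhs, 4), lhs >= rhs)
```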

Suppose that after observing $x$ it is desired to predict a future (or concealed) value $y\in\mathcal{Y}$, where $y\sim g_{\delta(\theta)}(y\,|\,x)$, a density with respect to support measure $\mu_{\mathcal{Y}}$ on $\mathcal{Y}$. It is assumed that the true value of $\theta$ in the model for $x$ gives the true value of $\delta(\theta)$. The prior predictive density of $y$ is given by $q(y)=\int_{\Theta}\int_{\mathcal{X}}\pi(\theta)f_{\theta}(x)g_{\delta(\theta)}(y\,|\,x)\,\mu(dx)\,\nu(d\theta)$, while the posterior predictive density is $q(y\,|\,x)=\int_{\Theta}\pi(\theta\,|\,x)g_{\delta(\theta)}(y\,|\,x)\,\nu(d\theta)$. 
The relative belief ratio for a future value $y$ is thus $RB_{Y}(y\,|\,x)=q(y\,|\,x)/q(y)$. The relative belief prediction, namely, the value maximizing $RB_{Y}(\cdot\,|\,x)$, is denoted $y_{RB}(x)$. When $\mathcal{Y}$ is finite then, with basically the same argument as in Theorem 1, $y_{RB}$ is a Bayes rule under the loss function $L_{RB}(y,y^{\prime})=I_{\{y\}^{c}}(y^{\prime})/q(y)$. Also, it can be proved that $y_{RB}$ is a limit of Bayes rules when $\mathcal{Y}$ is countable.
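Here is a minimal sketch of relative belief prediction for a finite $\mathcal{Y}$. It assumes, hypothetically, a three-point prior on $\theta$, $x\,|\,\theta\sim$ Binomial$(5,\theta)$, and a future $y\,|\,\theta\sim$ Bernoulli$(\theta)$ that does not depend on $x$ given $\theta$. This is a special case of $g_{\delta(\theta)}(y\,|\,x)$ with $\delta(\theta)=\theta$. The sketch also minimizes the posterior risk under $L_{RB}(y,y^{\prime})$ directly, to check that it selects $y_{RB}$. All numbers are illustrative.

```python
import math

# Hypothetical set-up: theta in {0.2, 0.5, 0.8}, x | theta ~ Binomial(5, theta), and the
# future value y | theta ~ Bernoulli(theta), independent of x given theta.
thetas = {0.2: 0.6, 0.5: 0.3, 0.8: 0.1}   # prior probabilities of theta
n, x = 5, 3
ys = (0, 1)


def f(x, t):
    return math.comb(n, x) * t**x * (1 - t) ** (n - x)


def g(y, t):
    return t if y == 1 else 1 - t


m = sum(pr * f(x, t) for t, pr in thetas.items())
post = {t: pr * f(x, t) / m for t, pr in thetas.items()}

q_prior = {y: sum(pr * g(y, t) for t, pr in thetas.items()) for y in ys}   # q(y)
q_post = {y: sum(post[t] * g(y, t) for t in thetas) for y in ys}           # q(y | x)
rb_y = {y: q_post[y] / q_prior[y] for y in ys}                             # RB_Y(y | x)

y_rb = max(rb_y, key=rb_y.get)
y_map = max(q_post, key=q_post.get)


def risk(y_pred):
    """Posterior risk of predicting y_pred under L_RB(y, y') = I(y' != y) / q(y)."""
    return sum(q_post[y] / q_prior[y] for y in ys if y != y_pred)


print(rb_y, y_rb, y_map, min(ys, key=risk))
```

With these assumed values $q(1\,|\,x)\approx 0.48<0.5$, so the posterior-mode prediction is 0. However, $q(1\,|\,x)$ exceeds the prior value $q(1)=0.35$, so $y_{RB}(x)=1$, and the risk minimizer agrees with $y_{RB}$.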

Consider now a common application where $\Psi(\Theta)$ is finite.

Example 1. Classification

For a classification problem there are $k$ categories $\{\psi_{1},\ldots,\psi_{k}\}$, prescribed by some function $\Psi$, where $\pi_{\Psi}(\psi_{i})>0$ for each $i$. Estimating $\psi$ is then equivalent to classifying the data as having come from one of the distributions in the classes specified by $\Psi^{-1}\{\psi_{i}\}$. The standard Bayesian solution to this problem is to use $\psi_{MAP}(x)$ as the classifier. From (8), $\psi_{MAP}(x)$ minimizes the prior probability of misclassification, while from (7), $\psi_{RB}(x)$ minimizes the sum of the probabilities of misclassification. The essence of the difference is that $\psi_{RB}(x)$ treats the errors of misclassification equally, while $\psi_{MAP}(x)$ weights the errors by their prior probabilities.
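The difference can be seen in a few lines. Writing $m(x\,|\,\psi_i)$ for the prior predictive density of $x$ given $\Psi(\theta)=\psi_i$, Bayes' theorem gives $RB_\Psi(\psi_i\,|\,x)=\pi_\Psi(\psi_i\,|\,x)/\pi_\Psi(\psi_i)=m(x\,|\,\psi_i)/m(x)$, so the relative belief classifier picks the category with the largest (integrated) likelihood, while MAP multiplies that likelihood by the prior. The sketch below assumes the values $m(x\,|\,\psi_i)$ are supplied directly; the function name and the numbers are hypothetical.

```python
def classify(prior, lik):
    """Return (psi_MAP index, psi_RB index) for k categories.

    prior[i] is pi_Psi(psi_i) > 0 and lik[i] is m(x | psi_i), the
    (integrated) likelihood of the observed x under category i.
    """
    joint = [p * l for p, l in zip(prior, lik)]
    m_x = sum(joint)                              # prior predictive m(x)
    post = [j / m_x for j in joint]               # pi_Psi(psi_i | x)
    rb = [q / p for q, p in zip(post, prior)]     # RB_Psi(psi_i | x) = lik[i] / m(x)
    cats = range(len(prior))
    psi_map = max(cats, key=lambda i: post[i])
    psi_rb = max(cats, key=lambda i: rb[i])
    return psi_map, psi_rb


# Three hypothetical categories with a very unbalanced prior.
print(classify([0.90, 0.09, 0.01], [0.10, 0.30, 0.60]))  # (0, 2)
```

Here the rare third category has the largest likelihood, so $\psi_{RB}$ selects it, while the prior weight of the first category dominates the MAP choice.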

The following shows that minimizing the sum of the error probabilities is often more appropriate than minimizing the weighted sum. Suppose $k=2$ and $x\sim$ Bernoulli$(\psi_{0})$ or $x\sim$ Bernoulli$(\psi_{1})$, with $\pi(\psi_{0})=1-\epsilon$ and $\pi(\psi_{1})=\epsilon$ representing the known proportions of individuals coming from population 0 and population 1, respectively. For example, $\psi_{0}$ could be the probability of a positive diagnostic test ($x=1$) for a disease in the nondiseased population and $\psi_{1}$ the corresponding probability in the diseased population. Suppose that $\psi_{0}/\psi_{1}$ is very small, indicating that the test is successful in identifying the disease while not yielding many false positives, and that $\epsilon$ is very small, so the disease is rare. The question then is to assign a randomly chosen individual to a population based on the result of their test.

The posterior is given by $\pi(\psi_{0}\,|\,1)=\psi_{0}(1-\epsilon)/(\psi_{0}(1-\epsilon)+\psi_{1}\epsilon)$ and $\pi(\psi_{0}\,|\,0)=(1-\psi_{0})(1-\epsilon)/((1-\psi_{0})(1-\epsilon)+(1-\psi_{1})\epsilon)$. Therefore,

$$\psi_{MAP}(1)=\begin{cases}\psi_{0} & \text{if } \psi_{0}/\psi_{1}>\epsilon/(1-\epsilon)\\ \psi_{1} & \text{otherwise,}\end{cases}$$
$$\psi_{MAP}(0)=\begin{cases}\psi_{0} & \text{if } (1-\psi_{0})/(1-\psi_{1})>\epsilon/(1-\epsilon)\\ \psi_{1} & \text{otherwise.}\end{cases}$$

This implies that $\psi_{MAP}$ will always classify a person to the nondiseased population when $\epsilon$ is small enough; e.g., when $\psi_{0}=0.05$, $\psi_{1}=0.80$ and $\epsilon/(1-\epsilon)<\psi_0/\psi_1=0.0625$, that is, $\epsilon<1/17\approx 0.059$. By contrast, in this situation $\psi_{RB}$ always classifies an individual with a positive test to the diseased population and an individual with a negative test to the nondiseased population. Since $M(\cdot\,|\,\psi_{i})$ is the Bernoulli$(\psi_{i})$ distribution, when $\psi_{0}<\psi_{1}$ and $\epsilon$ is small enough,

$$M(\psi_{MAP}\neq\psi_{0}\,|\,\psi_{0})+M(\psi_{MAP}\neq\psi_{1}\,|\,\psi_{1})=0+1=1,$$
$$M(\psi_{RB}\neq\psi_{0}\,|\,\psi_{0})+M(\psi_{RB}\neq\psi_{1}\,|\,\psi_{1})=\psi_{0}+(1-\psi_{1})=0.25.$$
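These two sums can be checked directly by enumerating $x\in\{0,1\}$. The sketch below is a hypothetical helper, not from the text; it breaks ties in favour of population 0 and uses the fact that $\psi_{RB}$ compares likelihoods while $\psi_{MAP}$ compares prior times likelihood.

```python
def error_sums(psi0, psi1, eps):
    """Sum of the two conditional misclassification probabilities for MAP and RB.

    Population 0 (prior 1 - eps) has P(x = 1) = psi0, population 1 (prior eps)
    has P(x = 1) = psi1. Ties are broken in favour of population 0.
    """
    prior = (1 - eps, eps)
    psis = (psi0, psi1)

    def lik(x, i):
        return psis[i] if x == 1 else 1 - psis[i]

    def classify(x, use_rb):
        # MAP compares prior * likelihood; RB compares likelihood (posterior / prior).
        score = [lik(x, i) * (1 if use_rb else prior[i]) for i in (0, 1)]
        return 0 if score[0] >= score[1] else 1

    sums = {}
    for name, use_rb in (("MAP", False), ("RB", True)):
        err0 = sum(lik(x, 0) for x in (0, 1) if classify(x, use_rb) != 0)
        err1 = sum(lik(x, 1) for x in (0, 1) if classify(x, use_rb) != 1)
        sums[name] = err0 + err1
    return sums


print(error_sums(0.05, 0.80, eps=0.05))  # eps < 1/17: MAP always picks population 0
print(error_sums(0.05, 0.80, eps=0.06))  # eps > 1/17: MAP now follows the test result
```

Moving $\epsilon$ just across $1/17$ switches $\psi_{MAP}(1)$ from $\psi_0$ to $\psi_1$, which is why the threshold is stated in terms of $\epsilon/(1-\epsilon)$.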

This clearly illustrates the difference between the two procedures: $\psi_{RB}$ does much better than $\psi_{MAP}$ on the diseased population when $\psi_{0}$ is small and $\psi_{1}$ is large, as would be the case for a good diagnostic test. Of course $\psi_{MAP}$ minimizes the overall (prior-weighted) error rate, but at the price of ignoring the most important class in this problem. Note that this example can be extended to the situation where the $\psi_{i}$ must be estimated from samples from the respective populations, but this will not materially affect the overall conclusions.

Consider now a situation where $(x,c)$ is such that $x\,|\,c\sim f_{c}$ and $c\,|\,\epsilon\sim$ Bernoulli$(\epsilon)$, where $f_{0}$ and $f_{1}$ are known but $\epsilon$ is unknown with prior $\pi$. This generalizes the previous discussion, where $\epsilon$ was assumed known. Based on a sample $(x_{1},c_{1}),\ldots,(x_{n},c_{n})$ from the joint distribution, the goal is to predict the value $c_{n+1}$ for a newly observed $x_{n+1}$.

The prior of $c$ is $q(c)=\int_{0}^{1}(1-\epsilon)^{1-c}\epsilon^{c}\pi(\epsilon)\,d\epsilon$ and, if $\epsilon\sim$ beta$(\alpha,\beta)$, the prior predictive of $c_{n+1}$ is Bernoulli$(\alpha/(\alpha+\beta))$. Writing $\bar{c}=n^{-1}\sum_{i=1}^{n}c_{i}$, the posterior predictive density of $c_{n+1}$ satisfies

$$\begin{aligned}q(c\,|\,(x_{1},c_{1}),\ldots,(x_{n},c_{n}),x_{n+1}) &\propto (f_{0}(x_{n+1}))^{1-c}(f_{1}(x_{n+1}))^{c}\int_{0}^{1}\epsilon^{n\bar{c}+c}(1-\epsilon)^{n(1-\bar{c})+(1-c)}\pi(\epsilon)\,d\epsilon\\ &\propto f_{c}(x_{n+1})\,\Gamma(\alpha+n\bar{c}+c)\,\Gamma(\beta+n(1-\bar{c})+1-c),\end{aligned}$$

where the last step uses the beta$(\alpha,\beta)$ prior and drops factors that do not depend on $c$.
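A quick numerical check of the last step: the sketch below compares a midpoint-rule evaluation of the $\epsilon$-integral with the gamma-function form, for the ratio of the $c=1$ and $c=0$ terms (the common factor $f_c(x_{n+1})$ is left out). The values $\alpha=1$, $\beta=14$, $n=10$ and $k=n\bar{c}=2$ are arbitrary illustrative choices, and the helper names are our own. Both ratios should equal $(\alpha+n\bar{c})/(\beta+n(1-\bar{c}))$, the factor that appears in the classification rules below.

```python
from math import exp, lgamma


def eps_integral(k, n, c, alpha, beta, grid=100_000):
    """Midpoint-rule value of the integral over eps in (0, 1) of
    eps^(k+c) (1-eps)^(n-k+1-c) times the unnormalized beta(alpha, beta) density,
    where k = n * cbar is the number of training labels equal to 1."""
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        e = (i + 0.5) * h
        total += e ** (k + c + alpha - 1) * (1 - e) ** (n - k + 1 - c + beta - 1)
    return total * h


def gamma_factor(k, n, c, alpha, beta):
    """Gamma(alpha + k + c) * Gamma(beta + n - k + 1 - c); the omitted
    normalizing constants do not depend on c."""
    return exp(lgamma(alpha + k + c) + lgamma(beta + n - k + 1 - c))


alpha, beta, n, k = 1.0, 14.0, 10, 2
numeric = eps_integral(k, n, 1, alpha, beta) / eps_integral(k, n, 0, alpha, beta)
closed = gamma_factor(k, n, 1, alpha, beta) / gamma_factor(k, n, 0, alpha, beta)
print(numeric, closed, (alpha + k) / (beta + n - k))
```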

It follows that, suppressing the dependence on the data,

$$c_{MAP}=\begin{cases}1 & \text{if } \dfrac{f_{1}(x_{n+1})}{f_{0}(x_{n+1})}\,\dfrac{\alpha+n\bar{c}}{\beta+n(1-\bar{c})}>1\\ 0 & \text{otherwise,}\end{cases}\tag{13}$$
$$c_{RB}=\begin{cases}1 & \text{if } \dfrac{f_{1}(x_{n+1})}{f_{0}(x_{n+1})}\,\dfrac{\beta(\alpha+n\bar{c})}{\alpha(\beta+n(1-\bar{c}))}>1\\ 0 & \text{otherwise.}\end{cases}\tag{16}$$

Note that $c_{MAP}$ and $c_{RB}$ are identical whenever $\alpha=\beta$.

From these formulas it is apparent that a substantial difference will arise between $c_{MAP}$ and $c_{RB}$ when one of $\alpha$ or $\beta$ is much bigger than the other. As in Example 1, these correspond to situations where we believe that $\epsilon$ or $1-\epsilon$ is very small. Suppose we take $\alpha=1$ and let $\beta$ be relatively large, as this corresponds to knowing a priori that $\epsilon$ is very small. Since the ratio in (16) equals $\beta/\alpha$ times the ratio in (13), (13) and (16) imply that $c_{MAP}\leq c_{RB}$, so $c_{RB}=1$ whenever $c_{MAP}=1$. The same conclusion holds whenever $\beta\geq\alpha$, in particular when $\beta=1$ and $\alpha<1$.
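The two rules can be coded directly from (13) and (16), reading (16) as the ratio exceeding 1, in line with the argument above. This is a sketch on the log scale; `log_lr` stands for $\log f_1(x_{n+1})-\log f_0(x_{n+1})$ and `k` for $n\bar{c}$, both hypothetical inputs. The grid check is only a numerical illustration of $c_{MAP}\le c_{RB}$ when $\beta\ge\alpha$, not a proof.

```python
from math import log


def log_ratio_map(log_lr, k, n, alpha, beta):
    """Log of the ratio in (13); log_lr = log f_1(x_{n+1}) - log f_0(x_{n+1}), k = n * cbar."""
    return log_lr + log(alpha + k) - log(beta + n - k)


def c_map(log_lr, k, n, alpha, beta):
    return 1 if log_ratio_map(log_lr, k, n, alpha, beta) > 0 else 0


def c_rb(log_lr, k, n, alpha, beta):
    # The ratio in (16) is beta / alpha times the ratio in (13).
    return 1 if log_ratio_map(log_lr, k, n, alpha, beta) + log(beta / alpha) > 0 else 0


# Compare the rules on a grid of log-likelihood ratios and all possible k, with beta >= alpha.
n, alpha, beta = 10, 1.0, 14.0
grid = [i / 10 for i in range(-80, 81)]
pairs = [(l, k) for l in grid for k in range(n + 1)]
violations = sum(c_map(l, k, n, alpha, beta) > c_rb(l, k, n, alpha, beta) for l, k in pairs)
disagreements = sum(c_map(l, k, n, alpha, beta) != c_rb(l, k, n, alpha, beta) for l, k in pairs)
print(violations, disagreements)
```

Every disagreement is a case where $c_{RB}=1$ and $c_{MAP}=0$, i.e. the relative belief rule assigns the observation to the rare class.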

To see what kind of improvement is possible, consider a simulation study. Let $f_{0}$ be a $N(0,1)$ density, $f_{1}$ a $N(\mu,1)$ density, $n=10$, and let the prior on $\epsilon$ be beta$(1,\beta)$. Table 1 presents the Bayes risks for $c_{MAP}$ and $c_{RB}$ for various choices of $\beta$ when $\mu=1$. When $\beta=1$ the two classifiers are equivalent, but as $\beta$ increases the performance of $c_{MAP}$ deteriorates while that of $c_{RB}$ improves. Large values of $\beta$ correspond to having information that $\epsilon$ is small: when $\beta=14$ about $0.50$ of the prior probability is to the left of $0.05$, with $\beta=32$ about $0.80$, and with $\beta=100$ about $0.99$. The misclassification rates for the small group $(c=1)$ stay about the same for $c_{RB}$ as $\beta$ increases, while they deteriorate markedly for $c_{MAP}$, as the MAP procedure essentially ignores the small group.
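The prior masses quoted for $\beta=14,32,100$ follow from the closed form of the beta$(1,\beta)$ distribution function, $P(\epsilon\le t)=1-(1-t)^{\beta}$. A short check (the helper name is ours):

```python
def beta1_cdf(t, beta):
    """P(eps <= t) when eps ~ beta(1, beta): the density beta (1 - eps)^(beta - 1)
    integrates to 1 - (1 - t)^beta."""
    return 1.0 - (1.0 - t) ** beta


masses = {b: beta1_cdf(0.05, b) for b in (14, 32, 100)}
print({b: round(p, 3) for b, p in masses.items()})  # {14: 0.512, 32: 0.806, 100: 0.994}
```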

\begin{tabular}{ccc}
$\beta$ & $M_{0}(c_{MAP}\neq 0)+M_{1}(c_{MAP}\neq 1)$ & $M_{0}(c_{RB}\neq 0)+M_{1}(c_{RB}\neq 1)$\\
\hline
$1$ & $0.386+0.390=0.776$ & $0.386+0.390=0.776$\\
$14$ & $0.002+0.975=0.977$ & $0.285+0.380=0.665$\\
$32$ & $0.000+0.997=0.997$ & $0.292+0.349=0.641$\\
$100$ & $0.000+1.000=1.000$ & $0.300+0.324=0.624$
\end{tabular}

Table 1: Conditional prior probabilities of misclassification for $c_{MAP}$ and $c_{RB}$ for various values of $\beta$ in Example 3 when $\alpha=1$, $\mu=1$, and $n=10$.
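A simulation of this kind can be sketched as follows. The sampling scheme is an assumption on our part, since the text does not spell it out: for each label $c\in\{0,1\}$ we draw $\epsilon$ from its beta$(\alpha,\beta)$ prior, draw the $n$ training labels as Bernoulli$(\epsilon)$, and draw $x_{n+1}\sim f_c$. The function name and defaults are hypothetical, and we make no claim that this reproduces the entries of Table 1 exactly; with `n_sim=10_000` the Monte Carlo standard error of each rate is roughly 0.005.

```python
import random
from math import log


def misclassification_sums(beta, alpha=1.0, mu=1.0, n=10, n_sim=10_000, seed=1):
    """Monte Carlo estimates of M_0(c != 0), M_1(c != 1) and their sum for c_MAP and c_RB.

    Assumed sampling scheme (our reading, not confirmed by the text): for each
    label c in {0, 1}, eps ~ beta(alpha, beta), the n training labels are
    Bernoulli(eps), and x_{n+1} ~ f_c with f_0 = N(0, 1) and f_1 = N(mu, 1).
    """
    rng = random.Random(seed)
    errors = {"MAP": [0, 0], "RB": [0, 0]}
    for _ in range(n_sim):
        for c in (0, 1):
            eps = rng.betavariate(alpha, beta)
            k = sum(rng.random() < eps for _ in range(n))          # k = n * cbar
            x = rng.gauss(mu * c, 1.0)
            log_lr = mu * x - 0.5 * mu * mu                        # log f_1(x) / f_0(x)
            log_ratio = log_lr + log(alpha + k) - log(beta + n - k)  # ratio in (13)
            c_map = 1 if log_ratio > 0 else 0
            c_rb = 1 if log_ratio + log(beta / alpha) > 0 else 0     # ratio in (16)
            errors["MAP"][c] += c_map != c
            errors["RB"][c] += c_rb != c
    return {name: (e0 / n_sim, e1 / n_sim, (e0 + e1) / n_sim)
            for name, (e0, e1) in errors.items()}


for b in (1, 14, 32, 100):
    print(b, misclassification_sums(b))
```

Both rules are evaluated on the same simulated data, so at $\beta=\alpha$ they give identical estimates, and differences at larger $\beta$ are not due to separate random draws.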

We also investigated other choices for $n$ and $\mu$. There is very little change as $n$ increases. The error rates go up as $\mu$ moves towards 0 and go down as $\mu$ moves away from 0, as one would expect. In every case, however, $c_{RB}$ dominated $c_{MAP}$. $\blacksquare$

4 Estimation: continuous parameter space

When $\psi$ has a continuous prior distribution the argument in Theorem 2 does not work, since $\Pi_{\Psi}(\{\delta(x)\}\,|\,x)=0$. There are several possible ways to proceed, but one approach is to use a discretization of the problem to which Theorem 2 applies. For this we assume that the spaces involved are locally Euclidean, mappings are sufficiently smooth, and the support measures are the analogs of Euclidean volume on the respective spaces. While the argument provided applies quite generally, it is simplified here by taking all spaces to be open subsets of Euclidean spaces and the support measures to be Euclidean volume on these sets.

For each $\lambda>0$ suppose there is a discretization $\{B_{\lambda}(\psi):\psi\in\Psi(\Theta)\}$ of $\Psi(\Theta)$, i.e., a partition of $\Psi(\Theta)$ into a countable number of subsets, with the following properties: $\psi\in B_{\lambda}(\psi)$, $\Pi_{\Psi}(B_{\lambda}(\psi))>0$, and $\sup_{\psi\in\Psi(\Theta)}\text{diam}(B_{\lambda}(\psi))\rightarrow 0$ as $\lambda\rightarrow 0$. So, if $\psi'\in B_{\lambda}(\psi)$, then $B_{\lambda}(\psi')=B_{\lambda}(\psi)$. For example, the $B_{\lambda}(\psi)$ could be equal-volume rectangles in $R^{k}$.
Further, we assume that $\Pi_{\Psi}(B_{\lambda}(\psi))/\nu_{\Psi}(B_{\lambda}(\psi))\rightarrow\pi_{\Psi}(\psi)$ as $\lambda\rightarrow 0$ for every $\psi$. This holds whenever $\pi_{\Psi}$ is continuous everywhere and $B_{\lambda}(\psi)$ converges nicely to $\{\psi\}$ as $\lambda\rightarrow 0$. Let $\psi_{\lambda}(\psi)$ denote a point in $B_{\lambda}(\psi)$ such that $\psi_{\lambda}(\psi)=\psi_{\lambda}(\psi')$ whenever $\psi'\in B_{\lambda}(\psi)$, and put $\Psi_{\lambda}=\{\psi_{\lambda}(\psi):\psi\in\Psi(\Theta)\}$.
So $\Psi_{\lambda}$ is a discretized version of $\Psi(\Theta)$; we call this a regular discretization of $\Psi(\Theta)$. The discretized prior on $\Psi_{\lambda}$ is $\pi_{\Psi,\lambda}(\psi_{\lambda}(\psi))=\Pi_{\Psi}(B_{\lambda}(\psi))$ and the discretized posterior is $\pi_{\Psi,\lambda}(\psi_{\lambda}(\psi)\,|\,x)=\Pi_{\Psi}(B_{\lambda}(\psi)\,|\,x)$.
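In one dimension, a regular discretization can be sketched with equal-width cells $[j\lambda,(j+1)\lambda)$, taking the cell midpoint as the representative $\psi_\lambda(\psi)$ (an arbitrary choice of point in each cell). The normal prior and posterior below are hypothetical stand-ins for $\Pi_\Psi$ and $\Pi_\Psi(\cdot\,|\,x)$; the code computes the discretized masses and checks that $\Pi_\Psi(B_\lambda(\psi))/\nu_\Psi(B_\lambda(\psi))$ approaches the density as $\lambda$ shrinks.

```python
import math
from statistics import NormalDist


def regular_discretization(dist, lam, lo, hi):
    """Cells B = [j*lam, (j+1)*lam) covering [lo, hi), as (representative point, mass) pairs.

    The representative psi_lambda(psi) is taken to be the cell midpoint.
    """
    cells = []
    for j in range(math.floor(lo / lam), math.ceil(hi / lam)):
        a, b = j * lam, (j + 1) * lam
        cells.append(((a + b) / 2, dist.cdf(b) - dist.cdf(a)))
    return cells


def max_density_gap(dist, lam, lo=-3.0, hi=3.0):
    """Largest |Pi(B) / nu(B) - density at the representative point| over the cells."""
    return max(abs(mass / lam - dist.pdf(mid))
               for mid, mass in regular_discretization(dist, lam, lo, hi))


prior = NormalDist(0.0, 1.0)                         # hypothetical prior pi_Psi
posterior = NormalDist(14 / 21, math.sqrt(1 / 21))   # hypothetical posterior
for lam in (0.5, 0.1, 0.01):
    print(lam, max_density_gap(prior, lam), max_density_gap(posterior, lam))
```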

The loss function for the discretized problem is defined as in Theorem 2 by

$$L_{RB,\lambda,\eta}(\theta,\psi_{\lambda}(\psi))=\frac{I_{\{\psi_{\lambda}(\psi)\}^{c}}(\psi_{\lambda}(\Psi(\theta)))}{\max(\eta,\pi_{\Psi,\lambda}(\psi_{\lambda}(\Psi(\theta))))}\tag{17}$$

and let $\delta_{\lambda,\eta}(x)$ denote a Bayes rule for this problem.
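Under (17) the posterior risk of choosing cell $d$ is $\sum_{b\neq d}\pi_{\Psi,\lambda}(b\,|\,x)/\max(\eta,\pi_{\Psi,\lambda}(b))$, which is a constant minus $\pi_{\Psi,\lambda}(d\,|\,x)/\max(\eta,\pi_{\Psi,\lambda}(d))$. So a Bayes rule maximizes a truncated relative belief ratio, and when $\eta$ is below every prior cell mass it is the discretized relative belief estimate. The sketch below checks this on hypothetical cell masses, comparing a brute-force risk minimization with the truncated-ratio argmax.

```python
def bayes_rule_17(prior_mass, post_mass, eta):
    """Cell index minimizing the posterior expected loss under (17), by brute force."""
    cells = range(len(prior_mass))

    def posterior_risk(d):
        # Loss 1 / max(eta, prior mass of the true cell) whenever the true cell is not d.
        return sum(post_mass[b] / max(eta, prior_mass[b]) for b in cells if b != d)

    return min(cells, key=posterior_risk)


def truncated_rb_argmax(prior_mass, post_mass, eta):
    """Maximizer of post / max(eta, prior); equivalent since risk(d) = const - this ratio."""
    cells = range(len(prior_mass))
    return max(cells, key=lambda d: post_mass[d] / max(eta, prior_mass[d]))


prior_mass = [0.50, 0.30, 0.15, 0.05]   # hypothetical discretized prior
post_mass = [0.40, 0.30, 0.20, 0.10]    # hypothetical discretized posterior
for eta in (0.01, 0.25):
    print(eta, bayes_rule_17(prior_mass, post_mass, eta),
          truncated_rb_argmax(prior_mass, post_mass, eta))
```

With a large $\eta$ the low-prior cells are penalized less and the choice moves away from the relative belief maximizer, which is why the truncation level must shrink with $\lambda$.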

Theorem 6. Suppose that $\pi_{\Psi}$ is positive and continuous and that we have a regular discretization of $\Psi$. Further suppose that $\psi_{RB}(x)$ is the unique maximizer of $RB_{\Psi}(\psi\,|\,x)$ and that, for any $\epsilon>0$,

$$\sup_{\{\psi:||\psi-\psi_{RB}(x)||\geq\epsilon\}}RB_{\Psi}(\psi\,|\,x)<RB_{\Psi}(\psi_{RB}(x)\,|\,x).$$

Then there exists $\eta(\lambda)\downarrow 0$ as $\lambda\rightarrow 0$ such that a Bayes rule $\delta_{\lambda,\eta(\lambda)}(x)$ under the loss $L_{RB,\lambda,\eta(\lambda)}$ converges to $\psi_{RB}(x)$ as $\lambda\rightarrow 0$ for all $x$.

Theorem 6 says that $\psi_{RB}(x)$ is a limit of Bayes rules. So, when $\Psi(\theta)=\theta$, the MLE is a limit of Bayes rules and, more generally, the MLE from an integrated likelihood is a limit of Bayes rules. The regularity conditions stated in Theorem 6 hold in many common statistical problems.
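The link to the likelihood follows from Bayes' theorem applied to $\psi$. Writing $m(x\,|\,\psi)$ for the prior predictive density of $x$ given $\Psi(\theta)=\psi$ (the integrated likelihood, assuming the conditional densities exist), a worked form is:

```latex
\pi_\Psi(\psi\,|\,x)=\frac{\pi_\Psi(\psi)\,m(x\,|\,\psi)}{m(x)}
\quad\Longrightarrow\quad
RB_\Psi(\psi\,|\,x)=\frac{\pi_\Psi(\psi\,|\,x)}{\pi_\Psi(\psi)}=\frac{m(x\,|\,\psi)}{m(x)},
\qquad
\Psi(\theta)=\theta:\;\; RB(\theta\,|\,x)=\frac{f(x\,|\,\theta)}{m(x)} .
```

Since $m(x)$ does not depend on $\psi$, maximizing $RB_\Psi(\cdot\,|\,x)$ is the same as maximizing the integrated likelihood, and for $\Psi(\theta)=\theta$ it is maximizing the likelihood itself.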

Now let $\hat{\psi}_{\lambda}(x)$ be the relative belief estimate from the discretized problem, i.e., $\hat{\psi}_{\lambda}(x)$ maximizes $RB_{\Psi}(B_{\lambda}(\psi)\,|\,x)$ as a function of $\psi\in\Psi_{\lambda}$. The following is immediate from the proof of Theorem 6, Theorem 5 and Corollary 4.

Corollary 7. $\hat{\psi}_{\lambda}$ is admissible and Bayesian unbiased for the discretized problem, and $\hat{\psi}_{\lambda}(x)\rightarrow\psi_{RB}(x)$ as $\lambda\rightarrow 0$ for every $x$.
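The convergence in Corollary 7 can be illustrated numerically. As a minimal sketch, assume a hypothetical normal model $\bar{x}\sim N(\theta,\sigma^2/n)$ with prior $\theta\sim N(0,\tau_0^2)$ and $\Psi(\theta)=\theta$. Here $RB(\theta\,|\,x)$ is maximized at the MLE $\bar{x}$, whereas the posterior mode is shrunk towards 0. The code maximizes the ratio of posterior to prior cell masses over equal-width cells (our choice of discretization).

```python
import math
from statistics import NormalDist


def discretized_rb_estimate(prior, post, lam, lo=-5.0, hi=5.0):
    """Midpoint of the cell [j*lam, (j+1)*lam) in [lo, hi) maximizing Pi(B | x) / Pi(B)."""
    best_mid, best_rb = None, -math.inf
    for j in range(math.floor(lo / lam), math.ceil(hi / lam)):
        a, b = j * lam, (j + 1) * lam
        prior_mass = prior.cdf(b) - prior.cdf(a)
        if prior_mass <= 0.0:
            continue
        rb = (post.cdf(b) - post.cdf(a)) / prior_mass
        if rb > best_rb:
            best_mid, best_rb = (a + b) / 2, rb
    return best_mid


# Hypothetical normal model: xbar ~ N(theta, sigma^2 / n), prior theta ~ N(0, tau0^2).
n, sigma, tau0, xbar = 20, 1.0, 1.0, 0.7
post_var = 1.0 / (1.0 / tau0**2 + n / sigma**2)
post_mean = post_var * n * xbar / sigma**2          # also the posterior mode
prior = NormalDist(0.0, tau0)
post = NormalDist(post_mean, math.sqrt(post_var))
estimates = {lam: discretized_rb_estimate(prior, post, lam) for lam in (0.5, 0.1, 0.01)}
print(estimates, "MLE:", xbar, "posterior mode:", post_mean)
```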

By similar arguments, an analog of Theorem 6 for $\psi_{MAP}$ can be established. In fact, in certain situations a simpler development is possible using the loss function $I_{B_{\lambda}^{c}(\psi)}(\Psi(\theta))$. For this, note that the posterior risk of $\delta$ in the discretized problem is $1-\Pi_{\Psi}(B_{\lambda}(\delta(x))\,|\,x)=1-\pi_{\Psi}(\delta^{\prime}(x)\,|\,x)\,\nu_{\Psi}(B_{\lambda}(\delta(x)))$ for some $\delta^{\prime}(x)\in B_{\lambda}(\delta(x))$. Now suppose $B_{\lambda}(\psi)$ is a cube centered at $\psi$ with edge length $\lambda$.
Suppose further that for each $\epsilon>0$ there exists $\lambda(\epsilon)>0$ such that $\pi_{\Psi}(\psi\,|\,x)<\inf_{\psi^{\prime}\in B_{\lambda(\epsilon)}(\psi_{MAP}(x))}\pi_{\Psi}(\psi^{\prime}\,|\,x)$ whenever $||\psi-\psi_{MAP}(x)||>\lambda(\epsilon)$. Since $\nu_{\Psi}(B_{\lambda}(\psi))$ does not depend on $\psi$, a Bayes rule $\delta_{\lambda(\epsilon)}$ must then satisfy $||\delta_{\lambda(\epsilon)}(x)-\psi_{MAP}(x)||<\epsilon$. This proves that $\psi_{MAP}$ is a limit of Bayes rules.
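The argument can be checked numerically in one dimension. The sketch below assumes a hypothetical Gamma(3, 1) posterior for $\psi$ (its mode, and hence $\psi_{MAP}(x)$, is 2), takes $B_\lambda(d)$ to be the interval of length $\lambda$ centred at $d$, and finds the Bayes rule for the loss $I_{B_{\lambda}^{c}(d)}(\Psi(\theta))$ by a grid search that maximizes $\Pi_{\Psi}(B_{\lambda}(d)\,|\,x)$. For this particular density the optimal interval has equal density at both ends, which gives the exact centre $\lambda/(e^{\lambda/2}-1)+\lambda/2$; both should approach 2 as $\lambda$ shrinks.

```python
import math

def gamma3_cdf(t: float) -> float:
    """CDF of a Gamma(shape=3, rate=1) distribution (hypothetical posterior of psi)."""
    if t <= 0.0:
        return 0.0
    return 1.0 - math.exp(-t) * (1.0 + t + t * t / 2.0)

def box_bayes_rule(lam: float, lo: float = 0.0, hi: float = 6.0, step: float = 1e-4) -> float:
    """Centre d of B_lam(d) = [d - lam/2, d + lam/2] with the largest posterior
    probability, i.e. the minimizer of the posterior risk 1 - Pi(B_lam(d) | x),
    found by grid search over [lo, hi]."""
    best_d, best_p = lo, -1.0
    n = int(round((hi - lo) / step))
    for i in range(n + 1):
        d = lo + i * step
        p = gamma3_cdf(d + lam / 2.0) - gamma3_cdf(d - lam / 2.0)
        if p > best_p:
            best_d, best_p = d, p
    return best_d

def exact_centre(lam: float) -> float:
    """Closed-form optimal centre for this Gamma(3, 1) example only."""
    return lam / (math.exp(lam / 2.0) - 1.0) + lam / 2.0

for lam in (2.0, 1.0, 0.5, 0.1):
    print(f"lambda = {lam:<4}  grid centre = {box_bayes_rule(lam):.4f}  exact = {exact_centre(lam):.4f}")
```

A grid search is used instead of calculus so that the same code works for any posterior CDF one plugs in.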
By contrast, for the loss $I_{B_{\lambda}^{c}(\psi)}(\Psi(\theta))/\Pi_{\Psi}(B_{\lambda}(\Psi(\theta)))$, the posterior risk of $\delta$ is given by

$$\int_{\Psi}\{\Pi_{\Psi}(B_{\lambda}(\psi))\}^{-1}\,\Pi_{\Psi}(d\psi\,|\,x)-\int_{B_{\lambda}(\delta(x))}\{\Pi_{\Psi}(B_{\lambda}(\psi))\}^{-1}\,\Pi_{\Psi}(d\psi\,|\,x)$$

and the first term is generally unbounded unless $\Psi(\Theta)$ is compact.

Consider an important example.

Example 2. Regression

Suppose that $y=X\beta+e$, where $y\in R^{n}$, $X\in R^{n\times k}$ is fixed and of rank $k$, $\beta\in R^{k}$, and $e\sim N_{n}(0,\sigma^{2}I)$. We assume that $\sigma^{2}$ is known to simplify the discussion, but this is not necessary. Let $\pi$ be a prior density for $\beta$. Since the relative belief ratio of $\beta$ is proportional to the likelihood, for every $\pi$ the relative belief estimate based on the observed $(X,y)$ is $\beta_{RB}(y)=b=(X^{\prime}X)^{-1}X^{\prime}y$, the MLE of $\beta$.

It is interesting to contrast this result with what might be considered more standard Bayesian estimates, such as the MAP estimate or the posterior mean. For example, suppose that $\beta\sim N_{k}(0,\tau^{2}I)$. Then the posterior distribution of $\beta$ is $N_{k}(\beta_{post}(y),\Sigma_{post})$, where

$$\beta_{post}(y)=\Sigma_{post}(\sigma^{-2}X^{\prime}Xb),\qquad\Sigma_{post}=(\tau^{-2}I+\sigma^{-2}X^{\prime}X)^{-1}$$

and note that $\beta_{MAP}(y)=\beta_{post}(y)$. Writing the spectral decomposition of $X^{\prime}X$ as $X^{\prime}X=Q\Lambda Q^{\prime}$, with $\Lambda=\mathrm{diag}(\lambda_{1},\ldots,\lambda_{k})$ containing the eigenvalues of $X^{\prime}X$ (all positive since $X$ has rank $k$), we have that

$$||\beta_{MAP}(y)||=||(I+(\sigma^{2}/\tau^{2})\Lambda^{-1})^{-1}Q^{\prime}b||.$$

Since $||b||=||Q^{\prime}b||$ and $1/(1+\sigma^{2}/(\tau^{2}\lambda_{i}))<1$ for each $i$, this implies that $||\beta_{MAP}(y)||<||b||$ whenever $b\neq 0$, so $\beta_{MAP}(y)$ shrinks the MLE towards the prior mean $0$. When the columns of $X$ are orthonormal, $\beta_{MAP}(y)=r(1+r)^{-1}b$, where $r=\tau^{2}/\sigma^{2}$, so the shrinkage is substantial unless $\tau^{2}$ is much larger than $\sigma^{2}$. This shrinkage is often cited as a positive attribute of these estimates. Consider, however, the situation where the true value of $\beta$ is some distance from the prior mean. In that case it seems wrong to move the estimate towards the prior mean, so it isn't clear that shrinking the MLE is necessarily a good thing, particularly as this requires giving up invariance.
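A small numerical check of the shrinkage claim. The sketch below uses plain Python with a hand-written Gaussian-elimination solver (no numerical library is assumed) to compute the MLE $b$ and $\beta_{MAP}(y)=\beta_{post}(y)$ for hypothetical data with $\sigma^2=\tau^2=1$, and then checks the orthonormal-columns formula $\beta_{MAP}(y)=r(1+r)^{-1}b$ with $r=4$.

```python
import math

def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda row_idx: abs(M[row_idx][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z

def mle_and_map(X, y, sigma2, tau2):
    """Return (b, beta_MAP) for y = X beta + e, e ~ N_n(0, sigma2 I), beta ~ N_k(0, tau2 I)."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    b = solve(XtX, Xty)                                   # MLE (X'X)^{-1} X'y
    prec = [[XtX[i][j] / sigma2 + (1.0 / tau2 if i == j else 0.0) for j in range(k)]
            for i in range(k)]                            # Sigma_post^{-1}
    beta_map = solve(prec, [v / sigma2 for v in Xty])     # Sigma_post sigma^{-2} X'y
    return b, beta_map

def norm(v):
    return math.sqrt(sum(t * t for t in v))

# Hypothetical data with n = 5, k = 2 (intercept plus one predictor).
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0], [1.0, 0.0], [1.0, 1.5]]
y = [2.1, -0.4, 4.2, 1.0, 3.3]
b, beta_map = mle_and_map(X, y, sigma2=1.0, tau2=1.0)
print("b        =", [round(t, 4) for t in b], " ||b|| =", round(norm(b), 4))
print("beta_MAP =", [round(t, 4) for t in beta_map], " ||beta_MAP|| =", round(norm(beta_map), 4))

# Orthonormal columns: beta_MAP should equal r/(1+r) * b with r = tau2/sigma2.
X2, y2 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [3.0, -2.0, 0.5]
b2, m2 = mle_and_map(X2, y2, sigma2=1.0, tau2=4.0)
r = 4.0 / 1.0
print([round(t, 6) for t in m2], [round(r / (1 + r) * t, 6) for t in b2])
```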

Suppose it is required to estimate the mean response $\psi=\Psi(\beta)=w^{\prime}\beta$ at the predictor value $w\in R^{k}$. The prior distribution of $\psi$ is $N(0,\sigma_{\psi}^{2})=N(0,\tau^{2}w^{\prime}w)$ and the posterior distribution is $N(\psi_{MAP}(y),\sigma_{\psi,post}^{2})=N(w^{\prime}\beta_{MAP}(y),w^{\prime}\Sigma_{post}w)$. Note that

$$\sigma_{\psi}^{2}-\sigma_{\psi,post}^{2}=w^{\prime}(\tau^{2}I-\Sigma_{post})w=\tau^{2}w^{\prime}Q(I-(I+(\tau^{2}/\sigma^{2})\Lambda)^{-1})Q^{\prime}w>0$$

since $1/(1+\tau^{2}\lambda_{i}/\sigma^{2})<1$ for each $i$ and $w\neq 0$. Therefore, maximizing the ratio of the posterior to the prior density leads to

$$\psi_{RB}(y)=(1-\sigma_{\psi,post}^{2}/\sigma_{\psi}^{2})^{-1}\psi_{MAP}(y).\qquad(18)$$

Then $\sigma_{\psi}^{2}>\sigma_{\psi,post}^{2}$ implies $|\psi_{RB}(y)|>|\psi_{MAP}(y)|$. Note that when $\sigma_{\psi,post}^{2}$ is much smaller than $\sigma_{\psi}^{2}$, in other words when the posterior is much more concentrated than the prior, $\psi_{RB}(y)$ and $\psi_{MAP}(y)$ are very similar.
In general $\psi_{RB}(y)$, which is the MLE from the integrated likelihood, is not equal to $w^{\prime}b$, the plug-in MLE of $\psi$. However, $\psi_{RB}(y)\rightarrow w^{\prime}b$ as $\tau^{2}\rightarrow\infty$, and when $X$ has orthonormal columns $\psi_{RB}(y)=w^{\prime}b$.
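The following sketch checks (18) against a brute-force maximization of the log relative belief ratio, using hypothetical values in the orthonormal-columns case $X^{\prime}X=I$. There $\Sigma_{post}=\tau^{2}(1+r)^{-1}I$ and $\psi_{MAP}(y)=r(1+r)^{-1}w^{\prime}b$, so $\psi_{RB}(y)$ should reproduce the plug-in MLE $w^{\prime}b$ even though $\psi_{MAP}(y)$ does not.

```python
import math

def normal_logpdf(t, mean, var):
    return -0.5 * math.log(2.0 * math.pi * var) - (t - mean) ** 2 / (2.0 * var)

def psi_rb_closed_form(psi_map, var_post, var_prior):
    """Equation (18) for prior N(0, var_prior) and posterior N(psi_map, var_post), var_post < var_prior."""
    return psi_map / (1.0 - var_post / var_prior)

def psi_rb_grid(psi_map, var_post, var_prior, lo=-20.0, hi=20.0, step=1e-3):
    """Brute-force argmax of log RB(psi | y) = log posterior density - log prior density."""
    n = int(round((hi - lo) / step))
    grid = (lo + i * step for i in range(n + 1))
    return max(grid, key=lambda t: (normal_logpdf(t, psi_map, var_post)
                                    - normal_logpdf(t, 0.0, var_prior)))

# Hypothetical orthonormal-columns case X'X = I.
sigma2, tau2 = 1.0, 4.0
w, b = [1.0, 2.0], [3.0, -0.5]
r = tau2 / sigma2
wtw = sum(t * t for t in w)
wtb = sum(s * t for s, t in zip(w, b))     # plug-in MLE w'b
psi_map = r / (1.0 + r) * wtb             # w' beta_MAP(y)
var_prior = tau2 * wtw                    # sigma_psi^2
var_post = tau2 / (1.0 + r) * wtw         # w' Sigma_post w
print(psi_rb_closed_form(psi_map, var_post, var_prior),
      psi_rb_grid(psi_map, var_post, var_prior), wtb)
```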

Suppose it is required to predict a response $z$ at the predictor value $w\in R^{k}$. When $\beta\sim N_{k}(0,\tau^{2}I)$, the prior distribution of $z$ is $N(0,\sigma^{2}+\tau^{2}w^{\prime}w)=N(0,\sigma_{z}^{2})$ and the posterior distribution is $N(\mu_{post}(z),\sigma_{post}^{2}(z))$, where

$$\mu_{post}(z)=w^{\prime}\beta_{post}(y),\qquad\sigma_{post}^{2}(z)=\sigma^{2}+w^{\prime}\Sigma_{post}w.$$

To obtain $z_{RB}(y)$ it is necessary to maximize the ratio of the posterior to the prior density of $z$; writing $\sigma_{prior}^{2}(z)=\sigma_{z}^{2}$, this leads to

$$z_{RB}(y)=(1-\sigma_{post}^{2}(z)/\sigma_{prior}^{2}(z))^{-1}\mu_{post}(z).\qquad(19)$$

Note that $\sigma_{z}^{2}-\sigma_{post}^{2}(z)=\tau^{2}w^{\prime}w-w^{\prime}\Sigma_{post}w=\sigma_{\psi}^{2}-\sigma_{\psi,post}^{2}>0$, and so $|z_{RB}(y)|>|\mu_{post}(z)|$, i.e., $z_{RB}(y)$ is further from the prior mean than $z_{MAP}(y)=\mu_{post}(z)$.
Also, when $\sigma_{post}^{2}(z)$ is small relative to $\sigma_{z}^{2}$, $z_{RB}(y)$ and $z_{MAP}(y)$ are very similar. Finally, comparing (18) and (19), and using $\mu_{post}(z)=\psi_{MAP}(y)$, we have that

$$z_{RB}(y)=(\sigma_{prior}^{2}(z)/\sigma_{\psi}^{2})\,\psi_{RB}(y)=(1+\sigma^{2}/(\tau^{2}w^{\prime}w))\,\psi_{RB}(y)$$

and so $z_{RB}(y)$ at $w$ is further from the prior mean $0$ than the estimate $\psi_{RB}(y)$ of the mean at $w$, which makes good sense as we have to take into account the additional variation due to prediction. By contrast, $z_{MAP}(y)=\psi_{MAP}(y)$. $\blacksquare$
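To see (18) and (19) side by side when the columns of $X$ are not orthonormal, the sketch below assumes, for simplicity, that $X^{\prime}X=\Lambda$ is diagonal (so $Q=I$ and no matrix inversion is needed), with hypothetical $\Lambda$, $b$, $w$, $\sigma^{2}$ and $\tau^{2}$. Because $\sigma_{z}^{2}-\sigma_{post}^{2}(z)=\sigma_{\psi}^{2}-\sigma_{\psi,post}^{2}$ and $\mu_{post}(z)=\psi_{MAP}(y)$, the ratio $z_{RB}(y)/\psi_{RB}(y)$ should equal $\sigma_{z}^{2}/\sigma_{\psi}^{2}$; the example also shows $\psi_{RB}(y)\neq w^{\prime}b$ in this non-orthonormal case.

```python
def regression_rb(lams, b, w, sigma2, tau2):
    """Mean-response and prediction estimates when X'X = diag(lams) (Q = I),
    beta ~ N_k(0, tau2 I) and e ~ N_n(0, sigma2 I)."""
    sig_post = [1.0 / (1.0 / tau2 + lam / sigma2) for lam in lams]   # diagonal of Sigma_post
    beta_post = [s * lam * bi / sigma2 for s, lam, bi in zip(sig_post, lams, b)]
    psi_map = sum(wi * bi for wi, bi in zip(w, beta_post))            # = mu_post(z) = z_MAP(y)
    var_psi = tau2 * sum(wi * wi for wi in w)                          # sigma_psi^2
    var_psi_post = sum(wi * wi * s for wi, s in zip(w, sig_post))      # sigma_{psi,post}^2
    psi_rb = psi_map / (1.0 - var_psi_post / var_psi)                  # equation (18)
    var_z = sigma2 + var_psi                                           # sigma_z^2
    var_z_post = sigma2 + var_psi_post                                 # sigma_post^2(z)
    z_rb = psi_map / (1.0 - var_z_post / var_z)                        # equation (19)
    return {"psi_map": psi_map, "psi_rb": psi_rb, "z_rb": z_rb,
            "plug_in": sum(wi * bi for wi, bi in zip(w, b)), "var_ratio": var_z / var_psi}

# Hypothetical inputs: X'X = diag(4, 0.25), b = (1, 2), w = (1, 1), sigma2 = tau2 = 1.
est = regression_rb([4.0, 0.25], [1.0, 2.0], [1.0, 1.0], sigma2=1.0, tau2=1.0)
for name, value in est.items():
    print(f"{name:9s} {value:.4f}")
```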

5 Credible regions and hypothesis assessment

First recall that a $\gamma$-relative belief credible region for $\psi=\Psi(\theta)$ is given by $C_{\Psi,\gamma}(x)=\{\psi:RB_{\Psi}(\psi\,|\,x)\geq c_{\gamma}(x)\}$, where $c_{\gamma}(x)=\sup\{c:\Pi_{\Psi}(RB_{\Psi}(\psi\,|\,x)\geq c\,|\,x)\geq\gamma\}$.
There is some arbitrariness in using the greater-than-or-equal sign to define the credible region, as it could also have been defined as $C_{\Psi,\gamma}(x)=\{\psi:RB_{\Psi}(\psi\,|\,x)>c_{\gamma}(x)\}$, where $c_{\gamma}(x)=\inf\{c:\Pi_{\Psi}(RB_{\Psi}(\psi\,|\,x)\leq c\,|\,x)\geq 1-\gamma\}$. In this latter case $c_{\gamma}(x)$ is the $(1-\gamma)$-th quantile of the posterior distribution of the relative belief ratio. This definition has some advantages, as it implies that the plausible region satisfies $Pl_{\Psi}(x)=C_{\Psi,\gamma}(x)$, where $\gamma=\Pi_{\Psi}(Pl_{\Psi}(x)\,|\,x)$.
Also, the strength of the evidence concerning the hypothesis $H_{0}:\Psi(\theta)=\psi_{0}$ satisfies $Str_{\Psi}(\psi_{0}\,|\,x)=1-\Pi_{\Psi}(C_{\Psi,\gamma}(x)\,|\,x)$, where $\gamma=1-Str_{\Psi}(\psi_{0}\,|\,x)$. The point is that relative belief credible regions are closely related to both the plausible region and the strength calculation, so any decision-theoretic interpretation of relative belief credible regions also applies to the plausible region and to the strength of the evidence. Throughout this section, however, we retain the definition of $C_{\Psi,\gamma}(x)$ given in Section 2.3.
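For a finite $\Psi(\Theta)$ both versions of $c_{\gamma}(x)$ reduce to scanning the distinct values of the relative belief ratio. The sketch below uses a hypothetical five-point uniform prior and posterior, exact rational arithmetic via Python's `fractions` module, and takes the strength to be $Str_{\Psi}(\psi_{0}\,|\,x)=\Pi_{\Psi}(RB_{\Psi}(\psi\,|\,x)\leq RB_{\Psi}(\psi_{0}\,|\,x)\,|\,x)$, the usual relative belief definition, assumed here since it is not restated in this section. It checks the plausible-region and strength identities above for the strict-inequality version of the region.

```python
from fractions import Fraction

def rb(prior, post):
    """Relative belief ratios RB(psi | x) = pi(psi | x) / pi(psi) on a finite Psi."""
    return {p: post[p] / prior[p] for p in prior}

def region_geq(prior, post, gamma):
    """C = {psi : RB >= c_gamma}, c_gamma = sup{c : Pi(RB >= c | x) >= gamma}."""
    ratio = rb(prior, post)
    mass = Fraction(0)
    for c in sorted(set(ratio.values()), reverse=True):
        mass += sum(post[p] for p in ratio if ratio[p] == c)
        if mass >= gamma:
            return {p for p in ratio if ratio[p] >= c}
    return set(ratio)

def region_strict(prior, post, gamma):
    """C = {psi : RB > c_gamma}, c_gamma = inf{c : Pi(RB <= c | x) >= 1 - gamma}."""
    ratio = rb(prior, post)
    if gamma >= 1:
        return set(ratio)
    mass = Fraction(0)
    for c in sorted(set(ratio.values())):
        mass += sum(post[p] for p in ratio if ratio[p] == c)
        if mass >= 1 - gamma:
            return {p for p in ratio if ratio[p] > c}
    return set()

def strength(prior, post, psi0):
    """Assumed definition: Str(psi0 | x) = Pi(RB(psi | x) <= RB(psi0 | x) | x)."""
    ratio = rb(prior, post)
    return sum((post[p] for p in ratio if ratio[p] <= ratio[psi0]), Fraction(0))

def prob(post, region):
    return sum((post[p] for p in region), Fraction(0))

# Hypothetical five-point example: uniform prior, unimodal posterior.
prior = {p: Fraction(1, 5) for p in range(5)}
post = dict(zip(range(5), [Fraction(k, 100) for k in (5, 15, 40, 30, 10)]))
plausible = {p for p, v in rb(prior, post).items() if v > 1}
print("C_0.5 =", region_geq(prior, post, Fraction(1, 2)))
print("Pl =", plausible, " equals strict C at gamma = Pi(Pl | x):",
      region_strict(prior, post, prob(post, plausible)) == plausible)
s = strength(prior, post, 1)
print("Str(1 | x) =", s, " 1 - Pi(C | x) =", 1 - prob(post, region_strict(prior, post, 1 - s)))
```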

Now consider the lowest posterior loss γ𝛾\gammaitalic_γ-credible regions that arise from the prior-based loss functions considered here.

Theorem 8. Suppose that $\pi_{\Psi}(\psi)>0$ for every $\psi\in\Psi(\Theta)$, where $\Psi(\Theta)$ is finite with $\nu_{\Psi}$ equal to counting measure. Then $C_{\Psi,\gamma}(x)$ is a $\gamma$-lowest posterior loss credible region for the loss function $L_{RB}$.

Proof: From (3) and (6), the $\gamma$-lowest posterior loss credible region is

$$D_{\gamma}(x)=\left\{\psi:RB_{\Psi}(\psi\,|\,x)\geq\int_{\Psi}RB_{\Psi}(\zeta\,|\,x)\,\nu_{\Psi}(d\zeta)-d_{\gamma}(x)\right\}$$

and $d_{\gamma}(x)=\inf\{d:\Pi_{\Psi}(r(\psi\,|\,x)\leq d\,|\,x)\geq\gamma\}$, where $r(\psi\,|\,x)=\int_{\Psi}RB_{\Psi}(\zeta\,|\,x)\,\nu_{\Psi}(d\zeta)-RB_{\Psi}(\psi\,|\,x)$ is the posterior risk of $\psi$. As $\int_{\Psi}RB_{\Psi}(\zeta\,|\,x)\,\nu_{\Psi}(d\zeta)$ does not depend on $\psi$, it is clearly equivalent to define this region via $C_{\Psi,\gamma}(x)=\{\psi:RB_{\Psi}(\psi\,|\,x)\geq c_{\gamma}(x)\}$ with $c_{\gamma}(x)=\int_{\Psi}RB_{\Psi}(\zeta\,|\,x)\,\nu_{\Psi}(d\zeta)-d_{\gamma}(x)$, namely, $D_{\gamma}(x)=C_{\Psi,\gamma}(x)$. $\blacksquare$
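Theorem 8 can be checked by brute force on a finite parameter space. The sketch assumes that the posterior risk of the action $\psi$ under $L_{RB}$ is $r(\psi\,|\,x)=\sum_{\zeta}RB_{\Psi}(\zeta\,|\,x)-RB_{\Psi}(\psi\,|\,x)$, the form implied by the expression for $D_{\gamma}(x)$, takes $d_{\gamma}(x)$ to be the smallest $d$ with $\Pi_{\Psi}(r(\psi\,|\,x)\leq d\,|\,x)\geq\gamma$, and compares the resulting region with $C_{\Psi,\gamma}(x)$ for a hypothetical four-point prior and posterior over a grid of $\gamma$ values.

```python
from fractions import Fraction

def rb_region(prior, post, gamma):
    """C_{Psi,gamma}(x) = {psi : RB >= c_gamma}, c_gamma = sup{c : Pi(RB >= c | x) >= gamma}."""
    ratio = {p: post[p] / prior[p] for p in prior}
    mass = Fraction(0)
    for c in sorted(set(ratio.values()), reverse=True):
        mass += sum(post[p] for p in ratio if ratio[p] == c)
        if mass >= gamma:
            return {p for p in ratio if ratio[p] >= c}
    return set(ratio)

def lowest_loss_region(prior, post, gamma):
    """{psi : r(psi | x) <= d_gamma} with the assumed posterior risk
    r(psi | x) = sum_zeta RB(zeta | x) - RB(psi | x) and d_gamma the smallest d
    such that Pi(r(psi | x) <= d | x) >= gamma."""
    ratio = {p: post[p] / prior[p] for p in prior}
    total = sum(ratio.values(), Fraction(0))
    risk = {p: total - ratio[p] for p in ratio}
    mass = Fraction(0)
    for d in sorted(set(risk.values())):
        mass += sum(post[p] for p in risk if risk[p] == d)
        if mass >= gamma:
            return {p for p in risk if risk[p] <= d}
    return set(risk)

# Hypothetical four-point example with a non-uniform prior.
prior = dict(zip("abcd", [Fraction(k, 10) for k in (1, 2, 3, 4)]))
post = {p: Fraction(1, 4) for p in "abcd"}
for k in range(1, 21):
    gamma = Fraction(k, 20)
    assert rb_region(prior, post, gamma) == lowest_loss_region(prior, post, gamma)
print("D_gamma(x) == C_{Psi,gamma}(x) for gamma = 1/20, 2/20, ..., 1")
```

Since the risk is a strictly decreasing function of the relative belief ratio, both scans stop at the same point, which is the content of the proof above.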

Now consider the case where $\Psi$ is countable and we use the loss function $L_{RB,\eta}$. Following the proof of Theorem 8, we see that a $\gamma$-lowest posterior loss region takes the form

$$D_{\eta,\gamma}(x)=\left\{\psi:\pi_{\Psi}(\psi\,|\,x)/\max(\eta,\pi_{\Psi}(\psi))\geq d_{\eta,\gamma}(x)\right\}$$

where $d_{\eta,\gamma}(x)=\sup\{d:\Pi_{\Psi}(\pi_{\Psi}(\psi\,|\,x)/\max(\eta,\pi_{\Psi}(\psi))\geq d\,|\,x)\geq\gamma\}$.
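A small numerical sketch of $D_{\eta,\gamma}(x)$ may help. It assumes a hypothetical finite $\Psi=\{0,\dots,10\}$ with prior weights proportional to $(\psi+1)^{2}$ and a binomial model. Each region is the upper set of the score $\pi_{\Psi}(\psi\,|\,x)/\max(\eta,\pi_{\Psi}(\psi))$, taken at the largest level whose posterior content is at least $\gamma$. On a finite $\Psi$ the score equals $RB_{\Psi}(\psi\,|\,x)$ once $\eta$ drops below the smallest prior probability, so from then on $D_{\eta,\gamma}(x)=C_{\Psi,\gamma}(x)$ exactly. For a countably infinite $\Psi$ the prior probabilities have infimum 0, so no single $\eta$ does this for every $\psi$.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def region_from_score(score, post, gamma):
    """Upper set {psi: score >= d} at the largest level d whose upper set
    has posterior content at least gamma (the sup defining d_{eta,gamma})."""
    for d in sorted(set(score.values()), reverse=True):
        region = frozenset(psi for psi in score if score[psi] >= d)
        if sum(post[psi] for psi in region) >= gamma:
            return region
    raise ValueError("gamma must lie in (0, 1)")


# Hypothetical example: psi in {0, ..., 10}, prior proportional to (psi + 1)^2,
# x | psi ~ Binomial(10, psi / 10), observed x = 7.
n, x, gamma = 10, 7, 0.9
weights = {psi: (psi + 1) ** 2 for psi in range(n + 1)}
wsum = sum(weights.values())
prior = {psi: w / wsum for psi, w in weights.items()}
joint = {psi: prior[psi] * binom_pmf(x, n, psi / n) for psi in prior}
msum = sum(joint.values())
post = {psi: j / msum for psi, j in joint.items()}

# Relative belief region C_{Psi,gamma}(x): here the score is RB(psi | x) itself.
C = region_from_score({psi: post[psi] / prior[psi] for psi in prior}, post, gamma)

for eta in [0.5, 0.2, 0.1, 0.05, 0.01, 0.001]:
    score = {psi: post[psi] / max(eta, prior[psi]) for psi in prior}
    D = region_from_score(score, post, gamma)
    print(eta, sorted(D), D == C)
```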

Theorem 9. Suppose that $\pi_{\Psi}(\psi)>0$ for every $\psi\in\Psi$ and that $\Psi$ is countable with $\nu_{\Psi}$ equal to counting measure. Then, for the loss function $L_{RB,\eta}$, $C_{\Psi,\gamma}(x)\subset\liminf_{\eta\rightarrow0}D_{\eta,\gamma}(x)$ whenever $\gamma$ is such that $\Pi_{\Psi}(C_{\Psi,\gamma}(x)\,|\,x)=\gamma$, and $\limsup_{\eta\rightarrow0}D_{\eta,\gamma}(x)\subset C_{\Psi,\gamma^{\prime}}(x)$ whenever $\gamma^{\prime}>\gamma$ and $\Pi_{\Psi}(C_{\Psi,\gamma^{\prime}}(x)\,|\,x)=\gamma^{\prime}$.

While Theorem 9 does not establish the exact convergence $\lim_{\eta\rightarrow0}D_{\eta,\gamma}(x)=C_{\Psi,\gamma}(x)$, this convergence likely holds under quite general circumstances because of the discreteness. Theorem 9 does show that the limit points of the sets $D_{\eta,\gamma}(x)$ always contain $C_{\Psi,\gamma}(x)$. It also shows that their posterior probability content differs from $\gamma$ by at most $\gamma^{\prime}-\gamma$, where $\gamma^{\prime}>\gamma$ is the next largest value for which we have exact content.

Now consider the continuous case with a regular discretization. Let $S^{\ast}\subset\Psi_{\lambda}=\{\psi_{\lambda}(\psi):\psi_{\lambda}(\psi)\in B_{\lambda}(\psi)\}$, namely, $S^{\ast}$ is a subset of a discretized version of $\Psi(\Theta)$. Define the undiscretized version of $S^{\ast}$ to be $S=\cup_{\psi\in S^{\ast}}B_{\lambda}(\psi)$. Now let $C_{\Psi,\lambda,\gamma}^{\ast}(x)$ be the $\gamma$-relative belief region for the discretized problem and let $C_{\Psi,\lambda,\gamma}(x)$ be its undiscretized version. In a continuous context we consider two sets equal if they differ only by a set of measure 0 with respect to $\Pi_{\Psi}$.
The following result says that a $\gamma$-relative belief credible region for the discretized problem, after undiscretizing, converges to the $\gamma$-relative belief region for the original problem.

Theorem 10. Suppose that $\pi_{\Psi}$ is positive and continuous, that there is a regular discretization of $\Psi(\Theta)$, and that $RB_{\Psi}(\psi\,|\,x)$ has a continuous posterior distribution. Then $\lim_{\lambda\rightarrow0}C_{\Psi,\lambda,\gamma}(x)=C_{\Psi,\gamma}(x)$.
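Theorem 10 can be illustrated numerically in a case where $C_{\Psi,\gamma}(x)$ is known up to a single root-finding step. The Python sketch below assumes a hypothetical conjugate model, $\psi\sim N(0,\tau^{2})$ and $\bar{x}\,|\,\psi\sim N(\psi,\sigma^{2}/n)$, with $\Psi$ the identity. Then $RB_{\Psi}(\psi\,|\,x)=f(x\,|\,\psi)/m(x)$ is proportional to the likelihood, and $C_{\Psi,\gamma}(x)$ is the interval centered at $\bar{x}$ with posterior content $\gamma$. As an assumption, the regular discretization is taken to be equal-width cells of width $\lambda$ on a truncated window around $\bar{x}$. Cells are added in decreasing order of their discretized relative belief ratio until the posterior content reaches $\gamma$. The union of the chosen cells is then compared with the exact interval.

```python
import math


def norm_cdf(z):
    """Standard normal cdf via erfc (accurate in the lower tail)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


# Hypothetical conjugate example: psi ~ N(0, tau^2), xbar | psi ~ N(psi, sigma^2 / n).
tau, sigma, n, xbar, gamma = 2.0, 1.0, 4, 1.3, 0.95
v_post = 1.0 / (1.0 / tau**2 + n / sigma**2)
m_post, s_post = v_post * n * xbar / sigma**2, math.sqrt(v_post)


def post_prob(a, b):
    """Posterior probability of the interval [a, b]."""
    return norm_cdf((b - m_post) / s_post) - norm_cdf((a - m_post) / s_post)


def prior_prob(a, b):
    """Prior probability of the interval [a, b]."""
    return norm_cdf(b / tau) - norm_cdf(a / tau)


# Exact region: RB(psi | x) is proportional to the likelihood, so C_{Psi,gamma}(x)
# is [xbar - w, xbar + w] with posterior content gamma; find w by bisection.
lo, hi = 0.0, 20.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if post_prob(xbar - mid, xbar + mid) < gamma:
        lo = mid
    else:
        hi = mid
exact = (xbar - hi, xbar + hi)


def discretized_region(lam, half_width=6.0):
    """Undiscretized relative belief region for cells of width lam on a truncated window."""
    k = int(round(2 * half_width / lam))
    cells = [(xbar - half_width + i * lam, xbar - half_width + (i + 1) * lam) for i in range(k)]
    probs = [(post_prob(a, b), prior_prob(a, b)) for a, b in cells]
    # Add cells in decreasing order of the discretized ratio until content >= gamma.
    order = sorted(range(k), key=lambda i: probs[i][0] / probs[i][1], reverse=True)
    chosen, content = [], 0.0
    for i in order:
        chosen.append(i)
        content += probs[i][0]
        if content >= gamma:
            break
    # The chosen cells form an interval in this example, so report its endpoints.
    return min(cells[i][0] for i in chosen), max(cells[i][1] for i in chosen), content


for lam in [0.5, 0.1, 0.01, 0.001]:
    a, b, content = discretized_region(lam)
    print(lam, round(a - exact[0], 4), round(b - exact[1], 4), round(content - gamma, 5))
```

As $\lambda$ decreases, the printed endpoint errors and the excess of the content over $\gamma$ should become small.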

While Theorem 10 has interest in its own right, it can also be used to prove that relative belief regions are limits of lowest posterior loss regions.

Let $D_{\eta,\lambda,\gamma}^{\ast}(x)$ be the $\gamma$-lowest posterior loss region obtained for the discretized problem using loss function (17), and let $D_{\eta,\lambda,\gamma}(x)$ be its undiscretized version.

Theorem 11. Suppose that $\pi_{\Psi}$ is positive and continuous, that there is a regular discretization of $\Psi$, and that $RB_{\Psi}(\psi\,|\,x)$ has a continuous posterior distribution. Then $C_{\Psi,\gamma}(x)=\lim_{\lambda\rightarrow0}\liminf_{\eta\rightarrow0}D_{\eta,\lambda,\gamma}(x)=\lim_{\lambda\rightarrow0}\limsup_{\eta\rightarrow0}D_{\eta,\lambda,\gamma}(x)$.

Evans, Guttman, and Swartz (2006) and Evans and Shakhatreh (2008) develop additional properties of relative belief regions. For example, they prove that a $\gamma$-relative belief region $C_{\Psi,\gamma}(x)$ for $\psi$ satisfying $\Pi_{\Psi}(C_{\Psi,\gamma}(x)\,|\,x)=\gamma$ minimizes $\Pi_{\Psi}(B)$ among all (measurable) subsets $B$ of $\Psi$ satisfying $\Pi_{\Psi}(B\,|\,x)\geq\gamma$. So a $\gamma$-relative belief region is the smallest among all $\gamma$-credible regions for $\psi$ when size is measured using the prior measure. This property has several consequences.
For example, the prior probability that a region $B(x)\subset\Psi(\Theta)$ contains a false value from the prior is $\int_{\Theta}\int_{\Psi}F_{\theta}(\psi\in B(x))\,\Pi_{\Psi}(d\psi)\,\Pi(d\theta)$. Here a false value is a value $\psi\sim\Pi_{\Psi}$ generated independently of $(\theta,x)\sim\Pi\times F_{\theta}$. It can be proved that a $\gamma$-relative belief region minimizes this probability among all $\gamma$-credible regions for $\psi$. It is also always unbiased, in the sense that its probability of covering a false value is bounded above by $\gamma$.
Furthermore, a $\gamma$-relative belief region maximizes both the relative belief ratio $\Pi_{\Psi}(B\,|\,x)/\Pi_{\Psi}(B)$ and the Bayes factor $\dfrac{\Pi_{\Psi}(B\,|\,x)\,\Pi_{\Psi}(B^{c})}{\Pi_{\Psi}(B^{c}\,|\,x)\,\Pi_{\Psi}(B)}$ among all regions $B\subset\Psi$ with $\Pi_{\Psi}(B\,|\,x)=\Pi_{\Psi}(C_{\Psi,\gamma}(x)\,|\,x)$.
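These optimality properties can be checked by brute force on a small finite $\Psi$. The Python sketch below uses hypothetical prior masses and likelihood values. It works in exact rational arithmetic (Python's `fractions.Fraction`) so that regions with equal posterior content can be compared without rounding. For each relative belief level $c$, the sketch does the following:

- it forms $C=\{\psi:RB_{\Psi}(\psi\,|\,x)\geq c\}$ and sets $\gamma=\Pi_{\Psi}(C\,|\,x)$ exactly;
- it enumerates every nonempty subset $B$;
- it confirms that no $\gamma$-credible $B$ has smaller prior content;
- it confirms that no $B$ with the same posterior content has a larger relative belief ratio or Bayes factor.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite example in exact rational arithmetic.
prior = [Fraction(w, 12) for w in (1, 2, 1, 2, 1, 2, 1, 2)]
lik = [Fraction(v, 100) for v in (5, 10, 30, 60, 90, 70, 40, 15)]
m = sum(p * l for p, l in zip(prior, lik))
post = [p * l / m for p, l in zip(prior, lik)]
rb = [q / p for q, p in zip(post, prior)]  # relative belief ratios RB(psi | x)
K = range(len(prior))

# All nonempty subsets B of Psi with their prior and posterior contents.
subsets = [frozenset(s) for r in range(1, len(prior) + 1) for s in combinations(K, r)]
prior_of = {B: sum((prior[i] for i in B), Fraction(0)) for B in subsets}
post_of = {B: sum((post[i] for i in B), Fraction(0)) for B in subsets}


def bayes_factor(B):
    """Posterior odds of B divided by prior odds of B."""
    return (post_of[B] / (1 - post_of[B])) / (prior_of[B] / (1 - prior_of[B]))


checks = []
for c in sorted(set(rb), reverse=True)[:-1]:  # skip the level giving C = Psi
    C = frozenset(i for i in K if rb[i] >= c)
    gamma = post_of[C]  # exact posterior content of this relative belief region
    credible = [B for B in subsets if post_of[B] >= gamma]
    same_content = [B for B in credible if post_of[B] == gamma]
    smallest = min(prior_of[B] for B in credible) == prior_of[C]
    best_ratio = max(post_of[B] / prior_of[B] for B in same_content) == gamma / prior_of[C]
    best_bf = max(bayes_factor(B) for B in same_content) == bayes_factor(C)
    checks.append((smallest, best_ratio, best_bf))
    print(sorted(C), float(gamma), smallest, best_ratio, best_bf)
```

Because the posterior content is held fixed at $\gamma$, both the ratio and the Bayes factor are decreasing functions of $\Pi_{\Psi}(B)$. The last two checks therefore follow from the first.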

While the results in this section have been concerned with obtaining credible regions for parameters, similar results can be proved for the construction of prediction regions.

6 Conclusions

Relative belief inferences are closely related to likelihood inferences. This, together with their invariance and optimality properties, makes them prime candidates for inference in Bayesian contexts. This paper has shown that relative belief inferences arise naturally in a decision-theoretic formulation using loss functions based on the prior.

Appendix

Proof of Theorem 2 and Corollary 3: We have that

$$\begin{aligned}r_{\eta}(\delta\,|\,x)&=\int_{\Psi}L_{RB,\eta}(\theta,\delta(x))\,\pi_{\Psi}(\psi\,|\,x)\,\nu_{\Psi}(d\psi)\\&=\int_{\Psi}\frac{\pi_{\Psi}(\psi\,|\,x)}{\max(\eta,\pi_{\Psi}(\psi))}\,\nu_{\Psi}(d\psi)-\frac{\pi_{\Psi}(\delta(x)\,|\,x)}{\max(\eta,\pi_{\Psi}(\delta(x)))}.\end{aligned}\tag{20}$$

The first term in (20) is bounded above by $1/\eta$ and does not depend on $\delta(x)$. So the value of a Bayes rule at $x$ is obtained by finding the $\delta(x)$ that maximizes the ratio in the second term. Note that

$$\frac{\pi_{\Psi}(\delta(x)\,|\,x)}{\max(\eta,\pi_{\Psi}(\delta(x)))}=\begin{cases}\dfrac{\pi_{\Psi}(\delta(x)\,|\,x)}{\eta} & \text{if }\eta>\pi_{\Psi}(\delta(x)),\\ RB_{\Psi}(\delta(x)\,|\,x) & \text{if }\eta\leq\pi_{\Psi}(\delta(x)).\end{cases}\tag{21}$$

There are at most finitely many values of $\psi$ satisfying $\eta\leq\pi_{\Psi}(\psi)$ (no more than $1/\eta$ of them, since the prior probabilities sum to 1). So $RB_{\Psi}(\psi\,|\,x)$ attains a maximum on this set, say at $\psi_{\eta}(x)$, and $\psi_{\eta}(x)=\psi_{RB}(x)$ when $\eta\leq\pi_{\Psi}(\psi_{RB}(x))$. If $\eta>\pi_{\Psi}(\delta(x))$, then $\pi_{\Psi}(\delta(x)\,|\,x)/\eta<RB_{\Psi}(\delta(x)\,|\,x)\leq RB_{\Psi}(\psi_{RB}(x)\,|\,x)$.
This proves that, for all $\eta\leq\eta_{0}=\pi_{\Psi}(\psi_{RB}(x))>0$, the maximizer of (21) is $\delta(x)=\psi_{RB}(x)$, and the results are established.
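The argument can be traced numerically. The Python sketch below assumes a hypothetical finite $\Psi=\{0,\dots,10\}$ with prior weights proportional to $(\psi+1)^{3}$, and $x\,|\,\psi\sim\text{Binomial}(10,\psi/10)$ with $x=7$. The hypothetical helper `bayes_rule(eta)` maximizes the ratio appearing in (20), and `first_term(eta)` evaluates the first term of (20). When $\eta$ is at least as large as every prior probability, the score is proportional to the posterior, so the maximizer is the posterior mode. When $\eta\leq\eta_{0}=\pi_{\Psi}(\psi_{RB}(x))$, the maximizer is $\psi_{RB}(x)$, as the proof shows.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical example: psi in {0, ..., 10}, prior proportional to (psi + 1)^3,
# x | psi ~ Binomial(10, psi / 10), observed x = 7.
n, x = 10, 7
weights = {psi: (psi + 1) ** 3 for psi in range(n + 1)}
wsum = sum(weights.values())
prior = {psi: w / wsum for psi, w in weights.items()}
joint = {psi: prior[psi] * binom_pmf(x, n, psi / n) for psi in prior}
msum = sum(joint.values())
post = {psi: j / msum for psi, j in joint.items()}

psi_rb = max(prior, key=lambda psi: post[psi] / prior[psi])  # relative belief estimate
psi_map = max(prior, key=lambda psi: post[psi])               # posterior mode


def bayes_rule(eta):
    """Maximize pi(psi | x) / max(eta, pi(psi)), i.e. minimize the posterior risk (20)."""
    return max(prior, key=lambda psi: post[psi] / max(eta, prior[psi]))


def first_term(eta):
    """First term of (20); it does not depend on delta(x) and is at most 1 / eta."""
    return sum(post[psi] / max(eta, prior[psi]) for psi in prior)


for eta in [0.5, 0.3, 0.2, prior[psi_rb], 0.05, 0.001]:
    print(round(eta, 4), bayes_rule(eta), first_term(eta) <= 1 / eta)
print("psi_RB =", psi_rb, " MAP =", psi_map, " eta_0 =", round(prior[psi_rb], 4))
```

With these choices the posterior mode is 8 while $\psi_{RB}(x)=7$, so the Bayes rule moves from 8 to 7 as $\eta$ decreases.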

Proof of Theorem 5: The prior risk of $\delta$ is given by

$$\begin{aligned}r(\delta)&=\int_{\Theta}\int_{\mathcal{X}}L(\theta,\delta(x))\,F(dx\,|\,\theta)\,\Pi(d\theta)\\&=\int_{\Theta}\int_{\mathcal{X}}[h(\Psi(\theta))-I_{\{\delta(x)\}}(\Psi(\theta))h(\Psi(\theta))]\,F(dx\,|\,\theta)\,\Pi(d\theta)\\&=\int_{\Theta}h(\Psi(\theta))\,\Pi(d\theta)-\int_{\mathcal{X}}\int_{\Theta}I_{\{\delta(x)\}}(\Psi(\theta))h(\Psi(\theta))\,\Pi(d\theta\,|\,x)\,M(dx)\\&=\int_{\Theta}h(\Psi(\theta))\,\Pi(d\theta)-\int_{\mathcal{X}}h(\delta(x))\pi_{\Psi}(\delta(x)\,|\,x)\,M(dx)\end{aligned}$$

and

$$\begin{aligned}&\int_{\Theta}\int_{\Theta}\int_{\mathcal{X}}L(\theta^{\prime},\delta(x))\,F(dx\,|\,\theta)\,\Pi(d\theta)\,\Pi(d\theta^{\prime})\\&=\int_{\Theta}\int_{\Theta}\int_{\mathcal{X}}[h(\Psi(\theta^{\prime}))-I_{\{\delta(x)\}}(\Psi(\theta^{\prime}))h(\Psi(\theta^{\prime}))]\,F(dx\,|\,\theta)\,\Pi(d\theta)\,\Pi(d\theta^{\prime})\\&=\int_{\Theta}h(\Psi(\theta))\,\Pi(d\theta)-\int_{\mathcal{X}}h(\delta(x))\pi_{\Psi}(\delta(x))\,M(dx).\end{aligned}$$

Therefore, $\delta$ is Bayesian unbiased if and only if

$$\int_{\mathcal{X}}h(\delta(x))[\pi_{\Psi}(\delta(x)\,|\,x)-\pi_{\Psi}(\delta(x))]\,M(dx)\geq0.\tag{22}$$

This inequality holds when $\delta(x)=\psi_{RB}(x)$ because $\pi_{\Psi}(\cdot\,|\,x)/\pi_{\Psi}(\cdot)$ is the density of $\Pi_{\Psi}(\cdot\,|\,x)$ with respect to $\Pi_{\Psi}$. Since this density integrates to 1 with respect to $\Pi_{\Psi}$, its maximum is greater than or equal to 1. Hence $\pi_{\Psi}(\psi_{RB}(x)\,|\,x)\geq\pi_{\Psi}(\psi_{RB}(x))$ for every $x$, and the integrand in (22) is nonnegative when $h\geq0$.
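The identities for the two expected losses, and inequality (22), can be checked by direct summation in a finite example. In the Python sketch below every ingredient is a hypothetical choice made for illustration:

- $\theta=\psi\in\{0,\dots,10\}$ and $x\,|\,\psi\sim\text{Binomial}(10,\psi/10)$;
- the prior is proportional to $\psi+1$;
- $h(\psi)=1+\psi$, so that $L(\theta,d)=h(\Psi(\theta))$ when $\Psi(\theta)\neq d$ and 0 otherwise.

The rule is $\delta(x)=\psi_{RB}(x)$. Both risks are computed once by brute-force sums and once by the closed forms derived in the proof.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical finite example: theta = psi in {0, ..., 10}, x | psi ~ Binomial(10, psi / 10),
# prior proportional to psi + 1, loss L(psi, d) = h(psi) * [psi != d] with h(psi) = 1 + psi.
n = 10
psis, xs = range(n + 1), range(n + 1)
weights = [psi + 1 for psi in psis]
prior = [w / sum(weights) for w in weights]
f = [[binom_pmf(x, n, psi / n) for x in xs] for psi in psis]          # f[psi][x]
M = [sum(prior[psi] * f[psi][x] for psi in psis) for x in xs]          # prior predictive
post = [[prior[psi] * f[psi][x] / M[x] for psi in psis] for x in xs]   # post[x][psi]


def h(psi):
    return 1.0 + psi


def loss(psi, d):
    return h(psi) * (psi != d)


# delta(x) = psi_RB(x), the maximizer of RB(psi | x) = post[x][psi] / prior[psi].
delta = [max(psis, key=lambda psi: post[x][psi] / prior[psi]) for x in xs]

# Prior risk, and expected loss at a false value psi' drawn independently from the prior.
risk = sum(prior[p] * f[p][x] * loss(p, delta[x]) for p in psis for x in xs)
false_risk = sum(prior[q] * prior[p] * f[p][x] * loss(q, delta[x])
                 for q in psis for p in psis for x in xs)

# The same two quantities via the closed forms in the proof.
A = sum(prior[p] * h(p) for p in psis)
risk_formula = A - sum(M[x] * h(delta[x]) * post[x][delta[x]] for x in xs)
false_formula = A - sum(M[x] * h(delta[x]) * prior[delta[x]] for x in xs)

print(risk, risk_formula, false_risk, false_formula, risk <= false_risk)
```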

Proof of Theorem 6 and Corollary 7: Just as in Theorem 2, a Bayes rule $\delta_{\lambda,\eta}(x)$ maximizes $\pi_{\Psi,\lambda}(\delta(x)\,|\,x)/\max(\eta,\pi_{\Psi,\lambda}(\delta(x)))$ for $\delta(x)\in\Psi_{\lambda}$. Furthermore, as in Theorem 2, such a rule exists. Now define $\eta(\lambda)$ so that $0<\eta(\lambda)<\Pi_{\Psi}(B_{\lambda}(\psi_{RB}(x)))$, and note that $\eta(\lambda)\rightarrow0$ as $\lambda\rightarrow0$. We have that, as $\lambda\rightarrow0$,

$$\frac{\pi_{\Psi,\lambda}(\psi_{\lambda}(\psi_{RB}(x))\,|\,x)}{\max(\eta(\lambda),\pi_{\Psi,\lambda}(\psi_{\lambda}(\psi_{RB}(x))))}=\frac{\pi_{\Psi,\lambda}(\psi_{\lambda}(\psi_{RB}(x))\,|\,x)}{\pi_{\Psi,\lambda}(\psi_{\lambda}(\psi_{RB}(x)))}\rightarrow RB_{\Psi}(\psi_{RB}(x)\,|\,x).\tag{23}$$
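The limit (23) can be seen numerically. The Python sketch below assumes a hypothetical conjugate model, $\psi\sim N(0,\tau^{2})$ and $\bar{x}\,|\,\psi\sim N(\psi,\sigma^{2}/n)$, for which $RB_{\Psi}(\psi\,|\,x)$ is proportional to the likelihood and $\psi_{RB}(x)=\bar{x}$. The discretization is assumed to consist of grid cells $[k\lambda,(k+1)\lambda)$. $\eta(\lambda)$ is taken to be half the prior probability of the cell containing $\bar{x}$, which satisfies $0<\eta(\lambda)<\Pi_{\Psi}(B_{\lambda}(\psi_{RB}(x)))$. The sketch compares the discretized ratio with the exact value $RB_{\Psi}(\bar{x}\,|\,x)$, the ratio of the posterior density to the prior density at $\bar{x}$.

```python
import math


def norm_cdf(z):
    """Standard normal cdf via erfc."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


# Hypothetical conjugate example: psi ~ N(0, tau^2), xbar | psi ~ N(psi, sigma^2 / n),
# so RB(psi | x) is proportional to the likelihood and psi_RB(x) = xbar.
tau, sigma, n, xbar = 2.0, 1.0, 4, 1.2345
v_post = 1.0 / (1.0 / tau**2 + n / sigma**2)
m_post, s_post = v_post * n * xbar / sigma**2, math.sqrt(v_post)

# Exact RB at psi_RB(x): posterior density over prior density at xbar.
rb_exact = (norm_pdf((xbar - m_post) / s_post) / s_post) / (norm_pdf(xbar / tau) / tau)

errors = []
for lam in [0.5, 0.1, 0.01, 0.001]:
    k = math.floor(xbar / lam)                # grid cell [k*lam, (k+1)*lam) containing xbar
    a, b = k * lam, (k + 1) * lam
    post_cell = norm_cdf((b - m_post) / s_post) - norm_cdf((a - m_post) / s_post)
    prior_cell = norm_cdf(b / tau) - norm_cdf(a / tau)
    eta = 0.5 * prior_cell                    # any 0 < eta(lam) < prior_cell would do
    ratio = post_cell / max(eta, prior_cell)  # left-hand side of (23)
    errors.append(abs(ratio - rb_exact))
    print(lam, eta, ratio, rb_exact)
```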

Let $\epsilon>0$, and let $\lambda_{0}$ be such that $\sup_{\psi\in\Psi}\operatorname{diam}(B_{\lambda}(\psi))<\epsilon/2$ for all $\lambda<\lambda_{0}$. Then, for $\lambda<\lambda_{0}$ and any $\delta(x)$ satisfying $\|\delta(x)-\psi_{RB}(x)\|\geq\epsilon$, we have

$$\begin{aligned}\frac{\pi_{\Psi,\lambda}(\psi_{\lambda}(\delta(x))\,|\,x)}{\pi_{\Psi,\lambda}(\psi_{\lambda}(\delta(x)))}&=\frac{\int_{B_{\lambda}(\psi_{\lambda}(\delta(x)))}\pi_{\Psi}(\psi\,|\,x)\,\nu_{\Psi}(d\psi)}{\int_{B_{\lambda}(\psi_{\lambda}(\delta(x)))}\pi_{\Psi}(\psi)\,\nu_{\Psi}(d\psi)}\\&=\frac{\int_{B_{\lambda}(\psi_{\lambda}(\delta(x)))}RB_{\Psi}(\psi\,|\,x)\,\pi_{\Psi}(\psi)\,\nu_{\Psi}(d\psi)}{\int_{B_{\lambda}(\psi_{\lambda}(\delta(x)))}\pi_{\Psi}(\psi)\,\nu_{\Psi}(d\psi)}\\&\leq\sup_{\{\psi:\|\psi-\psi_{RB}(x)\|>\epsilon/2\}}RB_{\Psi}(\psi\,|\,x)<RB_{\Psi}(\psi_{RB}(x)\,|\,x).\end{aligned}\tag{24}$$

By (23) and (24) there exists $\lambda_{1}<\lambda_{0}$ such that, for all $\lambda<\lambda_{1}$,

$$\frac{\pi_{\Psi,\lambda}(\psi_{\lambda}(\psi_{RB}(x))\,|\,x)}{\pi_{\Psi,\lambda}(\psi_{\lambda}(\psi_{RB}(x)))}>\sup_{\{\psi:\|\psi-\psi_{RB}(x)\|>\epsilon/2\}}RB_{\Psi}(\psi\,|\,x). \qquad (25)$$

Therefore, when $\lambda<\lambda_{1}$, a Bayes rule $\delta_{\lambda,\eta(\lambda)}(x)$ satisfies

$$\begin{aligned}
\frac{\pi_{\Psi,\lambda}(\delta_{\lambda,\eta(\lambda)}(x)\,|\,x)}{\pi_{\Psi,\lambda}(\delta_{\lambda,\eta(\lambda)}(x))} &\geq \frac{\pi_{\Psi,\lambda}(\delta_{\lambda,\eta(\lambda)}(x)\,|\,x)}{\max(\eta(\lambda),\pi_{\Psi,\lambda}(\delta_{\lambda,\eta(\lambda)}(x)))} \\
&\geq \frac{\pi_{\Psi,\lambda}(\psi_{\lambda}(\psi_{RB}(x))\,|\,x)}{\max(\eta(\lambda),\pi_{\Psi,\lambda}(\psi_{\lambda}(\psi_{RB}(x))))} = \frac{\pi_{\Psi,\lambda}(\psi_{\lambda}(\psi_{RB}(x))\,|\,x)}{\pi_{\Psi,\lambda}(\psi_{\lambda}(\psi_{RB}(x)))}. \qquad (26)
\end{aligned}$$

Together, (24), (25) and (26) imply that $\|\delta_{\lambda,\eta(\lambda)}(x)-\psi_{RB}(x)\|<\epsilon/2$, and the convergence is established.

Now $\pi_{\Psi,\lambda}(\hat{\psi}_{\lambda}(x)\,|\,x)/\pi_{\Psi,\lambda}(\hat{\psi}_{\lambda}(x))\geq\pi_{\Psi,\lambda}(\delta_{\lambda,\eta(\lambda)}(x)\,|\,x)/\pi_{\Psi,\lambda}(\delta_{\lambda,\eta(\lambda)}(x))$, and so, by (24), (25) and (26), $\|\hat{\psi}_{\lambda}(x)-\psi_{RB}(x)\|<\epsilon$, which establishes the convergence of $\hat{\psi}_{\lambda}(x)$ to $\psi_{RB}(x)$.
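A small numerical illustration of this convergence follows. It is only a sketch under assumptions that do not come from the text: a hypothetical one-dimensional model with prior $\psi\sim N(0,2^{2})$ and $\bar{x}\,|\,\psi\sim N(\psi,1/10)$, balls $B_{\lambda}(\psi)=[\psi-\lambda,\psi+\lambda]$, and $\hat{\psi}_{\lambda}(x)$ replaced by a grid maximizer of $\Pi_{\Psi}(B_{\lambda}(\psi)\,|\,x)/\Pi_{\Psi}(B_{\lambda}(\psi))$. Because $RB_{\Psi}(\psi\,|\,x)=f(x\,|\,\psi)/m(x)$ here, $\psi_{RB}(x)=\bar{x}$. The names `rb_ball` and `rb_estimate` are illustrative only.

```python
from statistics import NormalDist

# Hypothetical conjugate model (an assumption, not taken from the text):
# prior psi ~ N(0, TAU^2) and xbar | psi ~ N(psi, SIGMA^2 / N).
TAU, SIGMA, N, XBAR = 2.0, 1.0, 10, 1.3
POST_PREC = 1 / TAU**2 + N / SIGMA**2
PRIOR = NormalDist(0.0, TAU)
POST = NormalDist((N * XBAR / SIGMA**2) / POST_PREC, POST_PREC ** -0.5)


def rb_ball(psi: float, lam: float) -> float:
    """Relative belief ratio of the assumed ball B_lam(psi) = [psi - lam, psi + lam]."""
    post_mass = POST.cdf(psi + lam) - POST.cdf(psi - lam)
    prior_mass = PRIOR.cdf(psi + lam) - PRIOR.cdf(psi - lam)
    return post_mass / prior_mass


def rb_estimate(lam: float, lo: float = -1.0, hi: float = 3.0, step: float = 1e-3) -> float:
    """Grid maximizer of rb_ball(., lam), a hypothetical stand-in for psi_hat_lambda(x)."""
    grid = [lo + i * step for i in range(int(round((hi - lo) / step)) + 1)]
    return max(grid, key=lambda psi: rb_ball(psi, lam))


# For this model psi_RB(x) = xbar, so the error is |psi_hat_lambda(x) - xbar|.
errors = {lam: abs(rb_estimate(lam) - XBAR) for lam in (0.2, 0.05, 0.01)}
for lam, err in errors.items():
    print(f"lambda = {lam:<5} |psi_hat - psi_RB| = {err:.5f}")
```

For the smallest $\lambda$ the remaining error is governed mainly by the grid spacing of $10^{-3}$ rather than by the ball size.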

Proof of Theorem 9: For $c>0$ let $S_{c}(x)=\{\psi:RB_{\Psi}(\psi\,|\,x)\geq c\}$ and $S_{\eta,c}(x)=\{\psi:\pi_{\Psi}(\psi\,|\,x)/\max(\eta,\pi_{\Psi}(\psi))\geq c\}$. Since $\max(\eta,\pi_{\Psi}(\psi))$ decreases to $\pi_{\Psi}(\psi)$ as $\eta\rightarrow 0$, we have $S_{\eta,c}(x)\uparrow S_{c}(x)$ as $\eta\rightarrow 0$.
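The monotonicity $S_{\eta,c}(x)\uparrow S_{c}(x)$ can be checked on a grid. The sketch below assumes the same hypothetical normal model as a stand-in for $\pi_{\Psi}$ and $\pi_{\Psi}(\cdot\,|\,x)$ (prior $N(0,2^{2})$, $\bar{x}=1.3$, $n=10$, $\sigma=1$) and an arbitrary level $c=2$; the helper names `ratio` and `level_set` are hypothetical. Setting $\eta=0$ recovers $RB_{\Psi}(\psi\,|\,x)$ and hence $S_{c}(x)$.

```python
from statistics import NormalDist

# Hypothetical normal model standing in for pi_Psi and pi_Psi(. | x) (an assumption).
TAU, SIGMA, N, XBAR = 2.0, 1.0, 10, 1.3
POST_PREC = 1 / TAU**2 + N / SIGMA**2
PRIOR = NormalDist(0.0, TAU)
POST = NormalDist((N * XBAR / SIGMA**2) / POST_PREC, POST_PREC ** -0.5)
GRID = [-4.0 + 0.01 * i for i in range(801)]  # psi values in [-4, 4]


def ratio(psi: float, eta: float) -> float:
    """pi(psi | x) / max(eta, pi(psi)); eta = 0 gives RB(psi | x)."""
    return POST.pdf(psi) / max(eta, PRIOR.pdf(psi))


def level_set(c: float, eta: float) -> set:
    """Grid version of S_{eta,c}(x), or of S_c(x) when eta = 0."""
    return {psi for psi in GRID if ratio(psi, eta) >= c}


C = 2.0
ETAS = (0.5, 0.2, 0.1, 0.05, 0.0)  # decreasing eta
SETS = {eta: level_set(C, eta) for eta in ETAS}
for eta in ETAS:
    print(f"eta = {eta:<4} |S_eta,c| = {len(SETS[eta])} grid points")
```

Once $\eta$ falls below the prior density over the region where $RB_{\Psi}(\psi\,|\,x)\geq c$, the regularized set coincides with $S_{c}(x)$ on this grid.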

Suppose $c$ is such that $\Pi_{\Psi}(S_{c}(x)\,|\,x)\leq\gamma$. Then $\Pi_{\Psi}(S_{\eta,c}(x)\,|\,x)\leq\gamma$ for all $\eta$, and so $S_{\eta,c}(x)\subset D_{\eta,\gamma}(x)$. This implies that $S_{c}(x)\subset\liminf_{\eta\rightarrow 0}D_{\eta,\gamma}(x)$, and since $\Pi_{\Psi}(C_{\Psi,\gamma}(x)\,|\,x)=\gamma$, it follows that $C_{\Psi,\gamma}(x)\subset\liminf_{\eta\rightarrow 0}D_{\eta,\gamma}(x)$.

Now suppose $c$ is such that $\Pi_{\Psi}(S_{c}(x)\,|\,x)>\gamma$. Then there exists $\eta_{0}$ such that $\Pi_{\Psi}(S_{\eta,c}(x)\,|\,x)>\gamma$ for all $\eta<\eta_{0}$. Since $D_{\eta,\gamma}(x)\subset S_{\eta,c}(x)$ when $\eta<\eta_{0}$, it follows that $\limsup_{\eta\rightarrow 0}D_{\eta,\gamma}(x)\subset S_{c}(x)$.
Choosing $c=c_{\gamma^{\prime}}(x)$ for $\gamma^{\prime}>\gamma$ then implies that $\limsup_{\eta\rightarrow 0}D_{\eta,\gamma}(x)\subset C_{\Psi,\gamma^{\prime}}(x)$.
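The conclusion of Theorem 9 can be illustrated numerically, with the caveat that the definition of $D_{\eta,\gamma}(x)$ used below is an assumption: it is taken to be the upper level set of $\pi_{\Psi}(\psi\,|\,x)/\max(\eta,\pi_{\Psi}(\psi))$ with the largest threshold whose posterior content is at least $\gamma$, which is consistent with the two inclusions used in the proof. The model is the same hypothetical normal model, and `level_region` is an illustrative helper. For $\eta=1$ the ratio is proportional to the posterior density, so the region is a highest posterior density set rather than the relative belief region.

```python
from statistics import NormalDist

# Hypothetical normal model (an assumption, not from the text).
TAU, SIGMA, N, XBAR = 2.0, 1.0, 10, 1.3
POST_PREC = 1 / TAU**2 + N / SIGMA**2
PRIOR = NormalDist(0.0, TAU)
POST = NormalDist((N * XBAR / SIGMA**2) / POST_PREC, POST_PREC ** -0.5)
STEP = 1e-3
GRID = [-1.0 + STEP * i for i in range(4001)]   # psi values in [-1, 3]
WEIGHT = {p: POST.pdf(p) * STEP for p in GRID}  # approximate posterior masses


def level_region(score, gamma: float) -> set:
    """Upper level set {score >= cut} with the largest cut whose posterior content is >= gamma."""
    scores = {p: score(p) for p in GRID}
    total, cut = 0.0, min(scores.values())
    for p in sorted(GRID, key=scores.get, reverse=True):
        total += WEIGHT[p]
        if total >= gamma:
            cut = scores[p]
            break
    return {p for p in GRID if scores[p] >= cut}


GAMMA = 0.95
# Grid version of the relative belief region C_{Psi,gamma}(x).
C_REGION = level_region(lambda p: POST.pdf(p) / PRIOR.pdf(p), GAMMA)
# Hypothetical stand-ins for D_{eta,gamma}(x).
D_REGIONS = {
    eta: level_region(lambda p: POST.pdf(p) / max(eta, PRIOR.pdf(p)), GAMMA)
    for eta in (1.0, 0.1, 0.01)
}
for eta, region in D_REGIONS.items():
    print(f"eta = {eta:<4} [{min(region):.3f}, {max(region):.3f}] "
          f"equals RB region: {region == C_REGION}")
```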

Proof of Theorem 10: Let $S_{c}(x)=\{\psi:RB_{\Psi}(\psi\,|\,x)\geq c\}$ and $S_{\lambda,c}(x)=\{\psi:\Pi_{\Psi}(B_{\lambda}(\psi)\,|\,x)/\Pi_{\Psi}(B_{\lambda}(\psi))\geq c\}$. Recall that

$$\lim_{\lambda\rightarrow 0}\frac{\Pi_{\Psi}(B_{\lambda}(\psi)\,|\,x)}{\Pi_{\Psi}(B_{\lambda}(\psi))}=\lim_{\lambda\rightarrow 0}RB_{\Psi}(B_{\lambda}(\psi)\,|\,x)=RB_{\Psi}(\psi\,|\,x)$$

for every $\psi$. If $RB_{\Psi}(\psi\,|\,x)>c$, there exists $\lambda_{0}$ such that $RB_{\Psi}(B_{\lambda}(\psi)\,|\,x)>c$ for all $\lambda<\lambda_{0}$, and this implies that $\psi\in\liminf_{\lambda\rightarrow 0}S_{\lambda,c}(x)$. Now $\Pi_{\Psi}(\{\psi:RB_{\Psi}(\psi\,|\,x)=c\})=0$, and so $S_{c}(x)\subset\liminf_{\lambda\rightarrow 0}S_{\lambda,c}(x)$ (after possibly deleting a set of $\Pi_{\Psi}$-measure 0 from $S_{c}(x)$).
If $\psi\in\limsup_{\lambda\rightarrow 0}S_{\lambda,c}(x)$, then $RB_{\Psi}(B_{\lambda}(\psi)\,|\,x)\geq c$ along a sequence of values of $\lambda$ converging to 0, which implies that $RB_{\Psi}(\psi\,|\,x)\geq c$ and therefore $\psi\in S_{c}(x)$. This proves that $S_{c}(x)=\lim_{\lambda\rightarrow 0}S_{\lambda,c}(x)$ (up to a set of $\Pi_{\Psi}$-measure 0), so that $\lim_{\lambda\rightarrow 0}\Pi_{\Psi}(S_{\lambda,c}(x)\,\Delta\,S_{c}(x)\,|\,x)=0$ for any $c$.
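Both facts used here, the pointwise limit $RB_{\Psi}(B_{\lambda}(\psi)\,|\,x)\rightarrow RB_{\Psi}(\psi\,|\,x)$ and $\Pi_{\Psi}(S_{\lambda,c}(x)\,\Delta\,S_{c}(x)\,|\,x)\rightarrow 0$, can be seen in a sketch. It assumes the same hypothetical normal model, balls $B_{\lambda}(\psi)=[\psi-\lambda,\psi+\lambda]$, the level $c=2$, and a Riemann-sum approximation of posterior probabilities on a grid; `rb`, `rb_ball` and `post_prob` are illustrative names.

```python
from statistics import NormalDist

# Hypothetical normal model (an assumption, not from the text).
TAU, SIGMA, N, XBAR = 2.0, 1.0, 10, 1.3
POST_PREC = 1 / TAU**2 + N / SIGMA**2
PRIOR = NormalDist(0.0, TAU)
POST = NormalDist((N * XBAR / SIGMA**2) / POST_PREC, POST_PREC ** -0.5)
STEP = 1e-3
GRID = [-1.0 + STEP * i for i in range(4001)]  # psi values in [-1, 3]


def rb(psi: float) -> float:
    """RB_Psi(psi | x) = posterior density / prior density."""
    return POST.pdf(psi) / PRIOR.pdf(psi)


def rb_ball(psi: float, lam: float) -> float:
    """RB_Psi(B_lam(psi) | x) for the assumed ball B_lam(psi) = [psi - lam, psi + lam]."""
    post_mass = POST.cdf(psi + lam) - POST.cdf(psi - lam)
    prior_mass = PRIOR.cdf(psi + lam) - PRIOR.cdf(psi - lam)
    return post_mass / prior_mass


def post_prob(points: set) -> float:
    """Riemann-sum approximation to the posterior probability of a set of grid points."""
    return sum(POST.pdf(p) for p in points) * STEP


C = 2.0
S_C = {p for p in GRID if rb(p) >= C}
SYM_DIFF = {}
for lam in (0.2, 0.05, 0.01):
    s_lam = {p for p in GRID if rb_ball(p, lam) >= C}
    SYM_DIFF[lam] = post_prob(s_lam ^ S_C)  # ^ is set symmetric difference
    print(f"lambda = {lam:<5} RB of ball at xbar: {rb_ball(XBAR, lam):.5f} "
          f"(limit {rb(XBAR):.5f}), posterior prob. of symmetric difference: {SYM_DIFF[lam]:.5f}")
```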

Let $c_{\lambda,\gamma}(x)=\sup\{c\geq 0:\Pi_{\Psi}(S_{\lambda,c}(x)\,|\,x)\geq\gamma\}$, so that $S_{c_{\gamma}(x)}(x)=C_{\Psi,\gamma}(x)$, $S_{\lambda,c_{\lambda,\gamma}(x)}(x)=C_{\Psi,\lambda,\gamma}(x)$ and

$$\begin{aligned}
\Pi_{\Psi}(C_{\Psi,\gamma}(x)\,\Delta\,C_{\Psi,\lambda,\gamma}(x)\,|\,x) &= \Pi_{\Psi}(S_{c_{\gamma}(x)}(x)\,\Delta\,S_{\lambda,c_{\lambda,\gamma}(x)}(x)\,|\,x) \\
&\leq \Pi_{\Psi}(S_{c_{\gamma}(x)}(x)\,\Delta\,S_{\lambda,c_{\gamma}(x)}(x)\,|\,x)+\Pi_{\Psi}(S_{\lambda,c_{\lambda,\gamma}(x)}(x)\,\Delta\,S_{\lambda,c_{\gamma}(x)}(x)\,|\,x). \qquad (27)
\end{aligned}$$

Since $S_{c_{\gamma}(x)}(x)=\lim_{\lambda\rightarrow 0}S_{\lambda,c_{\gamma}(x)}(x)$, we have $\Pi_{\Psi}(S_{c_{\gamma}(x)}(x)\,\Delta\,S_{\lambda,c_{\gamma}(x)}(x)\,|\,x)\rightarrow 0$ and $\Pi_{\Psi}(S_{\lambda,c_{\gamma}(x)}(x)\,|\,x)\rightarrow\Pi_{\Psi}(S_{c_{\gamma}(x)}(x)\,|\,x)=\gamma$ as $\lambda\rightarrow 0$. Now consider the second term in (27).
Since $RB_{\Psi}(\psi\,|\,x)$ has a continuous posterior distribution, $\Pi_{\Psi}(RB_{\Psi}(\psi\,|\,x)\geq c\,|\,x)$ is continuous in $c$, so that $\Pi_{\Psi}(S_{c_{\gamma\pm\epsilon}(x)}(x)\,|\,x)=\gamma\pm\epsilon$. Let $\epsilon>0$. By the first part of the proof, $\Pi_{\Psi}(S_{\lambda,c_{\gamma\pm\epsilon}(x)}(x)\,|\,x)\rightarrow\gamma\pm\epsilon$, so for all $\lambda$ small enough, $\Pi_{\Psi}(S_{\lambda,c_{\gamma-\epsilon}(x)}(x)\,|\,x)<\gamma$ and $\Pi_{\Psi}(S_{\lambda,c_{\gamma+\epsilon}(x)}(x)\,|\,x)>\gamma$. This implies that $c_{\gamma+\epsilon}(x)\leq c_{\lambda,\gamma}(x)\leq c_{\gamma-\epsilon}(x)$ and therefore, since $S_{\lambda,c}(x)$ is nonincreasing in $c$, $S_{\lambda,c_{\gamma-\epsilon}(x)}(x)\subset S_{\lambda,c_{\lambda,\gamma}(x)}(x)\subset S_{\lambda,c_{\gamma+\epsilon}(x)}(x)$. As either $S_{\lambda,c_{\lambda,\gamma}(x)}(x)\subset S_{\lambda,c_{\gamma}(x)}(x)$ or $S_{\lambda,c_{\lambda,\gamma}(x)}(x)\supset S_{\lambda,c_{\gamma}(x)}(x)$, it follows that

$$\Pi_{\Psi}(S_{\lambda,c_{\lambda,\gamma}(x)}(x)\,\Delta\,S_{\lambda,c_{\gamma}(x)}(x)\,|\,x)=|\Pi_{\Psi}(S_{\lambda,c_{\lambda,\gamma}(x)}(x)\,|\,x)-\Pi_{\Psi}(S_{\lambda,c_{\gamma}(x)}(x)\,|\,x)|.$$
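The equality above uses only that the two sets are nested. Writing $A$ for the smaller and $B$ for the larger of $S_{\lambda,c_{\lambda,\gamma}(x)}(x)$ and $S_{\lambda,c_{\gamma}(x)}(x)$, the step is:

```latex
A \subset B \;\Longrightarrow\; A\,\Delta\,B = B \setminus A
\;\Longrightarrow\; \Pi_{\Psi}(A\,\Delta\,B\,|\,x) = \Pi_{\Psi}(B\,|\,x) - \Pi_{\Psi}(A\,|\,x)
= \left|\Pi_{\Psi}(A\,|\,x) - \Pi_{\Psi}(B\,|\,x)\right|.
```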

Since $S_{\lambda,c_{\gamma}(x)}(x)$ also lies between $S_{\lambda,c_{\gamma-\epsilon}(x)}(x)$ and $S_{\lambda,c_{\gamma+\epsilon}(x)}(x)$, for all sufficiently small $\lambda$ the quantity $|\Pi_{\Psi}(S_{\lambda,c_{\lambda,\gamma}(x)}(x)\,|\,x)-\Pi_{\Psi}(S_{\lambda,c_{\gamma}(x)}(x)\,|\,x)|$ is bounded above by

$$\begin{aligned}
\max\{&|\Pi_{\Psi}(S_{\lambda,c_{\gamma+\epsilon}(x)}(x)\,|\,x)-\Pi_{\Psi}(S_{\lambda,c_{\gamma}(x)}(x)\,|\,x)|,\\
&|\Pi_{\Psi}(S_{\lambda,c_{\gamma-\epsilon}(x)}(x)\,|\,x)-\Pi_{\Psi}(S_{\lambda,c_{\gamma}(x)}(x)\,|\,x)|\}
\end{aligned}$$

and this upper bound converges to $\epsilon$ as $\lambda\rightarrow 0$. Since $\epsilon$ is arbitrary, the second term in (27) goes to 0 as $\lambda\rightarrow 0$, and this proves the result.
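Theorem 10 can be illustrated with a sketch that computes grid versions of $c_{\lambda,\gamma}(x)$, $C_{\Psi,\lambda,\gamma}(x)$, $c_{\gamma}(x)$ and $C_{\Psi,\gamma}(x)$ for $\gamma=0.95$. It assumes the same hypothetical normal model and balls $B_{\lambda}(\psi)=[\psi-\lambda,\psi+\lambda]$. The threshold is found by adding grid points in decreasing order of the ratio until the approximate posterior content reaches $\gamma$, a discrete analogue of the supremum defining $c_{\lambda,\gamma}(x)$. The helper `credible_region` is illustrative, not the authors' algorithm.

```python
from functools import partial
from statistics import NormalDist

# Hypothetical normal model (an assumption, not from the text).
TAU, SIGMA, N, XBAR = 2.0, 1.0, 10, 1.3
POST_PREC = 1 / TAU**2 + N / SIGMA**2
PRIOR = NormalDist(0.0, TAU)
POST = NormalDist((N * XBAR / SIGMA**2) / POST_PREC, POST_PREC ** -0.5)
STEP = 1e-3
GRID = [-1.0 + STEP * i for i in range(4001)]   # psi values in [-1, 3]
WEIGHT = {p: POST.pdf(p) * STEP for p in GRID}  # approximate posterior masses


def rb_ball(psi: float, lam: float) -> float:
    """RB_Psi(B_lam(psi) | x) for the assumed ball B_lam(psi) = [psi - lam, psi + lam]."""
    return ((POST.cdf(psi + lam) - POST.cdf(psi - lam))
            / (PRIOR.cdf(psi + lam) - PRIOR.cdf(psi - lam)))


def credible_region(score, gamma: float):
    """Return (cut, {score >= cut}) with cut the largest level whose set has content >= gamma."""
    scores = {p: score(p) for p in GRID}
    total, cut = 0.0, min(scores.values())
    for p in sorted(GRID, key=scores.get, reverse=True):
        total += WEIGHT[p]
        if total >= gamma:
            cut = scores[p]
            break
    return cut, {p for p in GRID if scores[p] >= cut}


GAMMA = 0.95
C_GAMMA, C_REGION = credible_region(lambda p: POST.pdf(p) / PRIOR.pdf(p), GAMMA)
RESULTS = {}
for lam in (0.2, 0.05, 0.01):
    c_lam, region = credible_region(partial(rb_ball, lam=lam), GAMMA)
    RESULTS[lam] = (c_lam, sum(WEIGHT[p] for p in region ^ C_REGION))
    print(f"lambda = {lam:<5} c_lambda = {c_lam:.4f} (c_gamma = {C_GAMMA:.4f}), "
          f"posterior prob. of symmetric difference = {RESULTS[lam][1]:.5f}")
```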

Proof of Theorem 11: Suppose, without loss of generality, that $0<\gamma<1$. Let $\epsilon>0$ and $\delta>0$ satisfy $\gamma+\delta\leq 1$. Put $\gamma^{\prime}(\lambda,\gamma)=\Pi_{\Psi}(C_{\Psi,\lambda,\gamma}(x)\,|\,x)$ and $\gamma^{\prime\prime}(\lambda,\gamma)=\Pi_{\Psi}(C_{\Psi,\lambda,\gamma+\delta}(x)\,|\,x)$, and note that $\gamma^{\prime}(\lambda,\gamma)\geq\gamma$ and $\gamma^{\prime\prime}(\lambda,\gamma)\geq\gamma+\delta$. By Theorem 10, $C_{\Psi,\lambda,\gamma}(x)\rightarrow C_{\Psi,\gamma}(x)$ and $C_{\Psi,\lambda,\gamma+\delta}(x)\rightarrow C_{\Psi,\gamma+\delta}(x)$ as $\lambda\rightarrow 0$, so $\gamma^{\prime}(\lambda,\gamma)\rightarrow\gamma$ and $\gamma^{\prime\prime}(\lambda,\gamma)\rightarrow\gamma+\delta$ as $\lambda\rightarrow 0$.
This implies that there is a $\lambda_{0}(\delta)$ such that $\gamma^{\prime}(\lambda,\gamma)<\gamma^{\prime\prime}(\lambda,\gamma)$ for all $\lambda<\lambda_{0}(\delta)$. Therefore, by Theorem 9, for all $\lambda<\lambda_{0}(\delta)$,

$$C_{\Psi,\lambda,\gamma}(x)\subset\liminf_{\eta\rightarrow 0}D_{\eta,\lambda,\gamma^{\prime}(\lambda,\gamma)}(x)\subset\limsup_{\eta\rightarrow 0}D_{\eta,\lambda,\gamma^{\prime}(\lambda,\gamma)}(x)\subset C_{\Psi,\lambda,\gamma+\delta}(x). \qquad (28)$$

From (28) and Theorem 10 we have that

$$
\begin{aligned}
C_{\Psi,\gamma}(x) &\subset \liminf_{\lambda\rightarrow 0}\,\liminf_{\eta\rightarrow 0}D_{\eta,\lambda,\gamma^{\prime}(\lambda,\gamma)}(x)\\
&\subset \limsup_{\lambda\rightarrow 0}\,\limsup_{\eta\rightarrow 0}D_{\eta,\lambda,\gamma^{\prime}(\lambda,\gamma)}(x)\subset C_{\Psi,\gamma+\delta}(x).
\end{aligned}
$$

Since $\delta>0$ is arbitrary and $\lim_{\delta\rightarrow 0}C_{\Psi,\gamma+\delta}(x)=C_{\Psi,\gamma}(x)$, the iterated lower and upper limits both coincide with $C_{\Psi,\gamma}(x)$, and this establishes the result.
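The final step uses the fact that the regions $C_{\Psi,\gamma+\delta}(x)$ are nested, contain $C_{\Psi,\gamma}(x)$, and shrink to it as $\delta\rightarrow 0$. The Python sketch below illustrates this on a grid. It assumes the usual relative belief definitions, $RB_{\Psi}(\psi\,|\,x)=\pi_{\Psi}(\psi\,|\,x)/\pi_{\Psi}(\psi)$ and $C_{\Psi,\gamma}(x)=\{\psi: RB_{\Psi}(\psi\,|\,x)\geq c_{\gamma}(x)\}$, with $c_{\gamma}(x)$ the largest level whose upper set has posterior content at least $\gamma$. The binomial model, the Beta(2, 2) prior, the data, the grid size and the helper names `beta_logpdf` and `rb_region` are all hypothetical choices, and the grid discretization is an approximation, not the construction used in the proof.

```python
import math

def beta_logpdf(t, a, b):
    """Log density of the Beta(a, b) distribution at t in (0, 1)."""
    return ((a - 1) * math.log(t) + (b - 1) * math.log(1 - t)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

def rb_region(rb, post, gamma):
    """Grid version of C_gamma = {i : rb[i] >= c}, where c is the largest
    relative belief level whose upper set has posterior content >= gamma."""
    order = sorted(range(len(rb)), key=lambda i: rb[i], reverse=True)
    c = rb[order[-1]]               # fallback: the whole grid
    content = 0.0
    for i in order:                 # add cells in decreasing order of relative belief
        content += post[i]
        if content >= gamma:
            c = rb[i]
            break
    return {i for i in range(len(rb)) if rb[i] >= c}  # ties at level c included

# Hypothetical example: x = 7 successes in n = 20 trials, Beta(2, 2) prior, psi = theta.
n, x, a, b = 20, 7, 2.0, 2.0
N = 1000                                    # number of grid cells (assumption)
grid = [(k + 0.5) / N for k in range(N)]    # cell midpoints in (0, 1)
prior_d = [math.exp(beta_logpdf(t, a, b)) for t in grid]
post_d = [math.exp(beta_logpdf(t, a + x, b + n - x)) for t in grid]
s_prior, s_post = sum(prior_d), sum(post_d)
prior = [p / s_prior for p in prior_d]      # discretized prior probabilities
post = [p / s_post for p in post_d]         # discretized posterior probabilities
rb = [q / p for q, p in zip(post, prior)]   # relative belief ratio on the grid

gamma = 0.9
base = rb_region(rb, post, gamma)
for delta in (0.05, 0.02, 0.01, 0.0):
    region = rb_region(rb, post, gamma + delta)
    assert base <= region           # C_gamma is contained in C_{gamma+delta}
    content = sum(post[i] for i in region)
    print(delta, len(region), round(content, 4))
```

Because the cells are added in decreasing order of relative belief, a larger content level can only lower the threshold $c$, which is why the regions are nested and decrease to the $\gamma$-region as $\delta$ decreases.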

References

  • [1] Al-Labadi, L., Alzaatreh, A. and Evans, M. (2024). How to measure evidence and its strength: Bayes factors or relative belief ratios? arXiv:2301.08994.
  • [2] Al-Labadi, L. and Evans, M. (2017). Optimal robustness results for some Bayesian procedures and the relationship to prior-data conflict. Bayesian Analysis, 12(3), 702–728.
  • [3] Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis. Springer.
  • [4] Bernardo, J. M. (2005). Intrinsic credible regions: an objective Bayesian approach to interval estimation. Test, 14(2), 317–384. With comments and a rejoinder by the author.
  • [5] Bernardo, J. M. and Smith, A. F. M. (2000). Bayesian Theory. Wiley Series in Probability and Statistics. John Wiley & Sons, New York. Paperback.
  • [6] Birnbaum, A. (1962). On the foundations of statistical inference (with discussion). Journal of the American Statistical Association, 57(298), 269–326.
  • [7] Evans, M. (2015). Measuring Statistical Evidence Using Relative Belief. Chapman & Hall/CRC Monographs on Statistics & Applied Probability.
  • [8] Evans, M. (2024). The concept of statistical evidence: historical roots and current developments. arXiv:2406.05843.
  • [9] Evans, M. and Guo, Y. (2021). Measuring and controlling bias for some Bayesian inferences and the relation to frequentist criteria. Entropy, 23(2), 190. doi:10.3390/e23020190.
  • [10] Evans, M., Guttman, I. and Swartz, T. (2006). Optimality and computations for relative surprise inferences. Canadian Journal of Statistics, 34(1), 113–129.
  • [11] Evans, M. and Jang, G.-H. (2011). Weak informativity and the information in one prior relative to another. Statistical Science, 26(3), 423–439.
  • [12] Evans, M. and Moshonov, H. (2006). Checking for prior-data conflict. Bayesian Analysis, 1(4), 893–914.
  • [13] Evans, M. and Shakhatreh, M. (2008). Optimal properties of some Bayesian inferences. Electronic Journal of Statistics, 2, 1268–1280.
  • [14] Le Cam, L. (1953). On some asymptotic properties of maximum likelihood estimates and related Bayes' estimates. University of California Publications in Statistics, 1, 277–329.
  • [15] Nott, D., Wang, X., Evans, M. and Englert, B.-G. (2020). Checking for prior-data conflict using prior-to-posterior divergences. Statistical Science, 35(2), 234–253.
  • [16] Robert, C. P. (1996). Intrinsic losses. Theory and Decision, 40, 191–214.
  • [17] Royall, R. M. (1997). Statistical Evidence: A Likelihood Paradigm. Chapman & Hall.
  • [18] Rudin, W. (1974). Real and Complex Analysis. McGraw-Hill, New York.
  • [19] Savage, L. J. (1971). The Foundations of Statistics. Dover Publications.