Corrected Correlation Estimates for Meta-Analysis

Alexander Johnson-Vázquez (Institute for Health Metrics and Evaluation, Seattle; Department of Applied Mathematics, University of Washington, Seattle)
Alexander W. Hsu (Institute for Health Metrics and Evaluation, Seattle; Department of Applied Mathematics, University of Washington, Seattle)
Aleksandr Aravkin (Institute for Health Metrics and Evaluation, Seattle; Department of Applied Mathematics, University of Washington, Seattle)
Peng Zheng (Institute for Health Metrics and Evaluation, Seattle)

The authors gratefully acknowledge the Bill & Melinda Gates Foundation.

Abstract

Meta-analysis aggregates estimates and uncertainty across multiple studies, summarizing individual reports into aggregate results that are frequently used to inform health policy and recommendations. When a given study reports multiple estimates, such as log odds ratios (ORs) or log relative risks (RRs) across different exposure groups, accounting for within-study correlations between estimates both improves the efficiency of meta-analytic estimates and provides more accurate estimates of uncertainty. The canonical approaches of Greenland and Longnecker (1992) (GL) and Hamling et al. (2008) construct pseudo-cases and non-cases for exposure groups to estimate correlations of reported within-study estimates. However, currently available implementations of both methods can fail on simple examples.

We review both GL and Hamling methods through the lens of optimization. For ORs, we provide modifications of each approach that ensure convergence for any feasible inputs. For GL, this is achieved through a new connection to entropic minimization. For Hamling, a modification leads to a provably solvable equivalent set of equations given a specific initialization. For each, we provide implementations guaranteed to work for any feasible input.

For RRs, we show the new GL approach is always guaranteed to succeed. We derive counter-examples where the Hamling approach does not admit any solutions. For the special RR case where the variances are all equal, we derive a necessary and sufficient condition for success.


Keywords: meta-analysis, correlated observations, convex optimization, nonlinear equations

1 Introduction

Meta-analysis combines results reported by multiple studies to obtain aggregate results and estimate between-study heterogeneity (Haidich, 2010). Meta-analytic results inform public health recommendations, underscoring the importance of accuracy in meta-analytic methods (Deeks et al., 2019, Chapter 10). Understanding dose-response relationships across different ranges of exposure poses particular challenges (Orsini et al., 2012; Liu et al., 2009; Crippa et al., 2019; Zheng et al., 2022). Dose-response meta-analysis seeks to quantify the impact of a continuous risk, such as systolic blood pressure (Razo et al., 2022), smoking (Dai et al., 2022), or consumption of meat (Lescinsky et al., 2022) or vegetables (Stanaway et al., 2022), on the risk of an outcome, e.g. lung cancer or heart disease, by aggregating available estimates for different exposure groups across many studies.

Two of the most common types of estimates are adjusted odds ratios and relative risks (Schmidt and Kohlmann, 2008). Because these estimates always share a common reference group, the estimates for different exposure levels are correlated. Estimating relationships without correcting for these correlations is inefficient and under-estimates the variance of the resulting coefficients (Greenland and Longnecker, 1992, Appendix (1)). We show the potential impact of the adjustment, as well as a real-world example, in Section 2.

In short, it is crucial for meta-analyses to adjust for within-study correlation. Since we are blind to the adjustment mechanism of reported odds ratios (ORs) and relative risks (RRs), we do not have access to the true underlying covariance matrix between reported estimates. If the adjusted estimates are produced through a regression, then an estimated covariance matrix would be available. However, this estimated covariance matrix is generally not reported. As we have access only to the reported metadata, we must accurately construct this covariance matrix. In their groundbreaking work, Greenland and Longnecker (1992) showed that it is possible to estimate within-study correlations, and use them to approximate the covariance matrix. The GL approach requires the modeler to provide the total number of subjects at each exposure level (both treatment and control), the total number of cases, and adjusted treatment effects at each exposure level, such as log ORs or log RRs. Using this information, the GL approach uses a root-finding algorithm to obtain pseudo-case counts for every exposure that match reported estimates, and then uses the pseudo-counts to estimate asymptotic within-study correlations. These correlations inform downstream analyses, accounting for the impact of a common reference group explicitly before estimating study-specific random effects through mixed-effects modeling.

Following the work of Greenland and Longnecker (1992), Hamling et al. (2008) also use reported estimates to get pseudo-counts of cases versus non-cases. However, Hamling et al. (2008) directly use the standard errors of the reported estimates rather than requiring modelers to obtain subject counts at each exposure level. The Hamling approach requires only two additional pieces of information beyond the estimates and their variances: the ratio of unexposed controls to total exposed controls, and the ratio of all controls to all cases. Hamling et al. (2008) fit pseudo-cell counts to the available data, and given pseudo-cell counts, the correlation estimators are the same as those of GL.
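To make the last step concrete, the following is a minimal sketch of how a covariance matrix can be assembled once pseudo-counts are available. It assumes the usual large-sample variance approximations for log odds ratios from a table of cases and non-cases, with index 0 denoting the reference group; the function name and the numbers are hypothetical, and the exact expressions used by each method are the ones given in the original papers.

```python
import numpy as np

def covariance_from_pseudo_counts(cases, noncases, reported_var):
    """Covariance of log ORs implied by pseudo-counts (index 0 = reference).

    Assumes the large-sample approximations
        s_0 = 1/A_0 + 1/B_0,   s_i = 1/A_i + 1/B_i + s_0,
    correlation r_ij = s_0 / sqrt(s_i * s_j), and rescales the correlations
    by the reported variances.
    """
    cases = np.asarray(cases, dtype=float)
    noncases = np.asarray(noncases, dtype=float)
    reported_var = np.asarray(reported_var, dtype=float)

    # Contribution of the shared reference group to every log OR.
    s0 = 1.0 / cases[0] + 1.0 / noncases[0]
    # Variance of each non-reference log OR implied by the pseudo-counts.
    s = 1.0 / cases[1:] + 1.0 / noncases[1:] + s0

    corr = s0 / np.sqrt(np.outer(s, s))
    np.fill_diagonal(corr, 1.0)

    # Rescale so that the diagonal reproduces the reported variances.
    sd = np.sqrt(reported_var)
    return corr * np.outer(sd, sd)

# Hypothetical pseudo-counts for a reference group and two exposure groups.
cov = covariance_from_pseudo_counts(
    cases=[50, 60, 70], noncases=[100, 90, 80], reported_var=[0.05, 0.06]
)
print(cov)
```

The off-diagonal entries are positive because every log OR shares the reference-group term `s0`.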

These methods are widely used in the community; for example, the meta-analysis R package dosresmeta (Crippa and Orsini, 2016) implements both correlation estimators in their Covariance function that creates the within-study covariance matrix. Despite the wide use of both methods, past research stopped short of providing guarantees of success given feasible inputs. In fact, both Greenland and Longnecker (1992) and Hamling et al. (2008) discussed numerical instability, citing occasional failures and the need to re-initialize as needed. As originally presented, and as currently implemented in Crippa and Orsini (2016), both methods fail on simple modifications to the input data from working examples.

Here, we fill the current gap, providing robust GL and Hamling methods guaranteed to work for all feasible inputs on the OR problem, including our generated failure modes that can break the current implementation (Crippa and Orsini, 2016). To do this, we study each approach using an optimization perspective. For GL, we show the root-finding problem of Greenland and Longnecker (1992) is equivalent to a convex minimization problem in both the OR and RR settings. Convexity allows us to prove existence and uniqueness of results, and use disciplined convex programming (DCP) (Boyd and Vandenberghe, 2004) to remove any decisions by the user regarding initialization and to provide state-of-the-art numerical solving techniques. We provide an implementation using cvxpy that is guaranteed to return the unique solution (Diamond and Boyd, 2016; Agrawal et al., 2018). For Hamling, in the case of OR, we develop an equivalent set of nonlinear equations, and prove that these equivalent formulations are always solvable. We provide a Python implementation that, in practice, converges for all inputs. For RR, we show that the Hamling approach may fail, provide a counter-example where there is no solution, and provide a sufficient condition on solvability for reported RRs in the case where reported variances are all equal. Our implementation also covers the RR case but provides an informative warning to the modeler should the model fail to find a root.

Roadmap.

In Section 2, we provide theoretical and empirical motivation for adjusting for within-study correlation, which may be useful for readers new to the topic. We review the work of Greenland and Longnecker (1992) and Hamling et al. (2008) in Section 3. We develop the necessary innovations to robustify each method and provide theoretical guarantees in Sections 4 and 5. Finally, in Section 6, we present numerical illustrations showing that our methods provide identical results to those of Greenland and Longnecker (1992) and Hamling et al. (2008) when the original methods converge, and provide correct results for inputs that break currently available implementations. We also present a counter-example in the RR regime that has no solution for the Hamling approach.

2 Motivation for Correlation Correction

Before we review existing methods and introduce our updated techniques for correcting for within-study correlation, we motivate the necessity of such methods. We show that considering differences in means with respect to a reference group always induces a nonzero correlation between reported estimates. Building on this example, we construct a toy simulation that shows the potential impact of failing to account for this correlation. We then show a simple example from a real-world study of peripheral artery disease in which we observe high correlation between estimates, leading to a significant difference in the slope of the dose-response relationship between adjusted and unadjusted estimates. Finally, we briefly describe implications of the correlation correction for meta-analysis.

2.1 Theoretical Motivation

Consider measurements $\{x_1^i\}_{i=1}^{n_1}$, $\{x_2^j\}_{j=1}^{n_2}$ from two different treatment groups and measurements from a reference group, $\{x_0^l\}_{l=1}^{n_0}$, where $n_k$ is the number of samples in group $k \in \{0, 1, 2\}$. Here, we assume that each $x_k^i$ is independently distributed according to a Gaussian distribution distinct for each $k$, i.e.,

\[ x_k^i \overset{\mathrm{iid}}{\sim} \mathcal{N}(\mu_k, \sigma_k^2) \]

for nonzero $\mu_k, \sigma_k$. Without loss of generality, assume $\hat{\mu}_2 > \hat{\mu}_0$ and $\hat{\mu}_1 > \hat{\mu}_0$.

We define the empirical mean estimator as

\[ \hat{\mu}_k = \frac{1}{n_k} \sum_{i=1}^{n_k} x_k^i \]

and seek to estimate the difference in means between the treatment groups and reference group, constructing the estimators $\hat{\eta}_1 = (\hat{\mu}_1 - \hat{\mu}_0)$ and $\hat{\eta}_2 = (\hat{\mu}_2 - \hat{\mu}_0)$. The reference group induces a positive correlation between these estimators, as shown below:

\begin{align*}
\mathrm{Cov}(\hat{\eta}_1, \hat{\eta}_2) &= \mathbb{E}\left[\hat{\eta}_1 \hat{\eta}_2\right] - \mathbb{E}\left[\hat{\eta}_1\right] \mathbb{E}\left[\hat{\eta}_2\right] \\
&= \mathbb{E}\left[(\hat{\mu}_1 - \hat{\mu}_0)(\hat{\mu}_2 - \hat{\mu}_0)\right] - \mathbb{E}\left[(\hat{\mu}_1 - \hat{\mu}_0)\right] \mathbb{E}\left[(\hat{\mu}_2 - \hat{\mu}_0)\right] \\
&= \mathbb{E}\left[\hat{\mu}_1 \hat{\mu}_2 - \hat{\mu}_1 \hat{\mu}_0 - \hat{\mu}_2 \hat{\mu}_0 + \hat{\mu}_0^2\right] - (\mu_1 - \mu_0)(\mu_2 - \mu_0) \\
&= \mathbb{E}\left[\hat{\mu}_1 \hat{\mu}_2\right] - \mathbb{E}\left[\hat{\mu}_1 \hat{\mu}_0\right] - \mathbb{E}\left[\hat{\mu}_2 \hat{\mu}_0\right] + \mathbb{E}\left[\hat{\mu}_0^2\right] - \left(\mu_1 \mu_2 - \mu_1 \mu_0 - \mu_2 \mu_0 + \mu_0^2\right) \\
&= \mathbb{E}\left[\hat{\mu}_0^2\right] - \mu_0^2 \\
&= \sigma_0^2 / n_0.
\end{align*}
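The identity above can be checked numerically. The following is a minimal Monte Carlo sketch; the group sizes, means, and standard deviations are arbitrary illustrative choices, not values from any study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) settings: a small, noisy reference group.
n0, n1, n2 = 5, 20, 20
mu = (1.0, 2.0, 3.0)
sigma = (2.0, 1.0, 1.0)
n_rep = 100_000  # number of simulated studies

# Draw raw measurements for each group and form the empirical means.
mu0_hat = rng.normal(mu[0], sigma[0], size=(n_rep, n0)).mean(axis=1)
mu1_hat = rng.normal(mu[1], sigma[1], size=(n_rep, n1)).mean(axis=1)
mu2_hat = rng.normal(mu[2], sigma[2], size=(n_rep, n2)).mean(axis=1)

# Differences with respect to the shared reference group.
eta1_hat = mu1_hat - mu0_hat
eta2_hat = mu2_hat - mu0_hat

empirical_cov = np.cov(eta1_hat, eta2_hat)[0, 1]
theoretical_cov = sigma[0] ** 2 / n0

print(f"empirical covariance:   {empirical_cov:.4f}")
print(f"theoretical covariance: {theoretical_cov:.4f}")
```

The two printed values should agree up to Monte Carlo error, even though the three groups are sampled independently.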

The correlation is driven by the variance of the mean of the reference group. By independence, the variance of the estimators themselves is given by

\[ \mathbb{V}[\hat{\eta}_1] = \frac{\sigma_0^2}{n_0} + \frac{\sigma_1^2}{n_1}, \quad \mathbb{V}[\hat{\eta}_2] = \frac{\sigma_0^2}{n_0} + \frac{\sigma_2^2}{n_2} \]

where $\mathbb{V}$ is the variance operator. Thus, the smaller the reference group, and the larger its intrinsic variance, the larger the induced correlation between $\hat{\eta}_1$ and $\hat{\eta}_2$.
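Combining the covariance and the two variances gives the induced correlation explicitly. The special case in the second line is our own illustration and assumes equal variances $\sigma_k = \sigma$ and equal treatment group sizes $n_1 = n_2 = n$.

```latex
\begin{align*}
\mathrm{Corr}(\hat{\eta}_1, \hat{\eta}_2)
  &= \frac{\sigma_0^2 / n_0}
          {\sqrt{\left(\frac{\sigma_0^2}{n_0} + \frac{\sigma_1^2}{n_1}\right)
                 \left(\frac{\sigma_0^2}{n_0} + \frac{\sigma_2^2}{n_2}\right)}} \\
  &= \frac{1/n_0}{1/n_0 + 1/n} = \frac{n}{n + n_0}
  \qquad \text{(if } \sigma_0 = \sigma_1 = \sigma_2,\ n_1 = n_2 = n\text{)}.
\end{align*}
```

Under these assumptions, a reference group of size $n_0 = 1$ with treatment groups of size $n = 10$ gives a correlation of $10/11 \approx 0.91$.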

Using this data and assuming there is a true, linear effect $\beta$ across groups, we may seek to estimate $\beta$ through least-squares regression. The two methods we consider are generalized least squares (GLS) and ordinary least squares (OLS). We set $X$ to be the appended vector $X = \left(\{x_0^l\}_{l=1}^{n_0}, \{x_1^i\}_{i=1}^{n_1}, \{x_2^k\}_{k=1}^{n_2}\right)^\top$ and also set $\hat{\eta} = (\hat{\eta}_1, \hat{\eta}_2)$. Thus, we may construct the estimator $\hat{\beta}_{\mathrm{cor}}$ for $\beta$ to be

\[ \hat{\beta}_{\mathrm{cor}} = \left(X^\top C^{-1} X\right)^{-1} X^\top C^{-1} \hat{\eta} \]

This is the GLS estimate, which accounts for the correlation between $\hat{\eta}_1, \hat{\eta}_2$ that we know must exist (Kariya and Kurata, 2004). Here, $C$ is the covariance matrix of the estimates $\hat{\eta}_1, \hat{\eta}_2$ with entries defined as above. Similarly, we construct the estimate $\hat{\beta}_{\mathrm{OLS}}$ of $\beta$ to be

\[ \hat{\beta}_{\mathrm{OLS}} = \left(X^\top X\right)^{-1} X^\top \hat{\eta}. \]

Note that this OLS estimator does not account for correlation and amounts to assuming the independence of $\hat{\eta}_1, \hat{\eta}_2$.

In evaluating these two estimators, it is easy to show that both $\hat{\beta}_{\mathrm{cor}}$ and $\hat{\beta}_{\mathrm{OLS}}$ are unbiased. By construction, we have

\begin{align*}
\mathbb{V}\left[\hat{\beta}_{\mathrm{cor}}\right] &= \left(X^\top C^{-1} X\right)^{-1} \\
\mathbb{V}\left[\hat{\beta}_{\mathrm{OLS}}\right] &= \left(X^\top X\right)^{-1} X^\top C X \left(X^\top X\right)^{-1}
\end{align*}

as the variances of the two estimators. From the generalized Gauss-Markov theorem (Kariya and Kurata, 2004), it follows that $\hat{\beta}_{\mathrm{cor}}$ is optimal among all linear, unbiased estimators and asymptotically efficient. In particular, $\mathbb{V}\left[\hat{\beta}_{\mathrm{cor}}\right] \leq \mathbb{V}\left[\hat{\beta}_{\mathrm{OLS}}\right]$, with strict inequality unless the two estimators coincide, which for a non-diagonal $C$ happens only in special cases (when the column space of $X$ is invariant under $C$). This explains the advantage of GLS estimation for problems of this class. A theme of the present work is that the covariance matrix $C$ is not always known. This further illustrates the necessity of developing good approximation techniques for $C$ so that downstream estimators remain precise.
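The two variance formulas can be evaluated directly for a small example. The sketch below makes several assumptions that are ours rather than the paper's: the design $X$ is taken to be a single column of exposure levels (1 to 4, with the reference at 0), all groups share $\sigma = 1$, and the group sizes are $n_0 = 1$ and $n_k = 10$.

```python
import numpy as np

# Assumed illustrative setup: four exposure levels, regression through the origin.
exposure = np.array([1.0, 2.0, 3.0, 4.0])
n0, n_exposed, sigma = 1, 10, 1.0

X = exposure.reshape(-1, 1)  # K x 1 design matrix
K = X.shape[0]

# Covariance of the difference estimators: sigma^2/n_k on the diagonal from
# each exposed group, plus the shared reference term sigma^2/n0 in every entry.
C = np.diag(np.full(K, sigma**2 / n_exposed)) + sigma**2 / n0

# GLS variance: (X^T C^{-1} X)^{-1}
C_inv = np.linalg.inv(C)
var_gls = np.linalg.inv(X.T @ C_inv @ X).item()

# OLS (sandwich) variance: (X^T X)^{-1} X^T C X (X^T X)^{-1}
XtX_inv = np.linalg.inv(X.T @ X)
var_ols = (XtX_inv @ X.T @ C @ X @ XtX_inv).item()

print(f"Var[beta_cor] = {var_gls:.4f}")
print(f"Var[beta_OLS] = {var_ols:.4f}")
```

In this configuration the OLS variance is several times larger than the GLS variance, because the shared reference-group noise dominates $C$.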

In the next section we illustrate the impact of this correlation on the efficiency of the estimator for the overall relationship computed from multiple reported estimates. In the context of meta-regression, we would often consider both of the exposure effect estimates $\hat{\eta}_1$, $\hat{\eta}_2$ in conjunction with data from other studies to estimate effects as a function of exposure level.

2.2 Numerical Illustration

Figure 1: The empirical distribution of the $\hat{\beta}$ estimate in a numerical simulation when estimated using GLS with a weighting covariance matrix (blue; $\hat{\beta}_{\mathrm{cor}}$) versus when estimated using OLS, assuming no correlation (orange; $\hat{\beta}_{\mathrm{OLS}}$). The true value of $\beta = 1$ prescribed before the simulation is given by the vertical line.

Here, we create a simplified simulation to show the impact of adjusting for the correlation between the mean estimator levels $\hat{\eta}_i$. In our simulation, using the notation from above, we have $n_0 = 1$ with 4 exposure levels and $n_1 = \dots = n_4 = 10$. We prescribe a true value of $\beta = 1$ and seek to estimate this population value according to the data.

After assigning all the initial count data, we construct our estimates $\hat{\eta}_1, \dots, \hat{\eta}_4$. Using the exposure levels as the standard exogenous variable in the regression, we then have all the relevant data. We compare the estimates $\hat{\beta}_{\mathrm{cor}}$ and $\hat{\beta}_{\mathrm{OLS}}$ to $\beta$ using the GLS and OLS formulations as constructed above, with the relevant dimensional differences applied. The results from 5,000 realizations are shown in Figure 1. Both estimators are unbiased, but $\hat{\beta}_{\mathrm{cor}}$ has a much smaller variance than $\hat{\beta}_{\mathrm{OLS}}$.
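A simulation of this kind can be sketched as follows. Only $n_0 = 1$, $n_1 = \dots = n_4 = 10$, $\beta = 1$, and the 5,000 realizations come from the text; the exposure levels (1 to 4, reference at 0), the common standard deviation $\sigma = 1$, the baseline mean, and the regression through the origin are assumptions made for illustration, so the output need not reproduce Figure 1 exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

beta_true = 1.0
exposure = np.array([1.0, 2.0, 3.0, 4.0])  # assumed levels; reference at 0
baseline = 2.0                             # assumed mean of the reference group
n0, n_exposed, sigma = 1, 10, 1.0          # sigma = 1 is an assumption
n_rep = 5_000

K = exposure.size
X = exposure.reshape(-1, 1)

# Known covariance of the difference estimators in this toy setting.
C = np.diag(np.full(K, sigma**2 / n_exposed)) + sigma**2 / n0
C_inv = np.linalg.inv(C)

# Linear maps from eta_hat to each slope estimate (each of shape 1 x K).
gls_map = np.linalg.solve(X.T @ C_inv @ X, X.T @ C_inv)
ols_map = np.linalg.solve(X.T @ X, X.T)

beta_cor = np.empty(n_rep)
beta_ols = np.empty(n_rep)
for r in range(n_rep):
    # One simulated study: a tiny reference group and four exposed groups.
    ref_mean = rng.normal(baseline, sigma, size=n0).mean()
    group_means = rng.normal(
        baseline + beta_true * exposure, sigma, size=(n_exposed, K)
    ).mean(axis=0)
    eta_hat = group_means - ref_mean
    beta_cor[r] = (gls_map @ eta_hat).item()
    beta_ols[r] = (ols_map @ eta_hat).item()

print(f"GLS: mean {beta_cor.mean():.3f}, variance {beta_cor.var():.4f}")
print(f"OLS: mean {beta_ols.mean():.3f}, variance {beta_ols.var():.4f}")
```

Both sample means should sit close to the true slope, while the spread of the OLS estimates is visibly wider.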

The simulation is relevant to the situations considered by Greenland and Longnecker (1992) and Hamling et al. (2008), since $\log$ OR and RR estimates are created with respect to the same reference group per study. The main difference is that we have explicit access to the full covariance between our reported estimators, while in meta-analytic settings the correlation is hidden and must be inferred. This is the core problem that correlation correction estimators seek to solve.

We end the section with a real-world example where the adjustment makes a big difference to summarizing the study.

2.3 Real Example: Blood Pressure and Peripheral Artery Disease

We provide a brief example of a real-world study where the correlation-adjusted estimates are significantly different from the estimates obtained when independence of the estimates is assumed. Itoga et al. (2018) study the impact of blood pressure on peripheral artery disease (PAD), reporting results by subgroups of exposure. We assume a linear relationship between systolic blood pressure (SBP) and relative risk of PAD, and visualize the weighted least squares (WLS) and correlation-corrected GLS regressions in Figure 2. In the meta-analytic setting, studies typically report standard errors. We compare to a naive WLS estimator $\hat{\beta}_{\mathrm{WLS}}$ where residuals are weighted by the inverse of the reported variances (the squared standard errors). Mathematically, setting $V$ to be the diagonal matrix whose diagonal elements are the reported variances for each exposure level, we have the following estimator:

$$\hat{\beta}_{\mathrm{WLS}} = \left(X^{\top} V^{-1} X\right)^{-1} X^{\top} V^{-1} \hat{\eta},$$

where, in this case, $\hat{\eta}$ is the vector of log ORs for each exposure. Using the metadata, we do not have access to the true covariance matrix $C$; we estimate the covariance matrix used in GLS by the method of Greenland and Longnecker (1992).
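The two estimators can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a no-intercept design with a single column of exposure values; the numbers, the `shared` covariance term, and the function names are hypothetical and are not taken from Itoga et al. (2018).

```python
import numpy as np

def wls_slope(X, eta_hat, variances):
    """WLS estimate (X^T V^-1 X)^-1 X^T V^-1 eta_hat with V = diag(variances)."""
    W = np.diag(1.0 / np.asarray(variances, dtype=float))
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ eta_hat)

def gls_slope(X, eta_hat, C):
    """GLS estimate: the same formula with a full covariance matrix C in place of V."""
    Cinv_X = np.linalg.solve(C, X)
    Cinv_eta = np.linalg.solve(C, eta_hat)
    return np.linalg.solve(X.T @ Cinv_X, X.T @ Cinv_eta)

# Hypothetical inputs: three non-reference exposure levels.
x = np.array([10.0, 20.0, 30.0])
X = x.reshape(-1, 1)
eta_hat = np.array([0.15, 0.22, 0.41])
variances = np.array([0.04, 0.05, 0.06])

# Assumed covariance: reported variances on the diagonal and a common
# off-diagonal term standing in for the shared reference group.
shared = 0.02
C = np.diag(variances - shared) + shared * np.ones((3, 3))

beta_wls = wls_slope(X, eta_hat, variances)
beta_gls = gls_slope(X, eta_hat, C)
```

When `C` is diagonal the two functions agree; they differ only through the off-diagonal entries, which is the part a correlation correction has to supply.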

The $x$-axis shows SBP, while the $y$-axis gives the log relative risk. The blue dots show reported adjusted odds ratios, plotted at the mid-points of the exposure groups reported by the paper.

Figure 2: The regression lines generated by the slope coefficients $\hat{\beta}_{\mathrm{cor}}$ and $\hat{\beta}_{\mathrm{WLS}}$ (solid lines). The dashed line is produced by the slope $\hat{\beta}_{\mathrm{cor}}$ whose intercept is allowed to be non-zero. The baseline exposure is the reference level of exposure considered by the study, used to create relevant OR estimates.

The solid lines correspond to $\hat{\beta}_{\mathrm{cor}}$ (blue) and $\hat{\beta}_{\mathrm{WLS}}$ (olive) in Figure 2, which are required to pass through the red ‘origin’ point corresponding to the reference group with midpoint at SBP. The WLS estimate $\hat{\beta}_{\mathrm{WLS}}$ appears to fit the data more closely than $\hat{\beta}_{\mathrm{cor}}$. However, the line produced by $\hat{\beta}_{\mathrm{cor}}$ provides a better estimate for the slope. We can think of it as ‘adjusting’ for the fact that variance in the reference group results propagates to all non-reference points, shifting them up and down together. We illustrate this by including the dashed line in Figure 2. This is the correlation-corrected estimate shifted to the non-reference data; it should capture the trend of the data more accurately than the WLS estimate.

In the case of SBP vs. PAD, adjusting for within-study correlation would give a higher estimate of overall risk for that study. While it is impossible to know ‘truth’ for any given study, the simulation in Figure 1 serves as a reminder that although both the WLS and GLS estimates are unbiased, the WLS has much higher variance.

We proceed to consider the case of meta-analysis where multiple studies are observed and discuss implications of correcting for correlation in that setting.

2.4 Implications for Meta-Analysis

A general description of likelihood formulations for meta-analysis is developed by Zheng et al. (2021). Taking the simplest example, consider the statistical model for aggregating multiple reported result vectors $\eta_i$ with specific effects for study $i$:

$$\hat{\eta}_i = X_i \beta + \mathbf{1} u_i + \epsilon_i,$$

where $\epsilon_i \sim \mathcal{N}(0, V_i)$ describes the observation errors for study $i$, while $u_i$ is a scalar realization of a random effect distributed as $\mathcal{N}(0, \gamma)$, where $\gamma$ represents between-study heterogeneity. The $\epsilon_i$ and $u_i$ are independent across $i$, and also from each other. This model applies the realization of the specific effect to all observations from study $i$, hence the vector $\mathbf{1}$ that copies $u_i$ to impact every element of $\hat{\eta}_i$. The variance of the error term is given by:

$$\mathbb{V}[\mathbf{1} u_i + \epsilon_i] = V_i + \gamma \mathbf{1}\mathbf{1}^T.$$

The maximum likelihood estimate for $\beta$ and $\gamma$ is then given by solving

$$\min_{\beta,\gamma} \sum_i \frac{1}{2}(X_i\beta - \hat{\eta}_i)^T \left(V_i + \gamma \mathbf{1}\mathbf{1}^T\right)^{-1} (X_i\beta - \hat{\eta}_i) + \frac{1}{2}\log\left|V_i + \gamma \mathbf{1}\mathbf{1}^T\right|.$$

From the likelihood expression, we can observe that meta-analysis effectively quantifies the extent to which the reported variances $V_i$ do not represent the inherent variance in the data, and adjusts by augmenting them with the between-study heterogeneity variance $\gamma$. Using only the reported variances corresponds to assuming that each reported $V_i$ is diagonal, so any correlation is left to meta-analysis to discover, with a single parameter $\gamma$. In fact, as shown in the previous sections, within-study correlations are induced by the shared reference group, and the extent to which this happens can vary by study (for example, a study with a very large reference group will have less correlation than a study with a small reference group). As a result, providing correlated $V_i$ will leave $\gamma$ to capture the variances of the unknown study-specific effects, exactly as intended, rather than trying to capture all the residual correlations.
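As a rough illustration of the objective, the sketch below evaluates the Gaussian negative log-likelihood implied by the model $\hat{\eta}_i \sim \mathcal{N}(X_i\beta, V_i + \gamma\mathbf{1}\mathbf{1}^T)$, with additive constants dropped. The function name and the `studies` container are assumptions made for this example, not part of any published implementation.

```python
import numpy as np

def neg_log_likelihood(beta, gamma, studies):
    """Sum over studies of 0.5 * r^T S^-1 r + 0.5 * log|S|, with S = V_i + gamma * 11^T.

    `studies` is a list of (X_i, eta_hat_i, V_i) tuples (a hypothetical layout).
    V_i may be diagonal (reported variances only) or a full covariance matrix.
    """
    total = 0.0
    for X, eta_hat, V in studies:
        resid = X @ beta - eta_hat
        k = len(eta_hat)
        S = V + gamma * np.ones((k, k))
        _, logdet = np.linalg.slogdet(S)
        total += 0.5 * resid @ np.linalg.solve(S, resid) + 0.5 * logdet
    return total
```

Passing a diagonal `V_i` reproduces the independence assumption, while passing a correlation-corrected `V_i` leaves `gamma` to absorb only the between-study heterogeneity.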

With the motivation established, we proceed to review the methods used in practice to estimate the correlations needed for this correction. For the remainder of the paper, we focus on the reliability and accuracy of these correlation correction methods.

3 Methods of GL and Hamling

In this section, we present the approaches of GL and Hamling. We focus on log ORs to simplify the presentation; however, our robust methods in Sections 4 and 5 cover both log ORs and log RRs. Special challenges and counter-examples for the Hamling approach in the RR case are also presented in Section 5.

We start by defining key variables, following the original notation; see Table 1.

Table 1: Notation and method requirements.

| Variable | Dimension | Definition | Used by |
|---|---|---|---|
| $n$ | $1$ | number of alternative exposure levels | - |
| $x$ | $n$ | alternative exposure levels | - |
| $N$ | $n+1$ | total subjects at all exposures | GL |
| $M_1$ | $1$ | total cases | GL |
| $L$ | $n$ | estimates of log-odds | GL, H |
| $V$ | $n$ | reported variances for log-odds | H |
| $R$ | $n$ | estimates of log-risks | GL, H |
| $V^R$ | $n$ | reported variances for log-risks | H |
| $A$ | $n$ | cases for alternative exposures | - |
| $a_0$ | $1$ | cases for reference exposure | - |
| $B$ | $n$ | non-cases for alternative exposures | - |
| $b_0$ | $1$ | non-cases for reference exposure | - |
| $p$ | $1$ | ratio of unexposed controls to total controls | H |
| $z$ | $1$ | ratio of total controls to total cases | H |

$M_1$ is the sum of all elements of $A$ and $a_0$. For both GL and Hamling, the goal is to estimate $A, a_0, B$, and $b_0$. Following Greenland and Longnecker (1992) and Hamling et al. (2008), we refer to the first element in the vector $N$ as $n_0$ and the remaining elements as $N_+$. We always have that $A + B = N_+$ and $a_0 + b_0 = n_0$. Table 1 also lists the data required by each method, where H is shorthand for the Hamling method; more details on how the data are used are given in the following sections. With notation established, we summarize the main goal of the GL and Hamling methods.

3.1 Correlation and Covariance

The main goal of both GL and Hamling methods is to obtain a variance-covariance matrix, replacing a diagonal matrix of reported variances with an updated variance-covariance matrix with the same variances and estimated correlations. In particular, both methods estimate the correlation for two log ORs at two different exposures $x_i$ and $x_j$ by

$$r_{x_i,x_j} = \frac{1/a_0 + 1/b_0}{\sqrt{1/a_0 + 1/b_0 + 1/A_i + 1/B_i}\,\sqrt{1/a_0 + 1/b_0 + 1/A_j + 1/B_j}} \tag{1}$$

where $B_i$ represent controls, and the correlation for two log RRs at these exposures by

$$r_{x_i,x_j} = \frac{1/a_0 - 1/b_0}{\sqrt{1/a_0 - 1/b_0 + 1/A_i - 1/B_i}\,\sqrt{1/a_0 - 1/b_0 + 1/A_j - 1/B_j}} \tag{2}$$

where $B_i$ represent totals. The final variance-covariance matrix is obtained by appropriately scaling these correlations using the reported variances. There is a degree of freedom in the pseudo-counts that factors out of the correlation formulas: all pseudo-counts can be multiplied by a constant value and the correlations would not change in either the OR or the RR case.
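The OR formula (1), the rescaling by reported variances, and the scale invariance noted above can be checked with a short NumPy sketch. The function names and the pseudo-counts below are hypothetical and chosen only for illustration.

```python
import numpy as np

def or_correlation(A, B, a0, b0):
    """Correlation matrix of log ORs implied by pseudo-counts, following (1)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    c0 = 1.0 / a0 + 1.0 / b0            # contribution of the shared reference group
    var = c0 + 1.0 / A + 1.0 / B        # plug-in variance of each log OR
    R = c0 / np.sqrt(np.outer(var, var))
    np.fill_diagonal(R, 1.0)            # an estimate is perfectly correlated with itself
    return R

def covariance_from_correlation(R, V):
    """Scale correlations by the reported variances: C_ij = r_ij * sqrt(V_i * V_j)."""
    s = np.sqrt(np.asarray(V, dtype=float))
    return R * np.outer(s, s)

# Hypothetical pseudo-counts and reported variances.
A, B, a0, b0 = [30.0, 40.0], [70.0, 60.0], 20.0, 80.0
V_reported = [0.12, 0.11]

R = or_correlation(A, B, a0, b0)
C = covariance_from_correlation(R, V_reported)

# Multiplying every pseudo-count by the same constant leaves R unchanged.
R_scaled = or_correlation([10 * a for a in A], [10 * b for b in B], 10 * a0, 10 * b0)
```

The diagonal of `C` equals the reported variances by construction, so only the off-diagonal entries depend on the pseudo-counts.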

Finally, it may help to alert the reader to the key difference between the Hamling and GL approaches: by construction, the pseudo-counts successfully obtained by the Hamling approach (for either RRs or ORs) satisfy

$$r_{x_i,x_j} = \frac{1/a_0 + 1/b_0}{\sqrt{V_i V_j}}$$

where $V_i$, $V_j$ are the variances reported for the estimates. This equality need not hold for the pseudo-counts inferred by the GL approach, which uses group counts in place of reported variances. This difference is discussed explicitly in the following sections.
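For the OR case, the step behind this identity can be spelled out as follows. This is a sketch that assumes the Hamling pseudo-counts satisfy equations (5) of Section 3.3, with the shorthand $t_i = a_0 L_i / b_0$ introduced here only for this derivation:

$$
\frac{1}{A_i} + \frac{1}{B_i}
= \left(V_i - \frac{1}{a_0} - \frac{1}{b_0}\right)\left(\frac{1}{1 + t_i} + \frac{1}{1 + 1/t_i}\right)
= V_i - \frac{1}{a_0} - \frac{1}{b_0},
$$

since $\frac{1}{1+t_i} + \frac{t_i}{1+t_i} = 1$. Hence $1/a_0 + 1/b_0 + 1/A_i + 1/B_i = V_i$, and substituting into (1) replaces each square-root factor in the denominator by $\sqrt{V_i}$.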

3.2 GL Newton Method

Algorithm 1 Greenland and Longnecker Algorithm
0:  $M_1, N, L$, Initialize $A$
1:  $\mathrm{difference} \leftarrow 1$
2:  while $\mathrm{difference} \geq 10^{-4}$ do
3:     $A_+ \leftarrow \mathrm{sum}(A)$
4:     $a_0 \leftarrow M_1 - A_+$
5:     $b_0 \leftarrow n_0 - a_0$
6:     $B \leftarrow N_+ - A$
7:     $c_0 \leftarrow \frac{1}{a_0} + \frac{1}{b_0}$
8:     $c \leftarrow \frac{1}{A} + \frac{1}{B}$ {Element-wise inverse}
9:     $e \leftarrow L + \log(a_0) + \log(B) - \log(A) - \log(b_0)$ {Element-wise $\log$}
10:     $H \leftarrow$ matrix of size $n \times n$ whose diagonal elements are $c + c_0$ and whose off-diagonal elements are $c_0$
11:     $A \leftarrow A + H^{-1} e$
12:     $\mathrm{difference} \leftarrow \left\|H^{-1} e\right\|_2$
13:  end while

The GL approach uses reported estimates, total counts, and the total number of cases to find pseudo-counts in each category that match the reported log-OR or log-RR estimates, using the iterative root-finding method given in Algorithm 1. Indeed, Algorithm 1 is exactly Newton’s method for root-finding, applied to find pseudo-counts such that plug-in estimates from the pseudo-counts match the adjusted estimates reported in the original study. Once $A, B, a_0, b_0$ are found, the method of Greenland and Longnecker (1992) uses these values to calculate the correlation coefficient $r_{ij}$ on log OR estimates $L_i$ and $L_j$ using (1), as well as covariances

$$C_{ij} = r_{ij}\left(V_i V_j\right)^{1/2}.$$
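Algorithm 1 can be sketched in NumPy as below. This is an illustrative transcription for the log-OR case, not the reference implementation: the function name, the iteration cap, and the feasibility check are additions made here, and the default initialization is the null expected value $M_1 N_+ / \mathrm{sum}(N)$.

```python
import numpy as np

def gl_pseudo_counts(M1, N, L, A_init=None, tol=1e-4, max_iter=100):
    """Newton iteration of Algorithm 1; returns (A, B, a0, b0) for log ORs L."""
    N = np.asarray(N, dtype=float)
    L = np.asarray(L, dtype=float)
    n0, N_plus = N[0], N[1:]
    A = M1 * N_plus / N.sum() if A_init is None else np.asarray(A_init, dtype=float)
    for _ in range(max_iter):
        a0 = M1 - A.sum()
        b0 = n0 - a0
        B = N_plus - A
        if min(a0, b0, A.min(), B.min()) <= 0:
            # Plain Newton steps are not safeguarded, so an iterate can leave
            # the region where all pseudo-counts are positive.
            raise RuntimeError("Newton iterate left the feasible region")
        c0 = 1.0 / a0 + 1.0 / b0
        c = 1.0 / A + 1.0 / B
        e = L + np.log(a0) + np.log(B) - np.log(A) - np.log(b0)
        H = np.diag(c) + c0          # diagonal c + c0, off-diagonal c0
        step = np.linalg.solve(H, e)
        A = A + step
        if np.linalg.norm(step) < tol:
            a0 = M1 - A.sum()
            return A, N_plus - A, a0, n0 - a0
    raise RuntimeError("Newton iteration did not converge")

# Hypothetical check: build inputs from known counts and recover them.
A_true, B_true = np.array([30.0, 40.0]), np.array([70.0, 60.0])
a0_true, b0_true = 20.0, 80.0
L = np.log((A_true / B_true) / (a0_true / b0_true))
N = [a0_true + b0_true, *(A_true + B_true)]
A_hat, B_hat, a0_hat, b0_hat = gl_pseudo_counts(M1=a0_true + A_true.sum(), N=N, L=L)
```

On this well-behaved input the default initialization converges in a few steps; the explicit feasibility check makes the failure mode visible instead of silently producing `nan` from a logarithm of a negative count.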

For an arbitrary multi-variable function $f:\mathbb{R}^n \to \mathbb{R}^n$, the Newton iteration is given by

$$x_{k+1} = x_k - \left[J_f(x_k)\right]^{-1} f(x_k) \tag{3}$$

where $J_f$ is defined to be the Jacobian matrix of $f$, comprising partial derivatives (Gautschi, 1997). Newton’s method is locally convergent, meaning that when the initial iterate $x_0$ is “close enough” to a root, (3) will eventually find it; however, getting close enough can be tricky (Süli and Mayers, 2003). Global convergence refers to the ability of the algorithm to converge regardless of initialization. Greenland and Longnecker (1992) do not prove global convergence guarantees, and in fact, as given in Greenland and Longnecker (1992) and summarized in Algorithm 1, the method can break depending on initialization.

The function $g:\mathbb{R}^n \rightarrow \mathbb{R}^n$ whose zero we are searching for appears in line 11 of Algorithm 1 and is given by

$$g(A) = -L - \log(a_0(A))\mathbf{1} - \log(B(A)) + \log(A) + \log(b_0(A))\mathbf{1}, \tag{4}$$

where $\mathbf{1} \in \mathbb{R}^n$ is the vector of ones of the right dimension, copying the values of the scalar quantity to all coordinates. By construction, $a_0, B$, and $b_0$ are all functions of $A$. The Jacobian matrix $H$ contains all the partial derivatives of $g(A)$ and is computed in Algorithm 1. Greenland and Longnecker (1992) suggest using crude estimates to initialize $A$ if available, and otherwise using the null expected value: $M_1 \frac{N_+}{\mathrm{sum}(N)}$. A priori, convergence is not guaranteed. In Section 6, we explore failure modes of existing implementations.
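For reference, the entries of $H$ can be derived directly from (4), using the relations $a_0 = M_1 - \sum_k A_k$, $b_0 = n_0 - a_0$, and $B = N_+ - A$ from Algorithm 1 (here $\delta_{ij}$ is the Kronecker delta and $c_0$, $c_i$ are as defined in lines 7 and 8):

$$
\begin{aligned}
\frac{\partial a_0}{\partial A_j} &= -1, \qquad
\frac{\partial b_0}{\partial A_j} = 1, \qquad
\frac{\partial B_i}{\partial A_j} = -\delta_{ij}, \\
\frac{\partial g_i}{\partial A_j} &= \frac{1}{a_0} + \frac{1}{b_0} + \delta_{ij}\left(\frac{1}{A_i} + \frac{1}{B_i}\right) = c_0 + \delta_{ij}\, c_i .
\end{aligned}
$$

Since the vector $e$ in line 9 equals $-g(A)$, the Newton step (3) applied to $g$ reads $A \leftarrow A - H^{-1} g(A) = A + H^{-1} e$, which is the update in line 11.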

We show in Section 4 that the function $g$ in (4) is the gradient of a convex function, and recast the root-finding problem for $g$ as a convex optimization problem, which allows us to robustly compute the GL estimator. This leads to a variety of algorithms with global convergence guarantees and, more simply, to a disciplined convex programming (DCP) approach that does not need user-specified initialization and leverages the state-of-the-art open-source optimization software cvxpy (Diamond and Boyd, 2016); we make this available to the community.
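One way to make the convexity claim concrete is to write down a candidate potential. The following is a sketch derived here from (4); it is not necessarily the formulation used in Section 4, which may differ by constants or parametrization:

$$
f(A) = \sum_{i=1}^{n}\left[A_i \log A_i + B_i \log B_i\right] + a_0 \log a_0 + b_0 \log b_0 - L^{\top} A,
\qquad B = N_+ - A,\quad a_0 = M_1 - \mathbf{1}^{\top} A,\quad b_0 = n_0 - a_0 .
$$

Differentiating term by term, the constant terms produced by the product rule cancel ($+1 - 1 - 1 + 1 = 0$), leaving

$$
\frac{\partial f}{\partial A_i} = \log A_i - \log B_i - \log a_0 + \log b_0 - L_i = g_i(A).
$$

The Hessian of $f$ is therefore the Jacobian of $g$, that is, the matrix $H$ of Algorithm 1: a positive diagonal matrix plus $c_0 \mathbf{1}\mathbf{1}^{\top}$, which is positive definite whenever all pseudo-counts are positive. Under these assumptions $f$ is strictly convex on the feasible region, so a root of $g$ is its unique minimizer there.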

3.3 Hamling Method

Hamling et al. (2008) extended the work of Greenland and Longnecker (1992), and also construct pseudo-counts $A, B, b_0$, and $a_0$ using an iterative root-finding method. Once the pseudo-counts are obtained, the correlations across treatment effect exposures and the overall covariance matrix are calculated exactly as in Greenland and Longnecker (1992). The main difference is that Hamling et al. (2008) require only the estimates and their variances, along with $p$ and $z$ from Table 1, discussed in detail below.

The two pieces of information that Hamling requires in addition to estimates and variances are $p$ and $z$, which correspond to the ratio of unexposed controls to total number of controls, and the ratio of total number of controls to total number of cases, respectively. These quantities can be obtained by using crude reported estimates from the study, or from another pathway (e.g. literature) if the study did not report the quantities.

Hamling et al. (2008, Appendix A) solve for $A, B, p', z'$ in terms of $a_0$ and $b_0$:

$$
A_i = \frac{1 + \frac{a_0 L_i}{b_0}}{V_i - \frac{1}{a_0} - \frac{1}{b_0}}, \quad
B_i = \frac{1 + \frac{b_0}{a_0 L_i}}{V_i - \frac{1}{a_0} - \frac{1}{b_0}}, \quad
p' = \frac{b_0}{\sum_{i=1}^{n} B_i}, \quad
z' = \frac{\sum_{i=1}^{n} B_i}{\sum_{i=1}^{n} A_i}. \tag{5}
$$

The quantities $p'$ and $z'$ are functions of $(a_0, b_0)$, and the main idea of the Hamling method is to match $p'$, $z'$ to the $p, z$ values provided by the study, minimizing the squared differences:

$$\left(\frac{p - p'}{p}\right)^2 + \left(\frac{z - z'}{z}\right)^2 \tag{6}$$

The iteration of Hamling, summarized in Algorithm 2, updates $a_0$ and $b_0$ through equations (5). Once $(a_0, b_0)$ are found, equations (5) yield all needed pseudo-counts. It is not obvious from Hamling et al. (2008), but a consequence of our work here is that (6) can always be brought to $0$ for all feasible inputs.

Algorithm 2 Hamling Algorithm
0:  $p, z, L, V$, Initialize $a_0, b_0$
1:  $\mathrm{error} \leftarrow 1.0$
2:  while $\mathrm{error} \geq 10^{-4}$ do
3:     $A_i(a_0, b_0) \leftarrow \left(1 + \frac{a_0 L_i}{b_0}\right) / \left(V_i - \frac{1}{a_0} - \frac{1}{b_0}\right)$
4:     $B_i(a_0, b_0) \leftarrow \left(1 + \frac{b_0}{a_0 L_i}\right) / \left(V_i - \frac{1}{a_0} - \frac{1}{b_0}\right)$
5:     $p'(a_0, b_0) \leftarrow b_0 / \left(\sum_{i=1}^{n} B_i(a_0, b_0)\right)$
6:     $z'(a_0, b_0) \leftarrow \left(\sum_{i=1}^{n} B_i(a_0, b_0)\right) / \left(\sum_{i=1}^{n} A_i(a_0, b_0)\right)$
7:     $\mathrm{error} \leftarrow \left(\frac{p - p'}{p}\right)^2 + \left(\frac{z - z'}{z}\right)^2$
8:     $a_0, b_0 \leftarrow$ Update {Black Box Optimization routine to shrink error, e.g. Excel or Stata}
9:  end while

Hamling et al. (2008) suggest using the Excel Solve function as a black-box optimizer. However, the solution method turns out to be less important than the choice of equations and their initialization. In Section 5, we show that a modified but equivalent system of nonlinear equations for ORs always has a solution. In contrast, the original formulation has no such guarantee, and Hamling et al. (2008) discuss the need to try different starting points to ensure convergence in specific instances. In Section 6, we give specific, simple examples where the method as given in Algorithm (2) fails to converge to a solution for $a_0$ and $b_0$ (returning negative counts $A$ or $B$), while the method of Section 5 succeeds. In the RR case, we show that the Hamling approach can in fact fail catastrophically, which we discuss in detail in Section 5.

4 Convex Optimization Formulation of GL

In this section, we develop a robust GL approach by establishing that $g(A)$, used in the root-finding Newton method of Algorithm (1), is the gradient of a convex function. We show that the convex model of interest is a sum of entropic distance functions for both log ORs and log RRs. We begin with log ORs.

4.1 GL: Odds Ratios

Recall the function $g(A)$ that is the focus of the Newton root-finding method proposed by Greenland and Longnecker (1992):

$$g(A) = -L - \log(a_0(A))\mathbf{1} - \log(B(A)) + \log(A) + \log(b_0(A))\mathbf{1}.$$

We can find the integral $G$ of $g$ and obtain an objective that corresponds to this gradient:

$$\begin{aligned} G(A) = -L^\top A &+ \left(a_0(A)\log(a_0(A)) - a_0(A)\right) + \sum_{i=1}^{n}\left(B_i(A)\log(B_i(A)) - B_i(A)\right) \\ &+ \sum_{i=1}^{n}\left(A_i\log(A_i) - A_i\right) + \left(b_0(A)\log(b_0(A)) - b_0(A)\right). \end{aligned} \tag{7}$$

From here, note that we may equivalently solve for the optimal $A$ by minimizing the integrated objective $G$. This is equivalent to finding roots of $g$ as GL does, since $\nabla G(A) = g(A)$ and a differentiable convex function attains its minimum precisely where its gradient is zero.
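The relationship $\nabla G(A) = g(A)$ can be checked numerically. The sketch below compares $g$ with a central finite-difference gradient of $G$. It assumes the margin relations $a_0(A) = M_1 - 1^\top A$, $b_0(A) = n_0 - a_0(A)$ and $B(A) = N - A$, which are not restated in this section, and it uses made-up data; the function names are illustrative only.

```python
import math

def entropic(x):
    """Entropic distance term x*log(x) - x for x > 0."""
    return x * math.log(x) - x

def margins(A, N, M1, n0):
    # Assumed margin relations (not restated in this section):
    # a0 = M1 - sum(A), b0 = n0 - a0, B_i = N_i - A_i.
    a0 = M1 - sum(A)
    b0 = n0 - a0
    B = [Ni - Ai for Ni, Ai in zip(N, A)]
    return a0, b0, B

def g(A, L, N, M1, n0):
    """Root-finding function g(A) for log ORs."""
    a0, b0, B = margins(A, N, M1, n0)
    return [-Li - math.log(a0) - math.log(Bi) + math.log(Ai) + math.log(b0)
            for Li, Ai, Bi in zip(L, A, B)]

def G(A, L, N, M1, n0):
    """Objective (7): a linear term plus entropic terms."""
    a0, b0, B = margins(A, N, M1, n0)
    return (-sum(Li * Ai for Li, Ai in zip(L, A))
            + entropic(a0) + sum(entropic(Bi) for Bi in B)
            + sum(entropic(Ai) for Ai in A) + entropic(b0))

def numerical_gradient(A, L, N, M1, n0, h=1e-4):
    """Central finite differences of G, one coordinate at a time."""
    grad = []
    for i in range(len(A)):
        up, dn = list(A), list(A)
        up[i] += h
        dn[i] -= h
        grad.append((G(up, L, N, M1, n0) - G(dn, L, N, M1, n0)) / (2 * h))
    return grad

# Hypothetical inputs, chosen only so that every count stays positive.
L = [0.3, 0.7]
N = [200.0, 150.0]
M1, n0 = 120.0, 250.0
A = [40.0, 30.0]

exact = g(A, L, N, M1, n0)
approx = numerical_gradient(A, L, N, M1, n0)
max_gap = max(abs(e - a) for e, a in zip(exact, approx))
print(max_gap)  # should be close to zero
```

The check evaluates a single interior point; it illustrates the identity rather than proving it.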

Recall that a convex function $G$ satisfies (Boyd and Vandenberghe, 2004)

$$G(\lambda A_1 + (1-\lambda)A_2) \leq \lambda G(A_1) + (1-\lambda)G(A_2) \quad \text{for all} \quad 0 < \lambda < 1, \quad \text{and} \quad A_1, A_2 \in \mathbb{R}^n. \tag{8}$$

A closely related property called strict convexity requires strict inequality in (8) for $A_1 \neq A_2$. For a function with continuous derivative, as in our case, the convex property and first-order Taylor series expansion of $G$ yield the differential characterization of convexity

$$G(A_2) \geq G(A_1) + (A_2 - A_1)^\top \nabla G(A_1) \quad \text{for all} \quad A_1, A_2 \in \mathbb{R}^n. \tag{9}$$

The characterization (9) means that if $\nabla G(A_1) = 0$, then necessarily

$$G(A_2) \geq G(A_1) \quad \text{for all} \quad A_2 \in \mathbb{R}^n,$$

that is, $g(A_1) = 0$ guarantees that $A_1$ must be a global minimizer of $G$. Moreover, a strictly convex function $G$ cannot have more than one global minimizer; otherwise, given two such minimizers, we can use the strict version of (8) with, e.g., $\lambda = 1/2$ to get a point with a lower value.
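The uniqueness argument can be written out in one line. Suppose, for contradiction, that $A_1 \neq A_2$ are both global minimizers with common value $G^*$ (the symbol $G^*$ is introduced here only for this illustration). The strict version of (8) at the midpoint gives:

```latex
G\!\left(\tfrac{1}{2}A_1 + \tfrac{1}{2}A_2\right)
  \;<\; \tfrac{1}{2}\,G(A_1) + \tfrac{1}{2}\,G(A_2)
  \;=\; \tfrac{1}{2}\,G^{*} + \tfrac{1}{2}\,G^{*}
  \;=\; G^{*},
```

so the midpoint would achieve a value below the minimum, which is impossible.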

Finally, for a function with continuous second derivatives, non-negative eigenvalues of the Hessian at every $A$ in the domain are a sufficient condition for convexity. As already discussed in Section 3.2, the Jacobian matrix $H$ of $g$, which is exactly the Hessian of $G$, is symmetric positive definite, meaning all eigenvalues are strictly positive, so $G$ must be strictly convex (Boyd and Vandenberghe, 2004).

Putting these facts together, the root-finding problem for $g$ (4) is equivalent to a strictly convex minimization problem with objective $G$ (7). This perspective reveals that the original GL method can be strengthened by using the additional structure and safeguards provided by $G$. For example, the simplest safeguard for Newton's method when minimizing $G$ is a step size search that moves in the Newton direction just far enough to guarantee a proportional decrease in $G$; adding this element to Algorithm 1 would already provide global convergence guarantees. The optimization problem is given by

$$\min_{0 \leq A \leq N} \quad G(A) \tag{10}$$

where $G(A)$ is given in (7). This formulation implicitly maintains domain constraints, that is, non-negativity of $A$ and $N - A$, as well as non-negativity of $a_0$ and $b_0$, since the logarithm is only defined on $\mathbb{R}_+$. The key element in (7) is the entropic distance function $f:[0,\infty)^m \to \mathbb{R}$:

$$f(x) = x\log(x) - x. \tag{11}$$

As $x$ approaches $0$, $x\log(x)$ goes to $0$, as can be easily seen by using L'Hôpital's rule. As $x$ grows large, $x\log(x)$ grows faster than $x$, so $f(x) \to \infty$ as $x \to \infty$. Finally, the entropic function has positive second derivative for $x > 0$,

$$f''(x) = 1/x,$$

so it is strictly convex. Since the sum of convex and strictly convex functions is strictly convex, the entire objective $G$ (7) is strictly convex. This implies that any minimizer of $G$ (7) must be unique, and it remains to show only that a minimizer exists for $G$.
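For completeness, the limit at zero and the derivatives of the entropic function (11) used above can be worked out directly:

```latex
\lim_{x \to 0^+} x\log(x)
  = \lim_{x \to 0^+} \frac{\log(x)}{1/x}
  = \lim_{x \to 0^+} \frac{1/x}{-1/x^{2}}
  = \lim_{x \to 0^+} (-x) = 0,
\qquad
f'(x) = \log(x) + 1 - 1 = \log(x),
\qquad
f''(x) = \frac{1}{x} > 0 \quad \text{for } x > 0.
```

In particular, $f'(x) = \log(x)$ is why integrating the logarithmic terms of $g$ produces the entropic terms of $G$ in (7).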

Theorem 4.1

Suppose $N_+ > A$, $M_1 > 1^\top A$, and $n_0 > a_0$ according to the variables defined in Table (1). Let $L$ represent log ORs for the necessary exposure levels and take the elements of $L$ to be finite. Then the function $G(A)$ (7) always has a unique global minimizer.

For the proof, see Appendix 8. By Theorem 4.1, $G$ always has a unique minimizer for feasible inputs, undergirding the approach of Greenland and Longnecker (1992). Because a unique global minimum exists under simple assumptions about problem data, standard optimization solvers (including gradient, Gauss-Newton, quasi-Newton, and Newton methods), when properly safeguarded by trust region or line search, will converge to the unique global minimum of $G$ from any feasible initialization $A_0$. In particular, we use disciplined convex programming (Grant et al., 2006) to solve the problem. In Section 6, we show that the root-finding scheme of Greenland and Longnecker (1992) is fragile with respect to initialization, but the new approach is guaranteed to work.
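To make the line search safeguard concrete, the following is a minimal sketch of a damped Newton iteration on $G$. It is not the disciplined convex programming implementation used in the paper. It assumes the margin relations $a_0(A) = M_1 - 1^\top A$, $b_0(A) = n_0 - a_0(A)$, $B(A) = N - A$, and it uses the Hessian obtained by differentiating $g$, namely $\mathrm{diag}(1/A_i + 1/B_i) + (1/a_0 + 1/b_0)\,\mathbf{1}\mathbf{1}^\top$, inverted with the Sherman-Morrison formula. All names, tolerances and data are illustrative.

```python
import math

def entropic(x):
    return x * math.log(x) - x

def margins(A, N, M1, n0):
    # Assumed margin relations: a0 = M1 - sum(A), b0 = n0 - a0, B_i = N_i - A_i.
    a0 = M1 - sum(A)
    b0 = n0 - a0
    return a0, b0, [Ni - Ai for Ni, Ai in zip(N, A)]

def feasible(A, N, M1, n0):
    a0, b0, B = margins(A, N, M1, n0)
    return min(A) > 0 and min(B) > 0 and a0 > 0 and b0 > 0

def G(A, L, N, M1, n0):
    a0, b0, B = margins(A, N, M1, n0)
    return (-sum(Li * Ai for Li, Ai in zip(L, A)) + entropic(a0) + entropic(b0)
            + sum(entropic(x) for x in A) + sum(entropic(x) for x in B))

def g(A, L, N, M1, n0):
    a0, b0, B = margins(A, N, M1, n0)
    return [-Li - math.log(a0) - math.log(Bi) + math.log(Ai) + math.log(b0)
            for Li, Ai, Bi in zip(L, A, B)]

def newton_step(A, grad, N, M1, n0):
    # Solve (D + c * ones * ones^T) step = -grad via Sherman-Morrison,
    # with D = diag(1/A_i + 1/B_i) and c = 1/a0 + 1/b0.
    a0, b0, B = margins(A, N, M1, n0)
    d = [1 / Ai + 1 / Bi for Ai, Bi in zip(A, B)]
    c = 1 / a0 + 1 / b0
    u = [gi / di for gi, di in zip(grad, d)]
    w = [1 / di for di in d]
    s = c * sum(u) / (1 + c * sum(w))
    return [-(ui - s * wi) for ui, wi in zip(u, w)]

def solve_gl_or(L, N, M1, n0, A, tol=1e-9, max_iter=100):
    for _ in range(max_iter):
        grad = g(A, L, N, M1, n0)
        if max(abs(x) for x in grad) < tol:
            break
        step = newton_step(A, grad, N, M1, n0)
        slope = sum(gi * si for gi, si in zip(grad, step))  # negative for descent
        G0, t = G(A, L, N, M1, n0), 1.0
        while t > 1e-12:
            trial = [Ai + t * si for Ai, si in zip(A, step)]
            # Backtrack until the trial point is feasible and G decreases enough.
            if feasible(trial, N, M1, n0) and \
                    G(trial, L, N, M1, n0) <= G0 + 1e-4 * t * slope:
                A = trial
                break
            t *= 0.5
        else:
            break  # no acceptable step found; stop
    return A

# Hypothetical data generated from known pseudo-counts.
A_true, B_true, a0_true, b0_true = [60.0, 30.0], [140.0, 120.0], 50.0, 200.0
N = [a + b for a, b in zip(A_true, B_true)]
M1 = a0_true + sum(A_true)
n0 = a0_true + b0_true
L = [math.log(a * b0_true / (a0_true * b)) for a, b in zip(A_true, B_true)]

A_hat = solve_gl_or(L, N, M1, n0, A=[10.0, 100.0])
print(A_hat)  # expected to be close to A_true
```

The feasibility test inside the backtracking loop keeps every iterate inside the domain where the logarithms are defined, which is the role the domain constraints play in (10).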

4.2 GL: Relative Risk

We now discuss the changes needed to apply the approach to log RR scores. The overall approach and notation (see Table (1)) largely follow the development in the preceding section. $R$, the log RR score, is a function of problem data as given by Greenland and Longnecker (1992):

$$\exp(R) = \frac{A n_0}{N_+ a_0}, \quad R = \log(A) - \log(N_+) - \log(a_0) + \log(n_0).$$

Here, $N_+$ and $n_0$ are treated as known quantities, again following Greenland and Longnecker (1992). To recover the pseudo-counts, we look for $A, a_0$ that are roots of

$$h(A) = -R + \log(A) - \log(N_+) - \log(a_0(A)) + \log(n_0). \tag{12}$$

Greenland and Longnecker (1992) suggest an algorithm similar to Algorithm (1) to construct cell counts for $A$ and $a_0$. Just as in Section 4.1, we cast this root-finding method as a way to solve a convex optimization program based on entropic distance, analogous to (7). Integrating Equation (12), we obtain

$$H(A) = A^\top\left(-L_R - \log(N_+) + \mathbf{1}\log(n_0)\right) + \sum_{i=1}^{n}\left(A_i\log(A_i) - A_i\right) + a_0(A)\log(a_0(A)) - a_0(A). \tag{13}$$

The function $H$ is strictly convex, since it is the sum of three linear terms and $n+1$ entropic distance functions (see the discussion in Section 4.1). We prove a theorem analogous to Theorem 4.1, showing the existence of a solution under simple assumptions; uniqueness follows from strict convexity.

Theorem 4.2

Suppose $M_1 > 1^\top A$. Let $L_R$ represent log RRs for the necessary exposure levels such that $L_R$ is finite. Then the function $H(A)$ (13) always has a unique minimizer.

The proof for Theorem 4.2 is in the appendix. In this way, we may construct the optimization problem

$$\min_{A \in \mathbb{R}_+^n} \quad H(A) \tag{14}$$

where $H(A)$ is given in (13). By Theorem 4.2, problem (14) must have a minimizer. A solution to the optimization problem (14) may be found using any number of optimization methods; in particular, we can again use disciplined convex programming (Grant et al., 2006) to solve (14), just as in Section 4.1.
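As a small numerical illustration of problem (14), the sketch below builds log RRs from known pseudo-counts, confirms that $h$ in (12) vanishes at those counts, and checks that $H$ in (13) is larger at a few nearby feasible points. It assumes $a_0(A) = M_1 - 1^\top A$ and treats the symbols $R$ and $L_R$ as the same vector of log RRs; the data are made up for illustration.

```python
import math

def entropic(x):
    return x * math.log(x) - x

def h(A, R, N, M1, n0):
    """Root-finding function (12); a0(A) = M1 - sum(A) is an assumption."""
    a0 = M1 - sum(A)
    return [-Ri + math.log(Ai) - math.log(Ni) - math.log(a0) + math.log(n0)
            for Ri, Ai, Ni in zip(R, A, N)]

def H(A, R, N, M1, n0):
    """Objective (13): linear terms plus n + 1 entropic terms."""
    a0 = M1 - sum(A)
    linear = sum(Ai * (-Ri - math.log(Ni) + math.log(n0))
                 for Ai, Ri, Ni in zip(A, R, N))
    return linear + sum(entropic(Ai) for Ai in A) + entropic(a0)

# Hypothetical cohort: group totals N, reference total n0, known case counts.
N = [1000.0, 800.0]
n0 = 1200.0
A_true = [80.0, 90.0]
a0_true = 60.0
M1 = a0_true + sum(A_true)

# Log RRs implied by the counts: exp(R_i) = A_i * n0 / (N_i * a0).
R = [math.log(Ai * n0 / (Ni * a0_true)) for Ai, Ni in zip(A_true, N)]

residual = h(A_true, R, N, M1, n0)
H_min = H(A_true, R, N, M1, n0)

# Nearby feasible points should all have a larger objective value.
perturbed = [[81.0, 90.0], [80.0, 89.0], [75.0, 95.0], [85.0, 85.0]]
gaps = [H(Ap, R, N, M1, n0) - H_min for Ap in perturbed]
print(residual, gaps)
```

Checking a handful of perturbations does not establish global optimality by itself; that follows from the strict convexity of $H$ together with $h(A) = 0$.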

It may seem natural that root finding here corresponds to a convex objective, but in our experience this is the exception rather than the rule. To be clear, while minimizing a smooth convex function is often accomplished by a root-finding procedure on the gradient, the converse rarely holds; that is, a typical root-finding problem rarely turns out to correspond to the gradient of a convex model. Case in point: when we consider the Hamling method, we do not have a convex interpretation, and as a result have to essentially use brute force to derive theoretical convergence guarantees. It is also quite fortunate that the convex reformulation works in a very similar way for the GL approach for both RR and OR. Again returning to Hamling, in the case of RR, we can find a counter-example that is guaranteed to fail. The contrast of GL with Hamling here underscores the rarity of the discovered relationship of the GL approach to convex minimization.

5 Solvability of Hamling Method

In Section 3.3, we gave a brief overview of the method of Hamling et al. (2008), which involves formulating and solving the nonlinear equations (5) for $A_i$ and $B_i$. The approach relies on the reported variances rather than group totals to infer pseudo-counts. Besides the estimates and variances, the Hamling approach needs only $p$ and $z$; see Table 1. However, the parametrization using variances makes the nonlinear equations of Hamling far more difficult to analyze than those of the GL approach. The original work of Hamling et al. (2008) did not provide any guarantees, and in fact the authors' numerical examples suggest initialization may be quite important. In this section, we prove that for the OR case, the equations always have a unique positive solution, and when properly initialized, the solution can always be found. In the RR case, the situation is more difficult; we present a counter-example where a solution to the Hamling equations cannot exist, and a partial theoretical result deriving a sufficient condition for the existence of a solution to Hamling RR in the equivariant case, where all variances are equal.

5.1 Hamling: Odds Ratios

The quantities that Hamling et al. (2008) use, as functions of the underlying pseudo-counts, are given by:

$$R_i = \frac{A_i b_0}{a_0 B_i}, \quad V_i = \frac{1}{a_0} + \frac{1}{b_0} + \frac{1}{A_i} + \frac{1}{B_i}, \quad p = \frac{b_0}{\sum_{i=0}^{n} B_i}, \quad z = \frac{\sum_{i=0}^{n} B_i}{\sum_{i=0}^{n} A_i}. \tag{15}$$

Here the sums in $p$ and $z$ start at $i = 0$, with $A_0 = a_0$ and $B_0 = b_0$ denoting the reference group. Using the substitution $B_i = \frac{A_i b_0}{a_0 R_i}$, Hamling et al. (2008) obtain $B_i$ and $A_i$ in terms of $a_0$, $b_0$, $R_i$ and $V_i$:

$$\begin{aligned} B_i(a_0,b_0) &= \left(1+\frac{b_0}{a_0 R_i}\right) / \left(V_i-\frac{1}{a_0}-\frac{1}{b_0}\right) \\ A_i(a_0,b_0) &= \left(1+\frac{a_0 R_i}{b_0}\right) / \left(V_i-\frac{1}{a_0}-\frac{1}{b_0}\right) \end{aligned}$$

Note that these expressions for $A_i, B_i$, in terms of $a_0, b_0$, are what Hamling et al. (2008) use in order to match the variances of the pseudo-counts to the reported variances. However, the authors do not solve these equations explicitly, instead using Algorithm 2 to iteratively update the parameter values $a_0, b_0$ and the pseudo-counts accordingly.
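The expressions for $A_i(a_0,b_0)$ and $B_i(a_0,b_0)$ can be sanity-checked with a round trip: start from known pseudo-counts, compute $R_i$ and $V_i$ from (15), and recover the counts. The sketch below does this with made-up numbers and also shows how a poor guess for $(a_0, b_0)$ produces negative counts, because the denominator $V_i - 1/a_0 - 1/b_0$ changes sign. The function name `hamling_counts` is hypothetical.

```python
def hamling_counts(a0, b0, R, V):
    """Pseudo-counts A_i, B_i implied by (a0, b0), estimates R_i, variances V_i."""
    A, B = [], []
    for Ri, Vi in zip(R, V):
        denom = Vi - 1.0 / a0 - 1.0 / b0  # negative when a0, b0 are too small
        A.append((1.0 + a0 * Ri / b0) / denom)
        B.append((1.0 + b0 / (a0 * Ri)) / denom)
    return A, B

# Hypothetical "true" pseudo-counts used to generate the reported quantities.
a0, b0 = 50.0, 200.0
A_true = [60.0, 30.0]
B_true = [140.0, 120.0]

# Odds ratios and variances as functions of the counts, following (15).
R = [Ai * b0 / (a0 * Bi) for Ai, Bi in zip(A_true, B_true)]
V = [1 / a0 + 1 / b0 + 1 / Ai + 1 / Bi for Ai, Bi in zip(A_true, B_true)]

# Round trip at the true (a0, b0) recovers the counts.
A_rec, B_rec = hamling_counts(a0, b0, R, V)
print(A_rec, B_rec)

# A poor guess for (a0, b0) makes the denominator negative, so counts are negative.
A_bad, B_bad = hamling_counts(5.0, 5.0, R, V)
print(A_bad, B_bad)
```

This is why the choice of starting point matters for a black-box optimizer applied to these expressions: iterates with $1/a_0 + 1/b_0 > V_i$ leave the region where the pseudo-counts are meaningful.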

Define

$$B_+ = \sum_{i=1}^{n} B_i, \quad A_+ = \sum_{i=1}^{n} A_i.$$

Summing across each set of equations for $A_i$ and $B_i$, we get

$$\begin{aligned} B_+ &= \sum_{i=1}^{n}\left(1+\frac{b_0}{a_0 R_i}\right) / \left(V_i-\frac{1}{a_0}-\frac{1}{b_0}\right) \\ A_+ &= \sum_{i=1}^{n}\left(1+\frac{a_0 R_i}{b_0}\right) / \left(V_i-\frac{1}{a_0}-\frac{1}{b_0}\right) \end{aligned}$$

From the definitions of $p$ and $z$ we have

$$B_+ = \frac{1-p}{p}b_0, \quad A_+ = \frac{1}{z(1-p)}B_+ - a_0 = \frac{1}{zp}b_0 - a_0.$$

Combining these equations, we get a system of two explicit equations for the unknowns $a_0$ and $b_0$:

$$\begin{aligned} \frac{1-p}{p}b_0 &= \sum_{i=1}^{n}\left(1+\frac{b_0}{a_0 R_i}\right) / \left(V_i-\frac{1}{a_0}-\frac{1}{b_0}\right) \\ \frac{1}{zp}b_0 - a_0 &= \sum_{i=1}^{n}\left(1+\frac{a_0 R_i}{b_0}\right) / \left(V_i-\frac{1}{a_0}-\frac{1}{b_0}\right) \end{aligned} \tag{16}$$

The approach developed here is similar in nature to that of Hamling et al. (2008), but equations (16) are not derived there. We use the explicit form of (16) to prove the results below, namely that equations (16) always have a solution.
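System (16) can be written as a pair of residuals in $(a_0, b_0)$ that vanish at a solution. The sketch below evaluates those residuals on made-up data generated from known pseudo-counts, using the definitions of $p$ and $z$ in (15) with the reference group included in the sums. The function name and the data are illustrative assumptions.

```python
def hamling_or_residuals(a0, b0, R, V, p, z):
    """Left side minus right side for the two equations in (16)."""
    sum_B = 0.0
    sum_A = 0.0
    for Ri, Vi in zip(R, V):
        denom = Vi - 1.0 / a0 - 1.0 / b0
        sum_B += (1.0 + b0 / (a0 * Ri)) / denom
        sum_A += (1.0 + a0 * Ri / b0) / denom
    res_B = (1.0 - p) / p * b0 - sum_B
    res_A = b0 / (z * p) - a0 - sum_A
    return res_B, res_A

# Hypothetical pseudo-counts used to generate R, V, p and z.
a0_true, b0_true = 50.0, 200.0
A_true = [60.0, 30.0]
B_true = [140.0, 120.0]

R = [Ai * b0_true / (a0_true * Bi) for Ai, Bi in zip(A_true, B_true)]
V = [1 / a0_true + 1 / b0_true + 1 / Ai + 1 / Bi
     for Ai, Bi in zip(A_true, B_true)]
p = b0_true / (b0_true + sum(B_true))                   # reference share of non-cases
z = (b0_true + sum(B_true)) / (a0_true + sum(A_true))   # non-cases per case

at_truth = hamling_or_residuals(a0_true, b0_true, R, V, p, z)
off_truth = hamling_or_residuals(60.0, 200.0, R, V, p, z)
print(at_truth)   # both residuals should be close to zero
print(off_truth)  # residuals are nonzero away from the solution
```

Any two-variable root finder could then be applied to these residuals; the results below concern when such a root exists and is unique.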

First, we show that a unique positive solution to (16) exists when all the variances are identical, that is, all $V_i = v$. The theorem for this case serves as a base case for the induction in the general result, and is also of interest because the proof technique is direct: we find the solution in closed form.

Theorem 5.1

Suppose all of the $V_i$ are equal to the scalar $v>0$. Then there is a unique positive solution of the equations (16) for any value $p\in(0,1)$, any value of $z>0$, and any set of positive estimates $R_i$.

Let $c=\frac{a_0}{b_0}$. Let $r_1=\sum_{i=1}^{n}\frac{1}{R_i}$ and $r_2=\sum_{i=1}^{n}R_i$. Then the positive solution to the Hamling equations (16) is given by

$$
c=\frac{npz-nz+n-pr_1z+\sqrt{D}}{2z\left(np+(1-p)r_2\right)}
$$

where

$$
D=n^{2}p^{2}z^{2}-2n^{2}pz^{2}+2n^{2}pz+n^{2}z^{2}-2n^{2}z+n^{2}-2np^{2}r_{1}z^{2}+2npr_{1}z^{2}+2npr_{1}z+p^{2}r_{1}^{2}z^{2}-4pr_{1}r_{2}z+4r_{1}r_{2}z.
$$

Once we have $c$, the solutions to (16) are given by

$$
b_0=\frac{1}{v}\left(\frac{p}{1-p}\left(n+\frac{r_1}{c}\right)+1+\frac{1}{c}\right),\quad a_0=cb_0.
$$

We provide a proof in Section 8.3 in the Appendix. The crux of the proof is to show that $D$ is always positive, for any feasible inputs $(n,p,z,r_1,r_2)$. An interesting consequence of the proof is that in addition to the unique positive solution for $c$ (and hence $a_0$), there is also a unique negative solution for $c$ and $a_0$, obtained by taking the negative branch in the quadratic formula. From our numerical experience with Hamling, both our implementation and the one in dosresmeta can find the negative $a_0$ solution, leading to infeasible pseudo-counts, when incorrectly initialized.
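The closed form of Theorem 5.1 can be checked numerically. The sketch below is illustrative only: the function names and the input values are hypothetical, and the discriminant is written in the regrouped form $D=(npz-nz+n-pr_1z)^2+4zr_1(np+(1-p)r_2)$, which expands term by term to the expression for $D$ given above and makes its positivity visible.

```python
import math


def hamling_or_equal_variance(R, v, p, z):
    """Closed-form (a0, b0) for equations (16) when every V_i equals v."""
    n = len(R)
    r1 = sum(1.0 / r for r in R)
    r2 = sum(R)
    lin = n * p * z - n * z + n - p * r1 * z   # numerator of c without the root
    quad = z * (n * p + (1.0 - p) * r2)        # half of the denominator of c
    D = lin ** 2 + 4.0 * r1 * quad             # regrouped discriminant, always > 0
    c = (lin + math.sqrt(D)) / (2.0 * quad)    # positive branch of the quadratic
    b0 = (p / (1.0 - p) * (n + r1 / c) + 1.0 + 1.0 / c) / v
    return c * b0, b0


def residuals_16(a0, b0, R, V, p, z):
    """Left side minus right side of the two equations (16)."""
    rhs1 = sum((1.0 + b0 / (a0 * r)) / (vi - 1.0 / a0 - 1.0 / b0)
               for r, vi in zip(R, V))
    rhs2 = sum((1.0 + a0 * r / b0) / (vi - 1.0 / a0 - 1.0 / b0)
               for r, vi in zip(R, V))
    return (1.0 - p) / p * b0 - rhs1, b0 / (z * p) - a0 - rhs2


# Hypothetical inputs, chosen only for illustration.
R, v, p, z = [0.8, 1.3, 2.0], 0.05, 0.4, 2.0
a0, b0 = hamling_or_equal_variance(R, v, p, z)
res = residuals_16(a0, b0, R, [v] * len(R), p, z)
print(a0, b0, res)  # both residuals should vanish up to rounding
```

Replacing `+ math.sqrt(D)` with `- math.sqrt(D)` gives the negative root for $c$ mentioned above.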

We now show by induction that equations (16) always have a unique feasible solution in the general case.

Theorem 5.2

For any set of positive $V_i$, positive $R_i$, $p\in(0,1)$ and $z>0$, the equations (16) have a positive solution with $a_0>0$ and $b_0>0$.

See Section 8.4 of the Appendix for a proof of Theorem 5.2. The proof proceeds by induction, since a closed-form solution is not available in the general case. This result guarantees that a feasible tuple $(a_0,b_0)$ exists, which can then be used to construct the cell counts $A$ and $B$ according to equations (5). Our presentation of the nonlinear system in the form of equations (16) provides robustness to the method of Hamling et al. (2008), guaranteeing solutions for any choice of positive $V_i$. The inductive step, shown in Section 8.4 of the Appendix, uses the structure of the equations to show the existence of the solution.

To find the solution in practice, we minimize the squared norm of the residuals of equations (16), similar to the approach shown in Algorithm 2. The construction of the proof assumes the positivity of the denominators $V_i-\frac{1}{a_0}-\frac{1}{b_0}$ throughout. This guides our initialization strategy, which ensures that $a_0$ and $b_0$ are large enough that positivity holds for the smallest reported variance,

$$
V_{\min}=\min_{i}V_{i}.
$$

From a theoretical standpoint, the nonlinear constraint

$$
a_0+b_0\leq\epsilon\,a_0b_0V_{\min}
$$

may be needed and can be maintained via line search; in practice, the method always converges as long as the constraint holds at initialization. This is a markedly different strategy from the one suggested by Hamling et al. (2008), who focus on $p'$ and $z'$ computed from total counts in the data. Their strategy, as implemented by Crippa and Orsini (2016), fails for cases where the reported variances are small, and is discussed in Section 6.
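As a rough illustration of the objective and of an initialization that keeps every denominator positive, consider the sketch below. It is not the authors' implementation: the helper names, the choice $a_0=b_0$, and the `margin` parameter are assumptions made for this example, and the counts are synthetic. The log OR variances are built from the standard formula $V_i=\frac{1}{a_0}+\frac{1}{b_0}+\frac{1}{A_i}+\frac{1}{B_i}$.

```python
def denominators(a0, b0, V):
    """Denominators V_i - 1/a0 - 1/b0 that appear in equations (16)."""
    return [v - 1.0 / a0 - 1.0 / b0 for v in V]


def feasible_start(V, margin=0.5):
    """Hypothetical start a0 = b0 with 1/a0 + 1/b0 = margin * min(V), 0 < margin < 1."""
    a0 = 2.0 / (margin * min(V))
    return a0, a0


def squared_norm(a0, b0, R, V, p, z):
    """Squared norm of the residuals of equations (16); zero at a solution."""
    d = denominators(a0, b0, V)
    f1 = (1.0 - p) / p * b0 - sum((1.0 + b0 / (a0 * r)) / di for r, di in zip(R, d))
    f2 = b0 / (z * p) - a0 - sum((1.0 + a0 * r / b0) / di for r, di in zip(R, d))
    return f1 * f1 + f2 * f2


# Synthetic case-control counts (hypothetical): reference group first.
a0_true, b0_true = 100.0, 200.0
A, B = [50.0, 60.0], [120.0, 90.0]
R = [Ai * b0_true / (a0_true * Bi) for Ai, Bi in zip(A, B)]
V = [1 / a0_true + 1 / b0_true + 1 / Ai + 1 / Bi for Ai, Bi in zip(A, B)]
p = b0_true / (b0_true + sum(B))
z = (b0_true + sum(B)) / (a0_true + sum(A))

a_init, b_init = feasible_start(V)
obj_at_truth = squared_norm(a0_true, b0_true, R, V, p, z)
print(denominators(a_init, b_init, V), obj_at_truth)
```

With this start, each denominator equals $V_i-\texttt{margin}\cdot V_{\min}$, so positivity holds by construction; any minimizer of `squared_norm` started from `(a_init, b_init)` would still need to keep the denominators positive along its iterates.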

5.2 Hamling: Relative Risk

We now consider log RRs. We use the same notation as in the rest of this section, except that $L_R$ now denotes the log RRs instead of the log ORs. We have

$$
R_i=\frac{A_ib_0}{a_0B_i}
$$

where, following the notation of Hamling et al. (2008), $b_0$ now indicates total subjects in the reference group and $B_i$ indicates total subjects in each risk group, with $a_0$ the cases in the reference group and $A_i$ the cases in each exposure group. Thus we have $b_0>a_0$ and $B_i>A_i$. Moreover, from classic results we have

$$
V_i=\frac{1}{a_0}-\frac{1}{b_0}+\frac{1}{A_i}-\frac{1}{B_i}.
$$

This becomes the key difference that underlies the construction of our new equations. Indeed, we obtain

$$
\begin{aligned}
A_i&=\frac{1-\frac{a_0R_i}{b_0}}{V_i-\frac{1}{a_0}+\frac{1}{b_0}}\\
B_i&=\frac{\frac{b_0}{a_0R_i}-1}{V_i-\frac{1}{a_0}+\frac{1}{b_0}}.
\end{aligned}
\tag{17}
$$

The constraint that $B_i>A_i$ doesn't give any new information, since it is equivalent to

$$
\frac{b_0}{a_0R_i}+\frac{a_0R_i}{b_0}>2.
$$

The sum of any positive quantity and its reciprocal is always greater than or equal to $2$, with the minimum attained when the quantity is exactly $1$. We do, however, know something about $z$. Recall the formulas

$$
p=\frac{b_0}{\sum_{i=0}^{n}B_i},\quad z=\frac{\sum_{i=0}^{n}B_i}{\sum_{i=0}^{n}A_i}.
$$

For the relative risk case, by definition we have $z\geq 1$.
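The relations above can be checked with a round trip on synthetic cohort counts: build $R_i$ and $V_i$ from known counts, then recover $A_i$ and $B_i$ through equations (17). The counts and the function name below are hypothetical and serve only as an illustration.

```python
def rr_pseudo_counts(a0, b0, R, V):
    """Recover (A_i, B_i) from (a0, b0) using equations (17)."""
    A, B = [], []
    for r, v in zip(R, V):
        denom = v - 1.0 / a0 + 1.0 / b0
        A.append((1.0 - a0 * r / b0) / denom)
        B.append((b0 / (a0 * r) - 1.0) / denom)
    return A, B


# Hypothetical cohort: a0 cases among b0 subjects in the reference group,
# A_true[i] cases among B_true[i] subjects in each exposure group.
a0, b0 = 30.0, 1000.0
A_true, B_true = [45.0, 20.0], [900.0, 500.0]

R = [Ai * b0 / (a0 * Bi) for Ai, Bi in zip(A_true, B_true)]
V = [1 / a0 - 1 / b0 + 1 / Ai - 1 / Bi for Ai, Bi in zip(A_true, B_true)]

A_hat, B_hat = rr_pseudo_counts(a0, b0, R, V)
p = b0 / (b0 + sum(B_true))
z = (b0 + sum(B_true)) / (a0 + sum(A_true))
print(A_hat, B_hat, p, z)  # recovered counts match A_true and B_true up to rounding
```

Because every group has more subjects than cases, the total ratio `z` is at least 1, as stated above.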

Using equations (17) and the formulas for $p$ and $z$, we construct the two nonlinear equations that are analogous to equations (16):

$$
\begin{aligned}
\frac{1-p}{p}\,b_0 &= \sum_{i=1}^{n}\left(\frac{b_0}{a_0R_i}-1\right)\Big/\left(V_i-\frac{1}{a_0}+\frac{1}{b_0}\right)\\
\frac{1}{zp}\,b_0-a_0 &= \sum_{i=1}^{n}\left(1-\frac{a_0R_i}{b_0}\right)\Big/\left(V_i-\frac{1}{a_0}+\frac{1}{b_0}\right).
\end{aligned}
\tag{18}
$$

We now give results analogous to Theorems 5.1 and 5.2, which are proved in the Appendix.

Theorem 5.3

Suppose all of the $V_i$ are equal to the scalar $v>0$. Then there is a unique positive solution of the equations (18) for any value $p\in(0,1)$, any value of $z>0$, and any set of positive estimates $R_i$, if and only if

$$
(1-p)z\geq\left(\frac{1-p}{p}\right)^{2}4r_2,\quad(1-p)z\geq 1.
$$

Using the notation of Theorem 5.1, when the conditions above are satisfied the positive solution $c=\frac{a_0}{b_0}$ is given by

$$
c=\frac{n(z-pz+1)+pr_1z+\sqrt{D}}{2z\left(np+(1-p)r_2\right)}
$$

where

$$
D=\left(n(pz-z-1)-r_1zp\right)^{2}-4r_1\left(nzp+r_2z-r_2pz\right).
$$

Once $c$ is found, we have

$$
b_0=\frac{1}{vc}\left(\frac{p}{1-p}(r_1-cn)+(1-c)\right),\quad a_0=cb_0.
$$

A simple counter-example that violates the two inequalities required by Theorem 5.3, and for which there is no solution, is given by

$$
R_1=0.9328,\quad R_2=0.062,\quad p=0.1,\quad z=1.1.
$$

These values have no solution in the RR case for any equal variance values $V_1=V_2=v$. In Section 6, we show that available implementations return nonsensical results and in fact cannot solve the defining equations, which makes sense given that $D<0$ in this case. In contrast to the previous section, there is no way to fix this issue: a solution simply cannot exist. The best we can do in such a case is to suggest that the modeler check their inputs $R_i,V_i,p,z$ or consider using the GL approach, which is always guaranteed to work.
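The counter-example can be verified by direct arithmetic. The short sketch below evaluates the discriminant $D$ of Theorem 5.3 and the two stated inequalities for these inputs; the function name is hypothetical.

```python
def rr_discriminant(R, p, z):
    """Discriminant D of Theorem 5.3 for the equal-variance RR case."""
    n = len(R)
    r1 = sum(1.0 / r for r in R)
    r2 = sum(R)
    lin = n * (p * z - z - 1.0) - r1 * z * p
    return lin ** 2 - 4.0 * r1 * (n * z * p + r2 * z - r2 * p * z)


R, p, z = [0.9328, 0.062], 0.1, 1.1
r2 = sum(R)

D = rr_discriminant(R, p, z)
cond1 = (1.0 - p) * z >= ((1.0 - p) / p) ** 2 * 4.0 * r2
cond2 = (1.0 - p) * z >= 1.0
print(D, cond1, cond2)  # D is negative (about -48.4); both conditions are False
```

Since $D$ does not depend on $v$, the same conclusion holds for every common variance value.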

6 Numerical Examples

In this section, we review detailed examples of the implementation and results of our proposed methods as described in Sections 4 and 5. First, we show that the corrected methods we propose reproduce the results of Greenland and Longnecker (1992) and Hamling et al. (2008) for the canonical examples in these papers. Second, we show failure modes for Greenland and Longnecker (1992) and Hamling et al. (2008), together with the correct estimates obtained from the robust implementations based on the results in this work. For GL, we leverage the connection to convex optimization and provide software that uses the disciplined convex programming library CVXPY to implement our modification of the approach of Greenland and Longnecker (1992). For Hamling, we use SciPy optimization routines with a theoretically justified initialization to solve the root finding problem. To demonstrate the failure modes, we use the R library dosresmeta (Crippa and Orsini, 2016), which implements both GL and Hamling methods.

6.1 Results Comparison to GL and Hamling: Canonical Examples

We use the data from Greenland and Longnecker (1992, Table 1) as a simple example showing that our modified methods reproduce the results of the original methods of Greenland and Longnecker (1992) and Hamling et al. (2008) on simple problems. In the example given by Greenland and Longnecker (1992), the authors fit the linear-logistic model

$$
\lambda(x,z)=\alpha+\beta x.
$$

In this case, the model gives the log-odds of a subject being a case, and we want to estimate $\beta$. The data $x$ represent alcohol intake as exposure levels. We summarize the adjusted estimates obtained using our convex formulation of the objective of Greenland and Longnecker (1992) and using solutions to our modified version of the system of equations of Hamling et al. (2008), reporting the coefficient estimate $\hat{\beta}$ along with its variance estimate for each method.

In Table 2 we present the least-squares estimates generated from the four different types of pseudo-count fitting techniques described in this study. “Unadjusted” denotes the estimates that use the reported variances under the independence assumption. “GL” denotes the least-squares and variance estimates obtained by the cell-fitting procedure of Greenland and Longnecker (1992). “Hamling” denotes the estimates produced by the method of Hamling et al. (2008). “Convex GL” denotes the estimates obtained from our fitting procedure that modifies the method of Greenland and Longnecker (1992), as described in Section 4. “Solved Hamling” denotes the estimates obtained from our fitting procedure that modifies the method of Hamling et al. (2008), as described in Section 5.

Table 2: Estimates and variances table–log-odds ratios.
Method $\hat{\beta}$ Variance
Unadjusted 0.0334 0.000349
GL 0.0454 0.000427
Convex GL 0.0454 0.000427
Hamling 0.04588 0.000421
Solved Hamling 0.04588 0.000421

The Convex GL method produces the same results as the original GL approach of Greenland and Longnecker (1992) when the latter succeeds. Additionally, our Solved Hamling method produces the same results as the standard Hamling method when the latter succeeds. There are numerical differences in variance results for corresponding methods; for the Convex GL approach, which uses disciplined convex programming (DCP), we set a high degree of precision in the solver, so these results correspond to solving the equations to a greater degree of precision. The estimates obtained by Hamling vs. GL differ, but this is to be expected, as discussed in Section 3.1.

We next summarize, in Table 3, the pseudo-counts of cases only, as generated by each method. We follow the same notation used in Table 1 for cases.

Table 3: Pseudo-count table–log-odds ratios.
Method $a_0$ $A_1$ $A_2$ $A_3$
GL 160.4702 70.2046 95.4696 124.8556
Convex GL 160.5064 70.3304 95.4857 124.6776
Hamling 96.2699 50.9684 57.2220 67.7043
Solved Hamling 96.2653 50.9654 57.2180 67.6989

On this simple example, the pseudo-counts for cases generated by our methods closely match those generated by the original methods. Again, counts generated by GL methods differ from those generated by Hamling methods, since GL relies on group counts, whereas Hamling uses reported variances. The pseudo-counts are an intermediate result whose main purpose is to obtain the covariance matrix, and we compare the covariance matrices obtained by these methods in Figure 3.

Figure 3: Estimated covariance matrices for OR. Left: Covariance matrix generated from the Convex GL pseudo-counts; Right: Covariance matrix generated from the Solved Hamling pseudo-counts.

There are small differences between the individual entries in Figure 3. These differences are in fact what cause the estimates in Table 2 to vary slightly between the GL and Hamling-based methods. Note that the Convex GL covariance matrix has different entries, whereas the entries of the (Solved) Hamling covariance matrix are identical. This is due to the variance model in the construction of the Hamling estimators.

Next, we run a similar test on RRs, using the alcohol and colorectal cancer data and results in Orsini et al. (2012). We present a summary of the adjusted estimates obtained by our methods and by the methods of GL and Hamling. We use data directly from dosresmeta, specifically the alcohol_crc dataframe, and analyze the subset id author atm. In Table 4 we present the least-squares estimates, similar to what was shown above.

Table 4: Estimates and variances table–log-relative risks.
Method $\hat{\beta}$ Variance
Unadjusted -0.00294 1.5865e-05
GL 0.0071 1.5176e-05
Convex GL 0.0071 1.5166e-05
Hamling 0.0063 1.5490e-05
Solved Hamling 0.0063 1.5436e-05

Once again we see small numerical differences in variance estimates, with our estimates using high precision on the equation solves. We also see a larger difference between the estimates obtained by GL vs. Hamling, a direct consequence of the different parametrizations.

We provide a summary of case pseudo-counts generated by each method in Table 5. The pseudo-count estimates within method families are close, while counts from the GL and Hamling methods match in some groups but differ in others, which causes the differences observed in the estimate values in Table 4.

Table 5: Pseudo-count table–log-relative risk.
Method $a_0$ $A_1$ $A_2$ $A_3$ $A_4$ $A_5$
GL 26.5957 34.0061 42.8532 33.3584 17.9492 29.2359
Convex GL 26.5973 34.0061 42.8532 33.3583 17.9492 29.2359
Hamling 26.4495 39.5129 44.2940 31.6140 15.3332 22.6277
Solved Hamling 26.4087 39.4526 44.2234 31.5706 15.3105 22.5738
Figure 4: Estimated covariance matrices for RR. Left: Covariance matrix generated from the Convex GL pseudo-counts; Right: Covariance matrix generated from the Solved Hamling pseudo-counts. Both cases are with respect to the relative risk regime.

Covariance matrices obtained from pseudo-counts generated by our Convex GL and Solved Hamling methods are shown in Figure 4. We again see identical entries in the covariance matrix produced from Hamling. The differences in covariance values between the matrices explain the differences in estimate values in Table 4.

We now continue to the avoidable failure modes, providing simple OR examples where the original GL and Hamling methods fail but our Convex GL and Solved Hamling methods succeed.

6.2 Original method failure and Corrected success

In this section we produce simple failure modes for original GL and Hamling methods, and show that new methods work on these cases, as expected from the theoretical results. This is reassuring to practitioners running many analyses; the need to re-initialize current methods and potential quiet failures of the Hamling method can both be avoided with straightforward modifications. To demonstrate the failure modes, we perturb the alcohol_cvd data from dosresmeta.

6.2.1 GL method failure

Starting with the GL method, we change the number of subjects at each exposure level in the alcohol_cvd dataset to be a function of the number of cases in the same dataset at the corresponding exposure levels. Namely, we modify the number of subjects to be

$$
N=A+t
$$

for integer values $t\in\{1,\dots,20\}$. The lower the $t$, the more extreme the situation, corresponding to very few controls in each group. We pass each $N$ as input data to the standard GL routine to construct pseudo-counts. For $t\leq 13$, the original GL method in dosresmeta fails.

In the cases of failure, even though the initial $A$ is feasible, GL iterations run afoul of the logarithmic terms in the dosresmeta implementation for low $t$. For $t\geq 14$, this issue disappears. The entire problem is avoided when we use the Convex GL approach, which succeeds in all cases.

The new Convex GL method succeeds even in the extreme case when $t=1$. In Figure 5, we compare the covariance matrix for $N^{1}=A+1$ with the covariance matrix GL obtains on the original data. We see that the covariance matrices returned for the original and perturbed data are fairly close, suggesting that correlations are well-behaved in such cases and underscoring the need for a robust method. In other words, results for GL will likely be useful even for small studies when we have very few controls.

Figure 5: Comparison of GL covariance matrices for original data vs. perturbed data. Left: Covariance matrix generated from the Convex GL pseudo-counts on original data for alcohol; Right: Covariance matrix generated from the Convex GL pseudo-counts on $N^{1}=A+1$, a hypothetical where there is only one control in every group. The original GL method of Greenland and Longnecker (1992) fails on the hypothetical example shown on the right.

6.2.2 Hamling method failure

The Hamling method fails when the default initialization fails to guarantee positivity of all denominators $V_i - \frac{1}{a_0} - \frac{1}{b_0}$. For example, the Hamling initialization used by dosresmeta can break when the input $V_i$ are very small. In this case, the dosresmeta Hamling approach returns negative pseudo-counts, and correlations are then computed using these counts.

To show this failure mode, we alter the alcohol_cvd dataset in dosresmeta by changing the reported variances to be

$$\hat{V} = \left(\mathrm{NA}, 0.001, 0.01, 0.2, 0.9\right)$$

where the NA is a placeholder for the reference exposure level. Passing this data into the hamling method in dosresmeta, we obtain negative values in the estimated counts for cases and non-cases at the first level of exposure, as shown in Table 6.

Table 6: Pseudo-count table, broken Hamling example.

| Method | $a_0$ | $A_1$ | $A_2$ | $A_3$ | $A_4$ |
|---|---|---|---|---|---|
| Hamling | 189.7 | -207.5 | 514.2 | 8.6 | 2.2 |
| Solved Hamling | 2897.8 | 2976.1 | 157.2 | 31.5706 | 9.2 |

The method of Hamling fails silently, since it then uses the negative values to compute the covariance matrix. To study the downstream effects, we compare the covariance matrices constructed by dosresmeta from the wrong pseudo-counts generated by the original Hamling method with those generated by the solved Hamling method in Figure 6. The solved Hamling method obtains an order of magnitude smaller correlation across the subgroups. This means that when Hamling fails quietly, it will provide estimates that deviate further from the uncorrected estimates compared to the correctly solved formulation.

Figure 6: Comparison between covariance matrices generated by Hamling from wrong pseudo-counts and correct pseudo-counts. Left: Covariance matrix generated from the negative Hamling pseudo-counts; Right: Covariance matrix generated from the Solved Hamling pseudo-counts. The correct values result in a much smaller between-level covariance than the incorrect values in this example.
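The negative pseudo-count in Table 6 can be traced back to the sign of the denominators $V_i - \frac{1}{a_0} - \frac{1}{b_0}$. The sketch below evaluates them for the altered variances using the value $a_0 = 189.7$ reported in Table 6. The value of $b_0$ is not reported, so `b0 = 500.0` is a purely hypothetical placeholder, and `hamling_denominators` is an illustrative helper rather than a dosresmeta function.

```python
def hamling_denominators(V, a0, b0):
    """Denominators V_i - 1/a0 - 1/b0 for the non-reference exposure levels."""
    return [v - 1.0 / a0 - 1.0 / b0 for v in V]

V_hat = [0.001, 0.01, 0.2, 0.9]  # altered variances (reference level omitted)
a0 = 189.7                       # value reported in Table 6
b0 = 500.0                       # hypothetical: not reported in the text

den = hamling_denominators(V_hat, a0, b0)
# Exposure levels (1-based) whose denominator is not strictly positive.
negative_levels = [i + 1 for i, d in enumerate(den) if d <= 0.0]
```

Because $1/189.7$ already exceeds the first variance $0.001$, the first denominator is negative for any positive $b_0$, which is consistent with the negative $A_1$ in Table 6.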

We extend this example to study the range of the failure mode as a function of the scale of variance values. For simplicity we vary only the first element of $\hat{V}$. We then assess whether there are any negative values in the pseudo-counts for cases $A$ constructed by the Hamling method. Variance values below $10^{-3}$ in the first numerical coordinate produce negative values in $A$. For still smaller values the method continues to fail, but such small variances correspond to huge sample sizes that are unlikely to occur in practice. Variance values greater than $10^{-3}$ produce only positive values, even beyond 1. This shows how the Hamling method fails for small enough variance values given the default initialization in dosresmeta, and how it can be fixed easily by using the strategy discussed in Section 5.

To fix the problem we use the initialization suggested by the theoretical analysis. Specifically, we construct the initialization parameters $a_0, b_0$ as

$$(a_0, b_0) = \left(\frac{10}{\min(v)}, \frac{10}{\min(v)}\right).$$

The underlying idea is that the large initialization ensures the denominators of $A_i$ and $B_i$ in equations (5) remain positive, ensuring all counts are positive. This works well numerically, and does not break regardless of the $v_i$ values. This provides empirical support for the proof technique given in Theorem 5.2.
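The effect of this initialization can be sketched in a few lines. The code assumes that equations (5) take the form $A_i = \left(1 + \frac{a_0 R_i}{b_0}\right) / \left(V_i - \frac{1}{a_0} - \frac{1}{b_0}\right)$ and $B_i = \left(1 + \frac{b_0}{a_0 R_i}\right) / \left(V_i - \frac{1}{a_0} - \frac{1}{b_0}\right)$, which is consistent with the sums used in the proof of Theorem 5.1. The odds ratios `R` are hypothetical, the helper names are illustrative, and only the evaluation at the initial point is shown, not the full iteration that matches $p$ and $z$.

```python
def safe_init(V, scale=10.0):
    """Initialization a0 = b0 = scale / min(V), following the rule in the text."""
    a0 = scale / min(V)
    return a0, a0

def hamling_counts(V, R, a0, b0):
    """Pseudo-counts A_i, B_i for fixed (a0, b0), using the assumed form of equations (5)."""
    A, B = [], []
    for v, r in zip(V, R):
        den = v - 1.0 / a0 - 1.0 / b0      # positive when a0, b0 are large enough
        A.append((1.0 + a0 * r / b0) / den)
        B.append((1.0 + b0 / (a0 * r)) / den)
    return A, B

V_hat = [0.001, 0.01, 0.2, 0.9]  # altered variances from the example
R = [1.1, 0.9, 1.5, 2.0]         # hypothetical odds ratios
a0, b0 = safe_init(V_hat)
A, B = hamling_counts(V_hat, R, a0, b0)
```

With $a_0 = b_0 = 10/\min(v)$, the subtracted term is $0.2\min(v)$, so every denominator is at least $0.8\min(v) > 0$ and all pseudo-counts are positive at the starting point.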

In the next section, we study the unavoidable failure mode of Hamling for RRs.

6.3 Hamling Failure for RR

We review the counter-example presented in Section 5:

$$R_1 = 0.9328, \quad R_2 = 0.062, \quad p = 0.1, \quad z = 1.1.$$

This example was obtained by violating the conditions presented in Theorem 5.3 for the equivariant case. The failure corresponds to obtaining a negative discriminant in the quadratic formula for the ratio $c = \frac{a_0}{b_0}$, and means that a solution cannot exist, regardless of the reported (equal) variances. To see this borne out in practice, we make a simple choice

$$v_1 = v_2 = 1.0.$$

Running dosresmeta on this example gives us results in Table 7.

Table 7: Hamling Results for RR Counter-Example.

| $A$ | $N$ |
|---|---|
| 1.4 | 1.3 |
| $-1.1 \times 10^{-5}$ | $-1.1 \times 10^{-5}$ |
| 0.88 | 13.3 |

We see negative values for $A$ and $N$, a problem for any situation, and $a_0 > b_0$, which is impossible for RR. These issues can still occur for a candidate solution to the equations (18). However, the claim we made is stronger, that is, a solution that satisfies the six equations corresponding to $p, z, R_1, R_2, v_1, v_2$ cannot exist. When we review the dosresmeta result with respect to these six equations, we find that in fact, two of the six are not satisfied:

$$
\begin{aligned}
R_1(A,N) &= 0.93, \quad R_2(A,N) = 0.062, \quad v_2(A,N) = 1.0, \quad p(A,N) = 0.1; \\
v_1(A,N) &= -0.05; \quad z(A,N) = 6.33.
\end{aligned}
$$

In contrast to the previous examples, there is no way to fix this; we know from the proof of Theorem 5.3 that no solutions can exist to this example.
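The check of the six equations can be reproduced approximately from the rounded entries of Table 7. The sketch below is an illustration under stated assumptions: it uses the usual cumulative-incidence definitions $R_i = \frac{A_i / N_i}{a_0 / n_0}$, $v_i = \frac{1}{A_i} - \frac{1}{N_i} + \frac{1}{a_0} - \frac{1}{n_0}$, $p = n_0 / \sum N$, and $z = \sum N / \sum A$, which were chosen because they approximately recover the values quoted above. The two entries in the second row of Table 7 are treated as exactly equal, and the helper `rr_summaries` is hypothetical.

```python
def rr_summaries(A, N):
    """Recompute R_i, v_i, p, z from cases A and subjects N (first entry = reference)."""
    a0, n0 = A[0], N[0]
    R = [(a / n) / (a0 / n0) for a, n in zip(A[1:], N[1:])]
    v = [1.0 / a - 1.0 / n + 1.0 / a0 - 1.0 / n0 for a, n in zip(A[1:], N[1:])]
    p = n0 / sum(N)      # assumed definition: share of subjects in the reference group
    z = sum(N) / sum(A)  # assumed definition: ratio of subjects to cases
    return R, v, p, z

# Rounded values from Table 7; results are therefore approximate.
A = [1.4, -1.1e-5, 0.88]
N = [1.3, -1.1e-5, 13.3]
R, v, p, z = rr_summaries(A, N)

targets = {"R1": 0.9328, "R2": 0.062, "v1": 1.0, "v2": 1.0, "z": 1.1}
gaps = {
    "R1": abs(R[0] - targets["R1"]),
    "R2": abs(R[1] - targets["R2"]),
    "v1": abs(v[0] - targets["v1"]),
    "v2": abs(v[1] - targets["v2"]),
    "z": abs(z - targets["z"]),
}
```

Up to rounding, the ratios and $v_2$ are matched closely, while $v_1$ comes out negative and $z$ lands far from its target, in line with the two violated equations reported above.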

7 Conclusion

In this paper we have taken a closer look at the methods of Greenland and Longnecker (1992) and Hamling et al. (2008).

We have shown that the GL approach lends itself to a reformulation to minimizing a convex model, for both ORs and RRs. In both cases we can avoid all numerical difficulty and guarantee convergence to the unique optimal point for any feasible data inputs. This was a rather surprising finding that initially motivated us to write the paper. The convex loss that emerged when we integrated the optimality conditions is the entropic distance function, an object that appears in other areas of mathematics and statistics. An unexplored consequence of the connection to convex models is that it is now easy to include side information (if such information is available to modelers) through the use of linear equality and inequality constraints on the pseudo-counts $A$. As long as there is a feasible $A$, the proof theory in this paper guarantees a unique solution, and modifying the formulation is straightforward in cvxpy. We leave further exploration of this idea to future work.

For the Hamling method, the story is more complicated. In the case of OR, we were able to show that the Hamling equations always have a solution. In fact we obtained a closed form solution for the equivariant case (all reported variances equal) and provided a proof by induction for the general case. This means that literally for any observed ORs, variances, $p$, and $z$, we can always find a solution.

In contrast, for RR, there is no guarantee that Hamling will work. We presented a counter-example when there are only two alternative groups. Counter-examples are by nature odd, but nonetheless there is a fundamental difference between RR and OR for Hamling stemming from relying on reported variances. This is curious. Between the methods of GL and Hamling, when faced with many meta-analyses we find the Hamling approach more appealing, since it only needs $p$ and $z$ in addition to reported estimates and variances. Based on the RR failure, we should keep the GL method available should an unavoidable failure mode arise.

We have done our best to make the results as interpretable and clear as possible. We have made an implementation of the GL and Hamling methods publicly available at https://github.com/ihmeuw-msca/CorrelationCorrection, and we have shown simple examples that break the widely used dosresmeta package. Using the insights in this paper, safeguarding estimates available in other packages is a straightforward task. For GL, it is a matter of providing standard optimization guardrails, such as a line search. For Hamling, it is a change in the initialization strategy based on the minimum reported variance.

References

  • Agrawal et al. [2018] Akshay Agrawal, Robin Verschueren, Steven Diamond, and Stephen Boyd. A rewriting system for convex optimization problems. Journal of Control and Decision, 5(1):42–60, 2018.
  • Boyd and Vandenberghe [2004] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.
  • Crippa and Orsini [2016] Alessio Crippa and Nicola Orsini. Multivariate dose-response meta-analysis: The dosresmeta R package. Journal of Statistical Software, Code Snippets, 72(1):1–15, 2016. doi: 10.18637/jss.v072.c01.
  • Crippa et al. [2019] Alessio Crippa, Andrea Discacciati, Matteo Bottai, Donna Spiegelman, and Nicola Orsini. One-stage dose–response meta-analysis for aggregated data. Statistical methods in medical research, 28(5):1579–1596, 2019.
  • Dai et al. [2022] Xiaochen Dai, Gabriela F Gil, Marissa B Reitsma, Noah S Ahmad, Jason A Anderson, Catherine Bisignano, Sinclair Carr, Rachel Feldman, Simon I Hay, Jiawei He, et al. Health effects associated with smoking: a burden of proof study. Nature medicine, 28(10):2045–2055, 2022.
  • Deeks et al. [2019] Jonathan J Deeks, Julian PT Higgins, Douglas G Altman, and Cochrane Statistical Methods Group. Analysing data and undertaking meta-analyses. Cochrane handbook for systematic reviews of interventions, pages 241–284, 2019.
  • Diamond and Boyd [2016] Steven Diamond and Stephen Boyd. CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research, 17(83):1–5, 2016.
  • Gautschi [1997] Walter Gautschi. Numerical analysis: an introduction. Birkhauser Boston Inc., USA, 1997. ISBN 0817638954.
  • Grant et al. [2006] Michael Grant, Stephen Boyd, and Yinyu Ye. Disciplined convex programming. Springer, 2006.
  • Greenland and Longnecker [1992] Sander Greenland and Matthew P. Longnecker. Methods for trend estimation from summarized dose-response data, with applications to meta-analysis. American journal of epidemiology, 135(11):1301–1309, 1992. URL https://api.semanticscholar.org/CorpusID:31135711.
  • Haidich [2010] Anna-Bettina Haidich. Meta-analysis in medical research. Hippokratia, 14(Suppl 1):29, 2010.
  • Hamling et al. [2008] Jan Hamling, Peter Lee, Rolf Weitkunat, and Mathias Ambühl. Facilitating meta-analyses by deriving relative effect and precision estimates for alternative comparisons from a set of estimates presented by exposure level or disease category. Statistics in medicine, 27(7):954–970, 2008.
  • Itoga et al. [2018] Nathan K Itoga, Daniel S Tawfik, Charles K Lee, Satoshi Maruyama, Nicholas J Leeper, and Tara I Chang. Association of blood pressure measurements with peripheral artery disease events: reanalysis of the allhat data. Circulation, 138(17):1805–1814, 2018.
  • Kariya and Kurata [2004] Takeaki Kariya and Hiroshi Kurata. Generalized least squares. John Wiley & Sons, 2004.
  • Lescinsky et al. [2022] Haley Lescinsky, Ashkan Afshin, Charlie Ashbaugh, Catherine Bisignano, Michael Brauer, Giannina Ferrara, Simon I Hay, Jiawei He, Vincent Iannucci, Laurie B Marczak, et al. Health effects associated with consumption of unprocessed red meat: a burden of proof study. Nature Medicine, 28(10):2075–2082, 2022.
  • Liu et al. [2009] Qin Liu, Nancy R Cook, Anna Bergström, and Chung-Cheng Hsieh. A two-stage hierarchical regression model for meta-analysis of epidemiologic nonlinear dose–response data. Computational Statistics & Data Analysis, 53(12):4157–4167, 2009.
  • Orsini et al. [2012] Nicola Orsini, Ruifeng Li, Alicja Wolk, Polyna Khudyakov, and Donna Spiegelman. Meta-analysis for linear and nonlinear dose-response relations: examples, an evaluation of approximations, and software. American journal of epidemiology, 175(1):66–73, 2012.
  • Razo et al. [2022] Christian Razo, Catherine A Welgan, Catherine O Johnson, Susan A McLaughlin, Vincent Iannucci, Anthony Rodgers, Nelson Wang, Kate E LeGrand, Reed JD Sorensen, Jiawei He, et al. Effects of elevated systolic blood pressure on ischemic heart disease: a burden of proof study. Nature medicine, 28(10):2056–2065, 2022.
  • Rudin et al. [1964] Walter Rudin et al. Principles of mathematical analysis, volume 3. McGraw-hill New York, 1964.
  • Schmidt and Kohlmann [2008] Carsten Oliver Schmidt and Thomas Kohlmann. When to use the odds ratio or the relative risk? International journal of public health, 53(3):165, 2008.
  • Stanaway et al. [2022] Jeffrey D Stanaway, Ashkan Afshin, Charlie Ashbaugh, Catherine Bisignano, Michael Brauer, Giannina Ferrara, Vanessa Garcia, Demewoz Haile, Simon I Hay, Jiawei He, et al. Health effects associated with vegetable consumption: a burden of proof study. Nature medicine, 28(10):2066–2074, 2022.
  • Zheng et al. [2021] Peng Zheng, Ryan Barber, Reed JD Sorensen, Christopher JL Murray, and Aleksandr Y Aravkin. Trimmed constrained mixed effects models: formulations and algorithms. Journal of Computational and Graphical Statistics, 30(3):544–556, 2021.
  • Zheng et al. [2022] Peng Zheng, Aleksandr Aravkin, Christopher Murray, et al. The burden of proof studies: assessing the evidence of risk. Nature Medicine, 28(10):2038–2044, 2022.

8 Appendix

In this section, we provide proofs of theorems presented throughout this work.

8.1 Proof of Theorem 4.1

Take $G$ defined as in equation (7). $G$ is continuous on its domain $[0,\infty)^n$. First, we show that $G$ is proper, i.e., for some positive values of $A, a_0, B, b_0$, $G(A) \not\equiv +\infty$ and that for any $X \in [0,\infty)^n$, $G(X) > -\infty$. For this fact, we need the hypothesis in the statement of the theorem. Let $A$ be the vector of ones of length $n$, i.e., $A = [1, \dots, 1]^\top$. Since, by hypothesis, $N_+ > A$ and $n_0 > a_0$, $G(A) < \infty$ by inspection. Also by inspection, $G$ is not equal to $-\infty$ for any $A$ in its domain.

Next, $G$ is optimized over the compact set $0 \leq A \leq N$. Since $G$ is continuous on the compact domain $0 \leq A \leq N$, it attains its minimum and maximum values. Since $G$ is strictly convex, this minimizer must be unique. This completes the proof.

8.2 Proof of Theorem 4.2

Take $H$ as defined in equation (13). This proof follows the same structure as the proof of Theorem 4.1. We need only show that $H$ is proper and that it has compact sublevel sets, since $H$ is clearly continuous on the domain $[0,\infty)^n$. To show that $H$ is proper, similar to the proof of Theorem 4.1, consider the case when $A$ is the vector of ones of length $n$. By the hypothesis in the statement of Theorem 4.2, $n_0 > a_0$, so that $H$ is finite by inspection. Also by inspection, $H$ is never equal to $-\infty$ at any point in its domain.

Next, we show that $H$ has compact sublevel sets, that is,

$$\mathcal{A}_{\alpha} := \{A : H(A) \leq \alpha\}$$

are closed and bounded. The closed property follows immediately by continuity. Next, for a sequence of $X \in [0,\infty)^n$, $H(X) \to \infty$ as $\|X\| \to \infty$ since $H$ is a sum of affine functions and entropic distance functions in all coordinates, see equation (11). As $\|X\| \to \infty$, the $x \log x$ terms in $H$ increase faster than linear terms. This implies directly that any sublevel set of $H$ must have an upper bound. Thus, $H$ has compact sublevel sets. In particular, $H$ attains its minimum and maximum on any nonempty sublevel set, so we can consider $\alpha = H(\mathbf{1})$, where $\mathbf{1}$ is the vector of all ones discussed in the previous paragraph. Once we know $H$ attains its minimum, we also know that the minimum is unique by strict convexity of the entropic distance.
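The growth claim can be made explicit in a single coordinate. As a hedged illustration, assume one coordinate contributes a term of the form $h(x) = x \log x + c x + d$ for constants $c$ and $d$; these constants are placeholders, since the exact coefficients of equation (11) are not restated here.

```latex
h(x) = x \log x + c\,x + d = x\,(\log x + c) + d,
\qquad
x \geq e^{\,1 - c}
\;\Longrightarrow\;
\log x + c \geq 1
\;\Longrightarrow\;
h(x) \geq x + d .
```

Consequently, under this assumed form, any $x$ with $h(x) \leq \alpha$ satisfies $x \leq \max\left(e^{1-c},\, \alpha - d\right)$, which is the kind of upper bound on sublevel sets used in the argument.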

8.3 Proof of Theorem 5.1

To prove the result, we simplify and rewrite the equations

$$
\begin{aligned}
\left(V - \frac{1}{a_0} - \frac{1}{b_0}\right)\frac{1-p}{p}\, b_0 &= \sum_{i=1}^{n}\left(1 + \frac{b_0}{a_0 R_i}\right) = n + \frac{b_0}{a_0}\sum_{i=1}^{n}\frac{1}{R_i} = n + \frac{b_0}{a_0}\, r_1, \\
\left(V - \frac{1}{a_0} - \frac{1}{b_0}\right)\left(\frac{1}{zp}\, b_0 - a_0\right) &= \sum_{i=1}^{n}\left(1 + \frac{a_0 R_i}{b_0}\right) = n + \frac{a_0}{b_0}\sum_{i=1}^{n} R_i = n + \frac{a_0}{b_0}\, r_2.
\end{aligned}
$$

Dividing the equations we obtain

$$\frac{n + \frac{b_0}{a_0} r_1}{n + \frac{a_0}{b_0} r_2} = \frac{\frac{1-p}{p}\, b_0}{\frac{1}{zp}\, b_0 - a_0} = \frac{\frac{1-p}{p}}{\frac{1}{zp} - \frac{a_0}{b_0}}.$$

Defining now $c = \frac{a_0}{b_0}$ we have

$$\frac{n + \frac{r_1}{c}}{n + c\, r_2} = \frac{\frac{1-p}{p}}{\frac{1}{zp} - c}.$$

Multiplying by $c$ we have

$$\frac{r_1 + nc}{n + r_2 c} = \frac{(1-p)\, z c}{1 - p z c}.$$

The solution is given by

$$c = \frac{npz - nz + n - p r_1 z \pm \sqrt{D}}{2z\left(np - p r_2 + r_2\right)}$$

where

D𝐷\displaystyle Ditalic_D =n2p2z22n2pz2+2n2pz+n2z22n2z+n22np2r1z2+2npr1z2+2npr1z+p2r12z24pr1r2z+4r1r2zabsentsuperscript𝑛2superscript𝑝2superscript𝑧22superscript𝑛2𝑝superscript𝑧22superscript𝑛2𝑝𝑧superscript𝑛2superscript𝑧22superscript𝑛2𝑧superscript𝑛22𝑛superscript𝑝2subscript𝑟1superscript𝑧22𝑛𝑝subscript𝑟1superscript𝑧22𝑛𝑝subscript𝑟1𝑧superscript𝑝2superscriptsubscript𝑟12superscript𝑧24𝑝subscript𝑟1subscript𝑟2𝑧4subscript𝑟1subscript𝑟2𝑧\displaystyle=n^{2}p^{2}z^{2}-2n^{2}pz^{2}+2n^{2}pz+n^{2}z^{2}-2n^{2}z+n^{2}-2% np^{2}r_{1}z^{2}+2npr_{1}z^{2}+2npr_{1}z+p^{2}r_{1}^{2}z^{2}-4pr_{1}r_{2}z+4r_% {1}r_{2}z= italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_p start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_z start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 2 italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_p italic_z start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + 2 italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_p italic_z + italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_z start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 2 italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_z + italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 2 italic_n italic_p start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_z start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + 2 italic_n italic_p italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_z start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + 2 italic_n italic_p italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_z + italic_p start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_z start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 4 italic_p italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT italic_z + 4 italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT italic_z
=n2(p2z22pz2+2pz+z2z+1)absentsuperscript𝑛2superscript𝑝2superscript𝑧22𝑝superscript𝑧22𝑝𝑧superscript𝑧2𝑧1\displaystyle=n^{2}\left(p^{2}z^{2}-2pz^{2}+2pz+z^{2}-z+1\right)= italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_p start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_z start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 2 italic_p italic_z start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + 2 italic_p italic_z + italic_z start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_z + 1 )
+n(2p2r1z2+2pr1z2+2pr1z)𝑛2superscript𝑝2subscript𝑟1superscript𝑧22𝑝subscript𝑟1superscript𝑧22𝑝subscript𝑟1𝑧\displaystyle+n\left(-2p^{2}r_{1}z^{2}+2pr_{1}z^{2}+2pr_{1}z\right)+ italic_n ( - 2 italic_p start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_z start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + 2 italic_p italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_z start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + 2 italic_p italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_z )
+p2r12z24pr1r2z+4r1r2zsuperscript𝑝2superscriptsubscript𝑟12superscript𝑧24𝑝subscript𝑟1subscript𝑟2𝑧4subscript𝑟1subscript𝑟2𝑧\displaystyle+p^{2}r_{1}^{2}z^{2}-4pr_{1}r_{2}z+4r_{1}r_{2}z+ italic_p start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_z start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 4 italic_p italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT italic_z + 4 italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT italic_z

We want to show that each piece is 0absent0\geq 0≥ 0. In fact we have

p2z22pz2+2pz+z2z+1=z((p1)2z+2p1)+1superscript𝑝2superscript𝑧22𝑝superscript𝑧22𝑝𝑧superscript𝑧2𝑧1𝑧superscript𝑝12𝑧2𝑝11p^{2}z^{2}-2pz^{2}+2pz+z^{2}-z+1=z\left((p-1)^{2}z+2p-1\right)+1italic_p start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_z start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 2 italic_p italic_z start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + 2 italic_p italic_z + italic_z start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_z + 1 = italic_z ( ( italic_p - 1 ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_z + 2 italic_p - 1 ) + 1

The minimum with respect to p𝑝pitalic_p of the inside expression occurs at p1=1z𝑝11𝑧p-1=\frac{-1}{z}italic_p - 1 = divide start_ARG - 1 end_ARG start_ARG italic_z end_ARG. Plugging in, that gives us

z(1z+12z)+1=z,𝑧1𝑧12𝑧1𝑧z\left(\frac{1}{z}+1-\frac{2}{z}\right)+1=z,italic_z ( divide start_ARG 1 end_ARG start_ARG italic_z end_ARG + 1 - divide start_ARG 2 end_ARG start_ARG italic_z end_ARG ) + 1 = italic_z ,

so as a result we have

$$n^{2}\left(p^{2}z^{2}-2pz^{2}+2pz+z^{2}-z+1\right)\geq n^{2}z.$$

Next, we have

$$n\left(-2p^{2}r_{1}z^{2}+2pr_{1}z^{2}+2pr_{1}z\right)=n(2r_{1}z)\left((1-p)pz+p\right)\geq 2nr_{1}zp.$$

Finally, we have

$$p^{2}r_{1}^{2}z^{2}-4pr_{1}r_{2}z+4r_{1}r_{2}z=p^{2}r_{1}^{2}z^{2}+4(1-p)r_{1}r_{2}z\geq p^{2}r_{1}^{2}z^{2}.$$

Putting everything together, we get

$$D\geq n^{2}z+2nr_{1}zp+p^{2}r_{1}^{2}z^{2}=z\left(n^{2}+2nr_{1}p+p^{2}r_{1}^{2}z\right)\geq 0.$$

Thus a solution always exists.
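The three piecewise bounds above can be spot-checked numerically. The following is a minimal sketch, not part of the original proof: it assumes that $D$ is the sum of the three pieces bounded above and that $n>0$, $0<p<1$, $z>0$, and $r_1,r_2\geq 0$. The function names and sampling ranges are hypothetical choices made for illustration.

```python
import random


def discriminant_pieces(n, p, z, r1, r2):
    """Return the three pieces whose sum is assumed to equal D."""
    piece_n2 = n**2 * (p**2 * z**2 - 2 * p * z**2 + 2 * p * z + z**2 - z + 1)
    piece_n = n * (-2 * p**2 * r1 * z**2 + 2 * p * r1 * z**2 + 2 * p * r1 * z)
    piece_0 = p**2 * r1**2 * z**2 - 4 * p * r1 * r2 * z + 4 * r1 * r2 * z
    return piece_n2, piece_n, piece_0


def lower_bounds(n, p, z, r1):
    """Return the lower bound derived in the text for each piece."""
    return n**2 * z, 2 * n * r1 * z * p, p**2 * r1**2 * z**2


def worst_relative_slack(trials=10_000, seed=0):
    """Sample random inputs (assumed ranges) and return the smallest
    value of (piece - bound) / (1 + |piece|) seen over all pieces."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        n = rng.uniform(0.1, 50.0)
        p = rng.uniform(0.01, 0.99)
        z = rng.uniform(0.05, 20.0)
        r1 = rng.uniform(0.0, 50.0)
        r2 = rng.uniform(0.0, 50.0)
        pieces = discriminant_pieces(n, p, z, r1, r2)
        bounds = lower_bounds(n, p, z, r1)
        for piece, bound in zip(pieces, bounds):
            worst = min(worst, (piece - bound) / (1.0 + abs(piece)))
    return worst


worst_slack = worst_relative_slack()
print("smallest relative slack:", worst_slack)
```

A slack that stays non-negative up to floating-point rounding is consistent with each piece dominating its bound; a random search of this kind illustrates the inequalities but does not replace the algebraic argument.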

To see that only one solution is positive, recall the form of the solution:

$$c=\frac{npz-nz+n-pr_{1}z\pm\sqrt{D}}{2z(np-pr_{2}+r_{2})}.$$

We can observe that

$$n^{2}\left(p^{2}z^{2}-2pz^{2}+2pz+z^{2}-z+1\right)-\left(n(z(p-1)+1)\right)^{2}=n^{2}z$$

and as a result

$$\sqrt{D}-(npz-nz+n-pr_{1}z)\geq 0.$$

That means we have

$$c_{2}=\frac{npz-nz+n-pr_{1}z-\sqrt{D}}{2z(np+(1-p)r_{2})}<0<c_{1}=\frac{npz-nz+n-pr_{1}z+\sqrt{D}}{2z(np+(1-p)r_{2})}.$$

Plugging $c_{1}$ into the first equation, we have

$$b_{0}=\frac{1}{V}\left(\frac{p}{1-p}\left(n+\frac{r_{1}}{c}\right)+1+\frac{1}{c_{1}}\right),\quad a_{0}=c_{1}b_{0},$$

and we have found the unique positive solution. This completes the proof.
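The sign pattern of the two roots can be illustrated with a short numerical sketch. It assumes, as above, that $D$ is the sum of the three pieces and that $n>0$, $0<p<1$, $z>0$, $r_1,r_2\geq 0$; the sample values are hypothetical. Writing $B=npz-nz+n-pr_{1}z$, expanding the three pieces term by term suggests the identity $D-B^{2}=z\left(n^{2}+4npr_{1}+4(1-p)r_{1}r_{2}\right)$. This identity is our own algebra rather than a statement from the text, so the code checks it numerically instead of relying on it.

```python
import math


def quadratic_roots(n, p, z, r1, r2):
    """Return (c1, c2, D - B^2) for the quadratic solved in the proof.

    D is assumed to be the sum of the three pieces bounded in the text.
    """
    B = n * p * z - n * z + n - p * r1 * z
    D = (
        n**2 * (p**2 * z**2 - 2 * p * z**2 + 2 * p * z + z**2 - z + 1)
        + n * (-2 * p**2 * r1 * z**2 + 2 * p * r1 * z**2 + 2 * p * r1 * z)
        + p**2 * r1**2 * z**2 - 4 * p * r1 * r2 * z + 4 * r1 * r2 * z
    )
    denom = 2 * z * (n * p + (1 - p) * r2)  # positive under the assumptions
    root = math.sqrt(D)
    c1 = (B + root) / denom
    c2 = (B - root) / denom
    return c1, c2, D - B**2


# Hypothetical sample inputs, chosen only for illustration.
n, p, z, r1, r2 = 3.0, 0.4, 1.5, 2.0, 5.0
c1, c2, gap = quadratic_roots(n, p, z, r1, r2)

# Candidate closed form for D - B^2 (our own expansion, checked here).
expected_gap = z * (n**2 + 4 * n * p * r1 + 4 * (1 - p) * r1 * r2)

print("c1 =", c1, "c2 =", c2)
print("D - B^2 =", gap, "closed form =", expected_gap)
```

Because $D-B^{2}>0$ under these assumptions, $\sqrt{D}>|B|$, so the numerator with $+\sqrt{D}$ is positive and the one with $-\sqrt{D}$ is negative, matching $c_{2}<0<c_{1}$.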

8.4 Proof of Theorem 5.2

We prove this theorem by induction. For the base case, when $n=1$, the existence of a unique positive solution follows immediately from Theorem 5.1. For the inductive hypothesis, suppose that for a given dimension $n$ of the vectors $V$ and $L$, we have a positive solution pair $a_{0}^{n},b_{0}^{n}$ that simultaneously satisfies the system (16). Thus, we have that

$$
\begin{aligned}
\frac{1-p}{p}b_{0}^{n} &= \sum_{i=1}^{n}\left(1+\frac{b_{0}^{n}}{a_{0}^{n}R_{i}}\right)/\left(V_{i}-\frac{1}{a_{0}^{n}}-\frac{1}{b_{0}^{n}}\right)\\
\frac{1}{zp}b_{0}^{n}-a_{0}^{n} &= \sum_{i=1}^{n}\left(1+\frac{a_{0}^{n}R_{i}}{b_{0}^{n}}\right)/\left(V_{i}-\frac{1}{a_{0}^{n}}-\frac{1}{b_{0}^{n}}\right).
\end{aligned}
$$

If we continue to the step $n+1$, we add strictly positive terms to the right-hand side, and hence we have the strict inequalities

$$
\begin{aligned}
\frac{1-p}{p}b_{0}^{n} &< \sum_{i=1}^{n+1}\left(1+\frac{b_{0}^{n}}{a_{0}^{n}R_{i}}\right)/\left(V_{i}-\frac{1}{a_{0}^{n}}-\frac{1}{b_{0}^{n}}\right)\\
\frac{1}{zp}b_{0}^{n}-a_{0}^{n} &< \sum_{i=1}^{n+1}\left(1+\frac{a_{0}^{n}R_{i}}{b_{0}^{n}}\right)/\left(V_{i}-\frac{1}{a_{0}^{n}}-\frac{1}{b_{0}^{n}}\right)
\end{aligned}
\tag{19}
$$

and without loss of generality, we may assume that $V_{n+1}\geq\min_{i}V_{i}$ so that $V_{n+1}>\frac{1}{a_{0}^{n}}+\frac{1}{b_{0}^{n}}$. Otherwise, we can suitably reorder the terms and apply the inductive hypothesis.

Define the functions $f_{1},f_{2}$ by

$$
\begin{aligned}
f_{1}(a_{0},b_{0}) &= \frac{1-p}{p}b_{0}-\sum_{i=1}^{n+1}\left(1+\frac{b_{0}}{a_{0}R_{i}}\right)/\left(V_{i}-\frac{1}{a_{0}}-\frac{1}{b_{0}}\right)\\
f_{2}(a_{0},b_{0}) &= \frac{1}{zp}b_{0}-a_{0}-\sum_{i=1}^{n+1}\left(1+\frac{a_{0}R_{i}}{b_{0}}\right)/\left(V_{i}-\frac{1}{a_{0}}-\frac{1}{b_{0}}\right).
\end{aligned}
$$

The remaining work is focused on finding the $a_{0}^{n+1},b_{0}^{n+1}$ such that $f_{1}(a_{0}^{n+1},b_{0}^{n+1})=f_{2}(a_{0}^{n+1},b_{0}^{n+1})=0$, and is separated into two steps:

  1. Step 1

    Show that we can find points $(a_{0}^{1},b_{0}^{1})$, $(a_{0}^{2},b_{0}^{2})$, $(a_{0}^{3},b_{0}^{3})$ with

    $$f_{1}(a_{0}^{1},b_{0}^{1})>0,\quad f_{2}(a_{0}^{1},b_{0}^{1})<0,$$
    $$f_{1}(a_{0}^{2},b_{0}^{2})>0,\quad f_{2}(a_{0}^{2},b_{0}^{2})>0,$$

    and

    $$f_{1}(a_{0}^{3},b_{0}^{3})<0,\quad f_{2}(a_{0}^{3},b_{0}^{3})>0.$$

    These points, along with $(a_{0}^{n},b_{0}^{n})$ from the inductive hypothesis, are shown in Figure 7. Together, the four points set up the continuation arguments used in Step 2.

  2. Step 2

    Show that, by continuity of the solution maps, either case above leads to the existence of $(a_{0}^{n+1},b_{0}^{n+1})$ simultaneously satisfying $f_{1}=f_{2}=0$.

Step 1 proof:

First, we observe that

$$\lim_{a_{0}\uparrow\infty}f_{1}(a_{0},b_{0})=\frac{1-p}{p}b_{0}-\sum_{i=1}^{n+1}\frac{1}{V_{i}-\frac{1}{b_{0}}}.$$

As long as we take $b_{0}^{1}>2\max\left(V,\frac{p}{1-p}2\sum_{i=1}^{n+1}\frac{1}{V_{i}}\right)$ where $V=\max_{i}V_{i}$, we can then find a large enough value $a_{0}^{1}$ satisfying $f_{1}(a_{0}^{1},b_{0}^{1})>0$. Next, we have

$$\lim_{a_{0}\uparrow\infty}f_{2}(a_{0},b_{0})=-\infty$$

for any $b_{0}$, so in particular we can select a large enough $a_{0}^{1}$ with $f_{2}(a_{0}^{1},b_{0}^{1})<0$ and $f_{1}(a_{0}^{1},b_{0}^{1})>0$. This gives us a point in the lower-right quadrant of Figure 7.

Now we observe that

$$\lim_{b_{0}\uparrow\infty}f_{2}(a_{0},b_{0})=\infty$$

along the path $a_{0}=\frac{1}{2}b_{0}$. Along this same path, we have

$$\lim_{b_{0}\uparrow\infty}f_{1}(a_{0},b_{0})=\infty$$

as well. We can thus select a large enough value $b_{0}^{2}$ and $a_{0}^{2}=\frac{1}{2}b_{0}^{2}$ for which $f_{1}(a_{0}^{2},b_{0}^{2})>0$ and $f_{2}(a_{0}^{2},b_{0}^{2})>0$. This gives us the point in the upper-right quadrant of Figure 7.

Next, consider $0<\epsilon\ll 1$ and take

$$b_{0}=\frac{1}{\epsilon^{2}},\quad a_{0}=\frac{1}{V_{\min}-\epsilon-\epsilon^{2}}.$$

With these definitions, we have

$$
\begin{aligned}
f_{2}(\epsilon) &\geq \frac{1}{zp\epsilon^{2}}-\frac{1}{V_{\min}-\epsilon-\epsilon^{2}}-(n+1)\left(1+\frac{\epsilon^{2}R_{i}}{V_{\min}-\epsilon-\epsilon^{2}}\right)\frac{1}{\epsilon}>0\\
f_{1}(\epsilon) &< \frac{1-p}{p\epsilon}-\sum_{i=1}^{n+1}\left(1+\frac{V_{\min}-\epsilon-\epsilon^{2}}{\epsilon^{2}}\right)\frac{1}{\epsilon}<0
\end{aligned}
$$

for small $\epsilon$. Thus for $0<\epsilon\ll 1$ we get a point $(a_{0}^{3},b_{0}^{3})$ with $f_{2}>0$ and $f_{1}<0$. This gives us a point in the upper-left quadrant of Figure 7.

Finally, by the inductive hypothesis, we have $f_{1}(a_{0}^{n},b_{0}^{n})<0$ and $f_{2}(a_{0}^{n},b_{0}^{n})<0$, which gives us a point in the lower-left quadrant of Figure 7.
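The three constructed points of Step 1 can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the inputs `V`, `R`, `p`, `z` are hypothetical sample values (with positive `V` and `R`, $0<p<1$, $z>0$), the helper names `f1` and `f2` simply mirror the definitions above, and the specific choices $b_{0}=100$, $a_{0}=10^{6}$, and $\epsilon=0.01$ stand in for "large enough" and "small enough".

```python
def f1(a0, b0, V, R, p):
    """f1(a0, b0) as defined in the text, summing over all entries of V, R."""
    total = sum(
        (1 + b0 / (a0 * Ri)) / (Vi - 1 / a0 - 1 / b0) for Vi, Ri in zip(V, R)
    )
    return (1 - p) / p * b0 - total


def f2(a0, b0, V, R, p, z):
    """f2(a0, b0) as defined in the text, summing over all entries of V, R."""
    total = sum(
        (1 + a0 * Ri / b0) / (Vi - 1 / a0 - 1 / b0) for Vi, Ri in zip(V, R)
    )
    return b0 / (z * p) - a0 - total


# Hypothetical sample inputs for illustration only.
V = [1.0, 2.0]
R = [1.5, 0.8]
p, z = 0.5, 1.0

# Point 1: fix a moderately large b0, then take a0 very large.
a1, b1 = 1e6, 100.0
# Point 2: move along the path a0 = b0 / 2 with b0 large.
b2 = 100.0
a2 = b2 / 2
# Point 3: b0 = 1 / eps^2 and a0 = 1 / (V_min - eps - eps^2) with eps small.
eps = 0.01
b3 = 1 / eps**2
a3 = 1 / (min(V) - eps - eps**2)

signs = {
    "point 1": (f1(a1, b1, V, R, p), f2(a1, b1, V, R, p, z)),
    "point 2": (f1(a2, b2, V, R, p), f2(a2, b2, V, R, p, z)),
    "point 3": (f1(a3, b3, V, R, p), f2(a3, b3, V, R, p, z)),
}
for name, (v1, v2) in signs.items():
    print(name, "f1 =", v1, "f2 =", v2)
```

For these sample values the three points land in the lower-right, upper-right, and upper-left quadrants of the $(f_1,f_2)$ plane, respectively. Note that the path $a_{0}=\frac{1}{2}b_{0}$ drives $f_{2}$ upward here because $\frac{1}{zp}>\frac{1}{2}$ for the chosen `p` and `z`; other inputs may require a different path.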

[Figure 7: plot with axes $f_1$ and $f_2$ showing the points $(a_0^1,b_0^1)$, $(a_0^2,b_0^2)$, $(a_0^3,b_0^3)$, $(a_0^n,b_0^n)$, and $(a_0^{n+1},b_0^{n+1})$.]
Figure 7: Note the generic form $(a_0,b_0)$ is shorthand for $(f_1(a_0,b_0),f_2(a_0,b_0))$. The point $(a_0^{n+1},b_0^{n+1})$ serves as the desired solution point to complete the proof.
Step 2 Proof:

To show that there is a point $(a_0^{n+1},b_0^{n+1})$ such that the inequalities (19) become equalities, we create separate interpolations relying on the intermediate value theorem (IVT) Rudin et al. [1964].

Consider the points $(a_0^n,b_0^n)$ and $(a_0^1,b_0^1)$ and the convex combination

$$p_\lambda=\lambda(a_0^n,b_0^n)+(1-\lambda)(a_0^1,b_0^1)$$

We have $f_1(p_1)<0$ and $f_1(p_0)>0$, so by the IVT there is a $\lambda\in(0,1)$ with $f_1(p_\lambda)=0$.
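The IVT argument is non-constructive, but such a $\lambda$ can be located numerically by bisection along the segment. The following Python sketch illustrates the idea under stated assumptions: the helper names `segment_point` and `root_on_segment` are hypothetical, and `toy_f` is a simple stand-in for a continuous function with a sign change, not the $f_1$ of the proof.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]


def segment_point(lam: float, p_one: Point, p_zero: Point) -> Point:
    """Return p_lambda = lam * p_one + (1 - lam) * p_zero."""
    return (lam * p_one[0] + (1 - lam) * p_zero[0],
            lam * p_one[1] + (1 - lam) * p_zero[1])


def root_on_segment(f: Callable[[float, float], float],
                    p_one: Point, p_zero: Point,
                    tol: float = 1e-12, max_iter: int = 200) -> float:
    """Bisection for lam in [0, 1] with f(p_lam) = 0.

    Requires f to be continuous on the segment and to take opposite
    signs at the two endpoints, which is the IVT hypothesis.
    """
    def g(lam: float) -> float:
        return f(*segment_point(lam, p_one, p_zero))

    lo, hi = 0.0, 1.0
    g_lo, g_hi = g(lo), g(hi)
    if g_lo == 0.0:
        return lo
    if g_hi == 0.0:
        return hi
    if g_lo * g_hi > 0:
        raise ValueError("f must change sign between the endpoints")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_mid == 0.0 or hi - lo < tol:
            return mid
        if g_lo * g_mid < 0:
            hi = mid                 # sign change lies in [lo, mid]
        else:
            lo, g_lo = mid, g_mid    # sign change lies in [mid, hi]
    return 0.5 * (lo + hi)


# Toy stand-in (an assumption for illustration only):
# negative at p_one = (0, 1), positive at p_zero = (1, 0).
def toy_f(a0: float, b0: float) -> float:
    return a0 - b0 ** 2


lam = root_on_segment(toy_f, p_one=(0.0, 1.0), p_zero=(1.0, 0.0))
```

Bisection only needs continuity and a sign change at the endpoints, which are exactly the hypotheses the IVT step uses.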

If $f_2(p_\lambda)>0$, we proceed to Case 1 below. If $f_2(p_\lambda)<0$, we have a point of intersection below the $f_1$ axis, as shown in Figure 8. We then apply the IVT to $(a_0^3,b_0^3)$ and $(a_0^2,b_0^2)$. If the crossing point obtained from the IVT is above the $f_1$ axis, we proceed to Case 2 below, and if it is below the $f_1$ axis, we proceed to Case 1 below.

  • Case 1:

    $f_2(p_\lambda)>0$, or the IVT applied to $(a_0^3,b_0^3)$ and $(a_0^2,b_0^2)$ yields a point below the $f_1$ axis. In either case, applying the IVT twice, we obtain two points along the $f_1$ axis, as shown in Figure 9, with opposite signs along $f_1$. The constraint $f_2=0$ is easily incorporated into $f_1$, which becomes

    $$\begin{aligned}
    f_3(a_0,b_0) &= (1-p)z\left(a_0+\sum_{i=1}^{n+1}\left(1+\frac{a_0R_i}{b_0}\right)\Big/\left(V_i-\frac{1}{a_0}-\frac{1}{b_0}\right)\right) \\
    &\quad -\sum_{i=1}^{n+1}\left(1+\frac{a_0R_i}{b_0}\right)\Big/\left(V_i-\frac{1}{a_0}-\frac{1}{b_0}\right)
    \end{aligned}$$

    Clearly $f_3$ has opposite signs for the two points of intersection in Figure 9, and applying the IVT again we find the point $(a_0^{n+1},b_0^{n+1})$.

  • Case 2:

    In this case, we have successfully found two points with $f_1=0$ and opposite signs with respect to $f_2$. Just as in the previous case, we can explicitly incorporate the constraint $f_1=0$ into $f_2$, to obtain

    $$\begin{aligned}
    f_4(a_0,b_0) &= \frac{1}{zp(1-p)}\sum_{i=1}^{n+1}\left(1+\frac{b_0}{a_0R_i}\right)\Big/\left(V_i-\frac{1}{a_0}-\frac{1}{b_0}\right) \\
    &\quad -a_0-\sum_{i=1}^{n+1}\left(1+\frac{a_0R_i}{b_0}\right)\Big/\left(V_i-\frac{1}{a_0}-\frac{1}{b_0}\right)
    \end{aligned}$$

    Clearly $f_4$ now has opposite signs for the two crossing points shown on the $f_2$ axis of Figure 8, and by the IVT we have existence of $(a_0^{n+1},b_0^{n+1})$.

[Figure 8: plot with axes $f_1$ and $f_2$ showing the points $(a_0^1,b_0^1)$, $(a_0^2,b_0^2)$, $(a_0^3,b_0^3)$, $(a_0^n,b_0^n)$, $(a_0^{n+1},b_0^{n+1})$, and three crossing points labeled $(a_0,b_0)$.]
Figure 8: The points and their associated continuous deformations (the colored lines) according to Case 1. Note the generic form $(a_0,b_0)$ is shorthand for $(f_1(a_0,b_0),f_2(a_0,b_0))$.
[Figure 9: plot with axes $f_1$ and $f_2$ showing the points $(a_0^1,b_0^1)$, $(a_0^2,b_0^2)$, $(a_0^3,b_0^3)$, $(a_0^n,b_0^n)$, and $(a_0^{n+1},b_0^{n+1})$.]
Figure 9: The points and their associated continuous deformations (the colored lines) according to Case 2 in the $f_1$-$f_2$ plane. Note the generic form $(a_0,b_0)$ is shorthand for $(f_1(a_0,b_0),f_2(a_0,b_0))$.

8.5 Proof of Theorem 5.3

To prove the result, we simplify and rewrite the equations

$$\begin{aligned}
\left(V_i-\frac{1}{a_0}+\frac{1}{b_0}\right)\frac{1-p}{p}\,b_0 &= \sum_{i=1}^{n}\left(\frac{b_0}{a_0R_i}-1\right)=\frac{b_0}{a_0}\sum_{i=1}^{n}\frac{1}{R_i}-n=\frac{b_0}{a_0}r_1-n \\
\left(V_i-\frac{1}{a_0}+\frac{1}{b_0}\right)\left(\frac{1}{zp}\,b_0-a_0\right) &= \sum_{i=1}^{n}\left(1-\frac{a_0R_i}{b_0}\right)=n-\frac{a_0}{b_0}\sum_{i=1}^{n}R_i=n-\frac{a_0}{b_0}r_2
\end{aligned}$$

Dividing the equations we obtain

$$\frac{\frac{b_0}{a_0}r_1-n}{n-\frac{a_0}{b_0}r_2}=\frac{\frac{1-p}{p}\,b_0}{\frac{1}{zp}\,b_0-a_0}=\frac{\frac{1-p}{p}}{\frac{1}{zp}-\frac{a_0}{b_0}}$$

Defining now $c=\frac{a_0}{b_0}$ we have

$$\frac{\frac{r_1}{c}-n}{n-cr_2}=\frac{\frac{1-p}{p}}{\frac{1}{zp}-c}$$

with the inherited constraint that $c<1$, since $a_0<b_0$ by definition. Multiplying by $c$ we have

$$\frac{r_1-nc}{n-r_2c}=\frac{(1-p)zc}{1-pzc}$$
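For reference, the intermediate algebra can be written out as follows. These steps are a reconstruction obtained by cross-multiplying the displayed equation and collecting powers of $c$; they are not taken from the original proof text.

```latex
\begin{align*}
(r_1 - nc)(1 - pzc) &= (1-p)\,zc\,(n - r_2 c) \\
r_1 - \bigl(p r_1 z + n\bigr)c + npz\,c^2 &= (1-p)\,zn\,c - (1-p)\,z r_2\,c^2 \\
z\bigl(np + (1-p)r_2\bigr)c^2 - \bigl(n(z - pz + 1) + p r_1 z\bigr)c + r_1 &= 0
\end{align*}
```

This is a quadratic in $c$, so the quadratic formula applies whenever the leading coefficient $z\bigl(np+(1-p)r_2\bigr)$ is nonzero.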

The solution is given by

$$c=\frac{n(z-pz+1)+pr_1z\pm\sqrt{D}}{2z\left(np+(1-p)r_2\right)}$$

where

$$D=\left(n(pz-z-1)-r_1zp\right)^2-4r_1\left(nzp+r_2z-r_2pz\right).$$
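The closed form is straightforward to evaluate numerically. The Python sketch below is illustrative only: the function names `discriminant` and `solve_c`, the filter $0<c<1$ (which assumes $a_0$ and $b_0$ are positive in addition to the stated constraint $c<1$), and the sample parameter values are assumptions made for this example rather than part of the original proof.

```python
import math
from typing import List


def discriminant(n: int, p: float, z: float, r1: float, r2: float) -> float:
    """D from the closed-form solution for c."""
    return ((n * (p * z - z - 1) - r1 * z * p) ** 2
            - 4 * r1 * (n * z * p + r2 * z - r2 * p * z))


def solve_c(n: int, p: float, z: float, r1: float, r2: float) -> List[float]:
    """Return the real roots c of the quadratic that satisfy 0 < c < 1.

    An empty list means either D < 0 (no real root) or no root in (0, 1).
    """
    d = discriminant(n, p, z, r1, r2)
    if d < 0:
        return []
    numerator = n * (z - p * z + 1) + p * r1 * z
    denominator = 2 * z * (n * p + (1 - p) * r2)
    roots = [(numerator - math.sqrt(d)) / denominator,
             (numerator + math.sqrt(d)) / denominator]
    return [c for c in roots if 0 < c < 1]


# Hypothetical inputs chosen so that a root in (0, 1) exists.
example_roots = solve_c(n=2, p=0.5, z=3.0, r1=2.5, r2=2.0)
for c in example_roots:
    # Residual of (r1 - n c)/(n - r2 c) = (1 - p) z c / (1 - p z c).
    lhs = (2.5 - 2 * c) / (2 - 2.0 * c)
    rhs = (1 - 0.5) * 3.0 * c / (1 - 0.5 * 3.0 * c)
    print(f"c = {c:.6f}, residual = {abs(lhs - rhs):.2e}")
```

Returning an empty list when $D<0$ makes the failure mode explicit instead of raising from `math.sqrt`.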

Viewing $D$ as a quadratic polynomial in $n$, the coefficient of $n^2$ is

$$(pz-z-1)^2,$$

which is always positive. The coefficient of $n$ is

$$\left(-2(pz-z-1)-4\right)zpr_1=2zpr_1(z-pz-1).$$

This term is non-negative exactly when $(1-p)z\geq 1$. Finally, the constant term is

$$r_1^2z^2p^2+4r_1r_2z(p-1).$$

The condition for when the constant term is non-negative can also be written in terms of $(1-p)z$:

$$(1-p)z\geq\left(\frac{1-p}{p}\right)^2 4r_2.$$

It is easy to find a counter-example in which these conditions are violated and $D<0$:

$$n=2,\quad p=0.1,\quad z=1.1,\quad r_1=31.9,\quad r_2=1.$$

Substituting these values into the expression for $D$ gives $D\approx-98.3$, so there is no real solution for $c$. In this case, $(1-p)z=0.99$, while $\frac{(1-p)^2}{p^2}4r_2=324$, so both inequalities are violated, the latter significantly. Such values are easily achieved, since we only need to find $\alpha$ with

$$\frac{2}{1-\alpha}+\frac{2}{\alpha}=31.9.$$

This gives us

$$R_1=0.9328,\quad R_2=0.0672.$$
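The counter-example can be checked numerically. The sketch below evaluates the discriminant at the stated parameters, confirms that it agrees with the expansion in powers of $n$, and solves the displayed equation for $\alpha$. The variable names are illustrative assumptions, and only the sign of $D$ is relied upon.

```python
import math

# Parameters of the counter-example as stated in the text.
n, p, z, r1, r2 = 2, 0.1, 1.1, 31.9, 1.0

# Discriminant, written as in the closed-form expression for D.
D = ((n * (p * z - z - 1) - r1 * z * p) ** 2
     - 4 * r1 * (n * z * p + r2 * z - r2 * p * z))

# The same quantity assembled from the coefficients of n^2, n,
# and the constant term.
coef_n2 = (p * z - z - 1) ** 2
coef_n1 = 2 * z * p * r1 * (z - p * z - 1)
const = r1 ** 2 * z ** 2 * p ** 2 + 4 * r1 * r2 * z * (p - 1)
D_by_powers = coef_n2 * n ** 2 + coef_n1 * n + const

# Solve 2/(1 - alpha) + 2/alpha = s.  Combining the fractions gives
# alpha * (1 - alpha) = 2 / s, i.e. alpha^2 - alpha + 2/s = 0.
s = 31.9
root = math.sqrt(1 - 8 / s)
alpha_large = (1 + root) / 2
alpha_small = (1 - root) / 2

print(f"D is negative: {D < 0}")
print(f"alpha values: {alpha_large:.4f}, {alpha_small:.4f}")
```

The two roots for $\alpha$ sum to one, so they can be read as the pair $(R_1, R_2)$ with $R_1+R_2=r_2=1$.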