Multi-Source Conformal Inference Under Distribution Shift

Yi Liu    Alexander W. Levis    Sharon-Lise Normand    Larry Han
Abstract

Recent years have seen increasing use of complex machine learning models across multiple sources of data to inform more generalizable decision-making. However, distribution shifts across data sources and privacy concerns related to sharing individual-level data, coupled with a lack of uncertainty quantification from machine learning predictions, make it challenging to achieve valid inferences in multi-source environments. In this paper, we consider the problem of obtaining distribution-free prediction intervals for a target population, leveraging multiple potentially biased data sources. We derive the efficient influence functions for the quantiles of unobserved outcomes in the target and source populations, and show that one can incorporate machine learning prediction algorithms in the estimation of nuisance functions while still achieving parametric rates of convergence to nominal coverage probabilities. Moreover, when conditional outcome invariance is violated, we propose a data-adaptive strategy to upweight informative data sources for efficiency gain and downweight non-informative data sources for bias reduction. We highlight the robustness and efficiency of our proposals for a variety of conformal scores and data-generating mechanisms via extensive synthetic experiments. We illustrate the utility of our methodology by constructing prediction intervals for hospital length of stay among pediatric patients undergoing a high-risk cardiac surgical procedure between 2016 and 2022 in the U.S.


1 Introduction

Conformal inference is a set of methods used to construct distribution-free, nonparametric prediction intervals for an outcome $Y$ on the basis of covariates $\boldsymbol{X}$, with finite-sample marginal coverage guarantees. The framework was first introduced by Vovk et al. (2005, 2009) and has since been extended to regression settings under covariate shift (Lei et al., 2018; Tibshirani et al., 2019; Lei & Candès, 2021). Recently, Yang et al. (2024) proposed robust prediction intervals under covariate shift by revealing a connection with the missing data literature, and appealing to modern semiparametric efficiency theory. However, Yang et al. (2024) assume only a single data source such that the conditional outcome distribution $Y \mid \boldsymbol{X}$ (and therefore the conditional distribution of conformal scores) is homogeneous. In general, conformal prediction methods have focused on covariate shift while assuming that conditional outcome distributions are invariant across environments (Peters et al., 2016). We note, however, that some work has studied label shift settings (e.g., Podkopaev & Ramdas (2021)), but this involves the analogously strong assumption that the distribution of $\boldsymbol{X} \mid Y$ is homogeneous. We refer the reader to Barber et al. (2023) and the extensive literature review therein for other works.

In practice, conditional outcome invariance is unlikely to hold. In recent years, large clinical research networks that facilitate multi-center collaboration have grown rapidly in popularity. One goal of these networks is to leverage multiple diverse data sources to mitigate issues related to small or non-representative data, thereby increasing statistical power for probing various scientific hypotheses. However, different clinical sites may be heterogeneous in terms of patient populations, treatment practices, and patient outcomes. Furthermore, since individual-level data is protected by privacy regulations such as HIPAA and GDPR, direct pooling of data across sites is typically not feasible. Federated transfer learning methods have been proposed as powerful tools for integrating heterogeneous data (Duan et al., 2020a; Li et al., 2023), and have been applied to yield robust point estimation of the effect of a treatment on a combined population across sites (Xiong et al., 2023; Vo et al., 2022b), and of the treatment effect on a specific target population (Han et al., 2021, 2024, 2023; Vo et al., 2022a), while accounting for data-sharing constraints and heterogeneity (i.e., covariate shift and different conditional outcome distributions). They have also been applied to problems related to interval estimation, e.g., constructing robust confidence intervals with uniform coverage guarantees by selecting eligible sites (Guo et al., 2023).

In conformal prediction, Lu et al. (2023) proposed a notion of partial exchangeability, but the focus of their work is to construct prediction intervals on the combined population across sites, and not any particular target site. Relatedly, Plassier et al. (2023) considered federated conformal prediction under label shift via quantile regression, and Humbert et al. (2023) proposed a quantile-of-quantiles estimator for conformal prediction by aggregating multiple quantiles returned by each site. To date, there are no federated learning methods developed for conformal inference on a missing outcome in a setting with distribution shift across multi-site data, and where data cannot be directly combined due to privacy concerns. When conditional outcome distributions are not the same across sites, there is likely to be poor conformal set performance with existing methods when transferring prediction models (e.g., learned conditional quantiles) from one set to another, e.g., deployment to target distributions that are different from the source distribution (** et al., 2023a; Cai et al., 2023).

Our work differs from recent work by Lee et al. (2023) and Dunn et al. (2023) in important ways. Lee et al. (2023) focus on predicting an outcome on a new subject from a new (unobserved) site. Dunn et al. (2023) focus on this same task, and also consider a simplified version of the problem of predicting an outcome on a new subject from an existing (observed) site—they propose an unsupervised method that does not allow for the inclusion of covariates, and leave the supervised version as an open problem. Neither work allows for outcome missingness. In this paper, we fill these methodological and applied gaps by leveraging conformal prediction tools to provide patients with personalized predictions using multi-source data, accounting for missing data and distribution shifts, i.e., covariate shift and heterogeneous conditional outcome distributions. We propose a method to obtain valid prediction intervals, exploiting information from multiple potentially heterogeneous sites, and respecting the privacy of individual-level data when it cannot be shared. Our proposal shares the marginal coverage properties of conformal prediction methods and builds on modern semiparametric efficiency theory and federated learning for more robust and efficient uncertainty quantification.

2 Prediction interval construction

2.1 Notation and background

Consider the following multi-site paradigm with missing data. We have data from $K$ sites, and for each subject in each site, we observe a covariate vector $\boldsymbol{X}$. Let $T \in \{0, 1, \ldots, K-1\}$ denote the study sites, where $T = 0$ indicates the target site and the remainder are source sites. Let $R$ be an indicator for observing the outcome $Y$, i.e., $R = 1$ if $Y$ is observed and $R = 0$ if $Y$ is missing. The data are assumed to be a random sample of $n$ i.i.d. copies of $\mathcal{O} = (\boldsymbol{X}, T, R, RY) \sim \mathbb{P}$. Throughout, let $\mathbb{P}_n(f) \equiv \frac{1}{n}\sum_{i=1}^{n} f(\mathcal{O}_i)$ be shorthand for the empirical average. To proceed, we make the following standard assumptions.

Assumption 2.1 (Missing at random [MAR]).

$R \perp\!\!\!\perp Y \mid T, \boldsymbol{X}.$

Assumption 2.2 (Positivity).

For some $\epsilon > 0$,

$$\mathbb{P}\big[\mathbb{P}[R = 1 \mid T, \boldsymbol{X}] \geq \epsilon\big] = 1.$$

Note that MAR (i.e., Assumption 2.1), which asserts that missingness status is not informative about outcomes given $T$ and $\boldsymbol{X}$, and positivity (i.e., Assumption 2.2), which requires that no subjects have outcomes that could never be observed, are both required for point identification of the distribution of missing outcomes and are standard in this literature (Lei & Candès, 2021; Yang et al., 2024).

We construct prediction intervals of the form $\widehat{C}_{\alpha}(\boldsymbol{X})$, for $\alpha \in (0,1)$, such that

$$\mathbb{P}(Y \in \widehat{C}_{\alpha}(\boldsymbol{X}) \mid T = 0, R = 0) \geq 1 - \alpha.$$

That is, our predictions should be tailored for missing outcomes in the target site, with marginal coverage guarantees. In the spirit of conformal inference, we introduce a conformal score, $S(\boldsymbol{X}, Y)$, which for now we assume is fixed. Our predictions will be based on this score, namely $\widehat{C}_{\alpha}(\boldsymbol{X}) = \{y \in \mathbb{R} : S(\boldsymbol{X}, y) \leq \widehat{r}\}$, where $\widehat{r}$ is an estimate of $r_0 = r_0(\alpha)(\mathbb{P})$, the $(1-\alpha)$-quantile of the conformal score $S(\boldsymbol{X}, Y)$ in the target site.

Under MAR, the functional $r_0 = r_0(\alpha)(\mathbb{P})$ is identified as the solution to an estimating equation:

$$\begin{aligned}
&\mathbb{P}(S(\boldsymbol{X}, Y) \leq r_0 \mid T = 0, R = 0) \\
&\quad = \mathbb{E}_{\mathbb{P}}\big(\mathbb{P}(S(\boldsymbol{X}, Y) \leq r_0 \mid T = 0, \boldsymbol{X}, R = 1) \mid T = 0, R = 0\big) \\
&\quad = 1 - \alpha.
\end{aligned}$$
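
The middle equality above combines two steps, which can be spelled out as follows. This is a sketch of the standard argument for a generic threshold $r$ and a fixed score $S$; positivity (Assumption 2.2) is what makes the inner conditional probability given $R = 1$ well defined.

```latex
\begin{align*}
\mathbb{P}(S(\boldsymbol{X},Y) \leq r \mid T=0, R=0)
&= \mathbb{E}_{\mathbb{P}}\big(\mathbb{P}(S(\boldsymbol{X},Y) \leq r \mid T=0, \boldsymbol{X}, R=0) \mid T=0, R=0\big)
  && \text{(iterated expectations)} \\
&= \mathbb{E}_{\mathbb{P}}\big(\mathbb{P}(S(\boldsymbol{X},Y) \leq r \mid T=0, \boldsymbol{X}, R=1) \mid T=0, R=0\big)
  && \text{(Assumption 2.1)}
\end{align*}
```

The second line replaces a conditional distribution that is never observed (outcomes with $R = 0$) by one that can be estimated from subjects with observed outcomes, averaged over the covariate distribution of target-site subjects with missing outcomes.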

Without imposing any further structure, the nonparametric influence function of this functional can be derived (Yang et al., 2024).

Theorem 2.3 (Yang et al. (2024)).

Under Assumptions 2.1 and 2.2, the nonparametric influence function of the functional $r_0 = r_0(\alpha)(\mathbb{P})$ is given by

$$\begin{aligned}
\dot{r}_0(\mathcal{O}; \mathbb{P})
&\propto I(T = 0)\big[(1 - R)\{m_0(r_0, \boldsymbol{X}) - (1 - \alpha)\} \\
&\qquad + R\,\eta_0(\boldsymbol{X})\{I(S(\boldsymbol{X}, Y) \leq r_0) - m_0(r_0, \boldsymbol{X})\}\big] \\
&\eqqcolon \varphi_0(\mathcal{O}; r_0, m_0, \eta_0),
\end{aligned} \tag{1}$$

where

$$m_0(r, \boldsymbol{X}) = \mathbb{P}(S(\boldsymbol{X}, Y) \leq r \mid \boldsymbol{X}, T = 0, R = 1)$$

is the cumulative distribution function (CDF) of the conformal score, and

$$\eta_0(\boldsymbol{X}) = \frac{\mathbb{P}(R = 0 \mid T = 0, \boldsymbol{X})}{\mathbb{P}(R = 1 \mid T = 0, \boldsymbol{X})}$$

is the missingness risk ratio.

Yang et al. (2024) propose a robust estimator $\widehat{r}_0$ that solves $0 = \mathbb{P}_n\left[\varphi_0(\mathcal{O}; r, \widehat{m}_0, \widehat{\eta}_0)\right]$ for $r$, where $\widehat{m}_0, \widehat{\eta}_0$ are estimated nuisance functions.
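
To make the estimating equation concrete, the following is a minimal sketch in plain Python, not the implementation of Yang et al. (2024). The record layout `(x, t, observed, score)`, the helper names `phi0_mean` and `solve_r0`, the toy data-generating mechanism, and the root-finding rule (take the smallest observed target-site score at which the empirical mean becomes non-negative) are all assumptions made for illustration. The true nuisance functions are plugged in where estimates $\widehat{m}_0, \widehat{\eta}_0$ would normally be used.

```python
import math
import random


def phi0_mean(r, data, m_hat, eta_hat, alpha):
    """Empirical mean P_n[phi_0(O; r, m_hat, eta_hat)] over all n records."""
    total = 0.0
    for x, t, observed, score in data:
        if t != 0:
            continue  # I(T = 0): source-site records contribute zero
        if observed:
            indicator = 1.0 if score <= r else 0.0
            total += eta_hat(x) * (indicator - m_hat(r, x))
        else:
            total += m_hat(r, x) - (1.0 - alpha)
    return total / len(data)


def solve_r0(data, m_hat, eta_hat, alpha):
    """Smallest observed target-site score at which the empirical mean is >= 0."""
    candidates = sorted(s for _, t, obs, s in data if t == 0 and obs)
    for r in candidates:
        if phi0_mean(r, data, m_hat, eta_hat, alpha) >= 0.0:
            return r
    return candidates[-1]


# Toy data (hypothetical): scores are Exp(1) and independent of X, and outcomes
# are observed with probability 0.5, so eta_0 = 1 and m_0(r, x) = 1 - exp(-r).
rng = random.Random(0)
data = []
for _ in range(3000):
    x = rng.random()
    t = rng.choice([0, 0, 1])  # about two thirds target, one third source
    observed = rng.random() < 0.5
    score = rng.expovariate(1.0) if observed else None
    data.append((x, t, observed, score))

alpha = 0.1
r_hat = solve_r0(
    data,
    m_hat=lambda r, x: 1.0 - math.exp(-r),  # true CDF, used here for illustration
    eta_hat=lambda x: 1.0,                  # true missingness ratio
    alpha=alpha,
)
print(r_hat, -math.log(alpha))  # estimate next to the true 0.9-quantile of Exp(1)
```

Restricting the search to observed scores is one simple choice; since the empirical mean need not be monotone in $r$ for arbitrary nuisance estimates, other root-finding rules are possible.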

Applying the method of Yang et al. (2024) in our multi-source data setting would only use data from the target site $T = 0$ itself. To leverage data from the other $K - 1$ sites, we make two contributions: (i) we propose a fully efficient estimator of $r_0$ under further structural assumptions regarding outcome distribution homogeneity (Section 2.2), and (ii) we develop (Section 2.3) and implement (Section 3) a data-adaptive approach for when these structural assumptions may be violated.

2.2 Efficient estimation under homogeneity

When subjects from different data sources are deemed to be similar, it may be reasonable to assert that the outcome distribution is common across them. This idea is formalized with the following structural assumption.

Assumption 2.4 (Common conditional outcome distribution [CCOD]).

$T \perp\!\!\!\perp Y \mid \boldsymbol{X}$.

Notably, Assumption 2.4 entails no restriction on the covariate distribution across sites. That is, any level of covariate shift is permitted. Under CCOD (i.e., Assumption 2.4), data from non-target source sites may be leveraged to improve the estimation of the target site quantile $r_0$. Our first result generalizes Theorem 2.3 to the multi-source setting under CCOD.

Theorem 2.5.

Under Assumptions 2.1, 2.2, and 2.4, the semiparametric efficient influence function (EIF) of $r_0 = r_0(\alpha)(\mathbb{P})$ is given by

$$\begin{aligned}
\dot{r}_0^{\mathrm{CCOD}}(\mathcal{O}; \mathbb{P})
&\propto I(T = 0)(1 - R)\left\{\overline{m}(r_0, \boldsymbol{X}) - (1 - \alpha)\right\} \\
&\quad + R\,\overline{\eta}(\boldsymbol{X})\,q_0(\boldsymbol{X})\left\{I(S(\boldsymbol{X}, Y) \leq r_0) - \overline{m}(r_0, \boldsymbol{X})\right\} \\
&\eqqcolon \varphi^{\mathrm{CCOD}}(\mathcal{O}; r_0, \overline{m}, \overline{\eta}, q_0),
\end{aligned} \tag{2}$$

where

$$\overline{m}(r, \boldsymbol{X}) = \mathbb{P}(S(\boldsymbol{X}, Y) \leq r \mid \boldsymbol{X}, R = 1)$$

is the global CDF of the conformal score,

$$\overline{\eta}(\boldsymbol{X}) = \frac{\mathbb{P}(R = 0 \mid \boldsymbol{X})}{\mathbb{P}(R = 1 \mid \boldsymbol{X})}$$

is the global missingness risk ratio, and

$$q_0(\boldsymbol{X}) = \mathbb{P}[T = 0 \mid \boldsymbol{X}, R = 0]$$

is the target-site propensity.

Compared to the nonparametric influence function of the $(1-\alpha)$-quantile of the conformal score in Theorem 2.3, which uses data from the target site only, the semiparametric EIF in Theorem 2.5 leverages data from all sites with observed outcomes $Y$. Under CCOD, we propose the estimator $\widehat{r}^{\mathrm{CCOD}}$, which solves $0 = \mathbb{P}_n\left[\varphi^{\mathrm{CCOD}}(\mathcal{O}; r, \widehat{\overline{m}}, \widehat{\overline{\eta}}, \widehat{q}_0)\right]$ for $r$. We perform cross-fitting, so that the nuisance estimators $(\widehat{\overline{m}}, \widehat{\overline{\eta}}, \widehat{q}_0)$ are fit on a data split that is independent of the split on which the estimating equation is evaluated.
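
The cross-fitting scheme can be sketched as follows in plain Python. This is an illustration under strong simplifying assumptions rather than the authors' implementation: the nuisance fits in the hypothetical helper `fit_nuisances` ignore covariates entirely (a pooled empirical CDF and two sample proportions), the record layout `(x, t, observed, score)` and the toy data are invented, and the root is taken as the smallest pooled observed score at which the summed estimating function becomes non-negative. In practice, flexible regression or machine learning estimators would replace the toy fits.

```python
import bisect
import math
import random


def fit_nuisances(train):
    """Toy nuisance fits that ignore covariates (an illustrative simplification)."""
    obs_scores = sorted(s for _, _, obs, s in train if obs)
    n_obs = len(obs_scores)
    n_mis = len(train) - n_obs
    n_mis_target = sum(1 for _, t, obs, _ in train if not obs and t == 0)
    eta_bar = n_mis / n_obs      # estimates P(R = 0) / P(R = 1)
    q0 = n_mis_target / n_mis    # estimates P(T = 0 | R = 0)

    def m_bar(r):                # pooled empirical CDF of observed scores
        return bisect.bisect_right(obs_scores, r) / n_obs

    return m_bar, eta_bar, q0


def phi_ccod_sum(r, fold, m_bar, eta_bar, q0, alpha):
    """Sum of phi^CCOD(O; r, m_bar, eta_bar, q0) over the records of one fold."""
    m = m_bar(r)
    total = 0.0
    for _, t, obs, s in fold:
        if obs:          # R = 1: observed outcomes from every site contribute
            total += eta_bar * q0 * ((1.0 if s <= r else 0.0) - m)
        elif t == 0:     # I(T = 0)(1 - R): missing outcomes in the target site
            total += m - (1.0 - alpha)
    return total


def solve_ccod(data, alpha, n_folds=2, seed=0):
    """Cross-fitted root: nuisances come from the folds not being evaluated."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [[data[i] for i in idx[k::n_folds]] for k in range(n_folds)]
    fits = []
    for k in range(n_folds):
        train = [rec for j in range(n_folds) if j != k for rec in folds[j]]
        fits.append(fit_nuisances(train))
    candidates = sorted(s for _, _, obs, s in data if obs)
    for r in candidates:
        total = sum(
            phi_ccod_sum(r, folds[k], *fits[k], alpha) for k in range(n_folds)
        )
        if total >= 0.0:
            return r
    return candidates[-1]


# Toy data (hypothetical): three sites sharing one Exp(1) score distribution,
# so CCOD holds and the target quantile is -log(alpha).
rng = random.Random(1)
data = []
for _ in range(2000):
    observed = rng.random() < 0.5
    score = rng.expovariate(1.0) if observed else None
    data.append((rng.random(), rng.randrange(3), observed, score))

r_ccod = solve_ccod(data, alpha=0.1)
print(r_ccod, -math.log(0.1))
```

Unlike the target-only estimating function, observed scores from every site enter the second term, which is where the efficiency gain under CCOD comes from.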
The following result demonstrates the marginal coverage properties of the conformal interval $\widehat{C}_{\alpha}^{\mathrm{CCOD}}(\boldsymbol{X}) = \{y \in \mathbb{R} : S(\boldsymbol{X}, y) \leq \widehat{r}^{\mathrm{CCOD}}\}$.

Theorem 2.6.

Let $D^n$ denote the training data with which $\widehat{r}^{\mathrm{CCOD}}$ is fit, and let $(\boldsymbol{X}, T, R)$ denote a new independent test point with associated outcome $Y$. Assume that $(\widehat{\overline{m}}, \widehat{\overline{\eta}}, \widehat{q}_0)$ are each uniformly bounded, and that $\widehat{\overline{m}}(\,\cdot\,, \boldsymbol{x})$ is a non-decreasing function, for each $\boldsymbol{x}$. Under Assumptions 2.1, 2.2, and 2.4,

$$\mathbb{P}[Y \in \widehat{C}_{\alpha}^{\mathrm{CCOD}}(\boldsymbol{X}) \mid T = 0, R = 0, D^n] = (1 - \alpha) + O_{\mathbb{P}}(n^{-1/2} + R_n),$$

where

$$R_n = \left\{\lVert \widehat{\overline{\eta}} - \overline{\eta} \rVert + \lVert \widehat{q}_0 - q_0 \rVert\right\} \sup_r \lVert \widehat{\overline{m}}(r, \cdot) - \overline{m}(r, \cdot) \rVert.$$

Here $\lVert f \rVert^2 = \mathbb{E}_{\mathbb{P}}(f(\mathcal{O})^2)$ is the squared $L_2(\mathbb{P})$ norm.

Theorem 2.6 says that, conditional on the training data, the proposed prediction interval attains nominal coverage at essentially parametric rates (some authors reserve the term parametric rate for an $o_{\mathbb{P}}(n^{-1/2})$ remainder), so long as the second-order asymptotic bias term $R_n$ converges to zero fast enough. The robustness of our estimator is made clear from inspecting this bias, and Theorem 2.6 supports flexible (i.e., non- or semi-parametric) estimators for component nuisance functions: $R_n = O_{\mathbb{P}}(n^{-1/2})$ will hold whenever $\overline{m}, \overline{\eta}, q_0$ are estimated at $O_{\mathbb{P}}(n^{-1/4})$ rates, which may be achievable under smoothness, sparsity, or other structural conditions. Since the bias term in our coverage error is of the order of the product of two errors, it can be substantially smaller than that of related work by Lei & Candès (2021) (which would use data from the target site only in this setting), whose bias is of the order of the minimum of two errors (Yang et al., 2024).

We note that the boundedness assumptions in Theorem 2.6 are standard, and that $\widehat{\overline{m}}(\,\cdot\,, \boldsymbol{x})$ should naturally be monotone, as it estimates the CDF $\overline{m}(\,\cdot\,, \boldsymbol{x})$.
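
As a short worked illustration of the product structure of $R_n$, the arithmetic below substitutes assumed nuisance rates into the definition in Theorem 2.6. The second line uses hypothetical asymmetric rates to show that a slowly estimated $\overline{m}$ can be offset by faster estimation of $\overline{\eta}$ and $q_0$.

```latex
\begin{align*}
R_n &= \big\{ O_{\mathbb{P}}(n^{-1/4}) + O_{\mathbb{P}}(n^{-1/4}) \big\} \cdot O_{\mathbb{P}}(n^{-1/4})
     = O_{\mathbb{P}}(n^{-1/2}), \\
R_n &= \big\{ O_{\mathbb{P}}(n^{-1/3}) + O_{\mathbb{P}}(n^{-1/3}) \big\} \cdot O_{\mathbb{P}}(n^{-1/6})
     = O_{\mathbb{P}}(n^{-1/2}).
\end{align*}
```

In both cases the coverage error in Theorem 2.6 is $O_{\mathbb{P}}(n^{-1/2})$, even though no individual nuisance function is estimated at the parametric rate.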

Remark 2.7.

Whereas the coverage guarantees for prediction intervals in Lei & Candès (2021) appear to hold only under a particular choice of conformal score (conditional quantile regression [CQR]), our methodology is not restricted by the choice of conformal score. To highlight the robustness of our procedure to the choice of conformal score, in the numerical experiments of Section 3, we evaluate three different conformal scores:

  • CQR score (see Lei & Candès (2021)).

  • Absolute residual (ASR): $S_{\textrm{ASR}}(x_i, y_i) = |y_i - \widehat{\mu}(x_i)|$, where $\widehat{\mu}(\cdot)$ is a regression model to estimate $\mu(x) = \mathbb{E}\{Y \mid X = x\}$.

  • Locally weighted ASR (Lei et al., 2018), defined by

    $$S_{\textrm{local ASR}}(x_i, y_i) = \frac{|y_i - \widehat{\mu}(x_i)|}{\widehat{\rho}(x_i)},$$

    where $\widehat{\rho}(x_i)$ is an estimate of the conditional mean absolute deviation (MAD), $\mathbb{E}\{|Y_i - \mu(X_i)| \mid X_i = x_i\}$, a function of $x_i$ fitted on $\mathcal{D}_{11}$.
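
For the two residual-based scores, the set $\{y : S(\boldsymbol{x}, y) \leq \widehat{r}\}$ can be inverted in closed form into an interval centered at the fitted mean. The sketch below shows this inversion in plain Python; the function names are hypothetical, and the fitted values `mu_hat_x` and `rho_hat_x` are assumed to be supplied by whatever regression models the analyst has fit.

```python
def asr_score(y, mu_hat_x):
    """Absolute residual score |y - mu_hat(x)|."""
    return abs(y - mu_hat_x)


def local_asr_score(y, mu_hat_x, rho_hat_x):
    """Locally weighted score |y - mu_hat(x)| / rho_hat(x), with rho_hat(x) > 0."""
    return abs(y - mu_hat_x) / rho_hat_x


def asr_interval(mu_hat_x, r_hat):
    """{y : |y - mu_hat(x)| <= r_hat}: constant half-width r_hat."""
    return (mu_hat_x - r_hat, mu_hat_x + r_hat)


def local_asr_interval(mu_hat_x, rho_hat_x, r_hat):
    """{y : |y - mu_hat(x)| / rho_hat(x) <= r_hat}: half-width scales with rho_hat(x)."""
    half_width = r_hat * rho_hat_x
    return (mu_hat_x - half_width, mu_hat_x + half_width)


# Illustrative numbers only: fitted mean 2.0, fitted MAD 2.0, threshold 0.5.
print(asr_interval(2.0, 0.5))
print(local_asr_interval(2.0, 2.0, 0.5))
```

The locally weighted score yields wider intervals where the estimated spread is larger, whereas the plain absolute residual gives the same width at every covariate value.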

2.3 Heterogeneous outcome distribution across sites

In practical settings, it will often be unreasonable to assume that the conditional outcome distribution is the same across all sites. In such cases, some source sites may provide relevant information for constructing target-site specific prediction intervals, whereas other sites may not. Concretely, the distribution of $Y$ given $(T = k, \boldsymbol{X})$ may be close to that in the target site $T = 0$ for some $k$, but not others. In this section, we present an approach that combines information from target and source sites in a data-adaptive manner. Our approach is also privacy-preserving, in that it involves only minimal data sharing of summary statistics across sites.

Our proposal is to construct a $(1-\alpha)$-quantile for the target site by taking a weighted average of estimated quantiles $(\widehat{r}_0, \widehat{r}_1, \dots, \widehat{r}_{K-1})$, where $\widehat{r}_k$ uses data from site $k$ for each $k$. We call the weights in the weighted average federated weights. In the following subsections, we describe how the site-specific quantiles are estimated, and how the federated weights are obtained.
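
The aggregation step itself is a simple weighted average, sketched below in plain Python. The function name `federated_quantile` and the example numbers are hypothetical, and the requirement that weights be non-negative and sum to one is an assumption of this sketch; how the federated weights are actually chosen is described later in the paper and is not reproduced here.

```python
def federated_quantile(site_quantiles, weights):
    """Weighted average of site-specific quantile estimates (r_0, ..., r_{K-1})."""
    if len(site_quantiles) != len(weights):
        raise ValueError("need exactly one weight per site")
    # Assumption of this sketch: weights are non-negative and sum to one.
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to one")
    return sum(w * r for w, r in zip(weights, site_quantiles))


# Hypothetical estimates: index 0 is the target site; the last source site is
# far from the target and receives a small weight.
site_quantiles = [2.1, 2.4, 3.5]
weights = [0.6, 0.3, 0.1]
r_fed = federated_quantile(site_quantiles, weights)
print(r_fed)
```

Only the scalar estimates $\widehat{r}_k$ (and whatever summaries the weights require) need to leave each site, which is consistent with the minimal data sharing described above.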

2.3.1 Target site

For the target site, we estimate $\widehat{r}_0$ nonparametrically as in Section 2.1. That is, we use the approach motivated by Theorem 2.3, and take $\widehat{r}_0$ that solves $\mathbb{P}_n\left[\varphi_0\left(\mathcal{O}; \widehat{r}_0, \widehat{m}_0, \widehat{\eta}_0\right)\right] = 0$, where $\varphi_0$ is the nonparametric influence function (2.3).

2.3.2 Source sites

To construct a target-site specific quantile estimate using data from site $k \in \{1, \ldots, K-1\}$, we make a working partial CCOD assumption that outcomes have the same conditional distribution in site $k$ as in the target site. Note that we use this working partial CCOD assumption only to derive the form of the influence function; to aggregate information from source sites, we derive federated weights to account for possible violations of CCOD (Section 2.3.3). An influence function under this assumption is derived in the following result.

Theorem 2.8.

Under Assumptions 2.1, 2.2, and the partial CCOD assumption $p(y \mid \boldsymbol{X}, T=k) \equiv p(y \mid \boldsymbol{X}, T=0)$, an influence function (IF) of $r_0$ is given by

$$
\begin{aligned}
\dot{r}_k(\mathcal{O}; \mathbb{P})
&\propto \frac{I(T=0, R=0)}{\mathbb{P}(T=0, R=0)} \left[m_0(r_0, \boldsymbol{X}) - (1-\alpha)\right] \\
&\quad + \frac{I(T=k, R=1)}{\mathbb{P}(T=k, R=1)} \, \omega_{k,0}(\boldsymbol{X}) \left[I(S(\boldsymbol{X}, Y) \leq r_0) - m_k(r_0, \boldsymbol{X})\right] \\
&\eqqcolon \varphi_k\left(\mathcal{O}; r_0, m_0, m_k, \omega_{k,0}\right),
\end{aligned}
$$

where $m_k(r, \boldsymbol{X}) = \mathbb{P}(S(\boldsymbol{X}, Y) \leq r \mid \boldsymbol{X}, T=k, R=1)$ is the conditional CDF of the conformal score in site $k$, and

$$
\omega_{k,0}(\boldsymbol{x}) = \frac{p(\boldsymbol{x} \mid T=0, R=0)}{p(\boldsymbol{x} \mid T=k, R=1)}
$$

is the density ratio of the covariates $\boldsymbol{X}$ in the target site relative to source site $k$.

Given some nuisance estimators $\widehat{m}_0$, $\widehat{m}_k$, and $\widehat{\omega}_{k,0}$, we take $\widehat{r}_k$ that solves

$$
\mathbb{P}_n\left[\varphi_k\left(\mathcal{O}; \widehat{r}_k, \widehat{m}_0, \widehat{m}_k, \widehat{\omega}_{k,0}\right)\right] = 0.
$$

By construction, the quantile estimate $\widehat{r}_k$ uses data from both site $k$ and the target site, but note that the principal need for data sharing comes from the estimation of the density ratio $\omega_{k,0}$. This can be done with the passing of only coarse summary statistics under flexible models (Han et al., 2021).
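As a rough illustration of how the estimating equation above might be solved in practice, the following Python sketch evaluates the empirical mean of the source-site influence function on a grid of candidate thresholds and returns the smallest one at which the mean is non-negative. The record layout (dictionary keys `T`, `R`, `x`, `score`), the function names, the use of empirical proportions in place of the two normalizing probabilities, and the grid search itself are assumptions made for illustration; this is not the authors' implementation.

```python
def phi_k_mean(r, records, m0_hat, mk_hat, omega_hat, alpha, k):
    """Empirical mean of phi_k at threshold r (hypothetical record layout)."""
    n = len(records)
    # Empirical stand-ins for P(T=0, R=0) and P(T=k, R=1).
    p_target = sum(o["T"] == 0 and o["R"] == 0 for o in records) / n
    p_source = sum(o["T"] == k and o["R"] == 1 for o in records) / n
    total = 0.0
    for o in records:
        if o["T"] == 0 and o["R"] == 0:
            # Target units with missing outcomes: outcome-model term.
            total += (m0_hat(r, o["x"]) - (1.0 - alpha)) / p_target
        elif o["T"] == k and o["R"] == 1:
            # Source-site units with observed outcomes: weighted residual term.
            indicator = 1.0 if o["score"] <= r else 0.0
            total += omega_hat(o["x"]) * (indicator - mk_hat(r, o["x"])) / p_source
    return total / n


def solve_r_k(records, m0_hat, mk_hat, omega_hat, alpha, k):
    """Smallest observed source score at which the empirical mean is >= 0."""
    candidates = sorted(o["score"] for o in records if o["T"] == k and o["R"] == 1)
    for r in candidates:
        if phi_k_mean(r, records, m0_hat, mk_hat, omega_hat, alpha, k) >= 0.0:
            return r
    return float("inf")  # no candidate reaches the target level


# Toy check: Uniform(0, 1) scores, no covariate shift, known score CDFs.
def uniform_cdf(r, x):
    return min(max(r, 0.0), 1.0)


toy = [{"T": 0, "R": 0, "x": 0.0, "score": None} for _ in range(5)]
toy += [{"T": 1, "R": 1, "x": 0.0, "score": 0.05 + 0.1 * i} for i in range(10)]
r_hat = solve_r_k(toy, uniform_cdf, uniform_cdf, lambda x: 1.0, alpha=0.15, k=1)
```

In this toy case the two score CDFs coincide and the density ratio is one, so the equation reduces to the empirical CDF of the source scores minus $(1-\alpha)$, and the sketch returns the corresponding empirical quantile of the source scores (about 0.85 here).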

2.3.3 Aggregation across sites

To aggregate information from the target and source sites, we first compute the discrepancy measures $\widehat{\chi}_k = |\widehat{r}_0 - \widehat{r}_k|$, then solve for federated weights $\widehat{\boldsymbol{w}} = (\widehat{w}_0, \widehat{w}_1, \dots, \widehat{w}_{K-1})$ that minimize the following loss:

$$
\begin{aligned}
Q(\boldsymbol{w}) &= \mathbb{P}_n\Bigg[\bigg\{\varphi_0(\mathcal{O}; \widehat{r}_0, \widehat{m}_0, \widehat{\eta}_0) - \sum_{k=1}^{K-1} w_k \varphi_k(\mathcal{O}; \widehat{r}_0, \widehat{m}_0, \widehat{m}_k, \widehat{\omega}_{k,0})\bigg\}^2\Bigg] \\
&\quad + \frac{1}{n} \lambda \sum_{k=1}^{K-1} \left|w_k\right| \widehat{\chi}_k^2,
\end{aligned}
\tag{3}
$$

subject to $0 \leq w_k \leq 1$ for all $k \in \{0, 1, \ldots, K-1\}$ and $\sum_{k=0}^{K-1} w_k = 1$, where $\lambda$ is a tuning parameter chosen by cross-validation. Heuristically, our approach anchors at the nonparametric estimate $\widehat{r}_0$ and gives weight to site $k$ when it is deemed similar enough to the target site (Han et al., 2021).
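To make the constrained minimization concrete, here is a minimal Python sketch that minimizes the penalized loss over the probability simplex by projected gradient descent. The solver choice, the fixed step size, the number of iterations, and all function and argument names are assumptions for illustration; the text does not specify how the optimization is carried out. The inputs are the influence-function values of the target site (`phi0`) and of each source site (`phi_sources`), all evaluated at $\widehat{r}_0$, together with the discrepancies $\widehat{\chi}_k$.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_k >= 0, sum_k w_k = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0.0:
            theta = t  # keeps the shift from the last index passing the test
    return [max(x - theta, 0.0) for x in v]


def federated_weights(phi0, phi_sources, chi, lam, steps=2000, lr=0.1):
    """Projected gradient descent on Q(w); returns (w_0, w_1, ..., w_{K-1})."""
    n = len(phi0)
    n_sources = len(phi_sources)
    w = [1.0] + [0.0] * n_sources  # start from the target-only solution
    for _ in range(steps):
        resid = [
            phi0[i] - sum(w[k + 1] * phi_sources[k][i] for k in range(n_sources))
            for i in range(n)
        ]
        grad = [0.0]  # Q depends on w_0 only through the sum-to-one constraint
        for k in range(n_sources):
            fit = -2.0 / n * sum(resid[i] * phi_sources[k][i] for i in range(n))
            penalty = lam / n * chi[k] ** 2  # derivative of |w_k| for w_k >= 0
            grad.append(fit + penalty)
        w = project_to_simplex([w[j] - lr * grad[j] for j in range(len(w))])
    return w


# Toy input: site 1 mimics the target exactly, site 2 is unrelated and discrepant.
phi0 = [1.0, -1.0, 1.0, -1.0]
phi_sources = [[1.0, -1.0, 1.0, -1.0],
               [1.0, 1.0, -1.0, -1.0]]
w_hat = federated_weights(phi0, phi_sources, chi=[0.0, 2.0], lam=1.0)
```

In the toy input the first source site reproduces the target influence function and has zero discrepancy, so essentially all weight moves to it, while the second site, which is penalized through its discrepancy, receives zero weight.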

Finally, we compute $\widehat{r}_{0,\mathrm{fed}}$ as the weighted average of the site-specific quantiles: $\widehat{r}_{0,\mathrm{fed}} = \sum_{k=0}^{K-1} \widehat{w}_k \widehat{r}_k$. The federated prediction interval is then defined as $\widehat{C}_\alpha^{\mathrm{fed}}(\boldsymbol{X}) = \{y \in \mathbb{R} : S(\boldsymbol{X}, y) \leq \widehat{r}_{0,\mathrm{fed}}\}$. In the following, we provide a coverage guarantee for the prediction interval based on an estimated quantile that is an arbitrary weighted combination of the relevant (i.e., oracle) source sites.
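The last two steps can be sketched in a few lines of Python. The example assumes, purely for illustration, an absolute-residual conformal score $S(\boldsymbol{x}, y) = |y - \widehat{\mu}(\boldsymbol{x})|$, for which the prediction set is an interval centered at the fitted value; the function names and the numbers are hypothetical.

```python
def federated_quantile(weights, quantiles):
    """Weighted average of site-specific quantile estimates."""
    return sum(w * r for w, r in zip(weights, quantiles))


def absolute_residual_interval(mu_hat_x, r_fed):
    """{y : |y - mu_hat(x)| <= r_fed} written as a (lower, upper) pair."""
    return (mu_hat_x - r_fed, mu_hat_x + r_fed)


# Hypothetical weights and quantiles for a target site and two source sites.
r_fed = federated_quantile([0.5, 0.5, 0.0], [1.0, 2.0, 10.0])
interval = absolute_residual_interval(3.0, r_fed)
```

With these inputs the third site carries zero weight, so its very different quantile estimate does not affect the federated threshold.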

Theorem 2.9 (Oracle coverage result).

Let

$$
\mathcal{S}^* = \{k \geq 1 : p(y \mid \boldsymbol{X}, T=k) \equiv p(y \mid \boldsymbol{X}, T=0)\},
$$

which may be empty, denote the source sites for which the partial CCOD assumption holds. Let $D^n$ denote the training data with which $\widehat{r}_{0,\mathrm{fed}}$ is fit, and let $(\boldsymbol{X}, T, R)$ denote a new independent test point with associated outcome $Y$. Assume that $(\widehat{\eta}_0, \widehat{m}_0)$ and $(\widehat{\omega}_{k,0}, \widehat{m}_k)$, for $k \in \mathcal{S}^*$, are each uniformly bounded, and that $\widehat{m}_k(\,\cdot\,, \boldsymbol{x})$ is a non-decreasing function for $k \in \{0\} \cup \mathcal{S}^*$, for each $\boldsymbol{x}$.
For any $w^* = (w_0, \ldots, w_{K-1})$ with $w_k \geq 0$, $\sum_{k=0}^{K-1} w_k = 1$, and satisfying $w_k = 0$ for $k \notin \{0\} \cup \mathcal{S}^*$, define

$$
\widehat{C}_\alpha^{w^*}(\boldsymbol{X}) = \left\{y \in \mathbb{R} : S(\boldsymbol{X}, y) \leq \sum_{k=0}^{K-1} w_k \widehat{r}_k\right\}.
$$

Then under Assumptions 2.1 and 2.2, and conditions (i)–(iii) of Lemma A.1,

$$
\mathbb{P}\left[Y \in \widehat{C}_\alpha^{w^*}(\boldsymbol{X}) \mid T=0, R=0, D^n\right] = (1-\alpha) + O_{\mathbb{P}}\left(n^{-1/2} + R_n^*\right),
$$

where

$$
\begin{aligned}
R_n^* &= w_0 \left\{\lVert \widehat{\eta}_0 - \eta_0 \rVert \cdot \sup_r \lVert \widehat{m}_0(r, \cdot) - m_0(r, \cdot) \rVert\right\} \\
&\quad + \sum_{k=1}^{K-1} w_k \bigg\{\lVert \widehat{\omega}_{k,0} - \omega_{k,0} \rVert \cdot \sup_r \lVert \widehat{m}_k(r, \cdot) - m_k(r, \cdot) \rVert \\
&\qquad\qquad\qquad + \sup_r \lVert \widehat{m}_k(r, \cdot) - \widehat{m}_0(r, \cdot) \rVert\bigg\}.
\end{aligned}
$$

Note that our penalization procedure in (3) is designed such that $w_k \to 0$ whenever $k \notin \mathcal{S}^*$, akin to adaptive Lasso (Zou, 2006) and trans-Lasso (Fan et al., 2024).
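As a quick sanity check on the remainder term, consider the weight vector that ignores all source sites. The display below is plain substitution into the statement of Theorem 2.9, not an additional result:

```latex
w^* = (1, 0, \ldots, 0)
\;\Longrightarrow\;
\widehat{C}_\alpha^{w^*}(\boldsymbol{X})
  = \{y \in \mathbb{R} : S(\boldsymbol{X}, y) \leq \widehat{r}_0\},
\qquad
R_n^* = \lVert \widehat{\eta}_0 - \eta_0 \rVert \cdot
        \sup_r \lVert \widehat{m}_0(r, \cdot) - m_0(r, \cdot) \rVert .
```

In this case the interval is the target-only one, and the remainder is a product of the two target-site nuisance errors.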

2.4 Estimation with data splitting

To construct target-site-specific prediction intervals for missing outcomes leveraging information from all sites, we follow the steps described in Algorithm 1. In brief, we randomly split the training data $\mathcal{D}$ into two equal-sized folds, $\mathcal{D}_1$ and $\mathcal{D}_2$. We train the models for the putative CDFs of the conformal score $m_k$, $k = 0, 1, \dots, K-1$, on $\mathcal{D}_{11}$. Likewise, we train the density ratio model $\omega_{k,0}$ on $\mathcal{D}_1$. We fit all nuisance functions using SuperLearner with random forest, elastic net, and generalized linear model (GLM) base learners. SuperLearner is a meta-learning algorithm that forms an optimal weighted average of the base learners and has been shown to be asymptotically as accurate as the best of its candidate learners (van der Laan et al., 2007). Density ratio models can accommodate flexible basis functions and higher-order terms to capture differences in features such as variance and skewness. One example we consider is the exponential tilt model, which recovers the entire class of natural exponential family distributions, including the normal distribution with mean shift, Bernoulli distribution for binary covariates, and more (Qin, 1998; Duan et al., 2020b). We then evaluate the trained models on $\mathcal{D}_2$ and plug the predicted values into the IFs given in Algorithm 1. Figure 1 provides a visualization of the procedure.
Full detail on influence function estimation is given in Algorithm 3.
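To illustrate the exponential tilt model mentioned above, the following Python sketch works through the normal mean-shift case with unit variance, where the density ratio has the closed form $\exp(\gamma_0 + \gamma_1 x)$ with $\gamma_1 = \mu_0 - \mu_k$ and $\gamma_0 = -(\mu_0^2 - \mu_k^2)/2$. The function names and the particular means are hypothetical, and in practice the tilt coefficients would be estimated rather than computed from known means.

```python
import math
from statistics import NormalDist


def normal_shift_tilt(mu_target, mu_source):
    """Tilt coefficients (gamma0, gamma1) for N(mu_target, 1) vs. N(mu_source, 1)."""
    gamma1 = mu_target - mu_source
    gamma0 = -(mu_target ** 2 - mu_source ** 2) / 2.0
    return gamma0, gamma1


def exponential_tilt_ratio(x, gamma0, gamma1):
    """Density ratio omega(x) = exp(gamma0 + gamma1 * x) with basis psi(x) = x."""
    return math.exp(gamma0 + gamma1 * x)


# Compare the tilt form against the ratio of the two normal densities.
gamma0, gamma1 = normal_shift_tilt(mu_target=0.0, mu_source=2.0)
target, source = NormalDist(0.0, 1.0), NormalDist(2.0, 1.0)
max_gap = max(
    abs(exponential_tilt_ratio(x, gamma0, gamma1) - target.pdf(x) / source.pdf(x))
    for x in (-1.0, 0.0, 0.5, 2.0, 3.0)
)
```

The two expressions agree up to floating-point error, which is the sense in which a linear tilt captures a pure mean shift; richer basis functions would be needed for differences in variance or skewness.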

Figure 1: Illustration of the proposed robust algorithm for multi-source conformal prediction. Each $\widehat{\theta}$ represented by a different color is the estimated $(1-\alpha)$-quantile of the conformal score using data from the site with the same color. $\widehat{m}_0$ (in red) is the estimated CDF of the conformal score using only the target site data. The other $\widehat{m}_k$ $(k \geq 1)$ are the estimated CDFs of the conformal scores from source sites, and $\widehat{\omega}_{k,0}$ $(k \geq 1)$ is the estimated density ratio between the target site and site $k$. The federated $\widehat{r}_{0,\mathrm{fed}}$ is a weighted average of the site-specific quantiles, with weights given by $\widehat{\boldsymbol{w}}$. The prediction interval $\widehat{C}_\alpha(\boldsymbol{x})$ is the set of outcomes $y$ such that the corresponding conformal scores $S(\boldsymbol{x}, y)$ in the target are below the threshold $\widehat{r}_{0,\mathrm{fed}}$.
Algorithm 1 Robust multi-source conformal prediction
1:  Input: Training data $\mathcal{D} = \{\mathcal{O}_i = (\boldsymbol{X}_i, T_i, R_i, R_i Y_i), i = 1, \dots, n\}$ from $K > 0$ sites, with the target site indexed by $T = 0$; desired coverage probability $1-\alpha$; estimators of nuisance functions $m_k(\theta, \boldsymbol{X})$, $\eta_0(\boldsymbol{X})$, and $\omega_{k,0}(\boldsymbol{X})$ for $k = 1, \dots, K-1$; a tuning parameter $\lambda$ (in the optimization step); a testing point $\boldsymbol{X} = \boldsymbol{x}$ from the target site.
2:  Output: A valid prediction set $\widehat{C}_\alpha(\boldsymbol{x})$.
3:  Split the training data $\mathcal{D}$ randomly into $\mathcal{D}_1$ and $\mathcal{D}_2$, where $\mathcal{D}_j = \{\mathcal{O}_i \in \mathcal{D}, i \in \mathcal{I}_j\}$ for $j = 1, 2$ and $\mathcal{I}_1 \cup \mathcal{I}_2 = \{1, 2, \dots, n\}$.
4:  Fit nuisance functions $\widehat{m}_k$, $\widehat{\eta}_0$, and $\widehat{\omega}_{k,0}$ using SuperLearner on $\mathcal{D}_1$ and evaluate them on $\mathcal{D}_2$.
5:  For the target site $k = 0$, find $\widehat{\theta} = \widehat{r}_0$ that solves $0 = \dfrac{1}{|\mathcal{I}_2|} \displaystyle\sum_{i \in \mathcal{I}_2} \varphi_0(\mathcal{O}_i; \widehat{\theta}, \widehat{m}_0, \widehat{\eta}_0)$.
6:  For source sites $k \geq 1$, find $\widehat{\theta} = \widehat{r}_k$ that solves $0 = \dfrac{1}{|\mathcal{I}_2|} \displaystyle\sum_{i \in \mathcal{I}_2} \varphi_k(\mathcal{O}_i; \widehat{\theta}, \widehat{m}_0, \widehat{m}_k, \widehat{\omega}_{k,0})$. Compute $\widehat{\chi}_k = |\widehat{r}_0 - \widehat{r}_k|$.
7:  Solve for aggregation weights $\widehat{\boldsymbol{w}} = (\widehat{w}_0, \widehat{w}_1, \dots, \widehat{w}_{K-1})$ that minimize $Q(\boldsymbol{w})$ subject to $0 \leq w_k \leq 1$ and $\displaystyle\sum_{k=0}^{K-1} w_k = 1$.
8:  Compute $\widehat{\theta} = \widehat{r}_{0,\mathrm{fed}} = \displaystyle\sum_{k=0}^{K-1} \widehat{w}_k \widehat{r}_k$.
9:  Return: The prediction set $\widehat{C}_\alpha(\boldsymbol{x}) = \{y : S(\boldsymbol{x}, y) \leq \widehat{r}_{0,\mathrm{fed}}\}$.
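The random split in step 3 of Algorithm 1 can be sketched as follows. The function name, the fixed seed, and the choice to return sorted index lists are assumptions for illustration only.

```python
import random


def split_indices(n, seed=0):
    """Randomly partition {0, ..., n-1} into two folds of (nearly) equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local generator, reproducible split
    half = n // 2
    return sorted(indices[:half]), sorted(indices[half:])


# Nuisance models would be fit on fold_1 and evaluated on fold_2.
fold_1, fold_2 = split_indices(10)
```

Keeping the two folds disjoint is what allows the nuisance estimates to be treated as fixed functions when the estimating equations are solved on the second fold.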

3 Numerical Experiments

In this section, we evaluate our proposed method through extensive Monte Carlo simulations, examining marginal coverage, conditional coverage, and the width of the prediction interval. In each experiment, we compare the prediction intervals $\widehat{C}_\alpha(x)$ constructed by our proposed federated method against (i) the nonparametric efficient method described in Yang et al. (2024), which uses data from the target site only and ignores external source data (target only), and (ii) the method that assumes CCOD holds across sites (pooled sample). In Appendix B, we describe three other methods for learning the federated weights $\widehat{\boldsymbol{w}}$ and provide complete simulation results (see details in Appendix B.2).

In total, we consider $3$ sample sizes $(300, 1000, 3000)$ $\times$ $3$ levels of covariate shift (homogeneous, weakly heterogeneous, strongly heterogeneous) $\times$ $2$ types of outcome errors (homoskedastic, heteroskedastic) $\times$ $3$ levels of concept shift (CCOD holds, weak violation, strong violation) $\times$ $3$ different conformal scores (ASR, locally weighted ASR, CQR) $= 162$ scenarios for our proposed method and the five competitor methods.
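The scenario count can be checked by enumerating the full factorial design. The sketch below simply restates the factors listed above; the variable names are chosen for illustration.

```python
from itertools import product

sample_sizes = (300, 1000, 3000)
covariate_shift = ("homogeneous", "weakly heterogeneous", "strongly heterogeneous")
outcome_errors = ("homoskedastic", "heteroskedastic")
concept_shift = ("CCOD holds", "weak violation", "strong violation")
conformal_scores = ("ASR", "locally weighted ASR", "CQR")

# One tuple per simulation scenario: 3 * 3 * 2 * 3 * 3 combinations.
scenarios = list(
    product(sample_sizes, covariate_shift, outcome_errors, concept_shift, conformal_scores)
)
n_scenarios = len(scenarios)
```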

3.1 Data generating process

We generate data from $K = 5$ sites, where site 0 is the target site and sites 1 through 4 are source sites, and $T_i \in \{0, \dots, 4\}$ denotes the site of subject $i$. Our goal is to construct valid prediction intervals for a testing point from the target site. We consider the sample size in each site to be $n_k \in \{300, 1000, 3000\}$, $k = 0, \dots, 4$, and generate data over $M = 500$ independent Monte Carlo replications. We consider three site-specific covariate data generation scenarios:

  • Homogeneous covariate distributions: $X_i = \Phi(X_i^*)$ for all sites, where $X_i^* \sim \mathcal{N}(0,1)$ and $\Phi(\cdot)$ is the CDF of the standard normal distribution.

  • Weakly heterogeneous covariate distributions: $X_i^* \mid T_i \in \{0,1\} \sim \mathcal{N}(0,1)$, $X_i^* \mid T_i = 2 \sim \mathcal{N}(2,1)$, $X_i^* \mid T_i = 3 \sim \mathcal{N}(2,4)$, $X_i^* \mid T_i = 4 \sim \mathcal{N}(3,1)$, and $X_i = \Phi(X_i^*)$.

  • Strongly heterogeneous covariate distributions: $X_i^* \mid T_i = 0 \sim \mathcal{N}(0,1)$, $X_i^* \mid T_i = 1 \sim \mathcal{N}(1,1)$, $X_i^* \mid T_i = 2 \sim \mathcal{N}(2,4)$, $X_i^* \mid T_i = 3 \sim \mathcal{N}(3,1)$, $X_i^* \mid T_i = 4 \sim \mathcal{N}(4,4)$, and $X_i = \Phi(X_i^*)$.
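The three covariate scenarios above can be sketched as follows. This is an illustrative implementation using only the Python standard library, not the authors' code. It assumes that the second argument of $\mathcal{N}(\cdot,\cdot)$ is a variance (the text does not say whether it is a variance or a standard deviation), and the names `COVARIATE_PARAMS` and `draw_covariates` are hypothetical.

```python
import math
import random

# (mean, variance) of X* for sites 0..4 in each scenario.
# Assumption: the second argument of N(., .) in the text is a variance.
COVARIATE_PARAMS = {
    "homogeneous": [(0, 1)] * 5,
    "weak": [(0, 1), (0, 1), (2, 1), (2, 4), (3, 1)],
    "strong": [(0, 1), (1, 1), (2, 4), (3, 1), (4, 4)],
}


def std_normal_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def draw_covariates(scenario, site, n, rng):
    """Draw n covariates X = Phi(X*) for one site under one scenario."""
    mean, var = COVARIATE_PARAMS[scenario][site]
    sd = math.sqrt(var)
    return [std_normal_cdf(rng.gauss(mean, sd)) for _ in range(n)]


rng = random.Random(0)
x_target = draw_covariates("strong", 0, 2000, rng)  # target site
x_site4 = draw_covariates("strong", 4, 2000, rng)   # most shifted source site
```

In the target site, $X_i^*$ is standard normal, so $X_i$ is uniform on $(0,1)$. In shifted source sites the draws concentrate near 1, which is the covariate shift the scenarios are designed to induce.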

For each scenario, we generate the propensity score of observing the outcome, i.e., $e(X_i) = P(R_i = 1 \mid X_i)$, from a logistic regression model, where

$$e(X_i) = \{1 + \exp(-0.1 + 0.5 X_i - 0.1 X_i^2)\}^{-1},$$

ensuring that the true propensity score lies in $(0.4, 0.6)$ to avoid positivity violations. We include additional simulation results where the true propensity score lies in the wider range $(0.1, 0.9)$. We generate $R_i \in \{0,1\}$ from $\text{Bern}(e(X_i))$ so that outcomes are MAR.
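The propensity model and the missingness indicator can be sketched as below. This is a minimal illustration of the stated formula, not the authors' implementation; the function names are hypothetical, and the grid is used only to check the stated range numerically.

```python
import math
import random


def propensity(x):
    """e(x) = {1 + exp(-0.1 + 0.5 x - 0.1 x^2)}^{-1}, as given in the text."""
    return 1.0 / (1.0 + math.exp(-0.1 + 0.5 * x - 0.1 * x ** 2))


def draw_missingness(xs, rng):
    """R_i ~ Bernoulli(e(X_i)); R_i = 1 means the outcome is observed."""
    return [1 if rng.random() < propensity(x) else 0 for x in xs]


# Evaluate e(x) on a grid over [0, 1], the support of X = Phi(X*).
grid = [i / 1000 for i in range(1001)]
e_values = [propensity(x) for x in grid]
e_min, e_max = min(e_values), max(e_values)

rng = random.Random(1)
r = draw_missingness(grid, rng)
```

On $[0,1]$ the exponent $-0.1 + 0.5x - 0.1x^2$ increases from $-0.1$ to $0.3$, so $e(x)$ decreases from about $0.52$ to about $0.43$, which is inside the stated range $(0.4, 0.6)$.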

The outcomes $Y_i$ are generated by

$$Y_i = 5X_i + X_i^2 + \delta(T_i, X_i) + \varepsilon(X_i), \tag{4}$$

where $\varepsilon(x) \sim N(0, \sigma(x)^2)$. We consider two types of errors: (i) $\sigma(x) = 1$ for homoscedastic errors and (ii) $\sigma(x) = -\log(x)$ for heteroscedastic errors. In both cases, the oracle width of a $90\%$ prediction interval for the outcome is $2 \times z_{0.95}\,\mathbb{E}\{\sigma(X_i)\} \approx 3.29$, where $z_{0.95} = 1.645$ is the 95th percentile of the standard normal distribution. This follows because $X_i = \Phi(X_i^*)$ is uniform on $(0,1)$ in the target site, so that $\mathbb{E}\{\sigma(X_i)\} = \int_0^1 \sigma(x)\,dx = 1$ for both $\sigma(x) = 1$ and $\sigma(x) = -\log(x)$.
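The oracle width can be verified numerically. The sketch below approximates $\int_0^1 -\log(x)\,dx$ with a midpoint rule and computes $2 z_{0.95}\,\mathbb{E}\{\sigma(X_i)\}$ for both error types. The grid size and function name are arbitrary choices made for this illustration.

```python
import math
from statistics import NormalDist

# 95th percentile of the standard normal distribution.
z95 = NormalDist().inv_cdf(0.95)


def mean_sigma_hetero(n_grid=200_000):
    """Midpoint-rule approximation of the integral of -log(x) over (0, 1)."""
    h = 1.0 / n_grid
    return sum(-math.log((i + 0.5) * h) for i in range(n_grid)) * h


mean_sigma_homo = 1.0  # sigma(x) = 1 for every x
oracle_width_homo = 2 * z95 * mean_sigma_homo
oracle_width_hetero = 2 * z95 * mean_sigma_hetero()
```

Both widths agree with the value $3.29$ reported above to two decimal places, since $\int_0^1 -\log(x)\,dx = [x - x\log x]_0^1 = 1$.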

We also consider varying levels of concept shift, corresponding to three cases for $\delta(T_i, X_i)$:

  • CCOD holds: $\delta(T_i, X_i) = 0$, a constant;

  • Weak violation of CCOD: $\delta(T_i, X_i) = 7 I(T_i \neq 0)$;

  • Strong violation of CCOD: $\delta(T_i, X_i) = 20 I(T_i \neq 0)$.
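Putting the outcome model (4) together with the three concept shift cases gives the following sketch. It is an illustration under the definitions above rather than the authors' simulation code; the dictionary keys and function names are hypothetical, and the covariate is fixed at $x = 0.5$ purely to make the site-level shift visible.

```python
import math
import random

# Size of the shift applied to every source site (T != 0) in each case.
DELTA_SHIFT = {"ccod_holds": 0.0, "weak_violation": 7.0, "strong_violation": 20.0}


def delta(site, concept_shift):
    """delta(T, X) = c * I(T != 0); it does not depend on X in these scenarios."""
    return DELTA_SHIFT[concept_shift] if site != 0 else 0.0


def sigma(x, heteroscedastic):
    """Error standard deviation: -log(x) if heteroscedastic, else 1."""
    return -math.log(x) if heteroscedastic else 1.0


def draw_outcome(x, site, concept_shift, heteroscedastic, rng):
    """One draw of Y = 5 X + X^2 + delta(T, X) + eps(X), following equation (4)."""
    eps = rng.gauss(0.0, sigma(x, heteroscedastic))
    return 5 * x + x ** 2 + delta(site, concept_shift) + eps


rng = random.Random(2)
n = 4000
y_target = [draw_outcome(0.5, 0, "strong_violation", False, rng) for _ in range(n)]
y_source = [draw_outcome(0.5, 1, "strong_violation", False, rng) for _ in range(n)]
mean_target = sum(y_target) / n
mean_source = sum(y_source) / n
```

At $x = 0.5$ the conditional mean in the target site is $5(0.5) + 0.5^2 = 2.75$, while every source site is shifted upward by 20 under strong violation. This gap in the conditional outcome distribution is what the pooled sample method ignores.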

3.2 Results

We report the simulation results for $n_k = 3000$, $k = 0, \ldots, 4$, under strongly heterogeneous covariate distributions and strong violation of CCOD in Figure 2. Complete numerical results for all sample sizes, covariate shifts, outcome errors, and concept shifts can be found in Appendix B.2.

Figure 2 summarizes results for (A) marginal coverage, (B) prediction interval width, (C) conditional coverage, and (D) weights as a function of the discrepancy $\chi^2_k = (\widehat{r}_0 - \widehat{r}_k)^2$ over $500$ replications. Compared to the target only method, our federated method achieves nominal marginal coverage with less variability, shorter prediction interval widths that are close to the oracle interval width (red dashed line), relatively good conditional coverage, and informative weights that indicate how the source site quantiles $\widehat{r}_k$ are weighted as a function of their discrepancy from the target site quantile $\widehat{r}_0$. The pooled sample method performs poorly for ASR, with overly conservative marginal coverage, interval widths that are on average five times longer than those of our federated method, and conservative conditional coverage. The performance for local ASR is also poor, with below-nominal marginal and conditional coverage. The conditional coverage plots indicate that (1) ASR is not robust, which is consistent with the findings in Lei et al. (2018); and (2) both CQR and local ASR perform better in terms of local coverage, with the target only and federated methods performing similarly and attaining the nominal $0.9$ coverage level for many values of $X$.
Full conditional coverage plots for all cases are provided in the Appendix (see Figure 11).
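For readers reproducing panels (A) and (B), the two summary metrics can be computed per replication as sketched below. This is a generic illustration, not the authors' evaluation code. The function names are hypothetical, and the four test points and intervals are toy values rather than simulation output.

```python
def marginal_coverage(y_true, lower, upper):
    """Fraction of test outcomes that fall inside their prediction interval."""
    covered = [lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper)]
    return sum(covered) / len(covered)


def mean_width(lower, upper):
    """Average length of the prediction intervals."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


# Toy example (hypothetical numbers): four test points and their intervals.
y = [1.0, 2.0, 3.0, 4.0]
lo = [0.5, 2.5, 2.0, 3.0]
hi = [1.5, 3.5, 4.0, 5.0]

cov = marginal_coverage(y, lo, hi)  # second point is missed, so 3 of 4 are covered
wid = mean_width(lo, hi)            # widths are 1, 1, 2, 2
```

In a simulation study these quantities would be computed on held-out target-site test points in each replication and then summarized across replications.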

Figure 2: A: Marginal coverage, B: Prediction interval width, C: Conditional coverage, and D: Weights for our proposed federated method compared to the pooled sample and target only methods, where the sample size is $n_k = 3000$, $k = 0, \ldots, 4$, under strongly heterogeneous covariate distributions and strong violation of CCOD.

4 Data Application

Congenital heart defects (CHD) are the most prevalent birth defects in the United States, and over 40,000 surgeries for CHD are performed each year (Pasquali et al., 2016). Prolonged hospital length of stay (LOS) post-surgery places a significant financial burden on families and health care systems and is associated with postoperative morbidity. Moreover, LOS varies geographically, likely due to practice and patient heterogeneity. We utilize data from the Society of Thoracic Surgeons’ Congenital Heart Surgery Database (STS-CHSD), which includes audited preoperative, intraoperative, and early postoperative information (Overman et al., 2019) from U.S. congenital heart surgery centers. We identified all Norwood surgeries, which are palliative surgeries for patients with CHD, occurring between January 2016 and June 2022. We used the index operative encounter during a given admission as the unit of observation. There were a total of 3,457 observations, with a median LOS of 40 days (min: 2, max: 183) and 752 (21.2%) missing values for LOS.

Our goal is to provide prediction intervals for LOS for patients in target sites with missing values of LOS. The target site is defined to be one of four mutually exclusive geographic regions according to the U.S. Census Bureau: (i) South, (ii) Midwest, (iii) West, and (iv) Northeast (United States Census Bureau, 2020). We included as confounders demographic factors (e.g., age, race/ethnicity, sex, birthweight, and birth height), genetic syndromes, chromosomal abnormalities, non-cardiac anomalies, pre-operative factors, and a variety of Norwood procedure-specific factors found in the STS-CHSD (Tabbutt et al., 2012). While the MAR assumption is not testable, it is more likely to be valid in settings such as ours, where a rich set of potential confounders is measured prospectively.

Figure 3 displays the prediction intervals for hospital LOS following a Norwood procedure for four randomly selected individuals, one in each region, across $\alpha \in \{0.1, 0.2, 0.3, 0.4, 0.5\}$ and conformal scores in {ASR, local ASR, CQR}. For example, using our proposed method with CQR as the conformal score for patient B in the Midwest region, the LOS is between $24.3$ and $39.9$ days with at least $50\%$ probability ($\alpha = 0.5$). Our method generally produces tighter prediction intervals than the target only method of Yang et al. (2024), and the advantage can be practically informative. For example, using local ASR for patient C in the South region, the $80\%$ prediction interval is over $30$ days shorter using our method than using the target only method. The pooled sample method performs similarly to our federated method, suggesting that data-adaptive inference may be nearly as efficient as inference under full CCOD in this data application.

Figure 3: Each panel shows the prediction intervals for hospital LOS for a randomly selected individual following a Norwood procedure across $\alpha \in \{0.1, 0.2, 0.3, 0.4, 0.5\}$ and conformal scores in {ASR, local ASR, CQR} for A: a patient in the South, B: a patient in the Midwest, C: a patient in the West, D: a patient in the Northeast.

5 Discussion

We proposed a data-driven and distribution-free prediction method to obtain valid prediction intervals for missing outcome data in a target site while exploiting information from multiple potentially heterogeneous sites due to distribution shifts. Our proposal shares the marginal coverage properties of conformal prediction methods and builds on modern semiparametric efficiency theory and federated learning for more robust and efficient uncertainty quantification. When subjects from different data sources are similar, such that one may be willing to assert that the outcome distributions are shared, we derive the efficient influence function leveraging all data sources. In some practical settings, it would be unreasonable to assume that the conditional outcome distribution is the same across sites, i.e., some source sites may provide relevant information for constructing prediction intervals for the target site, whereas other sites may not. In such scenarios, we present a novel approach that combines information from target and source sites in a data-adaptive manner.

Among the three types of conformal scores that we studied, we provide the following recommendations for practitioners. When the sample size is small, e.g., 300 or fewer, we suggest using local ASR, which is more robust against heteroscedasticity compared to ASR and more efficient than CQR, which on average requires larger sample sizes to attain nominal coverage. When sample sizes are larger, CQR provides coverage probabilities close to the nominal level.

An interesting line of future research concerns the development of covariate-adaptive ensemble weights for aggregating information from multiple sources of data. We conjecture that covariate-adaptive methods could produce prediction intervals that are as efficient as an oracle with knowledge of the optimal prediction interval, although we leave this for future work. Another direction for development is to formalize the framework through a sensitivity analysis approach when the CCOD assumption is violated. There are multiple options for sensitivity analysis, e.g., those building on the Rosenbaum selection model, such as Jin et al. (2023b) and Yin et al. (2024), or those using a sensitivity parameter that encodes a hypothetical departure from the CCOD assumption via a semiparametric approach (Robins et al., 2000). A key challenge in this case would be the estimation of nuisance functions.

6 Impact Statement

This paper presents work whose goal is to advance the field of conformal prediction and its applications to precision medicine. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here.

Software and Data

We provide a user-friendly R function MuSCI() implementing the proposed method with an illustrative example, available at: https://github.com/yiliu1998/Multi-Source-Conformal.

Acknowledgements

This work was supported, in part, by Grant HL5R01HL162893 from the National Heart, Lung, and Blood Institute from the US National Institutes of Health. The data analyzed in this study were provided to the investigators through The Society of Thoracic Surgeons’ Task Force on Funded Research Program. The authors thank Sara Pasquali, Meena Nathan, John Mayer, Jr., and Katya Zelevinsky for helpful discussions related to clinical features of CHD and surgical quality.

References

  • Barber et al. (2023) Barber, R. F., Candes, E. J., Ramdas, A., and Tibshirani, R. J. Conformal prediction beyond exchangeability. The Annals of Statistics, 51(2):816–845, 2023.
  • Bickel et al. (1993) Bickel, P., Klaassen, C., Ritov, Y., and Wellner, J. Efficient and adaptive estimation for semiparametric models. Johns Hopkins University Press Baltimore, 1993.
  • Cai et al. (2023) Cai, T. T., Namkoong, H., Yadlowsky, S., et al. Diagnosing model performance under distribution shift. arXiv preprint arXiv:2303.02011, 2023.
  • Duan et al. (2020a) Duan, R., Boland, M. R., Liu, Z., Liu, Y., Chang, H. H., Xu, H., Chu, H., Schmid, C. H., Forrest, C. B., Holmes, J. H., et al. Learning from electronic health records across multiple sites: A communication-efficient and privacy-preserving distributed algorithm. Journal of the American Medical Informatics Association, 27(3):376–385, 2020a.
  • Duan et al. (2020b) Duan, R., Ning, Y., Wang, S., Lindsay, B. G., Carroll, R. J., and Chen, Y. A fast score test for generalized mixture models. Biometrics, 76(3):811–820, 2020b.
  • Dunn et al. (2023) Dunn, R., Wasserman, L., and Ramdas, A. Distribution-free prediction sets for two-layer hierarchical models. Journal of the American Statistical Association, 118(544):2491–2502, 2023.
  • Fan et al. (2024) Fan, X., Cheng, J., Wang, H., Zhang, B., and Chen, Z. A fast trans-lasso algorithm with penalized weighted score function. Computational Statistics & Data Analysis, 192:107899, 2024.
  • Guo et al. (2023) Guo, Z., Li, X., Han, L., and Cai, T. Robust inference for federated meta-learning. arXiv preprint arXiv:2301.00718, 2023.
  • Han et al. (2021) Han, L., Hou, J., Cho, K., Duan, R., and Cai, T. Federated adaptive causal estimation (face) of target treatment effects. arXiv preprint arXiv:2112.09313, 2021.
  • Han et al. (2023) Han, L., Shen, Z., and Zubizarreta, J. Multiply robust federated estimation of targeted average treatment effects. Advances in Neural Information Processing Systems, 36:70453–70482, 2023.
  • Han et al. (2024) Han, L., Li, Y., Niknam, B., and Zubizarreta, J. R. Privacy-preserving, communication-efficient, and target-flexible hospital quality measurement. The Annals of Applied Statistics, 18(2):1337–1359, 2024.
  • Humbert et al. (2023) Humbert, P., Le Bars, B., Bellet, A., and Arlot, S. One-shot federated conformal prediction. In International Conference on Machine Learning, pp. 14153–14177. PMLR, 2023.
  • Jin et al. (2023a) Jin, Y., Guo, K., and Rothenhausler, D. Diagnosing the role of observable distribution shift in scientific replications. arXiv preprint arXiv:2309.01056, 2023a.
  • Jin et al. (2023b) Jin, Y., Ren, Z., and Candès, E. J. Sensitivity analysis of individual treatment effects: A robust conformal inference approach. Proceedings of the National Academy of Sciences, 120(6):e2214889120, 2023b.
  • Kennedy et al. (2023) Kennedy, E., Balakrishnan, S., and Wasserman, L. Semiparametric counterfactual density estimation. Biometrika, pp.  asad017, 2023.
  • Lee et al. (2023) Lee, Y., Barber, R. F., and Willett, R. Distribution-free inference with hierarchical data. arXiv preprint arXiv:2306.06342, 2023.
  • Lei et al. (2018) Lei, J., G’Sell, M., Rinaldo, A., Tibshirani, R. J., and Wasserman, L. Distribution-free predictive inference for regression. Journal of the American Statistical Association, 113(523):1094–1111, 2018.
  • Lei & Candès (2021) Lei, L. and Candès, E. J. Conformal inference of counterfactuals and individual treatment effects. Journal of the Royal Statistical Society Series B: Statistical Methodology, 83(5):911–938, 2021.
  • Li et al. (2023) Li, S., Cai, T., and Duan, R. Targeting underrepresented populations in precision medicine: A federated transfer learning approach. The Annals of Applied Statistics, 17(4):2970–2992, 2023.
  • Lu et al. (2023) Lu, C., Yu, Y., Karimireddy, S. P., Jordan, M., and Raskar, R. Federated conformal predictors for distributed uncertainty quantification. In International Conference on Machine Learning, pp. 22942–22964. PMLR, 2023.
  • Overman et al. (2019) Overman, D. M., Jacobs, M. L., O’Brien Jr, J. E., Kumar, S. R., Mayer Jr, J. E., Ebel, A., Clarke, D. R., and Jacobs, J. P. Ten years of data verification: the society of thoracic surgeons congenital heart surgery database audits. World Journal for Pediatric and Congenital Heart Surgery, 10(4):454–463, 2019.
  • Pasquali et al. (2016) Pasquali, S. K., Wallace, A. S., Gaynor, J. W., Jacobs, M. L., O’Brien, S. M., Hill, K. D., Gaies, M. G., Romano, J. C., Shahian, D. M., Mayer, J. E., et al. Congenital heart surgery case mix across north american centers and impact on performance assessment. The Annals of thoracic surgery, 102(5):1580–1587, 2016.
  • Peters et al. (2016) Peters, J., Bühlmann, P., and Meinshausen, N. Causal inference by using invariant prediction: identification and confidence intervals. Journal of the Royal Statistical Society Series B: Statistical Methodology, 78(5):947–1012, 2016.
  • Plassier et al. (2023) Plassier, V., Makni, M., Rubashevskii, A., Moulines, E., and Panov, M. Conformal prediction for federated uncertainty quantification under label shift. In International Conference on Machine Learning, pp. 27907–27947. PMLR, 2023.
  • Podkopaev & Ramdas (2021) Podkopaev, A. and Ramdas, A. Distribution-free uncertainty quantification for classification under label shift. In Uncertainty in Artificial Intelligence, pp.  844–853. PMLR, 2021.
  • Qin (1998) Qin, J. Inferences for case-control and semiparametric two-sample density ratio models. Biometrika, 85(3):619–630, 1998.
  • Robins et al. (2000) Robins, J. M., Rotnitzky, A., and Scharfstein, D. O. Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models. In Statistical models in epidemiology, the environment, and clinical trials, pp.  1–94. Springer, 2000.
  • Rotnitzky & Smucler (2020) Rotnitzky, A. and Smucler, E. Efficient adjustment sets for population average causal treatment effect estimation in graphical models. Journal of Machine Learning Research, 21:1–86, 2020.
  • Tabbutt et al. (2012) Tabbutt, S., Ghanayem, N., Ravishankar, C., Sleeper, L. A., Cooper, D. S., Frank, D. U., Lu, M., Pizarro, C., Frommelt, P., Goldberg, C. S., et al. Risk factors for hospital morbidity and mortality after the norwood procedure: a report from the pediatric heart network single ventricle reconstruction trial. The Journal of thoracic and cardiovascular surgery, 144(4):882–895, 2012.
  • Tibshirani et al. (2019) Tibshirani, R. J., Foygel Barber, R., Candes, E., and Ramdas, A. Conformal prediction under covariate shift. Advances in neural information processing systems, 32, 2019.
  • United States Census Bureau (2020) United States Census Bureau, G. D. Census regions and divisions of the united states, 2020. URL https://www2.census.gov/geo/pdfs/maps-data/maps/reference/us_regdiv.pdf.
  • van der Laan et al. (2007) van der Laan, M. J., Polley, E. C., and Hubbard, A. E. Super learner. Statistical applications in genetics and molecular biology, 6(1), 2007.
  • van der Vaart (2002) van der Vaart, A. Semiparametric statistics. In Lectures on probability theory and statistics (Saint-Flour, 1999), pp.  331–457. Springer, 2002.
  • Vo et al. (2022a) Vo, T. V., Bhattacharyya, A., Lee, Y., and Leong, T.-Y. An adaptive kernel approach to federated learning of heterogeneous causal effects. Advances in Neural Information Processing Systems, 35:24459–24473, 2022a.
  • Vo et al. (2022b) Vo, T. V., Lee, Y., Hoang, T. N., and Leong, T.-Y. Bayesian federated estimation of causal effects from observational data. In Uncertainty in Artificial Intelligence, pp.  2024–2034. PMLR, 2022b.
  • Vovk et al. (2005) Vovk, V., Gammerman, A., and Shafer, G. Algorithmic learning in a random world, volume 29. Springer, 2005.
  • Vovk et al. (2009) Vovk, V., Nouretdinov, I., and Gammerman, A. On-line predictive linear regression. The Annals of Statistics, pp.  1566–1590, 2009.
  • Xiong et al. (2023) Xiong, R., Koenecke, A., Powell, M., Shen, Z., Vogelstein, J. T., and Athey, S. Federated causal inference in heterogeneous observational data. Statistics in Medicine, 42(24):4418–4439, 2023.
  • Yang et al. (2024) Yang, Y., Kuchibhotla, A. K., and Tchetgen Tchetgen, E. Doubly robust calibration of prediction sets under covariate shift. Journal of the Royal Statistical Society Series B: Statistical Methodology, pp.  qkae009, 2024.
  • Yin et al. (2024) Yin, M., Shi, C., Wang, Y., and Blei, D. M. Conformal sensitivity analysis for individual treatment effects. Journal of the American Statistical Association, 119(545):122–135, 2024.
  • Zou (2006) Zou, H. The adaptive lasso and its oracle properties. Journal of the American statistical association, 101(476):1418–1429, 2006.

Appendix A Technical Details

A.1 Proof of Theorem 2.5

Recall from Bickel et al. (1993) and van der Vaart (2002) that an influence function $\dot{\chi}(\mathcal{O};\mathbb{P})$ of a functional $\chi(\mathbb{P})$ is a mean-zero, finite-variance function satisfying the following criterion:

$$\left.\frac{d}{d\epsilon}\chi(\mathbb{P}_{\epsilon})\right|_{\epsilon=0} = \mathbb{E}_{\mathbb{P}}\left(\dot{\chi}(\mathcal{O};\mathbb{P})\,u(\mathcal{O})\right),$$

for any regular parametric submodel $\{\mathbb{P}_{\epsilon} : \epsilon \in [0,1)\}$ such that $\mathbb{P}_0 \equiv \mathbb{P}$ with score function $u(\mathcal{O}) = \left.\frac{d}{d\epsilon}\log d\mathbb{P}_{\epsilon}\right|_{\epsilon=0}$. The semiparametric efficient influence function is the unique such function belonging to the tangent space, $\Lambda_{\mathbb{P}}$, which is the closure of the linear span of all scores of regular parametric submodels through $\mathbb{P}$. To find an influence function, we take such a generic submodel and differentiate an identifying estimating equation with respect to $\epsilon$. Recall that

$$1-\alpha = \mathbb{E}_{\mathbb{P}}\big(\mathbb{P}(S(\boldsymbol{X},Y) \leq r_0(\alpha)(\mathbb{P}) \mid T=0, \boldsymbol{X}, R=1) \mid T=0, R=0\big),$$

which holds under Assumptions 2.1 and 2.2. Under Assumption 2.4, we may instead write

$$1-\alpha = \mathbb{E}_{\mathbb{P}}\big(\mathbb{P}(S(\boldsymbol{X},Y) \leq r_0(\alpha)(\mathbb{P}) \mid \boldsymbol{X}, R=1) \mid T=0, R=0\big),$$

since CCOD and MAR together imply $(R,T) \perp\!\!\!\perp Y \mid \boldsymbol{X}$. Thus, we have

\begin{align*}
0 &=\left.\frac{d}{d\epsilon}\mathbb{E}_{\mathbb{P}_{\epsilon}}(\mathbb{P}_{\epsilon}(S(\boldsymbol{X},Y)\leq r_{0}(\alpha)(\mathbb{P}_{\epsilon})\mid\boldsymbol{X},R=1)\mid T=0,R=0)\right|_{\epsilon=0} \\
&=\left.\frac{d}{d\epsilon}\mathbb{E}_{\mathbb{P}_{\epsilon}}(\mathbb{P}(S(\boldsymbol{X},Y)\leq r_{0}(\alpha)(\mathbb{P})\mid\boldsymbol{X},R=1)\mid T=0,R=0)\right|_{\epsilon=0} \\
&\quad\quad+\left.\frac{d}{d\epsilon}\mathbb{E}_{\mathbb{P}}(\mathbb{P}_{\epsilon}(S(\boldsymbol{X},Y)\leq r_{0}(\alpha)(\mathbb{P})\mid\boldsymbol{X},R=1)\mid T=0,R=0)\right|_{\epsilon=0} \\
&\quad\quad+\left.\frac{d}{d\epsilon}\mathbb{E}_{\mathbb{P}}(\mathbb{P}(S(\boldsymbol{X},Y)\leq r_{0}(\alpha)(\mathbb{P}_{\epsilon})\mid\boldsymbol{X},R=1)\mid T=0,R=0)\right|_{\epsilon=0}.
\end{align*}

Before proceeding, let $u_{B\mid C}$ be the conditional score function for $B$ given $C$, for arbitrary $B$ and $C$, and note the key properties that (i) $\mathbb{E}_{\mathbb{P}}(u_{B\mid C}\mid C)=0$, and (ii) $u_{B,C}=u_{B\mid C}+u_{C}$. Now, for the first of the above three terms, we have

\begin{align*}
&\left.\frac{d}{d\epsilon}\mathbb{E}_{\mathbb{P}_{\epsilon}}(\overline{m}(r_{0},\boldsymbol{X})\mid T=0,R=0)\right|_{\epsilon=0} \\
&\quad=\mathbb{E}_{\mathbb{P}}(\{\overline{m}(r_{0},\boldsymbol{X})-(1-\alpha)\}u_{\boldsymbol{X}\mid T=0,R=0}\mid T=0,R=0) \\
&\quad=\mathbb{E}_{\mathbb{P}}\left(\frac{I(T=0,R=0)}{\mathbb{P}[T=0,R=0]}\{\overline{m}(r_{0},\boldsymbol{X})-(1-\alpha)\}u_{\boldsymbol{X}\mid T,R}\right) \\
&\quad=\mathbb{E}_{\mathbb{P}}\left(\frac{I(T=0,R=0)}{\mathbb{P}[T=0,R=0]}\{\overline{m}(r_{0},\boldsymbol{X})-(1-\alpha)\}u(\mathcal{O})\right),
\end{align*}

where in the last equality we are able to add in $u_{T,R}$ since $I(T=0,R=0)\{\overline{m}(r_{0},\boldsymbol{X})-(1-\alpha)\}$ has mean zero given $(T,R)$ by construction, and we can add in $u_{RY\mid\boldsymbol{X},T,R}$ since this has mean zero given $(\boldsymbol{X},T,R)$. Similarly, for the second term above, we have

\begin{align*}
&\left.\frac{d}{d\epsilon}\mathbb{E}_{\mathbb{P}}(\overline{m}_{\epsilon}(r_{0},\boldsymbol{X})\mid T=0,R=0)\right|_{\epsilon=0} \\
&\quad=\mathbb{E}_{\mathbb{P}}\left(\mathbb{E}_{\mathbb{P}}(\{I(S(\boldsymbol{X},Y)\leq r_{0})-\overline{m}(r_{0},\boldsymbol{X})\}u_{Y\mid\boldsymbol{X},R=1}\mid\boldsymbol{X},R=1)\mid T=0,R=0\right) \\
&\quad=\mathbb{E}_{\mathbb{P}}\left(\frac{I(T=0,R=0)}{\mathbb{P}[T=0,R=0]}\mathbb{E}_{\mathbb{P}}\left(\frac{R}{\mathbb{P}[R=1\mid\boldsymbol{X}]}\{I(S(\boldsymbol{X},Y)\leq r_{0})-\overline{m}(r_{0},\boldsymbol{X})\}u_{RY\mid\boldsymbol{X},R}\mid\boldsymbol{X}\right)\right) \\
&\quad=\mathbb{E}_{\mathbb{P}}\left(\frac{\mathbb{P}[T=0\mid\boldsymbol{X},R=0]}{\mathbb{P}[T=0,R=0]}\frac{\mathbb{P}[R=0\mid\boldsymbol{X}]}{\mathbb{P}[R=1\mid\boldsymbol{X}]}R\{I(S(\boldsymbol{X},Y)\leq r_{0})-\overline{m}(r_{0},\boldsymbol{X})\}u(\mathcal{O})\right).
\end{align*}

Finally, for the third term above, we have

\begin{align*}
&\left.\frac{d}{d\epsilon}\mathbb{E}_{\mathbb{P}}(\overline{m}(r_{0}(\alpha)(\mathbb{P}_{\epsilon}),\boldsymbol{X})\mid T=0,R=0)\right|_{\epsilon=0} \\
&\quad=\mathbb{E}_{\mathbb{P}}\left(p_{S\mid\boldsymbol{X},R=1}(r_{0},\boldsymbol{X})\mid T=0,R=0\right)\left.\frac{d}{d\epsilon}r_{0}(\alpha)(\mathbb{P}_{\epsilon})\right|_{\epsilon=0},
\end{align*}

where $p_{S\mid\boldsymbol{X},R=1}(r_{0},\boldsymbol{X})$ is the conditional density of $S(\boldsymbol{X},Y)$ given $\boldsymbol{X},R=1$, evaluated at $r_{0}$. Rearranging the original differentiated estimating equation, we have

\[
\left.\frac{d}{d\epsilon}r_{0}(\alpha)(\mathbb{P}_{\epsilon})\right|_{\epsilon=0}=\mathbb{E}_{\mathbb{P}}(\dot{r}_{0}^{\mathrm{CCOD}}(\mathcal{O};\mathbb{P})u(\mathcal{O})),
\]

where $\dot{r}_{0}^{\mathrm{CCOD}}(\mathcal{O};\mathbb{P})=-\{\mathbb{P}[T=0,R=0]\,\mathbb{E}_{\mathbb{P}}\left(p_{S\mid\boldsymbol{X},R=1}(r_{0},\boldsymbol{X})\mid T=0,R=0\right)\}^{-1}\varphi^{\mathrm{CCOD}}(\mathcal{O};r_{0},\overline{m},\overline{\eta},q_{0})$, and $\varphi^{\mathrm{CCOD}}$ is as defined in Section 2.2.
By Lemma 24 of Rotnitzky & Smucler (2020), the tangent space of the semiparametric model at $\mathbb{P}$ is $\Lambda_{\mathbb{P}}=\Lambda_{\boldsymbol{X}}\oplus\Lambda_{T\mid\boldsymbol{X}}\oplus\Lambda_{R\mid\boldsymbol{X},T}\oplus\Lambda_{RY\mid\boldsymbol{X},R}$, where $\Lambda_{B\mid C}=\{g(B,C)\in L_{2}(\mathbb{P}):\mathbb{E}(g\mid C)=0\}$. It is straightforward to verify that $\dot{r}_{0}^{\mathrm{CCOD}}(\mathcal{O};\mathbb{P})\in\Lambda_{\mathbb{P}}$, so it is the semiparametric efficient influence function under Assumption 2.4.
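
For readers who want the rearrangement step written out, the following sketch collects the three terms of the differentiated estimating equation. It assumes that $\varphi^{\mathrm{CCOD}}$ from Section 2.2 equals the sum of the two integrands multiplying $u(\mathcal{O})$ in the first and second terms, rescaled by $\mathbb{P}[T=0,R=0]$; this is our reading of the derivation, not a quotation of the definition. The shorthand $\pi_{0}$ and $\bar{p}_{0}$ is introduced here only for brevity.

```latex
\begin{align*}
\pi_{0} &:= \mathbb{P}[T=0,R=0], \qquad
\bar{p}_{0} := \mathbb{E}_{\mathbb{P}}\left(p_{S\mid\boldsymbol{X},R=1}(r_{0},\boldsymbol{X})\mid T=0,R=0\right), \\
0 &= \frac{1}{\pi_{0}}\,\mathbb{E}_{\mathbb{P}}\left(\varphi^{\mathrm{CCOD}}(\mathcal{O};r_{0},\overline{m},\overline{\eta},q_{0})\,u(\mathcal{O})\right)
   + \bar{p}_{0}\left.\frac{d}{d\epsilon}r_{0}(\alpha)(\mathbb{P}_{\epsilon})\right|_{\epsilon=0}, \\
\left.\frac{d}{d\epsilon}r_{0}(\alpha)(\mathbb{P}_{\epsilon})\right|_{\epsilon=0}
  &= -\{\pi_{0}\,\bar{p}_{0}\}^{-1}\,\mathbb{E}_{\mathbb{P}}\left(\varphi^{\mathrm{CCOD}}(\mathcal{O};r_{0},\overline{m},\overline{\eta},q_{0})\,u(\mathcal{O})\right).
\end{align*}
```

Under that reading, the last line matches the stated form of $\dot{r}_{0}^{\mathrm{CCOD}}$ term by term.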

A.2 Proof of Theorem 2.6

Write $\mathbb{P}(f)=\mathbb{E}_{\mathbb{P}}(f(\mathcal{O})\mid D^{n})$, for any $f$. Observe that for any $r$, possibly a function of training data $D^{n}$, and for $\mathcal{O}\perp\!\!\!\perp D^{n}$,

\begin{align*}
&\mathbb{P}\left(\varphi^{\mathrm{CCOD}}(\mathcal{O};r,\widehat{\overline{m}},\widehat{\overline{\eta}},\widehat{q}_{0})\right) \\
&\quad=\mathbb{P}\left(\mathbb{P}[T=0,R=0\mid\boldsymbol{X}]\left\{\widehat{\overline{m}}(r,\boldsymbol{X})-(1-\alpha)\right\}+\mathbb{P}[R=1\mid\boldsymbol{X}]\widehat{\overline{\eta}}(\boldsymbol{X})\widehat{q}_{0}(\boldsymbol{X})\left\{\overline{m}(r,\boldsymbol{X})-\widehat{\overline{m}}(r,\boldsymbol{X})\right\}\right) \\
&\quad=\mathbb{P}\left(\mathbb{P}[R=1\mid\boldsymbol{X}]\left[\left\{q_{0}(\boldsymbol{X})\overline{\eta}(\boldsymbol{X})-\widehat{q}_{0}(\boldsymbol{X})\widehat{\overline{\eta}}(\boldsymbol{X})\right\}\left\{\widehat{\overline{m}}(r,\boldsymbol{X})-\overline{m}(r,\boldsymbol{X})\right\}+q_{0}(\boldsymbol{X})\overline{\eta}(\boldsymbol{X})\left\{\overline{m}(r,\boldsymbol{X})-(1-\alpha)\right\}\right]\right).
\end{align*}

Thus, omitting inputs, we have

\begin{equation}
\mathbb{P}\left(\varphi^{\mathrm{CCOD}}(\mathcal{O};r,\widehat{\overline{m}},\widehat{\overline{\eta}},\widehat{q}_{0})-\varphi^{\mathrm{CCOD}}(\mathcal{O};r,\overline{m},\overline{\eta},q_{0})\right)=\mathbb{P}\left(\mathbb{P}[R=1\mid\boldsymbol{X}]\left\{q_{0}\overline{\eta}-\widehat{q}_{0}\widehat{\overline{\eta}}\right\}\left\{\widehat{\overline{m}}(r,\cdot)-\overline{m}(r,\cdot)\right\}\right).
\tag{5}
\end{equation}
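
The algebra behind the product-bias identity (5) can be checked numerically. The sketch below is illustrative only: the function and argument names are hypothetical, `pi1` stands for $\mathbb{P}[R=1\mid\boldsymbol{X}]$, and the relation $\mathbb{P}[T=0,R=0\mid\boldsymbol{X}]=\mathbb{P}[R=1\mid\boldsymbol{X}]\,q_{0}(\boldsymbol{X})\,\overline{\eta}(\boldsymbol{X})$ is the one implied by the two displayed lines above. It evaluates both forms of the conditional mean at random nuisance values and reports the largest discrepancy.

```python
import random

def conditional_mean_plugin(pi1, q0, eta, q0_hat, eta_hat, m, m_hat, alpha):
    """First form: P[T=0,R=0|X]{m_hat - (1-alpha)} + P[R=1|X] eta_hat q0_hat {m - m_hat}.
    Assumes P[T=0,R=0|X] = pi1 * q0 * eta."""
    return pi1 * q0 * eta * (m_hat - (1 - alpha)) + pi1 * eta_hat * q0_hat * (m - m_hat)

def conditional_mean_decomposed(pi1, q0, eta, q0_hat, eta_hat, m, m_hat, alpha):
    """Second form: product-bias term plus the term with the true nuisances."""
    product_bias = (q0 * eta - q0_hat * eta_hat) * (m_hat - m)
    oracle_part = q0 * eta * (m - (1 - alpha))
    return pi1 * (product_bias + oracle_part)

def max_gap(n_draws=1000, seed=0, alpha=0.1):
    """Largest absolute difference between the two forms over random inputs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_draws):
        args = (
            rng.uniform(0.1, 0.9),   # pi1 = P[R=1 | X]
            rng.uniform(0.0, 1.0),   # q0
            rng.uniform(0.0, 5.0),   # eta
            rng.uniform(0.0, 1.0),   # q0_hat
            rng.uniform(0.0, 5.0),   # eta_hat
            rng.uniform(0.0, 1.0),   # m
            rng.uniform(0.0, 1.0),   # m_hat
            alpha,
        )
        gap = abs(conditional_mean_plugin(*args) - conditional_mean_decomposed(*args))
        worst = max(worst, gap)
    return worst
```

Calling `max_gap()` should return a value at the level of floating-point rounding. Setting `m_hat` equal to `m`, or the estimated pair `(q0_hat, eta_hat)` equal to `(q0, eta)`, makes the product-bias term vanish, which is the double robustness used in the remainder of the proof.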

On the other hand, by definition,

\begin{align*}
&\mathbb{P}[Y\in\widehat{C}_{\alpha}^{\mathrm{CCOD}}(\boldsymbol{X})\mid T=0,R=0,D^{n}]-(1-\alpha) \\
&\quad=\mathbb{P}[S(\boldsymbol{X},Y)\leq\widehat{r}^{\mathrm{CCOD}}\mid T=0,R=0,D^{n}]-(1-\alpha) \\
&\quad=\mathbb{E}_{\mathbb{P}}\left(\overline{m}(\widehat{r}^{\mathrm{CCOD}},\boldsymbol{X})-(1-\alpha)\mid T=0,R=0,D^{n}\right) \\
&\quad=\frac{\mathbb{E}_{\mathbb{P}}\left(\mathbb{P}[R=1\mid\boldsymbol{X}]q_{0}(\boldsymbol{X})\overline{\eta}(\boldsymbol{X})\left\{\overline{m}(\widehat{r}^{\mathrm{CCOD}},\boldsymbol{X})-(1-\alpha)\right\}\mid D^{n}\right)}{\mathbb{P}[T=0,R=0]} \\
&\quad=\frac{\mathbb{P}\left(\varphi^{\mathrm{CCOD}}(\mathcal{O};\widehat{r}^{\mathrm{CCOD}},\overline{m},\overline{\eta},q_{0})\right)}{\mathbb{P}[T=0,R=0]}.
\end{align*}

Finally, we decompose

\begin{align*}
&\mathbb{P}\left(\varphi^{\mathrm{CCOD}}(\mathcal{O};\widehat{r}^{\mathrm{CCOD}},\overline{m},\overline{\eta},q_{0})\right) \\
&\quad=\mathbb{P}_{n}\left(\varphi^{\mathrm{CCOD}}(\mathcal{O};\widehat{r}^{\mathrm{CCOD}},\widehat{\overline{m}},\widehat{\overline{\eta}},\widehat{q}_{0})\right) \\
&\quad\quad-(\mathbb{P}_{n}-\mathbb{P})\left(\varphi^{\mathrm{CCOD}}(\mathcal{O};\widehat{r}^{\mathrm{CCOD}},\widehat{\overline{m}},\widehat{\overline{\eta}},\widehat{q}_{0})\right) \\
&\quad\quad-\mathbb{P}\left(\varphi^{\mathrm{CCOD}}(\mathcal{O};\widehat{r}^{\mathrm{CCOD}},\widehat{\overline{m}},\widehat{\overline{\eta}},\widehat{q}_{0})-\varphi^{\mathrm{CCOD}}(\mathcal{O};\widehat{r}^{\mathrm{CCOD}},\overline{m},\overline{\eta},q_{0})\right).
\end{align*}

By construction of $\widehat{r}^{\mathrm{CCOD}}$, the first term above is 0, while the third term is $O_{\mathbb{P}}(R_{n})$ by the product bias in (5) and boundedness of $q_{0},\overline{\eta},\widehat{q}_{0},\widehat{\overline{\eta}}$. By our assumptions about $\widehat{\overline{m}}$ (monotonicity and boundedness) and $(\widehat{\overline{\eta}},\widehat{q}_{0})$ (boundedness), we can note that $\{\varphi^{\mathrm{CCOD}}(\,\cdot\,;r,\widehat{\overline{m}},\widehat{\overline{\eta}},\widehat{q}_{0}):r\in\mathbb{R}\}$ is a Donsker class (using similar arguments to Theorem 2 in Yang et al. (2024)), so that the second term is $O_{\mathbb{P}}(n^{-1/2})$. Combining these yields the result.
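
To make "by construction of $\widehat{r}^{\mathrm{CCOD}}$" concrete, the following toy sketch solves the empirical estimating equation $\mathbb{P}_{n}(\varphi^{\mathrm{CCOD}}(\mathcal{O};r,\cdot))=0$ over a grid. It is not the authors' implementation. The form of `phi_ccod` is inferred from the proofs above rather than copied from Section 2.2, all function names are hypothetical, and the nuisance functions are taken as known in a deliberately simple single-site setting (uniform scores independent of everything else, half the sample labelled, so $q_{0}=\overline{\eta}=1$ and $\overline{m}(r,x)=r$ on $[0,1]$).

```python
import random

def phi_ccod(r, in_target, labelled, score, x, m_bar, eta_bar, q0, alpha):
    """Assumed per-observation form of phi^CCOD (inferred from the proofs, not quoted)."""
    m = m_bar(r, x)
    value = 0.0
    if in_target:   # I(T=0, R=0) * {m_bar(r, X) - (1 - alpha)}
        value += m - (1 - alpha)
    if labelled:    # R * eta_bar(X) * q0(X) * {I(S <= r) - m_bar(r, X)}
        value += eta_bar(x) * q0(x) * ((1.0 if score <= r else 0.0) - m)
    return value

def solve_threshold(data, grid, m_bar, eta_bar, q0, alpha):
    """Return the smallest grid value r whose empirical sum of phi is >= 0.
    The empirical mean is a step function in r, so an exact root need not exist."""
    for r in grid:
        total = sum(phi_ccod(r, *row, m_bar, eta_bar, q0, alpha) for row in data)
        if total >= 0:
            return r
    return grid[-1]

def toy_data(n=4000, seed=1):
    """Rows are (in_target, labelled, score, x); scores exist only for labelled rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        labelled = rng.random() < 0.5
        score = rng.random() if labelled else None
        rows.append((not labelled, labelled, score, 0.0))
    return rows

def run_example(alpha=0.1):
    """Solve for the threshold in the toy setting with known nuisance functions."""
    grid = [i / 200 for i in range(201)]
    return solve_threshold(
        toy_data(),
        grid,
        m_bar=lambda r, x: min(max(r, 0.0), 1.0),  # CDF of a Uniform(0, 1) score
        eta_bar=lambda x: 1.0,                     # P[R=0 | X] / P[R=1 | X] = 1
        q0=lambda x: 1.0,                          # single site: P[T=0 | X, R=0] = 1
        alpha=alpha,
    )
```

In this toy setting `run_example()` should land close to the true $(1-\alpha)$-quantile of the score, here 0.9. With estimated nuisances, the same grid search would use $\widehat{\overline{m}},\widehat{\overline{\eta}},\widehat{q}_{0}$ in place of the known functions.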

A.3 Proof of Theorem 2.8

Here, we derive the influence function for a non-target source site $k$, used in Section 2.3, making the working assumption of a common conditional outcome distribution between the target site and site $k$, $p(Y\mid\boldsymbol{X},T=0)=p(Y\mid\boldsymbol{X},T=k)$. Note that our data-adaptive method weights source sites that can violate CCOD; we use this working partial CCOD assumption only to derive the form of the efficient influence function to facilitate downstream analysis. Our derivation is very similar to that in the proof of Theorem 2.5. To begin, observe that

\begin{align*}
1-\alpha &=\mathbb{E}_{\mathbb{P}}\left\{\mathbb{P}[S(\boldsymbol{X},Y)\leq r_{0}(\alpha)(\mathbb{P})\mid\boldsymbol{X},T=0,R=1]\mid T=0,R=0\right\} \\
&=\mathbb{E}_{\mathbb{P}}\left\{\mathbb{P}[S(\boldsymbol{X},Y)\leq r_{0}(\alpha)(\mathbb{P})\mid\boldsymbol{X},T=k,R=1]\mid T=0,R=0\right\}.
\end{align*}

In addition,

\begin{align*}
0 &= \frac{\partial}{\partial\epsilon}(1-\alpha)\Big|_{\epsilon=0} \\
&= \frac{\partial}{\partial\epsilon}\mathbb{E}_{\mathbb{P}_{\epsilon}}\left\{m_{k,\epsilon}(r_0(\alpha)(\mathbb{P}_{\epsilon}), \boldsymbol{X}) \mid T=0, R=0\right\}\Big|_{\epsilon=0} \\
&= \mathbb{E}_{\mathbb{P}}\left\{[m_k(r_0(\alpha)(\mathbb{P}), \boldsymbol{X}) - (1-\alpha)]\, u_{\boldsymbol{X} \mid T=0, R=0} \mid T=0, R=0\right\} \tag{6} \\
&\quad + \mathbb{E}_{\mathbb{P}}\big\{\mathbb{E}_{\mathbb{P}}\{[I(S(\boldsymbol{X},Y) \leq r_0(\alpha)(\mathbb{P})) - m_k(r_0(\alpha)(\mathbb{P}), \boldsymbol{X})]\, u_{Y \mid \boldsymbol{X}, T=k, R=1} \mid \boldsymbol{X}, T=k, R=1\} \mid T=0, R=0\big\} \tag{7} \\
&\quad + \underbrace{\mathbb{E}_{\mathbb{P}}\{f_{S \mid \boldsymbol{X}, T=k, R=1}(r_0(\alpha)(\mathbb{P}) \mid \boldsymbol{X}, T=k, R=1) \mid T=0, R=0\}}_{C_{k,0}(\mathbb{P})} \cdot \frac{\partial}{\partial\epsilon} r_0(\alpha)(\mathbb{P}_{\epsilon})\Big|_{\epsilon=0},
\end{align*}

where $f_{S \mid \boldsymbol{X}, T=k, R=1}$ is the conditional density function of $S(\boldsymbol{x}, y)$, i.e., the derivative of $m_k$ with respect to its first argument.
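The three terms in the display above arise from the three places where $\epsilon$ enters the functional. As a brief sketch, write $r_\epsilon = r_0(\alpha)(\mathbb{P}_\epsilon)$, $r_0 = r_0(\alpha)(\mathbb{P})$, and $p_\epsilon(\boldsymbol{x} \mid T=0, R=0)$ for the perturbed covariate density (shorthand introduced here only for illustration, and assuming enough regularity to differentiate under the integral). The product and chain rules then give:

```latex
\begin{align*}
\frac{\partial}{\partial\epsilon}\int m_{k,\epsilon}(r_\epsilon,\boldsymbol{x})\,
  p_\epsilon(\boldsymbol{x}\mid T=0,R=0)\,d\boldsymbol{x}\,\Big|_{\epsilon=0}
&= \int m_k(r_0,\boldsymbol{x})\,
   \frac{\partial}{\partial\epsilon}p_\epsilon(\boldsymbol{x}\mid T=0,R=0)\Big|_{\epsilon=0}\,d\boldsymbol{x}
   && \text{(covariate law; yields (6))} \\
&\quad + \int \frac{\partial}{\partial\epsilon}m_{k,\epsilon}(r_0,\boldsymbol{x})\Big|_{\epsilon=0}\,
   p(\boldsymbol{x}\mid T=0,R=0)\,d\boldsymbol{x}
   && \text{(conditional score law; yields (7))} \\
&\quad + \int \frac{\partial}{\partial r}m_k(r,\boldsymbol{x})\Big|_{r=r_0}\,
   p(\boldsymbol{x}\mid T=0,R=0)\,d\boldsymbol{x}
   \cdot \frac{\partial r_\epsilon}{\partial\epsilon}\Big|_{\epsilon=0}
   && \text{(quantile; yields the } C_{k,0}(\mathbb{P}) \text{ term)}.
\end{align*}
```

The centering constants $(1-\alpha)$ in (6) and $m_k$ in (7) can be subtracted at no cost because the corresponding scores have conditional mean zero.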

Furthermore, we can write,

\begin{align*}
(6) &= \mathbb{E}_{\mathbb{P}}\left\{\frac{I(T=0,R=0)}{\mathbb{P}(T=0,R=0)}[m_k(r_0(\alpha)(\mathbb{P}), \boldsymbol{X}) - (1-\alpha)]\, u(0)\right\}, \\
(7) &= \mathbb{E}_{\mathbb{P}}\left\{\frac{I(T=0,R=0)}{\mathbb{P}(T=0,R=0)}\, \mathbb{E}_{\mathbb{P}}\left(\frac{I(T=k,R=1)}{\mathbb{P}(T=k,R=1 \mid \boldsymbol{X})}[I(S(\boldsymbol{X},Y) \leq r_0(\alpha)(\mathbb{P})) - m_k(r_0(\alpha)(\mathbb{P}), \boldsymbol{X})]\, u_{Y \mid \boldsymbol{X}, T, R} \,\middle|\, \boldsymbol{X}\right)\right\} \\
&= \mathbb{E}_{\mathbb{P}}\left\{\frac{I(T=k,R=1)}{\mathbb{P}(T=0,R=0)}\, \frac{\mathbb{P}(T=0,R=0 \mid \boldsymbol{X})}{\mathbb{P}(T=k,R=1 \mid \boldsymbol{X})}[I(S(\boldsymbol{X},Y) \leq r_0(\alpha)(\mathbb{P})) - m_k(r_0(\alpha)(\mathbb{P}), \boldsymbol{X})]\, u(0)\right\},
\end{align*}

by the tower law. Therefore, rearranging the terms, we can obtain

$$\frac{\partial}{\partial\epsilon} r_0(\alpha)(\mathbb{P}_{\epsilon})\Big|_{\epsilon=0} = -C_{k,0}(\mathbb{P})^{-1}\{(6) + (7)\}.$$

Therefore, an influence function of $r_0(\alpha)(\cdot)$ at $\mathbb{P}$ is

\begin{align*}
\dot{r}_0(\alpha)(\mathcal{O}; \mathbb{P}) &= -\frac{C_{k,0}(\mathbb{P})^{-1}}{\mathbb{P}(T=0,R=0)}\bigg\{I(T=0,R=0)[\underbrace{m_k(r_0(\alpha)(\mathbb{P}), \boldsymbol{X})}_{=m_0 \text{ under our assumption}} - (1-\alpha)] \\
&\qquad + I(T=k,R=1)\frac{\mathbb{P}(T=0,R=0 \mid \boldsymbol{X})}{\mathbb{P}(T=k,R=1 \mid \boldsymbol{X})}[I(S(\boldsymbol{X},Y) \leq r_0(\alpha)(\mathbb{P})) - m_k(r_0(\alpha)(\mathbb{P}), \boldsymbol{X})]\bigg\} \\
&= \underbrace{-\frac{C_{k,0}(\mathbb{P})^{-1}}{\mathbb{P}(T=0,R=0)}}_{\text{a probability constant}} \varphi_k(\mathcal{O}; r_0, m_0, m_k, \omega_{k,0}).
\end{align*}

Observe that, by Bayes’ rule,

$$\frac{\mathbb{P}(T=0,R=0 \mid \boldsymbol{X})}{\mathbb{P}(T=k,R=1 \mid \boldsymbol{X})} = \underbrace{\frac{\mathbb{P}(\boldsymbol{X} \mid T=0,R=0)}{\mathbb{P}(\boldsymbol{X} \mid T=k,R=1)}}_{\omega_{k,0}(\boldsymbol{X})} \cdot \frac{\mathbb{P}(T=0,R=0)}{\mathbb{P}(T=k,R=1)}.$$
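Read from right to left, the Bayes' rule identity says that the density ratio $\omega_{k,0}$ can be recovered from group-membership probabilities. The following is a minimal sketch of that rearrangement on a toy discrete covariate; the function name `density_ratio_from_probs` and all numerical values are hypothetical and chosen only for illustration, not taken from the paper.

```python
import numpy as np

def density_ratio_from_probs(p_target_given_x, p_source_given_x, p_target, p_source):
    """Rearranged Bayes' rule: omega(x) = [P(target|x) / P(source|x)] * [P(source) / P(target)]."""
    odds = np.asarray(p_target_given_x, dtype=float) / np.asarray(p_source_given_x, dtype=float)
    return odds * (p_source / p_target)

# Toy check with a binary covariate X in {0, 1} (hypothetical numbers).
px_given_target = np.array([0.2, 0.8])   # P(X = x | T=0, R=0)
px_given_source = np.array([0.6, 0.4])   # P(X = x | T=k, R=1)
p_target, p_source = 0.3, 0.7            # marginal group probabilities

# Group-membership probabilities given X, obtained from the joint distribution.
joint_target = px_given_target * p_target
joint_source = px_given_source * p_source
px = joint_target + joint_source
omega = density_ratio_from_probs(joint_target / px, joint_source / px, p_target, p_source)

# omega agrees with the direct ratio of conditional covariate distributions.
direct_ratio = px_given_target / px_given_source
```

In practice the conditional probabilities would come from a fitted classifier rather than a known joint distribution, so the recovered ratio inherits that model's estimation error.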

Hence, we can work with

$$\varphi_k(\mathcal{O}; \theta, m_0, m_k, \omega_{k,0}) = \frac{I(T=0,R=0)}{\mathbb{P}(T=0,R=0)}[m_0(\theta, \boldsymbol{X}) - (1-\alpha)] + \frac{I(T=k,R=1)}{\mathbb{P}(T=k,R=1)}\,\omega_{k,0}(\boldsymbol{X})[I(S(\boldsymbol{X},Y) \leq \theta) - m_k(\theta, \boldsymbol{X})].$$
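To make the form of $\varphi_k$ concrete, the sketch below evaluates it observation by observation and then searches for a value of $\theta$ at which its sample mean crosses zero. Several choices here are assumptions made for illustration: the function names, the use of sample proportions in place of $\mathbb{P}(T=0,R=0)$ and $\mathbb{P}(T=k,R=1)$, the callables `m0`, `mk`, and `omega` standing in for fitted nuisance functions, and the rule of taking the smallest candidate with a nonnegative sample mean.

```python
import numpy as np

def phi_k(theta, T, R, X, S, k, m0, mk, omega, alpha):
    """Evaluate phi_k(O; theta, m0, mk, omega_{k,0}) for every observation."""
    target = (T == 0) & (R == 0)      # unlabeled target observations
    source = (T == k) & (R == 1)      # labeled observations from site k
    p_target = target.mean()          # sample analogue of P(T=0, R=0)
    p_source = source.mean()          # sample analogue of P(T=k, R=1)
    # The score indicator is only needed (and only observed) on labeled site-k rows.
    indicator = np.zeros(len(T))
    indicator[source] = S[source] <= theta
    term_target = target / p_target * (m0(theta, X) - (1.0 - alpha))
    term_source = source / p_source * omega(X) * (indicator - mk(theta, X))
    return term_target + term_source

def solve_site_quantile(T, R, X, S, k, m0, mk, omega, alpha):
    """Smallest observed site-k score at which the sample mean of phi_k is >= 0.

    This root-finding rule is an illustrative assumption, not the paper's algorithm.
    """
    source = (T == k) & (R == 1)
    candidates = np.sort(S[source])
    for theta in candidates:
        if phi_k(theta, T, R, X, S, k, m0, mk, omega, alpha).mean() >= 0:
            return float(theta)
    return float(candidates[-1])
```

When `m0` and `mk` coincide and `omega` is identically one, the sample mean of `phi_k` reduces to the empirical distribution function of the site-$k$ scores minus $(1-\alpha)$, so the solver returns approximately the empirical $(1-\alpha)$ quantile of those scores.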

A.4 Proof of Theorem 2.9

Write $r_{w^*} = \sum_{k=0}^{K-1} w_k \widehat{r}_k$. By construction of $\varphi_j$, we have

\begin{align*}
\mathbb{P}[Y \in \widehat{C}_{\alpha}^{w^*}(\boldsymbol{X}) \mid T=0, R=0, D^n] - (1-\alpha) &= \mathbb{P}\left[S(\boldsymbol{X},Y) \leq r_{w^*} \,\middle|\, T=0, R=0, D^n\right] - (1-\alpha) \\
&= \frac{\mathbb{P}\left(\varphi_j(\mathcal{O}; r_{w^*}, m_0, m_j, \omega_{j,0})\right)}{\mathbb{P}[T=0,R=0]}, \tag{8}
\end{align*}

where the last equality holds for all $j \in \mathcal{S}^*$, and the numerator could also be replaced by $\mathbb{P}\left(\varphi_0(\mathcal{O}; r_{w^*}, m_0, \eta_0)\right)$. Now, see that for any $j \in \mathcal{S}^*$,

\begin{align*}
&\mathbb{P}\left(\varphi_j(\mathcal{O}; r_{w^*}, m_0, m_j, \omega_{j,0})\right) \\
&= \mathbb{P}\left(\varphi_j(\mathcal{O}; r_{w^*}, m_0, m_j, \omega_{j,0}) - \varphi_j(\mathcal{O}; \widehat{r}_j, m_0, m_j, \omega_{j,0})\right) + \mathbb{P}\left(\varphi_j(\mathcal{O}; \widehat{r}_j, m_0, m_j, \omega_{j,0})\right) \\
&= \mathbb{P}\left(\frac{I(T=0,R=0)}{\mathbb{P}(T=0,R=0)}\left\{m_0(r_{w^*}, \boldsymbol{X}) - m_0(\widehat{r}_j, \boldsymbol{X})\right\}\right) + \mathbb{P}\left(\varphi_j(\mathcal{O}; \widehat{r}_j, m_0, m_j, \omega_{j,0})\right).
\end{align*}
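The quantities on the left-hand side of (8) can be illustrated numerically. The sketch below forms the weighted combination $r_{w^*}$ from site-specific quantile estimates and computes an empirical coverage gap on target scores. The function names are hypothetical, the restriction of the weights to the probability simplex is an assumption of this sketch rather than something stated in this section, and target scores are only available in simulations or once target outcomes are revealed.

```python
import numpy as np

def combine_quantiles(weights, r_hats):
    """Return r_w = sum_k w_k * r_hat_k."""
    weights = np.asarray(weights, dtype=float)
    r_hats = np.asarray(r_hats, dtype=float)
    if weights.shape != r_hats.shape:
        raise ValueError("weights and r_hats must have the same shape")
    # Assumption for this sketch: weights are nonnegative and sum to one.
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be nonnegative and sum to one")
    return float(weights @ r_hats)

def coverage_gap(target_scores, r, alpha):
    """Empirical analogue of P[S(X, Y) <= r | T=0, R=0] - (1 - alpha)."""
    target_scores = np.asarray(target_scores, dtype=float)
    return float(np.mean(target_scores <= r) - (1.0 - alpha))

# Hypothetical inputs: three site-specific quantile estimates and five target scores.
r_w = combine_quantiles([0.5, 0.25, 0.25], [1.0, 2.0, 3.0])
gap = coverage_gap([0.5, 1.0, 1.5, 2.0, 2.5], r_w, alpha=0.2)
```

A negative gap indicates undercoverage relative to the nominal level $1-\alpha$, and a positive gap indicates overcoverage.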

Further, as in the proof of Theorem 2.6, we can decompose the latter term as

\begin{align*}
\mathbb{P}\left(\varphi_j(\mathcal{O}; \widehat{r}_j, m_0, m_j, \omega_{j,0})\right) &= \mathbb{P}_n\left(\varphi_j(\mathcal{O}; \widehat{r}_j, \widehat{m}_0, \widehat{m}_j, \widehat{\omega}_{j,0})\right) \\
&\quad - (\mathbb{P}_n - \mathbb{P})\left(\varphi_j(\mathcal{O}; \widehat{r}_j, \widehat{m}_0, \widehat{m}_j, \widehat{\omega}_{j,0})\right) \\
&\quad - \mathbb{P}\left(\varphi_j(\mathcal{O}; \widehat{r}_j, \widehat{m}_0, \widehat{m}_j, \widehat{\omega}_{j,0}) - \varphi_j(\mathcal{O}; \widehat{r}_j, m_0, m_j, \omega_{j,0})\right).
\end{align*}

By construction of $\widehat{r}_j$, the first term in this sum is 0, the second is $O_{\mathbb{P}}(n^{-1/2})$ because $\{\varphi_j(\,\cdot\,; r, \widehat{m}_0, \widehat{m}_j, \widehat{\omega}_{j,0}) : r \in \mathbb{R}\}$ is a Donsker class under our assumptions, and the third term is $O_{\mathbb{P}}(R_{n,j}^* + n^{-1/2})$, where

$$R_{n,j}^* = \sup_r \lVert \widehat{m}_0(r, \cdot) - \widehat{m}_j(r, \cdot) \rVert + \lVert \widehat{\omega}_{j,0} - \omega_{j,0} \rVert \cdot \sup_r \lVert \widehat{m}_j(r, \cdot) - m_j(r, \cdot) \rVert,$$

since, assuming $\mathbb{P}[T=0,R=0]$ is estimated by sample means in the training data (so that $\widehat{\mathbb{P}}[T=0,R=0] - \mathbb{P}[T=0,R=0] = O_{\mathbb{P}}(n^{-1/2})$), we have for any $r$ possibly dependent on $D^n$,

\begin{align*}
&\mathbb{P}\left(\varphi_j(\mathcal{O}; r, \widehat{m}_0, \widehat{m}_j, \widehat{\omega}_{j,0}) - \varphi_j(\mathcal{O}; r, m_0, m_j, \omega_{j,0})\right) \\
&= \mathbb{P}\left(\frac{I(T=0,R=0)}{\mathbb{P}(T=0,R=0)}[\widehat{m}_0(r, \cdot) - m_0(r, \cdot)] + \frac{I(T=j,R=1)}{\mathbb{P}(T=j,R=1)}\,\widehat{\omega}_{j,0}[m_j(r, \cdot) - \widehat{m}_j(r, \cdot)]\right) + O_{\mathbb{P}}(n^{-1/2}) \\
&= \mathbb{P}\left(\frac{\mathbb{P}[T=j,R=1 \mid \boldsymbol{X}]}{\mathbb{P}[T=j,R=1]}\left\{\omega_{j,0}[\widehat{m}_0(r, \cdot) - m_0(r, \cdot)] + \widehat{\omega}_{j,0}[m_j(r, \cdot) - \widehat{m}_j(r, \cdot)]\right\}\right) + O_{\mathbb{P}}(n^{-1/2}) \\
&= O_{\mathbb{P}}\left(R_{n,j}^* + n^{-1/2}\right).
\end{align*}
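The final equality in the display above can be unpacked as follows. This is a sketch under the working assumption $m_0 = m_j$ for $j \in \mathcal{S}^*$, assuming the factors $\mathbb{P}[T=j,R=1 \mid \boldsymbol{X}]/\mathbb{P}[T=j,R=1]$ and $\omega_{j,0}$ are bounded, and reading $\lVert \cdot \rVert$ as an $L_2(\mathbb{P})$ norm (our reading of the notation, which is not restated in this section). Adding and subtracting $\omega_{j,0}\widehat{m}_j$ inside the braces gives:

```latex
\begin{align*}
\omega_{j,0}[\widehat{m}_0 - m_0] + \widehat{\omega}_{j,0}[m_j - \widehat{m}_j]
&= \omega_{j,0}[\widehat{m}_0 - \widehat{m}_j]
   + \omega_{j,0}[\widehat{m}_j - m_j]
   - \widehat{\omega}_{j,0}[\widehat{m}_j - m_j]
   && (\text{using } m_0 = m_j) \\
&= \omega_{j,0}[\widehat{m}_0 - \widehat{m}_j]
   + (\omega_{j,0} - \widehat{\omega}_{j,0})[\widehat{m}_j - m_j].
\end{align*}
```

Under the stated boundedness, the expectation of the first piece is at most a constant times $\sup_r \lVert \widehat{m}_0(r,\cdot) - \widehat{m}_j(r,\cdot) \rVert$, and by the Cauchy-Schwarz inequality the second is at most a constant times $\lVert \widehat{\omega}_{j,0} - \omega_{j,0} \rVert \cdot \sup_r \lVert \widehat{m}_j(r,\cdot) - m_j(r,\cdot) \rVert$. These are the two summands of $R_{n,j}^*$.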

For the target site,

\begin{align*}
&\mathbb{P}\left(\varphi_0(\mathcal{O}; r_{w^*}, m_0, \eta_0)\right) \\
&= \mathbb{P}\left(\varphi_0(\mathcal{O}; r_{w^*}, m_0, \eta_0) - \varphi_0(\mathcal{O}; \widehat{r}_0, m_0, \eta_0)\right) + \mathbb{P}\left(\varphi_0(\mathcal{O}; \widehat{r}_0, m_0, \eta_0)\right) \\
&= \mathbb{P}\left(\frac{I(T=0,R=0)}{\mathbb{P}(T=0,R=0)}\left\{m_0(r_{w^*}, \boldsymbol{X}) - m_0(\widehat{r}_0, \boldsymbol{X})\right\}\right) + \mathbb{P}\left(\varphi_0(\mathcal{O}; \widehat{r}_0, m_0, \eta_0)\right)
\end{align*}

and by Theorem 3 in Yang et al. (2024), $\mathbb{P}\left(\varphi_{0}(\mathcal{O};\widehat{r}_{0},m_{0},\eta_{0})\right)=O_{\mathbb{P}}\left(R_{n,0}^{*}+n^{-1/2}\right)$, where

$$R_{n,0}^{*}=\lVert\widehat{\eta}_{0}-\eta_{0}\rVert\,\sup_{r}\lVert\widehat{m}_{0}(r,\cdot)-m_{0}(r,\cdot)\rVert.$$

It remains to characterize $\mathbb{P}\left(\frac{I(T=0,R=0)}{\mathbb{P}(T=0,R=0)}\left\{m_{0}(r_{w^{*}},\boldsymbol{X})-m_{0}(\widehat{r}_{j},\boldsymbol{X})\right\}\right)$ for $j\in\mathcal{S}^{*}\cup\{0\}$. In Lemma A.1, we show that each of these terms is $O_{\mathbb{P}}\left(R_{n,j}^{*}+\sum_{k=0}^{K-1}w_{k}R_{n,k}^{*}+n^{-1/2}\right)$. Combining these results, in view of (8), we conclude that

$$\mathbb{P}[Y\in\widehat{C}_{\alpha}^{w^{*}}(\boldsymbol{X})\mid T=0,R=0,D^{n}]-(1-\alpha)=O_{\mathbb{P}}\left(\min_{j\in\mathcal{S}^{*}\cup\{0\}}R_{n,j}^{*}+\sum_{k=0}^{K-1}w_{k}R_{n,k}^{*}+n^{-1/2}\right)=O_{\mathbb{P}}\left(R_{n}^{*}+n^{-1/2}\right),$$

which completes the proof.

Lemma A.1.

Let $F_{0}(r)=\mathbb{P}[S(\boldsymbol{X},Y)\leq r\mid T=0,R=0]$ for $r\in\mathbb{R}$, i.e., $F_{0}$ is the (marginal) cdf of the conformal score, given $T=0,R=0$. Suppose the conditions of Theorem 2.9 hold, as well as the following conditions:

  1. (i) $F_{0}$ is $L$-Lipschitz in a neighborhood around $r_{0}$,

  2. (ii) $\widehat{r}_{j}\overset{\mathbb{P}}{\to}r_{0}$ and $\sup_{r}\lVert\widehat{m}_{j}(r,\cdot)-m_{j}(r,\cdot)\rVert=o_{\mathbb{P}}(1)$ for all $j\in\mathcal{S}^{*}\cup\{0\}$, $\lVert\widehat{\eta}_{0}-\eta_{0}\rVert=o_{\mathbb{P}}(1)$, and $\lVert\widehat{\omega}_{j,0}-\omega_{j,0}\rVert=o_{\mathbb{P}}(1)$ for all $j\in\mathcal{S}^{*}$, where the associated rates of convergence may be arbitrarily slow,

  3. (iii) The maps $r\mapsto\mathbb{P}\left(\varphi_{j}(\mathcal{O};r,m_{0},m_{j},\omega_{j,0})\right)$ for $j\in\mathcal{S}^{*}$ and $r\mapsto\mathbb{P}\left(\varphi_{0}(\mathcal{O};r,m_{0},\eta_{0})\right)$ are differentiable at $r_{0}$, uniformly in the nuisance functions; the derivative matrices $\left.\frac{d}{dr}\mathbb{P}\left(\varphi_{j}(\mathcal{O};r,m_{0},m_{j},\omega_{j,0})\right)\right|_{r=r_{0}}\eqqcolon V_{j}(r_{0};m_{0},m_{j},\omega_{j,0})$ and $\left.\frac{d}{dr}\mathbb{P}\left(\varphi_{0}(\mathcal{O};r,m_{0},\eta_{0})\right)\right|_{r=r_{0}}\eqqcolon V_{0}(r_{0};m_{0},\eta_{0})$ are invertible; and $V_{j}(r_{0};\widehat{m}_{0},\widehat{m}_{j},\widehat{\omega}_{j,0})\overset{\mathbb{P}}{\to}V_{j}(r_{0};m_{0},m_{j},\omega_{j,0})$ for $j\in\mathcal{S}^{*}$, and $V_{0}(r_{0};\widehat{m}_{0},\widehat{\eta}_{0})\overset{\mathbb{P}}{\to}V_{0}(r_{0};m_{0},\eta_{0})$.

Then

$$\mathbb{P}\left(\frac{I(T=0,R=0)}{\mathbb{P}(T=0,R=0)}\left\{m_{0}(r_{w^{*}},\boldsymbol{X})-m_{0}(\widehat{r}_{j},\boldsymbol{X})\right\}\right)=O_{\mathbb{P}}\left(R_{n,j}^{*}+\sum_{k=0}^{K-1}w_{k}R_{n,k}^{*}+n^{-1/2}\right),$$

for all $j\in\mathcal{S}^{*}\cup\{0\}$.

Proof.

Observe that, for $j\in\mathcal{S}^{*}\cup\{0\}$,

$$\left|\mathbb{P}\left(\frac{I(T=0,R=0)}{\mathbb{P}(T=0,R=0)}\left\{m_{0}(r_{w^{*}},\boldsymbol{X})-m_{0}(\widehat{r}_{j},\boldsymbol{X})\right\}\right)\right|=\left|F_{0}(r_{w^{*}})-F_{0}(\widehat{r}_{j})\right|\lesssim|r_{w^{*}}-\widehat{r}_{j}|,\tag{9}$$

by condition (i). Since

$$|r_{w^{*}}-\widehat{r}_{j}|\leq|r_{w^{*}}-r_{0}|+|\widehat{r}_{j}-r_{0}|\leq|\widehat{r}_{j}-r_{0}|+\sum_{k=0}^{K-1}w_{k}|\widehat{r}_{k}-r_{0}|,\tag{10}$$

it suffices to analyze $|\widehat{r}_{j}-r_{0}|$ for each $j\in\mathcal{S}^{*}\cup\{0\}$.

As the function classes $\{\varphi_{j}(\,\cdot\,;r,\widehat{m}_{0},\widehat{m}_{j},\widehat{\omega}_{j,0}):r\in\mathbb{R}\}$ and $\{\varphi_{0}(\,\cdot\,;r,\widehat{m}_{0},\widehat{\eta}_{0}):r\in\mathbb{R}\}$ are Donsker under the assumptions of Theorem 2.9, conditions (ii) and (iii) permit application of Lemma 3 of Kennedy et al. (2023) to obtain

$$\widehat{r}_{j}-r_{j}=O_{\mathbb{P}}(n^{-1/2}+R_{n,j}^{*}),$$

for all $j\in\mathcal{S}^{*}\cup\{0\}$. Combining this with (9) and (10) yields the result. ∎
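
For concreteness, the final combination step can be written out as the chain below. This is a sketch rather than part of the original argument. It assumes that the weights are nonnegative and sum to one, that $w_{k}=0$ for $k\notin\mathcal{S}^{*}\cup\{0\}$, that $r_{w^{*}}=\sum_{k=0}^{K-1}w_{k}\widehat{r}_{k}$ (as the second inequality in (10) suggests), and that $r_{j}=r_{0}$ for $j\in\mathcal{S}^{*}\cup\{0\}$:

$$
\begin{aligned}
\left|\mathbb{P}\left(\frac{I(T=0,R=0)}{\mathbb{P}(T=0,R=0)}\left\{m_{0}(r_{w^{*}},\boldsymbol{X})-m_{0}(\widehat{r}_{j},\boldsymbol{X})\right\}\right)\right|
&\lesssim|r_{w^{*}}-\widehat{r}_{j}|\\
&\leq|\widehat{r}_{j}-r_{0}|+\sum_{k=0}^{K-1}w_{k}|\widehat{r}_{k}-r_{0}|\\
&=O_{\mathbb{P}}\left(n^{-1/2}+R_{n,j}^{*}\right)+\sum_{k=0}^{K-1}w_{k}\,O_{\mathbb{P}}\left(n^{-1/2}+R_{n,k}^{*}\right)\\
&=O_{\mathbb{P}}\left(R_{n,j}^{*}+\sum_{k=0}^{K-1}w_{k}R_{n,k}^{*}+n^{-1/2}\right).
\end{aligned}
$$

The last line collects the $n^{-1/2}$ terms using $\sum_{k}w_{k}=1$ and the fact that $K$ is fixed.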

A.5 Details of Algorithm 1

In this appendix, we present the full details of Algorithm 1 from Section 2 in the following algorithm table (Algorithm 2).
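
As a rough illustration of the target-site quantile step in the listing that follows (step 5 of Algorithm 2), the Python sketch below evaluates the empirical mean of $\varphi_{0}$ over a grid of candidate thresholds and returns the smallest one at which it becomes nonnegative. Everything here is an assumption made for illustration: the function names `phi0_mean` and `solve_r0`, the callable signature `m0_hat(theta, X)`, the use of the observed target scores as the candidate grid, and the "first nonnegative value" convention for solving an estimating equation that is a step function in $\widehat{\theta}$. It is not the authors' implementation.

```python
import numpy as np


def phi0_mean(theta, scores, X, T, R, m0_hat, eta0_hat, alpha):
    """Empirical mean of the target-site estimating function at theta.

    scores   : conformal scores S(X_i, Y_i); only entries with R == 1 are used
    m0_hat   : hypothetical callable (theta, X) -> array of fitted m_0(theta, X_i)
    eta0_hat : array of fitted propensity-score ratios eta_0(X_i)
    """
    tgt_miss = (T == 0) & (R == 0)  # target units, outcome unobserved
    tgt_obs = (T == 0) & (R == 1)   # target units, outcome observed
    p_miss = tgt_miss.mean()        # empirical P(T = 0, R = 0)
    p_obs = tgt_obs.mean()          # empirical P(T = 0, R = 1)
    m = m0_hat(theta, X)
    # Scores of unobserved outcomes are replaced by +inf so they never count.
    safe_scores = np.where(tgt_obs, scores, np.inf)
    below = (safe_scores <= theta).astype(float)
    term_miss = tgt_miss / p_miss * (m - (1.0 - alpha))
    term_obs = tgt_obs / p_obs * eta0_hat * (below - m)
    return float(np.mean(term_miss + term_obs))


def solve_r0(scores, X, T, R, m0_hat, eta0_hat, alpha):
    """Smallest observed target score at which the mean of phi_0 is >= 0."""
    tgt_obs = (T == 0) & (R == 1)
    for theta in np.sort(scores[tgt_obs]):
        if phi0_mean(theta, scores, X, T, R, m0_hat, eta0_hat, alpha) >= 0.0:
            return float(theta)
    return float("inf")  # no candidate reaches the requested level


# Toy data: a single (target) site, uniform scores, and a known uniform cdf
# standing in for the fitted m_0; these choices are purely illustrative.
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=n)
T = np.zeros(n, dtype=int)
R = rng.integers(0, 2, size=n)
scores = np.where(R == 1, rng.uniform(size=n), np.nan)


def m0_uniform(theta, X):
    return np.full(X.shape, min(max(theta, 0.0), 1.0))


r0_hat = solve_r0(scores, X, T, R, m0_uniform, np.ones(n), alpha=0.1)
print(r0_hat)
```

In this toy setting the two $\widehat{m}_{0}$ terms cancel, so the solution reduces to an empirical quantile of the observed scores. With a fitted, covariate-dependent $\widehat{m}_{0}$ and $\widehat{\eta}_{0}$ that cancellation no longer holds, and the two terms jointly determine the threshold.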

Algorithm 2 Robust multi-source conformal prediction (complete version of Algorithm 1)
1:  Input: Training data $\mathcal{D}=\{\mathcal{O}_{i}=(\boldsymbol{X}_{i},R_{i},R_{i}Y_{i},T_{i}),\ i=1,\dots,n\}$, where $T_{i}\in\{0,1,\dots,K-1\}$ with the target site indexed by $T=0$ and source sites by $T=k=1,\dots,K-1$; desired coverage probability $1-\alpha$ for an $\alpha\in(0,0.5)$; estimators of the conditional putative cumulative distribution function $m_{k}(\theta,\boldsymbol{X})$ for the conformal score $\theta$, the propensity score ratio $\eta_{0}(\boldsymbol{X})=\dfrac{\mathbb{P}(R=0\mid\boldsymbol{X},T=0)}{\mathbb{P}(R=1\mid\boldsymbol{X},T=0)}$ for the target site, and the density ratio $\omega_{k,0}(\boldsymbol{X})=\dfrac{\mathbb{P}(\boldsymbol{X}\mid T=0,R=0)}{\mathbb{P}(\boldsymbol{X}\mid T=k,R=1)}$ for sites $k=1,\dots,K-1$, denoted by $\widehat{m}_{k}(\widehat{\theta},\boldsymbol{X})$, $\widehat{\eta}_{0}(\boldsymbol{X})$ and $\widehat{\omega}_{k,0}(\boldsymbol{X})$ (where $\widehat{\theta}$ is the estimated conformal score), respectively; a tuning parameter $\lambda$ (in the optimization step); a testing point $\boldsymbol{X}=\boldsymbol{x}$ from the target site.
2:  Output: A valid prediction set $\widehat{C}_{\alpha}(\boldsymbol{x})$.
3:  Split the training data $\mathcal{D}$ randomly into $\mathcal{D}_{1}$ and $\mathcal{D}_{2}$, where $\mathcal{D}_{j}=\{\mathcal{O}_{i}\in\mathcal{D},\ i\in\mathcal{I}_{j}\}$ for $j=1,2$ and $\mathcal{I}_{1}\cup\mathcal{I}_{2}=\{1,2,\dots,n\}$.
4:  Fit nuisance functions $\widehat{m}_{k}$ and $\widehat{\omega}_{k,0}$ on $\mathcal{D}_{1}$ and predict them on $\mathcal{D}_{2}$.
5:  For the target site $k=0$, find the smallest $\widehat{\theta}=\widehat{r}_{0}$ such that
$$0=\frac{1}{|\mathcal{I}_{2}|}\sum_{i\in\mathcal{I}_{2}}\Bigg[\underbrace{\frac{I(T_{i}=0,R_{i}=0)}{\widehat{\mathbb{P}}(T_{i}=0,R_{i}=0)}\{\widehat{m}_{0}(\widehat{\theta},\boldsymbol{X}_{i})-(1-\alpha)\}+\frac{I(T_{i}=0,R_{i}=1)}{\widehat{\mathbb{P}}(T_{i}=0,R_{i}=1)}\widehat{\eta}_{0}(\boldsymbol{X}_{i})\{I(S(\boldsymbol{X}_{i},Y_{i})\leq\widehat{\theta})-\widehat{m}_{0}(\widehat{\theta},\boldsymbol{X}_{i})\}}_{\varphi_{0}(\mathcal{O}_{i};\widehat{\theta},\widehat{m}_{0},\widehat{\eta}_{0})}\Bigg].$$
6:  For source sites $k\geq 1$, find the smallest $\widehat{\theta}=\widehat{r}_{k}$ that solves
$$0=\frac{1}{|\mathcal{I}_{2}|}\sum_{i\in\mathcal{I}_{2}}\Bigg[\underbrace{\frac{I(T_{i}=0,R_{i}=0)}{\widehat{\mathbb{P}}(T_{i}=0,R_{i}=0)}\{\widehat{m}_{0}(\widehat{\theta},\boldsymbol{X}_{i})-(1-\alpha)\}+\frac{I(T_{i}=k,R_{i}=1)}{\widehat{\mathbb{P}}(T_{i}=k,R_{i}=1)}\widehat{\omega}_{k,0}(\boldsymbol{X}_{i})\{I(S(\boldsymbol{X}_{i},Y_{i})\leq\widehat{\theta})-\widehat{m}_{k}(\widehat{\theta},\boldsymbol{X}_{i})\}}_{\varphi_{k}(\mathcal{O}_{i};\widehat{\theta},\widehat{m}_{0},\widehat{m}_{k},\widehat{\omega}_{k,0})}\Bigg].$$
Compute $\widehat{\chi}_{k}=|\widehat{r}_{0}-\widehat{r}_{k}|$.
7:  Solve for weights $\widehat{\boldsymbol{w}}=(\widehat{w}_{0},\widehat{w}_{1},\dots,\widehat{w}_{K-1})$ that minimize
$$Q(\boldsymbol{w})=\frac{1}{|\mathcal{I}_{2}|}\sum_{i\in\mathcal{I}_{2}}\Bigg[\sum_{k=1}^{K-1}w_{k}\{\varphi_{0}(\mathcal{O}_{i};\widehat{r}_{0},\widehat{m}_{0},\widehat{\eta}_{0})-\varphi_{k}(\mathcal{O}_{i};\widehat{r}_{0},\widehat{m}_{0},\widehat{m}_{k},\widehat{\omega}_{k,0})\}-\varphi_{0}(\mathcal{O}_{i};\widehat{r}_{0},\widehat{m}_{0},\widehat{\eta}_{0})\Bigg]^{2}+\frac{1}{|\mathcal{I}_{2}|}\lambda\sum_{k=1}^{K-1}w_{k}\widehat{\chi}_{k}^{2},$$
subject to 0wk10subscript𝑤𝑘10\leq w_{k}\leq 10 ≤ italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ≤ 1 and k=0K1wk=1superscriptsubscript𝑘0𝐾1subscript𝑤𝑘1\displaystyle\sum_{k=0}^{K-1}w_{k}=1∑ start_POSTSUBSCRIPT italic_k = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K - 1 end_POSTSUPERSCRIPT italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = 1.
8:  Compute θ^=r^0,fed=k=0K1w^kr^k^𝜃subscript^𝑟0fedsuperscriptsubscript𝑘0𝐾1subscript^𝑤𝑘subscript^𝑟𝑘\widehat{\theta}=\widehat{r}_{0,\text{fed}}=\displaystyle\sum_{k=0}^{K-1}% \widehat{w}_{k}\widehat{r}_{k}over^ start_ARG italic_θ end_ARG = over^ start_ARG italic_r end_ARG start_POSTSUBSCRIPT 0 , fed end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_k = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K - 1 end_POSTSUPERSCRIPT over^ start_ARG italic_w end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT over^ start_ARG italic_r end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT.
9:  Return: The prediction set C^α(𝒙)={y:S(𝒙,y)r^0,fed}subscript^𝐶𝛼𝒙conditional-set𝑦𝑆𝒙𝑦subscript^𝑟0fed\widehat{C}_{\alpha}(\boldsymbol{x})=\{y:S(\boldsymbol{x},y)\leq\widehat{r}_{0% ,\text{fed}}\}over^ start_ARG italic_C end_ARG start_POSTSUBSCRIPT italic_α end_POSTSUBSCRIPT ( bold_italic_x ) = { italic_y : italic_S ( bold_italic_x , italic_y ) ≤ over^ start_ARG italic_r end_ARG start_POSTSUBSCRIPT 0 , fed end_POSTSUBSCRIPT }.
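
Steps 7 to 9 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function names (`q_objective`, `solve_weights`, `federated_quantile`, `absolute_residual_interval`) are hypothetical, a coarse grid search stands in for whatever constrained optimizer is actually used to minimize $Q(\boldsymbol{w})$, the score is assumed to be an absolute residual $S(\boldsymbol{x},y)=|y-\widehat{\mu}(\boldsymbol{x})|$, and the toy inputs are made up.

```python
from itertools import product


def q_objective(w_src, phi0, phi_src, chi, lam):
    """Evaluate Q(w) from Step 7 for source weights (w_1, ..., w_{K-1})."""
    n = len(phi0)
    loss = 0.0
    for i in range(n):
        # sum_k w_k {phi_0 - phi_k} - phi_0, evaluated at observation i
        resid = sum(w * (phi0[i] - phi_k[i]) for w, phi_k in zip(w_src, phi_src))
        resid -= phi0[i]
        loss += resid ** 2
    penalty = lam * sum(w * c ** 2 for w, c in zip(w_src, chi))
    return (loss + penalty) / n


def solve_weights(r_hat, phi0, phi_src, lam, grid=20):
    """Grid search (an assumption of this sketch) over source weights with
    w_k >= 0 and sum_k w_k <= 1; the target weight w_0 takes the remainder."""
    chi = [abs(r_hat[0] - r_k) for r_k in r_hat[1:]]  # chi_k = |r_0 - r_k|
    levels = [j / grid for j in range(grid + 1)]
    best_w, best_q = None, float("inf")
    for w_src in product(levels, repeat=len(chi)):
        if sum(w_src) > 1.0 + 1e-9:
            continue  # outside the simplex
        q = q_objective(w_src, phi0, phi_src, chi, lam)
        if q < best_q:
            best_w, best_q = w_src, q
    w0 = max(0.0, 1.0 - sum(best_w))
    return [w0, *best_w]


def federated_quantile(weights, r_hat):
    """Step 8: weighted combination of the site-specific quantile estimates."""
    return sum(w * r for w, r in zip(weights, r_hat))


def absolute_residual_interval(mu_x, r_fed):
    """Step 9 for the assumed score S(x, y) = |y - mu(x)|."""
    return (mu_x - r_fed, mu_x + r_fed)


# Toy inputs (made up for illustration): one target site and one source site.
r_hat = [2.0, 3.0]                  # r_0 (target) and r_1 (source)
phi0 = [1.0, -1.0, 0.5, -0.5]       # phi_0 evaluated on the second fold
phi_src = [[1.0, -1.0, 0.5, -0.5]]  # phi_1 evaluated on the second fold
weights = solve_weights(r_hat, phi0, phi_src, lam=1.0)
r_fed = federated_quantile(weights, r_hat)
interval = absolute_residual_interval(5.0, r_fed)
```

In this toy case the source influence function adds nothing beyond the target one while its quantile differs from the target's, so the penalty term pushes the source weight to zero. A grid search scales poorly with the number of sites; a quadratic programming solver would be the natural replacement.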

Below, we also present all relevant details about estimating influence functions.

Algorithm 3 Estimation of influence functions
1:  Input: Training data $\mathcal{D} = \{\mathcal{O}_i = (\boldsymbol{X}_i, R_i, R_iY_i, T_i), i = 1,\dots,n\}$, where $T_i\in\{0,1,\dots,K\}$ with the target site indexed by $T=0$ and source sites by $T=k=1,\dots,K-1$.
2:  Desired coverage probability $1-\alpha$ for an $\alpha\in(0,0.5)$.
3:  Output: Estimates of the target site influence function $\varphi_0(\mathcal{O}_i;\widehat{\theta},\widehat{m}_0,\widehat{\eta}_0)$ and the source site influence functions $\varphi_k(\mathcal{O}_i;\widehat{\theta},\widehat{m}_0,\widehat{m}_k,\widehat{\omega}_{k,0})$, $k=1,\dots,K-1$.
4:  Randomly split the training data $\mathcal{D}$ into two equal-sized folds $\mathcal{D}_1\cup\mathcal{D}_2$.
5:  On the first split $\mathcal{D}_1$, fit models to estimate the following nuisance functions via an arbitrary regression model or density ratio model (nonparametric, semiparametric, or parametric):
  • Conditional CDF in the target site $m_0(\theta,\boldsymbol{X})$ across a range of values $\theta$ for observations with observed $Y$ ($R=1$): estimate is $\widehat{m}_0$

  • Conditional CDF $m_k(\theta,\boldsymbol{X})$ in source site $k$, $k=1,\dots,K-1$, across a range of values $\theta$ for observations with observed $Y$ ($R=1$): estimate is $\widehat{m}_k$

  • Ratio of the missingness propensity scores $\eta_0(\boldsymbol{X})=\dfrac{\mathbb{P}(R=0\mid\boldsymbol{X},T=0)}{\mathbb{P}(R=1\mid\boldsymbol{X},T=0)}$ for the target site: estimate is $\widehat{\eta}_0$

  • Density ratio $\omega_{k,0}(\boldsymbol{X})=\dfrac{\mathbb{P}(\boldsymbol{X}\mid T=0,R=0)}{\mathbb{P}(\boldsymbol{X}\mid T=k,R=1)}$ for sites $k=1,\dots,K-1$: estimate is $\widehat{\omega}_{k,0}$

We recommend using SuperLearner with the base learners being random forest, elastic net, and generalized linear model (GLM) for the first three nuisance functions and exponential tilting to estimate the density ratio model.
6:  On the second split $\mathcal{D}_2$, predict the nuisance functions using the models learned ($\widehat{m}_k$, $\widehat{\eta}_0$, $\widehat{\omega}_{k,0}$) from the first split $\mathcal{D}_1$.
7:  For the target site $k=0$, estimate the influence function as
$$\widehat{\varphi}_0(\mathcal{O}_i;\widehat{\theta},\widehat{m}_0,\widehat{\eta}_0)=\frac{I(T_i=0,R_i=0)}{\widehat{\mathbb{P}}(T_i=0,R_i=0)}\{\widehat{m}_0(\widehat{\theta},\boldsymbol{X}_i)-(1-\alpha)\}+\frac{I(T_i=0,R_i=1)}{\widehat{\mathbb{P}}(T_i=0,R_i=1)}\widehat{\eta}_0(\boldsymbol{X}_i)\{I(S(\boldsymbol{X}_i,Y_i)\leq\widehat{\theta})-\widehat{m}_0(\widehat{\theta},\boldsymbol{X}_i)\}$$
8:  For each of the source sites $k=1,\dots,K-1$, estimate the influence functions as
$$\begin{aligned}\widehat{\varphi}_k(\mathcal{O}_i;\widehat{\theta},\widehat{m}_0,\widehat{m}_k,\widehat{\omega}_{k,0}) &= \frac{I(T_i=0,R_i=0)}{\widehat{\mathbb{P}}(T_i=0,R_i=0)}\{\widehat{m}_0(\widehat{\theta},\boldsymbol{X}_i)-(1-\alpha)\}\\ &\quad+\frac{I(T_i=k,R_i=1)}{\widehat{\mathbb{P}}(T_i=k,R_i=1)}\widehat{\omega}_{k,0}(\boldsymbol{X}_i)\{I(S(\boldsymbol{X}_i,Y_i)\leq\widehat{\theta})-\widehat{m}_k(\widehat{\theta},\boldsymbol{X}_i)\}\end{aligned}$$
9:  Return: Estimate of target site influence function $\widehat{\varphi}_0$ and source site influence functions $\widehat{\varphi}_k$, $k=1,\dots,K-1$.
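
Steps 7 and 8 share one structure, which the following Python sketch makes explicit. It is illustrative only: the `Obs` container, the function names, and the constant nuisance functions in the toy example are all assumptions introduced here, and in practice $\widehat{m}_0$, $\widehat{m}_k$, $\widehat{\eta}_0$, and $\widehat{\omega}_{k,0}$ would be the models fitted on $\mathcal{D}_1$. Setting `k = 0` with `weight_k` equal to $\widehat{\eta}_0$ gives $\widehat{\varphi}_0$; setting `k >= 1` with `weight_k` equal to $\widehat{\omega}_{k,0}$ gives $\widehat{\varphi}_k$.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Obs:
    x: float                       # covariate (scalar for simplicity)
    t: int                         # site index; 0 is the target site
    r: int                         # 1 if the outcome is observed, 0 if missing
    score: Optional[float] = None  # conformal score S(x, y); defined only when r == 1


def empirical_prob(data, t, r):
    """Sample proportion estimating P(T = t, R = r)."""
    return sum(1 for o in data if o.t == t and o.r == r) / len(data)


def influence(
    obs: Obs,
    k: int,
    theta: float,
    alpha: float,
    m0: Callable[[float, float], float],
    m_k: Callable[[float, float], float],
    weight_k: Callable[[float], float],
    p_t0_r0: float,
    p_tk_r1: float,
) -> float:
    """Influence function value for site k at one observation."""
    value = 0.0
    if obs.t == 0 and obs.r == 0:
        # Target observations with missing outcome: m_0(theta, X) - (1 - alpha)
        value += (m0(theta, obs.x) - (1 - alpha)) / p_t0_r0
    if obs.t == k and obs.r == 1:
        # Site-k observations with observed outcome: weighted residual term
        indicator = 1.0 if obs.score <= theta else 0.0
        value += weight_k(obs.x) * (indicator - m_k(theta, obs.x)) / p_tk_r1
    return value


# Toy data and constant nuisance functions, made up for illustration.
data = [
    Obs(0.1, 0, 0),
    Obs(0.2, 0, 1, score=0.5),
    Obs(0.3, 1, 1, score=2.0),
    Obs(0.4, 1, 1, score=0.2),
]
m0 = lambda theta, x: 0.8
m1 = lambda theta, x: 0.7
eta0 = lambda x: 1.0
omega10 = lambda x: 2.0
p00 = empirical_prob(data, 0, 0)
p01 = empirical_prob(data, 0, 1)
p11 = empirical_prob(data, 1, 1)

phi0_vals = [influence(o, 0, 1.0, 0.1, m0, m0, eta0, p00, p01) for o in data]
phi1_vals = [influence(o, 1, 1.0, 0.1, m0, m1, omega10, p00, p11) for o in data]
```

Writing both influence functions through one routine makes it visible that they differ only in which observed-outcome sample is used and how it is reweighted toward the target covariate distribution.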

Appendix B Additional Simulation Results

B.1 An experiment of sample size vs. interval width

We first conducted an experiment to assess how the sample size of the target site affects the coverage and width of the prediction interval. We use only the target site's DGP but consider two outcome-generating mechanisms: homogeneous variance with $\varepsilon(x)\sim\mathcal{N}(0,1)$ and heterogeneous variance with $\varepsilon(x)\sim\mathcal{N}(0,[\log(x)]^2)$, for $\varepsilon(X_i)$ defined in (4). In both cases, the oracle width of a $90\%$ prediction interval for the outcome is $2\times z_{0.95}\,\mathbb{E}\{\sigma(X_i)\}\approx 3.29$, where $z_{0.95}=1.645$ is the 95th percentile of the standard normal distribution. Note that $\mathbb{E}\{\sigma(X_i)\}=\int_0^1\sigma(x)\,dx=1$ for both $\sigma(x)=1$ and $\sigma(x)=-\log(x)$ (see also Lei & Candès, 2021). Figure 4 shows boxplots of the interval widths over 500 Monte Carlo simulations.
For all three conformal scores, the interval width converges to its oracle value faster when the variance is homogeneous. Using ASR as the conformal score also leaves an inherent bias relative to the oracle width under heterogeneous variance, even when the sample size is large, whereas the other two conformal scores are more robust.
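
The oracle width quoted above can be checked numerically. The sketch below assumes, as the integral over $(0,1)$ implies, that the covariate is uniform on $(0,1)$; the helper names `oracle_width` and `midpoint_integral` are introduced here for illustration only.

```python
import math
from statistics import NormalDist


def oracle_width(alpha, expected_sigma):
    """Average width 2 * z_{1 - alpha/2} * E[sigma(X)] of the oracle interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * z * expected_sigma


def midpoint_integral(f, a, b, n=100_000):
    """Midpoint rule; it never evaluates f at the endpoints, so the
    integrable singularity of -log(x) at 0 is not a problem."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# E[sigma(X)] for X ~ Uniform(0, 1) under the two variance structures.
e_sigma_hom = midpoint_integral(lambda x: 1.0, 0.0, 1.0)
e_sigma_het = midpoint_integral(lambda x: -math.log(x), 0.0, 1.0)

width_hom = oracle_width(0.10, e_sigma_hom)
width_het = oracle_width(0.10, e_sigma_het)
```

Both widths agree with the value $3.29$ to two decimal places, since $\int_0^1 -\log(x)\,dx = 1$.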

Figure 4: Boxplots of prediction interval widths

B.2 Complete simulation details and results of coverage probability and interval width, by all sample sizes, variance assumptions, and covariate and outcome distributions across sites

In this Appendix, we give further details on the data-generating process and the competing methods of Section 3. In the complete simulation study, we consider six methods for constructing the prediction interval $\widehat{C}_\alpha(x)$, three of which (federated (ours), pooled sample, and target only) are described in Section 3.1. The three additional methods are equal weights, i.e., equally weighting each source site by $1/(K+1)$ (here $=0.2$), and two alternative federated weights, Federated I and III, described below.

  • Federated I: when solving Step 7 in Algorithm 1, set the limit of weight on each source site by $w_k\in[0,1]$, $k=1,\dots,K$, and then the weight of site 0 is $w_0=1-\sum_{k=1}^{K-1}w_k$.

  • Federated II (ours, and the one in main text): when solving Step 7 in Algorithm 1, set the limit of weight on each source site by $w_k\in[0,1]$, $k=1,\dots,K$, and let $w_k^*=w_k\times K/(K+1)=0.8w_k$ (here $K=4$), and use $w_k^*$ as the weight of site $k$. Then, $w_0=1-\sum_{k=1}^{K-1}w_k^*$ is the weight of site 0. In this case, $w_0\geq 1/(K+1)=0.2$ in most replications.

  • Federated III: setting the limit of weight on each source site by $w_k\in[0,1/(K+1)]=[0,0.2]$, $k=2,\dots,s$, and then $w_0=1-\sum_{k=1}^{K-1}w_k$ for site 0. In this case, $w_0\geq 1/(K+1)$ holds, and thus we always weight the target site the most.
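
The three weighting variants and the equal-weights baseline differ only in how the source weights are bounded or rescaled. The sketch below illustrates this; all function names are hypothetical, `n_sources` plays the role of $K=4$ in this appendix, and the raw source weights are made-up numbers standing in for the Step 7 solution. Federated III changes the box constraint passed to the solver, so the sketch returns the bounds and does not post-process solved weights.

```python
def federated_i(w_src):
    """Federated I: use the solved source weights as they are;
    the target site receives the remaining weight."""
    return [1.0 - sum(w_src), *w_src]


def federated_ii(w_src, n_sources=None):
    """Federated II: shrink each source weight by K / (K + 1),
    then give the target site the remaining weight."""
    k = len(w_src) if n_sources is None else n_sources
    shrink = k / (k + 1)
    w_star = [w * shrink for w in w_src]
    return [1.0 - sum(w_star), *w_star]


def federated_iii_bounds(n_sources):
    """Federated III: box constraints [0, 1 / (K + 1)] for the source
    weights, to be imposed inside the Step 7 optimization."""
    upper = 1.0 / (n_sources + 1)
    return [(0.0, upper)] * n_sources


def equal_weights(n_sources):
    """Equal weights: 1 / (K + 1) for every site, target included."""
    return [1.0 / (n_sources + 1)] * (n_sources + 1)


# Made-up raw source weights for K = 4 source sites.
w_src = [0.5, 0.25, 0.0, 0.25]
fed1 = federated_i(w_src)
fed2 = federated_ii(w_src)
bounds3 = federated_iii_bounds(4)
equal = equal_weights(4)
```

With these inputs, Federated I leaves no weight on the target site, while the rescaling in Federated II reserves a weight of $0.2$ for it.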

In addition, this Appendix presents all simulation findings on coverage probabilities and interval widths, both numerically and graphically, in Tables 1–9 and Figures 5–10. Below, we comment on the patterns and trends in these results.

First, the pooled sample method performs well only when CCOD holds: its coverage is then close to the nominal level $0.9$ and it is the most efficient method, with the narrowest boxes (except for CQR under heteroscedasticity). These results make sense because the pooled sample has a larger sample size and, when CCOD holds for all sites, all data are directly useful for predicting $Y$ in the target site. However, the method can easily fail when CCOD is violated, whether weakly or strongly in our simulation. Compared to other methods, the pooled sample can be substantially more sensitive to such violations, which often results in very conservative and wide interval estimates. For example, in Table 2, the interval widths of the pooled sample for ASR under weak and strong violations of CCOD are, respectively, $11.19$ and $31.95$ (under homoscedasticity) and $11.27$ and $31.91$ (under heteroscedasticity), which substantially exceed the oracle width of $3.29$, while the other methods always have widths in the range $[3.20, 4.10]$. This illustrates that pooling samples from all source sites is not a good strategy in general, especially when there are differences among sites.

Moreover, the equal weights method can also result in large biases when the distributions of covariates across sites are either weakly or strongly heterogeneous. The biases increase with a stronger difference among covariate distributions. In some cases, it is also less efficient than the federated methods; for instance, in Figure 9, the boxes of the equal weights method are often wider than those of the three federated methods, as reflected in the corresponding interval width plot, Figure 10. Therefore, it can be biased and less efficient under heterogeneous covariate distributions.

Furthermore, among all methods, only federated weights I, II, and III performed well across settings and exhibited consistent patterns in the coverage probabilities and interval widths. The boxplots of coverage probabilities for these federated weights are often centered around the nominal coverage level of $0.9$, and the boxes are often narrower, indicating higher efficiency in their interval predictions.

Finally, among the three federated weights, there are slight differences with respect to different conformal scores. Federated I and II are less efficient for CQR under both heteroscedasticity and heterogeneous covariate distributions (weakly and strongly). In these cases, federated III for CQR is more efficient, although it is also slightly more conservative (though less biased than the equal weights method). Based on our simulation, we recommend Federated III for CQR, as it offers the optimal choice regarding the bias-variance trade-off among all competing methods. For other cases considered in our simulation, all three federated methods perform similarly and result in valid predictions.

To explore settings in which the propensity score for observing the outcome is allowed to vary more, we provide a comparison in Figure 13, where the true propensity score ranges over (0.4, 0.6) in panel (a) and over (0.1, 0.9) in panel (b). When the propensity score is allowed a wider range, our method is even more efficient than when the propensity score is constrained to (0.4, 0.6). Panel (b) shows that the Federated (ours) method provides the most efficient interval estimates overall for the ASR and local ASR conformal scores, and the efficiency gains are more pronounced than those in panel (a). For CQR, while pooling gives the most efficient result, the Federated (ours) method also performs well, and it is overall the optimal and safest choice among the three methods compared.

| CFS | CCOD | CP | s.d.(CP) | wd | s.d.(wd) | CP | s.d.(CP) | wd | s.d.(wd) |
|---|---|---|---|---|---|---|---|---|---|
| Homoscedasticity where $\sigma(x)=1$ | | | | | | | | | |
| | | Federated I | | | | Pooled sample | | | |
| ASR | holds | 0.893 | 0.038 | 3.27 | 0.35 | 0.900 | 0.018 | 3.30 | 0.16 |
| ASR | weakly violated | 0.894 | 0.040 | 3.29 | 0.37 | 1.000 | 0.000 | 11.16 | 0.55 |
| ASR | strongly violated | 0.894 | 0.039 | 3.29 | 0.37 | 1.000 | 0.000 | 31.99 | 0.97 |
| Local ASR | holds | 0.904 | 0.033 | 3.38 | 0.36 | 0.901 | 0.023 | 3.32 | 0.22 |
| Local ASR | weakly violated | 0.906 | 0.035 | 3.41 | 0.37 | 0.943 | 0.026 | 3.90 | 0.42 |
| Local ASR | strongly violated | 0.903 | 0.035 | 3.38 | 0.36 | 0.952 | 0.024 | 4.06 | 0.44 |
| CQR | holds | 0.923 | 0.031 | 3.61 | 0.40 | 0.902 | 0.025 | 3.34 | 0.24 |
| CQR | weakly violated | 0.925 | 0.031 | 3.63 | 0.39 | 0.903 | 0.031 | 3.36 | 0.30 |
| CQR | strongly violated | 0.924 | 0.032 | 3.63 | 0.40 | 0.905 | 0.029 | 3.38 | 0.29 |
| | | Federated II (ours) | | | | Target site only | | | |
| ASR | holds | 0.896 | 0.032 | 3.29 | 0.30 | 0.901 | 0.046 | 3.39 | 0.46 |
| ASR | weakly violated | 0.898 | 0.032 | 3.31 | 0.31 | 0.900 | 0.045 | 3.38 | 0.47 |
| ASR | strongly violated | 0.896 | 0.034 | 3.30 | 0.32 | 0.894 | 0.052 | 3.32 | 0.48 |
| Local ASR | holds | 0.907 | 0.032 | 3.42 | 0.38 | 0.909 | 0.055 | 3.55 | 0.72 |
| Local ASR | weakly violated | 0.909 | 0.032 | 3.44 | 0.38 | 0.908 | 0.057 | 3.56 | 0.78 |
| Local ASR | strongly violated | 0.905 | 0.032 | 3.39 | 0.35 | 0.900 | 0.060 | 3.44 | 0.63 |
| CQR | holds | 0.925 | 0.029 | 3.63 | 0.39 | 0.922 | 0.054 | 3.71 | 0.67 |
| CQR | weakly violated | 0.926 | 0.030 | 3.65 | 0.39 | 0.917 | 0.060 | 3.66 | 0.70 |
| CQR | strongly violated | 0.925 | 0.031 | 3.63 | 0.39 | 0.917 | 0.059 | 3.66 | 0.67 |
| | | Federated III | | | | Equal weights | | | |
| ASR | holds | 0.900 | 0.032 | 3.33 | 0.30 | 0.900 | 0.032 | 3.33 | 0.30 |
| ASR | weakly violated | 0.901 | 0.032 | 3.34 | 0.31 | 0.901 | 0.032 | 3.34 | 0.31 |
| ASR | strongly violated | 0.900 | 0.033 | 3.34 | 0.32 | 0.900 | 0.033 | 3.34 | 0.32 |
| Local ASR | holds | 0.916 | 0.029 | 3.50 | 0.33 | 0.916 | 0.029 | 3.50 | 0.33 |
| Local ASR | weakly violated | 0.915 | 0.029 | 3.50 | 0.33 | 0.915 | 0.029 | 3.50 | 0.33 |
| Local ASR | strongly violated | 0.914 | 0.030 | 3.49 | 0.32 | 0.914 | 0.030 | 3.49 | 0.32 |
| CQR | holds | 0.933 | 0.025 | 3.72 | 0.34 | 0.933 | 0.025 | 3.72 | 0.34 |
| CQR | weakly violated | 0.933 | 0.026 | 3.72 | 0.34 | 0.933 | 0.026 | 3.72 | 0.34 |
| CQR | strongly violated | 0.933 | 0.026 | 3.73 | 0.35 | 0.933 | 0.026 | 3.73 | 0.35 |
| Heteroscedasticity where $\sigma(x)=-\log(x)$ | | | | | | | | | |
| | | Federated I | | | | Pooled sample | | | |
| ASR | holds | 0.904 | 0.034 | 4.22 | 0.94 | 0.900 | 0.015 | 3.94 | 0.35 |
| ASR | weakly violated | 0.903 | 0.033 | 4.20 | 0.98 | 0.991 | 0.002 | 11.23 | 0.45 |
| ASR | strongly violated | 0.907 | 0.033 | 4.31 | 1.06 | 1.000 | 0.000 | 31.85 | 1.00 |
| Local ASR | holds | 0.914 | 0.046 | 4.08 | 2.17 | 0.903 | 0.031 | 3.55 | 0.46 |
| Local ASR | weakly violated | 0.914 | 0.048 | 4.24 | 2.55 | 0.915 | 0.032 | 3.74 | 0.49 |
| Local ASR | strongly violated | 0.909 | 0.048 | 4.14 | 3.42 | 0.924 | 0.027 | 3.85 | 0.42 |
| CQR | holds | 0.926 | 0.029 | 3.40 | 0.31 | 0.902 | 0.024 | 3.21 | 0.15 |
| CQR | weakly violated | 0.926 | 0.029 | 3.39 | 0.26 | 0.904 | 0.025 | 3.22 | 0.16 |
| CQR | strongly violated | 0.928 | 0.029 | 3.42 | 0.34 | 0.905 | 0.026 | 3.23 | 0.16 |
| | | Federated II (ours) | | | | Target site only | | | |
| ASR | holds | 0.905 | 0.030 | 4.21 | 0.87 | 0.898 | 0.042 | 4.16 | 1.21 |
| ASR | weakly violated | 0.905 | 0.029 | 4.20 | 0.86 | 0.900 | 0.043 | 4.21 | 1.22 |
| ASR | strongly violated | 0.908 | 0.030 | 4.30 | 0.92 | 0.902 | 0.042 | 4.27 | 1.26 |
| Local ASR | holds | 0.915 | 0.044 | 4.14 | 2.50 | 0.909 | 0.061 | 4.37 | 4.93 |
| Local ASR | weakly violated | 0.916 | 0.047 | 4.29 | 2.70 | 0.910 | 0.065 | 4.52 | 3.97 |
| Local ASR | strongly violated | 0.911 | 0.046 | 4.16 | 3.27 | 0.905 | 0.065 | 4.23 | 3.20 |
| CQR | holds | 0.928 | 0.028 | 3.41 | 0.31 | 0.912 | 0.070 | 3.45 | 0.58 |
| CQR | weakly violated | 0.929 | 0.027 | 3.42 | 0.28 | 0.916 | 0.059 | 3.48 | 0.71 |
| CQR | strongly violated | 0.930 | 0.027 | 3.43 | 0.33 | 0.916 | 0.064 | 3.48 | 0.58 |
| | | Federated III | | | | Equal weights | | | |
| | holds | 0.893 | 0.066 | 3.39 | 0.66 | 0.900 | 0.028 | 3.32 | 0.28 |
| ASR | holds | 0.910 | 0.031 | 4.35 | 0.90 | 0.910 | 0.031 | 4.35 | 0.90 |
| ASR | weakly violated | 0.908 | 0.030 | 4.31 | 0.95 | 0.908 | 0.030 | 4.31 | 0.95 |
| ASR | strongly violated | 0.912 | 0.031 | 4.43 | 1.01 | 0.912 | 0.031 | 4.43 | 1.01 |
| Local ASR | holds | 0.923 | 0.041 | 4.41 | 5.69 | 0.923 | 0.041 | 4.41 | 5.69 |
| Local ASR | weakly violated | 0.924 | 0.044 | 4.38 | 2.54 | 0.924 | 0.044 | 4.38 | 2.54 |
| Local ASR | strongly violated | 0.919 | 0.043 | 4.32 | 3.50 | 0.919 | 0.043 | 4.32 | 3.50 |
| CQR | holds | 0.939 | 0.026 | 3.51 | 0.33 | 0.939 | 0.026 | 3.51 | 0.33 |
| CQR | weakly violated | 0.940 | 0.025 | 3.52 | 0.30 | 0.940 | 0.025 | 3.52 | 0.30 |
| CQR | strongly violated | 0.941 | 0.025 | 3.54 | 0.34 | 0.941 | 0.025 | 3.54 | 0.34 |
  • CFS: conformal score; CCOD: common conditional outcome distribution

  • CP: coverage probability; wd: width; s.d.: standard deviation (over 500 replications)

Table 1: $n_k=300$, homogeneous covariate distribution
CFS CCOD CP s.d.(CP) wd s.d.(wd) CP s.d.(CP) wd s.d.(wd)
Homoscedasticity where σ(x)=1𝜎𝑥1\sigma(x)=1italic_σ ( italic_x ) = 1
Federated I Pooled sample
holds 0.899 0.021 3.30 0.20 0.900 0.010 3.30 0.09
ASR weakly violated 0.899 0.022 3.29 0.20 1.000 0.000 11.19 0.28
strongly violated 0.898 0.021 3.29 0.20 1.000 0.000 31.95 0.50
holds 0.901 0.019 3.32 0.19 0.900 0.014 3.30 0.13
Local ASR weakly violated 0.901 0.020 3.31 0.19 0.947 0.014 3.89 0.22
strongly violated 0.902 0.020 3.33 0.18 0.952 0.012 3.99 0.22
holds 0.905 0.018 3.37 0.17 0.900 0.015 3.31 0.13
CQR weakly violated 0.906 0.019 3.36 0.18 0.901 0.018 3.31 0.17
strongly violated 0.905 0.018 3.36 0.17 0.900 0.018 3.31 0.16
Federated II (ours) Target site only
holds 0.900 0.017 3.31 0.16 0.901 0.023 3.32 0.22
ASR weakly violated 0.900 0.018 3.30 0.17 0.900 0.023 3.31 0.22
strongly violated 0.899 0.018 3.30 0.16 0.901 0.023 3.32 0.22
holds 0.901 0.018 3.32 0.17 0.901 0.030 3.34 0.30
Local ASR weakly violated 0.902 0.018 3.32 0.17 0.901 0.031 3.34 0.30
strongly violated 0.903 0.019 3.34 0.17 0.904 0.030 3.37 0.31
holds 0.906 0.018 3.37 0.17 0.905 0.033 3.39 0.33
CQR weakly violated 0.906 0.018 3.37 0.17 0.905 0.032 3.40 0.32
strongly violated 0.905 0.018 3.37 0.17 0.905 0.033 3.40 0.33
Federated III Equal weights
holds 0.902 0.018 3.32 0.18 0.902 0.018 3.32 0.18
ASR weakly violated 0.901 0.019 3.31 0.18 0.901 0.019 3.31 0.18
strongly violated 0.900 0.019 3.31 0.18 0.900 0.019 3.31 0.18
holds 0.905 0.017 3.35 0.18 0.905 0.017 3.35 0.18
Local ASR weakly violated 0.905 0.018 3.35 0.18 0.905 0.018 3.35 0.18
strongly violated 0.906 0.018 3.37 0.18 0.906 0.018 3.37 0.18
holds 0.910 0.017 3.41 0.16 0.910 0.017 3.41 0.16
CQR weakly violated 0.910 0.017 3.41 0.17 0.910 0.017 3.41 0.17
strongly violated 0.909 0.017 3.41 0.16 0.909 0.017 3.41 0.16
Heteroscedasticity where $\sigma(x) = -\log(x)$
Federated I Pooled sample
holds 0.901 0.020 3.99 0.50 0.900 0.010 3.91 0.20
ASR weakly violated 0.903 0.021 4.04 0.54 0.991 0.002 11.27 0.24
strongly violated 0.901 0.020 3.99 0.50 1.000 0.000 31.91 0.57
holds 0.907 0.031 3.67 0.71 0.899 0.020 3.47 0.23
Local ASR weakly violated 0.907 0.032 3.66 0.82 0.921 0.019 3.76 0.28
strongly violated 0.906 0.032 3.67 0.96 0.924 0.017 3.81 0.24
holds 0.907 0.017 3.23 0.12 0.900 0.015 3.19 0.11
CQR weakly violated 0.908 0.018 3.23 0.14 0.901 0.017 3.19 0.13
strongly violated 0.908 0.017 3.24 0.13 0.901 0.016 3.20 0.12
Federated II (ours) Target site only
holds 0.901 0.018 3.98 0.43 0.899 0.020 3.94 0.48
ASR weakly violated 0.903 0.019 4.03 0.48 0.900 0.022 3.98 0.52
strongly violated 0.901 0.018 3.98 0.44 0.900 0.021 3.97 0.51
holds 0.908 0.030 3.67 0.73 0.907 0.038 3.71 0.89
Local ASR weakly violated 0.908 0.031 3.67 0.84 0.907 0.040 3.71 1.06
strongly violated 0.907 0.031 3.68 0.93 0.908 0.039 3.73 0.94
holds 0.907 0.017 3.23 0.12 0.903 0.031 3.23 0.18
CQR weakly violated 0.908 0.018 3.23 0.13 0.907 0.034 3.26 0.19
strongly violated 0.909 0.017 3.24 0.13 0.907 0.032 3.26 0.18
Federated III Equal weights
holds 0.902 0.019 4.01 0.46 0.902 0.019 4.01 0.46
ASR weakly violated 0.904 0.020 4.07 0.51 0.904 0.020 4.07 0.51
strongly violated 0.902 0.019 4.01 0.47 0.902 0.019 4.01 0.47
holds 0.911 0.030 3.71 0.70 0.911 0.030 3.71 0.70
Local ASR weakly violated 0.911 0.031 3.71 0.83 0.911 0.031 3.71 0.83
strongly violated 0.910 0.031 3.71 0.93 0.910 0.031 3.71 0.93
holds 0.911 0.016 3.25 0.13 0.911 0.016 3.25 0.13
CQR weakly violated 0.913 0.017 3.25 0.14 0.913 0.017 3.25 0.14
strongly violated 0.914 0.017 3.26 0.13 0.914 0.017 3.26 0.13
  • CFS: conformal score; CCOD: common conditional outcome distribution

  • CP: coverage probability; wd: width; s.d.: standard deviation (over 500 replications)

Table 2: $n_k = 1000$, homogeneous covariate distribution
CFS CCOD CP s.d.(CP) wd s.d.(wd) CP s.d.(CP) wd s.d.(wd)
Homoscedasticity where $\sigma(x) = 1$
Federated I Pooled sample
holds 0.900 0.014 3.29 0.12 0.900 0.008 3.29 0.05
ASR weakly violated 0.899 0.013 3.28 0.11 1.000 0.000 11.19 0.17
strongly violated 0.900 0.014 3.30 0.12 1.000 0.000 31.99 0.29
holds 0.900 0.014 3.30 0.12 0.900 0.010 3.29 0.08
Local ASR weakly violated 0.901 0.013 3.30 0.11 0.947 0.010 3.88 0.14
strongly violated 0.901 0.014 3.31 0.12 0.954 0.009 3.99 0.14
holds 0.901 0.013 3.31 0.11 0.899 0.011 3.29 0.09
CQR weakly violated 0.901 0.013 3.31 0.11 0.899 0.013 3.29 0.11
strongly violated 0.901 0.012 3.32 0.11 0.899 0.012 3.30 0.11
Federated II (ours) Target site only
holds 0.900 0.012 3.29 0.10 0.899 0.015 3.28 0.13
ASR weakly violated 0.899 0.012 3.29 0.10 0.900 0.014 3.30 0.12
strongly violated 0.901 0.012 3.30 0.10 0.901 0.014 3.31 0.13
holds 0.900 0.013 3.29 0.11 0.899 0.019 3.29 0.17
Local ASR weakly violated 0.901 0.013 3.31 0.11 0.902 0.018 3.32 0.17
strongly violated 0.901 0.013 3.31 0.12 0.901 0.018 3.32 0.17
holds 0.901 0.012 3.31 0.11 0.900 0.020 3.31 0.18
CQR weakly violated 0.901 0.013 3.31 0.11 0.902 0.020 3.33 0.19
strongly violated 0.902 0.012 3.32 0.11 0.902 0.019 3.33 0.18
Federated III Equal weights
holds 0.901 0.013 3.30 0.11 0.901 0.013 3.30 0.11
ASR weakly violated 0.900 0.012 3.29 0.10 0.900 0.012 3.29 0.10
strongly violated 0.901 0.013 3.31 0.11 0.901 0.013 3.31 0.11
holds 0.901 0.013 3.31 0.11 0.901 0.013 3.31 0.11
Local ASR weakly violated 0.902 0.013 3.32 0.11 0.902 0.013 3.32 0.11
strongly violated 0.902 0.013 3.32 0.12 0.902 0.013 3.32 0.12
holds 0.903 0.012 3.33 0.11 0.903 0.012 3.33 0.11
CQR weakly violated 0.903 0.012 3.33 0.11 0.903 0.012 3.33 0.11
strongly violated 0.903 0.012 3.33 0.10 0.903 0.012 3.33 0.10
Heteroscedasticity where $\sigma(x) = -\log(x)$
Federated I Pooled sample
holds 0.901 0.013 3.94 0.29 0.900 0.007 3.91 0.12
ASR weakly violated 0.901 0.013 3.94 0.30 0.991 0.002 11.28 0.14
strongly violated 0.901 0.012 3.95 0.29 1.000 0.000 31.96 0.35
holds 0.902 0.023 3.48 0.31 0.901 0.015 3.45 0.15
Local ASR weakly violated 0.899 0.021 3.48 0.27 0.921 0.015 3.77 0.18
strongly violated 0.901 0.023 3.50 0.30 0.925 0.013 3.80 0.17
holds 0.902 0.013 3.19 0.11 0.900 0.011 3.18 0.11
CQR weakly violated 0.902 0.012 3.20 0.11 0.900 0.012 3.19 0.11
strongly violated 0.902 0.013 3.20 0.11 0.899 0.012 3.18 0.11
Federated II (ours) Target site only
holds 0.901 0.011 3.94 0.25 0.900 0.013 3.91 0.28
ASR weakly violated 0.901 0.012 3.94 0.27 0.901 0.013 3.96 0.30
strongly violated 0.901 0.011 3.95 0.26 0.900 0.013 3.94 0.30
holds 0.901 0.022 3.48 0.30 0.900 0.027 3.47 0.34
Local ASR weakly violated 0.899 0.021 3.48 0.26 0.900 0.025 3.50 0.32
strongly violated 0.901 0.023 3.50 0.30 0.901 0.027 3.51 0.35
holds 0.902 0.013 3.19 0.11 0.900 0.022 3.19 0.13
CQR weakly violated 0.902 0.012 3.20 0.11 0.901 0.020 3.20 0.13
strongly violated 0.902 0.012 3.20 0.11 0.901 0.020 3.20 0.13
Federated III Equal weights
holds 0.901 0.012 3.95 0.27 0.901 0.012 3.95 0.27
ASR weakly violated 0.901 0.013 3.95 0.29 0.901 0.013 3.95 0.29
strongly violated 0.901 0.012 3.96 0.27 0.901 0.012 3.96 0.27
holds 0.903 0.022 3.50 0.30 0.903 0.022 3.50 0.30
Local ASR weakly violated 0.900 0.021 3.49 0.26 0.900 0.021 3.49 0.26
strongly violated 0.902 0.023 3.51 0.30 0.902 0.023 3.51 0.30
holds 0.903 0.013 3.20 0.11 0.903 0.013 3.20 0.11
CQR weakly violated 0.904 0.012 3.20 0.11 0.904 0.012 3.20 0.11
strongly violated 0.904 0.012 3.20 0.11 0.904 0.012 3.20 0.11
  • CFS: conformal score; CCOD: common conditional outcome distribution

  • CP: coverage probability; wd: width; s.d.: standard deviation (over 500 replications)

Table 3: $n_k = 3000$, homogeneous covariate distribution
Figure 5: Boxplots of coverage probability, under homogeneous covariate distributions
Figure 6: Boxplots of prediction interval width, under homogeneous covariate distributions
CFS CCOD CP s.d.(CP) wd s.d.(wd) CP s.d.(CP) wd s.d.(wd)
Homoscedasticity where $\sigma(x) = 1$
Federated I Pooled sample
holds 0.892 0.043 3.28 0.44 0.899 0.018 3.29 0.16
ASR weakly violated 0.892 0.042 3.27 0.38 1.000 0.000 9.39 0.56
strongly violated 0.890 0.042 3.26 0.40 1.000 0.000 25.09 1.74
holds 0.898 0.038 3.33 0.37 0.899 0.024 3.31 0.23
Local ASR weakly violated 0.897 0.042 3.33 0.40 0.841 0.041 2.85 0.28
strongly violated 0.897 0.044 3.34 0.42 0.756 0.058 2.36 0.31
holds 0.925 0.043 6.42 17.84 0.901 0.025 3.33 0.24
CQR weakly violated 0.925 0.045 4.68 7.49 0.905 0.038 3.41 0.39
strongly violated 0.927 0.042 5.56 26.79 0.905 0.041 3.43 0.47
Federated II (ours) Target site only
holds 0.895 0.037 3.30 0.38 0.901 0.045 3.38 0.43
ASR weakly violated 0.896 0.036 3.29 0.34 0.902 0.046 3.40 0.47
strongly violated 0.894 0.035 3.29 0.35 0.901 0.051 3.41 0.50
holds 0.902 0.035 3.37 0.36 0.907 0.055 3.50 0.61
Local ASR weakly violated 0.902 0.037 3.37 0.39 0.909 0.059 3.56 0.72
strongly violated 0.902 0.040 3.38 0.42 0.908 0.061 3.58 0.73
holds 0.929 0.037 5.88 14.24 0.921 0.053 3.70 0.65
CQR weakly violated 0.929 0.039 4.49 5.97 0.920 0.062 3.72 0.72
strongly violated 0.931 0.036 5.19 21.43 0.920 0.063 3.72 0.70
Federated III Equal weights
holds 0.899 0.034 3.33 0.35 0.907 0.037 3.43 0.41
ASR weakly violated 0.900 0.033 3.33 0.33 0.904 0.035 3.39 0.38
strongly violated 0.898 0.034 3.32 0.34 0.904 0.035 3.39 0.38
holds 0.913 0.036 3.51 0.49 0.924 0.039 3.90 2.17
Local ASR weakly violated 0.915 0.035 3.53 0.50 0.923 0.038 4.05 4.30
strongly violated 0.913 0.036 3.50 0.43 0.923 0.038 3.74 0.93
holds 0.949 0.033 4.98 5.52 0.962 0.035 8.94 33.91
CQR weakly violated 0.947 0.032 4.38 2.66 0.960 0.035 6.23 6.98
strongly violated 0.948 0.032 4.61 7.20 0.961 0.037 6.22 8.50
Heteroscedasticity where $\sigma(x) = -\log(x)$
Federated I Pooled sample
holds 0.904 0.034 4.21 0.90 0.906 0.017 4.12 0.42
ASR weakly violated 0.903 0.036 4.21 1.06 0.985 0.004 9.69 0.64
strongly violated 0.905 0.034 4.25 0.96 1.000 0.000 25.21 1.76
holds 0.915 0.060 3.87 0.87 0.926 0.033 3.91 0.51
Local ASR weakly violated 0.920 0.049 4.11 3.12 0.861 0.043 3.10 0.41
strongly violated 0.920 0.048 3.97 0.90 0.779 0.056 2.49 0.37
holds 0.923 0.111 8.19 27.88 0.851 0.156 3.31 0.58
CQR weakly violated 0.919 0.120 8.36 57.29 0.858 0.156 3.43 0.70
strongly violated 0.922 0.122 7.36 40.15 0.841 0.175 3.46 0.91
Federated II (ours) Target site only
holds 0.906 0.030 4.22 0.82 0.902 0.043 4.27 1.24
ASR weakly violated 0.905 0.031 4.23 0.93 0.902 0.044 4.31 1.30
strongly violated 0.908 0.029 4.28 0.83 0.905 0.044 4.36 1.24
holds 0.919 0.052 3.93 0.86 0.906 0.083 4.14 1.63
Local ASR weakly violated 0.924 0.045 4.14 2.55 0.909 0.081 4.29 1.93
strongly violated 0.925 0.045 4.03 0.89 0.910 0.094 4.29 1.62
holds 0.931 0.101 7.36 22.26 0.872 0.192 4.05 1.46
CQR weakly violated 0.933 0.099 7.50 45.81 0.863 0.209 4.12 1.57
strongly violated 0.934 0.102 6.71 32.09 0.863 0.210 4.12 1.58
Federated III Equal weights
holds 0.912 0.029 4.39 0.85 0.916 0.028 4.55 0.91
ASR weakly violated 0.911 0.029 4.38 0.91 0.915 0.028 4.51 0.91
strongly violated 0.914 0.029 4.46 0.90 0.917 0.029 4.59 0.97
holds 0.938 0.041 4.28 0.95 0.946 0.038 4.68 1.91
Local ASR weakly violated 0.940 0.042 4.76 6.03 0.948 0.038 5.05 6.76
strongly violated 0.938 0.043 4.34 1.04 0.946 0.039 4.63 1.48
holds 0.970 0.054 6.53 11.45 0.977 0.047 10.45 22.37
CQR weakly violated 0.966 0.064 6.51 20.21 0.974 0.052 10.32 28.37
strongly violated 0.969 0.059 6.02 14.23 0.977 0.050 9.86 19.04
  • CFS: conformal score; CCOD: common conditional outcome distribution

  • CP: coverage probability; wd: width; s.d.: standard deviation (over 500 replications)

Table 4: $n_k = 300$, weakly heterogeneous covariate distribution
CFS CCOD CP s.d.(CP) wd s.d.(wd) CP s.d.(CP) wd s.d.(wd)
Homoscedasticity where $\sigma(x) = 1$
Federated I Pooled sample
holds 0.897 0.022 3.28 0.21 0.899 0.011 3.29 0.09
ASR weakly violated 0.896 0.025 3.27 0.22 1.000 0.000 9.23 0.26
strongly violated 0.894 0.027 3.26 0.24 1.000 0.000 24.51 0.96
holds 0.899 0.024 3.30 0.22 0.899 0.015 3.29 0.12
Local ASR weakly violated 0.897 0.027 3.28 0.23 0.832 0.022 2.76 0.14
strongly violated 0.897 0.024 3.29 0.22 0.736 0.030 2.25 0.14
holds 0.909 0.035 3.62 3.28 0.899 0.015 3.30 0.13
CQR weakly violated 0.911 0.037 3.56 0.91 0.903 0.020 3.34 0.20
strongly violated 0.907 0.036 3.53 1.28 0.900 0.023 3.32 0.22
Federated II (ours) Target site only
holds 0.899 0.019 3.29 0.18 0.901 0.025 3.33 0.23
ASR weakly violated 0.898 0.021 3.28 0.19 0.900 0.025 3.31 0.23
strongly violated 0.896 0.022 3.27 0.20 0.899 0.025 3.31 0.24
holds 0.900 0.022 3.32 0.21 0.903 0.032 3.37 0.31
Local ASR weakly violated 0.899 0.024 3.29 0.22 0.902 0.031 3.35 0.31
strongly violated 0.899 0.022 3.30 0.20 0.901 0.032 3.35 0.32
holds 0.910 0.030 3.58 2.62 0.905 0.036 3.40 0.35
CQR weakly violated 0.912 0.032 3.53 0.72 0.904 0.034 3.39 0.34
strongly violated 0.908 0.030 3.50 1.01 0.904 0.035 3.39 0.35
Federated III Equal weights
holds 0.902 0.021 3.33 0.20 0.905 0.024 3.37 0.25
ASR weakly violated 0.901 0.021 3.31 0.20 0.903 0.023 3.34 0.23
strongly violated 0.899 0.022 3.30 0.20 0.902 0.024 3.34 0.23
holds 0.906 0.023 3.39 0.30 0.913 0.027 3.53 0.88
Local ASR weakly violated 0.906 0.022 3.37 0.22 0.913 0.027 3.48 0.54
strongly violated 0.905 0.021 3.36 0.21 0.912 0.024 3.45 0.38
holds 0.923 0.027 3.65 1.09 0.936 0.032 3.94 1.28
CQR weakly violated 0.925 0.027 3.64 0.52 0.936 0.032 3.92 0.94
strongly violated 0.922 0.026 3.61 0.47 0.935 0.033 3.94 1.02
Heteroscedasticity where $\sigma(x) = -\log(x)$
Federated I Pooled sample
holds 0.907 0.022 4.16 0.58 0.908 0.011 4.12 0.24
ASR weakly violated 0.905 0.022 4.12 0.54 0.985 0.003 9.53 0.32
strongly violated 0.901 0.025 4.03 0.57 1.000 0.000 24.61 0.95
holds 0.921 0.040 3.86 0.54 0.929 0.020 3.91 0.31
Local ASR weakly violated 0.922 0.033 3.84 0.50 0.859 0.025 3.04 0.22
strongly violated 0.918 0.038 3.78 0.47 0.767 0.031 2.39 0.17
holds 0.894 0.136 3.84 1.73 0.872 0.107 3.23 0.35
CQR weakly violated 0.885 0.142 3.81 2.35 0.879 0.104 3.26 0.36
strongly violated 0.882 0.143 3.74 1.45 0.867 0.121 3.29 0.46
Federated II (ours) Target site only
holds 0.908 0.018 4.17 0.50 0.907 0.023 4.17 0.62
ASR weakly violated 0.907 0.018 4.13 0.47 0.908 0.022 4.18 0.57
strongly violated 0.903 0.021 4.06 0.50 0.906 0.022 4.16 0.60
holds 0.923 0.034 3.88 0.50 0.922 0.047 3.95 0.74
Local ASR weakly violated 0.925 0.029 3.87 0.45 0.926 0.041 3.97 0.68
strongly violated 0.921 0.032 3.82 0.43 0.925 0.041 3.97 0.71
holds 0.904 0.121 3.77 1.37 0.849 0.179 3.47 0.81
CQR weakly violated 0.899 0.119 3.75 1.86 0.857 0.172 3.51 0.77
strongly violated 0.896 0.123 3.69 1.14 0.857 0.172 3.51 0.80
Federated III Equal weights
holds 0.911 0.019 4.26 0.54 0.914 0.020 4.36 0.60
ASR weakly violated 0.910 0.019 4.23 0.52 0.913 0.021 4.33 0.60
strongly violated 0.908 0.020 4.19 0.54 0.911 0.022 4.31 0.63
holds 0.932 0.030 4.04 0.55 0.940 0.030 4.24 0.87
Local ASR weakly violated 0.934 0.028 4.03 0.54 0.940 0.029 4.28 1.76
strongly violated 0.932 0.028 4.01 0.50 0.939 0.030 4.30 1.88
holds 0.941 0.086 4.03 1.14 0.957 0.074 4.67 1.77
CQR weakly violated 0.944 0.071 3.97 0.97 0.956 0.069 4.67 2.34
strongly violated 0.941 0.088 3.96 0.87 0.958 0.067 4.80 3.17
  • CFS: conformal score; CCOD: common conditional outcome distribution

  • CP: coverage probability; wd: width; s.d.: standard deviation (over 500 replications)

Table 5: $n_k = 1000$, weakly heterogeneous covariate distribution
CFS CCOD CP s.d.(CP) wd s.d.(wd) CP s.d.(CP) wd s.d.(wd)
Homoscedasticity where $\sigma(x) = 1$
Federated I Pooled sample
holds 0.898 0.016 3.27 0.14 0.900 0.008 3.29 0.05
ASR weakly violated 0.898 0.015 3.28 0.13 1.000 0.000 9.20 0.16
strongly violated 0.896 0.017 3.27 0.15 1.000 0.000 24.47 0.56
holds 0.898 0.016 3.28 0.14 0.900 0.011 3.29 0.09
Local ASR weakly violated 0.898 0.016 3.28 0.14 0.829 0.016 2.74 0.08
strongly violated 0.897 0.017 3.28 0.14 0.733 0.020 2.23 0.08
holds 0.899 0.019 3.30 0.17 0.899 0.012 3.29 0.10
CQR weakly violated 0.899 0.019 3.30 0.17 0.900 0.015 3.30 0.13
strongly violated 0.899 0.020 3.30 0.19 0.900 0.015 3.31 0.13
Federated II (ours) Target site only
holds 0.898 0.013 3.28 0.12 0.900 0.014 3.30 0.12
ASR weakly violated 0.899 0.013 3.29 0.11 0.901 0.015 3.30 0.13
strongly violated 0.897 0.014 3.27 0.12 0.900 0.014 3.30 0.13
holds 0.899 0.015 3.29 0.13 0.901 0.018 3.31 0.17
Local ASR weakly violated 0.899 0.015 3.28 0.13 0.900 0.020 3.31 0.18
strongly violated 0.898 0.015 3.29 0.13 0.901 0.018 3.32 0.17
holds 0.900 0.017 3.30 0.15 0.901 0.020 3.31 0.19
CQR weakly violated 0.900 0.016 3.31 0.15 0.902 0.021 3.34 0.20
strongly violated 0.900 0.017 3.31 0.16 0.902 0.021 3.34 0.20
Federated III Equal weights
holds 0.901 0.014 3.31 0.12 0.902 0.015 3.32 0.14
ASR weakly violated 0.902 0.014 3.31 0.13 0.903 0.015 3.33 0.14
strongly violated 0.900 0.014 3.30 0.13 0.901 0.016 3.32 0.15
holds 0.903 0.016 3.33 0.14 0.905 0.017 3.35 0.16
Local ASR weakly violated 0.903 0.015 3.33 0.14 0.905 0.016 3.35 0.15
strongly violated 0.903 0.015 3.33 0.14 0.905 0.016 3.35 0.16
holds 0.909 0.018 3.40 0.18 0.914 0.022 3.46 0.24
CQR weakly violated 0.909 0.016 3.40 0.16 0.914 0.019 3.46 0.22
strongly violated 0.909 0.017 3.40 0.17 0.913 0.020 3.46 0.23
Heteroscedasticity where $\sigma(x) = -\log(x)$
Federated I Pooled sample
holds 0.906 0.014 4.09 0.35 0.907 0.008 4.10 0.14
ASR weakly violated 0.906 0.015 4.08 0.37 0.985 0.003 9.46 0.17
strongly violated 0.903 0.016 4.04 0.37 1.000 0.000 24.53 0.52
holds 0.927 0.022 3.84 0.33 0.932 0.014 3.90 0.20
Local ASR weakly violated 0.925 0.023 3.86 0.33 0.856 0.018 3.02 0.15
strongly violated 0.924 0.023 3.82 0.33 0.763 0.021 2.36 0.12
holds 0.861 0.124 3.23 0.43 0.888 0.058 3.19 0.21
CQR weakly violated 0.860 0.132 3.26 0.47 0.890 0.057 3.21 0.23
strongly violated 0.861 0.125 3.23 0.42 0.880 0.081 3.21 0.28
Federated II (ours) Target site only
holds 0.907 0.012 4.09 0.29 0.907 0.013 4.10 0.33
ASR weakly violated 0.906 0.013 4.09 0.31 0.908 0.013 4.13 0.33
strongly violated 0.904 0.013 4.05 0.30 0.906 0.013 4.11 0.34
holds 0.928 0.020 3.85 0.30 0.928 0.026 3.90 0.42
Local ASR weakly violated 0.926 0.021 3.87 0.30 0.927 0.026 3.92 0.42
strongly violated 0.926 0.020 3.84 0.29 0.929 0.026 3.93 0.43
holds 0.872 0.108 3.23 0.36 0.853 0.138 3.24 0.46
CQR weakly violated 0.873 0.110 3.26 0.40 0.861 0.134 3.30 0.48
strongly violated 0.874 0.105 3.25 0.36 0.861 0.140 3.30 0.49
Federated III Equal weights
holds 0.910 0.013 4.18 0.34 0.911 0.014 4.22 0.38
ASR weakly violated 0.909 0.014 4.18 0.35 0.910 0.014 4.21 0.39
strongly violated 0.907 0.014 4.14 0.35 0.908 0.015 4.17 0.38
holds 0.935 0.020 3.98 0.33 0.937 0.021 4.04 0.38
Local ASR weakly violated 0.932 0.020 3.98 0.32 0.934 0.022 4.03 0.37
strongly violated 0.932 0.021 3.97 0.33 0.934 0.022 4.02 0.39
holds 0.919 0.080 3.49 0.42 0.927 0.082 3.63 0.56
CQR weakly violated 0.921 0.080 3.51 0.44 0.929 0.080 3.64 0.54
strongly violated 0.918 0.083 3.48 0.39 0.926 0.086 3.60 0.48
  • CFS: conformal score; CCOD: common conditional outcome distribution

  • CP: coverage probability; wd: width; s.d.: standard deviation (over 500 replications)

Table 6: $n_k = 3000$, weakly heterogeneous covariate distribution
Figure 7: Boxplots of coverage probability, under weakly heterogeneous covariate distributions
Figure 8: Boxplots of prediction interval width, under weakly heterogeneous covariate distributions
CFS CCOD CP s.d.(CP) wd s.d.(wd) CP s.d.(CP) wd s.d.(wd)
Homoscedasticity where $\sigma(x) = 1$
Federated I Pooled sample
holds 0.891 0.043 3.26 0.37 0.901 0.018 3.31 0.16
ASR weakly violated 0.893 0.039 3.28 0.40 1.000 0.000 8.85 0.50
strongly violated 0.893 0.041 3.29 0.39 1.000 0.000 24.23 1.49
holds 0.899 0.043 3.36 0.44 0.901 0.023 3.33 0.22
Local ASR weakly violated 0.897 0.044 3.34 0.48 0.843 0.037 2.86 0.27
strongly violated 0.895 0.049 3.34 0.47 0.754 0.056 2.35 0.30
holds 0.923 0.064 4.55 6.23 0.903 0.025 3.35 0.24
CQR weakly violated 0.922 0.055 4.49 5.18 0.904 0.043 3.41 0.42
strongly violated 0.924 0.067 4.72 7.05 0.904 0.042 3.41 0.44
Federated II (ours) Target site only
holds 0.893 0.038 3.28 0.34 0.898 0.048 3.35 0.47
ASR weakly violated 0.895 0.034 3.29 0.34 0.898 0.043 3.34 0.41
strongly violated 0.896 0.036 3.30 0.35 0.896 0.049 3.35 0.47
holds 0.903 0.040 3.39 0.43 0.904 0.058 3.50 0.66
Local ASR weakly violated 0.900 0.041 3.37 0.46 0.904 0.056 3.48 0.64
strongly violated 0.900 0.043 3.37 0.44 0.905 0.058 3.50 0.63
holds 0.927 0.051 4.38 4.96 0.918 0.058 3.68 0.66
CQR weakly violated 0.926 0.045 4.33 4.12 0.920 0.062 3.71 0.71
strongly violated 0.928 0.055 4.52 5.62 0.920 0.060 3.71 0.69
Federated III Equal weights
holds 0.897 0.036 3.31 0.34 0.907 0.037 3.44 0.46
ASR weakly violated 0.899 0.032 3.32 0.32 0.907 0.036 3.44 0.44
strongly violated 0.900 0.034 3.34 0.35 0.908 0.039 3.46 0.47
holds 0.914 0.038 3.59 1.08 0.929 0.042 4.27 3.11
Local ASR weakly violated 0.912 0.037 3.55 0.89 0.930 0.042 4.70 7.53
strongly violated 0.913 0.039 3.53 0.57 0.930 0.042 4.30 4.39
holds 0.949 0.033 4.59 3.44 0.965 0.034 9.82 52.91
CQR weakly violated 0.946 0.035 4.55 3.13 0.960 0.056 6.84 8.83
strongly violated 0.950 0.031 4.65 4.31 0.966 0.032 6.82 7.94
Heteroscedasticity where $\sigma(x) = -\log(x)$
Federated I Pooled sample
holds 0.903 0.038 4.23 1.05 0.906 0.017 4.10 0.42
ASR weakly violated 0.902 0.035 4.16 0.90 0.983 0.004 9.22 0.52
strongly violated 0.904 0.034 4.21 0.95 1.000 0.000 24.37 1.53
holds 0.918 0.053 3.92 0.89 0.925 0.032 3.89 0.49
Local ASR weakly violated 0.919 0.053 3.97 0.92 0.870 0.044 3.19 0.45
strongly violated 0.919 0.052 3.96 0.91 0.789 0.058 2.55 0.40
holds 0.915 0.134 9.13 47.58 0.850 0.155 3.29 0.56
CQR weakly violated 0.916 0.127 4.57 8.47 0.854 0.169 3.43 0.72
strongly violated 0.915 0.126 5.86 16.05 0.839 0.183 3.48 0.91
Federated II (ours) Target site only
holds 0.905 0.034 4.23 0.97 0.899 0.046 4.24 1.29
ASR weakly violated 0.905 0.031 4.20 0.84 0.905 0.044 4.39 1.33
strongly violated 0.906 0.030 4.22 0.87 0.902 0.042 4.28 1.27
holds 0.920 0.049 3.96 0.91 0.901 0.091 4.12 1.67
Local ASR weakly violated 0.923 0.050 4.04 0.94 0.913 0.085 4.31 1.70
strongly violated 0.923 0.049 4.03 0.93 0.913 0.081 4.32 1.71
holds 0.925 0.120 8.10 38.03 0.855 0.213 3.97 1.46
CQR weakly violated 0.926 0.116 4.49 6.77 0.877 0.197 4.18 1.51
strongly violated 0.929 0.107 5.53 12.80 0.877 0.192 4.18 1.49
Federated III Equal weights
holds 0.910 0.031 4.38 1.00 0.918 0.031 4.69 1.25
ASR weakly violated 0.909 0.030 4.34 0.86 0.915 0.030 4.52 0.96
strongly violated 0.910 0.030 4.35 0.86 0.917 0.029 4.61 0.99
holds 0.936 0.044 4.27 1.02 0.951 0.038 4.99 2.81
Local ASR weakly violated 0.936 0.046 4.41 1.81 0.948 0.041 5.64 9.58
strongly violated 0.938 0.047 4.42 1.66 0.950 0.040 5.40 7.17
holds 0.958 0.097 7.31 18.71 0.975 0.066 16.39 86.25
CQR weakly violated 0.958 0.092 5.51 7.78 0.969 0.084 13.30 53.84
strongly violated 0.969 0.057 6.29 9.57 0.979 0.043 10.97 24.62
  • CFS: conformal score; CCOD: common conditional outcome distribution

  • CP: coverage probability; wd: width; s.d.: standard deviation (over 500 replications)

Table 7: $n_k = 300$, strongly heterogeneous covariate distribution
CFS CCOD CP s.d.(CP) wd s.d.(wd) CP s.d.(CP) wd s.d.(wd)
Homoscedasticity where $\sigma(x) = 1$
Federated I Pooled sample
holds 0.895 0.027 3.27 0.23 0.900 0.011 3.30 0.09
ASR weakly violated 0.896 0.026 3.27 0.23 1.000 0.000 8.77 0.26
strongly violated 0.895 0.026 3.27 0.22 1.000 0.000 23.95 0.85
holds 0.897 0.030 3.30 0.26 0.901 0.015 3.31 0.14
Local ASR weakly violated 0.899 0.027 3.30 0.25 0.839 0.021 2.80 0.14
strongly violated 0.896 0.028 3.29 0.25 0.739 0.036 2.26 0.17
holds 0.907 0.044 3.60 1.54 0.901 0.015 3.32 0.14
CQR weakly violated 0.909 0.045 3.60 1.44 0.902 0.022 3.33 0.21
strongly violated 0.906 0.042 3.57 1.66 0.900 0.025 3.33 0.25
Federated II (ours) Target site only
holds 0.897 0.022 3.28 0.20 0.902 0.024 3.34 0.24
ASR weakly violated 0.898 0.022 3.28 0.20 0.901 0.025 3.32 0.24
strongly violated 0.896 0.022 3.28 0.20 0.899 0.025 3.31 0.24
holds 0.900 0.026 3.32 0.23 0.906 0.029 3.40 0.31
Local ASR weakly violated 0.900 0.025 3.31 0.23 0.903 0.031 3.35 0.32
strongly violated 0.898 0.025 3.30 0.23 0.902 0.031 3.36 0.31
holds 0.910 0.036 3.57 1.22 0.909 0.031 3.43 0.33
CQR weakly violated 0.911 0.037 3.56 1.14 0.905 0.033 3.40 0.33
strongly violated 0.908 0.035 3.53 1.32 0.905 0.035 3.40 0.35
Federated III Equal weights
holds 0.898 0.022 3.29 0.20 0.904 0.028 3.36 0.28
ASR weakly violated 0.900 0.021 3.30 0.20 0.907 0.027 3.39 0.28
strongly violated 0.899 0.021 3.30 0.20 0.906 0.025 3.38 0.26
holds 0.904 0.023 3.36 0.23 0.916 0.029 3.52 0.41
Local ASR weakly violated 0.907 0.023 3.38 0.26 0.919 0.029 3.59 0.62
strongly violated 0.904 0.023 3.36 0.25 0.917 0.031 3.71 3.48
holds 0.922 0.028 3.63 0.61 0.942 0.034 4.03 1.02
CQR weakly violated 0.925 0.026 3.65 0.68 0.944 0.031 4.03 0.90
strongly violated 0.921 0.028 3.61 0.75 0.943 0.033 4.11 1.30
Heteroscedasticity where $\sigma(x) = -\log(x)$
Federated I Pooled sample
holds 0.904 0.025 4.10 0.60 0.907 0.010 4.10 0.23
ASR weakly violated 0.903 0.022 4.07 0.56 0.983 0.003 9.11 0.28
strongly violated 0.903 0.025 4.07 0.58 1.000 0.000 24.14 0.85
holds 0.918 0.040 3.80 0.54 0.929 0.020 3.90 0.31
Local ASR weakly violated 0.921 0.041 3.85 0.61 0.868 0.025 3.12 0.23
strongly violated 0.920 0.037 3.84 0.56 0.781 0.036 2.48 0.22
holds 0.862 0.168 3.83 2.02 0.866 0.111 3.22 0.35
CQR weakly violated 0.868 0.165 3.86 2.11 0.856 0.121 3.20 0.39
strongly violated 0.874 0.166 4.17 4.44 0.854 0.143 3.28 0.50
Federated II (ours) Target site only
holds 0.906 0.021 4.12 0.52 0.907 0.023 4.18 0.60
ASR weakly violated 0.904 0.019 4.08 0.49 0.906 0.022 4.12 0.58
strongly violated 0.904 0.021 4.09 0.50 0.905 0.023 4.14 0.60
holds 0.921 0.033 3.84 0.49 0.924 0.043 3.98 0.73
Local ASR weakly violated 0.924 0.035 3.87 0.56 0.924 0.043 3.96 0.72
strongly violated 0.923 0.032 3.87 0.52 0.923 0.047 3.97 0.76
holds 0.879 0.143 3.76 1.59 0.854 0.177 3.48 0.81
CQR weakly violated 0.884 0.141 3.77 1.67 0.849 0.183 3.46 0.81
strongly violated 0.889 0.143 4.03 3.53 0.849 0.177 3.46 0.82
Federated III Equal weights
holds 0.908 0.021 4.18 0.55 0.915 0.025 4.43 0.77
ASR weakly violated 0.908 0.019 4.17 0.51 0.915 0.023 4.44 0.73
strongly violated 0.907 0.020 4.16 0.53 0.914 0.024 4.43 0.73
holds 0.929 0.031 3.98 0.52 0.942 0.035 4.38 0.96
Local ASR weakly violated 0.932 0.031 4.03 0.67 0.947 0.032 4.58 2.28
strongly violated 0.930 0.032 3.99 0.56 0.947 0.033 4.49 1.24
holds 0.942 0.079 4.02 1.08 0.964 0.064 5.00 3.18
CQR weakly violated 0.937 0.088 4.01 1.54 0.960 0.074 4.81 2.30
strongly violated 0.940 0.080 4.07 1.72 0.964 0.065 5.11 2.77
  • CFS: conformal score; CCOD: common conditional outcome distribution

  • CP: coverage probability; wd: width; s.d.: standard deviation (over 500 replications)

Table 8: $n_k = 1000$, strongly heterogeneous covariate distribution
CFS CCOD CP s.d.(CP) wd s.d.(wd) CP s.d.(CP) wd s.d.(wd)
Homoscedasticity where $\sigma(x) = 1$
Federated I Pooled sample
holds 0.898 0.017 3.28 0.15 0.899 0.008 3.28 0.06
ASR weakly violated 0.897 0.016 3.27 0.14 1.000 0.000 8.74 0.15
strongly violated 0.897 0.017 3.27 0.15 1.000 0.000 23.78 0.49
holds 0.898 0.018 3.28 0.15 0.899 0.011 3.29 0.09
Local ASR weakly violated 0.898 0.019 3.28 0.17 0.835 0.015 2.78 0.08
strongly violated 0.898 0.018 3.28 0.15 0.732 0.022 2.22 0.09
holds 0.900 0.024 3.31 0.22 0.899 0.011 3.29 0.10
CQR weakly violated 0.899 0.026 3.31 0.24 0.899 0.015 3.29 0.13
strongly violated 0.899 0.024 3.31 0.21 0.899 0.016 3.30 0.15
Federated II (ours) Target site only
holds 0.899 0.015 3.28 0.13 0.900 0.014 3.30 0.13
ASR weakly violated 0.898 0.014 3.28 0.12 0.900 0.014 3.30 0.13
strongly violated 0.898 0.014 3.28 0.12 0.900 0.014 3.30 0.12
holds 0.898 0.016 3.28 0.14 0.900 0.018 3.30 0.17
Local ASR weakly violated 0.899 0.017 3.29 0.15 0.901 0.019 3.31 0.18
strongly violated 0.899 0.016 3.29 0.13 0.900 0.018 3.31 0.17
holds 0.901 0.020 3.32 0.19 0.901 0.020 3.32 0.18
CQR weakly violated 0.900 0.022 3.31 0.20 0.903 0.021 3.34 0.19
strongly violated 0.900 0.020 3.31 0.18 0.903 0.020 3.34 0.18
Federated III Equal weights
holds 0.901 0.015 3.30 0.13 0.903 0.017 3.33 0.16
ASR weakly violated 0.900 0.015 3.30 0.13 0.903 0.018 3.33 0.17
strongly violated 0.901 0.013 3.30 0.12 0.902 0.017 3.33 0.16
holds 0.902 0.016 3.32 0.15 0.907 0.019 3.37 0.19
Local ASR weakly violated 0.903 0.017 3.33 0.15 0.907 0.019 3.37 0.19
strongly violated 0.903 0.014 3.33 0.13 0.906 0.018 3.37 0.17
holds 0.910 0.018 3.41 0.21 0.918 0.022 3.51 0.28
CQR weakly violated 0.908 0.020 3.39 0.20 0.915 0.025 3.48 0.28
strongly violated 0.909 0.017 3.41 0.17 0.915 0.022 3.48 0.25
Heteroscedasticity where $\sigma(x) = -\log(x)$
Federated I Pooled sample
holds 0.906 0.017 4.09 0.40 0.908 0.007 4.11 0.13
ASR weakly violated 0.905 0.015 4.06 0.37 0.983 0.003 9.04 0.18
strongly violated 0.904 0.016 4.07 0.38 1.000 0.000 23.98 0.51
holds 0.925 0.024 3.83 0.37 0.932 0.014 3.91 0.20
Local ASR weakly violated 0.924 0.024 3.85 0.35 0.866 0.019 3.11 0.15
strongly violated 0.924 0.028 3.85 0.37 0.773 0.023 2.42 0.14
holds 0.855 0.141 3.28 0.53 0.886 0.060 3.19 0.22
CQR weakly violated 0.861 0.142 3.30 0.53 0.877 0.078 3.19 0.25
strongly violated 0.863 0.134 3.29 0.49 0.868 0.098 3.19 0.32
Federated II (ours) Target site only
holds 0.907 0.014 4.10 0.33 0.908 0.014 4.14 0.34
ASR weakly violated 0.905 0.013 4.07 0.31 0.907 0.014 4.11 0.33
strongly violated 0.905 0.014 4.08 0.32 0.907 0.013 4.13 0.34
holds 0.927 0.021 3.85 0.32 0.930 0.027 3.93 0.44
Local ASR weakly violated 0.925 0.021 3.86 0.32 0.927 0.026 3.92 0.41
strongly violated 0.926 0.023 3.87 0.33 0.930 0.025 3.95 0.44
holds 0.866 0.123 3.27 0.44 0.850 0.144 3.26 0.50
CQR weakly violated 0.870 0.124 3.29 0.45 0.863 0.127 3.30 0.46
strongly violated 0.876 0.113 3.29 0.42 0.863 0.132 3.30 0.50
Federated III Equal weights
holds 0.909 0.014 4.17 0.34 0.912 0.017 4.27 0.45
ASR weakly violated 0.908 0.014 4.15 0.35 0.911 0.017 4.24 0.47
strongly violated 0.908 0.013 4.16 0.32 0.910 0.015 4.22 0.41
holds 0.933 0.021 3.96 0.36 0.939 0.024 4.11 0.49
Local ASR weakly violated 0.931 0.022 3.97 0.37 0.936 0.026 4.10 0.52
strongly violated 0.933 0.020 3.98 0.33 0.937 0.023 4.07 0.43
holds 0.914 0.090 3.49 0.43 0.933 0.083 3.75 0.60
CQR weakly violated 0.916 0.085 3.50 0.43 0.929 0.090 3.74 0.60
strongly violated 0.921 0.082 3.51 0.41 0.934 0.082 3.73 0.53
  • CFS: conformal score; CCOD: common conditional outcome distribution

  • CP: coverage probability; wd: width; s.d.: standard deviation (over 500 replications)

Table 9: $n_k = 3000$, strongly heterogeneous covariate distribution
Figure 9: Boxplots of coverage probability, under strongly heterogeneous covariate distributions
Figure 10: Boxplots of prediction interval width, under strongly heterogeneous covariate distributions
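
To make the column definitions in the table footnotes concrete, the following is a minimal sketch of how CP, s.d.(CP), wd, and s.d.(wd) can be computed over 500 replications. It is not the federated procedure evaluated in the tables: it uses ordinary single-site split conformal prediction with the ASR score, and the linear mean function $2x$, the least-squares fit, the test-set size, and the seed are all hypothetical choices made for illustration. Only $\sigma(x) = 1$, $n = 300$, the nominal level $0.9$, and the 500 replications are taken from the setting above.

```python
import numpy as np

def one_replication(rng, n=300, n_test=2000, alpha=0.1):
    """Return (empirical coverage, interval width) for one simulated data set."""
    def draw(m):
        # Hypothetical data-generating model: mean 2x, homoscedastic sigma(x) = 1.
        x = rng.uniform(0.0, 1.0, size=m)
        y = 2.0 * x + rng.standard_normal(m)
        return x, y

    x_tr, y_tr = draw(n)        # training split: fit the mean model
    x_cal, y_cal = draw(n)      # calibration split: conformal scores
    x_te, y_te = draw(n_test)   # test points: evaluate coverage

    # Least-squares line; np.polyfit returns the highest-degree coefficient first.
    slope, intercept = np.polyfit(x_tr, y_tr, deg=1)

    # ASR score: absolute residual on the calibration split.
    scores = np.abs(y_cal - (intercept + slope * x_cal))

    # Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.sort(scores)[k - 1]

    covered = np.abs(y_te - (intercept + slope * x_te)) <= q
    return covered.mean(), 2.0 * q

rng = np.random.default_rng(2024)
results = np.array([one_replication(rng) for _ in range(500)])

cp_mean, wd_mean = results.mean(axis=0)        # CP and wd columns
cp_sd, wd_sd = results.std(axis=0, ddof=1)     # s.d.(CP) and s.d.(wd) columns
print(f"CP={cp_mean:.3f} s.d.(CP)={cp_sd:.3f} wd={wd_mean:.2f} s.d.(wd)={wd_sd:.2f}")
```

Under this toy model the interval width should sit near $2 \times 1.645 \approx 3.29$, which is the scale of the homoscedastic widths reported in the tables when coverage is close to nominal.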

B.3 Local coverage over covariate values and scatterplots of federated weights

In this section, we present plots of the local (conditional) coverage over covariate values and scatterplots of the federated weights.

Figure 11 shows the local coverage of the constructed prediction intervals over a grid of $X \in [0, 1]$, with the sample size set to $n_k = 3000$ for all sites. We used the smoothing method and published R code of Lei et al. (2018) to produce these plots. Under homoscedasticity, the local coverage for a given conformal score is constant (a horizontal line) over the covariate values. Most of these horizontal lines are close to $0.9$, except those for the pooled sample. The three federated weighting schemes performed consistently well under homoscedasticity. Under heteroscedasticity, the local coverage deviates from the nominal level for all methods and conformal scores when $X$ is very small, which is expected because $-\log x \to \infty$ as $x \to 0$. When $X$ is sufficiently larger than $0$, the local coverage increases. Among the three conformal scores, ASR is the most sensitive to the change in variance, and its local coverage is not close to $0.9$ over almost the entire range of $X$. This confirms the findings of Lei et al. (2018). The other two conformal scores are more robust to the heteroscedastic variance: for $X \in [0.1, 0.6]$, their local coverage is close to $0.9$, except for the pooled sample method.
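
As a rough illustration of how a local coverage curve can be obtained and why a fixed-width interval behaves poorly when $\sigma(x) = -\log(x)$, the sketch below smooths test-point coverage indicators over a grid of covariate values. It is not the smoothing method or R code of Lei et al. (2018): a Gaussian Nadaraya-Watson smoother is swapped in, and the function name `local_coverage`, the bandwidth, the zero mean function, the sample size, and the fixed half-width of 1.645 are hypothetical choices.

```python
import numpy as np

def local_coverage(x, covered, grid, bandwidth=0.05):
    """Nadaraya-Watson estimate of P(Y in C(X) | X = x0) for each x0 in grid."""
    x = np.asarray(x, dtype=float)
    covered = np.asarray(covered, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Gaussian kernel weights: rows index grid points, columns index test points.
    u = (grid[:, None] - x[None, :]) / bandwidth
    w = np.exp(-0.5 * u**2)
    return (w * covered[None, :]).sum(axis=1) / w.sum(axis=1)

rng = np.random.default_rng(0)
n = 20000
x = rng.uniform(0.01, 1.0, size=n)        # keep x away from 0, where -log(x) diverges
sigma = -np.log(x)                        # heteroscedastic noise scale from the simulations
y = sigma * rng.standard_normal(n)        # hypothetical zero mean function

half_width = 1.645                        # fixed-width interval [-1.645, 1.645]
covered = np.abs(y) <= half_width         # coverage indicator per test point

grid = np.linspace(0.05, 0.95, 10)
cov = local_coverage(x, covered, grid)
for g, c in zip(grid, cov):
    print(f"x = {g:.2f}: local coverage = {c:.3f}")
```

In this toy setting the fixed-width interval undercovers where the noise scale is large (small $x$) and overcovers where it is small (large $x$), which is the qualitative pattern one would expect from a score that ignores the changing variance.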

In addition, Figure 12 plots the three sets of federated weights against the $\chi_k^2$ values for $n_k = 3000$ under heteroscedasticity. For illustration purposes, we plot only the weights corresponding to $\chi_k^2 \in [0, 0.5]$, so some weights corresponding to $\chi_k^2 > 0.5$ are not shown. As the upper 9 panels show, when CCOD holds, the weights in every case cluster more or less around $0.2$. When covariate distributions are heterogeneous, the distributions of the weights become more complex, but in each panel smaller values of $\chi_k^2$ generally correspond to larger weights. There are also clearly some larger weights ($> 0.2$, i.e., above the red dashed lines) in site 1; about half of the weights are below $0.2$ for both sites 2 and 3, and most weights for site 4 are close to $0$. Although site 3 has some surprisingly large weights, it also shows a more unstable pattern of weights, which might reflect its heterogeneity relative to the target site. Overall, the trend in the weights matches what we expect from our method: the larger the difference from the target site, the smaller (or the less stable) the weights.

Figure 11: Local coverage when CCOD is strongly violated and covariate distributions are strongly heterogeneous, with $n_k = 3000$
Figure 12: Weights vs. $\chi_k^2$ values, using $n_k = 3000$ data under heteroscedasticity. The green points are from Federated I, the orange points from Federated II (ours), and the blue points from Federated III; the red dashed lines mark the reference weight of $0.2$.
Figure 13: Comparison of coverage probabilities and average interval width when modifying the propensity score of observing the outcome between $(0.4, 0.6)$ (panel (a)) and $(0.1, 0.9)$ (panel (b)).