Search | arXiv e-print repository

Data-Driven Switchback Experiments: Theoretical Tradeoffs and Empirical Bayes Designs

Authors: Ruoxuan Xiong, Alex Chin, Sean J. Taylor

Abstract: We study the design and analysis of switchback experiments conducted on a single aggregate unit. The design problem is to partition the continuous time space into intervals and switch treatments between intervals, in order to minimize the estimation error of the treatment effect. We show that the estimation error depends on four factors: carryover effects, periodicity, serially correlated outcomes, and impacts from simultaneous experiments. We derive a rigorous bias-variance decomposition and show the tradeoffs of the estimation error from these factors. The decomposition provides three new insights in choosing a design: First, balancing the periodicity between treated and control intervals reduces the variance; second, switching less frequently reduces the bias from carryover effects while increasing the variance from correlated outcomes, and vice versa; third, randomizing interval start and end points reduces both bias and variance from simultaneous experiments. Combining these insights, we propose a new empirical Bayes design approach. This approach uses prior data and experiments for designing future experiments. We illustrate this approach using real data from a ride-sharing platform, yielding a design that reduces MSE by 33% compared to the status quo design used on the platform.

Submitted 10 June, 2024; originally announced June 2024.
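
The switchback abstract in this entry describes partitioning time into intervals, alternating treatment between them, and randomizing interval start and end points. A minimal sketch of that idea follows. The function names, the uniform jitter scheme, and the naive difference-in-means estimator are illustrative assumptions, not the authors' empirical Bayes design, and the estimator ignores carryover effects.

```python
import random

def randomized_switchback(horizon, base_length, jitter, seed=0):
    """Partition [0, horizon) into alternating treated/control intervals.

    Hypothetical scheme: each interval lasts base_length plus uniform
    noise in [-jitter, jitter], so the switch points are randomized.
    """
    if not 0 <= jitter < base_length:
        raise ValueError("jitter must satisfy 0 <= jitter < base_length")
    rng = random.Random(seed)
    intervals = []
    start = 0.0
    treated = rng.random() < 0.5  # random arm for the first interval
    while start < horizon:
        end = min(horizon, start + base_length + rng.uniform(-jitter, jitter))
        intervals.append((start, end, treated))
        start = end
        treated = not treated  # switch arms at every boundary
    return intervals

def difference_in_means(observations, intervals):
    """Naive estimate: mean treated outcome minus mean control outcome.

    observations is a list of (time, outcome) pairs.
    """
    treated, control = [], []
    for t, y in observations:
        for start, end, is_treated in intervals:
            if start <= t < end:
                (treated if is_treated else control).append(y)
                break
    return sum(treated) / len(treated) - sum(control) / len(control)
```

Shorter values of base_length switch more often, which is the regime the abstract associates with more carryover bias and less variance from correlated outcomes.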

arXiv:2405.00179 [pdf, other]

A Bayesian joint longitudinal-survival model with a latent stochastic process for intensive longitudinal data

Authors: Madeline R. Abbott, Walter H. Dempsey, Inbal Nahum-Shani, Lindsey N. Potter, David W. Wetter, Cho Y. Lam, Jeremy M. G. Taylor

Abstract: The availability of mobile health (mHealth) technology has enabled increased collection of intensive longitudinal data (ILD). ILD have potential to capture rapid fluctuations in outcomes that may be associated with changes in the risk of an event. However, existing methods for jointly modeling longitudinal and event-time outcomes are not well-equipped to handle ILD due to the high computational cost. We propose a joint longitudinal and time-to-event model suitable for analyzing ILD. In this model, we summarize a multivariate longitudinal outcome as a smaller number of time-varying latent factors. These latent factors, which are modeled using an Ornstein-Uhlenbeck stochastic process, capture the risk of a time-to-event outcome in a parametric hazard model. We take a Bayesian approach to fit our joint model and conduct simulations to assess its performance. We use it to analyze data from an mHealth study of smoking cessation. We summarize the longitudinal self-reported intensity of nine emotions as the psychological states of positive and negative affect. These time-varying latent states capture the risk of the first smoking lapse after attempted quit. Understanding factors associated with smoking lapse is of keen interest to smoking cessation researchers.

Submitted 30 April, 2024; originally announced May 2024.

Comments: Main text is 32 pages with 6 figures. Supplementary material is 21 pages

arXiv:2401.12911 [pdf, other]

Pretraining and the Lasso

Authors: Erin Craig, Mert Pilanci, Thomas Le Menestrel, Balasubramanian Narasimhan, Manuel Rivas, Roozbeh Dehghannasiri, Julia Salzman, Jonathan Taylor, Robert Tibshirani

Abstract: Pretraining is a popular and powerful paradigm in machine learning. As an example, suppose one has a modest-sized dataset of images of cats and dogs, and plans to fit a deep neural network to classify them from the pixel features. With pretraining, we start with a neural network trained on a large corpus of images, consisting of not just cats and dogs but hundreds of other image types. Then we fix all of the network weights except for the top layer (which makes the final classification) and train (or "fine tune") those weights on our dataset. This often results in dramatically better performance than the network trained solely on our smaller dataset. In this paper, we ask the question "Can pretraining help the lasso?". We develop a framework for the lasso in which an overall model is fit to a large set of data, and then fine-tuned to a specific task on a smaller dataset. This latter dataset can be a subset of the original dataset, but does not need to be. We find that this framework has a wide variety of applications, including stratified models, multinomial targets, multi-response models, conditional average treatment estimation and even gradient boosting. In the stratified model setting, the pretrained lasso pipeline estimates the coefficients common to all groups at the first stage, and then group specific coefficients at the second "fine-tuning" stage. We show that under appropriate assumptions, the support recovery rate of the common coefficients is superior to that of the usual lasso trained only on individual groups. This separate identification of common and individual coefficients can also be useful for scientific understanding.

Submitted 18 April, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

arXiv:2310.10740 [pdf, other]

Unbiased Estimation of Structured Prediction Error

Authors: Kevin Fry, Jonathan E. Taylor

Abstract: Many modern datasets, such as those in ecology and geology, are composed of samples with spatial structure and dependence. With such data violating the usual independent and identically distributed (IID) assumption in machine learning and classical statistics, it is unclear a priori how one should measure the performance and generalization of models. Several authors have empirically investigated cross-validation (CV) methods in this setting, reaching mixed conclusions. We provide a class of unbiased estimation methods for general quadratic errors, correlated Gaussian response, and arbitrary prediction function $g$, for a noise-elevated version of the error. Our approach generalizes the coupled bootstrap (CB) from the normal means problem to general normal data, allowing correlation both within and between the training and test sets. CB relies on creating bootstrap samples that are intelligently decoupled, in the sense of being statistically independent. Specifically, the key to CB lies in generating two independent "views" of our data and using them as stand-ins for the usual independent training and test samples. Beginning with Mallows' $C_p$, we generalize the estimator to develop our generalized $C_p$ estimators (GC). We show that under only a moment condition on $g$, this noise-elevated error estimate converges smoothly to the noiseless error estimate. We show that when Stein's unbiased risk estimator (SURE) applies, GC converges to SURE as in the normal means problem. Further, we use these same tools to analyze CV and provide some theoretical analysis to help understand when CV will provide good estimates of error. Simulations align with our theoretical results, demonstrating the effectiveness of GC and illustrating the behavior of CV methods. Lastly, we apply our estimator to a model selection task on geothermal data in Nevada.

Submitted 16 October, 2023; originally announced October 2023.

Comments: 28 pages, 13 figures
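
The two independent "views" mentioned in this entry's abstract can be illustrated with the standard Gaussian noise-splitting identity. The symbols $\Sigma$, $\alpha$, $\omega$, $Y^{+}$ and $Y^{-}$ are notation assumed here for illustration; the paper's own construction for correlated training and test sets may differ in its details.

```latex
% Assumed setup: Y ~ N(mu, Sigma), auxiliary noise omega ~ N(0, Sigma)
% drawn independently of Y, and a tuning constant alpha > 0.
\begin{align*}
Y^{+} &= Y + \sqrt{\alpha}\,\omega, &
Y^{-} &= Y - \tfrac{1}{\sqrt{\alpha}}\,\omega, \\
\operatorname{Cov}(Y^{+}, Y^{-})
  &= \operatorname{Var}(Y) - \operatorname{Var}(\omega)
   = \Sigma - \Sigma = 0, &
\operatorname{Var}(Y^{+}) &= (1+\alpha)\,\Sigma .
\end{align*}
% Y^+ and Y^- are jointly Gaussian with zero covariance, hence independent.
% Y^+ carries extra noise of size alpha * Sigma, which is one way to read
% the "noise-elevated" error in the abstract.
```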

arXiv:2309.11472 [pdf, other]

Optimizing Dynamic Predictions from Joint Models using Super Learning

Authors: Dimitris Rizopoulos, Jeremy M. G. Taylor

Abstract: Joint models for longitudinal and time-to-event data are often employed to calculate dynamic individualized predictions used in numerous applications of precision medicine. Two components of joint models that influence the accuracy of these predictions are the shape of the longitudinal trajectories and the functional form linking the longitudinal outcome history to the hazard of the event. Finding a single well-specified model that produces accurate predictions for all subjects and follow-up times can be challenging, especially when considering multiple longitudinal outcomes. In this work, we use the concept of super learning and avoid selecting a single model. In particular, we specify a weighted combination of the dynamic predictions calculated from a library of joint models with different specifications. The weights are selected to optimize a predictive accuracy metric using V-fold cross-validation. We use as predictive accuracy measures the expected quadratic prediction error and the expected predictive cross-entropy. In a simulation study, we found that the super learning approach produces results very similar to the Oracle model, which was the model with the best performance in the test datasets. All proposed methodology is implemented in the freely available R package JMbayes2.

Submitted 1 December, 2023; v1 submitted 20 September, 2023; originally announced September 2023.
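
The weighting step described in this entry's abstract, a convex combination of predictions from a model library with weights tuned on cross-validated predictions, can be sketched as follows. This is an illustration under assumptions: the function name is hypothetical, plain squared error stands in for the accuracy measures named in the abstract, and exponentiated-gradient updates are one simple way to stay on the simplex. It is not the JMbayes2 implementation.

```python
import math

def super_learner_weights(cv_preds, y, n_iter=500, step=0.5):
    """Find simplex weights minimizing mean squared error.

    cv_preds[i][k] is the out-of-fold prediction of model k for subject i,
    and y[i] is the observed outcome. Weights stay non-negative and sum to
    one because each update is multiplicative and then renormalized.
    """
    n, k = len(y), len(cv_preds[0])
    w = [1.0 / k] * k  # start from equal weights
    for _ in range(n_iter):
        # Residuals of the current weighted combination.
        resid = [sum(wj * pj for wj, pj in zip(w, row)) - yi
                 for row, yi in zip(cv_preds, y)]
        # Gradient of the mean squared error with respect to each weight.
        grad = [2.0 / n * sum(r * row[j] for r, row in zip(resid, cv_preds))
                for j in range(k)]
        g_min = min(grad)  # shift for numerical stability
        w = [wj * math.exp(-step * (gj - g_min)) for wj, gj in zip(w, grad)]
        total = sum(w)
        w = [wj / total for wj in w]
    return w
```

In practice the columns of cv_preds would come from refitting each candidate model on V-1 folds and predicting the held-out fold, so that no subject's prediction uses its own outcome.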

arXiv:2309.07435 [pdf, other]

Uncertainty Intervals for Prediction Errors in Time Series Forecasting

Authors: Hui Xu, Song Mei, Stephen Bates, Jonathan Taylor, Robert Tibshirani

Abstract: Inference for prediction errors is critical in time series forecasting pipelines. However, providing statistically meaningful uncertainty intervals for prediction errors remains relatively under-explored. Practitioners often resort to forward cross-validation (FCV) for obtaining point estimators and constructing confidence intervals based on the Central Limit Theorem (CLT). The naive version assumes independence, a condition that is usually invalid due to time correlation. These approaches lack statistical interpretations and theoretical justifications even under stationarity. This paper systematically investigates uncertainty intervals for prediction errors in time series forecasting. We first distinguish two key inferential targets: the stochastic test error over near future data points, and the expected test error as the expectation of the former. The stochastic test error is often more relevant in applications needing to quantify uncertainty over individual time series instances. To construct prediction intervals for the stochastic test error, we propose the quantile-based forward cross-validation (QFCV) method. Under an ergodicity assumption, QFCV intervals have asymptotically valid coverage and are shorter than marginal empirical quantiles. In addition, we also illustrate why naive CLT-based FCV intervals fail to provide valid uncertainty intervals, even with certain corrections. For non-stationary time series, we further provide rolling intervals by combining QFCV with adaptive conformal prediction to give time-average coverage guarantees. Overall, we advocate the use of QFCV procedures and demonstrate their coverage and efficiency through simulations and real data examples.

Submitted 14 September, 2023; originally announced September 2023.

Comments: 35 pages, 17 figures

arXiv:2309.02115 [pdf, ps, other]

Using Joint Models for Longitudinal and Time-to-Event Data to Investigate the Causal Effect of Salvage Therapy after Prostatectomy

Authors: Dimitris Rizopoulos, Jeremy M. G. Taylor, Grigorios Papageorgiou, Todd M. Morgan

Abstract: Prostate cancer patients who undergo prostatectomy are closely monitored for recurrence and metastasis using routine prostate-specific antigen (PSA) measurements. When PSA levels rise, salvage therapies are recommended to decrease the risk of metastasis. However, due to the side effects of these therapies and to avoid over-treatment, it is important to understand which patients and when to initiate these salvage therapies. In this work, we use the University of Michigan Prostatectomy registry Data to tackle this question. Due to the observational nature of this data, we face the challenge that PSA is simultaneously a time-varying confounder and an intermediate variable for salvage therapy. We define different causal salvage therapy effects defined conditionally on different specifications of the longitudinal PSA history. We then illustrate how these effects can be estimated using the framework of joint models for longitudinal and time-to-event data. All proposed methodology is implemented in the freely-available R package JMbayes2.

Submitted 5 September, 2023; originally announced September 2023.

arXiv:2307.15681 [pdf, other]

A Continuous-Time Dynamic Factor Model for Intensive Longitudinal Data Arising from Mobile Health Studies

Authors: Madeline R. Abbott, Walter H. Dempsey, Inbal Nahum-Shani, Cho Y. Lam, David W. Wetter, Jeremy M. G. Taylor

Abstract: Intensive longitudinal data (ILD) collected in mobile health (mHealth) studies contain rich information on multiple outcomes measured frequently over time that have the potential to capture short-term and long-term dynamics. Motivated by an mHealth study of smoking cessation in which participants self-report the intensity of many emotions multiple times per day, we describe a dynamic factor model that summarizes the ILD as a low-dimensional, interpretable latent process. This model consists of two submodels: (i) a measurement submodel--a factor model--that summarizes the multivariate longitudinal outcome as lower-dimensional latent variables and (ii) a structural submodel--an Ornstein-Uhlenbeck (OU) stochastic process--that captures the temporal dynamics of the multivariate latent process in continuous time. We derive a closed-form likelihood for the marginal distribution of the outcome and the computationally-simpler sparse precision matrix for the OU process. We propose a block coordinate descent algorithm for estimation. Finally, we apply our method to the mHealth data to summarize the dynamics of 18 different emotions as two latent processes. These latent processes are interpreted by behavioral scientists as the psychological constructs of positive and negative affect and are key in understanding vulnerability to lapsing back to tobacco use among smokers attempting to quit.

Submitted 20 February, 2024; v1 submitted 28 July, 2023; originally announced July 2023.

Comments: Main text is 20 pages with 5 figures and 1 table. Supplementary material is 26 pages
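
The structural submodel in this entry's abstract is an Ornstein-Uhlenbeck process in continuous time, which suits measurements taken at irregular times. A minimal univariate sketch using the exact Gaussian transition of a mean-zero OU process is shown below. The function name and the parameters theta and sigma are illustrative assumptions, and the paper's latent process is multivariate.

```python
import math
import random

def simulate_ou(times, theta, sigma, x0=0.0, seed=0):
    """Simulate dX = -theta * X dt + sigma dW at the given (irregular) times.

    Uses the exact transition: X(t + dt) given X(t) is normal with mean
    X(t) * exp(-theta * dt) and variance
    sigma^2 * (1 - exp(-2 * theta * dt)) / (2 * theta).
    Requires theta > 0 and increasing times.
    """
    rng = random.Random(seed)
    path = [x0]
    for t_prev, t_next in zip(times, times[1:]):
        dt = t_next - t_prev
        decay = math.exp(-theta * dt)
        sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * theta))
        path.append(path[-1] * decay + sd * rng.gauss(0.0, 1.0))
    return path
```

Because the transition is exact for any gap dt, no fine time grid is needed between observations, unlike an Euler discretization.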

arXiv:2306.04675 [pdf, other]

Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models

Authors: George Stein, Jesse C. Cresswell, Rasa Hosseinzadeh, Yi Sui, Brendan Leigh Ross, Valentin Villecroze, Zhaoyan Liu, Anthony L. Caterini, J. Eric T. Taylor, Gabriel Loaiza-Ganem

Abstract: We systematically study a wide variety of generative models spanning semantically-diverse image datasets to understand and improve the feature extractors and metrics used to evaluate them. Using best practices in psychophysics, we measure human perception of image realism for generated samples by conducting the largest experiment evaluating generative models to date, and find that no existing metric strongly correlates with human evaluations. Comparing to 17 modern metrics for evaluating the overall performance, fidelity, diversity, rarity, and memorization of generative models, we find that the state-of-the-art perceptual realism of diffusion models as judged by humans is not reflected in commonly reported metrics such as FID. This discrepancy is not explained by diversity in generated samples, though one cause is over-reliance on Inception-V3. We address these flaws through a study of alternative self-supervised feature extractors, find that the semantic information encoded by individual networks strongly depends on their training procedure, and show that DINOv2-ViT-L/14 allows for much richer evaluation of generative models. Next, we investigate data memorization, and find that generative models do memorize training examples on simple, smaller datasets like CIFAR10, but not necessarily on more complex datasets like ImageNet. However, our experiments show that current metrics do not properly detect memorization: none in the literature is able to separate memorization from other phenomena such as underfitting or mode shrinkage. To facilitate further development of generative models and their evaluation we release all generated image datasets, human evaluation data, and a modular library to compute 17 common metrics for 9 different encoders at https://github.com/layer6ai-labs/dgm-eval.

Submitted 30 October, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: NeurIPS 2023. 53 pages, 29 figures, 12 tables. Code at https://github.com/layer6ai-labs/dgm-eval, reviews at https://openreview.net/forum?id=08zf7kTOoh

Journal ref: Thirty-seventh Conference on Neural Information Processing Systems (2023)

arXiv:2305.16735 [pdf, other]

Angular Combining of Forecasts of Probability Distributions

Authors: James W. Taylor, Xiaochun Meng

Abstract: When multiple forecasts are available for a probability distribution, forecast combining enables a pragmatic synthesis of the available information to extract the wisdom of the crowd. A linear opinion pool has been widely used, whereby the combining is applied to the probability predictions of the distributional forecasts. However, it has been argued that this will tend to deliver overdispersed distributional forecasts, prompting the combination to be applied, instead, to the quantile predictions of the distributional forecasts. Results from different applications are mixed, leaving it as an empirical question whether to combine probabilities or quantiles. In this paper, we present an alternative approach. Looking at the distributional forecasts, combining the probability forecasts can be viewed as vertical combining, with quantile forecast combining seen as horizontal combining. Our alternative approach is to allow combining to take place on an angle between the extreme cases of vertical and horizontal combining. We term this angular combining. The angle is a parameter that can be optimized using a proper scoring rule. We show that, as with vertical and horizontal averaging, angular averaging results in a distribution with mean equal to the average of the means of the distributions that are being combined. We also show that angular averaging produces a distribution with lower variance than vertical averaging, and, under certain assumptions, greater variance than horizontal averaging. We provide empirical support for angular combining using weekly distributional forecasts of COVID-19 mortality at the national and state level in the U.S.

Submitted 26 May, 2023; originally announced May 2023.

Comments: 47 pages, 16 figures

MSC Class: 90B50
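
The two extreme cases named in this entry's abstract, vertical combining (averaging probabilities) and horizontal combining (averaging quantiles), can be compared in closed form when the individual forecasts are normal. The sketch below assumes equal weights and normal forecasts; the function names are hypothetical, and angular combining itself is not implemented here.

```python
def vertical_moments(means, sds):
    """Mean and variance of the equal-weight linear opinion pool.

    Averaging CDFs gives a mixture, so the second moment is the average
    of the components' second moments.
    """
    k = len(means)
    mean = sum(means) / k
    second_moment = sum(s * s + m * m for m, s in zip(means, sds)) / k
    return mean, second_moment - mean * mean

def horizontal_moments(means, sds):
    """Mean and variance after equal-weight quantile averaging.

    The p-quantile of N(m, s^2) is m + s * z_p, so averaging quantiles of
    normal forecasts gives a normal with the averaged mean and averaged sd.
    """
    k = len(means)
    mean = sum(means) / k
    sd = sum(sds) / k
    return mean, sd * sd

# Two normal forecasts: N(0, 1^2) and N(2, 3^2).
print(vertical_moments([0.0, 2.0], [1.0, 3.0]))    # (1.0, 6.0)
print(horizontal_moments([0.0, 2.0], [1.0, 3.0]))  # (1.0, 4.0)
```

Both combinations share the mean of 1.0, while the vertical pool has the larger variance. This matches the ordering described in the abstract, where angular averaging sits between the two.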

arXiv:2212.12940 [pdf, other]

Exact Selective Inference with Randomization

Authors: Snigdha Panigrahi, Kevin Fry, Jonathan Taylor

Abstract: We introduce a pivot for exact selective inference with randomization. Not only does our pivot lead to exact inference in Gaussian regression models, but it is also available in closed form. We reduce the problem of exact selective inference to a bivariate truncated Gaussian distribution. By doing so, we give up some power that is achieved with approximate maximum likelihood estimation in Panigrahi and Taylor (2022). Yet our pivot always produces narrower confidence intervals than a closely related data splitting procedure. We investigate the trade-off between power and exact selective inference on simulated datasets and an HIV drug resistance dataset.

Submitted 22 December, 2023; v1 submitted 25 December, 2022; originally announced December 2022.

Comments: 48 pages, 8 Figures, 2 Tables

arXiv:2211.15826 [pdf, other]

Surrogacy Validation for Time-to-Event Outcomes with Illness-Death Frailty Models

Authors: Emily K. Roberts, Michael R. Elliott, Jeremy M. G. Taylor

Abstract: A common practice in clinical trials is to evaluate a treatment effect on an intermediate endpoint when the true outcome of interest would be difficult or costly to measure. We consider how to validate intermediate endpoints in a causally-valid way when the trial outcomes are time-to-event. Using counterfactual outcomes, those that would be observed if the counterfactual treatment had been given, the causal association paradigm assesses the relationship of the treatment effect on the surrogate $S$ with the treatment effect on the true endpoint $T$. In particular, we propose illness death models to accommodate the censored and semi-competing risk structure of survival data. The proposed causal version of these models involves estimable and counterfactual frailty terms. Via these multi-state models, we characterize what a valid surrogate would look like using a causal effect predictiveness plot. We evaluate the estimation properties of a Bayesian method using Markov Chain Monte Carlo and assess the sensitivity of our model assumptions. Our motivating data source is a localized prostate cancer clinical trial where the two survival endpoints are time to distant metastasis and time to death.

Submitted 28 November, 2022; originally announced November 2022.

arXiv:2209.00181 [pdf, other]

Understanding the dynamic impact of COVID-19 through competing risk modeling with bivariate varying coefficients

Authors: Wenbo Wu, John D. Kalbfleisch, Jeremy M. G. Taylor, Jian Kang, Kevin He

Abstract: The coronavirus disease 2019 (COVID-19) pandemic has exerted a profound impact on patients with end-stage renal disease relying on kidney dialysis to sustain their lives. Motivated by a request by the U.S. Centers for Medicare & Medicaid Services, our analysis of their postdischarge hospital readmissions and deaths in 2020 revealed that the COVID-19 effect has varied significantly with postdischarge time and time since the onset of the pandemic. However, the complex dynamics of the COVID-19 effect trajectories cannot be characterized by existing varying coefficient models. To address this issue, we propose a bivariate varying coefficient model for competing risks within a cause-specific hazard framework, where tensor-product B-splines are used to estimate the surface of the COVID-19 effect. An efficient proximal Newton algorithm is developed to facilitate the fitting of the new model to the massive Medicare data for dialysis patients. Difference-based anisotropic penalization is introduced to mitigate model overfitting and the wiggliness of the estimated trajectories; various cross-validation methods are considered in the determination of optimal tuning parameters. Hypothesis testing procedures are designed to examine whether the COVID-19 effect varies significantly with postdischarge time and the time since pandemic onset, either jointly or separately. Simulation experiments are conducted to evaluate the estimation accuracy, type I error rate, statistical power, and model selection procedures. Applications to Medicare dialysis patients demonstrate the real-world performance of the proposed methods.

Submitted 31 August, 2022; originally announced September 2022.

Comments: 40 pages, 8 figures, 1 table

arXiv:2203.14504 [pdf, other]

Black-box Selective Inference via Bootstrapping

Authors: Sifan Liu, Jelena Markovic-Voronov, Jonathan Taylor

Abstract: Conditional selective inference requires an exact characterization of the selection event, which is often unavailable except for a few examples like the lasso. This work addresses this challenge by introducing a generic approach to estimate the selection event, facilitating feasible inference conditioned on the selection event. The method proceeds by repeatedly generating bootstrap data and running the selection algorithm on the new datasets. Using the outputs of the selection algorithm, we can estimate the selection probability as a function of certain summary statistics. This leads to an estimate of the distribution of the data conditioned on the selection event, which forms the basis for conditional selective inference. We provide a theoretical guarantee assuming both asymptotic normality of relevant statistics and accurate estimation of the selection probability. The applicability of the proposed method is demonstrated through a variety of problems that lack exact characterizations of selection, where conditional selective inference was previously infeasible.

Submitted 20 August, 2023; v1 submitted 28 March, 2022; originally announced March 2022.

arXiv:2108.02118 [pdf, other]

The volume-of-tube method for Gaussian random fields with inhomogeneous variance

Authors: Satoshi Kuriki, Akimichi Takemura, Jonathan E. Taylor

Abstract: The tube method or the volume-of-tube method approximates the tail probability of the maximum of a smooth Gaussian random field with zero mean and unit variance. This method evaluates the volume of a spherical tube about the index set, and then transforms it to the tail probability. In this study, we generalize the tube method to a case in which the variance is not constant. We provide the volume formula for a spherical tube with a non-constant radius in terms of curvature tensors, and the tail probability formula of the maximum of a Gaussian random field with inhomogeneous variance, as well as its Laplace approximation. In particular, the critical radius of the tube is generalized for evaluation of the asymptotic approximation error. As an example, we discuss the approximation of the largest eigenvalue distribution of the Wishart matrix with a non-identity matrix parameter. The Bonferroni method is the tube method when the index set is a finite set. We provide the formula for the asymptotic approximation error for the Bonferroni method when the variance is not constant.

Submitted 9 September, 2021; v1 submitted 4 August, 2021; originally announced August 2021.

Comments: 30 pages, 3 figures

MSC Class: 62H10 (Primary); 60G60 (Secondary)

arXiv:2106.06835 [pdf, other]

doi 10.1111/biom.13852

A synthetic data integration framework to leverage external summary-level information from heterogeneous populations

Authors: Tian Gu, Jeremy M. G. Taylor, Bhramar Mukherjee

Abstract: There is a growing need for flexible general frameworks that integrate individual-level data with external summary information for improved statistical inference. External information relevant for a risk prediction model may come in multiple forms, through regression coefficient estimates or predicted values of the outcome variable. Different external models may use different sets of predictors and the algorithm they used to predict the outcome Y given these predictors may or may not be known. The underlying populations corresponding to each external model may be different from each other and from the internal study population. Motivated by a prostate cancer risk prediction problem where novel biomarkers are measured only in the internal study, this paper proposes an imputation-based methodology where the goal is to fit a target regression model with all available predictors in the internal study while utilizing summary information from external models that may have used only a subset of the predictors. The method allows for heterogeneity of covariate effects across the external populations. The proposed approach generates synthetic outcome data in each external population, uses stacked multiple imputation technique to create a long dataset with complete covariate information. The final analysis of the stacked imputed data is conducted by weighted regression.
This flexible and unified approach can improve statistical efficiency of the estimated coefficients in the internal study, improve predictions by utilizing even partial information available from models that use a subset of the full set of covariates used in the internal study, and provide statistical inference for the external population with potentially different covariate effects from the internal population.

Submitted 1 June, 2022; v1 submitted 12 June, 2021; originally announced June 2021.

arXiv:2104.12947 [pdf, other]

Incorporating baseline covariates to validate surrogate endpoints with a constant biomarker under control arm

Authors: Emily Roberts, Michael Elliott, Jeremy M. G. Taylor

Abstract: A surrogate endpoint S in a clinical trial is an outcome that may be measured earlier or more easily than the true outcome of interest T. In this work, we extend causal inference approaches to validate such a surrogate using potential outcomes. The causal association paradigm assesses the relationship of the treatment effect on the surrogate with the treatment effect on the true endpoint. Using the principal surrogacy criteria, we utilize the joint conditional distribution of the potential outcomes T, given the potential outcomes S. In particular, our setting of interest allows us to assume the surrogate under the placebo, S(0), is zero-valued, and we incorporate baseline covariates in the setting of normally-distributed endpoints. We develop Bayesian methods to incorporate conditional independence and other modeling assumptions and explore their impact on the assessment of surrogacy. We demonstrate our approach via simulation and data that mimics an ongoing study of a muscular dystrophy gene therapy.

Submitted 2 February, 2022; v1 submitted 26 April, 2021; originally announced April 2021.

arXiv:2103.09577 [pdf, other]

doi 10.1007/s42979-021-00921-0

Theoretical bounds on data requirements for the ray-based classification

Authors: Brian J. Weber, Sandesh S. Kalantre, Thomas McJunkin, Jacob M. Taylor, Justyna P. Zwolak

Abstract: The problem of classifying high-dimensional shapes in real-world data grows in complexity as the dimension of the space increases. For the case of identifying convex shapes of different geometries, a new classification framework has recently been proposed in which the intersections of a set of one-dimensional representations, called rays, with the boundaries of the shape are used to identify the specific geometry. This ray-based classification (RBC) has been empirically verified using a synthetic dataset of two- and three-dimensional shapes (Zwolak et al. in Proceedings of Third Workshop on Machine Learning and the Physical Sciences (NeurIPS 2020), Vancouver, Canada [December 11, 2020], arXiv:2010.00500, 2020) and, more recently, has also been validated experimentally (Zwolak et al., PRX Quantum 2:020335, 2021). Here, we establish a bound on the number of rays necessary for shape classification, defined by key angular metrics, for arbitrary convex shapes. For two dimensions, we derive a lower bound on the number of rays in terms of the shape's length, diameter, and exterior angles. For convex polytopes in $\mathbb{R}^N$, we generalize this result to a similar bound given as a function of the dihedral angle and the geometrical parameters of polygonal faces. This result enables a different approach for estimating high-dimensional shapes using substantially fewer data elements than volumetric or surface-based approaches.

Submitted 26 February, 2022; v1 submitted 17 March, 2021; originally announced March 2021.

Comments: 10 pages, 5 figures

MSC Class: 68T20; 68Q32; 68U10

Journal ref: SN Comput. Sci. 3, 57 (2022)

arXiv:2103.02033 [pdf, other]

Multiple imputation with missing data indicators

Authors: Lauren J Beesley, Irina Bondarenko, Michael R Elliott, Allison W Kurian, Steven J Katz, Jeremy M G Taylor

Abstract: Multiple imputation is a well-established general technique for analyzing data with missing values. A convenient way to implement multiple imputation is sequential regression multiple imputation (SRMI), also called chained equations multiple imputation. In this approach, we impute missing values using regression models for each variable, conditional on the other variables in the data. This approach, however, assumes that the missingness mechanism is missing at random, and it is not well-justified under not-at-random missingness without additional modification. In this paper, we describe how we can generalize the SRMI imputation procedure to handle not-at-random missingness (MNAR) in the setting where missingness may depend on other variables that are also missing. We provide algebraic justification for several generalizations of standard SRMI using Taylor series and other approximations of the target imputation distribution under MNAR. Resulting regression model approximations include indicators for missingness, interactions, or other functions of the MNAR missingness model and observed data. In a simulation study, we demonstrate that the proposed SRMI modifications result in reduced bias in the final analysis compared to standard SRMI, with an approximation strategy involving inclusion of an offset in the imputation model performing the best overall. The method is illustrated in a breast cancer study, where the goal is to estimate the prevalence of a specific genetic pathogenic variant.

Submitted 2 March, 2021; originally announced March 2021.

Comments: See also: Supplemental Material

arXiv:2101.07954 [pdf, other]

Accounting for not-at-random missingness through imputation stacking

Authors: Lauren J Beesley, Jeremy M G Taylor

Abstract: Not-at-random missingness presents a challenge in addressing missing data in many health research applications. In this paper, we propose a new approach to account for not-at-random missingness after multiple imputation through weighted analysis of stacked multiple imputations. The weights are easily calculated as a function of the imputed data and assumptions about the not-at-random missingness. We demonstrate through simulation that the proposed method has excellent performance when the missingness model is correctly specified. In practice, the missingness mechanism will not be known. We show how we can use our approach in a sensitivity analysis framework to evaluate the robustness of model inference to different assumptions about the missingness mechanism, and we provide R package StackImpute to facilitate implementation as part of routine sensitivity analyses. We apply the proposed method to account for not-at-random missingness in human papillomavirus test results in a study of survival for patients diagnosed with oropharyngeal cancer.

Submitted 19 January, 2021; originally announced January 2021.

Comments: See also: Supplementary Materials

arXiv:2101.02354 [pdf, other]

Kullback-Leibler-Based Discrete Failure Time Models for Integration of Published Prediction Models with New Time-To-Event Dataset

Authors: Di Wang, Wen Ye, Randall Sung, Hui Jiang, Jeremy M. G. Taylor, Lisa Ly, Kevin He

Abstract: Prediction of time-to-event data often suffers from rare event rates, small sample sizes, high dimensionality and low signal-to-noise ratios. Incorporating published prediction models from large-scale studies is expected to improve the performance of prognosis prediction on internal individual-level time-to-event data. However, existing integration approaches typically assume that underlying distributions from the external and internal data sources are similar, which is often invalid. To account for challenges including heterogeneity, data sharing, and privacy constraints, we propose a discrete failure time modeling procedure, which utilizes a discrete hazard-based Kullback-Leibler discriminatory information measuring the discrepancy between the published models and the internal dataset. Simulations show the advantage of the proposed method compared with those solely based on the internal data or published models. We apply the proposed method to improve prediction performance on a kidney transplant dataset from a local hospital by integrating this small-scale dataset with published survival models obtained from the national transplant registry.

Submitted 28 July, 2022; v1 submitted 6 January, 2021; originally announced January 2021.

arXiv:2010.16001 [pdf, other]

Guaranteeing Safety of Learned Perception Modules via Measurement-Robust Control Barrier Functions

Authors: Sarah Dean, Andrew J. Taylor, Ryan K. Cosner, Benjamin Recht, Aaron D. Ames

Abstract: Modern nonlinear control theory seeks to develop feedback controllers that endow systems with properties such as safety and stability. The guarantees ensured by these controllers often rely on accurate estimates of the system state for determining control actions. In practice, measurement model uncertainty can lead to error in state estimates that degrades these guarantees. In this paper, we seek to unify techniques from control theory and machine learning to synthesize controllers that achieve safety in the presence of measurement model uncertainty. We define the notion of a Measurement-Robust Control Barrier Function (MR-CBF) as a tool for determining safe control inputs when facing measurement model uncertainty. Furthermore, MR-CBFs are used to inform sampling methodologies for learning-based perception systems and quantify tolerable error in the resulting learned models. We demonstrate the efficacy of MR-CBFs in achieving safety with measurement model uncertainty on a simulated Segway system.

Submitted 29 October, 2020; originally announced October 2020.

arXiv:2010.09971 [pdf, other]

doi 10.1093/biostatistics/kxab017

A meta-inference framework to integrate multiple external models into a current study

Authors: Tian Gu, Jeremy M. G. Taylor, Bhramar Mukherjee

Abstract: It is becoming increasingly common for researchers to consider incorporating external information from large studies to improve the accuracy of statistical inference instead of relying on a modestly sized dataset collected internally. With some new predictors only available internally, we aim to build improved regression models based on individual-level data from an "internal" study while incorporating summary-level information from "external" models. We propose a meta-analysis framework along with two weighted estimators as the composite of empirical Bayes estimators, which combines the estimates from the different external models. The proposed framework is flexible and robust in the ways that (i) it is capable of incorporating external models that use a slightly different set of covariates; (ii) it can identify the most relevant external information and diminish the influence of information that is less compatible with the internal data; and (iii) it nicely balances the bias-variance trade-off while preserving the most efficiency gain. The proposed estimators are more efficient than the naive analysis of the internal data and other naive combinations of external estimators.

Submitted 9 April, 2021; v1 submitted 19 October, 2020; originally announced October 2020.

arXiv:2010.00500 [pdf, other]

Ray-based classification framework for high-dimensional data

Authors: Justyna P. Zwolak, Sandesh S. Kalantre, Thomas McJunkin, Brian J. Weber, Jacob M. Taylor

Abstract: While classification of arbitrary structures in high dimensions may require complete quantitative information, for simple geometrical structures, low-dimensional qualitative information about the boundaries defining the structures can suffice. Rather than using dense, multi-dimensional data, we propose a deep neural network (DNN) classification framework that utilizes a minimal collection of one-dimensional representations, called rays, to construct the "fingerprint" of the structure(s) based on substantially reduced information. We empirically study this framework using a synthetic dataset of double and triple quantum dot devices and apply it to the classification problem of identifying the device state. We show that the performance of the ray-based classifier is already on par with traditional 2D images for low dimensional systems, while significantly cutting down the data acquisition cost.

Submitted 26 February, 2022; v1 submitted 1 October, 2020; originally announced October 2020.

Journal ref: Proceedings of the Machine Learning and the Physical Sciences Workshop at NeurIPS 2020, Vancouver, Canada

arXiv:2008.04257 [pdf, other]

Using Multiple Imputation to Classify Potential Outcomes Subgroups

Authors: Yun Li, Irina Bondarenko, Michael R. Elliott, Timothy P. Hofer, Jeremy M. G. Taylor

Abstract: With medical tests becoming increasingly available, concerns about over-testing and over-treatment dramatically increase. Hence, it is important to understand the influence of testing on treatment selection in general practice. Most statistical methods focus on average effects of testing on treatment decisions. However, this may be ill-advised, particularly for patient subgroups that tend not to benefit from such tests. Furthermore, missing data are common, representing large and often unaddressed threats to the validity of statistical methods. Finally, it is desirable to conduct analyses that can be interpreted causally. We propose to classify patients into four potential outcomes subgroups, defined by whether or not a patient's treatment selection is changed by the test result and by the direction of how the test result changes treatment selection. This subgroup classification naturally captures the differential influence of medical testing on treatment selections for different patients, which can suggest targets to improve the utilization of medical tests. We can then examine patient characteristics associated with patient potential outcomes subgroup memberships. We used multiple imputation methods to simultaneously impute the missing potential outcomes as well as regular missing values. This approach can also provide estimates of many traditional causal quantities. We find that explicitly incorporating causal inference assumptions into the multiple imputation process can improve the precision for some causal estimates of interest.
We also find that bias can occur when the potential outcomes conditional independence assumption is violated; sensitivity analyses are proposed to assess the impact of this violation. We applied the proposed methods to examine the influence of 21-gene assay, the most commonly used genomic test, on chemotherapy selection among breast cancer patients.

Submitted 10 August, 2020; originally announced August 2020.

arXiv:2007.12158 [pdf, other]

Signal Enhancement for Magnetic Navigation Challenge Problem

Authors: Albert R. Gnadt, Joseph Belarge, Aaron Canciani, Glenn Carl, Lauren Conger, Joseph Curro, Alan Edelman, Peter Morales, Aaron P. Nielsen, Michael F. O'Keeffe, Christopher V. Rackauckas, Jonathan Taylor, Allan B. Wollaber

Abstract: Harnessing the magnetic field of the Earth for navigation has shown promise as a viable alternative to other navigation systems. A magnetic navigation system collects its own magnetic field data using a magnetometer and uses magnetic anomaly maps to determine the current location. The greatest challenge with magnetic navigation arises when the magnetic field measurements from the magnetometer encompass the magnetic field from not just the Earth, but also from the vehicle on which it is mounted. It is difficult to separate the Earth magnetic anomaly field, which is crucial for navigation, from the total magnetic field reading from the sensor. The purpose of this challenge problem is to decouple the Earth and aircraft magnetic signals in order to derive a clean signal from which to perform magnetic navigation. Baseline testing on the dataset has shown that the Earth magnetic field can be extracted from the total magnetic field using machine learning (ML). The challenge is to remove the aircraft magnetic field from the total magnetic field using a trained model. This challenge offers an opportunity to construct an effective model for removing the aircraft magnetic field from the dataset by using a scientific machine learning (SciML) approach comprised of an ML algorithm integrated with the physics of magnetic navigation.

Submitted 6 January, 2023; v1 submitted 23 July, 2020; originally announced July 2020.

Comments: 12 pages, 2 figures. See https://github.com/MIT-AI-Accelerator/MagNav.jl for accompanying data and code

arXiv:2007.11103 [pdf]

A Comparison of Aggregation Methods for Probabilistic Forecasts of COVID-19 Mortality in the United States

Authors: Kathryn S. Taylor, James W. Taylor

Abstract: The COVID-19 pandemic has placed forecasting models at the forefront of health policy making. Predictions of mortality and hospitalization help governments meet planning and resource allocation challenges. In this paper, we consider the weekly forecasting of the cumulative mortality due to COVID-19 at the national and state level in the U.S. Optimal decision-making requires a forecast of a probability distribution, rather than just a single point forecast. Interval forecasts are also important, as they can support decision making and provide situational awareness. We consider the case where probabilistic forecasts have been provided by multiple forecasting teams, and we aggregate the forecasts to extract the wisdom of the crowd. With only limited information available regarding the historical accuracy of the forecasting teams, we consider aggregation (i.e. combining) methods that do not rely on a record of past accuracy. In this empirical paper, we evaluate the accuracy of aggregation methods that have been previously proposed for interval forecasts and predictions of probability distributions. These include the use of the simple average, the median, and trimming methods, which enable robust estimation and allow the aggregate forecast to reduce the impact of a tendency for the forecasting teams to be under- or overconfident. We use data that has been made publicly available from the COVID-19 Forecast Hub.
While the simple average performed well for the high mortality series, we obtained greater accuracy using the median and certain trimming methods for the low and medium mortality series. It will be interesting to see if this remains the case as the pandemic evolves.

Submitted 20 August, 2020; v1 submitted 21 July, 2020; originally announced July 2020.

Comments: 32 pages, 11 figures, 5 tables
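
Illustrative note (not part of the arXiv record): the abstract above names the simple average, the median, and trimming as ways to combine forecasts from several teams. The sketch below shows the general idea on quantile forecasts, combining them level by level. The team names and numbers are invented, and `trimmed_mean` is a plain symmetric trimmed mean chosen for illustration; it is not claimed to match the specific trimming methods evaluated in the paper.

```python
from statistics import mean, median


def trimmed_mean(values, trim_fraction=0.2):
    """Symmetric trimmed mean: drop the k smallest and k largest values, then average.

    Assumption for illustration: k = floor(n * trim_fraction).
    """
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    return mean(ordered[k:len(ordered) - k])


def aggregate(forecasts, combine):
    """Combine per-team quantile forecasts separately at each quantile level.

    forecasts: dict mapping team name -> dict mapping quantile level -> value.
    combine:   function that reduces a list of numbers to one number.
    """
    levels = sorted(next(iter(forecasts.values())))
    return {q: combine([f[q] for f in forecasts.values()]) for q in levels}


# Hypothetical forecasts of cumulative deaths (made-up numbers); team_e is an outlier.
team_forecasts = {
    "team_a": {0.025: 900, 0.5: 1000, 0.975: 1100},
    "team_b": {0.025: 950, 0.5: 1050, 0.975: 1200},
    "team_c": {0.025: 800, 0.5: 1020, 0.975: 1500},
    "team_d": {0.025: 990, 0.5: 1010, 0.975: 1030},
    "team_e": {0.025: 400, 0.5: 2000, 0.975: 5000},
}

by_mean = aggregate(team_forecasts, mean)
by_median = aggregate(team_forecasts, median)
by_trimmed = aggregate(team_forecasts, trimmed_mean)

for level in by_mean:
    print(level, by_mean[level], by_median[level], round(by_trimmed[level], 1))
```

In this toy data the single outlying team pulls the simple average of the central forecast well above the other four teams, while the median and the trimmed mean stay close to them, which is the robustness property the abstract refers to.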

arXiv:2006.10256 [pdf, other]

doi 10.1038/s41586-020-2649-2

Array Programming with NumPy

Authors: Charles R. Harris, K. Jarrod Millman, Stéfan J. van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J. Smith, Robert Kern, Matti Picus, Stephan Hoyer, Marten H. van Kerkwijk, Matthew Brett, Allan Haldane, Jaime Fernández del Río, Mark Wiebe, Pearu Peterson, Pierre Gérard-Marchant, Kevin Sheppard, Tyler Reddy, Warren Weckesser, Hameer Abbasi, Christoph Gohlke, et al. (1 additional author not shown)

Abstract: Array programming provides a powerful, compact, expressive syntax for accessing, manipulating, and operating on data in vectors, matrices, and higher-dimensional arrays. NumPy is the primary array programming library for the Python language. It plays an essential role in research analysis pipelines in fields as diverse as physics, chemistry, astronomy, geoscience, biology, psychology, material science, engineering, finance, and economics. For example, in astronomy, NumPy was an important part of the software stack used in the discovery of gravitational waves and the first imaging of a black hole. Here we show how a few fundamental array concepts lead to a simple and powerful programming paradigm for organizing, exploring, and analyzing scientific data. NumPy is the foundation upon which the entire scientific Python universe is constructed. It is so pervasive that several projects, targeting audiences with specialized needs, have developed their own NumPy-like interfaces and array objects. Because of its central position in the ecosystem, NumPy increasingly plays the role of an interoperability layer between these new array computation libraries.

Submitted 17 June, 2020; originally announced June 2020.

Journal ref: Nature 585, 357 (2020)
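
Illustrative note (not part of the arXiv record): the abstract above refers to "a few fundamental array concepts". A minimal sketch of what such concepts look like in practice is given below, using only standard NumPy calls: reduction along an axis, broadcasting, vectorized comparison, boolean indexing, and slicing that returns a view. The array contents are arbitrary and chosen only for illustration.

```python
import numpy as np

# Arbitrary 3x4 example array: rows are observations, columns are variables.
data = np.arange(12, dtype=float).reshape(3, 4)

# Reduction: collapse axis 0 (rows) to get one mean per column, shape (4,).
col_means = data.mean(axis=0)

# Broadcasting: the (4,) vector is stretched across all 3 rows of the (3, 4) array.
centered = data - col_means

# Vectorization: the comparison runs elementwise with no explicit Python loop.
mask = centered > 0

# Boolean indexing: select the elements where the mask is True (returns a copy).
positives = centered[mask]

# Basic slicing: returns a view that shares memory with the original array.
first_col = data[:, 0]

print(centered)
print(positives)
print(np.shares_memory(first_col, data))
```

Each step operates on whole arrays at once, which is what keeps this style of code compact compared with nested loops over rows and columns.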

arXiv:2006.00335 [pdf]

doi 10.13140/RG.2.2.35079.01443

Probabilistic Forecasting of Patient Waiting Times in an Emergency Department

Authors: Siddharth Arora, James W. Taylor, Ho-Yin Mak

Abstract: We study the estimation of the probability distribution of individual patient waiting times in an emergency department (ED). Our feature-rich modelling allows for dynamic updating and refinement of waiting time estimates as patient- and ED-specific information (e.g., patient condition, ED congestion levels) is revealed during the waiting process. Aspects relating to communicating forecast uncertainty to patients, and implementing this methodology in practice, are also discussed.

Submitted 30 May, 2020; originally announced June 2020.

arXiv:2005.13271 [pdf, other]

Analysis of time-to-event for observational studies: Guidance to the use of intensity models

Authors: Per Kragh Andersen, Maja Pohar Perme, Hans C van Houwelingen, Richard J Cook, Pierre Joly, Torben Martinussen, Jeremy MG Taylor, Michal Abrahamowicz, Terry M Therneau

Abstract: This paper provides guidance for researchers with some mathematical background on the conduct of time-to-event analysis in observational studies based on intensity (hazard) models. Discussions of basic concepts like time axis, event definition and censoring are given. Hazard models are introduced, with special emphasis on the Cox proportional hazards regression model. We provide check lists that may be useful both when fitting the model and assessing its goodness of fit and when interpreting the results. Special attention is paid to how to avoid problems with immortal time bias by introducing time-dependent covariates. We discuss prediction based on hazard models and difficulties when attempting to draw proper causal conclusions from such models. Finally, we present a series of examples where the methods and check lists are exemplified. Computational details and implementation using the freely available R software are documented in Supplementary Material. The paper was prepared as part of the STRATOS initiative.

Submitted 28 May, 2020; v1 submitted 27 May, 2020; originally announced May 2020.

Comments: 28 pages, 12 figures. For associated Supplementary material, see http://publicifsv.sund.ku.dk/~pka/STRATOSTG8/

arXiv:2003.06723 [pdf, other]

Inferring Treatment Effects After Testing Instrument Strength in Linear Models

Authors: Nan Bi, Hyunseung Kang, Jonathan Taylor

Abstract: A common practice in IV studies is to check for instrument strength, i.e. its association to the treatment, with an F-test from regression. If the F-statistic is above some threshold, usually 10, the instrument is deemed to satisfy one of the three core IV assumptions and used to test for the treatment effect. However, in many cases, the inference on the treatment effect does not take into account the strength test done a priori. In this paper, we show that not accounting for this pretest can severely distort the distribution of the test statistic and propose a method to correct this distortion, producing valid inference. A key insight in our method is to frame the F-test as a randomized convex optimization problem and to leverage recent methods in selective inference. We prove that our method provides conditional and marginal Type I error control. We also extend our method to weak instrument settings. We conclude with a reanalysis of studies concerning the effect of education on earning where we show that not accounting for pre-testing can dramatically alter the original conclusion about education's effects.

Submitted 14 March, 2020; originally announced March 2020.

Comments: 24 pages, 3 figures

arXiv:2002.09578 [pdf, other]

Scores for Multivariate Distributions and Level Sets

Authors: Xiaochun Meng, James W. Taylor, Souhaib Ben Taieb, Siran Li

Abstract: Forecasts of multivariate probability distributions are required for a variety of applications. Scoring rules enable the evaluation of forecast accuracy, and comparison between forecasting methods. We propose a theoretical framework for scoring rules for multivariate distributions, which encompasses the existing quadratic score and multivariate continuous ranked probability score. We demonstrate how this framework can be used to generate new scoring rules. In some multivariate contexts, it is a forecast of a level set that is needed, such as a density level set for anomaly detection or the level set of the cumulative distribution as a measure of risk. This motivates consideration of scoring functions for such level sets. For univariate distributions, it is well-established that the continuous ranked probability score can be expressed as the integral over a quantile score. We show that, in a similar way, scoring rules for multivariate distributions can be decomposed to obtain scoring functions for level sets. Using this, we present scoring functions for different types of level set, including density level sets and level sets for cumulative distributions. To compute the scores, we propose a simple numerical algorithm. We perform a simulation study to support our proposals, and we use real data to illustrate usefulness for forecast combining and CoVaR estimation.

Submitted 21 June, 2023; v1 submitted 21 February, 2020; originally announced February 2020.
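
Illustrative note: the univariate identity mentioned in the abstract above (the continuous ranked probability score as an integral over quantile scores) can be checked numerically. The sketch below is not taken from the paper. It assumes a standard normal forecast, relies on the well-known closed form of the Gaussian CRPS, and all function names are hypothetical.

```python
import math
from statistics import NormalDist

# Assumed forecast distribution for this illustration: standard normal.
STD_NORMAL = NormalDist(0.0, 1.0)


def quantile_score(alpha, q, y):
    """Pinball (quantile) loss of the quantile forecast q at level alpha for outcome y."""
    indicator = 1.0 if y <= q else 0.0
    return (indicator - alpha) * (q - y)


def crps_via_quantile_scores(y, n_grid=20000):
    """Approximate 2 * integral over alpha in (0, 1) of the quantile score (midpoint rule)."""
    total = 0.0
    for i in range(n_grid):
        alpha = (i + 0.5) / n_grid  # midpoints stay strictly inside (0, 1)
        total += quantile_score(alpha, STD_NORMAL.inv_cdf(alpha), y)
    return 2.0 * total / n_grid


def crps_closed_form(y):
    """Closed-form CRPS of a standard normal forecast evaluated at outcome y."""
    return (
        y * (2.0 * STD_NORMAL.cdf(y) - 1.0)
        + 2.0 * STD_NORMAL.pdf(y)
        - 1.0 / math.sqrt(math.pi)
    )


y_obs = 0.5
numeric = crps_via_quantile_scores(y_obs)
exact = crps_closed_form(y_obs)
print(f"quantile-score integral: {numeric:.5f}, closed form: {exact:.5f}")
```

The two numbers should agree up to the discretization error of the midpoint rule. The multivariate level-set decomposition proposed in the paper is not reproduced here.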

arXiv:1911.03985 [pdf, other]

Inference After Selecting Plausibly Valid Instruments with Application to Mendelian Randomization

Authors: Nan Bi, Hyunseung Kang, Jonathan Taylor

Abstract: Mendelian randomization (MR) is a popular method in genetic epidemiology to estimate the effect of an exposure on an outcome by using genetic instruments. These instruments are often selected from a combination of prior knowledge from genome wide association studies (GWAS) and data-driven instrument selection procedures or tests. Unfortunately, when testing for the exposure effect, the instrument selection process done a priori is not accounted for. This paper studies and highlights the bias resulting from not accounting for the instrument selection process by focusing on a recent data-driven instrument selection procedure, sisVIVE, as an example. We introduce a conditional inference approach that conditions on the instrument selection done a priori and leverage recent advances in selective inference to derive conditional null distributions of popular test statistics for the exposure effect in MR. The null distributions can be characterized with individual-level or summary-level data in MR. We show that our conditional confidence intervals derived from conditional null distributions attain the desired nominal level while typical confidence intervals computed in MR do not. We conclude by reanalyzing the effect of BMI on diastolic blood pressure using summary-level data from the UKBiobank that accounts for instrument selection.

Submitted 10 November, 2019; originally announced November 2019.

arXiv:1911.00515 [pdf, ps, other]

doi 10.1016/j.media.2020.101714

The reliability of a deep learning model in clinical out-of-distribution MRI data: a multicohort study

Authors: Gustav Mårtensson, Daniel Ferreira, Tobias Granberg, Lena Cavallin, Ketil Oppedal, Alessandro Padovani, Irena Rektorova, Laura Bonanni, Matteo Pardini, Milica Kramberger, John-Paul Taylor, Jakub Hort, Jón Snædal, Jaime Kulisevsky, Frederic Blanc, Angelo Antonini, Patrizia Mecocci, Bruno Vellas, Magda Tsolaki, Iwona Kłoszewska, Hilkka Soininen, Simon Lovestone, Andrew Simmons, Dag Aarsland, Eric Westman

Abstract: Deep learning (DL) methods have in recent years yielded impressive results in medical imaging, with the potential to function as clinical aid to radiologists. However, DL models in medical imaging are often trained on public research cohorts with images acquired with a single scanner or with strict protocol harmonization, which is not representative of a clinical setting. The aim of this study was to investigate how well a DL model performs in unseen clinical data sets (collected with different scanners, protocols and disease populations) and whether more heterogeneous training data improves generalization. In total, 3117 MRI scans of brains from multiple dementia research cohorts and memory clinics, which had been visually rated by a neuroradiologist according to Scheltens' scale of medial temporal atrophy (MTA), were included in this study. By training multiple versions of a convolutional neural network on different subsets of this data to predict MTA ratings, we assessed the impact that including images from a wider distribution during training had on performance in external memory clinic data. Our results showed that our model generalized well to data sets acquired with similar protocols as the training data, but substantially worse in clinical cohorts with visibly different tissue contrasts in the images. This implies that future DL studies investigating performance in out-of-distribution (OOD) MRI data need to assess multiple external cohorts for reliable results. Further, by including data from a wider range of scanners and protocols the performance improved in OOD data, which suggests that more heterogeneous training data makes the model generalize better. To conclude, this is the most comprehensive study to date investigating the domain shift in deep learning on MRI data, and we advocate rigorous evaluation of DL models on clinical data prior to being certified for deployment.

Submitted 1 November, 2019; originally announced November 2019.

Comments: 11 pages, 3 figures

arXiv:1910.04625 [pdf, other]

A stacked approach for chained equations multiple imputation incorporating the substantive model

Authors: Lauren Beesley, Jeremy M G Taylor

Abstract: Multiple imputation by chained equations (MICE) has emerged as a popular approach for handling missing data. A central challenge for applying MICE is determining how to incorporate outcome information into covariate imputation models, particularly for complicated outcomes. Often, we have a particular analysis model in mind, and we would like to ensure congeniality between the imputation and analysis models. We propose a novel strategy for directly incorporating the analysis model into the handling of missing data. In our proposed approach, multiple imputations of missing covariates are obtained without using outcome information. We then utilize the strategy of imputation stacking, where multiple imputations are stacked on top of each other to create a large dataset. The analysis model is then incorporated through weights. Instead of applying multiple imputation combining rules, we obtain parameter estimates by fitting a weighted version of the analysis model on the stacked dataset. We propose a novel estimator for obtaining standard errors for this stacked and weighted analysis. Our estimator is based on the observed data information principle in Louis (1982) and can be applied for analyzing stacked multiple imputations more generally. Our approach for analyzing stacked multiple imputations is the first well-motivated method that can be easily applied for a wide variety of standard analysis models and missing data settings. In simulations, the proposed strategy produced unbiased parameter estimates when the analysis model was correctly specified. We developed an R package, StackImpute, allowing this imputation approach to be easily implemented for many standard analysis models.

Submitted 10 October, 2019; originally announced October 2019.

arXiv:1905.07357 [pdf, other]

Recurrent Kalman Networks: Factorized Inference in High-Dimensional Deep Feature Spaces

Authors: Philipp Becker, Harit Pandya, Gregor Gebhardt, Cheng Zhao, James Taylor, Gerhard Neumann

Abstract: In order to integrate uncertainty estimates into deep time-series modelling, Kalman Filters (KFs) (Kalman et al., 1960) have been integrated with deep learning models. However, such approaches typically rely on approximate inference techniques such as variational inference, which makes learning more complex and often less scalable due to approximation errors. We propose a new deep approach to Kalman filtering which can be learned directly in an end-to-end manner using backpropagation without additional approximations. Our approach uses a high-dimensional factorized latent state representation for which the Kalman updates simplify to scalar operations and thus avoids hard to backpropagate, computationally heavy and potentially unstable matrix inversions. Moreover, we use locally linear dynamic models to efficiently propagate the latent state to the next time step. The resulting network architecture, which we call Recurrent Kalman Network (RKN), can be used for any time-series data, similar to an LSTM (Hochreiter & Schmidhuber, 1997), but uses an explicit representation of uncertainty. As shown by our experiments, the RKN obtains much more accurate uncertainty estimates than an LSTM or Gated Recurrent Units (GRUs) (Cho et al., 2014) while also showing a slightly improved prediction performance, and it outperforms various recent generative models on an image imputation task.

Submitted 17 May, 2019; originally announced May 2019.

Comments: accepted at ICML 2019

arXiv:1902.07884 [pdf, other]

Approximate selective inference via maximum likelihood

Authors: Snigdha Panigrahi, Jonathan Taylor

Abstract: Several strategies have been developed recently to ensure valid inference after model selection; some of these are easy to compute, while others fare better in terms of inferential power. In this paper, we consider a selective inference framework for Gaussian data. We propose a new method for inference through approximate maximum likelihood estimation. Our goal is to: (i) achieve better inferential power with the aid of randomization, (ii) bypass expensive MCMC sampling from exact conditional distributions that are hard to evaluate in closed forms. We construct approximate inference, e.g., p-values, confidence intervals etc., by solving a fairly simple, convex optimization problem. We illustrate the potential of our method across wide-ranging values of signal-to-noise ratio in simulations. On a cancer gene expression data set we find that our method improves upon the inferential power of some commonly used strategies for selective inference.

Submitted 11 July, 2022; v1 submitted 21 February, 2019; originally announced February 2019.

Comments: 63 Pages, 8 Figures

arXiv:1902.07634 [pdf, other]

Active Matrix Factorization for Surveys

Authors: Chelsea Zhang, Sean J. Taylor, Curtiss Cobb, Jasjeet Sekhon

Abstract: Amid historically low response rates, survey researchers seek ways to reduce respondent burden while measuring desired concepts with precision. We propose to ask fewer questions of respondents and impute missing responses via probabilistic matrix factorization. A variance-minimizing active learning criterion chooses the most informative questions per respondent. In simulations of our matrix sampling procedure on real-world surveys, as well as a Facebook survey experiment, we find active question selection achieves efficiency gains over baselines. The reduction in imputation error is heterogeneous across questions, and depends on the latent concepts they capture. The imputation procedure can benefit from incorporating respondent side information, modeling responses as ordered logit rather than Gaussian, and accounting for order effects. With our method, survey researchers obtain principled suggestions of questions to retain and, if desired, can automate the design of shorter instruments.

Submitted 18 June, 2019; v1 submitted 20 February, 2019; originally announced February 2019.

arXiv:1901.09973 [pdf, other]

Inference after black box selection

Authors: Jelena Markovic, Jonathan Taylor, Jeremy Taylor

Abstract: We consider the problem of inference for parameters selected to report only after some algorithm, the canonical example being inference for model parameters after a model selection procedure. The conditional correction for selection requires knowledge of how the selection is affected by changes in the underlying data, and current research explicitly describes this selection. In this work, we assume 1) we have in silico access to the selection algorithm and 2) for parameters of interest, the data input into the algorithm satisfies (pre-selection) a central limit theorem jointly with an estimator of our parameter of interest. Under these assumptions, we recast the problem into a statistical learning problem which can be fit with off-the-shelf models for binary regression. The feature points in this problem are set by the user, opening up the possibility of active learning methods for computationally expensive selection algorithms. We consider two examples previously out of reach of this conditional approach: stability selection and multiple cross-validation.

Submitted 28 January, 2019; originally announced January 2019.

Comments: 20 pages, 4 figures

arXiv:1803.09590 [pdf]

Rule-based Autoregressive Moving Average Models for Forecasting Load on Special Days: A Case Study for France

Authors: Siddharth Arora, James W. Taylor

Abstract: This paper presents a case study on short-term load forecasting for France, with emphasis on special days, such as public holidays. We investigate the generalisability to French data of a recently proposed approach, which generates forecasts for normal and special days in a coherent and unified framework, by incorporating subjective judgment in univariate statistical models using a rule-based methodology. The intraday, intraweek, and intrayear seasonality in load are accommodated using a rule-based triple seasonal adaptation of a seasonal autoregressive moving average (SARMA) model. We find that, for application to French load, the method requires an important adaption. We also adapt a recently proposed SARMA model that accommodates special day effects on an hourly basis using indicator variables. Using a rule formulated specifically for the French load, we compare the SARMA models with a range of different benchmark methods based on an evaluation of their point and density forecast accuracy. As sophisticated benchmarks, we employ the rule-based triple seasonal adaptations of Holt-Winters-Taylor (HWT) exponential smoothing and artificial neural networks (ANNs). We use nine years of half-hourly French load data, and consider lead times ranging from one half-hour up to a day ahead. The rule-based SARMA approach generated the most accurate forecasts.

Submitted 26 March, 2018; originally announced March 2018.

Comments: 11 figures, 3 tables

arXiv:1709.09636 [pdf, ps, other]

Randomized experiments to detect and estimate social influence in networks

Authors: Sean J. Taylor, Dean Eckles

Abstract: Estimation of social influence in networks can be substantially biased in observational studies due to homophily and network correlation in exposure to exogenous events. Randomized experiments, in which the researcher intervenes in the social system and uses randomization to determine how to do so, provide a methodology for credibly estimating causal effects of social behaviors. In addition to addressing questions central to the social sciences, these estimates can form the basis for effective marketing and public policy. In this review, we discuss the design space of experiments to measure social influence through combinations of interventions and randomizations. We define an experiment as a combination of (1) a target population of individuals connected by an observed interaction network, (2) a set of treatments whereby the researcher will intervene in the social system, (3) a randomization strategy which maps individuals or edges to treatments, and (4) a measurement of an outcome of interest after treatment has been assigned. We review experiments that demonstrate potential experimental designs and we evaluate their advantages and tradeoffs for answering different types of causal questions about social influence. We show how randomization also provides a basis for statistical inference when analyzing these experiments.

Submitted 27 September, 2017; originally announced September 2017.

Comments: Forthcoming in Spreading Dynamics in Social Systems

arXiv:1709.04077 [pdf, other]

doi 10.1109/TPWRS.2018.2804353

Setpoint Tracking with Partially Observed Loads

Authors: Antoine Lesage-Landry, Joshua A. Taylor

Abstract: We use online convex optimization (OCO) for setpoint tracking with uncertain, flexible loads. We consider full feedback from the loads, bandit feedback, and two intermediate types of feedback: partial bandit where a subset of the loads are individually observed and the rest are observed in aggregate, and Bernoulli feedback where in each round the aggregator receives either full or bandit feedback according to a known probability. We give sublinear regret bounds in all cases. We numerically evaluate our algorithms on examples with thermostatically controlled loads and electric vehicles.

Submitted 19 September, 2017; v1 submitted 12 September, 2017; originally announced September 2017.

Journal ref: IEEE Transactions on Power Systems, 32 (5): 5615-5627. September 2018

arXiv:1708.01977 [pdf, other]

Why Adaptively Collected Data Have Negative Bias and How to Correct for It

Authors: Xinkun Nie, Xiaoying Tian, Jonathan Taylor, James Zou

Abstract: From scientific experiments to online A/B testing, the previously observed data often affects how future experiments are performed, which in turn affects which data will be collected. Such adaptivity introduces complex correlations between the data and the collection procedure. In this paper, we prove that when the data collection procedure satisfies natural conditions, then sample means of the data have systematic \emph{negative} biases. As an example, consider an adaptive clinical trial where additional data points are more likely to be tested for treatments that show initial promise. Our surprising result implies that the average observed treatment effects would underestimate the true effects of each treatment. We quantitatively analyze the magnitude and behavior of this negative bias in a variety of settings. We also propose a novel debiasing algorithm based on selective inference techniques. In experiments, our method can effectively reduce bias and estimation error.

Submitted 30 December, 2017; v1 submitted 6 August, 2017; originally announced August 2017.

Comments: Accepted to the 21st International Conference on Artificial Intelligence and Statistics (AISTATS) 2018, Lanzarote, Spain
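
Illustrative note: the negative-bias phenomenon described in the abstract above can be seen in a small simulation. The sketch below is not the paper's experiment or its debiasing algorithm. It assumes a toy greedy collection rule with two arms whose true means are both zero and unit-variance Gaussian rewards; the function names and parameter values are hypothetical.

```python
import random


def greedy_run(rng, horizon=20, n_arms=2):
    """Collect data adaptively: sample each arm once, then always sample the arm
    with the highest current sample mean. All arms have true mean 0."""
    sums = [0.0] * n_arms
    counts = [0] * n_arms
    for t in range(horizon):
        if t < n_arms:
            arm = t  # initial round: one observation per arm
        else:
            arm = max(range(n_arms), key=lambda a: sums[a] / counts[a])
        sums[arm] += rng.gauss(0.0, 1.0)
        counts[arm] += 1
    return [sums[a] / counts[a] for a in range(n_arms)]


def average_bias(n_reps=5000, seed=0):
    """Average the final sample means over many replications.
    Since every true mean is 0, this average estimates the bias directly."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        means = greedy_run(rng)
        total += sum(means) / len(means)
    return total / n_reps


bias = average_bias()
print(f"estimated bias of the sample means: {bias:.3f}")
```

Under these assumptions the estimate comes out below zero: an arm that starts with an unlucky low draw is sampled less often afterwards, so its low sample mean is rarely corrected.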

arXiv:1705.06916 [pdf, other]

doi 10.18637/jss.v079.i06

R Package ASMap: Efficient Genetic Linkage Map Construction and Diagnosis

Authors: Julian Taylor, David Butler

Abstract: Although various forms of linkage map construction software are widely available, there is a distinct lack of packages for use in the R statistical computing environment. This article introduces the ASMap linkage map construction R package which contains functions that use the efficient MSTmap algorithm for clustering and optimally ordering large sets of markers. In addition to the construction functions, the package also contains a suite of tools to assist in the rapid diagnosis and repair of a constructed linkage map. The package functions can also be used for post linkage map construction techniques such as fine mapping or combining maps of the same population. To showcase the efficiency and functionality of ASMap, the complete linkage map construction process is demonstrated with a high density barley backcross marker data set.

Submitted 19 May, 2017; originally announced May 2017.

Comments: Conditionally accepted for publication in Journal of Statistical Software

arXiv:1703.06559 [pdf, other]

Unifying approach to selective inference with applications to cross-validation

Authors: Jelena Markovic, Lucy Xia, Jonathan Taylor

Abstract: We develop tools to do valid post-selective inference for a family of model selection procedures, including choosing a model via cross-validated Lasso. The tools apply universally when the following random vectors are jointly asymptotically multivariate Gaussian: 1. the vector composed of each model's quality value evaluated under certain model selection criteria (e.g. cross-validation errors across folds, AIC, prediction errors etc.); 2. the test statistics from which we make inference on the parameters. It is worth noting that the parameters here are chosen after model selection methods are performed. Under these assumptions, we derive a pivotal quantity that has an asymptotically Unif(0,1) distribution which can be used to perform tests and construct confidence intervals. Both the tests and confidence intervals are selectively valid for the chosen parameter. While the above assumptions may not be satisfied in some applications, we propose a novel variation to these model selection procedures by adding Gaussian randomizations to either one of the two vectors. As a result, the joint distribution of the above random vectors is multivariate Gaussian and our general tools apply. We illustrate our method by applying it to four important procedures for which very few selective inference results have been developed: cross-validated Lasso, cross-validated randomized Lasso, AIC-based model selection among a fixed set of models and inference for a newly introduced marginal LOCO parameter, inspired by the LOCO parameter of Rinaldo et al (2016); and we provide complete results for these cases. For randomized model selection procedures, we develop a Markov chain Monte Carlo sampling scheme to construct valid post-selective confidence intervals empirically.

Submitted 12 February, 2018; v1 submitted 19 March, 2017; originally announced March 2017.

arXiv:1703.06176 [pdf, other]

Scalable methods for Bayesian selective inference

Authors: Snigdha Panigrahi, Jonathan Taylor

Abstract: Modeled along the truncated approach in Panigrahi (2016), selection-adjusted inference in a Bayesian regime is based on a selective posterior. Such a posterior is determined together by a generative model imposed on data and the selection event that enforces a truncation on the assumed law. The effective difference between the selective posterior and the usual Bayesian framework is reflected in the use of a truncated likelihood. The normalizer of the truncated law in the adjusted framework is the probability of the selection event; this is typically intractable and it leads to the computational bottleneck in sampling from such a posterior. The current work lays out a primal-dual approach of solving an approximating optimization problem to provide valid post-selective Bayesian inference. The selection procedures are posed as data-queries that solve a randomized version of a convex learning program which have the advantage of preserving more left-over information for inference. We propose a randomization scheme under which the optimization has separable constraints that result in a partially separable objective in lower dimensions for many commonly used selective queries to approximate the otherwise intractable selective posterior. We show that the approximating optimization under a Gaussian randomization gives a valid exponential rate of decay for the selection probability on a large deviation scale. We offer a primal-dual method to solve the optimization problem leading to an approximate posterior; this allows us to exploit the usual merits of a Bayesian machinery in both low and high dimensional regimes where the underlying signal is effectively sparse. We show that the adjusted estimates empirically demonstrate better frequentist properties in comparison to the unadjusted estimates based on the usual posterior, when applied to a wide range of constrained, convex data queries.

Submitted 11 September, 2017; v1 submitted 17 March, 2017; originally announced March 2017.

Comments: 48 pages, 6 figures

arXiv:1703.06154 [pdf, other]

An MCMC-free approach to post-selective inference

Authors: Snigdha Panigrahi, Jelena Markovic, Jonathan Taylor

Abstract: We develop a Monte Carlo-free approach to inference post output from randomized algorithms with a convex loss and a convex penalty. The pivotal statistic based on a truncated law, called the selective pivot, usually lacks closed form expressions. Inference in these settings relies upon standard Monte Carlo sampling techniques at a reference parameter followed by an exponential tilting at the reference. Tilting can however be unstable for parameters that are far off from the reference parameter. We offer in this paper an alternative approach to construction of intervals and point estimates by proposing an approximation to the intractable selective pivot. Such an approximation solves a convex optimization problem in |E| dimensions, where |E| is the size of the active set observed from selection. We empirically show that the confidence intervals obtained by inverting the approximate pivot have valid coverage.

Submitted 18 May, 2017; v1 submitted 17 March, 2017; originally announced March 2017.

arXiv:1612.07811 [pdf, ps, other]

Bootstrap inference after using multiple queries for model selection

Authors: Jelena Markovic, Jonathan Taylor

Abstract: In this work, we provide a refinement of the selective CLT result of Tian and Taylor (2015), which allows for selective inference in non-parametric settings by adjusting for the asymptotic Gaussian limit for selection. Under some regularity assumptions on the density of the randomization, including heavier tails than Gaussian satisfied by e.g. logistic distribution, we prove the selective CLT holds without any assumptions on the underlying parameter, allowing for rare selection events. We also show a selective CLT result for Gaussian randomization, though the quantitative results are qualitatively different for the Gaussian randomization as compared to the heavier tailed results. Furthermore, we propose a bootstrap version of this test statistic, which is provably asymptotically pivotal uniformly across a family of non-parametric distributions. This result can be interpreted as resolving the impossibility results of Leeb and Potscher (2006). We describe several sampling methods involving the projected Langevin Monte Carlo to compute the bootstrapped test statistic and the corresponding confidence intervals valid after selection. The applications of our work include valid inferential and sampling tools after running various model selection algorithms including their combinations into multiple views/queries framework. We also present a way to do data carving, providing more powerful tests than classical data splitting by reusing the information in the data from the first stage.

Submitted 27 September, 2017; v1 submitted 22 December, 2016; originally announced December 2016.

Comments: 58 pages

arXiv:1607.06801 [pdf, other]

doi 10.1073/pnas.1614732113

High-dimensional regression adjustments in randomized experiments

Authors: Stefan Wager, Wenfei Du, Jonathan Taylor, Robert Tibshirani

Abstract: We study the problem of treatment effect estimation in randomized experiments with high-dimensional covariate information, and show that essentially any risk-consistent regression adjustment can be used to obtain efficient estimates of the average treatment effect. Our results considerably extend the range of settings where high-dimensional regression adjustments are guaranteed to provide valid inference about the population average treatment effect. We then propose cross-estimation, a simple method for obtaining finite-sample-unbiased treatment effect estimates that leverages high-dimensional regression adjustments. Our method can be used when the regression model is estimated using the lasso, the elastic net, subset selection, etc. Finally, we extend our analysis to allow for adaptive specification search via cross-validation, and flexible non-parametric regression adjustments with machine learning methods such as random forests or neural networks.

Submitted 27 October, 2016; v1 submitted 22 July, 2016; originally announced July 2016.

Comments: To appear in the Proceedings of the National Academy of Sciences. The present draft does not reflect final copyediting by the PNAS staff

arXiv:1605.08824 [pdf, other]

Integrative Methods for Post-Selection Inference Under Convex Constraints

Authors: Snigdha Panigrahi, Jonathan Taylor, Asaf Weinstein

Abstract: Inference after model selection has been an active research topic in the past few years, with numerous works offering different approaches to addressing the perils of the reuse of data. In particular, major progress has been made recently on large and useful classes of problems by harnessing general theory of hypothesis testing in exponential families, but these methods have their limitations. Perhaps most immediate is the gap between theory and practice: implementing the exact theoretical prescription in realistic situations---for example, when new data arrives and inference needs to be adjusted accordingly---may be a prohibitive task. In this paper we propose a Bayesian framework for carrying out inference after model selection in the linear model. Our framework is very flexible in the sense that it naturally accommodates different models for the data, instead of requiring a case-by-case treatment. At the core of our methods is a new approximation to the exact likelihood conditional on selection, the latter being generally intractable. We prove that, under appropriate conditions, our approximation is asymptotically consistent with the exact truncated likelihood. The advantages of our methods in practical data analysis are demonstrated in simulations and in application to HIV drug-resistance data.

Submitted 30 May, 2020; v1 submitted 27 May, 2016; originally announced May 2016.

Showing 1–50 of 85 results for author: Taylor, J