Causal Interpretation of Regressions With Ranks

Abstract: In studies of educational production functions or intergenerational mobility, it is common to transform the key variables into percentile ranks. Yet, it remains unclear what the regression coefficient estimates with ranks of the outcome or the treatment. In this paper, we derive effective causal estimands for a broad class of commonly-used regression methods, including the ordinary least squares (OLS), two-stage least squares (2SLS), difference-in-differences (DiD), and regression discontinuity designs (RDD). Specifically, we introduce a novel primitive causal estimand, the Rank Average Treatment Effect (rank-ATE), and prove that it serves as the building block of the effective estimands of all the aforementioned econometrics methods. For 2SLS, DiD, and RDD, we show that direct applications to outcome ranks identify parameters that are difficult to interpret. To address this issue, we develop alternative methods to identify more interpretable causal parameters.

Submitted 8 June, 2024; originally announced June 2024.
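
The rank transformation mentioned in the abstract above can be illustrated with a minimal sketch. It only shows the mechanics of converting variables to percentile ranks and running OLS on the ranks; it does not implement the paper's rank-ATE estimand or its proposed alternatives. The function names (`percentile_rank`, `ols_slope`, `rank_rank_slope`) are hypothetical, and ties are assumed to share their average rank.

```python
import numpy as np

def percentile_rank(values):
    """Map each value to its percentile rank in (0, 1]; ties share the average rank."""
    values = np.asarray(values, dtype=float)
    ordered = np.sort(values)
    # Number of observations strictly below, and at or below, each value.
    below = np.searchsorted(ordered, values, side="left")
    at_or_below = np.searchsorted(ordered, values, side="right")
    # Tied values occupy 1-indexed ranks below+1 .. at_or_below; take their average.
    average_rank = (below + at_or_below + 1) / 2.0
    return average_rank / len(values)

def ols_slope(x, y):
    """Slope of the simple OLS regression of y on x (with an intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    return float(x_centered @ (y - y.mean()) / (x_centered @ x_centered))

def rank_rank_slope(x, y):
    """OLS slope after both variables are replaced by their percentile ranks."""
    return ols_slope(percentile_rank(x), percentile_rank(y))
```

Because ranks are invariant to strictly increasing transformations, `rank_rank_slope(x, np.exp(x))` equals 1 for distinct values of `x`, which is one reason the coefficient is hard to read as an effect on the outcome's original scale.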

arXiv:2402.05203

Bellman Conformal Inference: Calibrating Prediction Intervals For Time Series

Authors: Zitong Yang, Emmanuel Candès, Lihua Lei

Abstract: We introduce Bellman Conformal Inference (BCI), a framework that wraps around any time series forecasting models and provides approximately calibrated prediction intervals. Unlike existing methods, BCI is able to leverage multi-step ahead forecasts and explicitly optimize the average interval lengths by solving a one-dimensional stochastic control problem (SCP) at each time step. In particular, we use the dynamic programming algorithm to find the optimal policy for the SCP. We prove that BCI achieves long-term coverage under arbitrary distribution shifts and temporal dependence, even with poor multi-step ahead forecasts. We find empirically that BCI avoids uninformative intervals that have infinite lengths and generates substantially shorter prediction intervals in multiple applications when compared with existing methods.

Submitted 9 February, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

Comments: 17 pages, 4 figures

arXiv:2401.13112

Distributional Counterfactual Explanation With Optimal Transport

Authors: Lei You, Lele Cao, Mattias Nilsson, Bo Zhao, Lei Lei

Abstract: Counterfactual explanations (CE) are the de facto method of providing insight and interpretability in black-box decision-making models by identifying alternative input instances that lead to different outcomes. This paper extends the concept of CE to a distributional context, broadening the scope from individual data points to entire input and output distributions, named distributional counterfactual explanation (DCE). In DCE, we take the stakeholder's perspective and shift focus to analyzing the distributional properties of the factual and counterfactual, drawing parallels to the classical approach of assessing individual instances and their resulting decisions. We leverage optimal transport (OT) to frame a chance-constrained optimization problem, aiming to derive a counterfactual distribution that closely aligns with its factual counterpart, substantiated by statistical confidence. Our proposed optimization method, Discount, strategically balances this confidence in both the input and output distributions. This algorithm is accompanied by an analysis of its convergence rate. The efficacy of our proposed method is substantiated through a series of quantitative and qualitative experiments, highlighting its potential to provide deep insights into decision-making models.

Submitted 25 May, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

arXiv:2401.07152

Inference for Synthetic Controls via Refined Placebo Tests

Authors: Lihua Lei, Timothy Sudijono

Abstract: The synthetic control method is often applied to problems with one treated unit and a small number of control units. A common inferential task in this setting is to test null hypotheses regarding the average treatment effect on the treated. Inference procedures that are justified asymptotically are often unsatisfactory due to (1) small sample sizes that render large-sample approximation fragile and (2) simplification of the estimation procedure that is implemented in practice. An alternative is permutation inference, which is related to a common diagnostic called the placebo test. It has provable Type-I error guarantees in finite samples without simplification of the method, when the treatment is uniformly assigned. Despite this robustness, the placebo test suffers from low resolution since the null distribution is constructed from only $N$ reference estimates, where $N$ is the sample size. This creates a barrier for statistical inference at a common level like $α= 0.05$, especially when $N$ is small. We propose a novel leave-two-out procedure that bypasses this issue, while still maintaining the same finite-sample Type-I error guarantee under uniform assignment for a wide range of $N$. Unlike the placebo test whose Type-I error always equals the theoretical upper bound, our procedure often achieves a lower unconditional Type-I error than theory suggests; this enables useful inference in the challenging regime when $α< 1/N$. Empirically, our procedure achieves a higher power when the effect size is reasonably large and a comparable power otherwise. We generalize our procedure to non-uniform assignments and show how to conduct sensitivity analysis. From a methodological perspective, our procedure can be viewed as a new type of randomization inference different from permutation or rank-based inference, which is particularly effective in small samples.

Submitted 12 April, 2024; v1 submitted 13 January, 2024; originally announced January 2024.

Comments: 40 pages. V2: Further literature review plus additional simulation results
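
The "low resolution" of the placebo test described in the abstract above can be seen in a minimal sketch of a standard permutation-style placebo p-value. This is an illustration of the baseline diagnostic only, not the paper's leave-two-out procedure. The function name `placebo_p_value` and the choice of a larger-is-more-extreme test statistic are assumptions made for the example.

```python
import numpy as np

def placebo_p_value(treated_stat, placebo_stats):
    """Share of all N units whose statistic is at least as extreme as the treated unit's.

    `placebo_stats` holds the N - 1 statistics obtained by pretending each control
    unit was treated; the treated unit itself is included in the reference set.
    """
    all_stats = np.append(np.asarray(placebo_stats, dtype=float), treated_stat)
    return float(np.mean(all_stats >= treated_stat))
```

With N units the smallest attainable p-value is 1/N, so with one treated unit and nine controls the test can never reject at level 0.05, no matter how extreme the treated unit's statistic is.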

arXiv:2312.07520

Estimating Counterfactual Matrix Means with Short Panel Data

Authors: Lihua Lei, Brad Ross

Abstract: We develop a new, spectral approach for identifying and estimating average counterfactual outcomes under a low-rank factor model with short panel data and general outcome missingness patterns. Applications include event studies and studies of outcomes of "matches" between agents of two types, e.g. workers and firms, typically conducted under less-flexible Two-Way-Fixed-Effects (TWFE) models of outcomes. Given an infinite population of units and a finite number of outcomes, we show our approach identifies all counterfactual outcome means, including those not estimable by existing methods, if a particular graph constructed based on overlaps in observed outcomes between subpopulations is connected. Our analogous, computationally efficient estimation procedure yields consistent, asymptotically normal estimates of counterfactual outcome means under fixed-$T$ (number of outcomes), large-$N$ (sample size) asymptotics. In a semi-synthetic simulation study based on matched employer-employee data, our estimator has lower bias and only slightly higher variance than a TWFE-model-based estimator when estimating average log-wages.

Submitted 6 May, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

Comments: 72 pages, 6 figures

arXiv:2310.14983

Causal clustering: design of cluster experiments under network interference

Authors: Davide Viviano, Lihua Lei, Guido Imbens, Brian Karrer, Okke Schrijvers, Liang Shi

Abstract: This paper studies the design of cluster experiments to estimate the global treatment effect in the presence of network spillovers. We provide a framework to choose the clustering that minimizes the worst-case mean-squared error of the estimated global effect. We show that optimal clustering solves a novel penalized min-cut optimization problem computed via off-the-shelf semi-definite programming algorithms. Our analysis also characterizes simple conditions to choose between any two cluster designs, including choosing between a cluster or individual-level randomization. We illustrate the method's properties using unique network data from the universe of Facebook's users and existing data from a field experiment.

Submitted 13 January, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

arXiv:2310.08115

Model-Agnostic Covariate-Assisted Inference on Partially Identified Causal Effects

Authors: Wenlong Ji, Lihua Lei, Asher Spector

Abstract: Many causal estimands are only partially identifiable since they depend on the unobservable joint distribution between potential outcomes. Stratification on pretreatment covariates can yield sharper partial identification bounds; however, unless the covariates are discrete with relatively small support, this approach typically requires consistent estimation of the conditional distributions of the potential outcomes given the covariates. Thus, existing approaches may fail under model misspecification or if consistency assumptions are violated. In this study, we propose a unified and model-agnostic inferential approach for a wide class of partially identified estimands, based on duality theory for optimal transport problems. In randomized experiments, our approach can wrap around any estimates of the conditional distributions and provide uniformly valid inference, even if the initial estimates are arbitrarily inaccurate. Also, our approach is doubly robust in observational studies. Notably, this property allows analysts to use the multiplier bootstrap to select covariates and models without sacrificing validity even if the true model is not included. Furthermore, if the conditional distributions are estimated at semiparametric rates, our approach matches the performance of an oracle with perfect knowledge of the outcome model. Finally, we propose an efficient computational framework, enabling implementation on many practical problems in causal inference.

Submitted 12 October, 2023; originally announced October 2023.

Comments: 59 pages, 4 figures

MSC Class: 62G15 (Primary); 62G05 (Secondary)

ACM Class: G.3; I.2.m

arXiv:2309.04002

Total Variation Floodgate for Variable Importance Inference in Classification

Authors: Wenshuo Wang, Lucas Janson, Lihua Lei, Aaditya Ramdas

Abstract: Inferring variable importance is the key problem of many scientific studies, where researchers seek to learn the effect of a feature $X$ on the outcome $Y$ in the presence of confounding variables $Z$. Focusing on classification problems, we define the expected total variation (ETV), which is an intuitive and deterministic measure of variable importance that does not rely on any model context. We then introduce algorithms for statistical inference on the ETV under design-based/model-X assumptions. These algorithms build on the floodgate notion for regression problems (Zhang and Janson 2020). The algorithms we introduce can leverage any user-specified regression function and produce asymptotic lower confidence bounds for the ETV. We show the effectiveness of our algorithms with simulations and a case study in conjoint analysis on the US general election.

Submitted 7 September, 2023; originally announced September 2023.

arXiv:2304.11735

Policy Learning under Biased Sample Selection

Authors: Lihua Lei, Roshni Sahoo, Stefan Wager

Abstract: Practitioners often use data from a randomized controlled trial to learn a treatment assignment policy that can be deployed on a target population. A recurring concern in doing so is that, even if the randomized trial was well-executed (i.e., internal validity holds), the study participants may not represent a random sample of the target population (i.e., external validity fails)--and this may lead to policies that perform suboptimally on the target population. We consider a model where observable attributes can impact sample selection probabilities arbitrarily but the effect of unobservable attributes is bounded by a constant, and we aim to learn policies with the best possible performance guarantees that hold under any sampling bias of this type. In particular, we derive the partial identification result for the worst-case welfare in the presence of sampling bias and show that the optimal max-min, max-min gain, and minimax regret policies depend on both the conditional average treatment effect (CATE) and the conditional value-at-risk (CVaR) of potential outcomes given covariates. To avoid finite-sample inefficiencies of plug-in estimates, we further provide an end-to-end procedure for learning the optimal max-min and max-min gain policies that does not require the separate estimation of nuisance parameters.

Submitted 23 April, 2023; originally announced April 2023.

arXiv:2302.02942

doi 10.1007/s11538-023-01224-6

Empirical quantification of predictive uncertainty due to model discrepancy by training with an ensemble of experimental designs: an application to ion channel kinetics

Authors: Joseph G. Shuttleworth, Chon Lok Lei, Dominic G. Whittaker, Monique J. Windley, Adam P. Hill, Simon P. Preston, Gary R. Mirams

Abstract: When mathematical biology models are used to make quantitative predictions for clinical or industrial use, it is important that these predictions come with a reliable estimate of their accuracy (uncertainty quantification). Because models of complex biological systems are always large simplifications, model discrepancy arises - where a mathematical model fails to recapitulate the true data generating process. This presents a particular challenge for making accurate predictions, and especially for making accurate estimates of uncertainty in these predictions. Experimentalists and modellers must choose which experimental procedures (protocols) are used to produce data to train their models. We propose to characterise uncertainty owing to model discrepancy with an ensemble of parameter sets, each of which results from training to data from a different protocol. The variability in predictions from this ensemble provides an empirical estimate of predictive uncertainty owing to model discrepancy, even for unseen protocols. We use the example of electrophysiology experiments, which are used to investigate the kinetics of the hERG potassium ion channel. Here, 'information-rich' protocols allow mathematical models to be trained using numerous short experiments performed on the same cell. Typically, assuming independent observational errors and training a model to an individual experiment results in parameter estimates with very little dependence on observational noise. Moreover, parameter sets arising from the same model applied to different experiments often conflict - indicative of model discrepancy. Our methods will help select more suitable mathematical models of hERG for future studies, and will be widely applicable to a range of biological modelling problems.

Submitted 19 February, 2024; v1 submitted 6 February, 2023; originally announced February 2023.

Comments: Final published version with a typographical error in Table 1 (the value of q_6) corrected

MSC Class: 92B05; 92C30; 62M05

Journal ref: Bulletin of Mathematical Biology, 86(1), 2 (2024)

arXiv:2210.01592

Autocorrelated measurement processes and inference for ordinary differential equation models of biological systems

Authors: Ben Lambert, Chon Lok Lei, Martin Robinson, Michael Clerx, Richard Creswell, Sanmitra Ghosh, Simon Tavener, David Gavaghan

Abstract: Ordinary differential equation models are used to describe dynamic processes across biology. To perform likelihood-based parameter inference on these models, it is necessary to specify a statistical process representing the contribution of factors not explicitly included in the mathematical model. For this, independent Gaussian noise is commonly chosen, with its use so widespread that researchers typically provide no explicit justification for this choice. This noise model assumes `random' latent factors affect the system in ephemeral fashion resulting in unsystematic deviation of observables from their modelled counterparts. However, like the deterministically modelled parts of a system, these latent factors can have persistent effects on observables. Here, we use experimental data from dynamical systems drawn from cardiac physiology and electrochemistry to demonstrate that highly persistent differences between observations and modelled quantities can occur. Considering the case when persistent noise arises due only to measurement imperfections, we use the Fisher information matrix to quantify how uncertainty in parameter estimates is artificially reduced when erroneously assuming independent noise. We present a workflow to diagnose persistent noise from model fits and describe how to remodel accounting for correlated errors.

Submitted 4 October, 2022; originally announced October 2022.

arXiv:2209.01754

Learning from a Biased Sample

Authors: Roshni Sahoo, Lihua Lei, Stefan Wager

Abstract: The empirical risk minimization approach to data-driven decision making assumes that we can learn a decision rule from training data drawn under the same conditions as the ones we want to deploy it in. However, in a number of settings, we may be concerned that our training sample is biased, and that some groups (characterized by either observable or unobservable attributes) may be under- or over-represented relative to the general population; and in this setting empirical risk minimization over the training set may fail to yield rules that perform well at deployment. We propose a model of sampling bias called $Γ$-biased sampling, where observed covariates can affect the probability of sample selection arbitrarily much but the amount of unexplained variation in the probability of sample selection is bounded by a constant factor. Applying the distributionally robust optimization framework, we propose a method for learning a decision rule that minimizes the worst-case risk incurred under a family of test distributions that can generate the training distribution under $Γ$-biased sampling. We apply a result of Rockafellar and Uryasev to show that this problem is equivalent to an augmented convex risk minimization problem. We give statistical guarantees for learning a model that is robust to sampling bias via the method of sieves, and propose a deep learning algorithm whose loss function captures our robust learning target. We empirically validate our proposed method in simulations and a case study on ICU length of stay prediction.

Submitted 5 January, 2023; v1 submitted 5 September, 2022; originally announced September 2022.

arXiv:2208.09542

Improving knockoffs with conditional calibration

Authors: Yixiang Luo, William Fithian, Lihua Lei

Abstract: The knockoff filter of Barber and Candes (arXiv:1404.5609) is a flexible framework for multiple testing in supervised learning models, based on introducing synthetic predictor variables to control the false discovery rate (FDR). Using the conditional calibration framework of Fithian and Lei (arXiv:2007.10438), we introduce the calibrated knockoff procedure, a method that uniformly improves the power of any knockoff procedure. We implement our method for fixed-X knockoffs and show theoretically and empirically that the improvement is especially notable in two contexts where knockoff methods can be nearly powerless: when the rejection set is small, and when the structure of the design matrix prevents us from constructing good knockoff variables. In these contexts, calibrated knockoffs even outperform competing FDR-controlling methods like the (dependence-adjusted) Benjamini-Hochberg procedure in many scenarios.

Submitted 8 September, 2023; v1 submitted 19 August, 2022; originally announced August 2022.

Comments: 52 pages, 19 figures

MSC Class: 62H15 (Primary); 62J15 (Secondary)

arXiv:2208.06685

Adaptive novelty detection with false discovery rate guarantee

Authors: Ariane Marandon, Lihua Lei, David Mary, Etienne Roquain

Abstract: This paper studies the semi-supervised novelty detection problem where a set of "typical" measurements is available to the researcher. Motivated by recent advances in multiple testing and conformal inference, we propose AdaDetect, a flexible method that is able to wrap around any probabilistic classification algorithm and control the false discovery rate (FDR) on detected novelties in finite samples without any distributional assumption other than exchangeability. In contrast to classical FDR-controlling procedures that are often committed to a pre-specified p-value function, AdaDetect learns the transformation in a data-adaptive manner to focus the power on the directions that distinguish between inliers and outliers. Inspired by the multiple testing literature, we further propose variants of AdaDetect that are adaptive to the proportion of nulls while maintaining the finite-sample FDR control. The methods are illustrated on synthetic datasets and real-world datasets, including an application in astrophysics.

Submitted 25 October, 2023; v1 submitted 13 August, 2022; originally announced August 2022.

arXiv:2208.02814

Conformal Risk Control

Authors: Anastasios N. Angelopoulos, Stephen Bates, Adam Fisch, Lihua Lei, Tal Schuster

Abstract: We extend conformal prediction to control the expected value of any monotone loss function. The algorithm generalizes split conformal prediction together with its coverage guarantee. Like conformal prediction, the conformal risk control procedure is tight up to an $\mathcal{O}(1/n)$ factor. We also introduce extensions of the idea to distribution shift, quantile risk control, multiple and adversarial risk control, and expectations of U-statistics. Worked examples from computer vision and natural language processing demonstrate the usage of our algorithm to bound the false negative rate, graph distance, and token-level F1-score.

Submitted 29 April, 2023; v1 submitted 4 August, 2022; originally announced August 2022.

Comments: Code available at https://github.com/aangelopoulos/conformal-risk
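
The calibration step behind the abstract above can be sketched on a grid of thresholds. This is a minimal sketch, not the authors' released code: it assumes the commonly stated rule of choosing the smallest threshold $\lambda$ whose inflated empirical risk $(n \hat{R}_n(\lambda) + B)/(n+1)$ is at most $\alpha$, where losses are non-increasing in $\lambda$ and bounded above by $B$. The function name, argument names, and the grid search are hypothetical choices for illustration.

```python
import numpy as np

def conformal_risk_control(loss_matrix, lambdas, alpha, loss_bound=1.0):
    """Pick the smallest grid threshold whose inflated empirical risk is <= alpha.

    loss_matrix[i, j] is the loss of calibration example i at lambdas[j].
    Assumes lambdas is sorted ascending and each row is non-increasing in j,
    with all losses bounded above by loss_bound.
    Returns None when no threshold on the grid satisfies the bound.
    """
    loss_matrix = np.asarray(loss_matrix, dtype=float)
    n = loss_matrix.shape[0]
    # (sum of losses + B) / (n + 1) equals n/(n+1) * mean loss + B/(n+1).
    inflated_risk = (loss_matrix.sum(axis=0) + loss_bound) / (n + 1)
    feasible = np.nonzero(inflated_risk <= alpha)[0]
    if feasible.size == 0:
        return None
    return lambdas[feasible[0]]
```

With the miscoverage loss (1 if a calibration score exceeds the threshold, 0 otherwise), this rule reduces to taking a split-conformal quantile of the calibration scores, which matches the abstract's statement that the method generalizes split conformal prediction.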

arXiv:2110.01052

Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control

Authors: Anastasios N. Angelopoulos, Stephen Bates, Emmanuel J. Candès, Michael I. Jordan, Lihua Lei

Abstract: We introduce a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees. Our calibration algorithms work with any underlying model and (unknown) data-generating distribution and do not require model refitting. The framework addresses, among other examples, false discovery rate control in multi-label classification, intersection-over-union control in instance segmentation, and the simultaneous control of the type-1 error of outlier detection and confidence set coverage in classification or regression. Our main insight is to reframe the risk-control problem as multiple hypothesis testing, enabling techniques and mathematical arguments different from those in the previous literature. We use the framework to provide new calibration methods for several core machine learning tasks, with detailed worked examples in computer vision and tabular medical data.

Submitted 29 September, 2022; v1 submitted 3 October, 2021; originally announced October 2021.

Comments: Code available at https://github.com/aangelopoulos/ltt

arXiv:2107.13737

Design-Robust Two-Way-Fixed-Effects Regression For Panel Data

Authors: Dmitry Arkhangelsky, Guido W. Imbens, Lihua Lei, Xiaoman Luo

Abstract: We propose a new estimator for average causal effects of a binary treatment with panel data in settings with general treatment patterns. Our approach augments the popular two-way-fixed-effects specification with unit-specific weights that arise from a model for the assignment mechanism. We show how to construct these weights in various settings, including the staggered adoption setting, where units opt into the treatment sequentially but permanently. The resulting estimator converges to an average (over units and time) treatment effect under the correct specification of the assignment model, even if the fixed effect model is misspecified. We show that our estimator is more robust than the conventional two-way estimator: it remains consistent if either the assignment mechanism or the two-way regression model is correctly specified. In addition, the proposed estimator performs better than the two-way-fixed-effect estimator if the outcome model and assignment mechanism are locally misspecified. This strong double robustness property underlines and quantifies the benefits of modeling the assignment process and motivates using our estimator in practice. We also discuss an extension of our estimator to handle dynamic treatment effects.

Submitted 4 March, 2024; v1 submitted 29 July, 2021; originally announced July 2021.

Comments: 131 pages; R package available at https://github.com/lihualei71/ripw; replication files available at https://github.com/xiaomanluo/ripwPaper

arXiv:2106.15743 [pdf, other]

BONuS: Multiple multivariate testing with a data-adaptive test statistic

Authors: Chiao-Yu Yang, Lihua Lei, Nhat Ho, Will Fithian

Abstract: We propose a new adaptive empirical Bayes framework, the Bag-Of-Null-Statistics (BONuS) procedure, for multiple testing where each hypothesis testing problem is itself multivariate or nonparametric. BONuS is an adaptive and interactive knockoff-type method that helps improve the testing power while controlling the false discovery rate (FDR), and is closely connected to the "counting knockoffs" procedure analyzed in Weinstein et al. (2017). Contrary to procedures that start with a $p$-value for each hypothesis, our method analyzes the entire data set to adaptively estimate an optimal $p$-value transform based on an empirical Bayes model. Despite the extra adaptivity, our method controls FDR in finite samples even if the empirical Bayes model is incorrect or the estimation is poor. An extension, the Double BONuS procedure, validates the empirical Bayes model to guard against power loss due to model misspecification.

Submitted 1 July, 2021; v1 submitted 29 June, 2021; originally announced June 2021.

arXiv:2104.08279 [pdf, other]

doi 10.1214/22-AOS2244

Testing for Outliers with Conformal p-values

Authors: Stephen Bates, Emmanuel Candès, Lihua Lei, Yaniv Romano, Matteo Sesia

Abstract: This paper studies the construction of p-values for nonparametric outlier detection, taking a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are outliers. We propose a solution based on conformal inference, a broadly applicable framework which yields p-values that are marginally valid but mutually dependent for different test points. We prove these p-values are positively dependent and enable exact false discovery rate control, although in a relatively weak marginal sense. We then introduce a new method to compute p-values that are both valid conditionally on the training data and independent of each other for different test points; this paves the way to stronger type-I error guarantees. Our results depart from classical conformal inference as we leverage concentration inequalities rather than combinatorial arguments to establish our finite-sample guarantees. Furthermore, our techniques also yield a uniform confidence bound for the false positive rate of any outlier detection algorithm, as a function of the threshold applied to its raw statistics. Finally, the relevance of our results is demonstrated by numerical experiments on real and simulated data.

Submitted 24 May, 2022; v1 submitted 16 April, 2021; originally announced April 2021.

Comments: Revision May 24, 2022: added "asymptotic" and "Monte Carlo" conditional calibration methods; added power analyses; updated numerical experiments to include new methods

Journal ref: Ann. Statist. 51(1): 149-178 (February 2023)
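
Illustration: the abstract above refers to marginally valid conformal p-values for outlier detection. The following is a minimal sketch of the standard split-conformal construction only, not the paper's conditionally calibrated method. It assumes that a larger score means "more outlying" and that scores for a held-out reference (calibration) set are already available; the function name and the toy numbers are hypothetical.

```python
def conformal_pvalues(calibration_scores, test_scores):
    """Marginal conformal p-values, assuming larger score = more outlying."""
    n = len(calibration_scores)
    pvals = []
    for s in test_scores:
        # Count calibration scores at least as extreme as the test score.
        count = sum(1 for c in calibration_scores if c >= s)
        # The +1 terms treat the test point as one more exchangeable draw.
        pvals.append((1 + count) / (n + 1))
    return pvals


# Hypothetical toy scores: nine reference points and two test points.
calibration = [0.1, 0.4, 0.2, 0.3, 0.5, 0.25, 0.35, 0.15, 0.45]
test_points = [0.05, 0.9]
pvals = conformal_pvalues(calibration, test_points)
print(pvals)
```

Because every test point is compared against the same calibration set, the resulting p-values are mutually dependent, which is the dependence the abstract discusses.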

arXiv:2103.09763 [pdf, other]

Conformalized Survival Analysis

Authors: Emmanuel J. Candès, Lihua Lei, Zhimei Ren

Abstract: Existing survival analysis techniques heavily rely on strong modelling assumptions and are, therefore, prone to model misspecification errors. In this paper, we develop an inferential method based on ideas from conformal prediction, which can wrap around any survival prediction algorithm to produce calibrated, covariate-dependent lower predictive bounds on survival times. In the Type I right-censoring setting, when the censoring times are completely exogenous, the lower predictive bounds have guaranteed coverage in finite samples without any assumptions other than that of operating on independent and identically distributed data points. Under a more general conditionally independent censoring assumption, the bounds satisfy a doubly robust property which states the following: marginal coverage is approximately guaranteed if either the censoring mechanism or the conditional survival function is estimated well. Further, we demonstrate that the lower predictive bounds remain valid and informative for other types of censoring. The validity and efficiency of our procedure are demonstrated on synthetic data and real COVID-19 data from the UK Biobank.

Submitted 23 April, 2023; v1 submitted 17 March, 2021; originally announced March 2021.

arXiv:2101.02703 [pdf, other]

Distribution-Free, Risk-Controlling Prediction Sets

Authors: Stephen Bates, Anastasios Angelopoulos, Lihua Lei, Jitendra Malik, Michael I. Jordan

Abstract: While improving prediction accuracy has been the focus of machine learning in recent years, this alone does not suffice for reliable decision-making. Deploying learning systems in consequential settings also requires calibrating and communicating the uncertainty of predictions. To convey instance-wise uncertainty for prediction tasks, we show how to generate set-valued predictions from a black-box predictor that control the expected loss on future test points at a user-specified level. Our approach provides explicit finite-sample guarantees for any dataset by using a holdout set to calibrate the size of the prediction sets. This framework enables simple, distribution-free, rigorous error control for many tasks, and we demonstrate it in five large-scale machine learning problems: (1) classification problems where some mistakes are more costly than others; (2) multi-label classification, where each observation has multiple associated labels; (3) classification problems where the labels have a hierarchical structure; (4) image segmentation, where we wish to predict a set of pixels containing an object of interest; and (5) protein structure prediction. Lastly, we discuss extensions to uncertainty quantification for ranking, metric learning and distributionally robust learning.

Submitted 4 August, 2021; v1 submitted 7 January, 2021; originally announced January 2021.

Comments: Project website available at http://www.angelopoulos.ai/blog/posts/rcps/ and codebase available at https://github.com/aangelopoulos/rcps

arXiv:2011.04854 [pdf, other]

Using flexible noise models to avoid noise model misspecification in inference of differential equation time series models

Authors: Richard Creswell, Ben Lambert, Chon Lok Lei, Martin Robinson, David Gavaghan

Abstract: When modelling time series, it is common to decompose observed variation into a "signal" process, the process of interest, and "noise", representing nuisance factors that obfuscate the signal. To separate signal from noise, assumptions must be made about both parts of the system. If the signal process is incorrectly specified, our predictions using this model may generalise poorly; similarly, if the noise process is incorrectly specified, we can attribute too much or too little observed variation to the signal. With little justification, independent Gaussian noise is typically chosen, which defines a statistical model that is simple to implement but often misstates system uncertainty and may underestimate error autocorrelation. There are a range of alternative noise processes available but, in practice, none of these may be entirely appropriate, as actual noise may be better characterised as a time-varying mixture of these various types. Here, we consider systems where the signal is modelled with ordinary differential equations and present classes of flexible noise processes that adapt to a system's characteristics. Our noise models include a multivariate normal kernel where Gaussian processes allow for non-stationary persistence and variance, and nonparametric Bayesian models that partition time series into distinct blocks of separate noise structures. Across the scenarios we consider, these noise processes faithfully reproduce true system uncertainty: that is, parameter estimate uncertainty when doing inference using the correct noise model. The models themselves and the methods for fitting them are scalable to large datasets and could help to ensure more appropriate quantification of uncertainty in a host of time series models.

Submitted 9 November, 2020; originally announced November 2020.

arXiv:2007.10438 [pdf, other]

Conditional calibration for false discovery rate control under dependence

Authors: William Fithian, Lihua Lei

Abstract: We introduce a new class of methods for finite-sample false discovery rate (FDR) control in multiple testing problems with dependent test statistics where the dependence is fully or partially known. Our approach separately calibrates a data-dependent p-value rejection threshold for each hypothesis, relaxing or tightening the threshold as appropriate to target exact FDR control. In addition to our general framework we propose a concrete algorithm, the dependence-adjusted Benjamini-Hochberg (dBH) procedure, which adaptively thresholds the q-value for each hypothesis. Under positive regression dependence the dBH procedure uniformly dominates the standard BH procedure, and in general it uniformly dominates the Benjamini-Yekutieli (BY) procedure (also known as BH with log correction). Simulations and real data examples illustrate power gains over competing approaches to FDR control under dependence.

Submitted 20 July, 2020; originally announced July 2020.

Comments: 26 pages main text, 17 pages appendix
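
Illustration: the abstract above compares dBH against the standard BH procedure and the BY procedure ("BH with log correction"). The sketch below implements only those two textbook baselines, not dBH itself, to make the comparison concrete. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvals, alpha, by_correction=False):
    """Return sorted indices rejected by BH, or by BY if by_correction=True."""
    m = len(pvals)
    if by_correction:
        # BY divides alpha by the harmonic sum 1 + 1/2 + ... + 1/m.
        alpha = alpha / sum(1.0 / k for k in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value clears its threshold.
        if pvals[i] <= alpha * rank / m:
            k_max = rank
    return sorted(order[:k_max])


# Hypothetical p-values for eight hypotheses.
example_pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
bh_rejections = benjamini_hochberg(example_pvals, alpha=0.05)
by_rejections = benjamini_hochberg(example_pvals, alpha=0.05, by_correction=True)
print(bh_rejections, by_rejections)
```

On this toy input BY rejects fewer hypotheses than BH, reflecting the conservativeness of the log correction that the abstract's dominance claims are measured against.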

arXiv:2006.06138 [pdf, other]

Conformal Inference of Counterfactuals and Individual Treatment Effects

Authors: Lihua Lei, Emmanuel J. Candès

Abstract: Evaluating treatment effect heterogeneity widely informs treatment decision making. At the moment, much emphasis is placed on the estimation of the conditional average treatment effect via flexible machine learning algorithms. While these methods enjoy some theoretical appeal in terms of consistency and convergence rates, they generally perform poorly in terms of uncertainty quantification. This is troubling since assessing risk is crucial for reliable decision-making in sensitive and uncertain environments. In this work, we propose a conformal inference-based approach that can produce reliable interval estimates for counterfactuals and individual treatment effects under the potential outcome framework. For completely randomized or stratified randomized experiments with perfect compliance, the intervals have guaranteed average coverage in finite samples regardless of the unknown data generating mechanism. For randomized experiments with ignorable compliance and general observational studies obeying the strong ignorability assumption, the intervals satisfy a doubly robust property which states the following: the average coverage is approximately controlled if either the propensity score or the conditional quantiles of potential outcomes can be estimated accurately. Numerical studies on both synthetic and real datasets empirically demonstrate that existing methods suffer from a significant coverage deficit even in simple models. In contrast, our methods achieve the desired coverage with reasonably short intervals.

Submitted 5 May, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

Comments: Accepted by Journal of the Royal Statistical Society: Series B (JRSSB); 38 pages

arXiv:2005.11300 [pdf, other]

Model Evidence with Fast Tree Based Quadrature

Authors: Thomas Foster, Chon Lok Lei, Martin Robinson, David Gavaghan, Ben Lambert

Abstract: High dimensional integration is essential to many areas of science, ranging from particle physics to Bayesian inference. Approximating these integrals is hard, due in part to the difficulty of locating and sampling from regions of the integration domain that make significant contributions to the overall integral. Here, we present a new algorithm called Tree Quadrature (TQ) that separates this sampling problem from the problem of using those samples to produce an approximation of the integral. TQ places no qualifications on how the samples provided to it are obtained, allowing it to use state-of-the-art sampling algorithms that are largely ignored by existing integration algorithms. Given a set of samples, TQ constructs a surrogate model of the integrand in the form of a regression tree, with a structure optimised to maximise integral precision. The tree divides the integration domain into smaller containers, which are individually integrated and aggregated to estimate the overall integral. Any method can be used to integrate each individual container, so existing integration methods, like Bayesian Monte Carlo, can be combined with TQ to boost their performance. On a set of benchmark problems, we show that TQ provides accurate approximations to integrals in up to 15 dimensions; and in dimensions 4 and above, it outperforms simple Monte Carlo and the popular Vegas method.

Submitted 22 May, 2020; originally announced May 2020.

arXiv:2005.08457 [pdf, other]

Simultaneous Differential Network Analysis and Classification for High-dimensional Matrix-variate Data, with application to Brain Connectivity Alteration Detection and fMRI-guided Medical Diagnoses of Alzheimer's Disease

Authors: Chen Hao, Guo Ying, He Yong, Ji Jiadong, Liu Lei, Shi Yufeng, Wang Yikai, Yu Long, Zhang Xinsheng

Abstract: Alzheimer's disease (AD) is the most common form of dementia, which causes problems with memory, thinking and behavior. Growing evidence has shown that the brain connectivity network experiences alterations for such a complex disease. Network comparison, also known as differential network analysis, is thus particularly powerful to reveal the disease pathologies and identify clinical biomarkers for medical diagnoses (classification). Data from neurophysiological measurements are multi-dimensional and in matrix form, which poses major challenges in brain connectivity analysis and medical diagnoses. The naive vectorization method is not sufficient as it ignores the structural information within the matrix. In the article, we adopt the Kronecker product covariance matrix framework to capture both spatial and temporal correlations of the matrix-variate data while the temporal covariance matrix is treated as a nuisance parameter. By recognizing that the strengths of network connections may vary across subjects, we develop an ensemble-learning procedure, which identifies the differential interaction patterns of brain regions between the AD group and the control group and conducts medical diagnosis (classification) of AD simultaneously. We applied the proposed procedure to functional connectivity analysis of an fMRI dataset related to Alzheimer's disease. The hub nodes and differential interaction patterns identified are consistent with existing experimental studies, and satisfactory out-of-sample classification performance is achieved for medical diagnosis of Alzheimer's disease. An R package "SDNCMV" for implementation is available at https://github.com/heyongstat/SDNCMV.

Submitted 27 May, 2020; v1 submitted 18 May, 2020; originally announced May 2020.

arXiv:2004.14531 [pdf, other]

Consistency of Spectral Clustering on Hierarchical Stochastic Block Models

Authors: Lihua Lei, Xiaodong Li, Xingmei Lou

Abstract: We study the hierarchy of communities in real-world networks under a generic stochastic block model, in which the connection probabilities are structured in a binary tree. Under such a model, a standard recursive bi-partitioning algorithm divides the network into two communities based on the Fiedler vector of the unnormalized graph Laplacian and repeats the split until a stopping rule indicates no further community structures. We prove the strong consistency of this method under a wide range of model parameters, which include sparse networks with node degrees as small as $O(\log n)$. In addition, unlike most existing work, our theory covers multiscale networks where the connection probabilities may differ by orders of magnitude, which comprise an important class of models that are practically relevant but technically challenging to deal with. Finally we demonstrate the performance of our algorithm on synthetic data and real-world examples.

Submitted 18 November, 2021; v1 submitted 29 April, 2020; originally announced April 2020.

Comments: 45 pages, 7 figures

arXiv:2002.05359 [pdf, other]

Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization

Authors: Samuel Horváth, Lihua Lei, Peter Richtárik, Michael I. Jordan

Abstract: Adaptivity is an important yet under-studied property in modern optimization theory. The gap between the state-of-the-art theory and the current practice is striking in that algorithms with desirable theoretical guarantees typically involve drastically different settings of hyperparameters, such as step-size schemes and batch sizes, in different regimes. Despite the appealing theoretical results, such divisive strategies provide little, if any, insight to practitioners to select algorithms that work broadly without tweaking the hyperparameters. In this work, blending the "geometrization" technique introduced by Lei & Jordan 2016 and the \texttt{SARAH} algorithm of Nguyen et al., 2017, we propose the Geometrized \texttt{SARAH} algorithm for non-convex finite-sum and stochastic optimization. Our algorithm is proved to achieve adaptivity to both the magnitude of the target accuracy and the Polyak-Łojasiewicz (PL) constant if present. In addition, it achieves the best-available convergence rate for non-PL objectives simultaneously while outperforming existing algorithms for PL objectives.

Submitted 13 February, 2020; originally announced February 2020.

Comments: 11 pages, 4 Figures, 20 pages Appendix

arXiv:2002.02581 [pdf, other]

doi 10.1109/JIOT.2020.3042007

Dynamic Energy Dispatch Based on Deep Reinforcement Learning in IoT-Driven Smart Isolated Microgrids

Authors: Lei Lei, Yue Tan, Glenn Dahlenburg, Wei Xiang, Kan Zheng

Abstract: Microgrids (MGs) are small, local power grids that can operate independently from the larger utility grid. Combined with the Internet of Things (IoT), a smart MG can leverage the sensory data and machine learning techniques for intelligent energy management. This paper focuses on deep reinforcement learning (DRL)-based energy dispatch for IoT-driven smart isolated MGs with diesel generators (DGs), photovoltaic (PV) panels, and a battery. A finite-horizon Partially Observable Markov Decision Process (POMDP) model is formulated and solved by learning from historical data to capture the uncertainty in future electricity consumption and renewable power generation. In order to deal with the instability problem of DRL algorithms and unique characteristics of finite-horizon models, two novel DRL algorithms, namely, finite-horizon deep deterministic policy gradient (FH-DDPG) and finite-horizon recurrent deterministic policy gradient (FH-RDPG), are proposed to derive energy dispatch policies with and without fully observable state information. A case study using real isolated MG data is performed, where the performance of the proposed algorithms is compared with the other baseline DRL and non-DRL algorithms. Moreover, the impact of uncertainties on MG performance is decoupled into two levels and evaluated respectively.

Submitted 16 November, 2020; v1 submitted 6 February, 2020; originally announced February 2020.

Journal ref: IEEE Internet of Things Journal, vol. 8, no. 10, pp. 7938-7953, May 15, 2021

arXiv:2001.09623 [pdf, other]

Variance Reduction with Sparse Gradients

Authors: Melih Elibol, Lihua Lei, Michael I. Jordan

Abstract: Variance reduction methods such as SVRG and SpiderBoost use a mixture of large and small batch gradients to reduce the variance of stochastic gradients. Compared to SGD, these methods require at least double the number of operations per update to model parameters. To reduce the computational cost of these methods, we introduce a new sparsity operator: The random-top-k operator. Our operator reduces computational complexity by estimating gradient sparsity exhibited in a variety of applications by combining the top-k operator and the randomized coordinate descent operator. With this operator, large batch gradients offer an extra benefit beyond variance reduction: A reliable estimate of gradient sparsity. Theoretically, our algorithm is at least as good as the best algorithm (SpiderBoost), and further excels in performance whenever the random-top-k operator captures gradient sparsity. Empirically, our algorithm consistently outperforms SpiderBoost using various models on various tasks including image classification, natural language processing, and sparse matrix factorization. We also provide empirical evidence to support the intuition behind our algorithm via a simple gradient entropy computation, which serves to quantify gradient sparsity at every iteration.

Submitted 27 January, 2020; originally announced January 2020.

Comments: ICLR 2020

arXiv:2001.04230 [pdf, other]

doi 10.1098/rsta.2019.0349

Considering discrepancy when calibrating a mechanistic electrophysiology model

Authors: Chon Lok Lei, Sanmitra Ghosh, Dominic G. Whittaker, Yasser Aboelkassem, Kylie A. Beattie, Chris D. Cantwell, Tammo Delhaas, Charles Houston, Gustavo Montes Novaes, Alexander V. Panfilov, Pras Pathmanathan, Marina Riabiz, Rodrigo Weber dos Santos, John Walmsley, Keith Worden, Gary R. Mirams, Richard D. Wilkinson

Abstract: Uncertainty quantification (UQ) is a vital step in using mathematical models and simulations to take decisions. The field of cardiac simulation has begun to explore and adopt UQ methods to characterise uncertainty in model inputs and how that propagates through to outputs or predictions. In this perspective piece we draw attention to an important and under-addressed source of uncertainty in our predictions -- that of uncertainty in the model structure or the equations themselves. The difference between imperfect models and reality is termed model discrepancy, and we are often uncertain as to the size and consequences of this discrepancy. Here we provide two examples of the consequences of discrepancy when calibrating models at the ion channel and action potential scales. Furthermore, we attempt to account for this discrepancy when calibrating and validating an ion channel model using different methods, based on modelling the discrepancy using Gaussian processes (GPs) and autoregressive-moving-average (ARMA) models, then highlight the advantages and shortcomings of each approach. Finally, suggestions and lines of enquiry for future work are provided.

Submitted 23 April, 2020; v1 submitted 13 January, 2020; originally announced January 2020.

Comments: This version is published in Philosophical Transactions of the Royal Society A; Updated in response to reviewer comments, including: added details to the introduction, fixed mathematical notations for clarity, and moved the original Table 3 to the supplement to avoid confusion

Journal ref: Phil. Trans. R. Soc. A. 378 (2020): 20190349

arXiv:1911.09200 [pdf, other]

Smoothed Nested Testing on Directed Acyclic Graphs

Authors: Jackson H. Loper, Lihua Lei, William Fithian, Wesley Tansey

Abstract: We consider the problem of multiple hypothesis testing when there is a logical nested structure to the hypotheses. When one hypothesis is nested inside another, the outer hypothesis must be false if the inner hypothesis is false. We model the nested structure as a directed acyclic graph, including chain and tree graphs as special cases. Each node in the graph is a hypothesis and rejecting a node requires also rejecting all of its ancestors. We propose a general framework for adjusting node-level test statistics using the known logical constraints. Within this framework, we study a smoothing procedure that combines each node with all of its descendants to form a more powerful statistic. We prove a broad class of smoothing strategies can be used with existing selection procedures to control the familywise error rate, false discovery exceedance rate, or false discovery rate, so long as the original test statistics are independent under the null. When the null statistics are not independent but are derived from positively-correlated normal observations, we prove control for all three error rates when the smoothing method is arithmetic averaging of the observations. Simulations and an application to a real biology dataset demonstrate that smoothing leads to substantial power gains.

Submitted 15 March, 2021; v1 submitted 20 November, 2019; originally announced November 2019.

Comments: Revised with genetic interaction maps application and new theory of PRDS

arXiv:1909.06851 [pdf, other]

Biased Estimates of Advantages over Path Ensembles

Authors: Lanxin Lei, Zhizhong Li, Dahua Lin

Abstract: The estimation of advantage is crucial for a number of reinforcement learning algorithms, as it directly influences the choices of future paths. In this work, we propose a family of estimates based on the order statistics over the path ensemble, which allows one to flexibly drive the learning process, towards or against risks. On top of this formulation, we systematically study the impacts of different methods for estimating advantages. Our findings reveal that biased estimates, when chosen appropriately, can result in significant benefits. In particular, for the environments with sparse rewards, optimistic estimates would lead to more efficient exploration of the policy space; while for those where individual actions can have critical impacts, conservative estimates are preferable. On various benchmarks, including MuJoCo continuous control, Terrain locomotion, Atari games, and sparse-reward environments, the proposed biased estimation schemes consistently demonstrate improvement over mainstream methods, not only accelerating the learning process but also obtaining substantial performance gains.

Submitted 15 September, 2019; originally announced September 2019.

arXiv:1907.09059 [pdf, other]

Deep Reinforcement Learning for Autonomous Internet of Things: Model, Applications and Challenges

Authors: Lei Lei, Yue Tan, Kan Zheng, Shiwen Liu, Kuan Zhang, Xuemin Shen

Abstract: The Internet of Things (IoT) extends Internet connectivity to billions of IoT devices around the world, where the IoT devices collect and share information to reflect the status of the physical world. The Autonomous Control System (ACS), on the other hand, performs control functions on the physical systems without external intervention over an extended period of time. The integration of IoT and ACS results in a new concept - autonomous IoT (AIoT). The sensors collect information on the system status, based on which the intelligent agents in the IoT devices as well as the Edge/Fog/Cloud servers make control decisions for the actuators to react. In order to achieve autonomy, a promising method is for the intelligent agents to leverage the techniques in the field of artificial intelligence, especially reinforcement learning (RL) and deep reinforcement learning (DRL) for decision making. In this paper, we first provide a tutorial of DRL, and then propose a general model for the applications of RL/DRL in AIoT. Next, a comprehensive survey of the state-of-the-art research on DRL for AIoT is presented, where the existing works are classified and summarized under the umbrella of the proposed general DRL model. Finally, the challenges and open issues for future research are identified.

Submitted 13 April, 2020; v1 submitted 21 July, 2019; originally announced July 2019.

arXiv:1907.06133 [pdf, other]

doi 10.1093/biomet/asaa079

An Assumption-Free Exact Test For Fixed-Design Linear Models With Exchangeable Errors

Authors: Lihua Lei, Peter J. Bickel

Abstract: We propose the Cyclic Permutation Test (CPT) to test general linear hypotheses for linear models. This test is non-randomized and valid in finite samples with exact Type I error $α$ for an arbitrary fixed design matrix and arbitrary exchangeable errors, whenever $1 / α$ is an integer and $n / p \ge 1 / α - 1$. The test involves applying the marginal rank test to $1 / α$ linear statistics of the outcome vector, where the coefficient vectors are determined by solving a linear system such that the joint distribution of the linear statistics is invariant with respect to a non-standard cyclic permutation group under the null hypothesis. The power can be further enhanced by solving a secondary non-linear travelling salesman problem, for which the genetic algorithm can find a reasonably good solution. Extensive simulation studies show that the CPT has comparable power to existing tests. When testing for a single contrast of coefficients, an exact confidence interval can be obtained by inverting the test. Furthermore, we provide a selective yet extensive literature review of the century-long efforts on this problem, highlighting the novelty of our test.

Submitted 31 December, 2020; v1 submitted 13 July, 2019; originally announced July 2019.

Comments: Accepted by Biometrika; 46 pages

arXiv:1906.07860 [pdf, ps, other]

Multi-user Resource Control with Deep Reinforcement Learning in IoT Edge Computing

Authors: Lei Lei, Huijuan Xu, Xiong Xiong, Kan Zheng, Wei Xiang, Xianbin Wang

Abstract: By leveraging the concept of mobile edge computing (MEC), massive amounts of data generated by a large number of Internet of Things (IoT) devices could be offloaded to an MEC server at the edge of the wireless network for further computationally intensive processing. However, due to the resource constraints of IoT devices and the wireless network, both the communication and computation resources need to be allocated and scheduled efficiently for better system performance. In this paper, we propose a joint computation offloading and multi-user scheduling algorithm for IoT edge computing systems to minimize the long-term average weighted sum of delay and power consumption under stochastic traffic arrival. We formulate the dynamic optimization problem as an infinite-horizon average-reward continuous-time Markov decision process (CTMDP) model. One critical challenge in solving this MDP problem for multi-user resource control is the curse of dimensionality, where the state space of the MDP model and the computational complexity increase exponentially with the growing number of users or IoT devices. In order to overcome this challenge, we use deep reinforcement learning (RL) techniques and propose a neural network architecture to approximate the value functions for the post-decision system states. The designed algorithm to solve the CTMDP problem supports a semi-distributed auction-based implementation, where the IoT devices submit bids to the base station (BS), which makes the resource control decisions centrally. Simulation results show that the proposed algorithm provides significant performance improvement over the baseline algorithms, and also outperforms the RL algorithms based on other neural network architectures.

Submitted 18 June, 2019; originally announced June 2019.

arXiv:1904.04480 [pdf, other]

doi 10.1137/19M1256919

On the Adaptivity of Stochastic Gradient-Based Optimization

Authors: Lihua Lei, Michael I. Jordan

Abstract: Stochastic-gradient-based optimization has been a core enabling methodology in applications to large-scale problems in machine learning and related areas. Despite the progress, the gap between theory and practice remains significant, with theoreticians pursuing mathematical optimality at a cost of obtaining specialized procedures in different regimes (e.g., modulus of strong convexity, magnitude of target accuracy, signal-to-noise ratio), and with practitioners not readily able to know which regime is appropriate to their problem, and seeking broadly applicable algorithms that are reasonably close to optimality. To bridge these perspectives it is necessary to study algorithms that are adaptive to different regimes. We present the stochastically controlled stochastic gradient (SCSG) method for composite convex finite-sum optimization problems and show that SCSG is adaptive to both strong convexity and target accuracy. The adaptivity is achieved by batch variance reduction with adaptive batch sizes and a novel technique, which we refer to as geometrization, which sets the length of each epoch as a geometric random variable. The algorithm achieves strictly better theoretical complexity than other existing adaptive algorithms, while the tuning parameters of the algorithm only depend on the smoothness parameter of the objective.

Submitted 31 December, 2020; v1 submitted 9 April, 2019; originally announced April 2019.

Comments: Accepted by SIAM Journal on Optimization; 54 pages

arXiv:1902.04326 [pdf, other]

An In-Vehicle KWS System with Multi-Source Fusion for Vehicle Applications

Authors: Yue Tan, Kan Zheng, Lei Lei

Abstract: In order to maximize the detection precision rate as well as the recall rate, this paper proposes an in-vehicle multi-source fusion scheme in a Keyword Spotting (KWS) System for vehicle applications. Vehicle information, as a new source for the original system, is collected by an in-vehicle data acquisition platform while the user is driving. A Deep Neural Network (DNN) is trained to extract acoustic features and make a speech classification. Based on the posterior probabilities obtained from the DNN, the vehicle information, including the speed and direction of the vehicle, is applied to choose the suitable parameter from a pair of sensitivity values for the KWS system. The experimental results show that the KWS system with the proposed multi-source fusion scheme can achieve better performance in terms of precision rate, recall rate, and mean square error compared to the system without it.

Submitted 16 February, 2019; v1 submitted 12 February, 2019; originally announced February 2019.

arXiv:1812.09028 [pdf, other]

NADPEx: An on-policy temporally consistent exploration method for deep reinforcement learning

Authors: Sirui Xie, Junning Huang, Lanxin Lei, Chunxiao Liu, Zheng Ma, Wei Zhang, Liang Lin

Abstract: Reinforcement learning agents need exploratory behaviors to escape from local optima. These behaviors may include both immediate dithering perturbation and temporally consistent exploration. To achieve these, a stochastic policy model that is inherently consistent through a period of time is desirable, especially for tasks with either sparse rewards or long term information. In this work, we introduce a novel on-policy temporally consistent exploration strategy - Neural Adaptive Dropout Policy Exploration (NADPEx) - for deep reinforcement learning agents. Modeled as a global random variable for conditional distribution, dropout is incorporated into reinforcement learning policies, equipping them with inherent temporal consistency, even when the reward signals are sparse. Two factors, gradients' alignment with the objective and KL constraint in policy space, are discussed to guarantee NADPEx policy's stable improvement. Our experiments demonstrate that NADPEx solves tasks with sparse reward while naive exploration and parameter noise fail. It yields as well or even faster convergence in the standard MuJoCo benchmark for continuous control.

Submitted 24 December, 2018; v1 submitted 21 December, 2018; originally announced December 2018.

Comments: To appear in ICLR 2019

arXiv:1810.01509 [pdf, other]

Hierarchical community detection by recursive partitioning

Authors: Tianxi Li, Lihua Lei, Sharmodeep Bhattacharyya, Koen Van den Berge, Purnamrita Sarkar, Peter J. Bickel, Elizaveta Levina

Abstract: The problem of community detection in networks is usually formulated as finding a single partition of the network into some "correct" number of communities. We argue that it is more interpretable and in some regimes more accurate to construct a hierarchical tree of communities instead. This can be done with a simple top-down recursive partitioning algorithm, starting with a single community and separating the nodes into two communities by spectral clustering repeatedly, until a stopping rule suggests there are no further communities. This class of algorithms is model-free, computationally efficient, and requires no tuning other than selecting a stopping rule. We show that there are regimes where this approach outperforms K-way spectral clustering, and propose a natural framework for analyzing the algorithm's theoretical performance, the binary tree stochastic block model. Under this model, we prove that the algorithm correctly recovers the entire community tree under relatively mild assumptions. We apply the algorithm to a gene network based on gene co-occurrence in 1580 research papers on anemia, and identify six clusters of genes in a meaningful hierarchy. We also illustrate the algorithm on a dataset of statistics papers.

Submitted 14 May, 2020; v1 submitted 2 October, 2018; originally announced October 2018.
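
Illustration (not part of the listing): the top-down recursion described in the abstract above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the split uses the sign of the eigenvector for the second-largest-magnitude eigenvalue of the adjacency matrix, which is one simple form of spectral bipartitioning, and the stopping rule (compare that eigenvalue with the square root of the mean degree) is a hypothetical placeholder. The names `second_eigenpair`, `recursive_partition`, `min_size`, and `tol` are invented for this example.

```python
import numpy as np

def second_eigenpair(A):
    """Return the eigenvalue of second-largest magnitude and its eigenvector."""
    vals, vecs = np.linalg.eigh(A)          # A is symmetric, so eigh applies
    order = np.argsort(-np.abs(vals))       # sort by decreasing magnitude
    return vals[order[1]], vecs[:, order[1]]

def recursive_partition(A, nodes=None, min_size=2, tol=1.0):
    """Recursively bipartition the graph with adjacency matrix A.

    Returns a list of communities, each a list of node indices.
    The stopping rule below is a hypothetical stand-in, not the paper's rule.
    """
    A = np.asarray(A, dtype=float)
    if nodes is None:
        nodes = np.arange(A.shape[0])
    if len(nodes) < 2 * min_size:
        return [nodes.tolist()]
    sub = A[np.ix_(nodes, nodes)]            # induced subgraph on these nodes
    lam2, vec2 = second_eigenpair(sub)
    mean_degree = sub.sum(axis=1).mean()
    # Hypothetical stopping rule: no further split if the second eigenvalue
    # is small relative to sqrt(mean degree).
    if abs(lam2) <= tol * np.sqrt(mean_degree):
        return [nodes.tolist()]
    mask = vec2 >= 0                          # sign-based two-way split
    left, right = nodes[mask], nodes[~mask]
    if len(left) == 0 or len(right) == 0:
        return [nodes.tolist()]
    return (recursive_partition(A, left, min_size, tol)
            + recursive_partition(A, right, min_size, tol))
```

Only a flat list of leaf communities is returned here; keeping the recursion tree itself would give the hierarchy the abstract refers to.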

arXiv:1710.02776 [pdf, other]

doi 10.1093/biomet/asaa064

STAR: A general interactive framework for FDR control under structural constraints

Authors: Lihua Lei, Aaditya Ramdas, William Fithian

Abstract: We propose a general framework based on selectively traversed accumulation rules (STAR) for interactive multiple testing with generic structural constraints on the rejection set. It combines accumulation tests from ordered multiple testing with data-carving ideas from post-selection inference, allowing for highly flexible adaptation to generic structural information. Our procedure defines an interactive protocol for gradually pruning a candidate rejection set, beginning with the set of all hypotheses and shrinking with each step. By restricting the information at each step via a technique we call masking, our protocol enables interaction while controlling the false discovery rate (FDR) in finite samples for any data-adaptive update rule that the analyst may choose. We suggest update rules for a variety of applications with complex structural constraints, show that STAR performs well for problems ranging from convex region detection to FDR control on directed acyclic graphs, and show how to extend it to regression problems where knockoff statistics are available in lieu of $p$-values.

Submitted 7 September, 2020; v1 submitted 8 October, 2017; originally announced October 2017.

Comments: To appear in Biometrika

arXiv:1609.06035 [pdf, other]

AdaPT: An interactive procedure for multiple testing with side information

Authors: Lihua Lei, William Fithian

Abstract: We consider the problem of multiple hypothesis testing with generic side information: for each hypothesis $H_i$ we observe both a p-value $p_i$ and some predictor $x_i$ encoding contextual information about the hypothesis. For large-scale problems, adaptively focusing power on the more promising hypotheses (those more likely to yield discoveries) can lead to much more powerful multiple testing procedures. We propose a general iterative framework for this problem, called the Adaptive p-value Thresholding (AdaPT) procedure, which adaptively estimates a Bayes-optimal p-value rejection threshold and controls the false discovery rate (FDR) in finite samples. At each iteration of the procedure, the analyst proposes a rejection threshold and observes partially censored p-values, estimates the false discovery proportion (FDP) below the threshold, and either stops to reject or proposes another threshold, until the estimated FDP is below $α$. Our procedure is adaptive in an unusually strong sense, permitting the analyst to use any statistical or machine learning method she chooses to estimate the optimal threshold, and to switch between different models at each iteration as information accrues. We demonstrate the favorable performance of AdaPT by comparing it to state-of-the-art methods in five real applications and two simulation studies.

Submitted 24 July, 2018; v1 submitted 20 September, 2016; originally announced September 2016.

Comments: Accepted by JRSS-B; Develop an R package adaptMT (https://github.com/lihualei71/adaptMT)
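
Illustration (not part of the listing): the propose / estimate / stop-or-shrink loop in the abstract above might look like the sketch below. It is heavily simplified and rests on assumptions the abstract does not state: the threshold is a single constant `s` rather than a function of the side information $x_i$, the FDP estimate is the mirror-count ratio $(1 + \#\{p_i \ge 1 - s\}) / \max(\#\{p_i \le s\}, 1)$, and the update rule (multiply `s` by a fixed factor) is a hypothetical stand-in for whatever model the analyst would fit. For the actual procedure, see the adaptMT package linked in the Comments line.

```python
def adapt_constant_threshold(pvals, alpha=0.1, s0=0.45, shrink=0.9, s_min=1e-6):
    """Simplified AdaPT-style loop with one constant threshold (assumption).

    Returns (final_threshold, indices_of_rejected_hypotheses).
    """
    s = s0  # initial threshold, kept below 0.5 so the two regions do not overlap
    while s > s_min:
        rejected = [i for i, p in enumerate(pvals) if p <= s]
        # p-values at or above 1 - s act as a mirror-image estimate of the
        # number of nulls at or below s.
        mirrored = sum(1 for p in pvals if p >= 1 - s)
        fdp_hat = (1 + mirrored) / max(len(rejected), 1)
        if fdp_hat <= alpha:
            return s, rejected           # stop and reject
        s *= shrink                      # hypothetical update: shrink threshold
    return 0.0, []                       # never reached the target level
```

In this sketch the loop only ever lowers the threshold, mirroring the "either stops to reject or proposes another threshold" step; the covariate-dependent threshold that gives AdaPT its power is deliberately left out.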

arXiv:1609.03261 [pdf, other]

Less than a Single Pass: Stochastically Controlled Stochastic Gradient Method

Authors: Lihua Lei, Michael I. Jordan

Abstract: We develop and analyze a procedure for gradient-based optimization that we refer to as stochastically controlled stochastic gradient (SCSG). As a member of the SVRG family of algorithms, SCSG makes use of gradient estimates at two scales, with the number of updates at the faster scale being governed by a geometric random variable. Unlike most existing algorithms in this family, both the computation cost and the communication cost of SCSG do not necessarily scale linearly with the sample size $n$; indeed, these costs are independent of $n$ when the target accuracy is low. An experimental evaluation on real datasets confirms the effectiveness of SCSG.

Submitted 16 May, 2019; v1 submitted 11 September, 2016; originally announced September 2016.

Comments: Add Lemma B.4
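
Illustration (not part of the listing): the two-scale structure described in the abstract above, with a geometrically distributed number of fast-scale updates, can be sketched for least squares as below. This is a rough sketch under assumptions that go beyond the abstract: the slow scale is a gradient over a random batch of size `batch_size`, the inner step count is geometric on {0, 1, ...} with mean `batch_size`, and inner samples are drawn from the current batch. The function names, step size, and defaults are invented for this example and are not taken from the paper.

```python
import numpy as np

def scsg_least_squares(X, y, epochs=30, batch_size=32, step=0.02, seed=0):
    """SVRG-style updates with a geometric number of inner steps per epoch."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        # Slow scale: gradient over a random batch, evaluated at a snapshot.
        batch = rng.choice(n, size=batch_size, replace=False)
        snapshot = w.copy()
        mu = X[batch].T @ (X[batch] @ snapshot - y[batch]) / batch_size
        # Fast scale: geometric number of updates (assumed mean: batch_size).
        n_inner = rng.geometric(1.0 / (batch_size + 1)) - 1
        for _ in range(n_inner):
            i = rng.choice(batch)  # one sample from the current batch
            xi, yi = X[i], y[i]
            # Variance-reduced gradient: correct the per-sample gradient
            # by its value at the snapshot, then add the batch gradient.
            g = xi * (xi @ w - yi) - xi * (xi @ snapshot - yi) + mu
            w = w - step * g
    return w

def mean_squared_loss(X, y, w):
    """Objective: 0.5 times the mean squared residual."""
    return 0.5 * np.mean((X @ w - y) ** 2)
```

Because each epoch touches only a batch rather than all $n$ samples, the per-epoch cost in this sketch does not depend on $n$, which is the property the abstract highlights.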

arXiv:1606.01969 [pdf, other]

Power of Ordered Hypothesis Testing

Authors: Lihua Lei, William Fithian

Abstract: Ordered testing procedures are multiple testing procedures that exploit a pre-specified ordering of the null hypotheses, from most to least promising. We analyze and compare the power of several recent proposals using the asymptotic framework of Li & Barber (2015). While accumulation tests including ForwardStop can be quite powerful when the ordering is very informative, they are asymptotically powerless when the ordering is weaker. By contrast, Selective SeqStep, proposed by Barber & Candès (2015), is much less sensitive to the quality of the ordering. We compare the power of these procedures in different régimes, concluding that Selective SeqStep dominates accumulation tests if either the ordering is weak or non-null hypotheses are sparse or weak. Motivated by our asymptotic analysis, we derive an improved version of Selective SeqStep which we call Adaptive SeqStep, analogous to Storey's improvement on the Benjamini-Hochberg procedure. We compare these methods using the GEOQuery data set analyzed by Li & Barber (2015) and find Adaptive SeqStep has favorable performance for both good and bad prior orderings.

Submitted 6 June, 2016; originally announced June 2016.

Comments: 18 pages. To appear at ICML 2016
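
Illustration (not part of the listing): for readers unfamiliar with accumulation tests, the ForwardStop rule named in the abstract above is commonly written as $\hat{k} = \max\{k : \frac{1}{k}\sum_{i \le k} -\log(1 - p_i) \le α\}$, rejecting the first $\hat{k}$ hypotheses in the given order. The abstract does not state this formula, so the sketch below should be read as the standard textbook form under that assumption; the function name `forward_stop` is invented for this example.

```python
import math

def forward_stop(pvals, alpha=0.1):
    """Return k_hat, the number of leading hypotheses rejected by ForwardStop.

    pvals must be listed in the pre-specified order (most promising first)
    and each must be strictly less than 1, since log(1 - p) is used.
    """
    k_hat = 0
    running_sum = 0.0
    for k, p in enumerate(pvals, start=1):
        running_sum += -math.log1p(-p)    # accumulate -log(1 - p_k)
        if running_sum / k <= alpha:
            k_hat = k                     # keep the largest k that qualifies
    return k_hat
```

A large p-value early in the order inflates every later running average, which gives some intuition for the abstract's point that accumulation tests lose power when the ordering is weak.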

Showing 1–44 of 44 results for author: Lei, L