Showing 1–44 of 44 results for author: Lei, L

Searching in archive stat.
  1. arXiv:2406.05548

    econ.EM math.ST stat.ME

    Causal Interpretation of Regressions With Ranks

    Authors: Lihua Lei

    Abstract: In studies of educational production functions or intergenerational mobility, it is common to transform the key variables into percentile ranks. Yet, it remains unclear what the regression coefficient estimates with ranks of the outcome or the treatment. In this paper, we derive effective causal estimands for a broad class of commonly-used regression methods, including the ordinary least squares (…

    Submitted 8 June, 2024; originally announced June 2024.

  2. arXiv:2402.05203

    cs.LG stat.ML

    Bellman Conformal Inference: Calibrating Prediction Intervals For Time Series

    Authors: Zitong Yang, Emmanuel Candès, Lihua Lei

    Abstract: We introduce Bellman Conformal Inference (BCI), a framework that wraps around any time series forecasting models and provides approximately calibrated prediction intervals. Unlike existing methods, BCI is able to leverage multi-step ahead forecasts and explicitly optimize the average interval lengths by solving a one-dimensional stochastic control problem (SCP) at each time step. In particular, we…

    Submitted 9 February, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: 17 pages, 4 figures

  3. arXiv:2401.13112

    cs.AI stat.ML

    Distributional Counterfactual Explanation With Optimal Transport

    Authors: Lei You, Lele Cao, Mattias Nilsson, Bo Zhao, Lei Lei

    Abstract: Counterfactual explanations (CE) are the de facto method of providing insight and interpretability in black-box decision-making models by identifying alternative input instances that lead to different outcomes. This paper extends the concept of CE to a distributional context, broadening the scope from individual data points to entire input and output distributions, named distributional counterfact…

    Submitted 25 May, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

  4. arXiv:2401.07152

    stat.ME econ.EM math.ST

    Inference for Synthetic Controls via Refined Placebo Tests

    Authors: Lihua Lei, Timothy Sudijono

    Abstract: The synthetic control method is often applied to problems with one treated unit and a small number of control units. A common inferential task in this setting is to test null hypotheses regarding the average treatment effect on the treated. Inference procedures that are justified asymptotically are often unsatisfactory due to (1) small sample sizes that render large-sample approximation fragile an…

    Submitted 12 April, 2024; v1 submitted 13 January, 2024; originally announced January 2024.

    Comments: 40 pages. V2: Further literature review plus additional simulation results

  5. arXiv:2312.07520

    econ.EM math.ST stat.ME

    Estimating Counterfactual Matrix Means with Short Panel Data

    Authors: Lihua Lei, Brad Ross

    Abstract: We develop a new, spectral approach for identifying and estimating average counterfactual outcomes under a low-rank factor model with short panel data and general outcome missingness patterns. Applications include event studies and studies of outcomes of "matches" between agents of two types, e.g. workers and firms, typically conducted under less-flexible Two-Way-Fixed-Effects (TWFE) models of out…

    Submitted 6 May, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: 72 pages, 6 figures

  6. arXiv:2310.14983

    econ.EM math.ST stat.ME

    Causal clustering: design of cluster experiments under network interference

    Authors: Davide Viviano, Lihua Lei, Guido Imbens, Brian Karrer, Okke Schrijvers, Liang Shi

    Abstract: This paper studies the design of cluster experiments to estimate the global treatment effect in the presence of network spillovers. We provide a framework to choose the clustering that minimizes the worst-case mean-squared error of the estimated global effect. We show that optimal clustering solves a novel penalized min-cut optimization problem computed via off-the-shelf semi-definite programming…

    Submitted 13 January, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

  7. arXiv:2310.08115

    econ.EM math.ST stat.ME stat.ML

    Model-Agnostic Covariate-Assisted Inference on Partially Identified Causal Effects

    Authors: Wenlong Ji, Lihua Lei, Asher Spector

    Abstract: Many causal estimands are only partially identifiable since they depend on the unobservable joint distribution between potential outcomes. Stratification on pretreatment covariates can yield sharper partial identification bounds; however, unless the covariates are discrete with relatively small support, this approach typically requires consistent estimation of the conditional distributions of the…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: 59 pages, 4 figures

    MSC Class: 62G15 (Primary); 62G05 (Secondary) ACM Class: G.3; I.2.m

  8. arXiv:2309.04002

    stat.ME

    Total Variation Floodgate for Variable Importance Inference in Classification

    Authors: Wenshuo Wang, Lucas Janson, Lihua Lei, Aaditya Ramdas

    Abstract: Inferring variable importance is the key problem of many scientific studies, where researchers seek to learn the effect of a feature $X$ on the outcome $Y$ in the presence of confounding variables $Z$. Focusing on classification problems, we define the expected total variation (ETV), which is an intuitive and deterministic measure of variable importance that does not rely on any model context. We…

    Submitted 7 September, 2023; originally announced September 2023.

  9. arXiv:2304.11735

    econ.EM math.ST stat.ME

    Policy Learning under Biased Sample Selection

    Authors: Lihua Lei, Roshni Sahoo, Stefan Wager

    Abstract: Practitioners often use data from a randomized controlled trial to learn a treatment assignment policy that can be deployed on a target population. A recurring concern in doing so is that, even if the randomized trial was well-executed (i.e., internal validity holds), the study participants may not represent a random sample of the target population (i.e., external validity fails)--and this may lea…

    Submitted 23 April, 2023; originally announced April 2023.

  10. arXiv:2302.02942

    stat.CO math.DS math.OC q-bio.QM

    Empirical quantification of predictive uncertainty due to model discrepancy by training with an ensemble of experimental designs: an application to ion channel kinetics

    Authors: Joseph G. Shuttleworth, Chon Lok Lei, Dominic G. Whittaker, Monique J. Windley, Adam P. Hill, Simon P. Preston, Gary R. Mirams

    Abstract: When mathematical biology models are used to make quantitative predictions for clinical or industrial use, it is important that these predictions come with a reliable estimate of their accuracy (uncertainty quantification). Because models of complex biological systems are always large simplifications, model discrepancy arises - where a mathematical model fails to recapitulate the true data generat…

    Submitted 19 February, 2024; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: Final published version with a typographical error in Table 1 (the value of q_6) corrected

    MSC Class: 92B05; 92C30; 62M05

    Journal ref: Bulletin of Mathematical Biology, 86(1), 2 (2024)

  11. arXiv:2210.01592

    stat.ME q-bio.QM

    Autocorrelated measurement processes and inference for ordinary differential equation models of biological systems

    Authors: Ben Lambert, Chon Lok Lei, Martin Robinson, Michael Clerx, Richard Creswell, Sanmitra Ghosh, Simon Tavener, David Gavaghan

    Abstract: Ordinary differential equation models are used to describe dynamic processes across biology. To perform likelihood-based parameter inference on these models, it is necessary to specify a statistical process representing the contribution of factors not explicitly included in the mathematical model. For this, independent Gaussian noise is commonly chosen, with its use so widespread that researchers…

    Submitted 4 October, 2022; originally announced October 2022.

  12. arXiv:2209.01754

    stat.ME cs.LG stat.ML

    Learning from a Biased Sample

    Authors: Roshni Sahoo, Lihua Lei, Stefan Wager

    Abstract: The empirical risk minimization approach to data-driven decision making assumes that we can learn a decision rule from training data drawn under the same conditions as the ones we want to deploy it in. However, in a number of settings, we may be concerned that our training sample is biased, and that some groups (characterized by either observable or unobservable attributes) may be under- or over-r…

    Submitted 5 January, 2023; v1 submitted 5 September, 2022; originally announced September 2022.

  13. arXiv:2208.09542

    stat.ME

    Improving knockoffs with conditional calibration

    Authors: Yixiang Luo, William Fithian, Lihua Lei

    Abstract: The knockoff filter of Barber and Candes (arXiv:1404.5609) is a flexible framework for multiple testing in supervised learning models, based on introducing synthetic predictor variables to control the false discovery rate (FDR). Using the conditional calibration framework of Fithian and Lei (arXiv:2007.10438), we introduce the calibrated knockoff procedure, a method that uniformly improves the pow…

    Submitted 8 September, 2023; v1 submitted 19 August, 2022; originally announced August 2022.

    Comments: 52 pages, 19 figures

    MSC Class: 62H15 (Primary); 62J15 (Secondary)

  14. arXiv:2208.06685

    stat.ME math.ST stat.ML

    Adaptive novelty detection with false discovery rate guarantee

    Authors: Ariane Marandon, Lihua Lei, David Mary, Etienne Roquain

    Abstract: This paper studies the semi-supervised novelty detection problem where a set of "typical" measurements is available to the researcher. Motivated by recent advances in multiple testing and conformal inference, we propose AdaDetect, a flexible method that is able to wrap around any probabilistic classification algorithm and control the false discovery rate (FDR) on detected novelties in finite sampl…

    Submitted 25 October, 2023; v1 submitted 13 August, 2022; originally announced August 2022.

  15. arXiv:2208.02814

    stat.ME cs.AI cs.LG math.ST stat.ML

    Conformal Risk Control

    Authors: Anastasios N. Angelopoulos, Stephen Bates, Adam Fisch, Lihua Lei, Tal Schuster

    Abstract: We extend conformal prediction to control the expected value of any monotone loss function. The algorithm generalizes split conformal prediction together with its coverage guarantee. Like conformal prediction, the conformal risk control procedure is tight up to an $\mathcal{O}(1/n)$ factor. We also introduce extensions of the idea to distribution shift, quantile risk control, multiple and adversar…

    Submitted 29 April, 2023; v1 submitted 4 August, 2022; originally announced August 2022.

    Comments: Code available at https://github.com/aangelopoulos/conformal-risk

  16. arXiv:2110.01052

    cs.LG cs.AI cs.CV stat.ME stat.ML

    Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control

    Authors: Anastasios N. Angelopoulos, Stephen Bates, Emmanuel J. Candès, Michael I. Jordan, Lihua Lei

    Abstract: We introduce a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees. Our calibration algorithms work with any underlying model and (unknown) data-generating distribution and do not require model refitting. The framework addresses, among other examples, false discovery rate control in multi-label classification, intersect…

    Submitted 29 September, 2022; v1 submitted 3 October, 2021; originally announced October 2021.

    Comments: Code available at https://github.com/aangelopoulos/ltt

  17. arXiv:2107.13737

    econ.EM econ.GN stat.ME

    Design-Robust Two-Way-Fixed-Effects Regression For Panel Data

    Authors: Dmitry Arkhangelsky, Guido W. Imbens, Lihua Lei, Xiaoman Luo

    Abstract: We propose a new estimator for average causal effects of a binary treatment with panel data in settings with general treatment patterns. Our approach augments the popular two-way-fixed-effects specification with unit-specific weights that arise from a model for the assignment mechanism. We show how to construct these weights in various settings, including the staggered adoption setting, where unit…

    Submitted 4 March, 2024; v1 submitted 29 July, 2021; originally announced July 2021.

    Comments: 131 pages; R package available at https://github.com/lihualei71/ripw; replication files available at https://github.com/xiaomanluo/ripwPaper

  18. arXiv:2106.15743

    stat.ME math.ST

    BONuS: Multiple multivariate testing with a data-adaptive test statistic

    Authors: Chiao-Yu Yang, Lihua Lei, Nhat Ho, Will Fithian

    Abstract: We propose a new adaptive empirical Bayes framework, the Bag-Of-Null-Statistics (BONuS) procedure, for multiple testing where each hypothesis testing problem is itself multivariate or nonparametric. BONuS is an adaptive and interactive knockoff-type method that helps improve the testing power while controlling the false discovery rate (FDR), and is closely connected to the "counting knockoffs" pro…

    Submitted 1 July, 2021; v1 submitted 29 June, 2021; originally announced June 2021.

  19. arXiv:2104.08279

    stat.ME math.ST stat.ML

    Testing for Outliers with Conformal p-values

    Authors: Stephen Bates, Emmanuel Candès, Lihua Lei, Yaniv Romano, Matteo Sesia

    Abstract: This paper studies the construction of p-values for nonparametric outlier detection, taking a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are outliers. We propose a solution based on conformal inference, a broadly applicable framework which yields p-values that are marginally valid but mutually depende…

    Submitted 24 May, 2022; v1 submitted 16 April, 2021; originally announced April 2021.

    Comments: Revision May 24, 2022: added "asymptotic" and "Monte Carlo" conditional calibration methods; added power analyses; updated numerical experiments to include new methods

    Journal ref: Ann. Statist. 51(1): 149-178 (February 2023)

  20. arXiv:2103.09763

    stat.ME stat.ML

    Conformalized Survival Analysis

    Authors: Emmanuel J. Candès, Lihua Lei, Zhimei Ren

    Abstract: Existing survival analysis techniques heavily rely on strong modelling assumptions and are, therefore, prone to model misspecification errors. In this paper, we develop an inferential method based on ideas from conformal prediction, which can wrap around any survival prediction algorithm to produce calibrated, covariate-dependent lower predictive bounds on survival times. In the Type I right-censo…

    Submitted 23 April, 2023; v1 submitted 17 March, 2021; originally announced March 2021.

  21. arXiv:2101.02703

    cs.LG cs.AI cs.CV stat.ME stat.ML

    Distribution-Free, Risk-Controlling Prediction Sets

    Authors: Stephen Bates, Anastasios Angelopoulos, Lihua Lei, Jitendra Malik, Michael I. Jordan

    Abstract: While improving prediction accuracy has been the focus of machine learning in recent years, this alone does not suffice for reliable decision-making. Deploying learning systems in consequential settings also requires calibrating and communicating the uncertainty of predictions. To convey instance-wise uncertainty for prediction tasks, we show how to generate set-valued predictions from a black-box…

    Submitted 4 August, 2021; v1 submitted 7 January, 2021; originally announced January 2021.

    Comments: Project website available at http://www.angelopoulos.ai/blog/posts/rcps/ and codebase available at https://github.com/aangelopoulos/rcps

  22. arXiv:2011.04854

    stat.ME

    Using flexible noise models to avoid noise model misspecification in inference of differential equation time series models

    Authors: Richard Creswell, Ben Lambert, Chon Lok Lei, Martin Robinson, David Gavaghan

    Abstract: When modelling time series, it is common to decompose observed variation into a "signal" process, the process of interest, and "noise", representing nuisance factors that obfuscate the signal. To separate signal from noise, assumptions must be made about both parts of the system. If the signal process is incorrectly specified, our predictions using this model may generalise poorly; similarly, if t…

    Submitted 9 November, 2020; originally announced November 2020.

  23. arXiv:2007.10438

    stat.ME math.ST

    Conditional calibration for false discovery rate control under dependence

    Authors: William Fithian, Lihua Lei

    Abstract: We introduce a new class of methods for finite-sample false discovery rate (FDR) control in multiple testing problems with dependent test statistics where the dependence is fully or partially known. Our approach separately calibrates a data-dependent p-value rejection threshold for each hypothesis, relaxing or tightening the threshold as appropriate to target exact FDR control. In addition to our…

    Submitted 20 July, 2020; originally announced July 2020.

    Comments: 26 pages main text, 17 pages appendix

  24. arXiv:2006.06138

    stat.ME math.ST stat.ML

    Conformal Inference of Counterfactuals and Individual Treatment Effects

    Authors: Lihua Lei, Emmanuel J. Candès

    Abstract: Evaluating treatment effect heterogeneity widely informs treatment decision making. At the moment, much emphasis is placed on the estimation of the conditional average treatment effect via flexible machine learning algorithms. While these methods enjoy some theoretical appeal in terms of consistency and convergence rates, they generally perform poorly in terms of uncertainty quantification. This i…

    Submitted 5 May, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: Accepted by Journal of the Royal Statistical Society: Series B (JRSSB); 38 pages

  25. arXiv:2005.11300

    stat.ML cs.LG cs.MS stat.CO

    Model Evidence with Fast Tree Based Quadrature

    Authors: Thomas Foster, Chon Lok Lei, Martin Robinson, David Gavaghan, Ben Lambert

    Abstract: High dimensional integration is essential to many areas of science, ranging from particle physics to Bayesian inference. Approximating these integrals is hard, due in part to the difficulty of locating and sampling from regions of the integration domain that make significant contributions to the overall integral. Here, we present a new algorithm called Tree Quadrature (TQ) that separates this samp…

    Submitted 22 May, 2020; originally announced May 2020.

  26. arXiv:2005.08457

    stat.ME

    Simultaneous Differential Network Analysis and Classification for High-dimensional Matrix-variate Data, with application to Brain Connectivity Alteration Detection and fMRI-guided Medical Diagnoses of Alzheimer's Disease

    Authors: Chen Hao, Guo Ying, He Yong, Ji Jiadong, Liu Lei, Shi Yufeng, Wang Yikai, Yu Long, Zhang Xinsheng

    Abstract: Alzheimer's disease (AD) is the most common form of dementia, which causes problems with memory, thinking and behavior. Growing evidence has shown that the brain connectivity network experiences alterations for such a complex disease. Network comparison, also known as differential network analysis, is thus particularly powerful to reveal the disease pathologies and identify clinical biomarkers for…

    Submitted 27 May, 2020; v1 submitted 18 May, 2020; originally announced May 2020.

  27. arXiv:2004.14531

    math.ST stat.ML

    Consistency of Spectral Clustering on Hierarchical Stochastic Block Models

    Authors: Lihua Lei, Xiaodong Li, Xingmei Lou

    Abstract: We study the hierarchy of communities in real-world networks under a generic stochastic block model, in which the connection probabilities are structured in a binary tree. Under such model, a standard recursive bi-partitioning algorithm is dividing the network into two communities based on the Fiedler vector of the unnormalized graph Laplacian and repeating the split until a stopping rule indicate…

    Submitted 18 November, 2021; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: 45 pages, 7 figures

  28. arXiv:2002.05359

    cs.LG math.OC stat.ML

    Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization

    Authors: Samuel Horváth, Lihua Lei, Peter Richtárik, Michael I. Jordan

    Abstract: Adaptivity is an important yet under-studied property in modern optimization theory. The gap between the state-of-the-art theory and the current practice is striking in that algorithms with desirable theoretical guarantees typically involve drastically different settings of hyperparameters, such as step-size schemes and batch sizes, in different regimes. Despite the appealing theoretical results,…

    Submitted 13 February, 2020; originally announced February 2020.

    Comments: 11 pages, 4 Figures, 20 pages Appendix

  29. arXiv:2002.02581

    cs.LG eess.SP stat.ML

    Dynamic Energy Dispatch Based on Deep Reinforcement Learning in IoT-Driven Smart Isolated Microgrids

    Authors: Lei Lei, Yue Tan, Glenn Dahlenburg, Wei Xiang, Kan Zheng

    Abstract: Microgrids (MGs) are small, local power grids that can operate independently from the larger utility grid. Combined with the Internet of Things (IoT), a smart MG can leverage the sensory data and machine learning techniques for intelligent energy management. This paper focuses on deep reinforcement learning (DRL)-based energy dispatch for IoT-driven smart isolated MGs with diesel generators (DGs),…

    Submitted 16 November, 2020; v1 submitted 6 February, 2020; originally announced February 2020.

    Journal ref: IEEE Internet of Things Journal, vol. 8, no. 10, pp. 7938-7953, May 15, 2021

  30. arXiv:2001.09623

    cs.LG stat.ML

    Variance Reduction with Sparse Gradients

    Authors: Melih Elibol, Lihua Lei, Michael I. Jordan

    Abstract: Variance reduction methods such as SVRG and SpiderBoost use a mixture of large and small batch gradients to reduce the variance of stochastic gradients. Compared to SGD, these methods require at least double the number of operations per update to model parameters. To reduce the computational cost of these methods, we introduce a new sparsity operator: The random-top-k operator. Our operator reduce…

    Submitted 27 January, 2020; originally announced January 2020.

    Comments: ICLR 2020

  31. arXiv:2001.04230

    stat.CO q-bio.QM stat.AP stat.ML

    Considering discrepancy when calibrating a mechanistic electrophysiology model

    Authors: Chon Lok Lei, Sanmitra Ghosh, Dominic G. Whittaker, Yasser Aboelkassem, Kylie A. Beattie, Chris D. Cantwell, Tammo Delhaas, Charles Houston, Gustavo Montes Novaes, Alexander V. Panfilov, Pras Pathmanathan, Marina Riabiz, Rodrigo Weber dos Santos, John Walmsley, Keith Worden, Gary R. Mirams, Richard D. Wilkinson

    Abstract: Uncertainty quantification (UQ) is a vital step in using mathematical models and simulations to take decisions. The field of cardiac simulation has begun to explore and adopt UQ methods to characterise uncertainty in model inputs and how that propagates through to outputs or predictions. In this perspective piece we draw attention to an important and under-addressed source of uncertainty in our pr…

    Submitted 23 April, 2020; v1 submitted 13 January, 2020; originally announced January 2020.

    Comments: This version is published in Philosophical Transactions of the Royal Society A; Updated in response to reviewer comments, including: added details to the introduction, fixed mathematical notations for clarity, and moved the original Table 3 to the supplement to avoid confusion

    Journal ref: Phil. Trans. R. Soc. A. 378 (2020): 20190349

  32. arXiv:1911.09200

    stat.ME

    Smoothed Nested Testing on Directed Acyclic Graphs

    Authors: Jackson H. Loper, Lihua Lei, William Fithian, Wesley Tansey

    Abstract: We consider the problem of multiple hypothesis testing when there is a logical nested structure to the hypotheses. When one hypothesis is nested inside another, the outer hypothesis must be false if the inner hypothesis is false. We model the nested structure as a directed acyclic graph, including chain and tree graphs as special cases. Each node in the graph is a hypothesis and rejecting a node r…

    Submitted 15 March, 2021; v1 submitted 20 November, 2019; originally announced November 2019.

    Comments: Revised with genetic interaction maps application and new theory of PRDS

  33. arXiv:1909.06851

    cs.LG stat.ML

    Biased Estimates of Advantages over Path Ensembles

    Authors: Lanxin Lei, Zhizhong Li, Dahua Lin

    Abstract: The estimation of advantage is crucial for a number of reinforcement learning algorithms, as it directly influences the choices of future paths. In this work, we propose a family of estimates based on the order statistics over the path ensemble, which allows one to flexibly drive the learning process, towards or against risks. On top of this formulation, we systematically study the impacts of diff…

    Submitted 15 September, 2019; originally announced September 2019.

  34. arXiv:1907.09059

    cs.LG stat.ML

    Deep Reinforcement Learning for Autonomous Internet of Things: Model, Applications and Challenges

    Authors: Lei Lei, Yue Tan, Kan Zheng, Shiwen Liu, Kuan Zhang, Xuemin Shen

    Abstract: The Internet of Things (IoT) extends the Internet connectivity into billions of IoT devices around the world, where the IoT devices collect and share information to reflect status of the physical world. The Autonomous Control System (ACS), on the other hand, performs control functions on the physical systems without external intervention over an extended period of time. The integration of IoT and…

    Submitted 13 April, 2020; v1 submitted 21 July, 2019; originally announced July 2019.

  35. An Assumption-Free Exact Test For Fixed-Design Linear Models With Exchangeable Errors

    Authors: Lihua Lei, Peter J. Bickel

    Abstract: We propose the Cyclic Permutation Test (CPT) to test general linear hypotheses for linear models. This test is non-randomized and valid in finite samples with exact Type I error $\alpha$ for an arbitrary fixed design matrix and arbitrary exchangeable errors, whenever $1/\alpha$ is an integer and $n/p \ge 1/\alpha - 1$. The test involves applying the marginal rank test to $1/\alpha$ linear statistics of the out…

    Submitted 31 December, 2020; v1 submitted 13 July, 2019; originally announced July 2019.

    Comments: Accepted by Biometrika; 46 pages

  36. arXiv:1906.07860

    eess.SP cs.LG cs.NI stat.ML

    Multi-user Resource Control with Deep Reinforcement Learning in IoT Edge Computing

    Authors: Lei Lei, Huijuan Xu, Xiong Xiong, Kan Zheng, Wei Xiang, Xianbin Wang

    Abstract: By leveraging the concept of mobile edge computing (MEC), massive amount of data generated by a large number of Internet of Things (IoT) devices could be offloaded to MEC server at the edge of wireless network for further computational intensive processing. However, due to the resource constraint of IoT devices and wireless network, both the communications and computation resources need to be allo…

    Submitted 18 June, 2019; originally announced June 2019.

  37. arXiv:1904.04480

    math.OC cs.LG stat.ML

    On the Adaptivity of Stochastic Gradient-Based Optimization

    Authors: Lihua Lei, Michael I. Jordan

    Abstract: Stochastic-gradient-based optimization has been a core enabling methodology in applications to large-scale problems in machine learning and related areas. Despite the progress, the gap between theory and practice remains significant, with theoreticians pursuing mathematical optimality at a cost of obtaining specialized procedures in different regimes (e.g., modulus of strong convexity, magnitude o…

    Submitted 31 December, 2020; v1 submitted 9 April, 2019; originally announced April 2019.

    Comments: Accepted by SIAM Journal on Optimization; 54 pages

  38. arXiv:1902.04326

    cs.LG stat.ML

    An In-Vehicle KWS System with Multi-Source Fusion for Vehicle Applications

    Authors: Yue Tan, Kan Zheng, Lei Lei

    Abstract: In order to maximize detection precision rate as well as the recall rate, this paper proposes an in-vehicle multi-source fusion scheme in Keyword Spotting (KWS) System for vehicle applications. Vehicle information, as a new source for the original system, is collected by an in-vehicle data acquisition platform while the user is driving. A Deep Neural Network (DNN) is trained to extract acoustic fe…

    Submitted 16 February, 2019; v1 submitted 12 February, 2019; originally announced February 2019.

  39. arXiv:1812.09028

    cs.LG cs.RO stat.ML

    NADPEx: An on-policy temporally consistent exploration method for deep reinforcement learning

    Authors: Sirui Xie, Junning Huang, Lanxin Lei, Chunxiao Liu, Zheng Ma, Wei Zhang, Liang Lin

    Abstract: Reinforcement learning agents need exploratory behaviors to escape from local optima. These behaviors may include both immediate dithering perturbation and temporally consistent exploration. To achieve these, a stochastic policy model that is inherently consistent through a period of time is in desire, especially for tasks with either sparse rewards or long term information. In this work, we intro…

    Submitted 24 December, 2018; v1 submitted 21 December, 2018; originally announced December 2018.

    Comments: To appear in ICLR 2019

  40. arXiv:1810.01509

    stat.ME math.ST stat.ML

    Hierarchical community detection by recursive partitioning

    Authors: Tianxi Li, Lihua Lei, Sharmodeep Bhattacharyya, Koen Van den Berge, Purnamrita Sarkar, Peter J. Bickel, Elizaveta Levina

    Abstract: The problem of community detection in networks is usually formulated as finding a single partition of the network into some "correct" number of communities. We argue that it is more interpretable and in some regimes more accurate to construct a hierarchical tree of communities instead. This can be done with a simple top-down recursive partitioning algorithm, starting with a single community and se…

    Submitted 14 May, 2020; v1 submitted 2 October, 2018; originally announced October 2018.

  41. STAR: A general interactive framework for FDR control under structural constraints

    Authors: Lihua Lei, Aaditya Ramdas, William Fithian

    Abstract: We propose a general framework based on selectively traversed accumulation rules (STAR) for interactive multiple testing with generic structural constraints on the rejection set. It combines accumulation tests from ordered multiple testing with data-carving ideas from post-selection inference, allowing for highly flexible adaptation to generic structural information. Our procedure defines an inter…

    Submitted 7 September, 2020; v1 submitted 8 October, 2017; originally announced October 2017.

    Comments: To appear in Biometrika

  42. arXiv:1609.06035

    stat.ME

    AdaPT: An interactive procedure for multiple testing with side information

    Authors: Lihua Lei, William Fithian

    Abstract: We consider the problem of multiple hypothesis testing with generic side information: for each hypothesis $H_i$ we observe both a p-value $p_i$ and some predictor $x_i$ encoding contextual information about the hypothesis. For large-scale problems, adaptively focusing power on the more promising hypotheses (those more likely to yield discoveries) can lead to much more powerful multiple testing pro…

    Submitted 24 July, 2018; v1 submitted 20 September, 2016; originally announced September 2016.

    Comments: Accepted by JRSS-B; Develop an R package adaptMT (https://github.com/lihualei71/adaptMT)

  43. arXiv:1609.03261

    math.OC cs.DS cs.LG stat.ML

    Less than a Single Pass: Stochastically Controlled Stochastic Gradient Method

    Authors: Lihua Lei, Michael I. Jordan

    Abstract: We develop and analyze a procedure for gradient-based optimization that we refer to as stochastically controlled stochastic gradient (SCSG). As a member of the SVRG family of algorithms, SCSG makes use of gradient estimates at two scales, with the number of updates at the faster scale being governed by a geometric random variable. Unlike most existing algorithms in this family, both the computatio…

    Submitted 16 May, 2019; v1 submitted 11 September, 2016; originally announced September 2016.

    Comments: Add Lemma B.4

  44. arXiv:1606.01969

    stat.ME

    Power of Ordered Hypothesis Testing

    Authors: Lihua Lei, William Fithian

    Abstract: Ordered testing procedures are multiple testing procedures that exploit a pre-specified ordering of the null hypotheses, from most to least promising. We analyze and compare the power of several recent proposals using the asymptotic framework of Li & Barber (2015). While accumulation tests including ForwardStop can be quite powerful when the ordering is very informative, they are asymptotically po…

    Submitted 6 June, 2016; originally announced June 2016.

    Comments: 18 pages. To appear at ICML 2016