
Showing 1–50 of 96 results for author: Wainwright, M J

  1. arXiv:2401.13665  [pdf, other]

    math.ST econ.EM stat.ME stat.ML

    Entrywise Inference for Missing Panel Data: A Simple and Instance-Optimal Approach

    Authors: Yuling Yan, Martin J. Wainwright

    Abstract: Longitudinal or panel data can be represented as a matrix with rows indexed by units and columns indexed by time. We consider inferential questions associated with the missing data version of panel data induced by staggered adoption. We propose a computationally efficient procedure for estimation, involving only simple matrix algebra and singular value decomposition, and prove non-asymptotic and h…

    Submitted 1 July, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

  2. arXiv:2401.05233  [pdf, other]

    cs.LG cs.IT eess.SY math.OC stat.ML

    Taming "data-hungry" reinforcement learning? Stability in continuous state-action spaces

    Authors: Yaqi Duan, Martin J. Wainwright

    Abstract: We introduce a novel framework for analyzing reinforcement learning (RL) in continuous state-action spaces, and use it to prove fast rates of convergence in both off-line and on-line settings. Our analysis highlights two key stability properties, relating to how changes in value functions and/or policies affect the Bellman operator and occupation measures. We argue that these properties are satisf…

    Submitted 10 January, 2024; originally announced January 2024.

  3. arXiv:2311.10076  [pdf, other]

    stat.ME math.ST

    A decorrelation method for general regression adjustment in randomized experiments

    Authors: Fangzhou Su, Wenlong Mou, Peng Ding, Martin J. Wainwright

    Abstract: We study regression adjustment with general function class approximations for estimating the average treatment effect in the design-based setting. Standard regression adjustment involves bias due to sample re-use, and this bias leads to behavior that is sub-optimal in the sample size, and/or imposes restrictive assumptions. Our main contribution is to introduce a novel decorrelation-based approach…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: Fangzhou Su and Wenlong Mou contributed equally to this work

  4. arXiv:2309.01362  [pdf, other]

    math.ST stat.ME

    Challenges of the inconsistency regime: Novel debiasing methods for missing data models

    Authors: Michael Celentano, Martin J. Wainwright

    Abstract: We study semi-parametric estimation of the population mean when data is observed missing at random (MAR) in the $n < p$ "inconsistency regime", in which neither the outcome model nor the propensity/missingness model can be estimated consistently. Consider a high-dimensional linear-GLM specification in which the number of confounders is proportional to the sample size. In the case $n > p$, past wor…

    Submitted 4 September, 2023; originally announced September 2023.

    Comments: 89 pages, 6 figures

    MSC Class: 62J05; 62J12; 62F10

  5. arXiv:2303.12613  [pdf, other]

    math.ST cs.IT

    Noisy recovery from random linear observations: Sharp minimax rates under elliptical constraints

    Authors: Reese Pathak, Martin J. Wainwright, Lin Xiao

    Abstract: Estimation problems with constrained parameter spaces arise in various settings. In many of these problems, the observations available to the statistician can be modelled as arising from the noisy realization of the image of a random linear operator; an important special case is random design regression. We derive sharp rates of estimation for arbitrary compact elliptical parameter sets and demons…

    Submitted 22 March, 2023; originally announced March 2023.

    Comments: 53 pages, 2 figures

  6. arXiv:2303.02534  [pdf, other]

    math.ST cs.LG stat.ME stat.ML

    Semi-parametric inference based on adaptively collected data

    Authors: Licong Lin, Koulik Khamaru, Martin J. Wainwright

    Abstract: Many standard estimators, when applied to adaptively collected data, fail to be asymptotically normal, thereby complicating the construction of confidence intervals. We address this challenge in a semi-parametric context: estimating the parameter vector of a generalized linear regression model contaminated by a non-parametric nuisance component. We construct suitably weighted estimating equations…

    Submitted 4 March, 2023; originally announced March 2023.

  7. arXiv:2301.06240  [pdf, other]

    math.ST stat.ME stat.ML

    Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency

    Authors: Wenlong Mou, Peng Ding, Martin J. Wainwright, Peter L. Bartlett

    Abstract: We study optimal procedures for estimating a linear functional based on observational data. In many problems of this kind, a widely used assumption is strict overlap, i.e., uniform boundedness of the importance ratio, which measures how well the observational data covers the directions of interest. When it is violated, the classical semi-parametric efficiency bound can easily become infinite, so t…

    Submitted 15 January, 2023; originally announced January 2023.

  8. arXiv:2211.03899  [pdf, other]

    stat.ML cs.LG math.ST

    Policy evaluation from a single path: Multi-step methods, mixing and mis-specification

    Authors: Yaqi Duan, Martin J. Wainwright

    Abstract: We study non-parametric estimation of the value function of an infinite-horizon $γ$-discounted Markov reward process (MRP) using observations from a single trajectory. We provide non-asymptotic guarantees for a general family of kernel-based multi-step temporal difference (TD) estimates, including canonical $K$-step look-ahead TD for $K = 1, 2, \ldots$ and the TD$(λ)$ family for $λ\in [0,1)$ as sp…

    Submitted 7 November, 2022; originally announced November 2022.

  9. arXiv:2210.11377  [pdf, other]

    stat.ML cs.LG math.OC math.ST

    Krylov-Bellman boosting: Super-linear policy evaluation in general state spaces

    Authors: Eric Xia, Martin J. Wainwright

    Abstract: We present and analyze the Krylov-Bellman Boosting (KBB) algorithm for policy evaluation in general state spaces. It alternates between fitting the Bellman residual using non-parametric regression (as in boosting), and estimating the value function via the least-squares temporal difference (LSTD) procedure applied with a feature set that grows adaptively over time. By exploiting the connection to…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: 40 pages, 7 figures

  10. arXiv:2209.13075  [pdf, other]

    math.ST cs.IT stat.ML

    Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency

    Authors: Wenlong Mou, Martin J. Wainwright, Peter L. Bartlett

    Abstract: The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures. We analyze a broad class of two-stage procedures that first estimate the treatment effect function, and then use this quantity to estimate the linear functional. We prove non-asymptotic upper bounds on the mean-squared error of such procedures: these bounds re…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: 56 pages, 6 figures

  11. arXiv:2205.02986  [pdf, other]

    math.ST cs.LG stat.ML

    Optimally tackling covariate shift in RKHS-based nonparametric regression

    Authors: Cong Ma, Reese Pathak, Martin J. Wainwright

    Abstract: We study the covariate shift problem in the context of nonparametric regression over a reproducing kernel Hilbert space (RKHS). We focus on two natural families of covariate shift problems defined using the likelihood ratios between the source and target distributions. When the likelihood ratios are uniformly bounded, we prove that the kernel ridge regression (KRR) estimator with a carefully chose…

    Submitted 6 June, 2023; v1 submitted 5 May, 2022; originally announced May 2022.

    Comments: to appear in the Annals of Statistics

  12. arXiv:2202.02837  [pdf, other]

    math.ST cs.LG stat.ML

    A new similarity measure for covariate shift with applications to nonparametric regression

    Authors: Reese Pathak, Cong Ma, Martin J. Wainwright

    Abstract: We study covariate shift in the context of nonparametric regression. We introduce a new measure of distribution mismatch between the source and target distributions that is based on the integrated ratio of probabilities of balls at a given radius. We use the scaling of this measure with respect to the radius to characterize the minimax rate of estimation over a family of Hölder continuous function…

    Submitted 6 February, 2022; originally announced February 2022.

    Comments: 22 pages, 2 figures, 1 table

  13. arXiv:2201.08518  [pdf, ps, other]

    math.ST cs.LG math.OC stat.ML

    Optimal variance-reduced stochastic approximation in Banach spaces

    Authors: Wenlong Mou, Koulik Khamaru, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan

    Abstract: We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space. Focusing on a stochastic query model that provides noisy evaluations of the operator, we analyze a variance-reduced stochastic approximation scheme, and establish non-asymptotic bounds for both the operator defect and the estimation error, measured in an arbitrary semi-norm. In contras…

    Submitted 29 November, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

  14. arXiv:2112.12770  [pdf, ps, other]

    math.OC cs.LG math.PR math.ST stat.ML

    Optimal and instance-dependent guarantees for Markovian linear stochastic approximation

    Authors: Wenlong Mou, Ashwin Pananjady, Martin J. Wainwright, Peter L. Bartlett

    Abstract: We study stochastic approximation procedures for approximately solving a $d$-dimensional linear fixed point equation based on observing a trajectory of length $n$ from an ergodic Markov chain. We first exhibit a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme, where $t_{\mathrm{mix}}$ is a mixing time. We then prove a…

    Submitted 11 May, 2024; v1 submitted 23 December, 2021; originally announced December 2021.

    Comments: Published at Mathematical Statistics and Learning

  15. arXiv:2109.12002  [pdf, other]

    stat.ML cs.LG math.ST

    Optimal policy evaluation using kernel-based temporal difference methods

    Authors: Yaqi Duan, Mengdi Wang, Martin J. Wainwright

    Abstract: We study methods based on reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process (MRP). We study a regularized form of the kernel least-squares temporal difference (LSTD) estimate; in the population limit of infinite data, it corresponds to the fixed point of a projected Bellman operator defined by the associated reproducing kern…

    Submitted 24 September, 2021; originally announced September 2021.

  16. arXiv:2107.02266  [pdf, other]

    math.ST cs.LG stat.ML

    Near-optimal inference in adaptive linear regression

    Authors: Koulik Khamaru, Yash Deshpande, Tor Lattimore, Lester Mackey, Martin J. Wainwright

    Abstract: When data is collected in an adaptive manner, even simple methods like ordinary least squares can exhibit non-normal asymptotic behavior. As an undesirable consequence, hypothesis tests and confidence intervals based on asymptotic normality can lead to erroneous results. We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation. Our pr…

    Submitted 21 March, 2023; v1 submitted 5 July, 2021; originally announced July 2021.

    Comments: 51 pages, 7 figures

  17. arXiv:2101.07781  [pdf, other]

    stat.ML cs.LG math.ST

    Minimax Off-Policy Evaluation for Multi-Armed Bandits

    Authors: Cong Ma, Banghua Zhu, Jiantao Jiao, Martin J. Wainwright

    Abstract: We study the problem of off-policy evaluation in the multi-armed bandit model with bounded rewards, and develop minimax rate-optimal procedures under three settings. First, when the behavior policy is known, we show that the Switch estimator, a method that alternates between the plug-in and importance sampling estimators, is minimax rate-optimal for all sample sizes. Second, when the behavior poli…

    Submitted 19 January, 2021; originally announced January 2021.

  18. arXiv:2012.05299  [pdf, other]

    cs.LG math.OC math.ST stat.ML

    Optimal oracle inequalities for solving projected fixed-point equations

    Authors: Wenlong Mou, Ashwin Pananjady, Martin J. Wainwright

    Abstract: Linear fixed point equations in Hilbert spaces arise in a variety of settings, including reinforcement learning, and computational methods for solving differential and integral equations. We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space. First, we prove an instance-dependent upper…

    Submitted 9 December, 2020; originally announced December 2020.

  19. arXiv:2006.10189  [pdf, other]

    cs.LG cs.IT math.ST stat.ML

    Revisiting minimum description length complexity in overparameterized models

    Authors: Raaz Dwivedi, Chandan Singh, Bin Yu, Martin J. Wainwright

    Abstract: Complexity is a fundamental concept underlying statistical learning theory that aims to inform generalization performance. Parameter count, while successful in low-dimensional settings, is not well-justified for overparameterized settings when the number of parameters is more than the number of training samples. We revisit complexity measures based on Rissanen's principle of minimum description le…

    Submitted 12 October, 2023; v1 submitted 17 June, 2020; originally announced June 2020.

    Comments: First two authors contributed equally

  20. arXiv:2005.11411  [pdf, other]

    cs.LG math.ST stat.ML

    Instability, Computational Efficiency and Statistical Accuracy

    Authors: Nhat Ho, Koulik Khamaru, Raaz Dwivedi, Martin J. Wainwright, Michael I. Jordan, Bin Yu

    Abstract: Many statistical estimators are defined as the fixed point of a data-dependent operator, with estimators based on minimizing a cost function being an important special case. The limiting performance of such estimators depends on the properties of the population-level operator in the idealized limit of infinitely many samples. We develop a general framework that yields bounds on statistical accurac…

    Submitted 20 March, 2022; v1 submitted 22 May, 2020; originally announced May 2020.

    Comments: 68 pages, 6 Figures, 2 Tables. First three authors contributed equally

  21. arXiv:2005.05238  [pdf, other]

    cs.LG math.OC stat.ML

    FedSplit: An algorithmic framework for fast federated optimization

    Authors: Reese Pathak, Martin J. Wainwright

    Abstract: Motivated by federated learning, we consider the hub-and-spoke model of distributed optimization in which a central authority coordinates the computation of a solution among many agents while limiting communication. We first study some past procedures for federated optimization, and show that their fixed points need not correspond to stationary points of the original optimization problem, even in…

    Submitted 11 May, 2020; originally announced May 2020.

    Comments: 27 pages, 4 figures

  22. arXiv:2005.03725  [pdf, other]

    math.ST cs.LG stat.ML

    Lower bounds in multiple testing: A framework based on derandomized proxies

    Authors: Max Rabinovich, Michael I. Jordan, Martin J. Wainwright

    Abstract: The large bulk of work in multiple testing has focused on specifying procedures that control the false discovery rate (FDR), with relatively less attention being paid to the corresponding Type II error known as the false non-discovery rate (FNR). A line of more recent work in multiple testing has begun to investigate the tradeoffs between the FDR and FNR and to provide lower bounds on the performa…

    Submitted 7 May, 2020; originally announced May 2020.

  23. arXiv:2004.04719  [pdf, ps, other]

    stat.ML cs.LG math.OC math.ST

    On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and Non-Asymptotic Concentration

    Authors: Wenlong Mou, Chris Junchi Li, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan

    Abstract: We undertake a precise study of the asymptotic and non-asymptotic properties of stochastic approximation procedures with Polyak-Ruppert averaging for solving a linear system $\bar{A} θ= \bar{b}$. When the matrix $\bar{A}$ is Hurwitz, we prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity. The CLT characterizes the exact asym…

    Submitted 9 April, 2020; originally announced April 2020.

  24. arXiv:2003.07337  [pdf, other]

    stat.ML cs.LG math.OC

    Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis

    Authors: Koulik Khamaru, Ashwin Pananjady, Feng Ruan, Martin J. Wainwright, Michael I. Jordan

    Abstract: We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$-error under a generative model. We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms. Theory-inspired simulations s…

    Submitted 16 March, 2020; originally announced March 2020.

    Comments: 38 pages, 3 figures

  25. arXiv:1912.05153  [pdf, other]

    stat.ML cs.DS cs.LG math.PR stat.CO

    Sampling for Bayesian Mixture Models: MCMC with Polynomial-Time Mixing

    Authors: Wenlong Mou, Nhat Ho, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan

    Abstract: We study the problem of sampling from the power posterior distribution in Bayesian Gaussian mixture models, a robust version of the classical posterior. This power posterior is known to be non-log-concave and multi-modal, which leads to exponential mixing times for some standard MCMC algorithms. We introduce and study the Reflected Metropolis-Hastings Random Walk (RMRW) algorithm for sampling. For…

    Submitted 11 December, 2019; originally announced December 2019.

  26. arXiv:1909.08749  [pdf, other]

    stat.ML cs.LG math.OC math.PR math.ST

    Instance-dependent $\ell_\infty$-bounds for policy evaluation in tabular reinforcement learning

    Authors: Ashwin Pananjady, Martin J. Wainwright

    Abstract: Markov reward processes (MRPs) are used to model stochastic phenomena arising in operations research, control engineering, robotics, and artificial intelligence, as well as communication and transportation networks. In many of these cases, such as in the policy evaluation problem encountered in reinforcement learning, the goal is to estimate the long-term value function of such a process without a…

    Submitted 15 September, 2020; v1 submitted 18 September, 2019; originally announced September 2019.

    Comments: Version v2 is consistent with manuscript to appear in IEEE Transactions on Information Theory

  27. arXiv:1909.00966  [pdf, ps, other]

    math.ST

    A Diffusion Process Perspective on Posterior Contraction Rates for Parameters

    Authors: Wenlong Mou, Nhat Ho, Martin J. Wainwright, Peter Bartlett, Michael I. Jordan

    Abstract: We analyze the posterior contraction rates of parameters in Bayesian models via the Langevin diffusion process, in particular by controlling moments of the stochastic process and taking limits. Analogous to the non-asymptotic analysis of statistical M-estimators and stochastic optimization algorithms, our contraction rates depend on the structure of the population log-likelihood function, and stoc…

    Submitted 16 August, 2022; v1 submitted 3 September, 2019; originally announced September 2019.

    Comments: 81 pages

  28. arXiv:1908.10859  [pdf, ps, other]

    stat.ML cs.DS cs.LG math.OC stat.CO

    High-Order Langevin Diffusion Yields an Accelerated MCMC Algorithm

    Authors: Wenlong Mou, Yi-An Ma, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan

    Abstract: We propose a Markov chain Monte Carlo (MCMC) algorithm based on third-order Langevin dynamics for sampling from distributions with log-concave and smooth densities. The higher-order dynamics allow for more flexible discretization schemes, and we develop a specific method that combines splitting with more accurate integration. For a broad class of $d$-dimensional distributions arising from generali…

    Submitted 26 May, 2020; v1 submitted 28 August, 2019; originally announced August 2019.

    Comments: Changes from v1: improved algorithm with $O (d^{1/4} / \varepsilon^{1/2})$ mixing time

  29. arXiv:1907.11331  [pdf, other]

    math.PR math.ST stat.CO stat.ML

    Improved Bounds for Discretization of Langevin Diffusions: Near-Optimal Rates without Convexity

    Authors: Wenlong Mou, Nicolas Flammarion, Martin J. Wainwright, Peter L. Bartlett

    Abstract: We present an improved analysis of the Euler-Maruyama discretization of the Langevin diffusion. Our analysis does not require global contractivity, and yields polynomial dependence on the time horizon. Compared to existing approaches, we make an additional smoothness assumption, and improve the existing rate from $O(η)$ to $O(η^2)$ in terms of the KL divergence. This result matches the correct ord…

    Submitted 4 November, 2019; v1 submitted 25 July, 2019; originally announced July 2019.

    Comments: Changes from v1: corrections in the proof of Lemma 6 and Lemma 10; fixed some minor typos

  30. arXiv:1906.04697  [pdf, other]

    cs.LG math.OC stat.ML

    Variance-reduced $Q$-learning is minimax optimal

    Authors: Martin J. Wainwright

    Abstract: We introduce and analyze a form of variance-reduced $Q$-learning. For $γ$-discounted MDPs with finite state space $\mathcal{X}$ and action space $\mathcal{U}$, we prove that it yields an $ε$-accurate estimate of the optimal $Q$-function in the $\ell_\infty$-norm using $\mathcal{O} \left(\left(\frac{D}{ ε^2 (1-γ)^3} \right) \; \log \left( \frac{D}{(1-γ)} \right) \right)$ samples, where…

    Submitted 8 August, 2019; v1 submitted 11 June, 2019; originally announced June 2019.

    Comments: Update from v1: new Proposition 1 on minimax optimality; updated referencing and discussion of related work

  31. arXiv:1905.06265  [pdf, other]

    cs.LG math.OC stat.ML

    Stochastic approximation with cone-contractive operators: Sharp $\ell_\infty$-bounds for $Q$-learning

    Authors: Martin J. Wainwright

    Abstract: Motivated by the study of $Q$-learning algorithms in reinforcement learning, we study a class of stochastic approximation procedures based on operators that satisfy monotonicity and quasi-contractivity conditions with respect to an underlying cone. We prove a general sandwich relation on the iterate error at each time, and use it to derive non-asymptotic bounds on the error in terms of a cone-indu…

    Submitted 24 June, 2019; v1 submitted 15 May, 2019; originally announced May 2019.

    Comments: Changes from v1: -- Part of Lemma 1 was incorrect; corrected -- proof of Lemma 2: fixed minor typo in equation (36)

  32. arXiv:1904.02144  [pdf, other]

    cs.LG cs.CR math.OC stat.ML

    HopSkipJumpAttack: A Query-Efficient Decision-Based Attack

    Authors: Jianbo Chen, Michael I. Jordan, Martin J. Wainwright

    Abstract: The goal of a decision-based adversarial attack on a trained model is to generate adversarial examples based solely on observing output labels returned by the targeted model. We develop HopSkipJumpAttack, a family of algorithms based on a novel estimate of the gradient direction using binary information at the decision boundary. The proposed family includes both untargeted and targeted attacks opt…

    Submitted 27 April, 2020; v1 submitted 3 April, 2019; originally announced April 2019.

  33. arXiv:1902.00194  [pdf, other]

    math.ST cs.LG stat.ML

    Sharp Analysis of Expectation-Maximization for Weakly Identifiable Models

    Authors: Raaz Dwivedi, Nhat Ho, Koulik Khamaru, Martin J. Wainwright, Michael I. Jordan, Bin Yu

    Abstract: We study a class of weakly identifiable location-scale mixture models for which the maximum likelihood estimates based on $n$ i.i.d. samples are known to have lower accuracy than the classical $n^{- \frac{1}{2}}$ error. We investigate whether the Expectation-Maximization (EM) algorithm also converges slowly for these models. We provide a rigorous characterization of EM for fitting a weakly identif…

    Submitted 15 November, 2021; v1 submitted 1 February, 2019; originally announced February 2019.

    Comments: 30 pages, 4 figures. The first three authors contributed equally to this work. To appear in AISTATS 2020

  34. arXiv:1812.08305  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems

    Authors: Dhruv Malik, Ashwin Pananjady, Kush Bhatia, Koulik Khamaru, Peter L. Bartlett, Martin J. Wainwright

    Abstract: We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of these methods when applied to linear-quadratic systems, and study various settings of driving noise and reward feedback. We show that these methods provably converge to within any pre-specified tolerance of the optimal policy with a number of zero-order eva…

    Submitted 18 May, 2020; v1 submitted 19 December, 2018; originally announced December 2018.

    Comments: Version v3 consistent with paper appearing in JMLR

  35. arXiv:1810.00828  [pdf, other]

    math.ST stat.ML

    Singularity, Misspecification, and the Convergence Rate of EM

    Authors: Raaz Dwivedi, Nhat Ho, Koulik Khamaru, Michael I. Jordan, Martin J. Wainwright, Bin Yu

    Abstract: A line of recent work has analyzed the behavior of the Expectation-Maximization (EM) algorithm in the well-specified setting, in which the population likelihood is locally strongly concave around its maximizing argument. Examples include suitably separated Gaussian mixture models and mixtures of linear regressions. We consider over-specified settings in which the number of fitted components is lar…

    Submitted 28 April, 2020; v1 submitted 1 October, 2018; originally announced October 2018.

    Comments: 63 pages, 12 figures. The first three authors contributed equally to this work. To appear in Annals of Statistics

    MSC Class: Primary 62F15; 62G05; secondary 62G20

  36. arXiv:1806.09544  [pdf, other]

    stat.ML cs.IT cs.LG math.ST

    Towards Optimal Estimation of Bivariate Isotonic Matrices with Unknown Permutations

    Authors: Cheng Mao, Ashwin Pananjady, Martin J. Wainwright

    Abstract: Many applications, including rank aggregation, crowd-labeling, and graphon estimation, can be modeled in terms of a bivariate isotonic matrix with unknown permutations acting on its rows and/or columns. We consider the problem of estimating an unknown matrix in this class, based on noisy observations of (possibly, a subset of) its entries. We design and analyze polynomial-time algorithms that impr…

    Submitted 26 October, 2019; v1 submitted 25 June, 2018; originally announced June 2018.

    Comments: 60 pages, 1 figure. This paper is a longer version of the paper arXiv:1802.09963 v3, which appeared in part as a 4-page extended abstract at Conference on Learning Theory (COLT) 2018. This paper studies the problem in more general settings and in another error metric. This version corrects a statement in Theorem 2 of v1

  37. arXiv:1804.09629  [pdf, other]

    stat.ML cs.LG math.OC

    Convergence guarantees for a class of non-convex and non-smooth optimization problems

    Authors: Koulik Khamaru, Martin J. Wainwright

    Abstract: We consider the problem of finding critical points of functions that are non-convex and non-smooth. Studying a fairly broad class of such problems, we analyze the behavior of three gradient-based methods (gradient descent, proximal update, and Frank-Wolfe update). For each of these methods, we establish rates of convergence for general problems, and also prove faster rates for continuous sub-analy…

    Submitted 25 April, 2018; originally announced April 2018.

    Comments: 50 pages, 2 figures

  38. arXiv:1803.07763  [pdf, other]

    math.ST cs.IT

    From Gauss to Kolmogorov: Localized Measures of Complexity for Ellipses

    Authors: Yuting Wei, Billy Fang, Martin J. Wainwright

    Abstract: The Gaussian width is a fundamental quantity in probability, statistics and geometry, known to underlie the intrinsic difficulty of estimation and hypothesis testing. In this work, we show how the Gaussian width, when localized to any given point of an ellipse, can be controlled by the Kolmogorov width of a set similarly localized. This connection leads to an explicit characterization of the estim…

    Submitted 21 March, 2018; originally announced March 2018.

  39. arXiv:1802.09963  [pdf, other]

    stat.ML cs.IT cs.LG math.ST

    Breaking the $1/\sqrt{n}$ Barrier: Faster Rates for Permutation-based Models in Polynomial Time

    Authors: Cheng Mao, Ashwin Pananjady, Martin J. Wainwright

    Abstract: Many applications, including rank aggregation and crowd-labeling, can be modeled in terms of a bivariate isotonic matrix with unknown permutations acting on its rows and columns. We consider the problem of estimating such a matrix based on noisy observations of a subset of its entries, and design and analyze a polynomial-time algorithm that improves upon the state of the art. In particular, our re…

    Submitted 5 June, 2018; v1 submitted 27 February, 2018; originally announced February 2018.

    Comments: 30 pages, 1 figure. Accepted for presentation at Conference on Learning Theory (COLT) 2018

  40. arXiv:1712.00711  [pdf, other]

    math.ST cs.IT

    The local geometry of testing in ellipses: Tight control via localized Kolmogorov widths

    Authors: Yuting Wei, Martin J. Wainwright

    Abstract: We study the local geometry of testing a mean vector within a high-dimensional ellipse against a compound alternative. Given samples of a Gaussian random vector, the goal is to distinguish whether the mean is equal to a known vector within an ellipse, or equal to some other unknown vector in the ellipse. Such ellipse testing problems lie at the heart of several applications, including non-parametr…

    Submitted 3 January, 2018; v1 submitted 3 December, 2017; originally announced December 2017.

  41. arXiv:1710.00499  [pdf, other]

    stat.ME cs.LG math.ST stat.ML

    Online control of the false discovery rate with decaying memory

    Authors: Aaditya Ramdas, Fanny Yang, Martin J. Wainwright, Michael I. Jordan

    Abstract: In the online multiple testing problem, p-values corresponding to different null hypotheses are observed one by one, and the decision of whether or not to reject the current hypothesis must be made immediately, after which the next p-value is observed. Alpha-investing algorithms to control the false discovery rate (FDR), formulated by Foster and Stine, have been generalized and applied to many set…

    Submitted 2 October, 2017; originally announced October 2017.

    Comments: 20 pages, 4 figures. Published in the proceedings of NIPS 2017

  42. arXiv:1709.10250  [pdf, other]

    stat.ME cs.LG math.ST stat.ML

    DAGGER: A sequential algorithm for FDR control on DAGs

    Authors: Aaditya Ramdas, Jianbo Chen, Martin J. Wainwright, Michael I. Jordan

    Abstract: We propose a linear-time, single-pass, top-down algorithm for multiple testing on directed acyclic graphs (DAGs), where nodes represent hypotheses and edges specify a partial ordering in which hypotheses must be tested. The procedure is guaranteed to reject a sub-DAG with bounded false discovery rate (FDR) while satisfying the logical constraint that a rejected node's parents must also be rejected…

    Submitted 4 December, 2018; v1 submitted 29 September, 2017; originally announced September 2017.

    Comments: 29 pages, 10 figures, accepted for publication by Biometrika

  43. arXiv:1705.05391  [pdf, other]

    math.ST stat.AP stat.ME

    Optimal Rates and Tradeoffs in Multiple Testing

    Authors: Maxim Rabinovich, Aaditya Ramdas, Michael I. Jordan, Martin J. Wainwright

    Abstract: Multiple hypothesis testing is a central topic in statistics, but despite abundant work on the false discovery rate (FDR) and the corresponding Type-II error concept known as the false non-discovery rate (FNR), a fine-grained understanding of the fundamental limits of multiple testing has not been developed. Our main contribution is to derive a precise non-asymptotic tradeoff between FNR and FDR f…

    Submitted 15 May, 2017; originally announced May 2017.

  44. arXiv:1704.07461  [pdf, other]

    stat.ML cs.IT math.ST

    Denoising Linear Models with Permuted Data

    Authors: Ashwin Pananjady, Martin J. Wainwright, Thomas A. Courtade

    Abstract: The multivariate linear regression model with shuffled data and additive Gaussian noise arises in various correspondence estimation and matching problems. Focusing on the denoising aspect of this problem, we provide a characterization of the minimax error rate that is sharp up to logarithmic factors. We also analyze the performance of two versions of a computationally efficient estimator, and establi…

    Submitted 24 April, 2017; originally announced April 2017.

    Comments: To appear in part at ISIT 2017, Aachen

  45. arXiv:1703.06810  [pdf, other]

    math.ST cs.IT

    The geometry of hypothesis testing over convex cones: Generalized likelihood tests and minimax radii

    Authors: Yuting Wei, Martin J. Wainwright, Adityanand Guntuboyina

    Abstract: We consider a compound testing problem within the Gaussian sequence model in which the null and alternative are specified by a pair of closed, convex cones. Such cone testing problems arise in various applications, including detection of treatment effects, trend detection in econometrics, signal detection in radar processing, and shape-constrained inference in non-parametric statistics. We provide…

    Submitted 26 March, 2018; v1 submitted 20 March, 2017; originally announced March 2017.

  46. arXiv:1703.06222  [pdf]

    stat.ME math.ST stat.ML

    A unified treatment of multiple testing with prior knowledge using the p-filter

    Authors: Aaditya Ramdas, Rina Foygel Barber, Martin J. Wainwright, Michael I. Jordan

    Abstract: There is a significant literature on methods for incorporating knowledge into multiple testing procedures so as to improve their power and precision. Some common forms of prior knowledge include (a) beliefs about which hypotheses are null, modeled by non-uniform prior weights; (b) differing importances of hypotheses, modeled by differing penalties for false discoveries; (c) multiple arbitrary part…

    Submitted 6 August, 2019; v1 submitted 17 March, 2017; originally announced March 2017.

    Comments: 36 pages, 1 figure, accepted for publication at the Annals of Statistics

  47. arXiv:1609.00978  [pdf, ps, other]

    stat.ML cs.LG math.OC

    Local Maxima in the Likelihood of Gaussian Mixture Models: Structural Results and Algorithmic Consequences

    Authors: Chi Jin, Yuchen Zhang, Sivaraman Balakrishnan, Martin J. Wainwright, Michael Jordan

    Abstract: We provide two fundamental results on the population (infinite-sample) likelihood function of Gaussian mixture models with $M \geq 3$ components. Our first main result shows that the population likelihood function has bad local maxima even in the special case of equally-weighted mixtures of well-separated and spherical Gaussians. We prove that the log-likelihood value of these bad local maxima can…

    Submitted 4 September, 2016; originally announced September 2016.

    Comments: Neural Information Processing Systems (NIPS) 2016

  48. arXiv:1608.02902  [pdf, other]

    math.ST cs.IT stat.ML

    Linear Regression with an Unknown Permutation: Statistical and Computational Limits

    Authors: Ashwin Pananjady, Martin J. Wainwright, Thomas A. Courtade

    Abstract: Consider a noisy linear observation model with an unknown permutation, based on observing $y = Π^* A x^* + w$, where $x^* \in \mathbb{R}^d$ is an unknown vector, $Π^*$ is an unknown $n \times n$ permutation matrix, and $w \in \mathbb{R}^n$ is additive Gaussian noise. We analyze the problem of permutation recovery in a random design setting in which the entries of the matrix $A$ are drawn i.i.d. fr…

    Submitted 9 August, 2016; originally announced August 2016.

    Comments: To appear in part at the 2016 Allerton Conference on Control, Communication and Computing

  49. arXiv:1605.02077  [pdf, other]

    math.ST cs.LG math.PR

    Function-Specific Mixing Times and Concentration Away from Equilibrium

    Authors: Maxim Rabinovich, Aaditya Ramdas, Michael I. Jordan, Martin J. Wainwright

    Abstract: Slow mixing is the central hurdle when working with Markov chains, especially those used for Monte Carlo approximations (MCMC). In many applications, it is only of interest to estimate the stationary expectations of a small set of functions, and so the usual definition of mixing based on total variation convergence may be too conservative. Accordingly, we introduce function-specific analogs of mix…

    Submitted 30 September, 2016; v1 submitted 6 May, 2016; originally announced May 2016.

    MSC Class: Markov chains (60J10); Markov processes: estimation (62M05); Markov processes: hypothesis testing (62M02)

  50. arXiv:1512.08269  [pdf, other]

    stat.ML cs.IT math.ST

    Statistical and Computational Guarantees for the Baum-Welch Algorithm

    Authors: Fanny Yang, Sivaraman Balakrishnan, Martin J. Wainwright

    Abstract: The Hidden Markov Model (HMM) is one of the mainstays of statistical modeling of discrete time series, with applications including speech recognition, computational biology, computer vision and econometrics. Estimating an HMM from its observation process is often addressed via the Baum-Welch algorithm, which is known to be susceptible to local optima. In this paper, we first give a general charact…

    Submitted 27 December, 2015; originally announced December 2015.