Showing 1–50 of 57 results for author: Mackey, L

  1. arXiv:2404.12290  [pdf, other]

    stat.ML cs.LG stat.CO stat.ME

    Debiased Distribution Compression

    Authors: Lingxiao Li, Raaz Dwivedi, Lester Mackey

    Abstract: Modern compression methods can summarize a target distribution $\mathbb{P}$ more succinctly than i.i.d. sampling but require access to a low-bias input sequence like a Markov chain converging quickly to $\mathbb{P}$. We introduce a new suite of compression methods suitable for compression with biased input sequences. Given $n$ points targeting the wrong distribution and quadratic time, Stein kerne… ▽ More

    Submitted 26 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted to ICML 2024

  2. arXiv:2310.02304  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation

    Authors: Eric Zelikman, Eliana Lorch, Lester Mackey, Adam Tauman Kalai

    Abstract: Several recent advances in AI systems (e.g., Tree-of-Thoughts and Program-Aided Language Models) solve problems by providing a "scaffolding" program that structures multiple calls to language models to generate better outputs. A scaffolding program is written in a programming language such as Python. In this work, we use a language-model-infused scaffolding program to improve itself. We start with… ▽ More

    Submitted 1 March, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

  3. arXiv:2306.11839  [pdf, other]

    stat.ME cs.LG stat.AP stat.ML

    Should I Stop or Should I Go: Early Stopping with Heterogeneous Populations

    Authors: Hammaad Adam, Fan Yin, Huibin Hu, Neil Tenenholtz, Lorin Crawford, Lester Mackey, Allison Koenecke

    Abstract: Randomized experiments often need to be stopped prematurely due to the treatment having an unintended harmful effect. Existing methods that determine when to stop an experiment early are typically applied to the data in aggregate and do not account for treatment effect heterogeneity. In this paper, we study the early stopping of experiments for harm on heterogeneous populations. We first establish… ▽ More

    Submitted 27 October, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 (spotlight)

  4. arXiv:2305.14943  [pdf, other]

    stat.ML cs.LG stat.ME

    Learning Rate Free Sampling in Constrained Domains

    Authors: Louis Sharrock, Lester Mackey, Christopher Nemeth

    Abstract: We introduce a suite of new particle-based algorithms for sampling in constrained domains which are entirely learning rate free. Our approach leverages coin betting ideas from convex optimisation, and the viewpoint of constrained sampling as a mirrored optimisation problem on the space of probability measures. Based on this viewpoint, we also introduce a unifying framework for several existing con… ▽ More

    Submitted 26 December, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted at NeurIPS 2023

  5. arXiv:2301.05974  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    Compress Then Test: Powerful Kernel Testing in Near-linear Time

    Authors: Carles Domingo-Enrich, Raaz Dwivedi, Lester Mackey

    Abstract: Kernel two-sample testing provides a powerful framework for distinguishing any pair of distributions based on $n$ sample points. However, existing kernel tests either run in $n^2$ time or sacrifice undue power to improve runtime. To address these shortcomings, we introduce Compress Then Test (CTT), a new framework for high-powered kernel testing based on sample compression. CTT cheaply approximate… ▽ More

    Submitted 23 February, 2023; v1 submitted 14 January, 2023; originally announced January 2023.

    Comments: Accepted as a paper at AISTATS 2023

  6. arXiv:2211.09721  [pdf, ps, other]

    cs.LG stat.ML

    A Finite-Particle Convergence Rate for Stein Variational Gradient Descent

    Authors: Jiaxin Shi, Lester Mackey

    Abstract: We provide the first finite-particle convergence rate for Stein variational gradient descent (SVGD), a popular algorithm for approximating a probability distribution with a collection of particles. Specifically, whenever the target distribution is sub-Gaussian with a Lipschitz score, SVGD with n particles and an appropriate step size sequence drives the kernel Stein discrepancy to zero at an order… ▽ More

    Submitted 1 November, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: NeurIPS 2023

  7. arXiv:2211.05408  [pdf, other]

    stat.ML cs.LG stat.CO

    Controlling Moments with Kernel Stein Discrepancies

    Authors: Heishiro Kanagawa, Alessandro Barp, Arthur Gretton, Lester Mackey

    Abstract: Kernel Stein discrepancies (KSDs) measure the quality of a distributional approximation and can be computed even when the target density has an intractable normalizing constant. Notable applications include the diagnosis of approximate MCMC samplers and goodness-of-fit tests for unnormalized statistical models. The present work analyzes the convergence control properties of KSDs. We first show tha… ▽ More

    Submitted 25 June, 2024; v1 submitted 10 November, 2022; originally announced November 2022.

    Comments: 103 pages, 10 figures

  8. arXiv:2209.12835  [pdf, ps, other]

    stat.ML cs.LG math.ST

    Targeted Separation and Convergence with Kernel Discrepancies

    Authors: Alessandro Barp, Carl-Johann Simon-Gabriel, Mark Girolami, Lester Mackey

    Abstract: Maximum mean discrepancies (MMDs) like the kernel Stein discrepancy (KSD) have grown central to a wide range of applications, including hypothesis testing, sampler selection, distribution approximation, and variational inference. In each setting, these kernel-based discrepancy measures are required to (i) separate a target P from other probability measures or even (ii) control weak convergence to… ▽ More

    Submitted 6 December, 2023; v1 submitted 26 September, 2022; originally announced September 2022.

  9. arXiv:2209.10666  [pdf, other]

    cs.LG physics.ao-ph stat.ML

    Adaptive Bias Correction for Improved Subseasonal Forecasting

    Authors: Soukayna Mouatadid, Paulo Orenstein, Genevieve Flaspohler, Judah Cohen, Miruna Oprescu, Ernest Fraenkel, Lester Mackey

    Abstract: Subseasonal forecasting -- predicting temperature and precipitation 2 to 6 weeks ahead -- is critical for effective water allocation, wildfire management, and drought and flood mitigation. Recent international research efforts have advanced the subseasonal capabilities of operational dynamical models, yet temperature and precipitation prediction skills remain poor, partly due to stubborn errors in… ▽ More

    Submitted 15 May, 2023; v1 submitted 21 September, 2022; originally announced September 2022.

  10. arXiv:2204.01668  [pdf, other]

    stat.CO cs.LG stat.ME stat.ML

    Scalable Spike-and-Slab

    Authors: Niloy Biswas, Lester Mackey, Xiao-Li Meng

    Abstract: Spike-and-slab priors are commonly used for Bayesian variable selection, due to their interpretability and favorable statistical properties. However, existing samplers for spike-and-slab posteriors incur prohibitive computational costs when the number of variables is large. In this article, we propose Scalable Spike-and-Slab ($S^3$), a scalable Gibbs sampling implementation for high-dimensional Ba… ▽ More

    Submitted 25 June, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

    Comments: Accepted to ICML 2022. Open-source software in Python and R available at https://github.com/niloyb/ScaleSpikeSlab

  11. arXiv:2202.09497  [pdf, other]

    stat.ML cs.LG

    Gradient Estimation with Discrete Stein Operators

    Authors: Jiaxin Shi, Yuhao Zhou, Jessica Hwang, Michalis K. Titsias, Lester Mackey

    Abstract: Gradient estimation -- approximating the gradient of an expectation with respect to the parameters of a distribution -- is central to the solution of many machine learning problems. However, when the distribution is discrete, most common gradient estimators suffer from excessive variance. To improve the quality of gradient estimation, we introduce a variance reduction technique based on Stein oper… ▽ More

    Submitted 14 April, 2024; v1 submitted 18 February, 2022; originally announced February 2022.

    Comments: NeurIPS 2022. Source code: https://github.com/thjashin/rodeo

  12. arXiv:2112.03152  [pdf, other]

    stat.CO cs.LG stat.ME stat.ML

    Bounding Wasserstein distance with couplings

    Authors: Niloy Biswas, Lester Mackey

    Abstract: Markov chain Monte Carlo (MCMC) provides asymptotically consistent estimates of intractable posterior expectations as the number of iterations tends to infinity. However, in large data applications, MCMC can be computationally expensive per iteration. This has catalyzed interest in approximating MCMC in a manner that improves computational speed per iteration but does not produce asymptotically co… ▽ More

    Submitted 2 November, 2023; v1 submitted 6 December, 2021; originally announced December 2021.

    Comments: 52 pages, 8 figures

  13. arXiv:2111.07941  [pdf, other]

    stat.ML cs.DS cs.LG math.ST stat.ME

    Distribution Compression in Near-linear Time

    Authors: Abhishek Shetty, Raaz Dwivedi, Lester Mackey

    Abstract: In distribution compression, one aims to accurately summarize a probability distribution $\mathbb{P}$ using a small number of representative points. Near-optimal thinning procedures achieve this goal by sampling $n$ points from a Markov chain and identifying $\sqrt{n}$ points with $\widetilde{\mathcal{O}}(1/\sqrt{n})$ discrepancy to $\mathbb{P}$. Unfortunately, these algorithms suffer from quadrat… ▽ More

    Submitted 17 October, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

    Comments: Accepted to ICLR 2022; An outdated proof of Theorem 2 was previously included in the appendix; this oversight is corrected in this version

  14. arXiv:2110.01593  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    Generalized Kernel Thinning

    Authors: Raaz Dwivedi, Lester Mackey

    Abstract: The kernel thinning (KT) algorithm of Dwivedi and Mackey (2021) compresses a probability distribution more effectively than independent sampling by targeting a reproducing kernel Hilbert space (RKHS) and leveraging a less smooth square-root kernel. Here we provide four improvements. First, we show that KT applied directly to the target RKHS yields tighter, dimension-free guarantees for any kernel,… ▽ More

    Submitted 19 July, 2022; v1 submitted 4 October, 2021; originally announced October 2021.

    Comments: Published in ICLR 2022

  15. arXiv:2109.10399  [pdf, other]

    physics.ao-ph cs.LG stat.ML

    SubseasonalClimateUSA: A Dataset for Subseasonal Forecasting and Benchmarking

    Authors: Soukayna Mouatadid, Paulo Orenstein, Genevieve Flaspohler, Miruna Oprescu, Judah Cohen, Franklyn Wang, Sean Knight, Maria Geogdzhayeva, Sam Levang, Ernest Fraenkel, Lester Mackey

    Abstract: Subseasonal forecasting of the weather two to six weeks in advance is critical for resource allocation and advance disaster notice but poses many challenges for the forecasting community. At this forecast horizon, physics-based dynamical models have limited skill, and the targets for prediction depend in a complex manner on both local weather variables and global climate variables. Recently, machi… ▽ More

    Submitted 16 January, 2024; v1 submitted 21 September, 2021; originally announced September 2021.

  16. arXiv:2107.02266  [pdf, other]

    math.ST cs.LG stat.ML

    Near-optimal inference in adaptive linear regression

    Authors: Koulik Khamaru, Yash Deshpande, Tor Lattimore, Lester Mackey, Martin J. Wainwright

    Abstract: When data is collected in an adaptive manner, even simple methods like ordinary least squares can exhibit non-normal asymptotic behavior. As an undesirable consequence, hypothesis tests and confidence intervals based on asymptotic normality can lead to erroneous results. We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation. Our pr… ▽ More

    Submitted 21 March, 2023; v1 submitted 5 July, 2021; originally announced July 2021.

    Comments: 51 pages, 7 figures

  17. arXiv:2106.12506  [pdf, other]

    stat.ML cs.LG

    Sampling with Mirrored Stein Operators

    Authors: Jiaxin Shi, Chang Liu, Lester Mackey

    Abstract: We introduce a new family of particle evolution samplers suitable for constrained domains and non-Euclidean geometries. Stein Variational Mirror Descent and Mirrored Stein Variational Gradient Descent minimize the Kullback-Leibler (KL) divergence to constrained target distributions by evolving particles in a dual space defined by a mirror map. Stein Variational Natural Gradient exploits non-Euclid… ▽ More

    Submitted 24 April, 2022; v1 submitted 23 June, 2021; originally announced June 2021.

    Comments: ICLR 2022; Source code: https://github.com/thjashin/mirror-stein-samplers

  18. arXiv:2106.06885  [pdf, other]

    cs.LG stat.ML

    Online Learning with Optimism and Delay

    Authors: Genevieve Flaspohler, Francesco Orabona, Judah Cohen, Soukayna Mouatadid, Miruna Oprescu, Paulo Orenstein, Lester Mackey

    Abstract: Inspired by the demands of real-time climate and weather forecasting, we develop optimistic online learning algorithms that require no parameter tuning and have optimal regret guarantees under delayed feedback. Our algorithms -- DORM, DORM+, and AdaHedgeD -- arise from a novel reduction of delayed online learning to optimistic online learning that reveals how optimistic hints can mitigate the regr… ▽ More

    Submitted 12 July, 2021; v1 submitted 12 June, 2021; originally announced June 2021.

    Comments: ICML 2021. 9 pages of main paper and 26 pages of appendix text

  19. arXiv:2105.05842  [pdf, other]

    stat.ML cs.LG math.ST stat.CO stat.ME

    Kernel Thinning

    Authors: Raaz Dwivedi, Lester Mackey

    Abstract: We introduce kernel thinning, a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel $\mathbf{k}_{\star}$ and $O(n^2)$ time, kernel thinning compresses an $n$-point approximation to $\mathbb{P}$ into a $\sqrt{n}$-point approximation with comparable worst-case integration error across the associated… ▽ More

    Submitted 11 May, 2024; v1 submitted 12 May, 2021; originally announced May 2021.

    Comments: Accepted for presentation as an extended abstract at the Conference on Learning Theory (COLT) 2021, and published in the Journal of Machine Learning Research (JMLR) 2024

  20. arXiv:2105.03481  [pdf, other]

    stat.ME math.ST stat.CO

    Stein's Method Meets Computational Statistics: A Review of Some Recent Developments

    Authors: Andreas Anastasiou, Alessandro Barp, François-Xavier Briol, Bruno Ebner, Robert E. Gaunt, Fatemeh Ghaderinezhad, Jackson Gorham, Arthur Gretton, Christophe Ley, Qiang Liu, Lester Mackey, Chris. J. Oates, Gesine Reinert, Yvik Swan

    Abstract: Stein's method compares probability distributions through the study of a class of linear operators called Stein operators. While mainly studied in probability and used to underpin theoretical statistics, Stein's method has led to significant advances in computational statistics in recent years. The goal of this survey is to bring together some of these recent developments and, in doing so, to stim… ▽ More

    Submitted 22 June, 2022; v1 submitted 7 May, 2021; originally announced May 2021.

    Comments: Accepted for publication by "Statistical Science"

  21. arXiv:2105.01029  [pdf, other]

    stat.ML cs.AI cs.CL cs.CV cs.LG

    Initialization and Regularization of Factorized Neural Layers

    Authors: Mikhail Khodak, Neil Tenenholtz, Lester Mackey, Nicolò Fusi

    Abstract: Factorized layers--operations parameterized by products of two or more matrices--occur in a variety of deep learning contexts, including compressed model training, certain types of knowledge distillation, and multi-head self-attention architectures. We study how to initialize and regularize deep nets containing such layers, examining two simple, understudied schemes, spectral initialization and Fr… ▽ More

    Submitted 4 October, 2022; v1 submitted 3 May, 2021; originally announced May 2021.

    Comments: ICLR 2021 camera-ready, amended due to error pointed out in arXiv:2209.13569v1 (amendment shown in blue)

  22. arXiv:2104.09732  [pdf, other]

    stat.ML cs.LG

    Knowledge Distillation as Semiparametric Inference

    Authors: Tri Dao, Govinda M Kamath, Vasilis Syrgkanis, Lester Mackey

    Abstract: A popular approach to model compression is to train an inexpensive student model to mimic the class probabilities of a highly accurate but cumbersome teacher model. Surprisingly, this two-step knowledge distillation process often leads to higher accuracy than training the student directly on labeled data. To explain and enhance this phenomenon, we cast knowledge distillation as a semiparametric in… ▽ More

    Submitted 19 April, 2021; originally announced April 2021.

  23. arXiv:2010.10218  [pdf, other]

    cs.LG cs.AI stat.ML

    Model-specific Data Subsampling with Influence Functions

    Authors: Anant Raj, Cameron Musco, Lester Mackey, Nicolo Fusi

    Abstract: Model selection requires repeatedly evaluating models on a given dataset and measuring their relative performances. In modern applications of machine learning, the models being considered are increasingly more expensive to evaluate and the datasets of interest are increasing in size. As a result, the process of model selection is time-consuming and computationally inefficient. In this work, we dev… ▽ More

    Submitted 20 October, 2020; originally announced October 2020.

  24. arXiv:2009.10780  [pdf, other]

    stat.ME math.ST stat.ML

    Independent finite approximations for Bayesian nonparametric inference

    Authors: Tin D. Nguyen, Jonathan Huggins, Lorenzo Masoero, Lester Mackey, Tamara Broderick

    Abstract: Completely random measures (CRMs) and their normalizations (NCRMs) offer flexible models in Bayesian nonparametrics. But their infinite dimensionality presents challenges for inference. Two popular finite approximations are truncated finite approximations (TFAs) and independent finite approximations (IFAs). While the former have been well-studied, IFAs lack similarly general bounds on approximatio… ▽ More

    Submitted 5 November, 2023; v1 submitted 22 September, 2020; originally announced September 2020.

    Comments: The paper has been accepted for publication in Bayesian Analysis. Currently, it is posted on Bayesian Analysis Advance Publication

  25. arXiv:2007.12671  [pdf, other]

    stat.ML cs.LG math.ST

    Cross-validation Confidence Intervals for Test Error

    Authors: Pierre Bayle, Alexandre Bayle, Lucas Janson, Lester Mackey

    Abstract: This work develops central limit theorems for cross-validation and consistent estimators of its asymptotic variance under weak stability conditions on the learning algorithm. Together, these results provide practical, asymptotically-exact confidence intervals for $k$-fold test error and valid, powerful hypothesis tests of whether one learning algorithm has smaller $k$-fold test error than another.… ▽ More

    Submitted 31 October, 2020; v1 submitted 24 July, 2020; originally announced July 2020.

    Comments: 34th Conference on Neural Information Processing Systems (NeurIPS 2020); 40 pages, 15 figures

  26. arXiv:2007.02857  [pdf, other]

    stat.ML cs.LG math.PR stat.ME

    Stochastic Stein Discrepancies

    Authors: Jackson Gorham, Anant Raj, Lester Mackey

    Abstract: Stein discrepancies (SDs) monitor convergence and non-convergence in approximate inference when exact integration and sampling are intractable. However, the computation of a Stein discrepancy can be prohibitive if the Stein operator - often a sum over likelihood terms or potentials - is expensive to evaluate. To address this deficiency, we show that stochastic Stein discrepancies (SSDs) based on s… ▽ More

    Submitted 22 October, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

  27. arXiv:2006.09268  [pdf, ps, other]

    cs.LG math.PR math.ST stat.ML

    Metrizing Weak Convergence with Maximum Mean Discrepancies

    Authors: Carl-Johann Simon-Gabriel, Alessandro Barp, Bernhard Schölkopf, Lester Mackey

    Abstract: This paper characterizes the maximum mean discrepancies (MMD) that metrize the weak convergence of probability measures for a wide class of kernels. More precisely, we prove that, on a locally compact, non-compact, Hausdorff space, the MMD of a bounded continuous Borel measurable kernel k, whose reproducing kernel Hilbert space (RKHS) functions vanish at infinity, metrizes the weak convergence of… ▽ More

    Submitted 3 September, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: 14 pages. Corrects in particular Thm.12 of Simon-Gabriel and Schölkopf, JMLR, 19(44):1-29, 2018. See http://jmlr.org/papers/v19/16-291.html

    MSC Class: 60B10 (Primary) 60F05; 60-08; 28-08 (Secondary)

    ACM Class: G.3; I.2.6; I.5.0

  28. arXiv:2006.07201  [pdf, other]

    econ.EM cs.LG math.ST stat.ML

    Minimax Estimation of Conditional Moment Models

    Authors: Nishanth Dikkala, Greg Lewis, Lester Mackey, Vasilis Syrgkanis

    Abstract: We develop an approach for estimating models described via conditional moment restrictions, with a prototypical application being non-parametric instrumental variable regression. We introduce a min-max criterion function, under which the estimation problem can be thought of as solving a zero-sum game between a modeler who is optimizing over the hypothesis space of the target model and an adversary… ▽ More

    Submitted 12 June, 2020; originally announced June 2020.

  29. arXiv:2005.03952  [pdf, other]

    stat.ME math.ST stat.CO stat.ML

    Optimal Thinning of MCMC Output

    Authors: Marina Riabiz, Wilson Chen, Jon Cockayne, Pawel Swietach, Steven A. Niederer, Lester Mackey, Chris. J. Oates

    Abstract: The use of heuristics to assess the convergence and compress the output of Markov chain Monte Carlo can be sub-optimal in terms of the empirical approximations that are produced. Typically a number of the initial states are attributed to "burn in" and removed, whilst the remainder of the chain is "thinned" if compression is also required. In this paper we consider the problem of retrospectively se… ▽ More

    Submitted 11 January, 2022; v1 submitted 8 May, 2020; originally announced May 2020.

    Comments: To appear in the Journal of the Royal Statistical Society, Series B, 2022+

  30. arXiv:2003.09465  [pdf, other]

    stat.ML cs.LG

    Weighted Meta-Learning

    Authors: Diana Cai, Rishit Sheth, Lester Mackey, Nicolo Fusi

    Abstract: Meta-learning leverages related source tasks to learn an initialization that can be quickly fine-tuned to a target task with limited labeled examples. However, many popular meta-learning algorithms, such as model-agnostic meta-learning (MAML), only assume access to the target samples for fine-tuning. In this work, we provide a general framework for meta-learning based on weighting the loss of diff… ▽ More

    Submitted 20 March, 2020; originally announced March 2020.

    Comments: 18 pages, 7 figures

  31. arXiv:2003.00617  [pdf, other]

    stat.ML cs.LG

    Approximate Cross-validation: Guarantees for Model Assessment and Selection

    Authors: Ashia Wilson, Maximilian Kasy, Lester Mackey

    Abstract: Cross-validation (CV) is a popular approach for assessing and selecting predictive models. However, when the number of folds is large, CV suffers from a need to repeatedly refit a learning procedure on a large number of training datasets. Recent work in empirical risk minimization (ERM) approximates the expensive refitting with a single Newton step warm-started from the full training set optimizer… ▽ More

    Submitted 10 June, 2020; v1 submitted 1 March, 2020; originally announced March 2020.

  32. arXiv:1911.01575  [pdf, other]

    math.OC cs.LG stat.ML

    Importance Sampling via Local Sensitivity

    Authors: Anant Raj, Cameron Musco, Lester Mackey

    Abstract: Given a loss function $F:\mathcal{X} \rightarrow \mathbb{R}^+$ that can be written as the sum of losses over a large set of inputs $a_1,\ldots, a_n$, it is often desirable to approximate $F$ by subsampling the input points. Strong theoretical guarantees require taking into account the importance of each point, measured by how much its individual loss contributes to $F(x)$. Maximizing this importance over… ▽ More

    Submitted 19 March, 2020; v1 submitted 3 November, 2019; originally announced November 2019.

  33. arXiv:1908.02341  [pdf, other]

    stat.ML cs.LG

    Single Point Transductive Prediction

    Authors: Nilesh Tripuraneni, Lester Mackey

    Abstract: Standard methods in supervised learning separate training and prediction: the model is fit independently of any test points it may encounter. However, can knowledge of the next test point $\mathbf{x}_{\star}$ be exploited to improve prediction accuracy? We address this question in the context of linear prediction, showing how techniques from semi-parametric inference can be used transductively to… ▽ More

    Submitted 29 June, 2020; v1 submitted 6 August, 2019; originally announced August 2019.

    Comments: 37th International Conference on Machine Learning (ICML 2020)

  34. A Kernel Stein Test for Comparing Latent Variable Models

    Authors: Heishiro Kanagawa, Wittawat Jitkrittum, Lester Mackey, Kenji Fukumizu, Arthur Gretton

    Abstract: We propose a kernel-based nonparametric test of relative goodness of fit, where the goal is to compare two models, both of which may have unobserved latent variables, such that the marginal distribution of the observed variables is intractable. The proposed test generalizes the recently proposed kernel Stein discrepancy (KSD) tests (Liu et al., 2016, Chwialkowski et al., 2016, Yang et al., 2018) t… ▽ More

    Submitted 9 May, 2023; v1 submitted 1 July, 2019; originally announced July 2019.

    Comments: This is a pre-copyedited, author-produced version of an article accepted for publication in The Journal of the Royal Statistical Society Series: B following peer review

  35. arXiv:1906.08283  [pdf, other]

    math.ST cs.LG stat.ME stat.ML

    Minimum Stein Discrepancy Estimators

    Authors: Alessandro Barp, Francois-Xavier Briol, Andrew B. Duncan, Mark Girolami, Lester Mackey

    Abstract: When maximum likelihood estimation is infeasible, one often turns to score matching, contrastive divergence, or minimum probability flow to obtain tractable parameter estimates. We provide a unifying perspective of these techniques as minimum Stein discrepancy estimators, and use this lens to design new diffusion kernel Stein discrepancy (DKSD) and diffusion score matching (DSM) estimators with co… ▽ More

    Submitted 5 October, 2022; v1 submitted 19 June, 2019; originally announced June 2019.

    Comments: Accepted for publication at NeurIPS 2019

  36. arXiv:1906.07868  [pdf, other]

    stat.ML cs.LG

    Stochastic Runge-Kutta Accelerates Langevin Monte Carlo and Beyond

    Authors: Xuechen Li, Denny Wu, Lester Mackey, Murat A. Erdogdu

    Abstract: Sampling with Markov chain Monte Carlo methods often amounts to discretizing some continuous-time dynamics with numerical integration. In this paper, we establish the convergence rate of sampling algorithms obtained by discretizing smooth Itô diffusions exhibiting fast Wasserstein-$2$ contraction, based on local deviation properties of the integration scheme. In particular, we study a sampling alg… ▽ More

    Submitted 1 February, 2020; v1 submitted 18 June, 2019; originally announced June 2019.

    Comments: 56 pages; update acknowledgements

  37. arXiv:1905.03673  [pdf, other]

    stat.CO math.ST stat.ME stat.ML

    Stein Point Markov Chain Monte Carlo

    Authors: Wilson Ye Chen, Alessandro Barp, François-Xavier Briol, Jackson Gorham, Mark Girolami, Lester Mackey, Chris. J. Oates

    Abstract: An important task in machine learning and statistics is the approximation of a probability measure by an empirical measure supported on a discrete point set. Stein Points are a class of algorithms for this task, which proceed by sequentially minimising a Stein discrepancy between the empirical measure and the target and, hence, require the solution of a non-convex optimisation problem to obtain ea… ▽ More

    Submitted 14 September, 2020; v1 submitted 9 May, 2019; originally announced May 2019.

    Comments: Minor bug fixed in Theorem 4 (result unchanged)

    Journal ref: ICML 2019

  38. arXiv:1812.02271  [pdf, other]

    cs.LG stat.ML

    Teacher-Student Compression with Generative Adversarial Networks

    Authors: Ruishan Liu, Nicolo Fusi, Lester Mackey

    Abstract: More accurate machine learning models often demand more computation and memory at test time, making them difficult to deploy on CPU- or memory-constrained devices. Teacher-student compression (TSC), also known as distillation, alleviates this burden by training a less expensive student model to mimic the expensive teacher model while maintaining most of the original accuracy. However, when fresh d… ▽ More

    Submitted 20 March, 2020; v1 submitted 5 December, 2018; originally announced December 2018.

  39. arXiv:1810.12361  [pdf, other]

    stat.ML cs.LG stat.CO

    Global Non-convex Optimization with Discretized Diffusions

    Authors: Murat A. Erdogdu, Lester Mackey, Ohad Shamir

    Abstract: An Euler discretization of the Langevin diffusion is known to converge to the global minimizers of certain convex and non-convex optimization problems. We show that this property holds for any suitably smooth diffusion and that different diffusions are suitable for optimizing different classes of convex and non-convex functions. This allows us to design diffusions suitable for globally optimizing… ▽ More

    Submitted 27 December, 2019; v1 submitted 29 October, 2018; originally announced October 2018.

    Comments: 19 pages, NeurIPS 2018 camera ready version

  40. arXiv:1809.07394  [pdf, other]

    stat.AP cs.CY stat.ML

    Improving Subseasonal Forecasting in the Western U.S. with Machine Learning

    Authors: Jessica Hwang, Paulo Orenstein, Judah Cohen, Karl Pfeiffer, Lester Mackey

    Abstract: Water managers in the western United States (U.S.) rely on longterm forecasts of temperature and precipitation to prepare for droughts and other wet weather extremes. To improve the accuracy of these longterm forecasts, the U.S. Bureau of Reclamation and the National Oceanic and Atmospheric Administration (NOAA) launched the Subseasonal Climate Forecast Rodeo, a year-long real-time forecasting cha… ▽ More

    Submitted 22 May, 2019; v1 submitted 19 September, 2018; originally announced September 2018.

  41. arXiv:1806.07788  [pdf, other]

    stat.ML cs.LG

    Random Feature Stein Discrepancies

    Authors: Jonathan H. Huggins, Lester Mackey

    Abstract: Computable Stein discrepancies have been deployed for a variety of applications, ranging from sampler selection in posterior inference to approximate Bayesian inference to goodness-of-fit testing. Existing convergence-determining Stein discrepancies admit strong theoretical guarantees but suffer from a computational cost that grows quadratically in the sample size. While linear-time Stein discrepa… ▽ More

    Submitted 9 October, 2021; v1 submitted 20 June, 2018; originally announced June 2018.

    Comments: In Proceedings of the 32nd Annual Conference on Neural Information Processing Systems (NeurIPS 2018). Code available at: https://bitbucket.org/jhhuggins/random-feature-stein-discrepancies

  42. arXiv:1803.10161  [pdf, other]

    stat.CO cs.LG stat.ML

    Stein Points

    Authors: Wilson Ye Chen, Lester Mackey, Jackson Gorham, François-Xavier Briol, Chris J. Oates

    Abstract: An important task in computational statistics and machine learning is to approximate a posterior distribution $p(x)$ with an empirical measure supported on a set of representative points $\{x_i\}_{i=1}^n$. This paper focuses on methods where the selection of points is essentially deterministic, with an emphasis on achieving accurate approximation when $n$ is small. To this end, we present `Stein P…

    Submitted 19 June, 2018; v1 submitted 27 March, 2018; originally announced March 2018.

  43. arXiv:1712.06695  [pdf, other]

    stat.ML cs.LG

    Accurate Inference for Adaptive Linear Models

    Authors: Yash Deshpande, Lester Mackey, Vasilis Syrgkanis, Matt Taddy

    Abstract: Estimators computed from adaptively collected data do not behave like their non-adaptive brethren. Rather, the sequential dependence of the collection policy can lead to severe distributional biases that persist even in the infinite data limit. We develop a general method -- $\mathbf{W}$-decorrelation -- for transforming the bias of adaptive linear regression estimators into variance. The method u…

    Submitted 2 January, 2020; v1 submitted 18 December, 2017; originally announced December 2017.

    Comments: Typos fixed for clarification

  44. arXiv:1711.00342  [pdf, other]

    cs.LG econ.EM math.ST stat.ML

    Orthogonal Machine Learning: Power and Limitations

    Authors: Lester Mackey, Vasilis Syrgkanis, Ilias Zadik

    Abstract: Double machine learning provides $\sqrt{n}$-consistent estimates of parameters of interest even when high-dimensional or nonparametric nuisance parameters are estimated at an $n^{-1/4}$ rate. The key is to employ Neyman-orthogonal moment equations which are first-order insensitive to perturbations in the nuisance parameters. We show that the $n^{-1/4}$ requirement can be improved to…

    Submitted 1 August, 2018; v1 submitted 1 November, 2017; originally announced November 2017.

  45. arXiv:1707.05807  [pdf, other]

    stat.ML cs.LG math.PR stat.ME

    Improving Gibbs Sampler Scan Quality with DoGS

    Authors: Ioannis Mitliagkas, Lester Mackey

    Abstract: The pairwise influence matrix of Dobrushin has long been used as an analytical tool to bound the rate of convergence of Gibbs sampling. In this work, we use Dobrushin influence as the basis of a practical tool to certify and efficiently improve the quality of a discrete Gibbs sampler. Our Dobrushin-optimized Gibbs samplers (DoGS) offer customized variable selection orders for a given sampling budg…

    Submitted 18 July, 2017; originally announced July 2017.

    Comments: ICML 2017

  46. arXiv:1703.01717  [pdf, other]

    stat.ML cs.LG

    Measuring Sample Quality with Kernels

    Authors: Jackson Gorham, Lester Mackey

    Abstract: Approximate Markov chain Monte Carlo (MCMC) offers the promise of more rapid sampling at the cost of more biased inference. Since standard MCMC diagnostics fail to detect these biases, researchers have developed computable Stein discrepancy measures that provably determine the convergence of a sample to its target distribution. This approach was recently combined with the theory of reproducing ker…

    Submitted 14 October, 2020; v1 submitted 5 March, 2017; originally announced March 2017.

  47. arXiv:1611.06972  [pdf, other]

    stat.ML cs.LG math.PR

    Measuring Sample Quality with Diffusions

    Authors: Jackson Gorham, Andrew B. Duncan, Sebastian J. Vollmer, Lester Mackey

    Abstract: Stein's method for measuring convergence to a continuous target distribution relies on an operator characterizing the target and Stein factor bounds on the solutions of an associated differential equation. While such operators and bounds are readily available for a diversity of univariate targets, few multivariate targets have been analyzed. We introduce a new class of characterizing operators bas…

    Submitted 12 November, 2018; v1 submitted 21 November, 2016; originally announced November 2016.

    MSC Class: 60J60; 62-04; 62E17; 60E15; 65C60 (Primary) 62-07; 65C05; 68T05 (Secondary)

  48. arXiv:1511.05190  [pdf, other]

    hep-ph physics.data-an stat.ML

    Jet-Images -- Deep Learning Edition

    Authors: Luke de Oliveira, Michael Kagan, Lester Mackey, Benjamin Nachman, Ariel Schwartzman

    Abstract: Building on the notion of a particle physics detector as a camera and the collimated streams of high energy particles, or jets, it measures as an image, we investigate the potential of machine learning techniques based on deep learning architectures to identify highly boosted W bosons. Modern deep learning algorithms trained on jet images can out-perform standard physically-motivated feature drive…

    Submitted 22 January, 2017; v1 submitted 16 November, 2015; originally announced November 2015.

    Comments: 32 pages, 24 figures. Version that is published in JHEP

    Journal ref: JHEP 07 (2016) 069

  49. Fuzzy Jets

    Authors: Lester Mackey, Benjamin Nachman, Ariel Schwartzman, Conrad Stansbury

    Abstract: Collimated streams of particles produced in high energy physics experiments are organized using clustering algorithms to form jets. To construct jets, the experimental collaborations based at the Large Hadron Collider (LHC) primarily use agglomerative hierarchical clustering schemes known as sequential recombination. We propose a new class of algorithms for clustering jets that use infrared and co…

    Submitted 7 September, 2015; originally announced September 2015.

    Journal ref: JHEP 06 (2016) 010

  50. arXiv:1508.01280  [pdf, other]

    stat.ME stat.AP stat.CO

    Empirical Bayesian analysis of simultaneous changepoints in multiple data sequences

    Authors: Zhou Fan, Lester Mackey

    Abstract: Copy number variations in cancer cells and volatility fluctuations in stock prices are commonly manifested as changepoints occurring at the same positions across related data sequences. We introduce a Bayesian modeling framework, BASIC, that employs a changepoint prior to capture the co-occurrence tendency in data of this type. We design efficient algorithms to sample from and maximize over the BA…

    Submitted 13 April, 2017; v1 submitted 6 August, 2015; originally announced August 2015.

    Comments: 31 pages, 11 figures. v3: Modify synthetic data comparisons based on reviewer feedback