
Showing 1–50 of 58 results for author: Mackey, L

Searching in archive cs.
  1. arXiv:2404.12290  [pdf, other]

    stat.ML cs.LG stat.CO stat.ME

    Debiased Distribution Compression

    Authors: Lingxiao Li, Raaz Dwivedi, Lester Mackey

    Abstract: Modern compression methods can summarize a target distribution $\mathbb{P}$ more succinctly than i.i.d. sampling but require access to a low-bias input sequence like a Markov chain converging quickly to $\mathbb{P}$. We introduce a new suite of compression methods suitable for compression with biased input sequences. Given $n$ points targeting the wrong distribution and quadratic time, Stein kerne…

    Submitted 26 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted to ICML 2024

  2. arXiv:2311.17179  [pdf, other]

    cs.CV cs.AI cs.CY cs.LG

    SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery

    Authors: Konstantin Klemmer, Esther Rolf, Caleb Robinson, Lester Mackey, Marc Rußwurm

    Abstract: Geographic information is essential for modeling tasks in fields ranging from ecology to epidemiology. However, extracting relevant location characteristics for a given task can be challenging, often requiring expensive data fusion or distillation from massive global imagery datasets. To address this challenge, we introduce Satellite Contrastive Location-Image Pretraining (SatCLIP). This global, g…

    Submitted 12 April, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

  3. arXiv:2310.02304  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation

    Authors: Eric Zelikman, Eliana Lorch, Lester Mackey, Adam Tauman Kalai

    Abstract: Several recent advances in AI systems (e.g., Tree-of-Thoughts and Program-Aided Language Models) solve problems by providing a "scaffolding" program that structures multiple calls to language models to generate better outputs. A scaffolding program is written in a programming language such as Python. In this work, we use a language-model-infused scaffolding program to improve itself. We start with…

    Submitted 1 March, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

  4. arXiv:2307.08774  [pdf, other]

    cs.AI

    Reflections from the Workshop on AI-Assisted Decision Making for Conservation

    Authors: Lily Xu, Esther Rolf, Sara Beery, Joseph R. Bennett, Tanya Berger-Wolf, Tanya Birch, Elizabeth Bondi-Kelly, Justin Brashares, Melissa Chapman, Anthony Corso, Andrew Davies, Nikhil Garg, Angela Gaylard, Robert Heilmayr, Hannah Kerner, Konstantin Klemmer, Vipin Kumar, Lester Mackey, Claire Monteleoni, Paul Moorcroft, Jonathan Palmer, Andrew Perrault, David Thau, Milind Tambe

    Abstract: In this white paper, we synthesize key points made during presentations and discussions from the AI-Assisted Decision Making for Conservation workshop, hosted by the Center for Research on Computation and Society at Harvard University on October 20-21, 2022. We identify key open research questions in resource allocation, planning, and interventions for biodiversity conservation, highlighting conse…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: Co-authored by participants from the October 2022 workshop: https://crcs.seas.harvard.edu/conservation-workshop

  5. arXiv:2306.11839  [pdf, other]

    stat.ME cs.LG stat.AP stat.ML

    Should I Stop or Should I Go: Early Stopping with Heterogeneous Populations

    Authors: Hammaad Adam, Fan Yin, Huibin Hu, Neil Tenenholtz, Lorin Crawford, Lester Mackey, Allison Koenecke

    Abstract: Randomized experiments often need to be stopped prematurely due to the treatment having an unintended harmful effect. Existing methods that determine when to stop an experiment early are typically applied to the data in aggregate and do not account for treatment effect heterogeneity. In this paper, we study the early stopping of experiments for harm on heterogeneous populations. We first establish…

    Submitted 27 October, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 (spotlight)

  6. arXiv:2305.18248  [pdf, other]

    cs.CL cs.AI

    Do Language Models Know When They're Hallucinating References?

    Authors: Ayush Agrawal, Mirac Suzgun, Lester Mackey, Adam Tauman Kalai

    Abstract: State-of-the-art language models (LMs) are notoriously susceptible to generating hallucinated information. Such inaccurate outputs not only undermine the reliability of these models but also limit their use and raise serious concerns about misinformation and propaganda. In this work, we focus on hallucinated book and article references and present them as the "model organism" of language model hal…

    Submitted 20 March, 2024; v1 submitted 29 May, 2023; originally announced May 2023.

  7. arXiv:2305.14943  [pdf, other]

    stat.ML cs.LG stat.ME

    Learning Rate Free Sampling in Constrained Domains

    Authors: Louis Sharrock, Lester Mackey, Christopher Nemeth

    Abstract: We introduce a suite of new particle-based algorithms for sampling in constrained domains which are entirely learning rate free. Our approach leverages coin betting ideas from convex optimisation, and the viewpoint of constrained sampling as a mirrored optimisation problem on the space of probability measures. Based on this viewpoint, we also introduce a unifying framework for several existing con…

    Submitted 26 December, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted at NeurIPS 2023

  8. arXiv:2301.05974  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    Compress Then Test: Powerful Kernel Testing in Near-linear Time

    Authors: Carles Domingo-Enrich, Raaz Dwivedi, Lester Mackey

    Abstract: Kernel two-sample testing provides a powerful framework for distinguishing any pair of distributions based on $n$ sample points. However, existing kernel tests either run in $n^2$ time or sacrifice undue power to improve runtime. To address these shortcomings, we introduce Compress Then Test (CTT), a new framework for high-powered kernel testing based on sample compression. CTT cheaply approximate…

    Submitted 23 February, 2023; v1 submitted 14 January, 2023; originally announced January 2023.

    Comments: Accepted as a paper at AISTATS 2023

  9. arXiv:2211.09721  [pdf, ps, other]

    cs.LG stat.ML

    A Finite-Particle Convergence Rate for Stein Variational Gradient Descent

    Authors: Jiaxin Shi, Lester Mackey

    Abstract: We provide the first finite-particle convergence rate for Stein variational gradient descent (SVGD), a popular algorithm for approximating a probability distribution with a collection of particles. Specifically, whenever the target distribution is sub-Gaussian with a Lipschitz score, SVGD with n particles and an appropriate step size sequence drives the kernel Stein discrepancy to zero at an order…

    Submitted 1 November, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: NeurIPS 2023

  10. arXiv:2211.05408  [pdf, other]

    stat.ML cs.LG stat.CO

    Controlling Moments with Kernel Stein Discrepancies

    Authors: Heishiro Kanagawa, Alessandro Barp, Arthur Gretton, Lester Mackey

    Abstract: Kernel Stein discrepancies (KSDs) measure the quality of a distributional approximation and can be computed even when the target density has an intractable normalizing constant. Notable applications include the diagnosis of approximate MCMC samplers and goodness-of-fit tests for unnormalized statistical models. The present work analyzes the convergence control properties of KSDs. We first show tha…

    Submitted 25 June, 2024; v1 submitted 10 November, 2022; originally announced November 2022.

    Comments: 103 pages, 10 figures

  11. arXiv:2210.13630  [pdf, other]

    cs.LG cs.IT

    Budget-Constrained Bounds for Mini-Batch Estimation of Optimal Transport

    Authors: David Alvarez-Melis, Nicolò Fusi, Lester Mackey, Tal Wagner

    Abstract: Optimal Transport (OT) is a fundamental tool for comparing probability distributions, but its exact computation remains prohibitive for large datasets. In this work, we introduce novel families of upper and lower bounds for the OT problem constructed by aggregating solutions of mini-batch OT problems. The upper bound family contains traditional mini-batch averaging at one extreme and a tight bound…

    Submitted 24 October, 2022; originally announced October 2022.

  12. arXiv:2209.12835  [pdf, ps, other]

    stat.ML cs.LG math.ST

    Targeted Separation and Convergence with Kernel Discrepancies

    Authors: Alessandro Barp, Carl-Johann Simon-Gabriel, Mark Girolami, Lester Mackey

    Abstract: Maximum mean discrepancies (MMDs) like the kernel Stein discrepancy (KSD) have grown central to a wide range of applications, including hypothesis testing, sampler selection, distribution approximation, and variational inference. In each setting, these kernel-based discrepancy measures are required to (i) separate a target P from other probability measures or even (ii) control weak convergence to…

    Submitted 6 December, 2023; v1 submitted 26 September, 2022; originally announced September 2022.

  13. arXiv:2209.10666  [pdf, other]

    cs.LG physics.ao-ph stat.ML

    Adaptive Bias Correction for Improved Subseasonal Forecasting

    Authors: Soukayna Mouatadid, Paulo Orenstein, Genevieve Flaspohler, Judah Cohen, Miruna Oprescu, Ernest Fraenkel, Lester Mackey

    Abstract: Subseasonal forecasting -- predicting temperature and precipitation 2 to 6 weeks ahead -- is critical for effective water allocation, wildfire management, and drought and flood mitigation. Recent international research efforts have advanced the subseasonal capabilities of operational dynamical models, yet temperature and precipitation prediction skills remain poor, partly due to stubborn errors in…

    Submitted 15 May, 2023; v1 submitted 21 September, 2022; originally announced September 2022.

  14. arXiv:2204.01668  [pdf, other]

    stat.CO cs.LG stat.ME stat.ML

    Scalable Spike-and-Slab

    Authors: Niloy Biswas, Lester Mackey, Xiao-Li Meng

    Abstract: Spike-and-slab priors are commonly used for Bayesian variable selection, due to their interpretability and favorable statistical properties. However, existing samplers for spike-and-slab posteriors incur prohibitive computational costs when the number of variables is large. In this article, we propose Scalable Spike-and-Slab ($S^3$), a scalable Gibbs sampling implementation for high-dimensional Ba…

    Submitted 25 June, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

    Comments: Accepted to ICML 2022. Open-source software in Python and R available at https://github.com/niloyb/ScaleSpikeSlab

  15. arXiv:2202.09497  [pdf, other]

    stat.ML cs.LG

    Gradient Estimation with Discrete Stein Operators

    Authors: Jiaxin Shi, Yuhao Zhou, Jessica Hwang, Michalis K. Titsias, Lester Mackey

    Abstract: Gradient estimation -- approximating the gradient of an expectation with respect to the parameters of a distribution -- is central to the solution of many machine learning problems. However, when the distribution is discrete, most common gradient estimators suffer from excessive variance. To improve the quality of gradient estimation, we introduce a variance reduction technique based on Stein oper…

    Submitted 14 April, 2024; v1 submitted 18 February, 2022; originally announced February 2022.

    Comments: NeurIPS 2022. Source code: https://github.com/thjashin/rodeo

  16. arXiv:2112.03152  [pdf, other]

    stat.CO cs.LG stat.ME stat.ML

    Bounding Wasserstein distance with couplings

    Authors: Niloy Biswas, Lester Mackey

    Abstract: Markov chain Monte Carlo (MCMC) provides asymptotically consistent estimates of intractable posterior expectations as the number of iterations tends to infinity. However, in large data applications, MCMC can be computationally expensive per iteration. This has catalyzed interest in approximating MCMC in a manner that improves computational speed per iteration but does not produce asymptotically co…

    Submitted 2 November, 2023; v1 submitted 6 December, 2021; originally announced December 2021.

    Comments: 52 pages, 8 figures

  17. arXiv:2111.07941  [pdf, other]

    stat.ML cs.DS cs.LG math.ST stat.ME

    Distribution Compression in Near-linear Time

    Authors: Abhishek Shetty, Raaz Dwivedi, Lester Mackey

    Abstract: In distribution compression, one aims to accurately summarize a probability distribution $\mathbb{P}$ using a small number of representative points. Near-optimal thinning procedures achieve this goal by sampling $n$ points from a Markov chain and identifying $\sqrt{n}$ points with $\widetilde{\mathcal{O}}(1/\sqrt{n})$ discrepancy to $\mathbb{P}$. Unfortunately, these algorithms suffer from quadrat…

    Submitted 17 October, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

    Comments: Accepted to ICLR 2022; An outdated proof of Theorem 2 was previously included in the appendix; this oversight is corrected in this version

  18. arXiv:2110.01593  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    Generalized Kernel Thinning

    Authors: Raaz Dwivedi, Lester Mackey

    Abstract: The kernel thinning (KT) algorithm of Dwivedi and Mackey (2021) compresses a probability distribution more effectively than independent sampling by targeting a reproducing kernel Hilbert space (RKHS) and leveraging a less smooth square-root kernel. Here we provide four improvements. First, we show that KT applied directly to the target RKHS yields tighter, dimension-free guarantees for any kernel,…

    Submitted 19 July, 2022; v1 submitted 4 October, 2021; originally announced October 2021.

    Comments: Published in ICLR 2022

  19. arXiv:2109.10399  [pdf, other]

    physics.ao-ph cs.LG stat.ML

    SubseasonalClimateUSA: A Dataset for Subseasonal Forecasting and Benchmarking

    Authors: Soukayna Mouatadid, Paulo Orenstein, Genevieve Flaspohler, Miruna Oprescu, Judah Cohen, Franklyn Wang, Sean Knight, Maria Geogdzhayeva, Sam Levang, Ernest Fraenkel, Lester Mackey

    Abstract: Subseasonal forecasting of the weather two to six weeks in advance is critical for resource allocation and advance disaster notice but poses many challenges for the forecasting community. At this forecast horizon, physics-based dynamical models have limited skill, and the targets for prediction depend in a complex manner on both local weather variables and global climate variables. Recently, machi…

    Submitted 16 January, 2024; v1 submitted 21 September, 2021; originally announced September 2021.

  20. Social Norm Bias: Residual Harms of Fairness-Aware Algorithms

    Authors: Myra Cheng, Maria De-Arteaga, Lester Mackey, Adam Tauman Kalai

    Abstract: Many modern machine learning algorithms mitigate bias by enforcing fairness constraints across coarsely-defined groups related to a sensitive attribute like gender or race. However, these algorithms seldom account for within-group heterogeneity and biases that may disproportionately affect some members of a group. In this work, we characterize Social Norm Bias (SNoB), a subtle but consequential ty…

    Submitted 10 August, 2022; v1 submitted 25 August, 2021; originally announced August 2021.

    Comments: Spotlighted at the 2021 ICML Machine Learning for Data Workshop and presented at the 2021 ICML Socially Responsible Machine Learning Workshop

    Report number: Data Min Knowl Disc (2023)

  21. arXiv:2107.02266  [pdf, other]

    math.ST cs.LG stat.ML

    Near-optimal inference in adaptive linear regression

    Authors: Koulik Khamaru, Yash Deshpande, Tor Lattimore, Lester Mackey, Martin J. Wainwright

    Abstract: When data is collected in an adaptive manner, even simple methods like ordinary least squares can exhibit non-normal asymptotic behavior. As an undesirable consequence, hypothesis tests and confidence intervals based on asymptotic normality can lead to erroneous results. We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation. Our pr…

    Submitted 21 March, 2023; v1 submitted 5 July, 2021; originally announced July 2021.

    Comments: 51 pages, 7 figures

  22. arXiv:2106.12506  [pdf, other]

    stat.ML cs.LG

    Sampling with Mirrored Stein Operators

    Authors: Jiaxin Shi, Chang Liu, Lester Mackey

    Abstract: We introduce a new family of particle evolution samplers suitable for constrained domains and non-Euclidean geometries. Stein Variational Mirror Descent and Mirrored Stein Variational Gradient Descent minimize the Kullback-Leibler (KL) divergence to constrained target distributions by evolving particles in a dual space defined by a mirror map. Stein Variational Natural Gradient exploits non-Euclid…

    Submitted 24 April, 2022; v1 submitted 23 June, 2021; originally announced June 2021.

    Comments: ICLR 2022; Source code: https://github.com/thjashin/mirror-stein-samplers

  23. arXiv:2106.06885  [pdf, other]

    cs.LG stat.ML

    Online Learning with Optimism and Delay

    Authors: Genevieve Flaspohler, Francesco Orabona, Judah Cohen, Soukayna Mouatadid, Miruna Oprescu, Paulo Orenstein, Lester Mackey

    Abstract: Inspired by the demands of real-time climate and weather forecasting, we develop optimistic online learning algorithms that require no parameter tuning and have optimal regret guarantees under delayed feedback. Our algorithms -- DORM, DORM+, and AdaHedgeD -- arise from a novel reduction of delayed online learning to optimistic online learning that reveals how optimistic hints can mitigate the regr…

    Submitted 12 July, 2021; v1 submitted 12 June, 2021; originally announced June 2021.

    Comments: ICML 2021. 9 pages of main paper and 26 pages of appendix text

  24. arXiv:2105.05842  [pdf, other]

    stat.ML cs.LG math.ST stat.CO stat.ME

    Kernel Thinning

    Authors: Raaz Dwivedi, Lester Mackey

    Abstract: We introduce kernel thinning, a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel $\mathbf{k}_{\star}$ and $O(n^2)$ time, kernel thinning compresses an $n$-point approximation to $\mathbb{P}$ into a $\sqrt{n}$-point approximation with comparable worst-case integration error across the associated…

    Submitted 11 May, 2024; v1 submitted 12 May, 2021; originally announced May 2021.

    Comments: Accepted for presentation as an extended abstract at the Conference on Learning Theory (COLT) 2021, and published in the Journal of Machine Learning Research (JMLR) 2024

  25. arXiv:2105.01029  [pdf, other]

    stat.ML cs.AI cs.CL cs.CV cs.LG

    Initialization and Regularization of Factorized Neural Layers

    Authors: Mikhail Khodak, Neil Tenenholtz, Lester Mackey, Nicolò Fusi

    Abstract: Factorized layers--operations parameterized by products of two or more matrices--occur in a variety of deep learning contexts, including compressed model training, certain types of knowledge distillation, and multi-head self-attention architectures. We study how to initialize and regularize deep nets containing such layers, examining two simple, understudied schemes, spectral initialization and Fr…

    Submitted 4 October, 2022; v1 submitted 3 May, 2021; originally announced May 2021.

    Comments: ICLR 2021 camera-ready, amended due to error pointed out in arXiv:2209.13569v1 (amendment shown in blue)

  26. arXiv:2104.09732  [pdf, other]

    stat.ML cs.LG

    Knowledge Distillation as Semiparametric Inference

    Authors: Tri Dao, Govinda M Kamath, Vasilis Syrgkanis, Lester Mackey

    Abstract: A popular approach to model compression is to train an inexpensive student model to mimic the class probabilities of a highly accurate but cumbersome teacher model. Surprisingly, this two-step knowledge distillation process often leads to higher accuracy than training the student directly on labeled data. To explain and enhance this phenomenon, we cast knowledge distillation as a semiparametric in…

    Submitted 19 April, 2021; originally announced April 2021.

  27. arXiv:2010.10218  [pdf, other]

    cs.LG cs.AI stat.ML

    Model-specific Data Subsampling with Influence Functions

    Authors: Anant Raj, Cameron Musco, Lester Mackey, Nicolo Fusi

    Abstract: Model selection requires repeatedly evaluating models on a given dataset and measuring their relative performances. In modern applications of machine learning, the models being considered are increasingly more expensive to evaluate and the datasets of interest are increasing in size. As a result, the process of model selection is time-consuming and computationally inefficient. In this work, we dev…

    Submitted 20 October, 2020; originally announced October 2020.

  28. arXiv:2007.12671  [pdf, other]

    stat.ML cs.LG math.ST

    Cross-validation Confidence Intervals for Test Error

    Authors: Pierre Bayle, Alexandre Bayle, Lucas Janson, Lester Mackey

    Abstract: This work develops central limit theorems for cross-validation and consistent estimators of its asymptotic variance under weak stability conditions on the learning algorithm. Together, these results provide practical, asymptotically-exact confidence intervals for $k$-fold test error and valid, powerful hypothesis tests of whether one learning algorithm has smaller $k$-fold test error than another.…

    Submitted 31 October, 2020; v1 submitted 24 July, 2020; originally announced July 2020.

    Comments: 34th Conference on Neural Information Processing Systems (NeurIPS 2020); 40 pages, 15 figures

  29. arXiv:2007.02857  [pdf, other]

    stat.ML cs.LG math.PR stat.ME

    Stochastic Stein Discrepancies

    Authors: Jackson Gorham, Anant Raj, Lester Mackey

    Abstract: Stein discrepancies (SDs) monitor convergence and non-convergence in approximate inference when exact integration and sampling are intractable. However, the computation of a Stein discrepancy can be prohibitive if the Stein operator - often a sum over likelihood terms or potentials - is expensive to evaluate. To address this deficiency, we show that stochastic Stein discrepancies (SSDs) based on s…

    Submitted 22 October, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

  30. arXiv:2006.09268  [pdf, ps, other]

    cs.LG math.PR math.ST stat.ML

    Metrizing Weak Convergence with Maximum Mean Discrepancies

    Authors: Carl-Johann Simon-Gabriel, Alessandro Barp, Bernhard Schölkopf, Lester Mackey

    Abstract: This paper characterizes the maximum mean discrepancies (MMD) that metrize the weak convergence of probability measures for a wide class of kernels. More precisely, we prove that, on a locally compact, non-compact, Hausdorff space, the MMD of a bounded continuous Borel measurable kernel k, whose reproducing kernel Hilbert space (RKHS) functions vanish at infinity, metrizes the weak convergence of…

    Submitted 3 September, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: 14 pages. Corrects in particular Thm.12 of Simon-Gabriel and Schölkopf, JMLR, 19(44):1-29, 2018. See http://jmlr.org/papers/v19/16-291.html

    MSC Class: 60B10 (Primary) 60F05; 60-08; 28-08 (Secondary)

    ACM Class: G.3; I.2.6; I.5.0

  31. arXiv:2006.07201  [pdf, other]

    econ.EM cs.LG math.ST stat.ML

    Minimax Estimation of Conditional Moment Models

    Authors: Nishanth Dikkala, Greg Lewis, Lester Mackey, Vasilis Syrgkanis

    Abstract: We develop an approach for estimating models described via conditional moment restrictions, with a prototypical application being non-parametric instrumental variable regression. We introduce a min-max criterion function, under which the estimation problem can be thought of as solving a zero-sum game between a modeler who is optimizing over the hypothesis space of the target model and an adversary…

    Submitted 12 June, 2020; originally announced June 2020.

  32. arXiv:2003.09465  [pdf, other]

    stat.ML cs.LG

    Weighted Meta-Learning

    Authors: Diana Cai, Rishit Sheth, Lester Mackey, Nicolo Fusi

    Abstract: Meta-learning leverages related source tasks to learn an initialization that can be quickly fine-tuned to a target task with limited labeled examples. However, many popular meta-learning algorithms, such as model-agnostic meta-learning (MAML), only assume access to the target samples for fine-tuning. In this work, we provide a general framework for meta-learning based on weighting the loss of diff…

    Submitted 20 March, 2020; originally announced March 2020.

    Comments: 18 pages, 7 figures

  33. arXiv:2003.00617  [pdf, other]

    stat.ML cs.LG

    Approximate Cross-validation: Guarantees for Model Assessment and Selection

    Authors: Ashia Wilson, Maximilian Kasy, Lester Mackey

    Abstract: Cross-validation (CV) is a popular approach for assessing and selecting predictive models. However, when the number of folds is large, CV suffers from a need to repeatedly refit a learning procedure on a large number of training datasets. Recent work in empirical risk minimization (ERM) approximates the expensive refitting with a single Newton step warm-started from the full training set optimizer…

    Submitted 10 June, 2020; v1 submitted 1 March, 2020; originally announced March 2020.

  34. arXiv:1911.01575  [pdf, other]

    math.OC cs.LG stat.ML

    Importance Sampling via Local Sensitivity

    Authors: Anant Raj, Cameron Musco, Lester Mackey

    Abstract: Given a loss function $F:\mathcal{X} \rightarrow \R^+$ that can be written as the sum of losses over a large set of inputs $a_1,\ldots, a_n$, it is often desirable to approximate $F$ by subsampling the input points. Strong theoretical guarantees require taking into account the importance of each point, measured by how much its individual loss contributes to $F(x)$. Maximizing this importance over…

    Submitted 19 March, 2020; v1 submitted 3 November, 2019; originally announced November 2019.

  35. arXiv:1908.02341  [pdf, other]

    stat.ML cs.LG

    Single Point Transductive Prediction

    Authors: Nilesh Tripuraneni, Lester Mackey

    Abstract: Standard methods in supervised learning separate training and prediction: the model is fit independently of any test points it may encounter. However, can knowledge of the next test point $\mathbf{x}_{\star}$ be exploited to improve prediction accuracy? We address this question in the context of linear prediction, showing how techniques from semi-parametric inference can be used transductively to…

    Submitted 29 June, 2020; v1 submitted 6 August, 2019; originally announced August 2019.

    Comments: 37th International Conference on Machine Learning (ICML 2020)

  36. A Kernel Stein Test for Comparing Latent Variable Models

    Authors: Heishiro Kanagawa, Wittawat Jitkrittum, Lester Mackey, Kenji Fukumizu, Arthur Gretton

    Abstract: We propose a kernel-based nonparametric test of relative goodness of fit, where the goal is to compare two models, both of which may have unobserved latent variables, such that the marginal distribution of the observed variables is intractable. The proposed test generalizes the recently proposed kernel Stein discrepancy (KSD) tests (Liu et al., 2016, Chwialkowski et al., 2016, Yang et al., 2018) t…

    Submitted 9 May, 2023; v1 submitted 1 July, 2019; originally announced July 2019.

    Comments: This is a pre-copyedited, author-produced version of an article accepted for publication in The Journal of the Royal Statistical Society Series: B following peer review

  37. arXiv:1906.08283  [pdf, other]

    math.ST cs.LG stat.ME stat.ML

    Minimum Stein Discrepancy Estimators

    Authors: Alessandro Barp, Francois-Xavier Briol, Andrew B. Duncan, Mark Girolami, Lester Mackey

    Abstract: When maximum likelihood estimation is infeasible, one often turns to score matching, contrastive divergence, or minimum probability flow to obtain tractable parameter estimates. We provide a unifying perspective of these techniques as minimum Stein discrepancy estimators, and use this lens to design new diffusion kernel Stein discrepancy (DKSD) and diffusion score matching (DSM) estimators with co…

    Submitted 5 October, 2022; v1 submitted 19 June, 2019; originally announced June 2019.

    Comments: Accepted for publication at NeurIPS 2019

  38. arXiv:1906.07868  [pdf, other]

    stat.ML cs.LG

    Stochastic Runge-Kutta Accelerates Langevin Monte Carlo and Beyond

    Authors: Xuechen Li, Denny Wu, Lester Mackey, Murat A. Erdogdu

    Abstract: Sampling with Markov chain Monte Carlo methods often amounts to discretizing some continuous-time dynamics with numerical integration. In this paper, we establish the convergence rate of sampling algorithms obtained by discretizing smooth Itô diffusions exhibiting fast Wasserstein-$2$ contraction, based on local deviation properties of the integration scheme. In particular, we study a sampling alg…

    Submitted 1 February, 2020; v1 submitted 18 June, 2019; originally announced June 2019.

    Comments: 56 pages; update acknowledgements

  39. arXiv:1812.02271  [pdf, other]

    cs.LG stat.ML

    Teacher-Student Compression with Generative Adversarial Networks

    Authors: Ruishan Liu, Nicolo Fusi, Lester Mackey

    Abstract: More accurate machine learning models often demand more computation and memory at test time, making them difficult to deploy on CPU- or memory-constrained devices. Teacher-student compression (TSC), also known as distillation, alleviates this burden by training a less expensive student model to mimic the expensive teacher model while maintaining most of the original accuracy. However, when fresh d…

    Submitted 20 March, 2020; v1 submitted 5 December, 2018; originally announced December 2018.

  40. arXiv:1810.12361  [pdf, other]

    stat.ML cs.LG stat.CO

    Global Non-convex Optimization with Discretized Diffusions

    Authors: Murat A. Erdogdu, Lester Mackey, Ohad Shamir

    Abstract: An Euler discretization of the Langevin diffusion is known to converge to the global minimizers of certain convex and non-convex optimization problems. We show that this property holds for any suitably smooth diffusion and that different diffusions are suitable for optimizing different classes of convex and non-convex functions. This allows us to design diffusions suitable for globally optimizing…

    Submitted 27 December, 2019; v1 submitted 29 October, 2018; originally announced October 2018.

    Comments: 19 pages, NeurIPS 2018 camera ready version

  41. arXiv:1809.07394  [pdf, other]

    stat.AP cs.CY stat.ML

    Improving Subseasonal Forecasting in the Western U.S. with Machine Learning

    Authors: Jessica Hwang, Paulo Orenstein, Judah Cohen, Karl Pfeiffer, Lester Mackey

    Abstract: Water managers in the western United States (U.S.) rely on longterm forecasts of temperature and precipitation to prepare for droughts and other wet weather extremes. To improve the accuracy of these longterm forecasts, the U.S. Bureau of Reclamation and the National Oceanic and Atmospheric Administration (NOAA) launched the Subseasonal Climate Forecast Rodeo, a year-long real-time forecasting cha…

    Submitted 22 May, 2019; v1 submitted 19 September, 2018; originally announced September 2018.

  42. arXiv:1806.07788  [pdf, other]

    stat.ML cs.LG

    Random Feature Stein Discrepancies

    Authors: Jonathan H. Huggins, Lester Mackey

    Abstract: Computable Stein discrepancies have been deployed for a variety of applications, ranging from sampler selection in posterior inference to approximate Bayesian inference to goodness-of-fit testing. Existing convergence-determining Stein discrepancies admit strong theoretical guarantees but suffer from a computational cost that grows quadratically in the sample size. While linear-time Stein discrepa…

    Submitted 9 October, 2021; v1 submitted 20 June, 2018; originally announced June 2018.

    Comments: In Proceedings of the 32nd Annual Conference on Neural Information Processing Systems (NeurIPS 2018). Code available at: https://bitbucket.org/jhhuggins/random-feature-stein-discrepancies

  43. DeepMiner: Discovering Interpretable Representations for Mammogram Classification and Explanation

    Authors: Jimmy Wu, Bolei Zhou, Diondra Peck, Scott Hsieh, Vandana Dialani, Lester Mackey, Genevieve Patterson

    Abstract: We propose DeepMiner, a framework to discover interpretable representations in deep neural networks and to build explanations for medical predictions. By probing convolutional neural networks (CNNs) trained to classify cancer in mammograms, we show that many individual units in the final convolutional layer of a CNN respond strongly to diseased tissue concepts specified by the BI-RADS lexicon. Aft…

    Submitted 17 August, 2021; v1 submitted 31 May, 2018; originally announced May 2018.

    Comments: Harvard Data Science Review (HDSR), 2021. Code available at https://github.com/jimmyyhwu/ddsm-visual-primitives

  44. arXiv:1803.10161  [pdf, other]

    stat.CO cs.LG stat.ML

    Stein Points

    Authors: Wilson Ye Chen, Lester Mackey, Jackson Gorham, François-Xavier Briol, Chris J. Oates

    Abstract: An important task in computational statistics and machine learning is to approximate a posterior distribution $p(x)$ with an empirical measure supported on a set of representative points $\{x_i\}_{i=1}^n$. This paper focuses on methods where the selection of points is essentially deterministic, with an emphasis on achieving accurate approximation when $n$ is small. To this end, we present `Stein P…

    Submitted 19 June, 2018; v1 submitted 27 March, 2018; originally announced March 2018.

  45. Expert identification of visual primitives used by CNNs during mammogram classification

    Authors: Jimmy Wu, Diondra Peck, Scott Hsieh, Vandana Dialani, Constance D. Lehman, Bolei Zhou, Vasilis Syrgkanis, Lester Mackey, Genevieve Patterson

    Abstract: This work interprets the internal representations of deep neural networks trained for classification of diseased tissue in 2D mammograms. We propose an expert-in-the-loop interpretation method to label the behavior of internal units in convolutional neural networks (CNNs). Expert radiologists identify that the visual patterns detected by the units are correlated with meaningful medical phenomena s…

    Submitted 13 March, 2018; originally announced March 2018.

    Journal ref: Medical Imaging 2018: Computer-Aided Diagnosis, Proc. of SPIE Vol. 10575, 105752T

  46. arXiv:1712.06695  [pdf, other]

    stat.ML cs.LG

    Accurate Inference for Adaptive Linear Models

    Authors: Yash Deshpande, Lester Mackey, Vasilis Syrgkanis, Matt Taddy

    Abstract: Estimators computed from adaptively collected data do not behave like their non-adaptive brethren. Rather, the sequential dependence of the collection policy can lead to severe distributional biases that persist even in the infinite data limit. We develop a general method -- $\mathbf{W}$-decorrelation -- for transforming the bias of adaptive linear regression estimators into variance. The method u…

    Submitted 2 January, 2020; v1 submitted 18 December, 2017; originally announced December 2017.

    Comments: Typos fixed for clarification

  47. arXiv:1711.00342  [pdf, other]

    cs.LG econ.EM math.ST stat.ML

    Orthogonal Machine Learning: Power and Limitations

    Authors: Lester Mackey, Vasilis Syrgkanis, Ilias Zadik

    Abstract: Double machine learning provides $\sqrt{n}$-consistent estimates of parameters of interest even when high-dimensional or nonparametric nuisance parameters are estimated at an $n^{-1/4}$ rate. The key is to employ Neyman-orthogonal moment equations which are first-order insensitive to perturbations in the nuisance parameters. We show that the $n^{-1/4}$ requirement can be improved to…

    Submitted 1 August, 2018; v1 submitted 1 November, 2017; originally announced November 2017.

  48. arXiv:1707.05807  [pdf, other]

    stat.ML cs.LG math.PR stat.ME

    Improving Gibbs Sampler Scan Quality with DoGS

    Authors: Ioannis Mitliagkas, Lester Mackey

    Abstract: The pairwise influence matrix of Dobrushin has long been used as an analytical tool to bound the rate of convergence of Gibbs sampling. In this work, we use Dobrushin influence as the basis of a practical tool to certify and efficiently improve the quality of a discrete Gibbs sampler. Our Dobrushin-optimized Gibbs samplers (DoGS) offer customized variable selection orders for a given sampling budg…

    Submitted 18 July, 2017; originally announced July 2017.

    Comments: ICML 2017

  49. arXiv:1703.01717  [pdf, other]

    stat.ML cs.LG

    Measuring Sample Quality with Kernels

    Authors: Jackson Gorham, Lester Mackey

    Abstract: Approximate Markov chain Monte Carlo (MCMC) offers the promise of more rapid sampling at the cost of more biased inference. Since standard MCMC diagnostics fail to detect these biases, researchers have developed computable Stein discrepancy measures that provably determine the convergence of a sample to its target distribution. This approach was recently combined with the theory of reproducing ker…

    Submitted 14 October, 2020; v1 submitted 5 March, 2017; originally announced March 2017.

  50. arXiv:1611.06972  [pdf, other]

    stat.ML cs.LG math.PR

    Measuring Sample Quality with Diffusions

    Authors: Jackson Gorham, Andrew B. Duncan, Sebastian J. Vollmer, Lester Mackey

    Abstract: Stein's method for measuring convergence to a continuous target distribution relies on an operator characterizing the target and Stein factor bounds on the solutions of an associated differential equation. While such operators and bounds are readily available for a diversity of univariate targets, few multivariate targets have been analyzed. We introduce a new class of characterizing operators bas…

    Submitted 12 November, 2018; v1 submitted 21 November, 2016; originally announced November 2016.

    MSC Class: 60J60; 62-04; 62E17; 60E15; 65C60 (Primary) 62-07; 65C05; 68T05 (Secondary)