Showing 1–23 of 23 results for author: Arbel, M

Searching in archive cs.
  1. arXiv:2403.20233

    stat.ML cs.LG

    Functional Bilevel Optimization for Machine Learning

    Authors: Ieva Petrulionyte, Julien Mairal, Michael Arbel

    Abstract: In this paper, we introduce a new functional point of view on bilevel optimization problems for machine learning, where the inner objective is minimized over a function space. These types of problems are most often solved by using methods developed in the parametric setting, where the inner objective is strongly convex with respect to the parameters of the prediction function. The functional point…

    Submitted 13 June, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

  2. arXiv:2402.13831

    cs.LG cs.SE

    MLXP: A Framework for Conducting Replicable Experiments in Python

    Authors: Michael Arbel, Alexandre Zouaoui

    Abstract: Replicability in machine learning (ML) research is increasingly concerning due to the utilization of complex non-deterministic algorithms and the dependence on numerous hyper-parameter choices, such as model architecture and training datasets. Ensuring reproducible and replicable results is crucial for advancing the field, yet often requires significant technical effort to conduct systematic and w…

    Submitted 17 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  3. arXiv:2402.11305

    cs.CV

    On Good Practices for Task-Specific Distillation of Large Pretrained Visual Models

    Authors: Juliette Marrie, Michael Arbel, Julien Mairal, Diane Larlus

    Abstract: Large pretrained visual models exhibit remarkable generalization across diverse recognition tasks. Yet, real-world applications often demand compact models tailored to specific problems. Variants of knowledge distillation have been devised for such a purpose, enabling task-specific compact models (the students) to learn from a generic large pretrained one (the teacher). In this paper, we show that…

    Submitted 7 May, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

    Journal ref: Published in Transactions on Machine Learning Research (TMLR), 2024

  4. arXiv:2306.09998

    cs.CV cs.LG

    SLACK: Stable Learning of Augmentations with Cold-start and KL regularization

    Authors: Juliette Marrie, Michael Arbel, Diane Larlus, Julien Mairal

    Abstract: Data augmentation is known to improve the generalization capabilities of neural networks, provided that the set of transformations is chosen with care, a selection often performed manually. Automatic data augmentation aims at automating this process. However, most recent approaches still rely on some prior information; they start from a small pool of manually-selected default transformations that…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: Accepted to CVPR 2023

  5. arXiv:2302.02904

    cs.LG math.OC

    Rethinking Gauss-Newton for learning over-parameterized models

    Authors: Michael Arbel, Romain Menegaux, Pierre Wolinski

    Abstract: This work studies the global convergence and implicit bias of the Gauss-Newton (GN) method when optimizing over-parameterized one-hidden-layer networks in the mean-field regime. We first establish a global convergence result for GN in the continuous-time limit exhibiting a faster convergence rate compared to GD due to improved conditioning. We then perform an empirical study on a synthetic regression task…

    Submitted 12 December, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

  6. arXiv:2210.14756

    cs.LG stat.ML

    Maximum Likelihood Learning of Unnormalized Models for Simulation-Based Inference

    Authors: Pierre Glaser, Michael Arbel, Samo Hromadka, Arnaud Doucet, Arthur Gretton

    Abstract: We introduce two synthetic likelihood methods for Simulation-Based Inference (SBI), to conduct either amortized or targeted inference from experimental observations when a high-fidelity simulator is available. Both methods learn a conditional energy-based model (EBM) of the likelihood using synthetic data generated by the simulator, conditioned on parameters drawn from a proposal distribution. The…

    Submitted 18 April, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

  7. arXiv:2201.13117

    stat.ML cond-mat.stat-mech cs.LG hep-lat

    Continual Repeated Annealed Flow Transport Monte Carlo

    Authors: Alexander G. D. G. Matthews, Michael Arbel, Danilo J. Rezende, Arnaud Doucet

    Abstract: We propose Continual Repeated Annealed Flow Transport Monte Carlo (CRAFT), a method that combines a sequential Monte Carlo (SMC) sampler (itself a generalization of Annealed Importance Sampling) with variational inference using normalizing flows. The normalizing flows are directly trained to transport between annealing temperatures using a KL divergence for each transition. This optimization objec…

    Submitted 6 April, 2023; v1 submitted 31 January, 2022; originally announced January 2022.

    Comments: 21 pages, 6 figures. Published at International Conference on Machine Learning (ICML) 2022

  8. arXiv:2111.14580

    math.OC cs.LG

    Amortized Implicit Differentiation for Stochastic Bilevel Optimization

    Authors: Michael Arbel, Julien Mairal

    Abstract: We study a class of algorithms for solving bilevel optimization problems in both stochastic and deterministic settings when the inner-level objective is strongly convex. Specifically, we consider algorithms based on inexact implicit differentiation and we exploit a warm-start strategy to amortize the estimation of the exact gradient. We then introduce a unified theoretical framework inspired by th…

    Submitted 11 July, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

  9. arXiv:2111.02994

    cs.LG

    Towards an Understanding of Default Policies in Multitask Policy Optimization

    Authors: Ted Moskovitz, Michael Arbel, Jack Parker-Holder, Aldo Pacchiano

    Abstract: Much of the recent success of deep reinforcement learning has been driven by regularized policy optimization (RPO) algorithms with strong performance across multiple domains. In this family of methods, agents are trained to maximize cumulative reward while penalizing deviation in behavior from some reference, or default policy. In addition to empirical success, there is a strong theoretical founda…

    Submitted 23 March, 2022; v1 submitted 4 November, 2021; originally announced November 2021.

  10. arXiv:2106.08929

    stat.ML cs.LG

    KALE Flow: A Relaxed KL Gradient Flow for Probabilities with Disjoint Support

    Authors: Pierre Glaser, Michael Arbel, Arthur Gretton

    Abstract: We study the gradient flow for a relaxed approximation to the Kullback-Leibler (KL) divergence between a moving source and a fixed target distribution. This approximation, termed the KALE (KL approximate lower-bound estimator), solves a regularized version of the Fenchel dual problem defining the KL over a restricted class of functions. When using a Reproducing Kernel Hilbert Space (RKHS) to defin…

    Submitted 29 October, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

  11. arXiv:2102.07501

    stat.ML cond-mat.stat-mech cs.LG math.ST

    Annealed Flow Transport Monte Carlo

    Authors: Michael Arbel, Alexander G. D. G. Matthews, Arnaud Doucet

    Abstract: Annealed Importance Sampling (AIS) and its Sequential Monte Carlo (SMC) extensions are state-of-the-art methods for estimating normalizing constants of probability distributions. We propose here a novel Monte Carlo algorithm, Annealed Flow Transport (AFT), that builds upon AIS and SMC and combines them with normalizing flows (NFs) for improved performance. This method transports a set of particles…

    Submitted 9 July, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

  12. arXiv:2102.03765

    cs.LG

    Tactical Optimism and Pessimism for Deep Reinforcement Learning

    Authors: Ted Moskovitz, Jack Parker-Holder, Aldo Pacchiano, Michael Arbel, Michael I. Jordan

    Abstract: In recent years, deep off-policy actor-critic algorithms have become a dominant approach to reinforcement learning for continuous control. One of the primary drivers of this improved performance is the use of pessimistic value updates to address function approximation errors, which previously led to disappointing performance. However, a direct consequence of pessimism is reduced exploration, runni…

    Submitted 6 April, 2022; v1 submitted 7 February, 2021; originally announced February 2021.

  13. arXiv:2101.07528

    cs.CV cs.LG

    The Unreasonable Effectiveness of Patches in Deep Convolutional Kernels Methods

    Authors: Louis Thiry, Michael Arbel, Eugene Belilovsky, Edouard Oyallon

    Abstract: A recent line of work showed that various forms of convolutional kernel methods can be competitive with standard supervised deep convolutional networks on datasets like CIFAR-10, obtaining accuracies in the range of 87-90% while being more amenable to theoretical analysis. In this work, we highlight the importance of a data-dependent feature extraction step that is key to obtaining good performan…

    Submitted 19 January, 2021; originally announced January 2021.

    Journal ref: International Conference on Learning Representations (ICLR 2021), 2021, Vienna (online), Austria

  14. arXiv:2010.05380

    cs.LG

    Efficient Wasserstein Natural Gradients for Reinforcement Learning

    Authors: Ted Moskovitz, Michael Arbel, Ferenc Huszar, Arthur Gretton

    Abstract: A novel optimization approach is proposed for application to policy gradient methods and evolution strategies for reinforcement learning (RL). The procedure uses a computationally efficient Wasserstein natural gradient (WNG) descent that takes advantage of the geometry induced by a Wasserstein penalty to speed optimization. This method follows the recent theme in RL of including a divergence penal…

    Submitted 18 March, 2021; v1 submitted 11 October, 2020; originally announced October 2020.

  15. arXiv:2007.07105

    stat.ML cs.LG

    Estimating Barycenters of Measures in High Dimensions

    Authors: Samuel Cohen, Michael Arbel, Marc Peter Deisenroth

    Abstract: Barycentric averaging is a principled way of summarizing populations of measures. Existing algorithms for estimating barycenters typically parametrize them as weighted sums of Diracs and optimize their weights and/or locations. However, these approaches do not scale to high-dimensional settings due to the curse of dimensionality. In this paper, we propose a scalable and general algorithm for estim…

    Submitted 14 February, 2021; v1 submitted 14 July, 2020; originally announced July 2020.

    Comments: In submission

  16. arXiv:2006.09797

    stat.ML cs.LG

    A Non-Asymptotic Analysis for Stein Variational Gradient Descent

    Authors: Anna Korba, Adil Salim, Michael Arbel, Giulia Luise, Arthur Gretton

    Abstract: We study the Stein Variational Gradient Descent (SVGD) algorithm, which optimises a set of particles to approximate a target probability distribution $\pi \propto e^{-V}$ on $\mathbb{R}^d$. In the population limit, SVGD performs gradient descent in the space of probability distributions on the KL divergence with respect to $\pi$, where the gradient is smoothed through a kernel integral operator. In thi…

    Submitted 3 January, 2021; v1 submitted 17 June, 2020; originally announced June 2020.

    Comments: Accepted to NeurIPS 2020
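
    As a rough illustration of the particle update described in the abstract above, one finite-particle SVGD step can be sketched as follows. This is the standard textbook form of the update, not code from the paper; the Gaussian (RBF) kernel, its bandwidth `h`, the step size, and the names `svgd_step` and `grad_V` are assumptions made for this sketch.

```python
import numpy as np


def svgd_step(x, grad_V, step=0.1, h=1.0):
    """One SVGD update for particles x of shape (n, d).

    The target is pi proportional to exp(-V), so grad log pi = -grad V.
    The RBF kernel and bandwidth h are assumptions of this sketch.
    """
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]  # diff[i, j] = x_i - x_j
    k = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * h ** 2))  # k[i, j] = k(x_i, x_j)
    # Attraction: kernel-smoothed gradient of log pi, summed over particles j.
    drift = -k @ grad_V(x)
    # Repulsion: sum_j of the kernel gradient with respect to x_j, keeps particles apart.
    repulse = np.sum(k[:, :, None] * diff, axis=1) / h ** 2
    return x + step * (drift + repulse) / n


# Usage: standard Gaussian target, V(x) = ||x||^2 / 2, so grad V(x) = x.
particles = np.array([[2.0], [2.5], [3.0]])
for _ in range(200):
    particles = svgd_step(particles, grad_V=lambda z: z)
```

    With a single particle the kernel term equals one and the repulsion vanishes, so the step reduces to plain gradient descent on V.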

  17. arXiv:2004.00663

    cs.CV cs.GR cs.LG cs.RO stat.ML

    Synchronizing Probability Measures on Rotations via Optimal Transport

    Authors: Tolga Birdal, Michael Arbel, Umut Şimşekli, Leonidas Guibas

    Abstract: We introduce a new paradigm, $\textit{measure synchronization}$, for synchronizing graphs with measure-valued edges. We formulate this problem as maximization of the cycle-consistency in the space of probability measures over relative rotations. In particular, we aim at estimating marginal distributions of absolute orientations by synchronizing the $\textit{conditional}$ ones, which are defined on…

    Submitted 1 April, 2020; originally announced April 2020.

    Comments: Accepted for publication at CVPR 2020, includes supplementary material. Project website: https://github.com/SynchInVision/probsync

  18. arXiv:2003.05033

    stat.ML cs.LG

    Generalized Energy Based Models

    Authors: Michael Arbel, Liang Zhou, Arthur Gretton

    Abstract: We introduce the Generalized Energy Based Model (GEBM) for generative modelling. These models combine two trained components: a base distribution (generally an implicit model), which can learn the support of data with low intrinsic dimension in a high dimensional space; and an energy function, to refine the probability mass on the learned support. Both the energy function and base jointly constitu…

    Submitted 21 December, 2021; v1 submitted 10 March, 2020; originally announced March 2020.

  19. arXiv:1910.09652

    stat.ML cs.LG

    Kernelized Wasserstein Natural Gradient

    Authors: Michael Arbel, Arthur Gretton, Wuchen Li, Guido Montufar

    Abstract: Many machine learning problems can be expressed as the optimization of some cost functional over a parametric family of probability distributions. It is often beneficial to solve such optimization problems using natural gradient methods. These methods are invariant to the parametrization of the family, and thus can yield more effective optimization. Unfortunately, computing the natural gradient is…

    Submitted 13 February, 2020; v1 submitted 21 October, 2019; originally announced October 2019.

  20. arXiv:1906.04370

    stat.ML cs.LG

    Maximum Mean Discrepancy Gradient Flow

    Authors: Michael Arbel, Anna Korba, Adil Salim, Arthur Gretton

    Abstract: We construct a Wasserstein gradient flow of the maximum mean discrepancy (MMD) and study its convergence properties. The MMD is an integral probability metric defined for a reproducing kernel Hilbert space (RKHS), and serves as a metric on probability measures for a sufficiently rich RKHS. We obtain conditions for convergence of the gradient flow towards a global optimum, that can be related to…

    Submitted 3 December, 2019; v1 submitted 10 June, 2019; originally announced June 2019.
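
    As a rough illustration of the quantity named in the abstract above (not the paper's gradient-flow algorithm), the squared MMD between two samples can be estimated with a kernel as follows. The Gaussian kernel, its `bandwidth`, the use of the biased (V-statistic) estimator, and the function names are assumptions made for this sketch.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gram matrix k(x_i, y_j) for samples x of shape (n, d) and y of shape (m, d)."""
    sq_dist = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


# Usage: identical samples give zero, a shifted sample gives a positive value.
x = np.array([[0.0], [1.0]])
same = mmd_squared(x, x)
shifted = mmd_squared(x, x + 3.0)
```

    The biased estimator is used here because it is a squared RKHS norm and therefore never negative, which keeps the example easy to check by hand.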

  21. arXiv:1805.11565

    stat.ML cs.LG

    On gradient regularizers for MMD GANs

    Authors: Michael Arbel, Danica J. Sutherland, Mikołaj Bińkowski, Arthur Gretton

    Abstract: We propose a principled method for gradient-based regularization of the critic of GAN-like models trained by adversarially optimizing the kernel of a Maximum Mean Discrepancy (MMD). We show that controlling the gradient of the critic is vital to having a sensible loss function, and devise a method to enforce exact, analytical gradient constraints at no additional cost compared to existing approxim…

    Submitted 14 January, 2021; v1 submitted 29 May, 2018; originally announced May 2018.

    Comments: Code at https://github.com/MichaelArbel/Scaled-MMD-GAN

    Journal ref: Advances in Neural Information Processing Systems 31 (NeurIPS 2018), 6700-6710

  22. arXiv:1801.01401

    stat.ML cs.LG

    Demystifying MMD GANs

    Authors: Mikołaj Bińkowski, Danica J. Sutherland, Michael Arbel, Arthur Gretton

    Abstract: We investigate the training and performance of generative adversarial networks using the Maximum Mean Discrepancy (MMD) as critic, termed MMD GANs. As our main theoretical contribution, we clarify the situation with bias in GAN loss functions raised by recent work: we show that gradient estimators used in the optimization process for both MMD GANs and Wasserstein GANs are unbiased, but learning a…

    Submitted 14 January, 2021; v1 submitted 4 January, 2018; originally announced January 2018.

    Comments: Published at ICLR 2018: https://openreview.net/forum?id=r1lUOzWCW

  23. arXiv:1705.08360

    stat.ML cs.LG stat.ME

    Efficient and principled score estimation with Nyström kernel exponential families

    Authors: Danica J. Sutherland, Heiko Strathmann, Michael Arbel, Arthur Gretton

    Abstract: We propose a fast method with statistical guarantees for learning an exponential family density model where the natural parameter is in a reproducing kernel Hilbert space, and may be infinite-dimensional. The model is learned by fitting the derivative of the log density, the score, thus avoiding the need to compute a normalization constant. Our approach improves the computational efficiency of an…

    Submitted 14 January, 2021; v1 submitted 23 May, 2017; originally announced May 2017.

    Journal ref: Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics (AISTATS 2018), PMLR 84:652-660