Search | arXiv e-print repository

Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation

Authors: Clement Chadebec, Onur Tasar, Eyal Benaroche, Benjamin Aubin

Abstract: In this paper, we propose an efficient, fast, and versatile distillation method to accelerate the generation of pre-trained diffusion models: Flash Diffusion. The method reaches state-of-the-art performance in terms of FID and CLIP-Score for few-step image generation on the COCO2014 and COCO2017 datasets, while requiring only several GPU hours of training and fewer trainable parameters than existing methods. In addition to its efficiency, the versatility of the method is demonstrated across several tasks, such as text-to-image, inpainting, face-swap and super-resolution, and with different backbones, such as UNet-based denoisers (SD1.5, SDXL) or DiT (Pixart-$α$), as well as adapters. In all cases, the method drastically reduced the number of sampling steps while maintaining very high-quality image generation. The official implementation is available at https://github.com/gojasper/flash-diffusion.

Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

Comments: 16 pages + 16 pages appendices

arXiv:2103.05945

Mean-field methods and algorithmic perspectives for high-dimensional machine learning

Authors: Benjamin Aubin

Abstract: The main difficulty that arises in the analysis of most machine learning algorithms is to handle, analytically and numerically, a large number of interacting random variables. In this Ph.D. manuscript, we revisit an approach based on the tools of statistical physics of disordered systems. Developed through a rich literature, these tools have been precisely designed to infer the macroscopic behavior of a large number of particles from their microscopic interactions. At the heart of this work, we strongly capitalize on the deep connection between the replica method and message passing algorithms in order to shed light on the phase diagrams of various theoretical models, with an emphasis on the potential differences between statistical and algorithmic thresholds. We essentially focus on synthetic tasks and data generated in the teacher-student paradigm. In particular, we apply these mean-field methods to the Bayes-optimal analysis of committee machines, to the worst-case analysis of Rademacher generalization bounds for perceptrons, and to empirical risk minimization in the context of generalized linear models. Finally, we develop a framework to analyze estimation models with structured prior information, produced for instance by generative models based on deep neural networks with random weights.

Submitted 10 March, 2021; originally announced March 2021.

Comments: Ph.D. manuscript

arXiv:2102.10867

Linear unit-tests for invariance discovery

Authors: Benjamin Aubin, Agnieszka Słowik, Martin Arjovsky, Leon Bottou, David Lopez-Paz

Abstract: There is an increasing interest in algorithms to learn invariant correlations across training environments. A large share of the current proposals find theoretical support in the causality literature, but how useful are they in practice? The purpose of this note is to propose six linear low-dimensional problems (unit tests) to evaluate different types of out-of-distribution generalization in a precise manner. Following initial experiments, none of the three recently proposed alternatives passes all tests. By providing the code to automatically replicate all the results in this manuscript (https://www.github.com/facebookresearch/InvarianceUnitTests), we hope that our unit tests become a standard stepping stone for researchers in out-of-distribution generalization.

Submitted 22 February, 2021; originally announced February 2021.

Comments: 5 pages, Causal Discovery & Causality-Inspired Machine Learning Workshop at Neural Information Processing Systems
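To make the idea of a linear invariance test concrete, here is a minimal Python sketch of one hypothetical two-environment regression problem. It is not one of the six problems from the note (those are defined in the linked repository); the data model, noise levels and helper names (`make_env`, `least_squares_2d`) are assumptions for illustration. The target depends causally on `x_inv` only, while `x_spu` is a noisy copy of the target whose noise level changes across environments, so ordinary least squares on pooled data leans on the spurious feature.

```python
import random


def make_env(n, spurious_noise, rng):
    """Hypothetical environment: y is caused by x_inv; x_spu is a noisy copy of y
    whose noise level (spurious_noise) differs from one environment to another."""
    rows = []
    for _ in range(n):
        x_inv = rng.gauss(0.0, 1.0)
        y = x_inv + rng.gauss(0.0, 1.0)
        x_spu = y + rng.gauss(0.0, spurious_noise)
        rows.append((x_inv, x_spu, y))
    return rows


def least_squares_2d(rows):
    """Ordinary least squares without intercept, solving the 2x2 normal equations by Cramer's rule."""
    s11 = sum(a * a for a, _, _ in rows)
    s12 = sum(a * b for a, b, _ in rows)
    s22 = sum(b * b for _, b, _ in rows)
    t1 = sum(a * y for a, _, y in rows)
    t2 = sum(b * y for _, b, y in rows)
    det = s11 * s22 - s12 * s12
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)


rng = random.Random(0)
env_a = make_env(5000, spurious_noise=0.1, rng=rng)
env_b = make_env(5000, spurious_noise=1.0, rng=rng)
beta_a = least_squares_2d(env_a)               # population value (0.01, 0.99)
beta_b = least_squares_2d(env_b)               # population value (0.5, 0.5)
beta_pooled = least_squares_2d(env_a + env_b)  # population value about (0.34, 0.66)
print(beta_a, beta_b, beta_pooled)
```

The invariant predictor (1, 0) leaves the same N(0, 1) residual in both environments, whereas the coefficient on the spurious feature moves from about 0.99 to 0.5; a method that passes such a test should recover the former.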

arXiv:2006.06560

Generalization error in high-dimensional perceptrons: Approaching Bayes error with convex optimization

Authors: Benjamin Aubin, Florent Krzakala, Yue M. Lu, Lenka Zdeborová

Abstract: We consider a commonly studied supervised classification of a synthetic dataset whose labels are generated by feeding a one-layer neural network with random i.i.d. inputs. We study the generalization performance of standard classifiers in the high-dimensional regime where $\alpha = n/d$ is kept finite in the limit of a high dimension $d$ and number of samples $n$. Our contribution is threefold. First, we prove a formula for the generalization error achieved by $\ell_2$-regularized classifiers that minimize a convex loss. This formula was first obtained by the heuristic replica method of statistical physics. Second, focusing on commonly used loss functions and optimizing the $\ell_2$ regularization strength, we observe that while ridge regression performance is poor, logistic and hinge regression are surprisingly able to approach the Bayes-optimal generalization error extremely closely. As $\alpha \to \infty$ they lead to Bayes-optimal rates, a fact that does not follow from predictions of margin-based generalization error bounds. Third, we design an optimal loss and regularizer that provably leads to Bayes-optimal generalization error.

Submitted 7 November, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

Comments: 11 pages + 45 pages Supplementary Material / 5 figures, v2 revised and accepted at NeurIPS

Journal ref: Advances in Neural Information Processing Systems, v33, pages 12199--12210, 2020
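The teacher-student setup of this paper can be simulated in a few lines. The following Python sketch (an illustration, not the authors' code) draws labels $y = \mathrm{sign}(w^\star \cdot x)$ for i.i.d. Gaussian inputs at $\alpha = n/d = 10$, fits an $\ell_2$-regularized logistic regression by plain gradient descent, and evaluates the test error with the standard closed form for a sign teacher and sign student on Gaussian inputs, $e_g = \frac{1}{\pi}\arccos\left(\frac{\hat w \cdot w^\star}{\|\hat w\|\,\|w^\star\|}\right)$. The sizes, regularization strength, step size and helper names are arbitrary assumptions; at $d = 20$ the result is only a rough finite-size proxy for the high-dimensional limit studied in the paper.

```python
import math
import random


def teacher_student_data(n, d, rng):
    """Teacher-student data: x ~ N(0, I_d) i.i.d., labels y = sign(w_star . x)."""
    w_star = [rng.gauss(0.0, 1.0) for _ in range(d)]
    xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    ys = [1.0 if sum(w * x for w, x in zip(w_star, row)) >= 0 else -1.0 for row in xs]
    return w_star, xs, ys


def sigmoid_of_minus(m):
    """Numerically stable 1 / (1 + exp(m)), the magnitude of d/dm log(1 + exp(-m))."""
    if m >= 0:
        e = math.exp(-m)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(m))


def train_logistic(xs, ys, lam=0.01, lr=5.0, steps=200):
    """Full-batch gradient descent on
    (1/n) * sum_i log(1 + exp(-y_i * w.x_i / sqrt(d))) + (lam / 2) * ||w||^2."""
    n, d = len(xs), len(xs[0])
    scale = 1.0 / math.sqrt(d)
    w = [0.0] * d
    for _ in range(steps):
        grad = [lam * wj for wj in w]
        for row, y in zip(xs, ys):
            margin = y * scale * sum(wj * xj for wj, xj in zip(w, row))
            coef = -y * scale * sigmoid_of_minus(margin) / n
            for j in range(d):
                grad[j] += coef * row[j]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def generalization_error(w, w_star):
    """Exact test error of sign(w . x) against sign(w_star . x) for Gaussian x."""
    dot = sum(a * b for a, b in zip(w, w_star))
    norms = math.sqrt(sum(a * a for a in w)) * math.sqrt(sum(b * b for b in w_star))
    return math.acos(max(-1.0, min(1.0, dot / norms))) / math.pi


rng = random.Random(1)
w_star, xs, ys = teacher_student_data(n=200, d=20, rng=rng)
w_hat = train_logistic(xs, ys)
err = generalization_error(w_hat, w_star)
print(f"alpha = {len(xs) / len(w_star):.1f}, test error = {err:.3f}")
```

Using the arccos formula instead of a held-out test set removes sampling noise from the evaluation; replacing the logistic loss by the square loss would give the ridge baseline that the abstract reports as performing poorly.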

arXiv:2004.01571

Tree-AMP: Compositional Inference with Tree Approximate Message Passing

Authors: Antoine Baker, Benjamin Aubin, Florent Krzakala, Lenka Zdeborová

Abstract: We introduce Tree-AMP, standing for Tree Approximate Message Passing, a Python package for compositional inference in high-dimensional tree-structured models. The package provides a unifying framework to study several approximate message passing algorithms previously derived for a variety of machine learning tasks, such as generalized linear models, inference in multi-layer networks, matrix factorization, and reconstruction using non-separable penalties. For some models, the asymptotic performance of the algorithm can be theoretically predicted by the state evolution, and the measurement entropy can be estimated by the free entropy formalism. The implementation is modular by design: each module, which implements a factor, can be composed at will with other modules to solve complex inference tasks. The user only needs to declare the factor graph of the model; the inference algorithm, state evolution and entropy estimation are fully automated.

Submitted 11 December, 2021; v1 submitted 3 April, 2020; originally announced April 2020.

Comments: Source code available at https://github.com/sphinxteam/tramp and documentation at https://sphinxteam.github.io/tramp.docs

Journal ref: Journal of Machine Learning Research 24 (2023) 1-89
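Tree-AMP's own interface is described in the documentation linked above; the following Python sketch does not use it. As a hedged illustration of the kind of algorithm such a package unifies, it implements the classical soft-thresholding AMP of Donoho, Maleki and Montanari for noiseless sparse linear regression $y = A x_0$ with $A_{ij} \sim \mathcal{N}(0, 1/n)$. The threshold rule $\theta_t = \tau \|z^t\| / \sqrt{n}$, the value $\tau = 1.5$, the problem sizes and the helper names are assumptions, chosen so that the instance lies well inside the region where this algorithm recovers $x_0$.

```python
import math
import random


def matvec(M, v):
    """Dense matrix-vector product with a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def soft_threshold(v, theta):
    """Separable soft-thresholding denoiser eta(v; theta)."""
    return [math.copysign(max(abs(x) - theta, 0.0), x) for x in v]


def amp_sparse(A, y, tau=1.5, iters=40):
    """Soft-thresholding AMP: x <- eta(x + A^T z), z <- y - A x + Onsager term."""
    n, N = len(A), len(A[0])
    At = [list(col) for col in zip(*A)]
    x = [0.0] * N
    z = list(y)
    for _ in range(iters):
        theta = tau * math.sqrt(sum(zi * zi for zi in z) / n)  # estimate of the effective noise level
        pseudo_data = [xj + gj for xj, gj in zip(x, matvec(At, z))]
        x = soft_threshold(pseudo_data, theta)
        # Onsager coefficient: (N / n) * mean of eta' = (number of nonzero entries of x) / n
        onsager = sum(1 for xj in x if xj != 0.0) / n
        z = [yi - axi + onsager * zi for yi, axi, zi in zip(y, matvec(A, x), z)]
    return x


rng = random.Random(2)
n, N, k = 100, 200, 10
A = [[rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(N)] for _ in range(n)]
x0 = [0.0] * N
for j in rng.sample(range(N), k):
    x0[j] = rng.gauss(0.0, 1.0)
y = matvec(A, x0)
x_hat = amp_sparse(A, y)
rel_err = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_hat, x0)) / sum(b * b for b in x0))
print(f"relative error = {rel_err:.2e}")
```

The Onsager term is what distinguishes AMP from plain iterative thresholding; it is also what makes the per-iteration error trackable by a scalar state evolution, the quantity the abstract says can be predicted for some models.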

arXiv:1912.02729

Rademacher complexity and spin glasses: A link between the replica and statistical theories of learning

Authors: Alia Abbara, Benjamin Aubin, Florent Krzakala, Lenka Zdeborová

Abstract: Statistical learning theory provides bounds on the generalization gap, using in particular the Vapnik-Chervonenkis dimension and the Rademacher complexity. An alternative approach, mainly studied in the statistical physics literature, is the study of generalization in simple synthetic-data models. Here we discuss the connections between these approaches and focus on the link between the Rademacher complexity in statistical learning and the theories of generalization for typical-case synthetic models from statistical physics, involving quantities known as Gardner capacity and ground-state energy. We show that in these models the Rademacher complexity is closely related to the ground-state energy computed by replica theories. Using this connection, one may reinterpret many results of the literature as rigorous Rademacher bounds in a variety of models in the high-dimensional statistics limit. Somewhat surprisingly, we also show that statistical learning theory provides predictions for the behavior of the ground-state energies in some full replica symmetry breaking models.

Submitted 15 June, 2020; v1 submitted 5 December, 2019; originally announced December 2019.

Comments: 15 + 10 pages, v2 revised and accepted at MSML

Journal ref: Proceedings of The First Mathematical and Scientific Machine Learning Conference, PMLR 107:27-54, 2020
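The elementary identity behind this link can be checked by brute force on a toy instance. For a class of $\pm 1$-valued classifiers, $\sum_i \sigma_i h(x_i) = n - 2\,\#\{i : h(x_i) \neq \sigma_i\}$, so the empirical Rademacher complexity equals $1 - 2\,\mathbb{E}_\sigma[e_{\mathrm{GS}}]$, where $e_{\mathrm{GS}}$ is the smallest training error (ground-state energy per sample) achievable when fitting uniformly random labels $\sigma$. The Python sketch below, an illustration rather than the paper's replica computation, enumerates all labelings of 8 random points in the plane by homogeneous perceptrons $\mathrm{sign}(w \cdot x)$ and all $2^8$ sign vectors; the sizes and helper names are assumptions.

```python
import itertools
import math
import random


def perceptron_labelings(points):
    """All distinct labelings of 2D points by sign(w . x) with w = (cos t, sin t).
    The labeling only changes where w is orthogonal to some point, so one
    representative angle per arc between consecutive critical angles suffices."""
    crit = sorted((math.atan2(py, px) + s * math.pi / 2) % (2 * math.pi)
                  for px, py in points for s in (1, -1))
    mids = [(a + b) / 2 for a, b in zip(crit, crit[1:])]
    mids.append((crit[-1] + crit[0] + 2 * math.pi) / 2)  # wrap-around arc
    labelings = set()
    for t in mids:
        wx, wy = math.cos(t), math.sin(t)
        labelings.add(tuple(1 if wx * px + wy * py > 0 else -1 for px, py in points))
    return labelings


def rademacher_and_ground_state(points):
    """Exact empirical Rademacher complexity (all sign vectors enumerated) and the
    average ground-state error when fitting uniformly random labels sigma."""
    labelings = perceptron_labelings(points)
    n = len(points)
    rad, gs = 0.0, 0.0
    for sigma in itertools.product((-1, 1), repeat=n):
        rad += max(sum(s * h for s, h in zip(sigma, lab)) for lab in labelings) / n
        gs += min(sum(s != h for s, h in zip(sigma, lab)) for lab in labelings) / n
    count = 2 ** n
    return rad / count, gs / count, len(labelings)


rng = random.Random(3)
points = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(8)]
rad, gs, n_labelings = rademacher_and_ground_state(points)
print(n_labelings, rad, 1 - 2 * gs)
```

For points in general position, Cover's function-counting theorem gives $2n = 16$ distinct labelings here, far fewer than the $2^8$ possible sign vectors, which is why the complexity stays strictly below 1.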

arXiv:1912.02008

Exact asymptotics for phase retrieval and compressed sensing with random generative priors

Authors: Benjamin Aubin, Bruno Loureiro, Antoine Baker, Florent Krzakala, Lenka Zdeborová

Abstract: We consider the problem of compressed sensing and of (real-valued) phase retrieval with a random measurement matrix. We derive sharp asymptotics for the information-theoretically optimal performance and for the best known polynomial-time algorithm for an ensemble of generative priors consisting of fully connected deep neural networks with random weight matrices and arbitrary activations. We compare the performance to sparse separable priors and conclude that generative priors might be advantageous in terms of algorithmic performance. In particular, while sparsity does not allow one to perform compressive phase retrieval efficiently close to its information-theoretic limit, we find that under the random generative prior, compressed phase retrieval becomes tractable.

Submitted 12 June, 2020; v1 submitted 4 December, 2019; originally announced December 2019.

Comments: 13+3 pages, 7 figures, v2 revised and accepted at MSML

Journal ref: Proceedings of The First Mathematical and Scientific Machine Learning Conference, PMLR 107:55-73, 2020

arXiv:1905.12385

The spiked matrix model with generative priors

Authors: Benjamin Aubin, Bruno Loureiro, Antoine Maillard, Florent Krzakala, Lenka Zdeborová

Abstract: Using a low-dimensional parametrization of signals is a generic and powerful way to enhance performance in signal processing and statistical inference. A very popular and widely explored type of dimensionality reduction is sparsity; another type is generative modelling of signal distributions. Generative models based on neural networks, such as GANs or variational auto-encoders, are particularly performant and are gaining in applicability. In this paper, we study spiked matrix models, where a low-rank matrix is observed through a noisy channel. This problem with a sparse structure of the spikes has attracted broad attention in the past literature. Here, we replace the sparsity assumption by generative modelling, and investigate the consequences on statistical and algorithmic properties. We analyze the Bayes-optimal performance under specific generative models for the spike. In contrast with the sparsity assumption, we do not observe regions of parameters where statistical performance is superior to the best known algorithmic performance. We show that in the analyzed cases the approximate message passing algorithm is able to reach optimal performance. We also design enhanced spectral algorithms and analyze their performance and thresholds using random matrix theory, showing their superiority to the classical principal component analysis. We complement our theoretical results by illustrating the performance of the spectral algorithms when the spikes come from real datasets.

Submitted 30 May, 2019; v1 submitted 29 May, 2019; originally announced May 2019.

Comments: 12 + 56, 8 figures, v2 lighter jpeg figures

Journal ref: Advances in Neural Information Processing Systems, pp. 8364-8375. 2019
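For reference, the classical PCA baseline mentioned in this abstract can be sketched in Python on the spiked Wigner model $Y = \frac{\lambda}{n} v v^\top + \frac{1}{\sqrt{n}} W$, with $W$ a GOE matrix. For simplicity the spike here has i.i.d. $\pm 1$ entries rather than coming from a generative model, and the paper's enhanced spectral methods are not reproduced. Above the BBP threshold $\lambda = 1$, the top eigenvector of $Y$ has asymptotic squared overlap $1 - 1/\lambda^2$ with $v/\sqrt{n}$; the size $n = 150$, the value $\lambda = 3$ and the helper names are arbitrary assumptions.

```python
import math
import random


def spiked_wigner(n, lam, rng):
    """Y = (lam / n) v v^T + W / sqrt(n), with v in {-1, +1}^n and W a GOE matrix."""
    v = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    Y = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            noise = rng.gauss(0.0, math.sqrt(2.0) if i == j else 1.0)
            Y[i][j] = Y[j][i] = lam * v[i] * v[j] / n + noise / math.sqrt(n)
    return v, Y


def top_eigenvector(Y, iters=100, seed=0):
    """Power iteration: converges to the eigenvector of largest |eigenvalue|, which is
    the outlier (near lam + 1/lam) when lam is well above the BBP threshold."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(len(Y))]
    for _ in range(iters):
        u = [sum(a * b for a, b in zip(row, u)) for row in Y]
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
    return u


rng = random.Random(4)
n, lam = 150, 3.0
v, Y = spiked_wigner(n, lam, rng)
u = top_eigenvector(Y)
overlap_sq = sum(a * b for a, b in zip(u, v)) ** 2 / n
print(f"squared overlap = {overlap_sq:.3f}, asymptotic prediction {1 - 1 / lam**2:.3f}")
```

Power iteration is used only to keep the sketch dependency-free; with $\lambda = 3$ the outlier near $3.33$ is well separated from the bulk edge at $2$, so a hundred iterations are ample.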

arXiv:1806.05451

doi 10.1088/1742-5468/ab43d2

The committee machine: Computational to statistical gaps in learning a two-layers neural network

Authors: Benjamin Aubin, Antoine Maillard, Jean Barbier, Florent Krzakala, Nicolas Macris, Lenka Zdeborová

Abstract: Heuristic tools from statistical physics have been used in the past to locate the phase transitions and compute the optimal learning and generalization errors in the teacher-student scenario in multi-layer neural networks. In this contribution, we provide a rigorous justification of these approaches for a two-layer neural network model called the committee machine. We also introduce a version of the approximate message passing (AMP) algorithm for the committee machine that performs optimal learning in polynomial time for a large set of parameters. We find that there are regimes in which a low generalization error is information-theoretically achievable while the AMP algorithm fails to deliver it, strongly suggesting that no efficient algorithm exists for those cases, and unveiling a large computational gap.

Submitted 29 February, 2024; v1 submitted 14 June, 2018; originally announced June 2018.

Comments: 18 pages + supplementary material, 3 figures. (v2: update to match the published version; v3: clarification of the caption of Fig. 3)

Journal ref: J. Stat. Mech. (2019) 124023; NeurIPS 2018
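The committee machine studied here is a two-layer network with $K$ hidden sign units and fixed second-layer weights. The following Python sketch, assuming the common convention $y = \mathrm{sign}\big(\sum_{l=1}^{K} \mathrm{sign}(w_l \cdot x / \sqrt{d})\big)$ with Gaussian teacher weights and inputs, generates teacher-student data at $\alpha = n/d = 20$; the sizes and the helper name `committee_machine` are assumptions, and the paper's AMP learning algorithm is not reproduced. Taking $K$ odd rules out ties in the output.

```python
import math
import random


def committee_machine(W, x):
    """Two-layer sign network with unit second-layer weights; use an odd number of hidden units."""
    d = len(x)
    hidden = [1 if sum(a * b for a, b in zip(w, x)) / math.sqrt(d) >= 0 else -1 for w in W]
    return 1 if sum(hidden) > 0 else -1


rng = random.Random(5)
K, d, n = 3, 50, 1000
W_teacher = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(K)]
xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
ys = [committee_machine(W_teacher, x) for x in xs]
print(f"alpha = {n / d}, fraction of +1 labels = {sum(y > 0 for y in ys) / n:.3f}")
```

Because $x \to -x$ flips every hidden unit, and hence the output when $K$ is odd, each label is $+1$ with probability exactly $1/2$ for any teacher; the printed fraction should reflect this up to sampling noise.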

Showing 1–9 of 9 results for author: Aubin, B