Showing 1–50 of 69 results for author: Balestriero, R

  1. arXiv:2406.10743

    cs.LG cs.AI

    Occam's Razor for Self Supervised Learning: What is Sufficient to Learn Good Representations?

    Authors: Mark Ibrahim, David Klindt, Randall Balestriero

    Abstract: Deep Learning is often depicted as a trio of data-architecture-loss. Yet, recent Self Supervised Learning (SSL) solutions have introduced numerous additional design choices, e.g., a projector network, positive views, or teacher-student networks. These additions pose two challenges. First, they limit the impact of theoretical studies that often fail to incorporate all those intertwined designs. Sec…

    Submitted 15 June, 2024; originally announced June 2024.

  2. arXiv:2406.09657

    cs.LG stat.ML

    ScaLES: Scalable Latent Exploration Score for Pre-Trained Generative Networks

    Authors: Omer Ronen, Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk, Bin Yu

    Abstract: We develop Scalable Latent Exploration Score (ScaLES) to mitigate over-exploration in Latent Space Optimization (LSO), a popular method for solving black-box discrete optimization problems. LSO utilizes continuous optimization within the latent space of a Variational Autoencoder (VAE) and is known to be susceptible to over-exploration, which manifests in unrealistic solutions that reduce its pract…

    Submitted 13 June, 2024; originally announced June 2024.

  3. arXiv:2402.15555

    cs.LG cs.AI cs.CV

    Deep Networks Always Grok and Here is Why

    Authors: Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk

    Abstract: Grokking, or delayed generalization, is a phenomenon where generalization in a deep neural network (DNN) occurs long after achieving near zero training error. Previous studies have reported the occurrence of grokking in specific controlled settings, such as DNNs initialized with large-norm parameters or transformers trained on algorithmic datasets. We demonstrate that grokking is actually much mor…

    Submitted 6 June, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: ICML 2024. Website: https://bit.ly/grok-adversarial. Pages 24, Figures 36

  4. arXiv:2402.11337

    cs.CV cs.AI stat.ML

    Learning by Reconstruction Produces Uninformative Features For Perception

    Authors: Randall Balestriero, Yann LeCun

    Abstract: Input space reconstruction is an attractive representation learning paradigm. Despite interpretability of the reconstruction and generation, we identify a misalignment between learning by reconstruction, and learning for perception. We show that the former allocates a model's capacity towards a subspace of the data explaining the observed variance--a subspace with uninformative features for the la…

    Submitted 17 February, 2024; originally announced February 2024.

  5. arXiv:2401.11188

    cs.LG cs.AI

    Fast and Exact Enumeration of Deep Networks Partitions Regions

    Authors: Randall Balestriero, Yann LeCun

    Abstract: One fruitful formulation of Deep Networks (DNs) enabling their theoretical study and providing practical guidelines to practitioners relies on Piecewise Affine Splines. In that realm, a DN's input-mapping is expressed as per-region affine mapping where those regions are implicitly determined by the model's architecture and form a partition of their input space. That partition -- which is involved…

    Submitted 20 January, 2024; originally announced January 2024.

  6. arXiv:2401.01990

    cs.CV cs.AI cs.LG

    GPS-SSL: Guided Positive Sampling to Inject Prior Into Self-Supervised Learning

    Authors: Aarash Feizi, Randall Balestriero, Adriana Romero-Soriano, Reihaneh Rabbany

    Abstract: We propose Guided Positive Sampling Self-Supervised Learning (GPS-SSL), a general method to inject a priori knowledge into Self-Supervised Learning (SSL) positive samples selection. Current SSL methods leverage Data-Augmentations (DA) for generating positive samples and incorporate prior knowledge - an incorrect, or too weak DA will drastically reduce the quality of the learned representation. GPS…

    Submitted 9 January, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

  7. arXiv:2401.01764

    cs.CV cs.LG

    Understanding the Detrimental Class-level Effects of Data Augmentation

    Authors: Polina Kirichenko, Mark Ibrahim, Randall Balestriero, Diane Bouchacourt, Ramakrishna Vedantam, Hamed Firooz, Andrew Gordon Wilson

    Abstract: Data augmentation (DA) encodes invariance and provides implicit regularization critical to a model's performance in image classification tasks. However, while DA improves average accuracy, recent studies have shown that its impact can be highly class dependent: achieving optimal average accuracy comes at the cost of significantly hurting individual class accuracy by as much as 20% on ImageNet. The…

    Submitted 7 December, 2023; originally announced January 2024.

    Comments: Neural Information Processing Systems (NeurIPS), 2023

  8. arXiv:2312.01648

    cs.AI cs.CL cs.LG

    Characterizing Large Language Model Geometry Solves Toxicity Detection and Generation

    Authors: Randall Balestriero, Romain Cosentino, Sarath Shekkizhar

    Abstract: Large Language Models (LLMs) drive current AI breakthroughs despite very little being known about their internal representations, e.g., how to extract a few informative features to solve various downstream tasks. To provide a practical and principled answer, we propose to characterize LLMs from a geometric perspective. We obtain in closed form (i) the intrinsic dimension in which the Multi-Head At…

    Submitted 10 December, 2023; v1 submitted 4 December, 2023; originally announced December 2023.

  9. arXiv:2310.12977

    cs.LG cs.AI cs.CV

    Training Dynamics of Deep Network Linear Regions

    Authors: Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk

    Abstract: The study of Deep Network (DN) training dynamics has largely focused on the evolution of the loss function, evaluated on or around train and test set data points. In fact, many DN phenomena were first introduced in the literature in that respect, e.g., double descent, grokking. In this study, we look at the training dynamics of the input space partition or linear regions formed by continuous piecew…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: 14 pages, 14 figures

  10. arXiv:2305.16189

    cs.LG astro-ph.EP stat.ML

    Martian time-series unraveled: A multi-scale nested approach with factorial variational autoencoders

    Authors: Ali Siahkoohi, Rudy Morel, Randall Balestriero, Erwan Allys, Grégory Sainton, Taichi Kawamura, Maarten V. de Hoop

    Abstract: Unsupervised source separation involves unraveling an unknown set of source signals recorded through a mixing operator, with limited prior knowledge about the sources, and only access to a dataset of signal mixtures. This problem is inherently ill-posed and is further challenged by the variety of timescales exhibited by sources. Existing methods typically rely on a preselected window size that det…

    Submitted 19 February, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

  11. arXiv:2304.12210

    cs.LG cs.CV

    A Cookbook of Self-Supervised Learning

    Authors: Randall Balestriero, Mark Ibrahim, Vlad Sobal, Ari Morcos, Shashank Shekhar, Tom Goldstein, Florian Bordes, Adrien Bardes, Gregoire Mialon, Yuandong Tian, Avi Schwarzschild, Andrew Gordon Wilson, Jonas Geiping, Quentin Garrido, Pierre Fernandez, Amir Bar, Hamed Pirsiavash, Yann LeCun, Micah Goldblum

    Abstract: Self-supervised learning, dubbed the dark matter of intelligence, is a promising path to advance machine learning. Yet, much like cooking, training SSL methods is a delicate art with a high barrier to entry. While many components are familiar, successfully training a SSL method involves a dizzying set of choices from the pretext tasks to training hyper-parameters. Our goal is to lower the barrier…

    Submitted 28 June, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

  12. arXiv:2304.05369

    cs.LG

    A surprisingly simple technique to control the pretraining bias for better transfer: Expand or Narrow your representation

    Authors: Florian Bordes, Samuel Lavoie, Randall Balestriero, Nicolas Ballas, Pascal Vincent

    Abstract: Self-Supervised Learning (SSL) models rely on a pretext task to learn representations. Because this pretext task differs from the downstream tasks used to evaluate the performance of these models, there is an inherent misalignment or pretraining bias. A commonly used trick in SSL, shown to make deep networks more robust to such bias, is the addition of a small projector (usually a 2 or 3 layer mul…

    Submitted 11 April, 2023; originally announced April 2023.

  13. arXiv:2303.15256

    cs.LG cs.AI cs.HC

    Active Self-Supervised Learning: A Few Low-Cost Relationships Are All You Need

    Authors: Vivien Cabannes, Leon Bottou, Yann LeCun, Randall Balestriero

    Abstract: Self-Supervised Learning (SSL) has emerged as the solution of choice to learn transferable representations from unlabeled data. However, SSL requires to build samples that are known to be semantically akin, i.e. positive views. Requiring such knowledge is the main limitation of SSL and is often tackled by ad-hoc strategies e.g. applying known data-augmentations to the same input. In this work, we…

    Submitted 29 September, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Comments: 8 main pages, 20 totals, 10 figures

    ACM Class: I.2.6

  14. arXiv:2303.01986

    cs.LG

    Towards Democratizing Joint-Embedding Self-Supervised Learning

    Authors: Florian Bordes, Randall Balestriero, Pascal Vincent

    Abstract: Joint Embedding Self-Supervised Learning (JE-SSL) has seen rapid developments in recent years, due to its promise to effectively leverage large unlabeled data. The development of JE-SSL methods was driven primarily by the search for ever increasing downstream classification accuracies, using huge computational resources, and typically built upon insights and intuitions inherited from a close paren…

    Submitted 3 March, 2023; originally announced March 2023.

  15. arXiv:2303.00633

    cs.IT cs.AI

    An Information-Theoretic Perspective on Variance-Invariance-Covariance Regularization

    Authors: Ravid Shwartz-Ziv, Randall Balestriero, Kenji Kawaguchi, Tim G. J. Rudner, Yann LeCun

    Abstract: Variance-Invariance-Covariance Regularization (VICReg) is a self-supervised learning (SSL) method that has shown promising results on a variety of tasks. However, the fundamental mechanisms underlying VICReg remain unexplored. In this paper, we present an information-theoretic perspective on the VICReg objective. We begin by deriving information-theoretic quantities for deterministic networks as a…

    Submitted 1 May, 2024; v1 submitted 1 March, 2023; originally announced March 2023.

  16. arXiv:2303.00586

    stat.ML cs.AI cs.CV cs.CY cs.LG

    FAIR-Ensemble: When Fairness Naturally Emerges From Deep Ensembling

    Authors: Wei-Yin Ko, Daniel D'souza, Karina Nguyen, Randall Balestriero, Sara Hooker

    Abstract: Ensembling multiple Deep Neural Networks (DNNs) is a simple and effective way to improve top-line metrics and to outperform a larger single model. In this work, we go beyond top-line metrics and instead explore the impact of ensembling on subgroup performances. Surprisingly, we observe that even with a simple homogeneous ensemble -- all the individual DNNs share the same training set, architecture…

    Submitted 20 December, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

  17. arXiv:2302.12828

    cs.CV cs.LG

    SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries

    Authors: Ahmed Imtiaz Humayun, Randall Balestriero, Guha Balakrishnan, Richard Baraniuk

    Abstract: Current Deep Network (DN) visualization and interpretability methods rely heavily on data space visualizations such as scoring which dimensions of the data are responsible for their associated prediction or generating new data features or samples that best match a given DN unit or representation. In this paper, we go one step further by developing the first provably exact method for computing the…

    Submitted 6 June, 2024; v1 submitted 24 February, 2023; originally announced February 2023.

    Comments: 11 pages, 20 figures

  18. arXiv:2302.10260

    cs.AI cs.CV cs.LG

    Unsupervised Learning on a DIET: Datum IndEx as Target Free of Self-Supervision, Reconstruction, Projector Head

    Authors: Randall Balestriero

    Abstract: Costly, noisy, and over-specialized, labels are to be set aside in favor of unsupervised learning if we hope to learn cheap, reliable, and transferable models. To that end, spectral embedding, self-supervised learning, or generative modeling have offered competitive solutions. Those methods however come with numerous challenges e.g. estimating geodesic distances, specifying projector arch…

    Submitted 20 February, 2023; originally announced February 2023.

  19. arXiv:2302.02774

    stat.ML cs.AI cs.LG math.ST

    The SSL Interplay: Augmentations, Inductive Bias, and Generalization

    Authors: Vivien Cabannes, Bobak T. Kiani, Randall Balestriero, Yann LeCun, Alberto Bietti

    Abstract: Self-supervised learning (SSL) has emerged as a powerful framework to learn representations from raw data without supervision. Yet in practice, engineers face issues such as instability in tuning optimizers and collapse of representations during training. Such challenges motivate the need for a theory to shed light on the complex interplay between the choice of data augmentation, network architect…

    Submitted 1 June, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    MSC Class: 68Q32 ACM Class: G.3

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023

  20. On minimal variations for unsupervised representation learning

    Authors: Vivien Cabannes, Alberto Bietti, Randall Balestriero

    Abstract: Unsupervised representation learning aims at describing raw data efficiently to solve various downstream tasks. It has been approached with many techniques, such as manifold learning, diffusion maps, or more recently self-supervised learning. Those techniques are arguably all based on the underlying assumption that target functions, associated with future downstream tasks, have low variations in d…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: 5 pages, 1 figure; 1 table

    MSC Class: 68Q32 ACM Class: G.3

    Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023, pp. 1-5

  21. arXiv:2211.01866

    cs.CV cs.LG

    ImageNet-X: Understanding Model Mistakes with Factor of Variation Annotations

    Authors: Badr Youbi Idrissi, Diane Bouchacourt, Randall Balestriero, Ivan Evtimov, Caner Hazirbas, Nicolas Ballas, Pascal Vincent, Michal Drozdzal, David Lopez-Paz, Mark Ibrahim

    Abstract: Deep learning vision systems are widely deployed across applications where reliability is critical. However, even today's best models can fail to recognize an object when its pose, lighting, or background varies. While existing benchmarks surface examples challenging for models, they do not explain why such mistakes arise. To address this need, we introduce ImageNet-X, a set of sixteen human annot…

    Submitted 3 November, 2022; originally announced November 2022.

  22. arXiv:2211.01340

    cs.LG cs.CV stat.ML

    POLICE: Provably Optimal Linear Constraint Enforcement for Deep Neural Networks

    Authors: Randall Balestriero, Yann LeCun

    Abstract: Deep Neural Networks (DNNs) outshine alternative function approximators in many settings thanks to their modularity in composing any desired differentiable operator. The formed parametrized functional is then tuned to solve a task at hand from simple gradient descent. This modularity comes at the cost of making strict enforcement of constraints on DNNs, e.g. from a priori knowledge of the task, or…

    Submitted 10 March, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

  23. arXiv:2210.07277

    cs.LG cs.AI cs.CV

    The Hidden Uniform Cluster Prior in Self-Supervised Learning

    Authors: Mahmoud Assran, Randall Balestriero, Quentin Duval, Florian Bordes, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Nicolas Ballas

    Abstract: A successful paradigm in representation learning is to perform self-supervised pretraining using tasks based on mini-batch statistics (e.g., SimCLR, VICReg, SwAV, MSN). We show that in the formulation of all these methods is an overlooked prior to learn features that enable uniform clustering of the data. While this prior has led to remarkably semantic representations when pretraining on class-bal…

    Submitted 13 October, 2022; originally announced October 2022.

  24. arXiv:2210.02885

    cs.LG cs.AI cs.CV

    RankMe: Assessing the downstream performance of pretrained self-supervised representations by their rank

    Authors: Quentin Garrido, Randall Balestriero, Laurent Najman, Yann LeCun

    Abstract: Joint-Embedding Self Supervised Learning (JE-SSL) has seen a rapid development, with the emergence of many method variations but only few principled guidelines that would help practitioners to successfully deploy them. The main reason for that pitfall comes from JE-SSL's core principle of not employing any input reconstruction therefore lacking visual cues of unsuccessful training. Adding non info…

    Submitted 26 June, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

    Journal ref: The Fortieth International Conference on Machine Learning, 2023, Honolulu, United States

  25. arXiv:2209.14905

    cs.LG

    Variance Covariance Regularization Enforces Pairwise Independence in Self-Supervised Representations

    Authors: Grégoire Mialon, Randall Balestriero, Yann LeCun

    Abstract: Self-Supervised Learning (SSL) methods such as VICReg, Barlow Twins or W-MSE avoid collapse of their joint embedding architectures by constraining or regularizing the covariance matrix of their projector's output. This study highlights important properties of such strategy, which we coin Variance-Covariance regularization (VCReg). More precisely, we show that VCReg combined to a MLP projector…

    Submitted 14 February, 2024; v1 submitted 29 September, 2022; originally announced September 2022.

  26. arXiv:2209.14884

    cs.LG cs.AI stat.ML

    Joint Embedding Self-Supervised Learning in the Kernel Regime

    Authors: Bobak T. Kiani, Randall Balestriero, Yubei Chen, Seth Lloyd, Yann LeCun

    Abstract: The fundamental goal of self-supervised learning (SSL) is to produce useful representations of data without access to any labels for classifying the data. Modern methods in SSL, which form representations based on known or constructed relationships between samples, have been particularly effective at this task. Here, we aim to extend this framework to incorporate algorithms based on kernel methods…

    Submitted 29 September, 2022; originally announced September 2022.

  27. arXiv:2209.14778

    cs.LG cs.AI cs.CG cs.CV stat.ML

    Batch Normalization Explained

    Authors: Randall Balestriero, Richard G. Baraniuk

    Abstract: A critically important, ubiquitous, and yet poorly understood ingredient in modern deep networks (DNs) is batch normalization (BN), which centers and normalizes the feature maps. To date, only limited progress has been made understanding why BN boosts DN learning and inference performance; work has focused exclusively on showing that BN smooths a DN's loss landscape. In this paper, we study BN the…

    Submitted 29 September, 2022; originally announced September 2022.

  28. arXiv:2207.10081

    cs.LG cs.AI

    What Do We Maximize in Self-Supervised Learning?

    Authors: Ravid Shwartz-Ziv, Randall Balestriero, Yann LeCun

    Abstract: In this paper, we examine self-supervised learning methods, particularly VICReg, to provide an information-theoretical understanding of their construction. As a first step, we demonstrate how information-theoretic quantities can be obtained for a deterministic network, offering a possible alternative to prior work that relies on stochastic models. This enables us to demonstrate how VICReg can be (…

    Submitted 20 July, 2022; originally announced July 2022.

  29. arXiv:2206.13378

    cs.LG

    Guillotine Regularization: Why removing layers is needed to improve generalization in Self-Supervised Learning

    Authors: Florian Bordes, Randall Balestriero, Quentin Garrido, Adrien Bardes, Pascal Vincent

    Abstract: One unexpected technique that emerged in recent years consists in training a Deep Network (DN) with a Self-Supervised Learning (SSL) method, and using this network on downstream tasks but with its last few projector layers entirely removed. This trick of throwing away the projector is actually critical for SSL methods to display competitive performances on ImageNet for which more than 30 percentag…

    Submitted 9 June, 2023; v1 submitted 27 June, 2022; originally announced June 2022.

    Comments: Accepted at TMLR 2023

  30. arXiv:2205.11508

    cs.LG cs.AI cs.CV math.SP stat.ML

    Contrastive and Non-Contrastive Self-Supervised Learning Recover Global and Local Spectral Embedding Methods

    Authors: Randall Balestriero, Yann LeCun

    Abstract: Self-Supervised Learning (SSL) surmises that inputs and pairwise positive relationships are enough to learn meaningful representations. Although SSL has recently reached a milestone: outperforming supervised methods in many modalities... the theoretical foundations are limited, method-specific, and fail to provide principled design guidelines to practitioners. In this paper, we propose a unifyin…

    Submitted 10 June, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

  31. arXiv:2204.03632

    cs.LG cs.CV stat.ML

    The Effects of Regularization and Data Augmentation are Class Dependent

    Authors: Randall Balestriero, Leon Bottou, Yann LeCun

    Abstract: Regularization is a fundamental technique to prevent over-fitting and to improve generalization performances by constraining a model's complexity. Current Deep Networks heavily rely on regularizers such as Data-Augmentation (DA) or weight-decay, and employ structural risk minimization, i.e. cross-validation, to select the optimal regularization hyper-parameters. In this study, we demonstrate that…

    Submitted 8 April, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

  32. arXiv:2204.03145

    stat.AP cs.LG stat.ML

    DeepTensor: Low-Rank Tensor Decomposition with Deep Network Priors

    Authors: Vishwanath Saragadam, Randall Balestriero, Ashok Veeraraghavan, Richard G. Baraniuk

    Abstract: DeepTensor is a computationally efficient framework for low-rank decomposition of matrices and tensors using deep generative networks. We decompose a tensor as the product of low-rank tensor factors (e.g., a matrix as the outer product of two vectors), where each low-rank tensor is generated by a deep network (DN) that is trained in a self-supervised manner to minimize the mean-squared approximati…

    Submitted 6 April, 2022; originally announced April 2022.

    Comments: 14 pages

  33. arXiv:2203.05483

    cs.LG cs.AI quant-ph

    projUNN: efficient method for training deep networks with unitary matrices

    Authors: Bobak Kiani, Randall Balestriero, Yann LeCun, Seth Lloyd

    Abstract: In learning with recurrent or very deep feed-forward networks, employing unitary matrices in each layer can be very effective at maintaining long-range stability. However, restricting network parameters to be unitary typically comes at the cost of expensive parameterizations or increased training runtime. We propose instead an efficient method based on rank-$k$ updates -- or their rank-$k$ approxi…

    Submitted 13 October, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

  34. Singular Value Perturbation and Deep Network Optimization

    Authors: Rudolf H. Riedi, Randall Balestriero, Richard G. Baraniuk

    Abstract: We develop new theoretical results on matrix perturbation to shed light on the impact of architecture on the performance of a deep network. In particular, we explain analytically what deep learning practitioners have long observed empirically: the parameters of some deep architectures (e.g., residual networks, ResNets, and Dense networks, DenseNets) are easier to optimize than others (e.g., convol…

    Submitted 5 December, 2022; v1 submitted 6 March, 2022; originally announced March 2022.

    Comments: Constr Approx (2022)

  35. arXiv:2203.02502

    cs.LG cs.AI

    No More Than 6ft Apart: Robust K-Means via Radius Upper Bounds

    Authors: Ahmed Imtiaz Humayun, Randall Balestriero, Anastasios Kyrillidis, Richard Baraniuk

    Abstract: Centroid based clustering methods such as k-means, k-medoids and k-centers are heavily applied as a go-to tool in exploratory data analysis. In many cases, those methods are used to obtain representative centroids of the data manifold for visualization or summarization of a dataset. Real world datasets often contain inherent abnormalities, e.g., repeated samples and sampling bias, that manifest im…

    Submitted 15 June, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

    Comments: Accepted for ICASSP 2022, 8 figures, 1 table

  36. arXiv:2203.01993

    cs.CV

    Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values

    Authors: Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk

    Abstract: We present Polarity Sampling, a theoretically justified plug-and-play method for controlling the generation quality and diversity of pre-trained deep generative networks (DGNs). Leveraging the fact that DGNs are, or can be approximated by, continuous piecewise affine splines, we derive the analytical DGN output space distribution as a function of the product of the DGN's Jacobian singular values ra…

    Submitted 6 May, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: 20 pages, 16 figures, CVPR 2022 Oral, Camera Ready

  37. arXiv:2202.11811

    cs.LG

    NeuroView-RNN: It's About Time

    Authors: CJ Barberan, Sina Alemohammad, Naiming Liu, Randall Balestriero, Richard G. Baraniuk

    Abstract: Recurrent Neural Networks (RNNs) are important tools for processing sequential data such as time-series or video. Interpretability is defined as the ability to be understood by a person and is different from explainability, which is the ability to be explained in a mathematical formulation. A key interpretability issue with RNNs is that it is not clear how each hidden state per time step contribut…

    Submitted 23 February, 2022; originally announced February 2022.

    Comments: 21 pages, 13 figures, 9 tables

  38. arXiv:2202.08325

    cs.LG cs.CV

    A Data-Augmentation Is Worth A Thousand Samples: Exact Quantification From Analytical Augmented Sample Moments

    Authors: Randall Balestriero, Ishan Misra, Yann LeCun

    Abstract: Data-Augmentation (DA) is known to improve performance across tasks and datasets. We propose a method to theoretically analyze the effect of DA and study questions such as: how many augmented samples are needed to correctly estimate the information encoded by that DA? How does the augmentation policy impact the final parameters of a model? We derive several quantities in closed-form, such as the ex…

    Submitted 16 February, 2022; originally announced February 2022.

  39. arXiv:2202.07829

    cs.LG cs.CV

    Spatial Transformer K-Means

    Authors: Romain Cosentino, Randall Balestriero, Yanis Bahroun, Anirvan Sengupta, Richard Baraniuk, Behnaam Aazhang

    Abstract: K-means defines one of the most employed centroid-based clustering algorithms with performances tied to the data's embedding. Intricate data embeddings have been designed to push $K$-means performances at the cost of reduced theoretical guarantees and interpretability of the results. Instead, we propose preserving the intrinsic data space and augment K-means with a similarity measure invariant to…

    Submitted 15 February, 2022; originally announced February 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2012.09743

  40. arXiv:2112.09164

    cs.LG cs.AI

    High Fidelity Visualization of What Your Self-Supervised Representation Knows About

    Authors: Florian Bordes, Randall Balestriero, Pascal Vincent

    Abstract: Discovering what is learned by neural networks remains a challenge. In self-supervised learning, classification is the most common task used to evaluate how good a representation is. However, relying only on such downstream task can limit our understanding of what information is retained in the representation of a given input. In this work, we showcase the use of a Representation Conditional Diffu…

    Submitted 16 August, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: Accepted at TMLR 2022

  41. arXiv:2110.09485

    cs.LG cs.CV

    Learning in High Dimension Always Amounts to Extrapolation

    Authors: Randall Balestriero, Jerome Pesenti, Yann LeCun

    Abstract: The notion of interpolation and extrapolation is fundamental in various fields from deep learning to function approximation. Interpolation occurs for a sample $x$ whenever this sample falls inside or on the boundary of the given dataset's convex hull. Extrapolation occurs when $x$ falls outside of that convex hull. One fundamental (mis)conception is that state-of-the-art algorithms work so well be…

    Submitted 29 October, 2021; v1 submitted 18 October, 2021; originally announced October 2021.

  42. arXiv:2110.08009

    cs.LG cs.CV

    MaGNET: Uniform Sampling from Deep Generative Network Manifolds Without Retraining

    Authors: Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk

    Abstract: Deep Generative Networks (DGNs) are extensively employed in Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and their variants to approximate the data manifold and distribution. However, training samples are often distributed in a non-uniform fashion on the manifold, due to costs or convenience of collection. For example, the CelebA dataset contains a large fraction of smi…

    Submitted 20 January, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: ICLR Accepted version, 28 pages, 23 figures

  43. arXiv:2110.07778

    cs.CV cs.LG

    NeuroView: Explainable Deep Network Decision Making

    Authors: CJ Barberan, Randall Balestriero, Richard G. Baraniuk

    Abstract: Deep neural networks (DNs) provide superhuman performance in numerous computer vision tasks, yet it remains unclear exactly which of a DN's units contribute to a particular decision. NeuroView is a new family of DN architectures that are interpretable/explainable by design. Each member of the family is derived from a standard DN architecture by vector quantizing the unit output values and feeding…

    Submitted 14 October, 2021; originally announced October 2021.

    Comments: 12 pages, 7 figures

  44. arXiv:2104.00219  [pdf, other]

    cs.LG

    Fast Jacobian-Vector Product for Deep Networks

    Authors: Randall Balestriero, Richard Baraniuk

    Abstract: Jacobian-vector products (JVPs) form the backbone of many recent developments in Deep Networks (DNs), with applications including faster constrained optimization, regularization with generalization guarantees, and adversarial example sensitivity assessments. Unfortunately, JVPs are computationally expensive for real-world DN architectures and require the use of automatic differentiation to avoid m…

    Submitted 31 March, 2021; originally announced April 2021.

  45. arXiv:2101.02338  [pdf, other]

    cs.LG cs.AI

    Max-Affine Spline Insights Into Deep Network Pruning

    Authors: Haoran You, Randall Balestriero, Zhihan Lu, Yutong Kou, Huihong Shi, Shunyao Zhang, Shang Wu, Yingyan Lin, Richard Baraniuk

    Abstract: In this paper, we study the importance of pruning in Deep Networks (DNs) and the yin & yang relationship between (1) pruning highly overparametrized DNs that have been trained from random initialization and (2) training small DNs that have been "cleverly" initialized. As in most cases practitioners can only resort to random initialization, there is a strong need to develop a grounded understanding…

    Submitted 18 August, 2022; v1 submitted 6 January, 2021; originally announced January 2021.

    Comments: Accepted by TMLR

  46. arXiv:2012.09743  [pdf, other]

    cs.CV math.GR

    Interpretable Image Clustering via Diffeomorphism-Aware K-Means

    Authors: Romain Cosentino, Randall Balestriero, Yanis Bahroun, Anirvan Sengupta, Richard Baraniuk, Behnaam Aazhang

    Abstract: We design an interpretable clustering algorithm aware of the nonlinear structure of image manifolds. Our approach leverages the interpretability of $K$-means applied in the image space while addressing its clustering performance issues. Specifically, we develop a measure of similarity between images and centroids that encompasses a general class of deformations: diffeomorphisms, rendering the clus…

    Submitted 16 December, 2020; originally announced December 2020.

  47. arXiv:2012.07662  [pdf, other]

    stat.ML cs.LG

    Sparse Multi-Family Deep Scattering Network

    Authors: Romain Cosentino, Randall Balestriero

    Abstract: In this work, we propose the Sparse Multi-Family Deep Scattering Network (SMF-DSN), a novel architecture exploiting the interpretability of the Deep Scattering Network (DSN) and improving its expressive power. The DSN extracts salient and interpretable features in signals by cascading wavelet transforms and complex modulus operations, and extracts the representation of the data via a translation-invariant operato…

    Submitted 14 December, 2020; originally announced December 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:1712.09117

  48. arXiv:2012.04859  [pdf, other]

    cs.LG stat.ML

    Enhanced Recurrent Neural Tangent Kernels for Non-Time-Series Data

    Authors: Sina Alemohammad, Randall Balestriero, Zichao Wang, Richard Baraniuk

    Abstract: Kernels derived from deep neural networks (DNNs) in the infinite-width regime provide not only high performance in a range of machine learning tasks but also new theoretical insights into DNN training dynamics and generalization. In this paper, we extend the family of kernels associated with recurrent neural networks (RNNs), which were previously derived only for simple RNNs, to more complex archi…

    Submitted 19 October, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

  49. arXiv:2010.13975  [pdf, other]

    eess.SP cs.LG

    Wearing a MASK: Compressed Representations of Variable-Length Sequences Using Recurrent Neural Tangent Kernels

    Authors: Sina Alemohammad, Hossein Babaei, Randall Balestriero, Matt Y. Cheung, Ahmed Imtiaz Humayun, Daniel LeJeune, Naiming Liu, Lorenzo Luzi, Jasper Tan, Zichao Wang, Richard G. Baraniuk

    Abstract: High dimensionality poses many challenges to the use of data, from visualization and interpretation, to prediction and storage for historical preservation. Techniques abound to reduce the dimensionality of fixed-length sequences, yet these methods rarely generalize to variable-length sequences. To address this gap, we extend existing methods that rely on the use of kernels to variable-length seque…

    Submitted 17 April, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

  50. arXiv:2009.09525  [pdf, other]

    cs.LG math.GR stat.ML

    Deep Autoencoders: From Understanding to Generalization Guarantees

    Authors: Romain Cosentino, Randall Balestriero, Richard Baraniuk, Behnaam Aazhang

    Abstract: A big mystery in deep learning continues to be the ability of methods to generalize when the number of model parameters is larger than the number of training examples. In this work, we take a step towards a better understanding of the underlying phenomena of Deep Autoencoders (AEs), a mainstream deep learning solution for learning compressed, interpretable, and structured data representations. In…

    Submitted 24 November, 2021; v1 submitted 20 September, 2020; originally announced September 2020.

    Journal ref: R. Cosentino, R. Balestriero, R. Baraniuk, B. Aazhang, 2nd Annual Conference on Mathematical and Scientific Machine Learning (2021)