Showing 1–50 of 155 results for author: LeCun, Y

Searching in archive cs.
  1. arXiv:2406.19314  [pdf, other]

    cs.CL cs.AI cs.LG

    LiveBench: A Challenging, Contamination-Free LLM Benchmark

    Authors: Colin White, Samuel Dooley, Manley Roberts, Arka Pal, Ben Feuer, Siddhartha Jain, Ravid Shwartz-Ziv, Neel Jain, Khalid Saifullah, Siddartha Naidu, Chinmay Hegde, Yann LeCun, Tom Goldstein, Willie Neiswanger, Micah Goldblum

    Abstract: Test set contamination, wherein test data from a benchmark ends up in a newer model's training set, is a well-documented obstacle for fair LLM evaluation and can quickly render benchmarks obsolete. To mitigate this, many recent benchmarks crowdsource new prompts and evaluations from human or LLM judges; however, these can introduce significant biases, and break down when scoring hard questions. In…

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.16860  [pdf, other]

    cs.CV

    Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

    Authors: Shengbang Tong, Ellis Brown, Penghao Wu, Sanghyun Woo, Manoj Middepogu, Sai Charitha Akula, Jihan Yang, Shusheng Yang, Adithya Iyer, Xichen Pan, Austin Wang, Rob Fergus, Yann LeCun, Saining Xie

    Abstract: We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach. While stronger language models can enhance multimodal capabilities, the design choices for vision components are often insufficiently explored and disconnected from visual representation learning research. This gap hinders accurate sensory grounding in real-world scenarios. Our study uses LLMs and…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Website at https://cambrian-mllm.github.io

  3. arXiv:2406.11463  [pdf, other]

    cs.LG stat.ML

    Just How Flexible are Neural Networks in Practice?

    Authors: Ravid Shwartz-Ziv, Micah Goldblum, Arpit Bansal, C. Bayan Bruss, Yann LeCun, Andrew Gordon Wilson

    Abstract: It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters, underpinning notions of overparameterized and underparameterized models. In practice, however, we only find solutions accessible via our training procedure, including the optimizer and regularizers, limiting flexibility. Moreover, the exact parameterization of the function c…

    Submitted 17 June, 2024; originally announced June 2024.

  4. arXiv:2406.09366  [pdf, other]

    cs.LG cs.CV q-bio.NC

    Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations

    Authors: Rylan Schaeffer, Victor Lecomte, Dhruv Bhandarkar Pai, Andres Carranza, Berivan Isik, Alyssa Unell, Mikail Khona, Thomas Yerxa, Yann LeCun, SueYeon Chung, Andrey Gromov, Ravid Shwartz-Ziv, Sanmi Koyejo

    Abstract: Maximum Manifold Capacity Representations (MMCR) is a recent multi-view self-supervised learning (MVSSL) method that matches or surpasses other leading MVSSL methods. MMCR is intriguing because it does not fit neatly into any of the commonplace MVSSL lineages, instead originating from a statistical mechanical perspective on the linear separability of data manifolds. In this paper, we seek to impro…

    Submitted 13 June, 2024; originally announced June 2024.

  5. arXiv:2405.18418  [pdf, other]

    cs.LG cs.CV cs.RO

    Hierarchical World Models as Visual Whole-Body Humanoid Controllers

    Authors: Nicklas Hansen, Jyothir S V, Vlad Sobal, Yann LeCun, Xiaolong Wang, Hao Su

    Abstract: Whole-body control for humanoids is challenging due to the high-dimensional nature of the problem, coupled with the inherent instability of a bipedal morphology. Learning from visual observations further exacerbates this difficulty. In this work, we explore highly data-driven approaches to visual whole-body humanoid control based on reinforcement learning, without any simplifying assumptions, rewa…

    Submitted 31 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Code and videos at https://nicklashansen.com/rlpuppeteer

  6. arXiv:2405.15802  [pdf]

    cs.SE cs.AI

    Towards a Framework for Openness in Foundation Models: Proceedings from the Columbia Convening on Openness in Artificial Intelligence

    Authors: Adrien Basdevant, Camille François, Victor Storchan, Kevin Bankston, Ayah Bdeir, Brian Behlendorf, Merouane Debbah, Sayash Kapoor, Yann LeCun, Mark Surman, Helen King-Turvey, Nathan Lambert, Stefano Maffulli, Nik Marda, Govind Shivkumar, Justine Tunney

    Abstract: Over the past year, there has been a robust debate about the benefits and risks of open sourcing foundation models. However, this discussion has often taken place at a high level of generality or with a narrow focus on specific technical attributes. In part, this is because defining open source for foundation models has proven tricky, given its significant differences from traditional software dev…

    Submitted 17 May, 2024; originally announced May 2024.

  7. arXiv:2405.10292  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning

    Authors: Yuexiang Zhai, Hao Bai, Zipeng Lin, Jiayi Pan, Shengbang Tong, Yifei Zhou, Alane Suhr, Saining Xie, Yann LeCun, Yi Ma, Sergey Levine

    Abstract: Large vision-language models (VLMs) fine-tuned on specialized visual instruction-following data have exhibited impressive language reasoning capabilities across various scenarios. However, this fine-tuning paradigm may not be able to efficiently learn optimal decision-making agents in multi-step goal-directed tasks from interactive environments. To address this challenge, we propose an algorithmic…

    Submitted 16 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

  8. arXiv:2405.05012  [pdf, other]

    cs.CV

    The Entropy Enigma: Success and Failure of Entropy Minimization

    Authors: Ori Press, Ravid Shwartz-Ziv, Yann LeCun, Matthias Bethge

    Abstract: Entropy minimization (EM) is frequently used to increase the accuracy of classification models when they're faced with new data at test time. EM is a self-supervised learning method that optimizes classifiers to assign even higher probabilities to their top predicted classes. In this paper, we analyze why EM works when adapting a model for a few steps and why it eventually fails after adapting for…

    Submitted 12 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  9. arXiv:2405.01469  [pdf, other]

    cs.CV cs.AI

    Advancing human-centric AI for robust X-ray analysis through holistic self-supervised learning

    Authors: Théo Moutakanni, Piotr Bojanowski, Guillaume Chassagnon, Céline Hudelot, Armand Joulin, Yann LeCun, Matthew Muckley, Maxime Oquab, Marie-Pierre Revel, Maria Vakalopoulou

    Abstract: AI Foundation models are gaining traction in various applications, including medical fields like radiology. However, medical foundation models are often tested on limited tasks, leaving their generalisability and biases unexplored. We present RayDINO, a large visual encoder trained by self-supervision on 873k chest X-rays. We compare RayDINO to previous state-of-the-art models across nine radiolog…

    Submitted 2 May, 2024; originally announced May 2024.

  10. arXiv:2404.09991  [pdf, other]

    cs.RO cs.CV

    EgoPet: Egomotion and Interaction Data from an Animal's Perspective

    Authors: Amir Bar, Arya Bakhtiar, Danny Tran, Antonio Loquercio, Jathushan Rajasegaran, Yann LeCun, Amir Globerson, Trevor Darrell

    Abstract: Animals perceive the world to plan their actions and interact with other agents to accomplish complex tasks, demonstrating capabilities that are still unmatched by AI systems. To advance our understanding and reduce the gap between the capabilities of animals and AI systems, we introduce a dataset of pet egomotion imagery with diverse examples of simultaneous egomotion and multi-agent interaction.…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: https://www.amirbar.net/egopet

  11. arXiv:2404.08471  [pdf, other]

    cs.CV cs.AI cs.LG

    Revisiting Feature Prediction for Learning Visual Representations from Video

    Authors: Adrien Bardes, Quentin Garrido, Jean Ponce, Xinlei Chen, Michael Rabbat, Yann LeCun, Mahmoud Assran, Nicolas Ballas

    Abstract: This paper explores feature prediction as a stand-alone objective for unsupervised learning from video and introduces V-JEPA, a collection of vision models trained solely using a feature prediction objective, without the use of pretrained image encoders, text, negative examples, reconstruction, or other sources of supervision. The models are trained on 2 million videos collected from public datase…

    Submitted 15 February, 2024; originally announced April 2024.

  12. arXiv:2403.00504  [pdf, other]

    cs.CV cs.AI cs.LG

    Learning and Leveraging World Models in Visual Representation Learning

    Authors: Quentin Garrido, Mahmoud Assran, Nicolas Ballas, Adrien Bardes, Laurent Najman, Yann LeCun

    Abstract: Joint-Embedding Predictive Architecture (JEPA) has emerged as a promising self-supervised approach that learns by leveraging a world model. While previously limited to predicting missing parts of an input, we explore how to generalize the JEPA prediction task to a broader set of corruptions. We introduce Image World Models, an approach that goes beyond masked image modeling and learns to predict t…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: 23 pages, 16 figures

  13. arXiv:2402.11337  [pdf, other]

    cs.CV cs.AI stat.ML

    Learning by Reconstruction Produces Uninformative Features For Perception

    Authors: Randall Balestriero, Yann LeCun

    Abstract: Input space reconstruction is an attractive representation learning paradigm. Despite interpretability of the reconstruction and generation, we identify a misalignment between learning by reconstruction, and learning for perception. We show that the former allocates a model's capacity towards a subspace of the data explaining the observed variance--a subspace with uninformative features for the la…

    Submitted 17 February, 2024; originally announced February 2024.

  14. arXiv:2402.07630  [pdf, other]

    cs.LG

    G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering

    Authors: Xiaoxin He, Yijun Tian, Yifei Sun, Nitesh V. Chawla, Thomas Laurent, Yann LeCun, Xavier Bresson, Bryan Hooi

    Abstract: Given a graph with textual attributes, we enable users to `chat with their graph': that is, to ask questions about the graph using a conversational interface. In response to a user's questions, our method provides textual replies and highlights the relevant parts of the graph. While existing works integrate large language models (LLMs) and graph neural networks (GNNs) in various ways, they mostly…

    Submitted 27 May, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  15. arXiv:2401.11188  [pdf, other]

    cs.LG cs.AI

    Fast and Exact Enumeration of Deep Networks Partitions Regions

    Authors: Randall Balestriero, Yann LeCun

    Abstract: One fruitful formulation of Deep Networks (DNs) enabling their theoretical study and providing practical guidelines to practitioners relies on Piecewise Affine Splines. In that realm, a DN's input-mapping is expressed as per-region affine mapping where those regions are implicitly determined by the model's architecture and form a partition of their input space. That partition -- which is involved…

    Submitted 20 January, 2024; originally announced January 2024.

  16. arXiv:2401.06209  [pdf, other]

    cs.CV

    Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs

    Authors: Shengbang Tong, Zhuang Liu, Yuexiang Zhai, Yi Ma, Yann LeCun, Saining Xie

    Abstract: Is vision good enough for language? Recent advancements in multimodal models primarily stem from the powerful reasoning abilities of large language models (LLMs). However, the visual component typically depends only on the instance-level contrastive language-image pre-training (CLIP). Our research reveals that the visual capabilities in recent multimodal LLMs (MLLMs) still exhibit systematic short…

    Submitted 25 April, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: Project page: https://tsb0601.github.io/mmvp_blog/

  17. arXiv:2312.17227  [pdf, other]

    cs.LG cs.AI

    Gradient-based Planning with World Models

    Authors: Jyothir S V, Siddhartha Jalagam, Yann LeCun, Vlad Sobal

    Abstract: The enduring challenge in the field of artificial intelligence has been the control of systems to achieve desired behaviours. While for systems governed by straightforward dynamics equations, methods like Linear Quadratic Regulation (LQR) have historically proven highly effective, most real-world tasks, which require a general problem-solver, demand world models with dynamics that cannot be easily…

    Submitted 28 December, 2023; originally announced December 2023.

  18. arXiv:2311.12983  [pdf, other]

    cs.CL cs.AI

    GAIA: a benchmark for General AI Assistants

    Authors: Grégoire Mialon, Clémentine Fourrier, Craig Swift, Thomas Wolf, Yann LeCun, Thomas Scialom

    Abstract: We introduce GAIA, a benchmark for General AI Assistants that, if solved, would represent a milestone in AI research. GAIA proposes real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing, and generally tool-use proficiency. GAIA questions are conceptually simple for humans yet challenging for most advanced AIs: we show that human r…

    Submitted 21 November, 2023; originally announced November 2023.

  19. arXiv:2310.04496  [pdf, other]

    cs.CV cs.LG

    URLOST: Unsupervised Representation Learning without Stationarity or Topology

    Authors: Zeyu Yun, Juexiao Zhang, Bruno Olshausen, Yann LeCun, Yubei Chen

    Abstract: Unsupervised representation learning has seen tremendous progress but is constrained by its reliance on data modality-specific stationarity and topology, a limitation not found in biological intelligence systems. For instance, human vision processes visual signals derived from irregular and non-stationary sampling lattices yet accurately perceives the geometry of the world. We introduce a novel fr…

    Submitted 6 October, 2023; originally announced October 2023.

    Comments: 17 pages, 7 figures

  20. arXiv:2308.00566  [pdf, other]

    cs.CV cs.AI cs.LG

    Stochastic positional embeddings improve masked image modeling

    Authors: Amir Bar, Florian Bordes, Assaf Shocher, Mahmoud Assran, Pascal Vincent, Nicolas Ballas, Trevor Darrell, Amir Globerson, Yann LeCun

    Abstract: Masked Image Modeling (MIM) is a promising self-supervised learning approach that enables learning from unlabeled images. Despite its recent success, learning good representations through MIM remains challenging because it requires predicting the right semantic content in accurate locations. For example, given an incomplete picture of a dog, we can guess that there is a tail, but we cannot determi…

    Submitted 27 February, 2024; v1 submitted 31 July, 2023; originally announced August 2023.

    Comments: Code and models available in https://github.com/amirbar/StoP

  21. arXiv:2307.12698  [pdf, other]

    cs.CV cs.AI cs.LG

    MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features

    Authors: Adrien Bardes, Jean Ponce, Yann LeCun

    Abstract: Self-supervised learning of visual representations has been focusing on learning content features, which do not capture object motion or location, and focus on identifying and differentiating objects in images and videos. On the other hand, optical flow estimation is a task that does not involve understanding the content of the images on which it is estimated. We unify the two approaches and intro…

    Submitted 24 July, 2023; originally announced July 2023.

  22. arXiv:2307.05432  [pdf, other]

    cs.LG math.NA

    Self-Supervised Learning with Lie Symmetries for Partial Differential Equations

    Authors: Grégoire Mialon, Quentin Garrido, Hannah Lawrence, Danyal Rehman, Yann LeCun, Bobak T. Kiani

    Abstract: Machine learning for differential equations paves the way for computationally efficient alternatives to numerical solvers, with potentially broad impacts in science and engineering. Though current algorithms typically require simulated training data tailored to a given setting, one may instead wish to learn useful information from heterogeneous sources, or from real dynamical systems observations…

    Submitted 14 February, 2024; v1 submitted 11 July, 2023; originally announced July 2023.

    Comments: NeurIPS 2023

  23. arXiv:2306.13292  [pdf, other]

    cs.LG cs.AI cs.CV

    Variance-Covariance Regularization Improves Representation Learning

    Authors: Jiachen Zhu, Katrina Evtimova, Yubei Chen, Ravid Shwartz-Ziv, Yann LeCun

    Abstract: Transfer learning plays a key role in advancing machine learning models, yet conventional supervised pretraining often undermines feature transferability by prioritizing features that minimize the pretraining loss. In this work, we adapt a self-supervised learning regularization technique from the VICReg method to supervised learning contexts, introducing Variance-Covariance Regularization (VCReg)…

    Submitted 22 February, 2024; v1 submitted 23 June, 2023; originally announced June 2023.

    Comments: 165 pages, 5 figures

  24. arXiv:2306.02572  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Introduction to Latent Variable Energy-Based Models: A Path Towards Autonomous Machine Intelligence

    Authors: Anna Dawid, Yann LeCun

    Abstract: Current automated systems have crucial limitations that need to be addressed before artificial intelligence can reach human-like levels and bring new technological revolutions. Among others, our societies still lack Level 5 self-driving cars, domestic robots, and virtual assistants that learn reliable world models, reason, and plan complex action sequences. In these notes, we summarize the main id…

    Submitted 4 June, 2023; originally announced June 2023.

    Comments: 23 pages + 1-page appendix, 11 figures. These notes follow the content of three lectures given by Yann LeCun during the Les Houches Summer School on Statistical Physics and Machine Learning in 2022. Feedback and comments are most welcome!

  25. arXiv:2305.19523  [pdf, other]

    cs.LG

    Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning

    Authors: Xiaoxin He, Xavier Bresson, Thomas Laurent, Adam Perold, Yann LeCun, Bryan Hooi

    Abstract: Representation learning on text-attributed graphs (TAGs) has become a critical research problem in recent years. A typical example of a TAG is a paper citation graph, where the text of each paper serves as node attributes. Initial graph neural network (GNN) pipelines handled these text attributes by transforming them into shallow or hand-crafted features, such as skip-gram or bag-of-words features…

    Submitted 6 March, 2024; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: In Proceedings of ICLR 2024

  26. arXiv:2305.15614  [pdf, other]

    cs.LG cs.AI

    Reverse Engineering Self-Supervised Learning

    Authors: Ido Ben-Shaul, Ravid Shwartz-Ziv, Tomer Galanti, Shai Dekel, Yann LeCun

    Abstract: Self-supervised learning (SSL) is a powerful tool in machine learning, but understanding the learned representations and their underlying mechanisms remains a challenge. This paper presents an in-depth empirical analysis of SSL-trained representations, encompassing diverse models, architectures, and hyperparameters. Our study reveals an intriguing aspect of the SSL training process: it inherently…

    Submitted 31 May, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

  27. arXiv:2304.12210  [pdf, other]

    cs.LG cs.CV

    A Cookbook of Self-Supervised Learning

    Authors: Randall Balestriero, Mark Ibrahim, Vlad Sobal, Ari Morcos, Shashank Shekhar, Tom Goldstein, Florian Bordes, Adrien Bardes, Gregoire Mialon, Yuandong Tian, Avi Schwarzschild, Andrew Gordon Wilson, Jonas Geiping, Quentin Garrido, Pierre Fernandez, Amir Bar, Hamed Pirsiavash, Yann LeCun, Micah Goldblum

    Abstract: Self-supervised learning, dubbed the dark matter of intelligence, is a promising path to advance machine learning. Yet, much like cooking, training SSL methods is a delicate art with a high barrier to entry. While many components are familiar, successfully training a SSL method involves a dizzying set of choices from the pretext tasks to training hyper-parameters. Our goal is to lower the barrier…

    Submitted 28 June, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

  28. arXiv:2304.09355  [pdf, other]

    cs.LG cs.IT

    To Compress or Not to Compress- Self-Supervised Learning and Information Theory: A Review

    Authors: Ravid Shwartz-Ziv, Yann LeCun

    Abstract: Deep neural networks excel in supervised learning tasks but are constrained by the need for extensive labeled data. Self-supervised learning emerges as a promising alternative, allowing models to learn without explicit labels. Information theory, and notably the information bottleneck principle, has been pivotal in shaping deep neural networks. This principle focuses on optimizing the trade-off be…

    Submitted 21 November, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

  29. arXiv:2304.03977  [pdf, other]

    cs.CV cs.AI

    EMP-SSL: Towards Self-Supervised Learning in One Training Epoch

    Authors: Shengbang Tong, Yubei Chen, Yi Ma, Yann Lecun

    Abstract: Recently, self-supervised learning (SSL) has achieved tremendous success in learning image representation. Despite the empirical success, most self-supervised learning methods are rather "inefficient" learners, typically taking hundreds of training epochs to fully converge. In this work, we show that the key towards efficient self-supervised learning is to increase the number of crops from each im…

    Submitted 8 April, 2023; originally announced April 2023.

  30. arXiv:2303.15256  [pdf, other]

    cs.LG cs.AI cs.HC

    Active Self-Supervised Learning: A Few Low-Cost Relationships Are All You Need

    Authors: Vivien Cabannes, Leon Bottou, Yann Lecun, Randall Balestriero

    Abstract: Self-Supervised Learning (SSL) has emerged as the solution of choice to learn transferable representations from unlabeled data. However, SSL requires to build samples that are known to be semantically akin, i.e. positive views. Requiring such knowledge is the main limitation of SSL and is often tackled by ad-hoc strategies e.g. applying known data-augmentations to the same input. In this work, we…

    Submitted 29 September, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Comments: 8 main pages, 20 totals, 10 figures

    ACM Class: I.2.6

  31. arXiv:2303.00633  [pdf, other]

    cs.IT cs.AI

    An Information-Theoretic Perspective on Variance-Invariance-Covariance Regularization

    Authors: Ravid Shwartz-Ziv, Randall Balestriero, Kenji Kawaguchi, Tim G. J. Rudner, Yann LeCun

    Abstract: Variance-Invariance-Covariance Regularization (VICReg) is a self-supervised learning (SSL) method that has shown promising results on a variety of tasks. However, the fundamental mechanisms underlying VICReg remain unexplored. In this paper, we present an information-theoretic perspective on the VICReg objective. We begin by deriving information-theoretic quantities for deterministic networks as a…

    Submitted 1 May, 2024; v1 submitted 1 March, 2023; originally announced March 2023.

  32. arXiv:2302.10283  [pdf, other]

    cs.CV cs.AI cs.LG

    Self-supervised learning of Split Invariant Equivariant representations

    Authors: Quentin Garrido, Laurent Najman, Yann Lecun

    Abstract: Recent progress has been made towards learning invariant or equivariant representations with self-supervised learning. While invariant methods are evaluated on large scale datasets, equivariant ones are evaluated in smaller, more controlled, settings. We aim at bridging the gap between the two in order to learn more diverse representations that are suitable for a wide range of tasks. We start by i…

    Submitted 19 June, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

    Journal ref: The Fortieth International Conference on Machine Learning, 2023, Honolulu, United States

  33. arXiv:2302.07842  [pdf, ps, other]

    cs.CL

    Augmented Language Models: a Survey

    Authors: Grégoire Mialon, Roberto Dessì, Maria Lomeli, Christoforos Nalmpantis, Ram Pasunuru, Roberta Raileanu, Baptiste Rozière, Timo Schick, Jane Dwivedi-Yu, Asli Celikyilmaz, Edouard Grave, Yann LeCun, Thomas Scialom

    Abstract: This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools. The former is defined as decomposing a potentially complex task into simpler subtasks while the latter consists in calling external modules such as a code interpreter. LMs can leverage these augmentations separately or in combination via heuristics, or learn to do so from demo…

    Submitted 15 February, 2023; originally announced February 2023.

  34. arXiv:2302.02774  [pdf, other]

    stat.ML cs.AI cs.LG math.ST

    The SSL Interplay: Augmentations, Inductive Bias, and Generalization

    Authors: Vivien Cabannes, Bobak T. Kiani, Randall Balestriero, Yann LeCun, Alberto Bietti

    Abstract: Self-supervised learning (SSL) has emerged as a powerful framework to learn representations from raw data without supervision. Yet in practice, engineers face issues such as instability in tuning optimizers and collapse of representations during training. Such challenges motivate the need for a theory to shed light on the complex interplay between the choice of data augmentation, network architect…

    Submitted 1 June, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    MSC Class: 68Q32 ACM Class: G.3

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023

  35. arXiv:2302.01647  [pdf, other]

    cs.CV cs.AI cs.LG

    Blockwise Self-Supervised Learning at Scale

    Authors: Shoaib Ahmed Siddiqui, David Krueger, Yann LeCun, Stéphane Deny

    Abstract: Current state-of-the-art deep networks are all powered by backpropagation. In this paper, we explore alternatives to full backpropagation in the form of blockwise learning rules, leveraging the latest developments in self-supervised learning. We show that a blockwise pretraining procedure consisting of training independently the 4 main blocks of layers of a ResNet-50 with Barlow Twins' loss functi…

    Submitted 3 February, 2023; originally announced February 2023.

  36. arXiv:2301.08243  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture

    Authors: Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, Nicolas Ballas

    Abstract: This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images. The idea behind I-JEPA is simple: from a single context block, predict the representations of various target block…

    Submitted 13 April, 2023; v1 submitted 19 January, 2023; originally announced January 2023.

    Comments: 2023 IEEE/CVF International Conference on Computer Vision

  37. arXiv:2212.13350  [pdf, other]

    cs.CV

    A Generalization of ViT/MLP-Mixer to Graphs

    Authors: Xiaoxin He, Bryan Hooi, Thomas Laurent, Adam Perold, Yann LeCun, Xavier Bresson

    Abstract: Graph Neural Networks (GNNs) have shown great potential in the field of graph representation learning. Standard GNNs define a local message-passing mechanism which propagates information over the whole graph domain by stacking multiple layers. This paradigm suffers from two major limitations, over-squashing and poor long-range dependencies, that can be solved using global attention but significant…

    Submitted 30 May, 2023; v1 submitted 26 December, 2022; originally announced December 2022.

    Comments: In Proceedings of ICML 2023

  38. arXiv:2211.10831  [pdf, other]

    cs.LG

    Joint Embedding Predictive Architectures Focus on Slow Features

    Authors: Vlad Sobal, Jyothir S V, Siddhartha Jalagam, Nicolas Carion, Kyunghyun Cho, Yann LeCun

    Abstract: Many common methods for learning a world model for pixel-based environments use generative architectures trained with pixel-level reconstruction objectives. Recently proposed Joint Embedding Predictive Architectures (JEPA) offer a reconstruction-free alternative. In this work, we analyze performance of JEPA trained with VICReg and SimCLR objectives in the fully offline setting without access to re…

    Submitted 19 November, 2022; originally announced November 2022.

    Comments: 4 pages (3 figures) short paper for SSL Theory and Practice workshop at NeurIPS 2022. Code is available at https://github.com/vladisai/JEPA_SSL_NeurIPS_2022

  39. arXiv:2211.01340  [pdf, other]

    cs.LG cs.CV stat.ML

    POLICE: Provably Optimal Linear Constraint Enforcement for Deep Neural Networks

    Authors: Randall Balestriero, Yann LeCun

    Abstract: Deep Neural Networks (DNNs) outshine alternative function approximators in many settings thanks to their modularity in composing any desired differentiable operator. The formed parametrized functional is then tuned to solve a task at hand from simple gradient descent. This modularity comes at the cost of making strict enforcement of constraints on DNNs, e.g. from a priori knowledge of the task, or…

    Submitted 10 March, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

  40. arXiv:2210.16782  [pdf, other]

    cs.CV

    Unsupervised Learning of Structured Representations via Closed-Loop Transcription

    Authors: Shengbang Tong, Xili Dai, Yubei Chen, Mingyang Li, Zengyi Li, Brent Yi, Yann LeCun, Yi Ma

    Abstract: This paper proposes an unsupervised method for learning a unified representation that serves both discriminative and generative purposes. While most existing unsupervised learning approaches focus on a representation for only one of these two goals, we show that a unified representation can enjoy the mutual benefits of having both. Such a representation is attainable by generalizing the recently p…

    Submitted 30 October, 2022; originally announced October 2022.

    Comments: 17 pages

  41. arXiv:2210.08340  [pdf]

    cs.AI q-bio.NC

    Toward Next-Generation Artificial Intelligence: Catalyzing the NeuroAI Revolution

    Authors: Anthony Zador, Sean Escola, Blake Richards, Bence Ölveczky, Yoshua Bengio, Kwabena Boahen, Matthew Botvinick, Dmitri Chklovskii, Anne Churchland, Claudia Clopath, James DiCarlo, Surya Ganguli, Jeff Hawkins, Konrad Koerding, Alexei Koulakov, Yann LeCun, Timothy Lillicrap, Adam Marblestone, Bruno Olshausen, Alexandre Pouget, Cristina Savin, Terrence Sejnowski, Eero Simoncelli, Sara Solla, David Sussillo, et al. (2 additional authors not shown)

    Abstract: Neuroscience has long been an essential driver of progress in artificial intelligence (AI). We propose that to accelerate progress in AI, we must invest in fundamental research in NeuroAI. A core component of this is the embodied Turing test, which challenges AI animal models to interact with the sensorimotor world at skill levels akin to their living counterparts. The embodied Turing test shifts…

    Submitted 22 February, 2023; v1 submitted 15 October, 2022; originally announced October 2022.

    Comments: White paper, 10 pages + 8 pages of references, 1 figure

  42. arXiv:2210.04135  [pdf, other]

    cs.CV cs.LG cs.MM

    VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment

    Authors: Shraman Pramanick, Li Jing, Sayan Nag, Jiachen Zhu, Hardik Shah, Yann LeCun, Rama Chellappa

    Abstract: Vision-language pre-training (VLP) has recently proven highly effective for various uni- and multi-modal downstream applications. However, most existing end-to-end VLP methods use high-resolution image-text box data to perform well on fine-grained region-level tasks, such as object detection, segmentation, and referring expression comprehension. Unfortunately, such high-resolution images with accu…

    Submitted 29 October, 2023; v1 submitted 8 October, 2022; originally announced October 2022.

    Comments: Published in TMLR 2023

  43. arXiv:2210.02885  [pdf, other]

    cs.LG cs.AI cs.CV

    RankMe: Assessing the downstream performance of pretrained self-supervised representations by their rank

    Authors: Quentin Garrido, Randall Balestriero, Laurent Najman, Yann LeCun

    Abstract: Joint-Embedding Self Supervised Learning (JE-SSL) has seen a rapid development, with the emergence of many method variations but only few principled guidelines that would help practitioners to successfully deploy them. The main reason for that pitfall comes from JE-SSL's core principle of not employing any input reconstruction therefore lacking visual cues of unsuccessful training. Adding non info…

    Submitted 26 June, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

    Journal ref: The Fortieth International Conference on Machine Learning, 2023, Honolulu, United States

  44. arXiv:2210.01571  [pdf, other]

    cs.CV cs.AI cs.LG

    VICRegL: Self-Supervised Learning of Local Visual Features

    Authors: Adrien Bardes, Jean Ponce, Yann LeCun

    Abstract: Most recent self-supervised methods for learning image representations focus on either producing a global feature with invariance properties, or producing a set of local features. The former works best for classification tasks while the latter is best for detection and segmentation tasks. This paper explores the fundamental trade-off between learning local and global features. A new method called…

    Submitted 4 October, 2022; originally announced October 2022.

    Comments: Accepted at NeurIPS 2022

  45. arXiv:2209.15261  [pdf, other]

    cs.LG cs.CV stat.ML

    Minimalistic Unsupervised Learning with the Sparse Manifold Transform

    Authors: Yubei Chen, Zeyu Yun, Yi Ma, Bruno Olshausen, Yann LeCun

    Abstract: We describe a minimalistic and interpretable method for unsupervised learning, without resorting to data augmentation, hyperparameter tuning, or other engineering designs, that achieves performance close to the SOTA SSL methods. Our approach leverages the sparse manifold transform, which unifies sparse coding, manifold learning, and slow feature analysis. With a one-layer deterministic sparse mani…

    Submitted 27 April, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

    Comments: This paper is published at ICLR 2023

    Journal ref: The Eleventh International Conference on Learning Representations (2023)

  46. arXiv:2209.14905  [pdf, other]

    cs.LG

    Variance Covariance Regularization Enforces Pairwise Independence in Self-Supervised Representations

    Authors: Grégoire Mialon, Randall Balestriero, Yann LeCun

    Abstract: Self-Supervised Learning (SSL) methods such as VICReg, Barlow Twins or W-MSE avoid collapse of their joint embedding architectures by constraining or regularizing the covariance matrix of their projector's output. This study highlights important properties of such strategy, which we coin Variance-Covariance regularization (VCReg). More precisely, we show that VCReg combined to a MLP projector…

    Submitted 14 February, 2024; v1 submitted 29 September, 2022; originally announced September 2022.

  47. arXiv:2209.14884  [pdf, other]

    cs.LG cs.AI stat.ML

    Joint Embedding Self-Supervised Learning in the Kernel Regime

    Authors: Bobak T. Kiani, Randall Balestriero, Yubei Chen, Seth Lloyd, Yann LeCun

    Abstract: The fundamental goal of self-supervised learning (SSL) is to produce useful representations of data without access to any labels for classifying the data. Modern methods in SSL, which form representations based on known or constructed relationships between samples, have been particularly effective at this task. Here, we aim to extend this framework to incorporate algorithms based on kernel methods…

    Submitted 29 September, 2022; originally announced September 2022.

  48. arXiv:2208.12345  [pdf, other]

    cs.LG cs.AI

    Light-weight probing of unsupervised representations for Reinforcement Learning

    Authors: Wancong Zhang, Anthony GX-Chen, Vlad Sobal, Yann LeCun, Nicolas Carion

    Abstract: Unsupervised visual representation learning offers the opportunity to leverage large corpora of unlabeled trajectories to form useful visual representations, which can benefit the training of reinforcement learning (RL) algorithms. However, evaluating the fitness of such representations requires training RL algorithms which is computationally intensive and has high variance outcomes. Inspired by t…

    Submitted 31 May, 2024; v1 submitted 25 August, 2022; originally announced August 2022.

    Comments: To appear in the proceedings of the Reinforcement Learning Conference 2024

  49. arXiv:2207.10081  [pdf, other]

    cs.LG cs.AI

    What Do We Maximize in Self-Supervised Learning?

    Authors: Ravid Shwartz-Ziv, Randall Balestriero, Yann LeCun

    Abstract: In this paper, we examine self-supervised learning methods, particularly VICReg, to provide an information-theoretical understanding of their construction. As a first step, we demonstrate how information-theoretic quantities can be obtained for a deterministic network, offering a possible alternative to prior work that relies on stochastic models. This enables us to demonstrate how VICReg can be (…

    Submitted 20 July, 2022; originally announced July 2022.

  50. arXiv:2206.10698  [pdf, other]

    cs.CV cs.AI cs.LG

    TiCo: Transformation Invariance and Covariance Contrast for Self-Supervised Visual Representation Learning

    Authors: Jiachen Zhu, Rafael M. Moraes, Serkan Karakulak, Vlad Sobol, Alfredo Canziani, Yann LeCun

    Abstract: We present Transformation Invariance and Covariance Contrast (TiCo) for self-supervised visual representation learning. Similar to other recent self-supervised learning methods, our method is based on maximizing the agreement among embeddings of different distorted versions of the same image, which pushes the encoder to produce transformation invariant representations. To avoid the trivial solutio…

    Submitted 23 June, 2022; v1 submitted 21 June, 2022; originally announced June 2022.