Showing 1–19 of 19 results for author: Hénaff, O

  1. arXiv:2406.17711  [pdf, other]

    cs.LG cs.AI

    Data curation via joint example selection further accelerates multimodal learning

    Authors: Talfan Evans, Nikhil Parthasarathy, Hamza Merzic, Olivier J. Henaff

    Abstract: Data curation is an essential component of large-scale pretraining. In this work, we demonstrate that jointly selecting batches of data is more effective for learning than selecting examples independently. Multimodal contrastive objectives expose the dependencies between data and thus naturally yield criteria for measuring the joint learnability of a batch. We derive a simple and tractable algorit…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Main text: 9 pages, 5 figures, 3 tables, 1 algorithm. Appendix: 7 pages, 5 figures, 1 table, 2 algorithms

  2. arXiv:2406.09384  [pdf, other]

    cs.LG cs.CV

    Reflecting on the State of Rehearsal-free Continual Learning with Pretrained Models

    Authors: Lukas Thede, Karsten Roth, Olivier J. Hénaff, Matthias Bethge, Zeynep Akata

    Abstract: With the advent and recent ubiquity of foundation models, continual learning (CL) has recently shifted from continual training from scratch to the continual adaptation of pretrained models, seeing particular success on rehearsal-free CL benchmarks (RFCL). To achieve this, most proposed methods adapt and restructure parameter-efficient finetuning techniques (PEFT) to suit the continual nature of th…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 3rd Conference on Lifelong Learning Agents (CoLLAs) 2024

  3. arXiv:2402.05861  [pdf, other]

    cs.CV

    Memory Consolidation Enables Long-Context Video Understanding

    Authors: Ivana Balažević, Yuge Shi, Pinelopi Papalampidi, Rahma Chaabouni, Skanda Koppula, Olivier J. Hénaff

    Abstract: Most transformer-based video encoders are limited to short temporal contexts due to their quadratic complexity. While various attempts have been made to extend this context, this has often come at the cost of both conceptual and computational complexity. We propose to instead re-purpose existing pre-trained video transformers by simply fine-tuning them to attend to memories derived non-parametrica…

    Submitted 31 May, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

  4. arXiv:2312.11436  [pdf, other]

    q-bio.NC cs.CV cs.LG

    Layerwise complexity-matched learning yields an improved model of cortical area V2

    Authors: Nikhil Parthasarathy, Olivier J. Hénaff, Eero P. Simoncelli

    Abstract: Human ability to recognize complex visual patterns arises through transformations performed by successive areas in the ventral visual cortex. Deep neural networks trained end-to-end for object recognition approach human capabilities, and offer the best descriptions to date of neural responses in the late stages of the hierarchy. But these networks provide a poor account of the early stages, compar…

    Submitted 3 March, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: 28 pages, 12 figures

  5. arXiv:2312.05328  [pdf, other]

    cs.AI

    Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding

    Authors: Talfan Evans, Shreya Pathak, Hamza Merzic, Jonathan Schwarz, Ryutaro Tanno, Olivier J. Henaff

    Abstract: Power-law scaling indicates that large-scale training with uniform sampling is prohibitively slow. Active learning methods aim to increase data efficiency by prioritizing learning on the most relevant examples. Despite their appeal, these methods have yet to be widely adopted since no one algorithm has been shown to a) generalize across models and tasks b) scale to large datasets and c) yield over…

    Submitted 14 February, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

    Comments: Technical report

  6. arXiv:2310.17653  [pdf, other]

    cs.LG cs.CV

    Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model

    Authors: Karsten Roth, Lukas Thede, Almut Sophia Koepke, Oriol Vinyals, Olivier Hénaff, Zeynep Akata

    Abstract: Training deep networks requires various design decisions regarding for instance their architecture, data augmentation, or optimization. In this work, we find these training variations to result in networks learning unique feature sets from the data. Using public model libraries comprising thousands of models trained on canonical datasets like ImageNet, we observe that for arbitrary pairings of pre…

    Submitted 26 February, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: ICLR 2024 (spotlight)

  7. arXiv:2306.01667  [pdf, other]

    cs.CV

    Towards In-context Scene Understanding

    Authors: Ivana Balažević, David Steiner, Nikhil Parthasarathy, Relja Arandjelović, Olivier J. Hénaff

    Abstract: In-context learning – the ability to configure a model's behavior with different prompts – has revolutionized the field of natural language processing, alleviating the need for task-specific models and paving the way for generalist models capable of assisting with any query. Computer vision, in contrast, has largely stayed in the former regime: specialized decoders and…

    Submitted 31 October, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

  8. arXiv:2303.13518  [pdf, other]

    cs.CV cs.AI cs.LG

    Three ways to improve feature alignment for open vocabulary detection

    Authors: Relja Arandjelović, Alex Andonian, Arthur Mensch, Olivier J. Hénaff, Jean-Baptiste Alayrac, Andrew Zisserman

    Abstract: The core problem in zero-shot open vocabulary detection is how to align visual and text features, so that the detector performs well on unseen classes. Previous approaches train the feature pyramid and detection head from scratch, which breaks the vision-text feature alignment established during pretraining, and struggles to prevent the language model from forgetting unseen classes. We propose t…

    Submitted 23 March, 2023; originally announced March 2023.

  9. arXiv:2210.06433  [pdf, other]

    cs.CV cs.AI cs.LG

    Self-supervised video pretraining yields human-aligned visual representations

    Authors: Nikhil Parthasarathy, S. M. Ali Eslami, João Carreira, Olivier J. Hénaff

    Abstract: Humans learn powerful representations of objects and scenes by observing how they evolve over time. Yet, outside of specific tasks that require explicit temporal understanding, static image pretraining remains the dominant paradigm for learning visual foundation models. We question this mismatch, and ask whether video pretraining can yield visual representations that bear the hallmarks of human pe…

    Submitted 25 July, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: Technical report

  10. arXiv:2209.15589  [pdf, other]

    cs.CV cs.LG

    Where Should I Spend My FLOPS? Efficiency Evaluations of Visual Pre-training Methods

    Authors: Skanda Koppula, Yazhe Li, Evan Shelhamer, Andrew Jaegle, Nikhil Parthasarathy, Relja Arandjelovic, João Carreira, Olivier Hénaff

    Abstract: Self-supervised methods have achieved remarkable success in transfer learning, often achieving the same or better accuracy than supervised pre-training. Most prior work has done so by increasing pre-training computation by adding complex data augmentation, multiple views, or lengthy training schedules. In this work, we investigate a related, but orthogonal question: given a fixed FLOP budget, what…

    Submitted 18 October, 2022; v1 submitted 30 September, 2022; originally announced September 2022.

    Comments: 11 pages. 36th Conference on Neural Information Processing Systems, Workshop on Self-Supervised Learning (2022)

  11. arXiv:2203.08777  [pdf, other]

    cs.CV cs.AI cs.LG

    Object discovery and representation networks

    Authors: Olivier J. Hénaff, Skanda Koppula, Evan Shelhamer, Daniel Zoran, Andrew Jaegle, Andrew Zisserman, João Carreira, Relja Arandjelović

    Abstract: The promise of self-supervised learning (SSL) is to leverage large amounts of unlabeled data to solve complex tasks. While there has been excellent progress with simple, image-level learning, recent methods have shown the advantage of including knowledge of image structure. However, by introducing hand-crafted image segmentations to define regions of interest, or specialized augmentation strategie…

    Submitted 27 July, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

    Comments: European Conference on Computer Vision (ECCV) 2022

  12. arXiv:2202.10890  [pdf, other]

    cs.CV

    HiP: Hierarchical Perceiver

    Authors: Joao Carreira, Skanda Koppula, Daniel Zoran, Adria Recasens, Catalin Ionescu, Olivier Henaff, Evan Shelhamer, Relja Arandjelovic, Matt Botvinick, Oriol Vinyals, Karen Simonyan, Andrew Zisserman, Andrew Jaegle

    Abstract: General perception systems such as Perceivers can process arbitrary modalities in any combination and are able to handle up to a few hundred thousand inputs. They achieve this generality by using exclusively global attention operations. This however hinders them from scaling up to the input sizes required to process raw high-resolution images or video. In this paper, we show that some degree of l…

    Submitted 3 November, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

  13. arXiv:2107.14795  [pdf, other]

    cs.LG cs.CL cs.CV cs.SD eess.AS

    Perceiver IO: A General Architecture for Structured Inputs & Outputs

    Authors: Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira

    Abstract: A central goal of machine learning is the development of systems that can solve many problems in as many data domains as possible. Current architectures, however, cannot be applied beyond a small set of stereotyped settings, as they bake in domain & task assumptions or scale poorly to large inputs or outputs. In this work, we propose Perceiver IO, a general-purpose architecture that handles data f…

    Submitted 15 March, 2022; v1 submitted 30 July, 2021; originally announced July 2021.

    Comments: ICLR 2022 camera ready. Code: https://dpmd.ai/perceiver-code

  14. arXiv:2105.08054  [pdf, other]

    cs.CV

    Divide and Contrast: Self-supervised Learning from Uncurated Data

    Authors: Yonglong Tian, Olivier J. Henaff, Aaron van den Oord

    Abstract: Self-supervised learning holds promise in leveraging large amounts of unlabeled data, however much of its progress has thus far been limited to highly curated pre-training data such as ImageNet. We explore the effects of contrastive learning from larger, less-curated image datasets such as YFCC, and find there is indeed a large difference in the resulting representation quality. We hypothesize tha…

    Submitted 17 May, 2021; originally announced May 2021.

  15. arXiv:2103.10957  [pdf, other]

    cs.CV

    Efficient Visual Pretraining with Contrastive Detection

    Authors: Olivier J. Hénaff, Skanda Koppula, Jean-Baptiste Alayrac, Aaron van den Oord, Oriol Vinyals, João Carreira

    Abstract: Self-supervised pretraining has been shown to yield powerful representations for transfer learning. These performance gains come at a large computational cost however, with state-of-the-art methods requiring an order of magnitude more computation than supervised pretraining. We tackle this computational bottleneck by introducing a new self-supervised objective, contrastive detection, which tasks r…

    Submitted 5 August, 2021; v1 submitted 19 March, 2021; originally announced March 2021.

    Comments: Technical report

  16. arXiv:2006.07159  [pdf, other]

    cs.CV cs.LG

    Are we done with ImageNet?

    Authors: Lucas Beyer, Olivier J. Hénaff, Alexander Kolesnikov, Xiaohua Zhai, Aäron van den Oord

    Abstract: Yes, and no. We ask whether recent progress on the ImageNet classification benchmark continues to represent meaningful generalization, or whether the community has started to overfit to the idiosyncrasies of its labeling procedure. We therefore develop a significantly more robust procedure for collecting human annotations of the ImageNet validation set. Using these new labels, we reassess the accu…

    Submitted 12 June, 2020; originally announced June 2020.

    Comments: All five authors contributed equally. New labels at https://github.com/google-research/reassessed-imagenet

  17. arXiv:1905.09272  [pdf, other]

    cs.CV cs.LG

    Data-Efficient Image Recognition with Contrastive Predictive Coding

    Authors: Olivier J. Hénaff, Aravind Srinivas, Jeffrey De Fauw, Ali Razavi, Carl Doersch, S. M. Ali Eslami, Aaron van den Oord

    Abstract: Human observers can learn to recognize new categories of images from a handful of examples, yet doing so with artificial ones remains an open challenge. We hypothesize that data-efficient recognition is enabled by representations which make the variability in natural signals more predictable. We therefore revisit and improve Contrastive Predictive Coding, an unsupervised objective for learning suc…

    Submitted 1 July, 2020; v1 submitted 22 May, 2019; originally announced May 2019.

  18. arXiv:1511.06394  [pdf, other]

    cs.CV cs.LG

    Geodesics of learned representations

    Authors: Olivier J. Hénaff, Eero P. Simoncelli

    Abstract: We develop a new method for visualizing and refining the invariances of learned representations. Specifically, we test for a general form of invariance, linearization, in which the action of a transformation is confined to a low-dimensional subspace. Given two reference images (typically, differing by some transformation), we synthesize a sequence of images lying on a path between them that is of…

    Submitted 22 February, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: Published as a conference paper at ICLR 2016

    Journal ref: Presented at: Int'l Conf on Learning Representations (ICLR), San Juan, Puerto Rico, May 2016

  19. arXiv:1412.6626  [pdf, other]

    cs.CV

    The local low-dimensionality of natural images

    Authors: Olivier J. Hénaff, Johannes Ballé, Neil C. Rabinowitz, Eero P. Simoncelli

    Abstract: We develop a new statistical model for photographic images, in which the local responses of a bank of linear filters are described as jointly Gaussian, with zero mean and a covariance that varies slowly over spatial position. We optimize sets of filters so as to minimize the nuclear norms of matrices of their local activations (i.e., the sum of the singular values), thus encouraging a flexible for…

    Submitted 23 March, 2015; v1 submitted 20 December, 2014; originally announced December 2014.

    Comments: Published as conference paper at ICLR 2015