Multimodal CLIP Inference for Meta-Few-Shot Image Classification
Authors:
Constance Ferragu,
Philomene Chagniot,
Vincent Coyette
Abstract:
In recent literature, few-shot classification has predominantly been defined by the N-way k-shot meta-learning problem. Models designed for this purpose are usually trained to excel on standard benchmarks following a restricted setup, excluding the use of external data. Given the recent advancements in large language and vision models, a question naturally arises: can these models directly perform well on meta-few-shot learning benchmarks? Multimodal foundation models like CLIP, which learn a joint (image, text) embedding, are of particular interest. Indeed, multimodal training has proven to enhance model robustness, especially regarding ambiguities, a limitation frequently observed in the few-shot setup. This study demonstrates that combining modalities from CLIP's text and image encoders outperforms state-of-the-art meta-few-shot learners on widely adopted benchmarks, all without additional training. Our results confirm the potential and robustness of multimodal foundation models like CLIP and serve as a baseline for existing and future approaches leveraging such models.
Submitted 26 March, 2024;
originally announced May 2024.
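Illustrative sketch for the entry above: the abstract says that combining CLIP's text and image encoders works for N-way k-shot classification without additional training, but it does not say how the two modalities are combined. The code below assumes one simple possibility: average the k support-image embeddings per class into a prototype, blend it with the class-name text embedding, and pick the class with the highest cosine similarity to the query. The function names, the mixing weight `alpha`, and the toy 3-dimensional vectors are all hypothetical stand-ins; real embeddings would come from CLIP's encoders.

```python
import math


def normalize(v):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def classify(query, support_by_class, text_by_class, alpha=0.5):
    """Return the label whose blended prototype is closest to the query.

    alpha is a hypothetical mixing weight: 1.0 uses only the image
    prototype, 0.0 uses only the text embedding.
    """
    q = normalize(query)
    best_label, best_score = None, -math.inf
    for label, shots in support_by_class.items():
        # Image prototype: mean of the normalized support embeddings.
        image_proto = normalize(mean([normalize(s) for s in shots]))
        text_proto = normalize(text_by_class[label])
        # Assumed combination rule: convex blend of the two modalities.
        proto = normalize(
            [alpha * i + (1 - alpha) * t for i, t in zip(image_proto, text_proto)]
        )
        score = dot(q, proto)  # cosine similarity, both sides unit length
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy 2-way 2-shot episode with made-up embeddings (not CLIP output).
support = {
    "cat": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "dog": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
}
text = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}
prediction = classify([0.8, 0.3, 0.0], support, text)
```

No parameters are updated anywhere in this sketch, which matches the abstract's "without additional training" setting; whether the authors use this blend or another rule is not stated in the listing.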
Jumanji: a Diverse Suite of Scalable Reinforcement Learning Environments in JAX
Authors:
Clément Bonnet,
Daniel Luo,
Donal Byrne,
Shikha Surana,
Sasha Abramowitz,
Paul Duckworth,
Vincent Coyette,
Laurence I. Midgley,
Elshadai Tegegn,
Tristan Kalloniatis,
Omayma Mahjoub,
Matthew Macfarlane,
Andries P. Smit,
Nathan Grinsztajn,
Raphael Boige,
Cemlyn N. Waters,
Mohamed A. Mimouni,
Ulrich A. Mbou Sob,
Ruan de Kock,
Siddarth Singh,
Daniel Furelos-Blanco,
Victor Le,
Arnu Pretorius,
Alexandre Laterre
Abstract:
Open-source reinforcement learning (RL) environments have played a crucial role in driving progress in the development of AI algorithms. In modern RL research, there is a need for simulated environments that are performant, scalable, and modular to enable their utilization in a wider range of potential real-world applications. Therefore, we present Jumanji, a suite of diverse RL environments specifically designed to be fast, flexible, and scalable. Jumanji provides a suite of environments focusing on combinatorial problems frequently encountered in industry, as well as challenging general decision-making tasks. By leveraging the efficiency of JAX and hardware accelerators like GPUs and TPUs, Jumanji enables rapid iteration of research ideas and large-scale experimentation, ultimately empowering more capable agents. Unlike existing RL environment suites, Jumanji is highly customizable, allowing users to tailor the initial state distribution and problem complexity to their needs. Furthermore, we provide actor-critic baselines for each environment, accompanied by preliminary findings on scaling and generalization scenarios. Jumanji aims to set a new standard for speed, adaptability, and scalability of RL environments.
Submitted 15 March, 2024; v1 submitted 16 June, 2023;
originally announced June 2023.