Showing 1–16 of 16 results for author: Ramé, A

Searching in archive cs.
  1. arXiv:2406.16768

    cs.LG cs.AI

    WARP: On the Benefits of Weight Averaged Rewarded Policies

    Authors: Alexandre Ramé, Johan Ferret, Nino Vieillard, Robert Dadashi, Léonard Hussenot, Pierre-Louis Cedoz, Pier Giuseppe Sessa, Sertan Girgin, Arthur Douillard, Olivier Bachem

    Abstract: Reinforcement learning from human feedback (RLHF) aligns large language models (LLMs) by encouraging their generations to have high rewards, using a reward model trained on human preferences. To prevent the forgetting of pre-trained knowledge, RLHF usually incorporates a KL regularization; this forces the policy to remain close to its supervised fine-tuned initialization, though it hinders the rew…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 11 main pages (34 pages with Appendix)

  2. arXiv:2402.04792

    cs.AI cs.CL cs.HC

    Direct Language Model Alignment from Online AI Feedback

    Authors: Shangmin Guo, Biao Zhang, Tianlin Liu, Tianqi Liu, Misha Khalman, Felipe Llinares, Alexandre Rame, Thomas Mesnard, Yao Zhao, Bilal Piot, Johan Ferret, Mathieu Blondel

    Abstract: Direct alignment from preferences (DAP) methods, such as DPO, have recently emerged as efficient alternatives to reinforcement learning from human feedback (RLHF), that do not require a separate reward model. However, the preference datasets used in DAP methods are usually collected ahead of training and never updated, thus the feedback is purely offline. Moreover, responses in these datasets are…

    Submitted 29 February, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: 18 pages, 9 figures, 4 tables

  3. arXiv:2401.12187

    cs.LG cs.AI cs.CL

    WARM: On the Benefits of Weight Averaged Reward Models

    Authors: Alexandre Ramé, Nino Vieillard, Léonard Hussenot, Robert Dadashi, Geoffrey Cideron, Olivier Bachem, Johan Ferret

    Abstract: Aligning large language models (LLMs) with human preferences through reinforcement learning (RLHF) can lead to reward hacking, where LLMs exploit failures in the reward model (RM) to achieve seemingly high rewards without meeting the underlying objectives. We identify two primary challenges when designing RMs to mitigate reward hacking: distribution shifts during the RL process and inconsistencies…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 14 pages, 9 figures

  4. arXiv:2310.00647

    cs.CV cs.MM

    Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning

    Authors: Mustafa Shukor, Alexandre Rame, Corentin Dancette, Matthieu Cord

    Abstract: Following the success of Large Language Models (LLMs), Large Multimodal Models (LMMs), such as the Flamingo model and its subsequent competitors, have started to emerge as natural steps towards generalist agents. However, interacting with recent LMMs reveals major limitations that are hardly captured by the current evaluation benchmarks. Indeed, task performances (e.g., VQA accuracy) alone do not…

    Submitted 22 January, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

    Comments: ICLR 2024. Project Page: https://evalign-icl.github.io/

  5. arXiv:2307.16184

    cs.CV cs.LG cs.MM cs.SD eess.AS

    UnIVAL: Unified Model for Image, Video, Audio and Language Tasks

    Authors: Mustafa Shukor, Corentin Dancette, Alexandre Rame, Matthieu Cord

    Abstract: Large Language Models (LLMs) have made the ambitious quest for generalist agents significantly far from being a fantasy. A key hurdle for building such general models is the diversity and heterogeneity of tasks and modalities. A promising solution is unification, allowing the support of a myriad of tasks and modalities within one unified framework. While few large models (e.g., Flamingo (Alayrac e…

    Submitted 22 December, 2023; v1 submitted 30 July, 2023; originally announced July 2023.

    Comments: Accepted at TMLR 2023. 40 pages. Project page: https://unival-model.github.io/

  6. arXiv:2306.04488

    cs.LG cs.AI cs.CV

    Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards

    Authors: Alexandre Ramé, Guillaume Couairon, Mustafa Shukor, Corentin Dancette, Jean-Baptiste Gaya, Laure Soulier, Matthieu Cord

    Abstract: Foundation models are first pre-trained on vast unsupervised datasets and then fine-tuned on labeled data. Reinforcement learning, notably from human feedback (RLHF), can further align the network with the intended usage. Yet the imperfections in the proxy reward may hinder the training and lead to suboptimal results; the diversity of objectives in real-world tasks and human opinions exacerbate th…

    Submitted 16 October, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

  7. arXiv:2212.10445

    cs.LG cs.AI cs.CV

    Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization

    Authors: Alexandre Ramé, Kartik Ahuja, Jianyu Zhang, Matthieu Cord, Léon Bottou, David Lopez-Paz

    Abstract: Foundation models are redefining how AI systems are built. Practitioners now follow a standard procedure to build their machine learning solutions: from a pre-trained foundation model, they fine-tune the weights on the target task of interest. So, the Internet is swarmed by a handful of foundation models fine-tuned on many diverse tasks: these individual fine-tunings exist in isolation without ben…

    Submitted 9 August, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: 24 pages, 10 tables, 21 figures

  8. arXiv:2205.10139

    cs.LG

    Towards efficient feature sharing in MIMO architectures

    Authors: Rémy Sun, Alexandre Ramé, Clément Masson, Nicolas Thome, Matthieu Cord

    Abstract: Multi-input multi-output architectures propose to train multiple subnetworks within one base network and then average the subnetwork predictions to benefit from ensembling for free. Despite some relative success, these architectures are wasteful in their use of parameters. Indeed, we highlight in this paper that the learned subnetworks fail to share even generic features, which limits their applicab…

    Submitted 20 May, 2022; originally announced May 2022.

    Comments: 7 pages, 6 figures, 1 table

  9. arXiv:2205.09739

    cs.CV cs.AI cs.LG

    Diverse Weight Averaging for Out-of-Distribution Generalization

    Authors: Alexandre Ramé, Matthieu Kirchmeyer, Thibaud Rahier, Alain Rakotomamonjy, Patrick Gallinari, Matthieu Cord

    Abstract: Standard neural networks struggle to generalize under distribution shifts in computer vision. Fortunately, combining multiple networks can consistently improve out-of-distribution generalization. In particular, weight averaging (WA) strategies were shown to perform best on the competitive DomainBed benchmark; they directly average the weights of multiple networks despite their nonlinearities. In t…

    Submitted 27 January, 2023; v1 submitted 19 May, 2022; originally announced May 2022.

    Comments: 36 pages, 16 figures, 15 tables

  10. arXiv:2111.11326

    cs.CV cs.LG

    DyTox: Transformers for Continual Learning with DYnamic TOken eXpansion

    Authors: Arthur Douillard, Alexandre Ramé, Guillaume Couairon, Matthieu Cord

    Abstract: Deep network architectures struggle to continually learn new tasks without forgetting the previous tasks. A recent trend indicates that dynamic architectures based on an expansion of the parameters can reduce catastrophic forgetting efficiently in continual learning. However, existing approaches often require a task identifier at test-time, need complex tuning to balance the growing number of para…

    Submitted 7 August, 2022; v1 submitted 22 November, 2021; originally announced November 2021.

    Comments: CVPR 2022, Code at https://github.com/arthurdouillard/dytox

  11. arXiv:2109.02934

    cs.LG cs.AI cs.CV

    Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization

    Authors: Alexandre Rame, Corentin Dancette, Matthieu Cord

    Abstract: Learning robust models that generalize well under changes in the data distribution is critical for real-world applications. To this end, there has been a growing surge of interest to learn simultaneously from multiple training domains - while enforcing different types of invariance across those domains. Yet, all existing approaches fail to show systematic benefits under controlled evaluation proto…

    Submitted 1 June, 2022; v1 submitted 7 September, 2021; originally announced September 2021.

    Comments: 31 pages, 14 tables, 7 figures

    Journal ref: ICML 2022

  12. arXiv:2103.06132

    cs.LG cs.AI cs.CV

    MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks

    Authors: Alexandre Rame, Remy Sun, Matthieu Cord

    Abstract: Recent strategies achieved ensembling "for free" by fitting concurrently diverse subnetworks inside a single base network. The main idea during training is that each subnetwork learns to classify only one of the multiple inputs simultaneously provided. However, the question of how to best mix these multiple inputs has not been studied so far. In this paper, we introduce MixMo, a new generalized fr…

    Submitted 24 August, 2021; v1 submitted 10 March, 2021; originally announced March 2021.

    Comments: 8 pages, 10 figures, 6 tables

  13. arXiv:2101.05544

    cs.LG cs.CV cs.IT

    DICE: Diversity in Deep Ensembles via Conditional Redundancy Adversarial Estimation

    Authors: Alexandre Rame, Matthieu Cord

    Abstract: Deep ensembles perform better than a single network thanks to the diversity among their members. Recent approaches regularize predictions to increase diversity; however, they also drastically decrease individual members' performances. In this paper, we argue that learning strategies for deep ensembles need to tackle the trade-off between ensemble diversity and individual accuracies. Motivated by a…

    Submitted 14 January, 2021; originally announced January 2021.

    Comments: Published as a conference paper at ICLR 2021. 9 main pages, 13 figures, 12 tables

  14. arXiv:2010.02849

    cs.CV cs.LG

    CoRe: Color Regression for Multicolor Fashion Garments

    Authors: Alexandre Rame, Arthur Douillard, Charles Ollion

    Abstract: Developing deep networks that analyze fashion garments has many real-world applications. Among all fashion attributes, color is one of the most important yet challenging to detect. Existing approaches are classification-based and thus cannot go beyond the list of discrete predefined color names. In this paper, we handle color detection as a regression problem to predict the exact RGB values. That'…

    Submitted 31 May, 2022; v1 submitted 6 October, 2020; originally announced October 2020.

    Comments: 6 pages, 3 figures, 1 table

    Journal ref: CVPR 2022, Workshop on Computer Vision for Fashion, Art, and Design

  15. arXiv:1812.02611

    cs.CV

    OMNIA Faster R-CNN: Detection in the wild through dataset merging and soft distillation

    Authors: Alexandre Rame, Emilien Garreau, Hedi Ben-Younes, Charles Ollion

    Abstract: Object detectors tend to perform poorly in new or open domains, and require exhaustive yet costly annotations from fully labeled datasets. We aim at benefiting from several datasets with different categories but without additional labelling, not only to increase the number of categories detected, but also to take advantage of transfer learning and to enhance domain independence. Our dataset me…

    Submitted 25 March, 2019; v1 submitted 6 December, 2018; originally announced December 2018.

    Comments: 9 pages, 5 figures, 4 tables

  16. Leveraging Weakly Annotated Data for Fashion Image Retrieval and Label Prediction

    Authors: Charles Corbière, Hedi Ben-Younes, Alexandre Ramé, Charles Ollion

    Abstract: In this paper, we present a method to learn a visual representation adapted for e-commerce products. Based on weakly supervised learning, our model learns from noisy datasets crawled from e-commerce website catalogs and does not require any manual labeling. We show that our representation can be used for downstream classification tasks over clothing categories with different levels of granularity. We…

    Submitted 27 September, 2017; originally announced September 2017.

    Journal ref: 2017 IEEE International Conference on Computer Vision Workshop (ICCVW)