Showing 1–12 of 12 results for author: Dancette, C

  1. arXiv:2405.14654

    cs.CL cs.AI

    Efficient Medical Question Answering with Knowledge-Augmented Question Generation

    Authors: Julien Khlaut, Corentin Dancette, Elodie Ferreres, Alaedine Bennani, Paul Hérent, Pierre Manceron

    Abstract: In the expanding field of language model applications, medical knowledge representation remains a significant challenge due to the specialized nature of the domain. Large language models, such as GPT-4, obtain reasonable scores on medical question answering tasks, but smaller models are far behind. In this work, we introduce a method to improve the proficiency of a small language model in the medi…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Accepted at the Clinical Natural Language Processing Workshop, NAACL 2024

  2. arXiv:2310.00647

    cs.CV cs.MM

    Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning

    Authors: Mustafa Shukor, Alexandre Rame, Corentin Dancette, Matthieu Cord

    Abstract: Following the success of Large Language Models (LLMs), Large Multimodal Models (LMMs), such as the Flamingo model and its subsequent competitors, have started to emerge as natural steps towards generalist agents. However, interacting with recent LMMs reveals major limitations that are hardly captured by the current evaluation benchmarks. Indeed, task performances (e.g., VQA accuracy) alone do not…

    Submitted 22 January, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

    Comments: ICLR 2024. Project Page: https://evalign-icl.github.io/

  3. arXiv:2307.16184

    cs.CV cs.LG cs.MM cs.SD eess.AS

    UnIVAL: Unified Model for Image, Video, Audio and Language Tasks

    Authors: Mustafa Shukor, Corentin Dancette, Alexandre Rame, Matthieu Cord

    Abstract: Large Language Models (LLMs) have made the ambitious quest for generalist agents significantly far from being a fantasy. A key hurdle for building such general models is the diversity and heterogeneity of tasks and modalities. A promising solution is unification, allowing the support of a myriad of tasks and modalities within one unified framework. While few large models (e.g., Flamingo (Alayrac e…

    Submitted 22 December, 2023; v1 submitted 30 July, 2023; originally announced July 2023.

    Comments: Accepted at TMLR 2023. 40 pages. Project page: https://unival-model.github.io/

  4. arXiv:2306.08751

    cs.CV

    Improving Selective Visual Question Answering by Learning from Your Peers

    Authors: Corentin Dancette, Spencer Whitehead, Rishabh Maheshwary, Ramakrishna Vedantam, Stefan Scherer, Xinlei Chen, Matthieu Cord, Marcus Rohrbach

    Abstract: Despite advances in Visual Question Answering (VQA), the ability of models to assess their own correctness remains underexplored. Recent work has shown that VQA models, out-of-the-box, can have difficulties abstaining from answering when they are wrong. The option to abstain, also called Selective Prediction, is highly relevant when deploying systems to users who must trust the system's output (e.…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: CVPR 2023. Code available here: https://github.com/facebookresearch/selective-vqa_ood

  5. arXiv:2306.04488

    cs.LG cs.AI cs.CV

    Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards

    Authors: Alexandre Ramé, Guillaume Couairon, Mustafa Shukor, Corentin Dancette, Jean-Baptiste Gaya, Laure Soulier, Matthieu Cord

    Abstract: Foundation models are first pre-trained on vast unsupervised datasets and then fine-tuned on labeled data. Reinforcement learning, notably from human feedback (RLHF), can further align the network with the intended usage. Yet the imperfections in the proxy reward may hinder the training and lead to suboptimal results; the diversity of objectives in real-world tasks and human opinions exacerbate th…

    Submitted 16 October, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

  6. arXiv:2303.11403

    cs.CV cs.CL cs.LG

    eP-ALM: Efficient Perceptual Augmentation of Language Models

    Authors: Mustafa Shukor, Corentin Dancette, Matthieu Cord

    Abstract: Large Language Models (LLMs) have so far impressed the world, with unprecedented capabilities that emerge in models at large scales. On the vision side, transformer models (i.e., ViT) are following the same trend, achieving the best performance on challenging benchmarks. With the abundance of such unimodal models, a natural question arises; do we need also to follow this trend to tackle multimodal…

    Submitted 27 October, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: Accepted at ICCV 2023. Project page: https://mshukor.github.io/eP-ALM.github.io/

  7. arXiv:2205.10873

    cs.CV

    Dynamic Query Selection for Fast Visual Perceiver

    Authors: Corentin Dancette, Matthieu Cord

    Abstract: Transformers have been matching deep convolutional networks for vision architectures in recent works. Most work is focused on getting the best results on large-scale benchmarks, and scaling laws seem to be the most successful strategy: bigger models, more data, and longer training result in higher performance. However, the reduction of network complexity and inference time remains under-explored.…

    Submitted 21 March, 2023; v1 submitted 22 May, 2022; originally announced May 2022.

    Comments: Accepted at the Transformer for Vision workshop, CVPR 2022

  8. arXiv:2109.02934

    cs.LG cs.AI cs.CV

    Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization

    Authors: Alexandre Rame, Corentin Dancette, Matthieu Cord

    Abstract: Learning robust models that generalize well under changes in the data distribution is critical for real-world applications. To this end, there has been a growing surge of interest to learn simultaneously from multiple training domains - while enforcing different types of invariance across those domains. Yet, all existing approaches fail to show systematic benefits under controlled evaluation proto…

    Submitted 1 June, 2022; v1 submitted 7 September, 2021; originally announced September 2021.

    Comments: 31 pages, 14 tables, 7 figures

    Journal ref: ICML 2022

  9. arXiv:2104.03149

    cs.CV cs.AI

    Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering

    Authors: Corentin Dancette, Remi Cadene, Damien Teney, Matthieu Cord

    Abstract: We introduce an evaluation methodology for visual question answering (VQA) to better diagnose cases of shortcut learning. These cases happen when a model exploits spurious statistical regularities to produce correct answers but does not actually deploy the desired behavior. There is a need to identify possible shortcuts in a dataset and assess their use before deploying a model in the real world.…

    Submitted 1 September, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

    Comments: Accepted at ICCV 2021. Code is available at https://github.com/cdancette/detect-shortcuts

  10. arXiv:2006.10079

    cs.CV cs.CL cs.LG eess.IV

    Overcoming Statistical Shortcuts for Open-ended Visual Counting

    Authors: Corentin Dancette, Remi Cadene, Xinlei Chen, Matthieu Cord

    Abstract: Machine learning models tend to over-rely on statistical shortcuts. These spurious correlations between parts of the input and the output labels do not hold in real-world settings. We target this issue on the recent open-ended visual counting task which is well suited to study statistical shortcuts. We aim to develop models that learn a proper mechanism of counting regardless of the output label…

    Submitted 1 July, 2020; v1 submitted 17 June, 2020; originally announced June 2020.

    Comments: 17 pages, 8 figures

  11. arXiv:1906.10169

    cs.CV cs.CL cs.LG

    RUBi: Reducing Unimodal Biases in Visual Question Answering

    Authors: Remi Cadene, Corentin Dancette, Hedi Ben-younes, Matthieu Cord, Devi Parikh

    Abstract: Visual Question Answering (VQA) is the task of answering questions about an image. Some VQA models often exploit unimodal biases to provide the correct answer without using the image information. As a result, they suffer from a huge drop in performance when evaluated on data outside their training set distribution. This critical issue makes them unsuitable for real-world settings. We propose RUB…

    Submitted 23 March, 2020; v1 submitted 24 June, 2019; originally announced June 2019.

    Comments: NeurIPS 2019 http://papers.nips.cc/paper/8371-rubi-reducing-unimodal-biases-for-visual-question-answering

    Journal ref: Advances in Neural Information Processing Systems 2019 (pp. 839-850)

  12. arXiv:1804.11297

    cs.CL cs.LG

    Sampling strategies in Siamese Networks for unsupervised speech representation learning

    Authors: Rachid Riad, Corentin Dancette, Julien Karadayi, Neil Zeghidour, Thomas Schatz, Emmanuel Dupoux

    Abstract: Recent studies have investigated siamese network architectures for learning invariant speech representations using same-different side information at the word level. Here we investigate systematically an often ignored component of siamese networks: the sampling procedure (how pairs of same vs. different tokens are selected). We show that sampling strategies taking into account Zipf's Law, the dist…

    Submitted 23 August, 2018; v1 submitted 30 April, 2018; originally announced April 2018.

    Comments: Conference paper at Interspeech 2018