Showing 1–11 of 11 results for author: Antipov, G

  1. arXiv:2202.06858  [pdf, other]

    cs.CV

    An experimental study of the vision-bottleneck in VQA

    Authors: Pierre Marza, Corentin Kervadec, Grigory Antipov, Moez Baccouche, Christian Wolf

    Abstract: As in many tasks combining vision and language, both modalities play a crucial role in Visual Question Answering (VQA). To properly solve the task, a given model should both understand the content of the proposed image and the nature of the question. While the fusion between modalities, which is another obviously important part of the problem, has been highly studied, the vision part has received…

    Submitted 14 February, 2022; originally announced February 2022.

  2. arXiv:2112.12572  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Are E2E ASR models ready for an industrial usage?

    Authors: Valentin Vielzeuf, Grigory Antipov

    Abstract: The Automated Speech Recognition (ASR) community experiences a major turning point with the rise of the fully-neural (End-to-End, E2E) approaches. At the same time, the conventional hybrid model remains the standard choice for the practical usage of ASR. According to previous studies, the adoption of E2E ASR in real-world applications was hindered by two main limitations: their ability to generali…

    Submitted 21 October, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

  3. arXiv:2106.05597  [pdf, other]

    cs.CV cs.LG

    Supervising the Transfer of Reasoning Patterns in VQA

    Authors: Corentin Kervadec, Christian Wolf, Grigory Antipov, Moez Baccouche, Madiha Nadri

    Abstract: Methods for Visual Question Answering (VQA) are notorious for leveraging dataset biases rather than performing reasoning, hindering generalization. It has been recently shown that better reasoning patterns emerge in attention layers of a state-of-the-art VQA model when they are trained on perfect (oracle) visual inputs. This provides evidence that deep neural networks can learn to reason when train…

    Submitted 10 June, 2021; originally announced June 2021.

  4. arXiv:2104.03656  [pdf, other]

    cs.CV

    How Transferable are Reasoning Patterns in VQA?

    Authors: Corentin Kervadec, Theo Jaunet, Grigory Antipov, Moez Baccouche, Romain Vuillemot, Christian Wolf

    Abstract: Since its inception, Visual Question Answering (VQA) has been notoriously known as a task where models are prone to exploit biases in datasets to find shortcuts instead of performing high-level reasoning. Classical methods address this by removing biases from training data, or adding branches to models to detect and remove biases. In this paper, we argue that uncertainty in vision is a dominating facto…

    Submitted 8 April, 2021; originally announced April 2021.

  5. arXiv:2104.00926  [pdf, other]

    cs.CV cs.HC

    VisQA: X-raying Vision and Language Reasoning in Transformers

    Authors: Theo Jaunet, Corentin Kervadec, Romain Vuillemot, Grigory Antipov, Moez Baccouche, Christian Wolf

    Abstract: Visual Question Answering systems target answering open-ended textual questions given input images. They are a testbed for learning high-level reasoning with a primary use in HCI, for instance assistance for the visually impaired. Recent research has shown that state-of-the-art models tend to produce answers exploiting biases and shortcuts in the training data, and sometimes do not even look at th…

    Submitted 20 July, 2021; v1 submitted 2 April, 2021; originally announced April 2021.

  6. arXiv:2008.05889  [pdf, other]

    eess.AS cs.MM cs.SD

    Automatic Quality Assessment for Audio-Visual Verification Systems. The LOVe submission to NIST SRE Challenge 2019

    Authors: Grigory Antipov, Nicolas Gengembre, Olivier Le Blouch, Gaël Le Lan

    Abstract: Fusion of scores is a cornerstone of multimodal biometric systems composed of independent unimodal parts. In this work, we focus on quality-dependent fusion for speaker-face verification. To this end, we propose a universal model which can be trained for automatic quality assessment of both face and speaker modalities. This model estimates the quality of representations produced by unimodal system…

    Submitted 14 August, 2020; v1 submitted 13 August, 2020; originally announced August 2020.

    Comments: 5 pages, 1 figure, accepted at INTERSPEECH 2020. Corrected the reference [20]

  7. arXiv:2006.05726  [pdf, other]

    cs.CV cs.CL

    Estimating semantic structure for the VQA answer space

    Authors: Corentin Kervadec, Grigory Antipov, Moez Baccouche, Christian Wolf

    Abstract: Since its appearance, Visual Question Answering (VQA, i.e. answering a question posed over an image) has always been treated as a classification problem over a set of predefined answers. Despite its convenience, this classification approach poorly reflects the semantics of the problem, limiting the answering to a choice between independent proposals, without taking into account the similarity betw…

    Submitted 8 April, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: [WARNING] We want to notify the reader that additional experiments (not in the paper) have shown that using a `random' semantic space performs as well as the proposed semantic loss. This additional result questions the effectiveness of our method.

  8. arXiv:2006.05121  [pdf, other]

    cs.CV

    Roses Are Red, Violets Are Blue... but Should VQA Expect Them To?

    Authors: Corentin Kervadec, Grigory Antipov, Moez Baccouche, Christian Wolf

    Abstract: Models for Visual Question Answering (VQA) are notorious for their tendency to rely on dataset biases, as the large and unbalanced diversity of questions and concepts involved tends to prevent models from learning to reason, leading them to perform educated guesses instead. In this paper, we claim that the standard evaluation metric, which consists in measuring the overall in-domain accuracy,…

    Submitted 7 April, 2021; v1 submitted 9 June, 2020; originally announced June 2020.

  9. arXiv:1912.03063  [pdf, other]

    cs.CV cs.CL cs.LG cs.NE

    Weak Supervision helps Emergence of Word-Object Alignment and improves Vision-Language Tasks

    Authors: Corentin Kervadec, Grigory Antipov, Moez Baccouche, Christian Wolf

    Abstract: The wide adoption of self-attention (i.e. the transformer model) and BERT-like training principles has recently resulted in a number of high-performing models on a large panoply of vision-and-language problems (such as Visual Question Answering (VQA), image retrieval, etc.). In this paper we claim that these State-Of-The-Art (SOTA) approaches perform reasonably well in structuring information ins…

    Submitted 6 December, 2019; originally announced December 2019.

  10. arXiv:1907.03697  [pdf, other]

    eess.IV cs.CV cs.LG

    Prediction of Soil Moisture Content Based On Satellite Data and Sequence-to-Sequence Networks

    Authors: Natalia Efremova, Dmitry Zausaev, Gleb Antipov

    Abstract: The main objective of this study is to combine remote sensing and machine learning to detect soil moisture content. Growing population and food consumption have led to the need to improve agricultural yield and to reduce wastage of natural resources. In this paper, we propose a neural network architecture, based on recent work by the research community, that can make a strong social impact and aid…

    Submitted 5 June, 2019; originally announced July 2019.

    Comments: Presented at the NeurIPS 2018 WiML workshop

  11. arXiv:1702.01983  [pdf, other]

    cs.CV

    Face Aging With Conditional Generative Adversarial Networks

    Authors: Grigory Antipov, Moez Baccouche, Jean-Luc Dugelay

    Abstract: It has been recently shown that Generative Adversarial Networks (GANs) can produce synthetic images of exceptional visual fidelity. In this work, we propose a GAN-based method for automatic face aging. Contrary to previous works employing GANs for altering facial attributes, we place particular emphasis on preserving the original person's identity in the aged version of his/her face. To thi…

    Submitted 30 May, 2017; v1 submitted 7 February, 2017; originally announced February 2017.

    Comments: 5 pages, 3 figures, accepted at ICIP 2017. With respect to v1: (1) changed the abbreviation of the main model from "acGAN" to "Age-cGAN" in order to avoid confusion with "Auxiliary Classifier Generative Adversarial Networks" introduced by Odena et al.; (2) corrected a typo in Formula 1