Showing 1–3 of 3 results for author: KV, G

  1. arXiv:2108.12585  [pdf, other]

    cs.CV

    On the Significance of Question Encoder Sequence Model in the Out-of-Distribution Performance in Visual Question Answering

    Authors: Gouthaman KV, Anurag Mittal

    Abstract: Generalizing beyond the experiences has a significant role in developing practical AI systems. It has been shown that current Visual Question Answering (VQA) models are over-dependent on the language-priors (spurious correlations between question-types and their most frequent answers) from the train set and pose poor performance on Out-of-Distribution (OOD) test sets. This conduct limits their gen…

    Submitted 21 December, 2021; v1 submitted 28 August, 2021; originally announced August 2021.

  2. Linguistically-aware Attention for Reducing the Semantic-Gap in Vision-Language Tasks

    Authors: Gouthaman KV, Athira Nambiar, Kancheti Sai Srinivas, Anurag Mittal

    Abstract: Attention models are widely used in Vision-language (V-L) tasks to perform the visual-textual correlation. Humans perform such a correlation with a strong linguistic understanding of the visual world. However, even the best performing attention model in V-L tasks lacks such a high-level linguistic understanding, thus creating a semantic gap between the modalities. In this paper, we propose an atte…

    Submitted 18 August, 2020; originally announced August 2020.

    Journal ref: Pattern Recognition, 2021

  3. arXiv:2007.06198  [pdf, other]

    cs.CV cs.CL

    Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder

    Authors: Gouthaman KV, Anurag Mittal

    Abstract: Recent studies have shown that current VQA models are heavily biased on the language priors in the train set to answer the question, irrespective of the image. E.g., overwhelmingly answer "what sport is" as "tennis" or "what color banana" as "yellow." This behavior restricts them from real-world application scenarios. In this work, we propose a novel model-agnostic question encoder, Visually-Groun…

    Submitted 18 July, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

    Comments: ECCV 2020