Showing 1–3 of 3 results for author: Penamakuri, A S

  1. arXiv:2306.16713  [pdf, other]

    cs.CV cs.AI cs.LG

    Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering

    Authors: Abhirama Subramanyam Penamakuri, Manish Gupta, Mithun Das Gupta, Anand Mishra

    Abstract: We study visual question answering in a setting where the answer has to be mined from a pool of relevant and irrelevant images given as a context. For such a setting, a model must first retrieve relevant images from the pool and answer the question from these retrieved images. We refer to this problem as retrieval-based visual question answering (or RETVQA in short). The RETVQA is distinctively di…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: Accepted to IJCAI 2023

  2. arXiv:2211.12926  [pdf, other]

    cs.CV cs.AI cs.LG

    Contrastive Multi-View Textual-Visual Encoding: Towards One Hundred Thousand-Scale One-Shot Logo Identification

    Authors: Nakul Sharma, Abhirama S. Penamakuri, Anand Mishra

    Abstract: In this paper, we study the problem of identifying logos of business brands in natural scenes in an open-set one-shot setting. This problem setup is significantly more challenging than traditionally-studied 'closed-set' and 'large-scale training samples per category' logo recognition settings. We propose a novel multi-view textual-visual encoding framework that encodes text appearing in the logos…

    Submitted 23 November, 2022; originally announced November 2022.

    Comments: Accepted to ICVGIP 2022

  3. arXiv:2210.08554  [pdf, other]

    cs.CV cs.CL

    COFAR: Commonsense and Factual Reasoning in Image Search

    Authors: Prajwal Gatti, Abhirama Subramanyam Penamakuri, Revant Teotia, Anand Mishra, Shubhashis Sengupta, Roshni Ramnani

    Abstract: One characteristic that makes humans superior to modern artificially intelligent models is the ability to interpret images beyond what is visually apparent. Consider the following two natural language search queries - (i) "a queue of customers patiently waiting to buy ice cream" and (ii) "a queue of tourists going to see a famous Mughal architecture in India." Interpreting these queries requires o…

    Submitted 16 October, 2022; originally announced October 2022.

    Comments: Accepted in AACL-IJCNLP 2022