Showing 1–4 of 4 results for author: Farazi, M R

  1. arXiv:2001.07059

    cs.CV cs.CC

    Accuracy vs. Complexity: A Trade-off in Visual Question Answering Models

    Authors: Moshiur R. Farazi, Salman H. Khan, Nick Barnes

    Abstract: Visual Question Answering (VQA) has emerged as a Visual Turing Test to validate the reasoning ability of AI agents. The pivot to existing VQA models is the joint embedding that is learned by combining the visual features from an image and the semantic features from a given question. Consequently, a large body of literature has focused on developing complex joint embedding strategies coupled with v…

    Submitted 20 January, 2020; originally announced January 2020.

  2. Question-Agnostic Attention for Visual Question Answering

    Authors: Moshiur R Farazi, Salman H Khan, Nick Barnes

    Abstract: Visual Question Answering (VQA) models employ attention mechanisms to discover image locations that are most relevant for answering a specific question. For this purpose, several multimodal fusion strategies have been proposed, ranging from relatively simple operations (e.g., linear sum) to more complex ones (e.g., Block). The resulting multimodal representations define an intermediate feature spa…

    Submitted 5 September, 2020; v1 submitted 8 August, 2019; originally announced August 2019.

    Comments: To appear in the proceedings of International Conference on Pattern Recognition (ICPR) 2020

  3. arXiv:1811.12772

    cs.CV

    From Known to the Unknown: Transferring Knowledge to Answer Questions about Novel Visual and Semantic Concepts

    Authors: Moshiur R Farazi, Salman H Khan, Nick Barnes

    Abstract: Current Visual Question Answering (VQA) systems can answer intelligent questions about 'Known' visual content. However, their performance drops significantly when questions about visually and linguistically 'Unknown' concepts are presented during inference ('Open-world' scenario). A practical VQA system should be able to deal with novel concepts in real world settings. To address this problem, we…

    Submitted 30 November, 2018; originally announced November 2018.

  4. arXiv:1805.04247

    cs.CV cs.AI cs.CL

    Reciprocal Attention Fusion for Visual Question Answering

    Authors: Moshiur R Farazi, Salman H Khan

    Abstract: Existing attention mechanisms either attend to local image grid or object level features for Visual Question Answering (VQA). Motivated by the observation that questions can relate to both object instances and their parts, we propose a novel attention mechanism that jointly considers reciprocal relationships between the two levels of visual details. The bottom-up attention thus generated is furthe…

    Submitted 22 July, 2018; v1 submitted 11 May, 2018; originally announced May 2018.

    Comments: To appear in the British Machine Vision Conference (BMVC), September 2018

    Journal ref: Proceedings of the British Machine Vision Conference (250) 2018