Showing 1–5 of 5 results for author: Lobo, N d V

  1. arXiv:2403.17223  [pdf]

    cs.CV cs.AI cs.LG

    Co-Occurring of Object Detection and Identification towards unlabeled object discovery

    Authors: Binay Kumar Singh, Niels Da Vitoria Lobo

    Abstract: In this paper, we propose a novel deep learning based approach for identifying co-occurring objects in conjunction with base objects in multilabel object categories. Nowadays, with the advancement in computer vision based techniques we need to know about co-occurring objects with respect to base object for various purposes. The pipeline of the proposed work is composed of two stages: in the first…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 6 pages, 2 figures

  2. arXiv:2207.02334  [pdf, other]

    cs.CV

    Weakly Supervised Grounding for VQA in Vision-Language Transformers

    Authors: Aisha Urooj Khan, Hilde Kuehne, Chuang Gan, Niels Da Vitoria Lobo, Mubarak Shah

    Abstract: Transformers for visual-language representation learning have been getting a lot of interest and shown tremendous performance on visual question answering (VQA) and grounding. But most systems that show good performance of those tasks still rely on pre-trained object detectors during training, which limits their applicability to the object classes available for those detectors. To mitigate this li…

    Submitted 5 July, 2022; originally announced July 2022.

    Comments: To appear at ECCV 2022

  3. arXiv:2010.14095  [pdf, other]

    cs.CV

    MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering

    Authors: Aisha Urooj Khan, Amir Mazaheri, Niels da Vitoria Lobo, Mubarak Shah

    Abstract: We present MMFT-BERT (MultiModal Fusion Transformer with BERT encodings), to solve Visual Question Answering (VQA) ensuring individual and combined processing of multiple input modalities. Our approach benefits from processing multimodal data (video and text) adopting the BERT encodings individually and using a novel transformer-based fusion method to fuse them together. Our method decomposes the d…

    Submitted 27 October, 2020; originally announced October 2020.

    Comments: Accepted at Findings of EMNLP 2020

  4. arXiv:2005.03804  [pdf, other]

    cs.CV eess.IV

    Text Synopsis Generation for Egocentric Videos

    Authors: Aidean Sharghi, Niels da Vitoria Lobo, Mubarak Shah

    Abstract: Mass utilization of body-worn cameras has led to a huge corpus of available egocentric video. Existing video summarization algorithms can accelerate browsing such videos by selecting (visually) interesting shots from them. Nonetheless, since the system user still has to watch the summary videos, browsing large video databases remains a challenge. Hence, in this work, we propose to generate a textua…

    Submitted 21 September, 2020; v1 submitted 7 May, 2020; originally announced May 2020.

    Comments: ICPR 2020

  5. arXiv:1501.00614  [pdf, other]

    cs.CV

    Understanding Trajectory Behavior: A Motion Pattern Approach

    Authors: Mahdi M. Kalayeh, Stephen Mussmann, Alla Petrakova, Niels da Vitoria Lobo, Mubarak Shah

    Abstract: Mining the underlying patterns in gigantic and complex data is of great importance to data analysts. In this paper, we propose a motion pattern approach to mine frequent behaviors in trajectory data. Motion patterns, defined by a set of highly similar flow vector groups in a spatial locality, have been shown to be very effective in extracting dominant motion behaviors in video sequences. Inspired…

    Submitted 3 January, 2015; originally announced January 2015.