
Showing 1–5 of 5 results for author: Dharur, S

  1. arXiv:2406.09617  [pdf, other]

    cs.CL cs.HC eess.AS

    Multimodal Large Language Models with Fusion Low Rank Adaptation for Device Directed Speech Detection

    Authors: Shruti Palaskar, Oggi Rudovic, Sameer Dharur, Florian Pesce, Gautam Krishna, Aswin Sivaraman, Jack Berkowitz, Ahmed Hussen Abdelaziz, Saurabh Adya, Ahmed Tewfik

    Abstract: Although Large Language Models (LLMs) have shown promise for human-like conversations, they are primarily pre-trained on text data. Incorporating audio or video improves performance, but collecting large-scale multimodal data and pre-training multimodal LLMs is challenging. To this end, we propose a Fusion Low Rank Adaptation (FLoRA) technique that efficiently adapts a pre-trained unimodal LLM to…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  2. arXiv:2310.15261  [pdf, ps, other]

    cs.SD cs.HC cs.LG eess.AS

    Modality Dropout for Multimodal Device Directed Speech Detection using Verbal and Non-Verbal Features

    Authors: Gautam Krishna, Sameer Dharur, Oggi Rudovic, Pranay Dighe, Saurabh Adya, Ahmed Hussen Abdelaziz, Ahmed H Tewfik

    Abstract: Device-directed speech detection (DDSD) is the binary classification task of distinguishing between queries directed at a voice assistant versus side conversation or background speech. State-of-the-art DDSD systems use verbal cues, e.g., acoustic, text and/or automatic speech recognition system (ASR) features, to classify speech as device-directed or otherwise, and often have to contend with one or…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: 5 pages

  3. arXiv:2205.01652  [pdf, other]

    cs.CV cs.AI

    Episodic Memory Question Answering

    Authors: Samyak Datta, Sameer Dharur, Vincent Cartillier, Ruta Desai, Mukul Khanna, Dhruv Batra, Devi Parikh

    Abstract: Egocentric augmented reality devices such as wearable glasses passively capture visual data as a human wearer tours a home environment. We envision a scenario wherein the human communicates with an AI agent powering such a device by asking questions (e.g., where did you last see my keys?). In order to succeed at this task, the egocentric AI assistant must (1) construct semantically rich and effici…

    Submitted 3 May, 2022; originally announced May 2022.

    Comments: Published at CVPR 2022 (Oral presentation)

  4. arXiv:2106.14405  [pdf, other]

    cs.LG cs.RO

    Habitat 2.0: Training Home Assistants to Rearrange their Habitat

    Authors: Andrew Szot, Alex Clegg, Eric Undersander, Erik Wijmans, Yili Zhao, John Turner, Noah Maestre, Mustafa Mukadam, Devendra Chaplot, Oleksandr Maksymets, Aaron Gokaslan, Vladimir Vondrus, Sameer Dharur, Franziska Meier, Wojciech Galuba, Angel Chang, Zsolt Kira, Vladlen Koltun, Jitendra Malik, Manolis Savva, Dhruv Batra

    Abstract: We introduce Habitat 2.0 (H2.0), a simulation platform for training virtual robots in interactive 3D environments and complex physics-enabled scenarios. We make comprehensive contributions to all levels of the embodied AI stack - data, simulation, and benchmark tasks. Specifically, we present: (i) ReplicaCAD: an artist-authored, annotated, reconfigurable 3D dataset of apartments (matching real spa…

    Submitted 1 July, 2022; v1 submitted 28 June, 2021; originally announced June 2021.

  5. arXiv:2010.10038  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    SOrT-ing VQA Models: Contrastive Gradient Learning for Improved Consistency

    Authors: Sameer Dharur, Purva Tendulkar, Dhruv Batra, Devi Parikh, Ramprasaath R. Selvaraju

    Abstract: Recent research in Visual Question Answering (VQA) has revealed state-of-the-art models to be inconsistent in their understanding of the world -- they answer seemingly difficult questions requiring reasoning correctly but get simpler associated sub-questions wrong. These sub-questions pertain to lower level visual concepts in the image that models ideally should understand to be able to answer the…

    Submitted 30 November, 2020; v1 submitted 20 October, 2020; originally announced October 2020.

    Comments: Accepted to the NeurIPS 2020 workshop on Interpretable Inductive Biases and Physically Structured Learning