Showing 1–12 of 12 results for author: Pahuja, V

  1. arXiv:2401.00608  [pdf, other]

    cs.CV cs.AI

    Bringing Back the Context: Camera Trap Species Identification as Link Prediction on Multimodal Knowledge Graphs

    Authors: Vardaan Pahuja, Weidi Luo, Yu Gu, Cheng-Hao Tu, Hong-You Chen, Tanya Berger-Wolf, Charles Stewart, Song Gao, Wei-Lun Chao, Yu Su

    Abstract: Camera traps are valuable tools in animal ecology for biodiversity monitoring and conservation. However, challenges like poor generalization to deployment at new unseen locations limit their practical application. Images are naturally associated with heterogeneous forms of context possibly in different modalities. In this work, we leverage the structured context associated with the camera trap ima…

    Submitted 22 June, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

    Comments: 14 pages, 5 figures

  2. arXiv:2311.01420  [pdf, other]

    cs.LG

    Holistic Transfer: Towards Non-Disruptive Fine-Tuning with Partial Target Data

    Authors: Cheng-Hao Tu, Hong-You Chen, Zheda Mai, Jike Zhong, Vardaan Pahuja, Tanya Berger-Wolf, Song Gao, Charles Stewart, Yu Su, Wei-Lun Chao

    Abstract: We propose a learning problem involving adapting a pre-trained source model to the target domain for classifying all classes that appeared in the source data, using target data that covers only a partial label space. This problem is practical, as it is unrealistic for the target end-users to collect data for all classes prior to adaptation. However, it has received limited attention in the literat…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: Accepted to NeurIPS 2023 main track

  3. arXiv:2306.03421  [pdf, other]

    cs.CV

    Diversifying Joint Vision-Language Tokenization Learning

    Authors: Vardaan Pahuja, AJ Piergiovanni, Anelia Angelova

    Abstract: Building joint representations across images and text is an essential step for tasks such as Visual Question Answering and Video Question Answering. In this work, we find that the representations must not only jointly capture features from both modalities but should also be diverse for better generalization performance. To this end, we propose joint vision-language representation learning by diver…

    Submitted 15 June, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: Accepted to Transformers for Vision (T4V) workshop, CVPR 2023; 7 pages, 5 figures

  4. A Retrieve-and-Read Framework for Knowledge Graph Link Prediction

    Authors: Vardaan Pahuja, Boshi Wang, Hugo Latapie, Jayanth Srinivasa, Yu Su

    Abstract: Knowledge graph (KG) link prediction aims to infer new facts based on existing facts in the KG. Recent studies have shown that using the graph neighborhood of a node via graph neural networks (GNNs) provides more useful information compared to just using the query information. Conventional GNNs for KG link prediction follow the standard message-passing paradigm on the entire KG, which leads to sup…

    Submitted 22 October, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: Accepted to CIKM'23; Published version DOI: https://doi.org/10.1145/3583780.3614769; 12 pages, 4 figures

    Journal ref: CIKM (2023) 1992-2002

  5. arXiv:2209.04994  [pdf, other]

    cs.CL cs.AI

    Knowledge Base Question Answering: A Semantic Parsing Perspective

    Authors: Yu Gu, Vardaan Pahuja, Gong Cheng, Yu Su

    Abstract: Recent advances in deep learning have greatly propelled the research on semantic parsing. Improvement has since been made in many downstream tasks, including natural language interface to web APIs, text-to-SQL generation, among others. However, despite the close connection shared with these tasks, research on question answering over knowledge bases (KBQA) has comparatively been progressing slowly.…

    Submitted 23 October, 2022; v1 submitted 11 September, 2022; originally announced September 2022.

    Comments: 19 pages, 3 figures; accepted to AKBC'22

    ACM Class: I.2.7

  6. arXiv:2106.01586  [pdf, other]

    cs.CL cs.LG

    A Systematic Investigation of KB-Text Embedding Alignment at Scale

    Authors: Vardaan Pahuja, Yu Gu, Wenhu Chen, Mehdi Bahrami, Lei Liu, Wei-Peng Chen, Yu Su

    Abstract: Knowledge bases (KBs) and text often contain complementary knowledge: KBs store structured knowledge that can support long range reasoning, while text stores more comprehensive and timely knowledge in an unstructured way. Separately embedding the individual knowledge sources into vector spaces has demonstrated tremendous successes in encoding the respective knowledge, but how to jointly embed and…

    Submitted 3 June, 2021; originally announced June 2021.

    Comments: Accepted to ACL-IJCNLP 2021. 11 pages, 2 figures

  7. arXiv:1909.09192  [pdf, other]

    cs.LG cs.CL cs.CV stat.ML

    Learning Sparse Mixture of Experts for Visual Question Answering

    Authors: Vardaan Pahuja, Jie Fu, Christopher J. Pal

    Abstract: There has been a rapid progress in the task of Visual Question Answering with improved model architectures. Unfortunately, these models are usually computationally intensive due to their sheer size which poses a serious challenge for deployment. We aim to tackle this issue for the specific task of Visual Question Answering (VQA). A Convolutional Neural Network (CNN) is an integral part of the visu…

    Submitted 19 September, 2019; originally announced September 2019.

    Comments: Accepted in Visual Question Answering and Dialog Workshop, CVPR 2019

  8. Structure Learning for Neural Module Networks

    Authors: Vardaan Pahuja, Jie Fu, Sarath Chandar, Christopher J. Pal

    Abstract: Neural Module Networks, originally proposed for the task of visual question answering, are a class of neural network architectures that involve human-specified neural modules, each designed for a specific form of reasoning. In current formulations of such networks only the parameters of the neural modules and/or the order of their execution is learned. In this work, we further expand this approach…

    Submitted 27 May, 2019; originally announced May 2019.

  9. arXiv:1805.11016  [pdf, other]

    cs.LG stat.ML

    Memory Augmented Self-Play

    Authors: Shagun Sodhani, Vardaan Pahuja

    Abstract: Self-play is an unsupervised training procedure which enables the reinforcement learning agents to explore the environment without requiring any external rewards. We augment the self-play setting by providing an external memory where the agent can store experience from the previous tasks. This enables the agent to come up with more diverse self-play tasks resulting in faster exploration of the env…

    Submitted 31 May, 2018; v1 submitted 28 May, 2018; originally announced May 2018.

  10. arXiv:1805.08174  [pdf, other]

    cs.CV cs.CL

    Reproducibility Report for "Learning To Count Objects In Natural Images For Visual Question Answering"

    Authors: Shagun Sodhani, Vardaan Pahuja

    Abstract: This is the reproducibility report for the paper "Learning To Count Objects In Natural Images For Visual Question Answering".

    Submitted 21 May, 2018; originally announced May 2018.

    Comments: Submitted to Reproducibility in ML Workshop, ICML'18

  11. arXiv:1801.10314  [pdf, other]

    cs.CL

    Complex Sequential Question Answering: Towards Learning to Converse Over Linked Question Answer Pairs with a Knowledge Graph

    Authors: Amrita Saha, Vardaan Pahuja, Mitesh M. Khapra, Karthik Sankaranarayanan, Sarath Chandar

    Abstract: While conversing with chatbots, humans typically tend to ask many questions, a significant portion of which can be answered by referring to large-scale knowledge graphs (KG). While Question Answering (QA) and dialog systems have been studied independently, there is a need to study them closely to evaluate such real-world scenarios faced by bots involving both these tasks. Towards this end, we intr…

    Submitted 4 October, 2018; v1 submitted 31 January, 2018; originally announced January 2018.

    Comments: Accepted in AAAI'18

  12. arXiv:1703.04650  [pdf, other]

    cs.CL

    Joint Learning of Correlated Sequence Labelling Tasks Using Bidirectional Recurrent Neural Networks

    Authors: Vardaan Pahuja, Anirban Laha, Shachar Mirkin, Vikas Raykar, Lili Kotlerman, Guy Lev

    Abstract: The stream of words produced by Automatic Speech Recognition (ASR) systems is typically devoid of punctuations and formatting. Most natural language processing applications expect segmented and well-formatted texts as input, which is not available in ASR output. This paper proposes a novel technique of jointly modeling multiple correlated tasks such as punctuation and capitalization using bidirect…

    Submitted 18 July, 2017; v1 submitted 14 March, 2017; originally announced March 2017.

    Comments: Accepted in Interspeech 2017