Analysis on Image Set Visual Question Answering
Authors:
Abhinav Khattar,
Aviral Joshi,
Har Simrat Singh,
Pulkit Goel,
Rohit Prakash Barnwal
Abstract:
We tackle the challenge of Visual Question Answering (VQA) in a multi-image setting on the ISVQA dataset. Traditional VQA tasks have focused on a single-image setting, where the target answer is generated from a single image. Image-set VQA, however, involves a set of images and requires finding connections between images, relating objects across images based on these connections, and generating a unified answer. In this report, we work with four approaches in a bid to improve performance on the task. We analyse and compare our results with three baseline models (LXMERT, HME-VideoQA and VisualBERT) and show that our approaches can provide a slight improvement over the baselines. Specifically, we try to improve the spatial awareness of the model and help it identify color using enhanced pre-training, reduce language dependence using adversarial regularization, and improve counting using a regression loss and graph-based deduplication. We further present an in-depth analysis of the language bias in the ISVQA dataset and show how models trained on ISVQA implicitly learn to associate language more strongly with the final answer.
Submitted 31 March, 2021;
originally announced April 2021.
PS-Sim: A Framework for Scalable Simulation of Participatory Sensing Data
Authors:
Rajesh P Barnwal,
Nirnay Ghosh,
Soumya K Ghosh,
Sajal K Das
Abstract:
The emergence of smartphones and of the participatory sensing (PS) paradigm has paved the way for a new variant of pervasive computing. In PS, human users perform sensing tasks and generate notifications, typically in exchange for incentives. These notifications are real-time, large-volume, and multi-modal, and are eventually fused by the PS platform to generate a summary. One major limitation of PS is the sparsity of notifications owing to a lack of active participation, which inhibits large-scale real-life experiments by the research community. At the same time, the research community needs ground truth to validate the efficacy of proposed models and algorithms. Most PS applications involve human mobility and report generation following the sensing of an event of interest in the adjacent environment. This work attempts to study and empirically model human participation behavior and event occurrence distributions through the development of a location-sensitive data simulation framework called PS-Sim. Extensive experiments show that the synthetic data generated by PS-Sim replicates real participation and event occurrence behaviors in PS applications, and may therefore be used for validation in the absence of ground truth. As a proof of concept, we used a real-life dataset from a vehicular traffic management application to train the models in PS-Sim and cross-validated the simulated data against other parts of the same dataset.
Submitted 29 August, 2018;
originally announced August 2018.