Showing 1–7 of 7 results for author: Chhikara, P

  1. arXiv:2310.16033  [pdf, other]

    cs.CV cs.CL

    Towards Perceiving Small Visual Details in Zero-shot Visual Question Answering with Multimodal LLMs

    Authors: Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara, Filip Ilievski

    Abstract: Multimodal Large Language Models (MLLMs) have recently achieved promising zero-shot accuracy on visual question answering (VQA) -- a fundamental task affecting various downstream applications and domains. Given the great potential for the broad use of these models, it is important to investigate their limitations in dealing with different image and question properties. In this work, we investigate…

    Submitted 12 February, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: 20 pages, 12 figures, 7 tables

  2. arXiv:2308.14391  [pdf, other]

    cs.CV cs.CL

    FIRE: Food Image to REcipe generation

    Authors: Prateek Chhikara, Dhiraj Chaurasia, Yifan Jiang, Omkar Masur, Filip Ilievski

    Abstract: Food computing has emerged as a prominent multidisciplinary field of research in recent years. An ambitious goal of food computing is to develop end-to-end intelligent systems capable of autonomously producing recipe information for a food image. Current image-to-recipe methods are retrieval-based and their success depends heavily on the dataset size and diversity, as well as the quality of learne…

    Submitted 12 May, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: Published at IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) -- 2024

  3. arXiv:2306.05652  [pdf, other]

    cs.CL cs.AI cs.HC

    Privacy Aware Question-Answering System for Online Mental Health Risk Assessment

    Authors: Prateek Chhikara, Ujjwal Pasupulety, John Marshall, Dhiraj Chaurasia, Shweta Kumari

    Abstract: Social media platforms have enabled individuals suffering from mental illnesses to share their lived experiences and find the online support necessary to cope. However, many users fail to receive genuine clinical support, thus exacerbating their symptoms. Screening users based on what they post online can aid providers in administering targeted healthcare and minimize false positives. Pre-trained…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: 5 pages, 2 figures, 3 tables

  4. arXiv:2306.00228  [pdf, other]

    cs.CV cs.AI cs.CL

    Using Visual Cropping to Enhance Fine-Detail Question Answering of BLIP-Family Models

    Authors: Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara, Filip Ilievski

    Abstract: Visual Question Answering is a challenging task, as it requires seamless interaction between perceptual, linguistic, and background knowledge systems. While the recent progress of visual and natural language models like BLIP has led to improved performance on this task, we lack understanding of the ability of such models to perform on different kinds of questions and reasoning types. As our initia…

    Submitted 31 May, 2023; originally announced June 2023.

    Comments: 16 pages, 5 figures, 7 tables

  5. arXiv:2305.05091  [pdf, other]

    cs.CL cs.AI cs.HC

    Knowledge-enhanced Agents for Interactive Text Games

    Authors: Prateek Chhikara, Jiarui Zhang, Filip Ilievski, Jonathan Francis, Kaixin Ma

    Abstract: Communication via natural language is a key aspect of machine intelligence, and it requires computational models to learn and reason about world concepts, with varying levels of supervision. Significant progress has been made on fully-supervised non-interactive tasks, such as question-answering and procedural text understanding. Yet, various sequential interactive tasks, as in text-based games, ha…

    Submitted 16 December, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

    Comments: Published at K-CAP '23

  6. arXiv:2301.06680  [pdf, other]

    cs.CV cs.GR cs.LG

    DIGITOUR: Automatic Digital Tours for Real-Estate Properties

    Authors: Prateek Chhikara, Harshul Kuhar, Anil Goyal, Chirag Sharma

    Abstract: A virtual or digital tour is a form of virtual reality technology which allows a user to experience a specific location remotely. Currently, these virtual tours are created by following a 2-step strategy. First, a photographer clicks a 360 degree equirectangular image; then, a team of annotators manually links these images for the "walkthrough" user experience. The major challenge in the mass adop…

    Submitted 16 January, 2023; originally announced January 2023.

    Comments: Published at CODS-COMAD '23

  7. RE-Tagger: A light-weight Real-Estate Image Classifier

    Authors: Prateek Chhikara, Anil Goyal, Chirag Sharma

    Abstract: Real-estate image tagging is one of the essential use-cases to save efforts involved in manual annotation and enhance the user experience. This paper proposes an end-to-end pipeline (referred to as RE-Tagger) for the real-estate image classification problem. We present a two-stage transfer learning approach using custom InceptionV3 architecture to classify images into different categories (i.e., b…

    Submitted 12 July, 2022; originally announced July 2022.

    Comments: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (DEMO TRACK)