
Showing 1–27 of 27 results for author: Gurari, D

Searching in archive cs.
  1. arXiv:2405.03643

    cs.CV

    Collecting Consistently High Quality Object Tracks with Minimal Human Involvement by Using Self-Supervised Learning to Detect Tracker Errors

    Authors: Samreen Anjum, Suyog Jain, Danna Gurari

    Abstract: We propose a hybrid framework for consistently producing high-quality object tracks by combining an automated object tracker with little human input. The key idea is to tailor a module for each dataset to intelligently decide when an object tracker is failing and so humans should be brought in to re-localize an object for continued tracking. Our approach leverages self-supervised learning on unlab…

    Submitted 6 May, 2024; originally announced May 2024.

  2. arXiv:2404.14990

    cs.CV eess.IV

    Interpreting COVID Lateral Flow Tests' Results with Foundation Models

    Authors: Stuti Pandey, Josh Myers-Dean, Jarek Reynolds, Danna Gurari

    Abstract: Lateral flow tests (LFTs) enable rapid, low-cost testing for health conditions including Covid, pregnancy, HIV, and malaria. Automated readers of LFT results can yield many benefits including empowering blind people to independently learn about their health and accelerating data entry for large-scale monitoring (e.g., for pandemics such as Covid) by using only a single photograph per LFT test. Acc…

    Submitted 21 April, 2024; originally announced April 2024.

  3. arXiv:2312.10637

    cs.CV cs.AI

    An Evaluation of GPT-4V and Gemini in Online VQA

    Authors: Mengchen Liu, Chongyan Chen, Danna Gurari

    Abstract: While there is much excitement about the potential of large multimodal models (LMM), a comprehensive evaluation is critical to establish their true capabilities and limitations. In support of this aim, we evaluate two state-of-the-art LMMs, GPT-4V and Gemini, on a new visual question answering dataset sourced from an authentic online question answering community. We conduct fine-grained analysis b…

    Submitted 13 February, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

    Comments: 20 pages

  4. arXiv:2311.15562

    cs.CV

    Fully Authentic Visual Question Answering Dataset from Online Communities

    Authors: Chongyan Chen, Mengchen Liu, Noel Codella, Yunsheng Li, Lu Yuan, Danna Gurari

    Abstract: Visual Question Answering (VQA) entails answering questions about images. We introduce the first VQA dataset in which all contents originate from an authentic use case. Sourced from online question answering community forums, we call it VQAonline. We characterize this dataset and how it relates to eight mainstream VQA datasets. Observing that answers in our dataset tend to be much longer (i.e., a…

    Submitted 18 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

  5. arXiv:2308.11662

    cs.CV

    VQA Therapy: Exploring Answer Differences by Visually Grounding Answers

    Authors: Chongyan Chen, Samreen Anjum, Danna Gurari

    Abstract: Visual question answering is a task of predicting the answer to a question about an image. Given that different people can provide different answers to a visual question, we aim to better understand why with answer groundings. We introduce the first dataset that visually grounds each unique answer to each visual question, which we call VQAAnswerTherapy. We then propose two novel problems of predic…

    Submitted 24 August, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

    Comments: IEEE/CVF International Conference on Computer Vision

  6. arXiv:2307.10518

    cs.CV

    Interactive Segmentation for Diverse Gesture Types Without Context

    Authors: Josh Myers-Dean, Yifei Fan, Brian Price, Wilson Chan, Danna Gurari

    Abstract: Interactive segmentation entails a human marking an image to guide how a model either creates or edits a segmentation. Our work addresses limitations of existing methods: they either only support one gesture type for marking an image (e.g., either clicks or scribbles) or require knowledge of the gesture type being employed, and require specifying whether marked regions should be included versus ex…

    Submitted 5 December, 2023; v1 submitted 19 July, 2023; originally announced July 2023.

  7. Helping Visually Impaired People Take Better Quality Pictures

    Authors: Maniratnam Mandal, Deepti Ghadiyaram, Danna Gurari, Alan C. Bovik

    Abstract: Perception-based image analysis technologies can be used to help visually impaired people take better quality pictures by providing automated guidance, thereby empowering them to interact more confidently on social media. The photographs taken by visually impaired users often suffer from one or both of two kinds of quality issues: technical quality (distortions), and semantic quality, such as fram…

    Submitted 14 May, 2023; originally announced May 2023.

  8. arXiv:2301.05323

    cs.CV

    Salient Object Detection for Images Taken by People With Vision Impairments

    Authors: Jarek Reynolds, Chandra Kanth Nagesh, Danna Gurari

    Abstract: Salient object detection is the task of producing a binary mask for an image that deciphers which pixels belong to the foreground object versus background. We introduce a new salient object detection dataset using images taken by people who are visually impaired who were seeking to better understand their surroundings, which we call VizWiz-SalientObject. Compared to seven existing datasets, VizWiz…

    Submitted 5 September, 2023; v1 submitted 12 January, 2023; originally announced January 2023.

    Comments: Computer Vision and Pattern Recognition

  9. arXiv:2210.05996

    cs.CV

    Line Search-Based Feature Transformation for Fast, Stable, and Tunable Content-Style Control in Photorealistic Style Transfer

    Authors: Tai-Yin Chiu, Danna Gurari

    Abstract: Photorealistic style transfer is the task of synthesizing a realistic-looking image when adapting the content from one image to appear in the style of another image. Modern models commonly embed a transformation that fuses features describing the content image and style image and then decodes the resulting feature into a stylized image. We introduce a general-purpose transformation that enables co…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: WACV2023

  10. arXiv:2207.11810

    cs.CV

    VizWiz-FewShot: Locating Objects in Images Taken by People With Visual Impairments

    Authors: Yu-Yun Tseng, Alexander Bell, Danna Gurari

    Abstract: We introduce a few-shot localization dataset originating from photographers who authentically were trying to learn about the visual content in the images they took. It includes nearly 10,000 segmentations of 100 categories in over 4,500 images that were taken by people with visual impairments. Compared to existing few-shot object detection and instance segmentation datasets, our dataset is the fir…

    Submitted 24 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022. The first two authors contributed equally

  11. arXiv:2203.13452

    cs.CV

    PCA-Based Knowledge Distillation Towards Lightweight and Content-Style Balanced Photorealistic Style Transfer Models

    Authors: Tai-Yin Chiu, Danna Gurari

    Abstract: Photorealistic style transfer entails transferring the style of a reference image to another image so the result seems like a plausible photo. Our work is inspired by the observation that existing models are slow due to their large sizes. We introduce PCA-based knowledge distillation to distill lightweight models and show it is motivated by theory. To our knowledge, this is the first knowledge dis…

    Submitted 25 March, 2022; originally announced March 2022.

    Report number: accepted to CVPR 2022

  12. arXiv:2202.01993

    cs.CV cs.CL

    Grounding Answers for Visual Questions Asked by Visually Impaired People

    Authors: Chongyan Chen, Samreen Anjum, Danna Gurari

    Abstract: Visual question answering is the task of answering questions about images. We introduce the VizWiz-VQA-Grounding dataset, the first dataset that visually grounds answers to visual questions asked by people with visual impairments. We analyze our dataset and compare it with five VQA-Grounding datasets to demonstrate what makes it similar and different. We then evaluate the SOTA VQA and VQA-Groundin…

    Submitted 8 April, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

    Comments: Computer Vision and Pattern Recognition

  13. arXiv:2112.10982

    cs.CV cs.LG

    Generalized Few-Shot Semantic Segmentation: All You Need is Fine-Tuning

    Authors: Josh Myers-Dean, Yinan Zhao, Brian Price, Scott Cohen, Danna Gurari

    Abstract: Generalized few-shot semantic segmentation was introduced to move beyond only evaluating few-shot segmentation models on novel classes to include testing their ability to remember base classes. While the current state-of-the-art approach is based on meta-learning, it performs poorly and saturates in learning after observing only a few shots. We propose the first fine-tuning solution, and demonstra…

    Submitted 24 March, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

    Comments: Includes supplementary materials

  14. arXiv:2110.11995

    eess.IV cs.CV

    PhotoWCT$^2$: Compact Autoencoder for Photorealistic Style Transfer Resulting from Blockwise Training and Skip Connections of High-Frequency Residuals

    Authors: Tai-Yin Chiu, Danna Gurari

    Abstract: Photorealistic style transfer is an image editing task with the goal to modify an image to match the style of another image while ensuring the result looks like a real photograph. A limitation of existing models is that they have many parameters, which in turn prevents their use for larger image resolutions and leads to slower run-times. We introduce two mechanisms that enable our design of a more…

    Submitted 22 October, 2021; originally announced October 2021.

  15. Vision Skills Needed to Answer Visual Questions

    Authors: Xiaoyu Zeng, Yanan Wang, Tai-Yin Chiu, Nilavra Bhattacharya, Danna Gurari

    Abstract: The task of answering questions about images has garnered attention as a practical service for assisting populations with visual impairments as well as a visual Turing test for the artificial intelligence community. Our first aim is to identify the common vision skills needed for both scenarios. To do so, we analyze the need for four vision skills: object recognition, text recognition, color reco…

    Submitted 7 October, 2020; originally announced October 2020.

    Comments: To be published in Proceedings of the ACM on Human-Computer Interaction, Vol. 4, No. CSCW2, Article 149. Publication date: October 2020

  16. arXiv:2009.14265

    cs.CV cs.HC

    CrowdMOT: Crowdsourcing Strategies for Tracking Multiple Objects in Videos

    Authors: Samreen Anjum, Chi Lin, Danna Gurari

    Abstract: Crowdsourcing is a valuable approach for tracking objects in videos in a more scalable manner than possible with domain experts. However, existing frameworks do not produce high quality results with non-expert crowdworkers, especially for scenarios where objects split. To address this shortcoming, we introduce a crowdsourcing platform called CrowdMOT, and investigate two micro-task design decision…

    Submitted 29 September, 2020; originally announced September 2020.

    Comments: CSCW 2020

  17. arXiv:2004.02945

    cs.CV

    Objectness-Aware Few-Shot Semantic Segmentation

    Authors: Yinan Zhao, Brian Price, Scott Cohen, Danna Gurari

    Abstract: Few-shot semantic segmentation models aim to segment images after learning from only a few annotated examples. A key challenge for them is how to avoid overfitting because limited training data is available. While prior works usually limited the overall model capacity to alleviate overfitting, this hampers segmentation accuracy. We demonstrate how to increase overall model capacity to achieve impr…

    Submitted 12 October, 2021; v1 submitted 6 April, 2020; originally announced April 2020.

  18. arXiv:2003.12511

    cs.CV

    Assessing Image Quality Issues for Real-World Problems

    Authors: Tai-Yin Chiu, Yinan Zhao, Danna Gurari

    Abstract: We introduce a new large-scale dataset that links the assessment of image quality issues to two practical vision tasks: image captioning and visual question answering. First, we identify for 39,181 images taken by people who are blind whether each is sufficient quality to recognize the content as well as what quality flaws are observed from six options. These labels serve as a critical foundation…

    Submitted 30 March, 2020; v1 submitted 27 March, 2020; originally announced March 2020.

  19. arXiv:2002.08565

    cs.CV

    Captioning Images Taken by People Who Are Blind

    Authors: Danna Gurari, Yinan Zhao, Meng Zhang, Nilavra Bhattacharya

    Abstract: While an important problem in the vision community is to design algorithms that can automatically caption images, few publicly-available datasets for algorithm development directly address the interests of real users. Observing that people who are blind have relied on (human-based) image captioning services to learn about images they take for nearly a decade, we introduce the first image captionin…

    Submitted 15 July, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

  20. arXiv:1912.09336

    cs.LG cs.CV cs.HC

    VizWiz Dataset Browser: A Tool for Visualizing Machine Learning Datasets

    Authors: Nilavra Bhattacharya, Danna Gurari

    Abstract: We present a visualization tool to exhaustively search and browse through a set of large-scale machine learning datasets. Built on the top of the VizWiz dataset, our dataset browser tool has the potential to support and enable a variety of qualitative and quantitative research, and open new directions for visualizing and researching with multimodal information. The tool is publicly available at ht…

    Submitted 19 December, 2019; originally announced December 2019.

  21. arXiv:1908.04342

    cs.CV cs.HC cs.LG

    Why Does a Visual Question Have Different Answers?

    Authors: Nilavra Bhattacharya, Qing Li, Danna Gurari

    Abstract: Visual question answering is the task of returning the answer to a question about an image. A challenge is that different people often provide different answers to the same visual question. To our knowledge, this is the first work that aims to understand why. We propose a taxonomy of nine plausible reasons, and create two labelled datasets consisting of ~45,000 visual questions indicating which re…

    Submitted 14 August, 2019; v1 submitted 12 August, 2019; originally announced August 2019.

    Journal ref: The IEEE International Conference on Computer Vision (ICCV) 2019

  22. arXiv:1908.03675

    cs.CV

    Unconstrained Foreground Object Search

    Authors: Yinan Zhao, Brian Price, Scott Cohen, Danna Gurari

    Abstract: Many people search for foreground objects to use when editing images. While existing methods can retrieve candidates to aid in this, they are constrained to returning objects that belong to a pre-specified semantic class. We instead propose a novel problem of unconstrained foreground object (UFO) search and introduce a solution that supports efficient search by encoding the background image in the…

    Submitted 9 August, 2019; originally announced August 2019.

    Comments: To appear in ICCV 2019

  23. arXiv:1905.00060

    cs.CV

    Predicting How to Distribute Work Between Algorithms and Humans to Segment an Image Batch

    Authors: Danna Gurari, Yinan Zhao, Suyog Dutt Jain, Margrit Betke, Kristen Grauman

    Abstract: Foreground object segmentation is a critical step for many image analysis tasks. While automated methods can produce high-quality results, their failures disappoint users in need of practical solutions. We propose a resource allocation framework for predicting how best to allocate a fixed budget of human annotation effort in order to collect higher quality segmentations for a given batch of images…

    Submitted 30 April, 2019; originally announced May 2019.

  24. arXiv:1803.08435

    cs.CV

    Guided Image Inpainting: Replacing an Image Region by Pulling Content from Another Image

    Authors: Yinan Zhao, Brian Price, Scott Cohen, Danna Gurari

    Abstract: Deep generative models have shown success in automatically synthesizing missing image regions using surrounding context. However, users cannot directly decide what content to synthesize with such approaches. We propose an end-to-end network for image inpainting that uses a different image to guide the synthesis of new content to fill the hole. A key challenge addressed by our approach is synthesiz…

    Submitted 22 March, 2018; originally announced March 2018.

  25. arXiv:1802.08218

    cs.CV cs.CL cs.HC

    VizWiz Grand Challenge: Answering Visual Questions from Blind People

    Authors: Danna Gurari, Qing Li, Abigale J. Stangl, Anhong Guo, Chi Lin, Kristen Grauman, Jiebo Luo, Jeffrey P. Bigham

    Abstract: The study of algorithms to automatically answer visual questions currently is motivated by visual question answering (VQA) datasets constructed in artificial VQA settings. We propose VizWiz, the first goal-oriented VQA dataset arising from a natural VQA setting. VizWiz consists of over 31,000 visual questions originating from blind people who each took a picture using a mobile phone and recorded a…

    Submitted 9 May, 2018; v1 submitted 22 February, 2018; originally announced February 2018.

  26. arXiv:1705.00366

    cs.CV

    Predicting Foreground Object Ambiguity and Efficiently Crowdsourcing the Segmentation(s)

    Authors: Danna Gurari, Kun He, Bo Xiong, Jianming Zhang, Mehrnoosh Sameki, Suyog Dutt Jain, Stan Sclaroff, Margrit Betke, Kristen Grauman

    Abstract: We propose the ambiguity problem for the foreground object segmentation task and motivate the importance of estimating and accounting for this ambiguity when designing vision systems. Specifically, we distinguish between images which lead multiple annotators to segment different foreground objects (ambiguous) versus minor inter-annotator differences of the same object. Taking images from eight wid…

    Submitted 30 April, 2017; originally announced May 2017.

  27. arXiv:1608.08188

    cs.AI cs.CL cs.CV cs.HC

    Visual Question: Predicting If a Crowd Will Agree on the Answer

    Authors: Danna Gurari, Kristen Grauman

    Abstract: Visual question answering (VQA) systems are emerging from a desire to empower users to ask any natural language question about visual content and receive a valid answer in response. However, close examination of the VQA problem reveals an unavoidable, entangled problem that multiple humans may or may not always agree on a single answer to a visual question. We train a model to automatically predic…

    Submitted 29 August, 2016; originally announced August 2016.