Showing 1–50 of 52 results for author: Russakovsky, O

  1. arXiv:2406.04284

    cs.LG

    What is Dataset Distillation Learning?

    Authors: William Yang, Ye Zhu, Zhiwei Deng, Olga Russakovsky

    Abstract: Dataset distillation has emerged as a strategy to overcome the hurdles associated with large datasets by learning a compact set of synthetic data that retains essential information from the original dataset. While distilled data can be used to train high performing models, little is understood about how the information is stored. In this study, we posit and answer three questions about the behavio…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  2. arXiv:2403.19669

    cs.LG cs.AI cs.CL cs.CV

    Analyzing the Roles of Language and Vision in Learning from Limited Data

    Authors: Allison Chen, Ilia Sucholutsky, Olga Russakovsky, Thomas L. Griffiths

    Abstract: Does language help make sense of the visual world? How important is it to actually see the world rather than having it described with words? These basic questions about the nature of intelligence have been difficult to answer because we only had one example of an intelligent system -- humans -- and limited access to cases that isolated language or vision. However, the development of sophisticated…

    Submitted 10 May, 2024; v1 submitted 15 February, 2024; originally announced March 2024.

    Comments: 8 pages, 4 figures

  3. arXiv:2312.10539

    cs.CV

    DETER: Detecting Edited Regions for Deterring Generative Manipulations

    Authors: Sai Wang, Ye Zhu, Ruoyu Wang, Amaya Dharmasiri, Olga Russakovsky, Yu Wu

    Abstract: Generative AI capabilities have grown substantially in recent years, raising renewed concerns about potential malicious use of generated data, or "deep fakes". However, deep fake datasets have not kept up with generative AI advancements sufficiently to enable the development of deep fake detection technology which can meaningfully alert human users in real-world settings. Existing datasets typical…

    Submitted 16 December, 2023; originally announced December 2023.

    Comments: First two authors contribute equally to this work. Project page at https://deter2024.github.io/deter/

  4. arXiv:2311.02815

    cs.CV

    Efficient, Self-Supervised Human Pose Estimation with Inductive Prior Tuning

    Authors: Nobline Yoo, Olga Russakovsky

    Abstract: The goal of 2D human pose estimation (HPE) is to localize anatomical landmarks, given an image of a person in a pose. SOTA techniques make use of thousands of labeled figures (finetuning transformers or training deep CNNs), acquired using labor-intensive crowdsourcing. On the other hand, self-supervised methods re-frame the HPE task as a reconstruction problem, enabling them to leverage the vast a…

    Submitted 5 November, 2023; originally announced November 2023.

    Comments: ICCVW 2023 Publication

  5. arXiv:2310.09213

    cs.LG cs.CV

    Discovery and Expansion of New Domains within Diffusion Models

    Authors: Ye Zhu, Yu Wu, Duo Xu, Zhiwei Deng, Yan Yan, Olga Russakovsky

    Abstract: In this work, we study the generalization properties of diffusion models in a few-shot setup, introduce a novel tuning-free paradigm to synthesize the target out-of-domain (OOD) data, and demonstrate its advantages compared to existing methods in data-sparse scenarios with large domain gaps. Specifically, given a pre-trained model and a small set of images that are OOD relative to the model's trai…

    Submitted 26 May, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Code will be released at https://github.com/L-YeZhu/DiscoveryDiff

  6. arXiv:2310.01755

    cs.CV

    ImageNet-OOD: Deciphering Modern Out-of-Distribution Detection Algorithms

    Authors: William Yang, Byron Zhang, Olga Russakovsky

    Abstract: The task of out-of-distribution (OOD) detection is notoriously ill-defined. Earlier works focused on new-class detection, aiming to identify label-altering data distribution shifts, also known as "semantic shift." However, recent works argue for a focus on failure detection, expanding the OOD evaluation framework to account for label-preserving data distribution shifts, also known as "covariate sh…

    Submitted 18 March, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: ICLR 2024. Code and dataset at https://github.com/princetonvisualai/imagenetood

  7. arXiv:2308.07545

    cs.CV

    Vision-Language Dataset Distillation

    Authors: Xindi Wu, Byron Zhang, Zhiwei Deng, Olga Russakovsky

    Abstract: Dataset distillation methods reduce large-scale datasets to smaller sets of synthetic data, which preserve sufficient information for quickly training a new model from scratch. However, prior work on dataset distillation has focused exclusively on image classification datasets, whereas modern large-scale datasets are primarily in the vision-language space. In this work, we design the first vision-…

    Submitted 7 February, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: 29 pages, 13 figures

  8. arXiv:2306.04482

    cs.CV

    ICON$^2$: Reliably Benchmarking Predictive Inequity in Object Detection

    Authors: Sruthi Sudhakar, Viraj Prabhu, Olga Russakovsky, Judy Hoffman

    Abstract: As computer vision systems are being increasingly deployed at scale in high-stakes applications like autonomous driving, concerns about social bias in these systems are rising. Analysis of fairness in real-world vision systems, such as object detection in driving scenes, has been limited to observing predictive inequity across attributes such as pedestrian skin tone, and lacks a consistent methodo…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: Accepted to CVPR 2023 SSAD Workshop

  9. Art and the science of generative AI: A deeper dive

    Authors: Ziv Epstein, Aaron Hertzmann, Laura Herman, Robert Mahari, Morgan R. Frank, Matthew Groh, Hope Schroeder, Amy Smith, Memo Akten, Jessica Fjeld, Hany Farid, Neil Leach, Alex Pentland, Olga Russakovsky

    Abstract: A new class of tools, colloquially called generative AI, can produce high-quality artistic media for visual arts, concept art, music, fiction, literature, video, and animation. The generative capabilities of these tools are likely to fundamentally alter the creative processes by which creators formulate ideas and put them into production. As creativity is reimagined, so too may be many sectors of…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: This white paper is an expanded version of Epstein et al 2023 published in Science Perspectives on July 16, 2023 which you can find at the following DOI: 10.1126/science.adh4451

  10. Humans, AI, and Context: Understanding End-Users' Trust in a Real-World Computer Vision Application

    Authors: Sunnie S. Y. Kim, Elizabeth Anne Watkins, Olga Russakovsky, Ruth Fong, Andrés Monroy-Hernández

    Abstract: Trust is an important factor in people's interactions with AI systems. However, there is a lack of empirical studies examining how real end-users trust or distrust the AI system they interact with. Most research investigates one aspect of trust in lab settings with hypothetical end-users. In this paper, we provide a holistic and nuanced understanding of trust in AI through a qualitative case study…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: FAccT 2023

  11. arXiv:2303.15632

    cs.CV

    UFO: A unified method for controlling Understandability and Faithfulness Objectives in concept-based explanations for CNNs

    Authors: Vikram V. Ramaswamy, Sunnie S. Y. Kim, Ruth Fong, Olga Russakovsky

    Abstract: Concept-based explanations for convolutional neural networks (CNNs) aim to explain model behavior and outputs using a pre-defined set of semantic concepts (e.g., the model recognizes scene class "bedroom" based on the presence of concepts "bed" and "pillow"). However, they often do not faithfully (i.e., accurately) characterize the model's behavior and can be too complex for people to unders…

    Submitted 27 March, 2023; originally announced March 2023.

  12. arXiv:2303.06167

    cs.CV cs.CY cs.LG

    Overwriting Pretrained Bias with Finetuning Data

    Authors: Angelina Wang, Olga Russakovsky

    Abstract: Transfer learning is beneficial by allowing the expressive features of models pretrained on large-scale datasets to be finetuned for the target task of smaller, more domain-specific datasets. However, there is a concern that these pretrained models may come with their own biases which would propagate into the finetuned model. In this work, we investigate bias when conceptualized as both spurious c…

    Submitted 16 August, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

    Comments: ICCV 2023 Oral

  13. arXiv:2302.08357

    cs.CV

    Boundary Guided Learning-Free Semantic Control with Diffusion Models

    Authors: Ye Zhu, Yu Wu, Zhiwei Deng, Olga Russakovsky, Yan Yan

    Abstract: Applying pre-trained generative denoising diffusion models (DDMs) for downstream tasks such as image semantic editing usually requires either fine-tuning DDMs or learning auxiliary editing networks in the existing literature. In this work, we present our BoundaryDiffusion method for efficient, effective and light-weight semantic control with frozen pre-trained DDMs, without learning any extra netw…

    Submitted 18 October, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

    Comments: NeurIPS 2023. 27 pages including appendices, code at https://github.com/L-YeZhu/BoundaryDiffusion

  14. arXiv:2301.02560

    cs.CV

    GeoDE: a Geographically Diverse Evaluation Dataset for Object Recognition

    Authors: Vikram V. Ramaswamy, Sing Yu Lin, Dora Zhao, Aaron B. Adcock, Laurens van der Maaten, Deepti Ghadiyaram, Olga Russakovsky

    Abstract: Current dataset collection methods typically scrape large amounts of data from the web. While this technique is extremely scalable, data collected in this way tends to reinforce stereotypical biases, can contain personally identifiable information, and typically originates from Europe and North America. In this work, we rethink the dataset collection paradigm and introduce GeoDE, a geographically…

    Submitted 7 April, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

  15. arXiv:2210.03735

    cs.HC cs.AI cs.CV cs.CY

    "Help Me Help the AI": Understanding How Explainability Can Support Human-AI Interaction

    Authors: Sunnie S. Y. Kim, Elizabeth Anne Watkins, Olga Russakovsky, Ruth Fong, Andrés Monroy-Hernández

    Abstract: Despite the proliferation of explainable AI (XAI) methods, little is understood about end-users' explainability needs and behaviors around XAI explanations. To address this gap and contribute to understanding how explainability can support human-AI interaction, we conducted a mixed-methods study with 20 end-users of a real-world AI application, the Merlin bird identification app, and inquired abou…

    Submitted 16 February, 2023; v1 submitted 2 October, 2022; originally announced October 2022.

    Comments: CHI 2023

    Journal ref: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23), April 23-28, 2023, Hamburg, Germany. ACM, New York, NY, USA

  16. arXiv:2207.13325

    cs.CV

    SiRi: A Simple Selective Retraining Mechanism for Transformer-based Visual Grounding

    Authors: Mengxue Qu, Yu Wu, Wu Liu, Qiqi Gong, Xiaodan Liang, Olga Russakovsky, Yao Zhao, Yunchao Wei

    Abstract: In this paper, we investigate how to achieve better visual grounding with modern vision-language transformers, and propose a simple yet powerful Selective Retraining (SiRi) mechanism for this challenging task. Particularly, SiRi conveys a significant principle to the research of visual grounding, i.e., a better initialized vision-language encoder would help the model converge to a better local min…

    Submitted 27 July, 2022; originally announced July 2022.

    Comments: 21 pages (including Supplementary Materials); Accepted to ECCV 2022

  17. arXiv:2207.09847

    cs.CL cs.AI cs.CV

    Predicting Word Learning in Children from the Performance of Computer Vision Systems

    Authors: Sunayana Rane, Mira L. Nencheva, Zeyu Wang, Casey Lew-Williams, Olga Russakovsky, Thomas L. Griffiths

    Abstract: For human children as well as machine learning systems, a key challenge in learning a word is linking the word to the visual phenomena it describes. We explore this aspect of word learning by using the performance of computer vision systems as a proxy for the difficulty of learning a word from visual cues. We show that the age at which children acquire different categories of words is correlated w…

    Submitted 9 September, 2023; v1 submitted 7 July, 2022; originally announced July 2022.

    Comments: CogSci 2023

  18. arXiv:2207.09615

    cs.CV

    Overlooked factors in concept-based explanations: Dataset choice, concept learnability, and human capability

    Authors: Vikram V. Ramaswamy, Sunnie S. Y. Kim, Ruth Fong, Olga Russakovsky

    Abstract: Concept-based interpretability methods aim to explain deep neural network model predictions using a predefined set of semantic concepts. These methods evaluate a trained model on a new, "probe" dataset and correlate model predictions with the visual concepts labeled in that dataset. Despite their popularity, they suffer from limitations that are not well-understood and articulated by the literatur…

    Submitted 12 May, 2023; v1 submitted 19 July, 2022; originally announced July 2022.

    Comments: Published at CVPR 2023

  19. arXiv:2206.09191

    cs.CV

    Gender Artifacts in Visual Datasets

    Authors: Nicole Meister, Dora Zhao, Angelina Wang, Vikram V. Ramaswamy, Ruth Fong, Olga Russakovsky

    Abstract: Gender biases are known to exist within large-scale visual datasets and can be reflected or even amplified in downstream models. Many prior works have proposed methods for mitigating gender biases, often by attempting to remove gender expression information from images. To understand the feasibility and practicality of these approaches, we investigate what $\textit{gender artifacts}$ exist within…

    Submitted 17 September, 2023; v1 submitted 18 June, 2022; originally announced June 2022.

    Comments: ICCV 2023

  20. arXiv:2206.07690

    cs.CV cs.LG

    ELUDE: Generating interpretable explanations via a decomposition into labelled and unlabelled features

    Authors: Vikram V. Ramaswamy, Sunnie S. Y. Kim, Nicole Meister, Ruth Fong, Olga Russakovsky

    Abstract: Deep learning models have achieved remarkable success in different areas of machine learning over the past decade; however, the size and complexity of these models make them difficult to understand. In an effort to make them more interpretable, several recent works focus on explaining parts of a deep neural network through human-interpretable, semantic attributes. However, it may be impossible to…

    Submitted 16 June, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

  21. arXiv:2206.02916

    cs.LG cs.AI cs.CV

    Remember the Past: Distilling Datasets into Addressable Memories for Neural Networks

    Authors: Zhiwei Deng, Olga Russakovsky

    Abstract: We propose an algorithm that compresses the critical information of a large dataset into compact addressable memories. These memories can then be recalled to quickly re-train a neural network and recover the performance (instead of storing and re-training on the full original dataset). Building upon the dataset distillation framework, we make a key observation that a shared common representation a…

    Submitted 18 November, 2022; v1 submitted 6 June, 2022; originally announced June 2022.

  22. Towards Intersectionality in Machine Learning: Including More Identities, Handling Underrepresentation, and Performing Evaluation

    Authors: Angelina Wang, Vikram V. Ramaswamy, Olga Russakovsky

    Abstract: Research in machine learning fairness has historically considered a single binary demographic attribute; however, the reality is of course far more complicated. In this work, we grapple with questions that arise along three stages of the machine learning pipeline when incorporating intersectionality as multiple demographic attributes: (1) which demographic attributes to include as dataset labels,…

    Submitted 9 May, 2022; originally announced May 2022.

    Comments: ACM Conference on Fairness, Accountability, and Transparency (FAccT) 2022

  23. arXiv:2203.07613

    cs.CL cs.CV

    CARETS: A Consistency And Robustness Evaluative Test Suite for VQA

    Authors: Carlos E. Jimenez, Olga Russakovsky, Karthik Narasimhan

    Abstract: We introduce CARETS, a systematic test suite to measure consistency and robustness of modern VQA models through a series of six fine-grained capability tests. In contrast to existing VQA test sets, CARETS features balanced question generation to create pairs of instances to test models, with each pair focusing on a specific capability such as rephrasing, logical symmetry or image obfuscation. We e…

    Submitted 14 March, 2022; originally announced March 2022.

    Comments: ACL 2022

  24. arXiv:2201.03639

    cs.CV

    Multi-Query Video Retrieval

    Authors: Zeyu Wang, Yu Wu, Karthik Narasimhan, Olga Russakovsky

    Abstract: Retrieving target videos based on text descriptions is a task of great practical value and has received increasing attention over the past few years. Despite recent progress, imperfect annotations in existing video retrieval datasets have posed significant challenges on model evaluation and development. In this paper, we tackle this issue by focusing on the less-studied setting of multi-query vide…

    Submitted 20 July, 2022; v1 submitted 10 January, 2022; originally announced January 2022.

    Comments: ECCV 2022

  25. arXiv:2112.03184

    cs.CV

    HIVE: Evaluating the Human Interpretability of Visual Explanations

    Authors: Sunnie S. Y. Kim, Nicole Meister, Vikram V. Ramaswamy, Ruth Fong, Olga Russakovsky

    Abstract: As AI technology is increasingly applied to high-impact, high-risk domains, there have been a number of new methods aimed at making AI models more human interpretable. Despite the recent growth of interpretability work, there is a lack of systematic evaluation of proposed techniques. In this work, we introduce HIVE (Human Interpretability of Visual Explanations), a novel human evaluation framework…

    Submitted 21 July, 2022; v1 submitted 6 December, 2021; originally announced December 2021.

    Comments: ECCV 2022. Code and supplementary material are at https://princetonvisualai.github.io/HIVE

  26. arXiv:2106.08503

    cs.CV

    Understanding and Evaluating Racial Biases in Image Captioning

    Authors: Dora Zhao, Angelina Wang, Olga Russakovsky

    Abstract: Image captioning is an important task for benchmarking visual reasoning and for enabling accessibility for people with vision impairments. However, as in many machine learning settings, social biases can influence image captioning in undesirable ways. In this work, we study bias propagation pathways within image captioning, focusing specifically on the COCO dataset. Prior work has analyzed gender…

    Submitted 30 August, 2021; v1 submitted 15 June, 2021; originally announced June 2021.

    Comments: ICCV 2021

  27. [Re] Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias

    Authors: Sunnie S. Y. Kim, Sharon Zhang, Nicole Meister, Olga Russakovsky

    Abstract: Singh et al. (2020) point out the dangers of contextual bias in visual recognition datasets. They propose two methods, CAM-based and feature-split, that better recognize an object or attribute in the absence of its typical context while maintaining competitive within-context accuracy. To verify their performance, we attempted to reproduce all 12 tables in the original paper, including those in the…

    Submitted 28 April, 2021; originally announced April 2021.

    Comments: ML Reproducibility Challenge 2020. Accepted for publication in the ReScience C journal

  28. arXiv:2103.06191

    cs.CV

    A Study of Face Obfuscation in ImageNet

    Authors: Kaiyu Yang, Jacqueline Yau, Li Fei-Fei, Jia Deng, Olga Russakovsky

    Abstract: Face obfuscation (blurring, mosaicing, etc.) has been shown to be effective for privacy protection; nevertheless, object recognition research typically assumes access to complete, unobfuscated images. In this paper, we explore the effects of face obfuscation on the popular ImageNet challenge visual recognition benchmark. Most categories in the ImageNet challenge are not people categories; however,…

    Submitted 9 June, 2022; v1 submitted 10 March, 2021; originally announced March 2021.

    Comments: Accepted to ICML 2022

  29. arXiv:2102.12594

    cs.LG cs.AI

    Directional Bias Amplification

    Authors: Angelina Wang, Olga Russakovsky

    Abstract: Mitigating bias in machine learning systems requires refining our understanding of bias propagation pathways: from societal structures to large-scale data to trained models to impact on society. In this work, we focus on one aspect of the problem, namely bias amplification: the tendency of models to amplify the biases present in the data they are trained on. A metric for measuring bias amplificati…

    Submitted 7 June, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

    Comments: ICML 2021

  30. arXiv:2012.01469

    cs.CV

    Fair Attribute Classification through Latent Space De-biasing

    Authors: Vikram V. Ramaswamy, Sunnie S. Y. Kim, Olga Russakovsky

    Abstract: Fairness in visual recognition is becoming a prominent and critical topic of discussion as recognition systems are deployed at scale in the real world. Models trained from data in which target labels are correlated with protected attributes (e.g., gender, race) are known to learn and exploit those correlations. In this work, we introduce a method for training accurate target classifiers while miti…

    Submitted 2 April, 2021; v1 submitted 2 December, 2020; originally announced December 2020.

    Comments: Accepted to CVPR 2021, code can be found at https://github.com/princetonvisualai/gan-debiasing

  31. arXiv:2011.13681

    cs.CV

    Point and Ask: Incorporating Pointing into Visual Question Answering

    Authors: Arjun Mani, Nobline Yoo, Will Hinthorn, Olga Russakovsky

    Abstract: Visual Question Answering (VQA) has become one of the key benchmarks of visual recognition progress. Multiple VQA extensions have been explored to better simulate real-world settings: different question formulations, changing training and test distributions, conversational consistency in dialogues, and explanation-based answering. In this work, we further expand this space by considering visual qu…

    Submitted 18 February, 2022; v1 submitted 27 November, 2020; originally announced November 2020.

  32. arXiv:2009.03949

    cs.CV

    Towards Unique and Informative Captioning of Images

    Authors: Zeyu Wang, Berthy Feng, Karthik Narasimhan, Olga Russakovsky

    Abstract: Despite considerable progress, state of the art image captioning models produce generic captions, leaving out important image details. Furthermore, these systems may even misrepresent the image in order to produce a simpler caption consisting of common concepts. In this paper, we first analyze both modern captioning systems and evaluation metrics through empirical experiments to quantify these phe…

    Submitted 8 September, 2020; originally announced September 2020.

    Comments: ECCV 2020

  33. arXiv:2007.05655

    cs.CV cs.AI cs.RO

    Evolving Graphical Planner: Contextual Global Planning for Vision-and-Language Navigation

    Authors: Zhiwei Deng, Karthik Narasimhan, Olga Russakovsky

    Abstract: The ability to perform effective planning is crucial for building an instruction-following agent. When navigating through a new environment, an agent is challenged with (1) connecting the natural language instructions with its progressively growing knowledge of the world; and (2) performing long-range planning and decision making in the form of effective exploration and error correction. Current m…

    Submitted 10 July, 2020; originally announced July 2020.

  34. arXiv:2004.07999

    cs.CV

    REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets

    Authors: Angelina Wang, Alexander Liu, Ryan Zhang, Anat Kleiman, Leslie Kim, Dora Zhao, Iroha Shirai, Arvind Narayanan, Olga Russakovsky

    Abstract: Machine learning models are known to perpetuate and even amplify the biases present in the data. However, these data biases frequently do not become apparent until after the models are deployed. Our work tackles this issue and enables the preemptive analysis of large-scale datasets. REVISE (REvealing VIsual biaSEs) is a tool that assists in the investigation of a visual dataset, surfacing potentia…

    Submitted 23 July, 2021; v1 submitted 16 April, 2020; originally announced April 2020.

    Comments: Extended version of ECCV 2020 Spotlight paper

  35. arXiv:2003.14269

    cs.CV

    Take the Scenic Route: Improving Generalization in Vision-and-Language Navigation

    Authors: Felix Yu, Zhiwei Deng, Karthik Narasimhan, Olga Russakovsky

    Abstract: In the Vision-and-Language Navigation (VLN) task, an agent with egocentric vision navigates to a destination given natural language instructions. The act of manually annotating these instructions is timely and expensive, such that many existing approaches automatically generate additional samples to improve agent performance. However, these approaches still have difficulty generalizing their perfo…

    Submitted 31 March, 2020; originally announced March 2020.

    Comments: 4 page short paper

  36. Towards Fairer Datasets: Filtering and Balancing the Distribution of the People Subtree in the ImageNet Hierarchy

    Authors: Kaiyu Yang, Klint Qinami, Li Fei-Fei, Jia Deng, Olga Russakovsky

    Abstract: Computer vision technology is being used by many but remains representative of only a few. People have reported misbehavior of computer vision models, including offensive prediction results and lower performance for underrepresented groups. Current computer vision models are typically developed using datasets consisting of manually annotated images or videos; the data and label distributions in th…

    Submitted 16 December, 2019; originally announced December 2019.

    Comments: Accepted to FAT* 2020

  37. arXiv:1912.02256

    cs.CV

    Compositional Temporal Visual Grounding of Natural Language Event Descriptions

    Authors: Jonathan C. Stroud, Ryan McCaffrey, Rada Mihalcea, Jia Deng, Olga Russakovsky

    Abstract: Temporal grounding entails establishing a correspondence between natural language event descriptions and their visual depictions. Compositional modeling becomes central: we first ground atomic descriptions "girl eating an apple," "batter hitting the ball" to short video segments, and then establish the temporal relationships between the segments. This compositional structure enables models to reco…

    Submitted 4 December, 2019; originally announced December 2019.

    Comments: Project page: jonathancstroud.com/ctg

  38. arXiv:1911.11834

    cs.CV

    Towards Fairness in Visual Recognition: Effective Strategies for Bias Mitigation

    Authors: Zeyu Wang, Klint Qinami, Ioannis Christos Karakozis, Kyle Genova, Prem Nair, Kenji Hata, Olga Russakovsky

    Abstract: Computer vision models learn to perform a task by capturing relevant statistics from training data. It has been shown that models learn spurious age, gender, and race correlations when trained for seemingly unrelated tasks like activity recognition or image captioning. Various mitigation techniques have been presented to prevent models from utilizing or learning such biases. However, there has bee…

    Submitted 2 April, 2020; v1 submitted 26 November, 2019; originally announced November 2019.

    Comments: To appear in CVPR 2020

  39. arXiv:1908.07086

    cs.CV

    Human uncertainty makes classification more robust

    Authors: Joshua C. Peterson, Ruairidh M. Battleday, Thomas L. Griffiths, Olga Russakovsky

    Abstract: The classification performance of deep neural networks has begun to asymptote at near-perfect levels. However, their ability to generalize outside the training set and their robustness to adversarial attacks have not. In this paper, we make progress on this problem by training with full label distributions that reflect human perceptual uncertainty. We first present a new benchmark dataset which we…

    Submitted 19 August, 2019; originally announced August 2019.

    Comments: In Proceedings of the 2019 IEEE International Conference on Computer Vision (ICCV)

  40. arXiv:1908.02660

    cs.CV

    SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition

    Authors: Kaiyu Yang, Olga Russakovsky, Jia Deng

    Abstract: Understanding the spatial relations between objects in images is a surprisingly challenging task. A chair may be "behind" a person even if it appears to the left of the person in the image (depending on which way the person is facing). Two students that appear close to each other in the image may not in fact be "next to" each other if there is a third student between them. We introduce SpatialSe…

    Submitted 29 August, 2019; v1 submitted 7 August, 2019; originally announced August 2019.

    Comments: Accepted to ICCV 2019

  41. arXiv:1904.08900

    cs.CV

    CornerNet-Lite: Efficient Keypoint Based Object Detection

    Authors: Hei Law, Yun Teng, Olga Russakovsky, Jia Deng

    Abstract: Keypoint-based methods are a relatively new paradigm in object detection, eliminating the need for anchor boxes and offering a simplified detection framework. Keypoint-based CornerNet achieves state of the art accuracy among single-stage detectors. However, this accuracy comes at high processing cost. In this work, we tackle the problem of efficient keypoint-based object detection and introduce Co…

    Submitted 16 September, 2020; v1 submitted 18 April, 2019; originally announced April 2019.

    Comments: Accepted to BMVC 2020

  42. arXiv:1708.02696

    cs.CV

    What Actions are Needed for Understanding Human Actions in Videos?

    Authors: Gunnar A. Sigurdsson, Olga Russakovsky, Abhinav Gupta

    Abstract: What is the right way to reason about human activities? What directions forward are most promising? In this work, we analyze the current state of human activity understanding in videos. The goal of this paper is to examine datasets, evaluation metrics, algorithms, and potential future directions. We look at the qualitative attributes that define activities such as pose variability, brevity, and de…

    Submitted 8 August, 2017; originally announced August 2017.

    Comments: ICCV 2017

  43. arXiv:1706.02884  [pdf, other]

    cs.CV

    Learning to Learn from Noisy Web Videos

    Authors: Serena Yeung, Vignesh Ramanathan, Olga Russakovsky, Liyue Shen, Greg Mori, Li Fei-Fei

    Abstract: Understanding the simultaneously very diverse and intricately fine-grained set of possible human actions is a critical open problem in computer vision. Manually labeling training videos is feasible for some action classes but doesn't scale to the full long-tailed distribution of actions. A promising way to address this is to leverage noisy data from web queries to learn new actions, using semi-sup…

    Submitted 9 June, 2017; originally announced June 2017.

    Comments: To appear in CVPR 2017

  44. arXiv:1704.03895  [pdf, other]

    cs.CV

    What's in a Question: Using Visual Questions as a Form of Supervision

    Authors: Siddha Ganju, Olga Russakovsky, Abhinav Gupta

    Abstract: Collecting fully annotated image datasets is challenging and expensive. Many types of weak supervision have been explored: weak manual annotations, web search results, temporal continuity, ambient sound and others. We focus on one particular unexplored mode: visual questions that are asked about images. The key observation that inspires our work is that the question itself provides useful informat…

    Submitted 12 April, 2017; originally announced April 2017.

    Comments: CVPR 2017 Spotlight paper and supplementary

  45. Predictive-Corrective Networks for Action Detection

    Authors: Achal Dave, Olga Russakovsky, Deva Ramanan

    Abstract: While deep feature learning has revolutionized techniques for static-image understanding, the same does not quite hold for video processing. Architectures and optimization techniques used for video are largely based off those for static images, potentially underutilizing rich video information. In this work, we rethink both the underlying network architecture and the stochastic learning paradigm f…

    Submitted 12 December, 2017; v1 submitted 12 April, 2017; originally announced April 2017.

    Comments: Accepted to CVPR 2017. [v2]: Updated Multi-LSTM mAP on MultiTHUMOS (should be 29.7, was initially reported as 29.6). [Project URL]: http://www.achaldave.com/projects/predictive-corrective/

  46. Crowdsourcing in Computer Vision

    Authors: Adriana Kovashka, Olga Russakovsky, Li Fei-Fei, Kristen Grauman

    Abstract: Computer vision systems require large amounts of manually annotated data to properly learn challenging visual concepts. Crowdsourcing platforms offer an inexpensive method to capture human knowledge and understanding, for a vast number of visual perception tasks. In this survey, we describe the types of annotations computer vision researchers have collected using crowdsourcing, and how they have e…

    Submitted 7 November, 2016; originally announced November 2016.

    Comments: A 69-page meta review of the field, Foundations and Trends in Computer Graphics and Vision, 2016

  47. arXiv:1607.07429  [pdf, other]

    cs.HC cs.CV

    Much Ado About Time: Exhaustive Annotation of Temporal Data

    Authors: Gunnar A. Sigurdsson, Olga Russakovsky, Ali Farhadi, Ivan Laptev, Abhinav Gupta

    Abstract: Large-scale annotated datasets allow AI systems to learn from and build upon the knowledge of the crowd. Many crowdsourcing techniques have been developed for collecting image annotations. These techniques often implicitly rely on the fact that a new input image takes a negligible amount of time to perceive. In contrast, we investigate and determine the most cost-effective way of obtaining high-qu…

    Submitted 2 October, 2016; v1 submitted 25 July, 2016; originally announced July 2016.

    Comments: HCOMP 2016 Camera Ready

  48. arXiv:1511.06984  [pdf, other]

    cs.CV cs.LG

    End-to-end Learning of Action Detection from Frame Glimpses in Videos

    Authors: Serena Yeung, Olga Russakovsky, Greg Mori, Li Fei-Fei

    Abstract: In this work we introduce a fully end-to-end approach for action detection in videos that learns to directly predict the temporal bounds of actions. Our intuition is that the process of detecting actions is naturally one of observation and refinement: observing moments in video, and refining hypotheses about when an action is occurring. Based on this insight, we formulate our model as a recurrent…

    Submitted 13 March, 2017; v1 submitted 22 November, 2015; originally announced November 2015.

    Comments: Update to version in CVPR 2016 proceedings

  49. arXiv:1507.05738  [pdf, other]

    cs.CV

    Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos

    Authors: Serena Yeung, Olga Russakovsky, Ning Jin, Mykhaylo Andriluka, Greg Mori, Li Fei-Fei

    Abstract: Every moment counts in action recognition. A comprehensive understanding of human activity in video requires labeling every frame according to the actions occurring, placing multiple labels densely over a video sequence. To study this problem we extend the existing THUMOS dataset and introduce MultiTHUMOS, a new dataset of dense labels over unconstrained internet videos. Modeling multiple, dense l…

    Submitted 9 June, 2017; v1 submitted 21 July, 2015; originally announced July 2015.

    Comments: To appear in IJCV

  50. arXiv:1506.02106  [pdf, other]

    cs.CV

    What's the Point: Semantic Segmentation with Point Supervision

    Authors: Amy Bearman, Olga Russakovsky, Vittorio Ferrari, Li Fei-Fei

    Abstract: The semantic image segmentation task presents a trade-off between test time accuracy and training-time annotation cost. Detailed per-pixel annotations enable training accurate models but are very time-consuming to obtain, image-level class labels are an order of magnitude cheaper but result in less accurate models. We take a natural step from image-level annotation towards stronger supervision: we…

    Submitted 23 July, 2016; v1 submitted 5 June, 2015; originally announced June 2015.

    Comments: ECCV (2016) submission