We're Not Using Videos Effectively: An Updated Domain Adaptive Video Segmentation Baseline

Authors: Simar Kareer, Vivek Vijaykumar, Harsh Maheshwari, Prithvijit Chattopadhyay, Judy Hoffman, Viraj Prabhu

Abstract: There has been abundant work in unsupervised domain adaptation for semantic segmentation (DAS) seeking to adapt a model trained on images from a labeled source domain to an unlabeled target domain. While the vast majority of prior work has studied this as a frame-level Image-DAS problem, a few Video-DAS works have sought to additionally leverage the temporal signal present in adjacent frames. However, Video-DAS works have historically studied a distinct set of benchmarks from Image-DAS, with minimal cross-benchmarking. In this work, we address this gap. Surprisingly, we find that (1) even after carefully controlling for data and model architecture, state-of-the-art Image-DAS methods (HRDA and HRDA+MIC) outperform Video-DAS methods on established Video-DAS benchmarks (+14.5 mIoU on Viper$\rightarrow$CityscapesSeq, +19.0 mIoU on Synthia$\rightarrow$CityscapesSeq), and (2) naive combinations of Image-DAS and Video-DAS techniques only lead to marginal improvements across datasets. To avoid siloed progress between Image-DAS and Video-DAS, we open-source our codebase with support for a comprehensive set of Video-DAS and Image-DAS methods on a common benchmark. Code available at https://github.com/SimarKareer/UnifiedVideoDA

Submitted 27 February, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

Comments: TMLR 2024

arXiv:2212.00979

PASTA: Proportional Amplitude Spectrum Training Augmentation for Syn-to-Real Domain Generalization

Authors: Prithvijit Chattopadhyay, Kartik Sarangmath, Vivek Vijaykumar, Judy Hoffman

Abstract: Synthetic data offers the promise of cheap and bountiful training data for settings where labeled real-world data is scarce. However, models trained on synthetic data significantly underperform when evaluated on real-world data. In this paper, we propose Proportional Amplitude Spectrum Training Augmentation (PASTA), a simple and effective augmentation strategy to improve out-of-the-box synthetic-to-real (syn-to-real) generalization performance. PASTA perturbs the amplitude spectra of synthetic images in the Fourier domain to generate augmented views. Specifically, with PASTA we propose a structured perturbation strategy where high-frequency components are perturbed relatively more than the low-frequency ones. For the tasks of semantic segmentation (GTAV-to-Real), object detection (Sim10K-to-Real), and object recognition (VisDA-C Syn-to-Real), across a total of 5 syn-to-real shifts, we find that PASTA outperforms more complex state-of-the-art generalization methods while being complementary to the same.

Submitted 22 September, 2023; v1 submitted 2 December, 2022; originally announced December 2022.

Comments: Accepted at ICCV 2023, Code: https://github.com/prithv1/PASTA
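
The idea described in the PASTA abstract (perturb the Fourier amplitude spectrum, with high frequencies perturbed more than low ones) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that). The function names, the parameters `alpha`, `k`, and `beta`, the particular noise schedule, and the clipping step are all assumptions made here for illustration.

```python
import numpy as np


def frequency_scaled_sigma(height, width, alpha=3.0, k=2.0, beta=0.25):
    """Noise standard deviation per frequency bin (hypothetical schedule).

    Returns an array of shape (height, width // 2 + 1) matching the layout
    of np.fft.rfft2. The value is `beta` at the DC bin and grows with the
    normalized radial frequency, reaching `beta + alpha` at the corner.
    """
    fy = np.fft.fftfreq(height)[:, None]   # vertical frequencies in [-0.5, 0.5)
    fx = np.fft.rfftfreq(width)[None, :]   # horizontal frequencies in [0, 0.5]
    radius = np.sqrt(fx ** 2 + fy ** 2) / np.sqrt(0.5)  # normalized to [0, 1]
    return beta + alpha * radius ** k


def perturb_amplitude(image, alpha=3.0, k=2.0, beta=0.25, rng=None):
    """Return an augmented view of `image` with a perturbed amplitude spectrum.

    `image` is a float array of shape (H, W) or (H, W, C). The phase is kept
    unchanged; only the amplitude is scaled by multiplicative Gaussian noise
    whose spread increases with frequency.
    """
    rng = np.random.default_rng() if rng is None else rng
    img = np.asarray(image, dtype=np.float64)
    height, width = img.shape[:2]

    sigma = frequency_scaled_sigma(height, width, alpha, k, beta)
    if img.ndim == 3:
        sigma = sigma[..., None]  # share the schedule across channels

    # Real-input FFT over the spatial axes; the inverse is guaranteed real.
    spectrum = np.fft.rfft2(img, axes=(0, 1))
    amplitude, phase = np.abs(spectrum), np.angle(spectrum)

    # Multiplicative noise centered at 1; clipped so amplitudes stay
    # non-negative (the clipping is an assumption of this sketch).
    scale = rng.normal(loc=1.0, scale=sigma, size=amplitude.shape)
    scale = np.clip(scale, 0.0, None)

    perturbed = amplitude * scale * np.exp(1j * phase)
    return np.fft.irfft2(perturbed, s=(height, width), axes=(0, 1))
```

In this sketch, setting `alpha=0` and `beta=0` makes the noise identically 1, so the function returns the input image up to floating-point error, which is a convenient sanity check before using it as an augmentation.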

arXiv:2103.12718

Self-Supervised Pretraining Improves Self-Supervised Pretraining

Authors: Colorado J. Reed, Xiangyu Yue, Ani Nrusimha, Sayna Ebrahimi, Vivek Vijaykumar, Richard Mao, Bo Li, Shanghang Zhang, Devin Guillory, Sean Metzger, Kurt Keutzer, Trevor Darrell

Abstract: While self-supervised pretraining has proven beneficial for many computer vision tasks, it requires expensive and lengthy computation, large amounts of data, and is sensitive to data augmentation. Prior work demonstrates that models pretrained on datasets dissimilar to their target data, such as chest X-ray models trained on ImageNet, underperform models trained from scratch. Users that lack the resources to pretrain must use existing models with lower performance. This paper explores Hierarchical PreTraining (HPT), which decreases convergence time and improves accuracy by initializing the pretraining process with an existing pretrained model. Through experimentation on 16 diverse vision datasets, we show HPT converges up to 80x faster, improves accuracy across tasks, and improves the robustness of the self-supervised pretraining process to changes in the image augmentation policy or amount of pretraining data. Taken together, HPT provides a simple framework for obtaining better pretrained representations with less computational resources.

Submitted 24 March, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

arXiv:2009.06494

Play Music An HCI Oriented Evaluation of Googles Default Music Player Interface

Authors: Venkatesh Vijaykumar

Abstract: The work embodied in this paper attempts to suggest a few improvements to the playlist creation task interface of the Google Play Music Android application based on recommended practices encountered in the Human-Computer Interaction discipline. The improvements are largely centered on intuitive navigation and selection actions, in order to facilitate a smoother experience in creating, ordering, and adding to music playlists. The work records the efforts in need-finding, design brainstorming, and prototype design and evaluation. The work was undertaken over a single design life cycle, and is an attempt at applying recommended practices in HCI to a widely used real world application.

Submitted 14 September, 2020; originally announced September 2020.

Showing 1–4 of 4 results for author: Vijaykumar, V