Showing 1–50 of 154 results for author: Kweon, I S

Searching in archive cs.
  1. arXiv:2406.18898  [pdf, other]

    cs.CV cs.AI

    360 in the Wild: Dataset for Depth Prediction and View Synthesis

    Authors: Kibaek Park, Francois Rameau, Jaesik Park, In So Kweon

    Abstract: The large abundance of perspective camera datasets facilitated the emergence of novel learning-based strategies for various tasks, such as camera localization, single image depth estimation, or view synthesis. However, panoramic or omnidirectional image datasets, including essential information, such as pose and depth, are mostly made with synthetic scenes. In this work, we introduce a large scale…

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.09388  [pdf, other]

    cs.CV cs.AI cs.LG

    Exploring the Spectrum of Visio-Linguistic Compositionality and Recognition

    Authors: Youngtaek Oh, Pyunghwan Ahn, Jinhyung Kim, Gwangmo Song, Soonyoung Lee, In So Kweon, Junmo Kim

    Abstract: Vision and language models (VLMs) such as CLIP have showcased remarkable zero-shot recognition abilities yet face challenges in visio-linguistic compositionality, particularly in linguistic comprehension and fine-grained image-text alignment. This paper explores the intricate relationship between compositionality and recognition -- two pivotal aspects of VLM capability. We conduct a comprehensive…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPRW 2024 on 'What is Next in Multimodal Foundation Models?'. Code: https://github.com/ytaek-oh/vl_compo

  3. arXiv:2406.02541  [pdf, other]

    cs.CV

    Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting

    Authors: Inkyu Shin, Qihang Yu, Xiaohui Shen, In So Kweon, Kuk-Jin Yoon, Liang-Chieh Chen

    Abstract: Recent advancements in zero-shot video diffusion models have shown promise for text-driven video editing, but challenges remain in achieving high temporal consistency. To address this, we introduce Video-3DGS, a 3D Gaussian Splatting (3DGS)-based video refiner designed to enhance temporal consistency in zero-shot video editors. Our approach utilizes a two-stage 3D Gaussian optimizing process tailo…

    Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Project page at https://video-3dgs-project.github.io/

  4. arXiv:2403.20225  [pdf, other]

    cs.CV

    MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark

    Authors: Sanghyun Woo, Kwanyong Park, Inkyu Shin, Myungchul Kim, In So Kweon

    Abstract: Multi-target multi-camera tracking is a crucial task that involves identifying and tracking individuals over time using video streams from multiple cameras. This task has practical applications in various fields, such as visual surveillance, crowd behavior analysis, and anomaly detection. However, due to the difficulty and cost of collecting and labeling data, existing datasets for this task are e…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted on CVPR 2024

  5. arXiv:2403.19985  [pdf, other]

    cs.CV

    Stable Surface Regularization for Fast Few-Shot NeRF

    Authors: Byeongin Joung, Byeong-Uk Lee, Jaesung Choe, Ukcheol Shin, Minjun Kang, Taeyeop Lee, In So Kweon, Kuk-Jin Yoon

    Abstract: This paper proposes an algorithm for synthesizing novel views under few-shot setup. The main concept is to develop a stable surface regularization technique called Annealing Signed Distance Function (ASDF), which anneals the surface in a coarse-to-fine manner to accelerate convergence speed. We observe that the Eikonal loss - which is a widely known geometric regularization - requires dense traini…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: 3DV 2024

  6. arXiv:2403.19150  [pdf, other]

    cs.LG cs.AI cs.CR cs.CV

    Towards Understanding Dual BN In Hybrid Adversarial Training

    Authors: Chenshuang Zhang, Chaoning Zhang, Kang Zhang, Axi Niu, Junmo Kim, In So Kweon

    Abstract: There is a growing concern about applying batch normalization (BN) in adversarial training (AT), especially when the model is trained on both adversarial samples and clean samples (termed Hybrid-AT). With the assumption that adversarial and clean samples are from two different domains, a common practice in prior works is to adopt Dual BN, where BN$_{adv}$ and BN$_{clean}$ are used for adversarial and clean branches…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted at TMLR

  7. arXiv:2403.18775  [pdf, other]

    cs.CV cs.AI cs.LG

    ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object

    Authors: Chenshuang Zhang, Fei Pan, Junmo Kim, In So Kweon, Chengzhi Mao

    Abstract: We establish rigorous benchmarks for visual perception robustness. Synthetic images such as ImageNet-C, ImageNet-9, and Stylized ImageNet provide specific type of evaluation over synthetic corruptions, backgrounds, and textures, yet those robustness benchmarks are restricted in specified variations and have low synthetic quality. In this work, we introduce generative model as a data source for syn…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted at CVPR 2024

  8. arXiv:2311.18508  [pdf, other]

    eess.IV cs.CV

    DifAugGAN: A Practical Diffusion-style Data Augmentation for GAN-based Single Image Super-resolution

    Authors: Axi Niu, Kang Zhang, Joshua Tian Jin Tee, Trung X. Pham, Jinqiu Sun, Chang D. Yoo, In So Kweon, Yanning Zhang

    Abstract: It is well known the adversarial optimization of GAN-based image super-resolution (SR) methods makes the preceding SR model generate unpleasant and undesirable artifacts, leading to large distortion. We attribute the cause of such distortions to the poor calibration of the discriminator, which hampers its ability to provide meaningful feedback to the generator for learning high-quality images. To…

    Submitted 30 November, 2023; originally announced November 2023.

  9. arXiv:2311.04430  [pdf, other]

    eess.IV cs.CV

    Blurry Video Compression: A Trade-off between Visual Enhancement and Data Compression

    Authors: Dawit Mureja Argaw, Junsik Kim, In So Kweon

    Abstract: Existing video compression (VC) methods primarily aim to reduce the spatial and temporal redundancies between consecutive frames in a video while preserving its quality. In this regard, previous works have achieved remarkable results on videos acquired under specific settings such as instant (known) exposure time and shutter speed which often result in sharp videos. However, when these methods are…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: Accepted to WACV 2024

  10. arXiv:2309.11711  [pdf, other]

    cs.CV

    MoDA: Leveraging Motion Priors from Videos for Advancing Unsupervised Domain Adaptation in Semantic Segmentation

    Authors: Fei Pan, Xu Yin, Seokju Lee, Axi Niu, Sungeui Yoon, In So Kweon

    Abstract: Unsupervised domain adaptation (UDA) has been a potent technique to handle the lack of annotations in the target domain, particularly in semantic segmentation task. This study introduces a different UDA scenario where the target domain contains unlabeled video frames. Drawing upon recent advancements of self-supervised learning of the object motion from unlabeled videos with geometric constraint,…

    Submitted 15 April, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: CVPR 2024 Workshop on Learning with Limited Labelled Data for Image and Video Understanding. Best Paper Award

  11. arXiv:2309.01961  [pdf, other]

    cs.CV

    NICE: CVPR 2023 Challenge on Zero-shot Image Captioning

    Authors: Taehoon Kim, Pyunghwan Ahn, Sangyun Kim, Sihaeng Lee, Mark Marsden, Alessandra Sala, Seung Hwan Kim, Bohyung Han, Kyoung Mu Lee, Honglak Lee, Kyounghoon Bae, Xiangyu Wu, Yi Gao, Hailiang Zhang, Yang Yang, Weili Guo, Jianfeng Lu, Youngtaek Oh, Jae Won Cho, Dong-Jin Kim, In So Kweon, Junmo Kim, Wooyoung Kang, Won Young Jhoo, Byungseok Roh, et al. (17 additional authors not shown)

    Abstract: In this report, we introduce NICE (New frontiers for zero-shot Image Captioning Evaluation) project and share the results and outcomes of 2023 challenge. This project is designed to challenge the computer vision community to develop robust image captioning models that advance the state-of-the-art both in terms of accuracy and fairness. Through the challenge, the image captioning models were tested…

    Submitted 10 September, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: Tech report, project page https://nice.lgresearch.ai/

  12. arXiv:2308.09775  [pdf, other]

    cs.CV

    Long-range Multimodal Pretraining for Movie Understanding

    Authors: Dawit Mureja Argaw, Joon-Young Lee, Markus Woodson, In So Kweon, Fabian Caba Heilbron

    Abstract: Learning computer vision models from (and for) movies has a long-standing history. While great progress has been attained, there is still a need for a pretrained multimodal model that can perform well in the ever-growing set of movie understanding tasks the community has been establishing. In this work, we introduce Long-range Multimodal Pretraining, a strategy, and a model that leverages movie da…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV 2023

  13. arXiv:2307.00781  [pdf, other]

    cs.CV eess.IV

    ACDMSR: Accelerated Conditional Diffusion Models for Single Image Super-Resolution

    Authors: Axi Niu, Pham Xuan Trung, Kang Zhang, Jinqiu Sun, Yu Zhu, In So Kweon, Yanning Zhang

    Abstract: Diffusion models have gained significant popularity in the field of image-to-image translation. Previous efforts applying diffusion models to image super-resolution (SR) have demonstrated that iteratively refining pure Gaussian noise using a U-Net architecture trained on denoising at various noise levels can yield satisfactory high-resolution images from low-resolution inputs. However, this iterat…

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: arXiv admin note: text overlap with arXiv:2302.12831

  14. arXiv:2305.18547  [pdf, other]

    cs.CV

    Learning from Multi-Perception Features for Real-Word Image Super-resolution

    Authors: Axi Niu, Kang Zhang, Trung X. Pham, Pei Wang, Jinqiu Sun, In So Kweon, Yanning Zhang

    Abstract: Currently, there are two popular approaches for addressing real-world image super-resolution problems: degradation-estimation-based and blind-based methods. However, degradation-estimation-based methods may be inaccurate in estimating the degradation, making them less applicable to real-world LR images. On the other hand, blind-based methods are often limited by their fixed single perception infor…

    Submitted 26 May, 2023; originally announced May 2023.

  15. arXiv:2305.00866  [pdf, other]

    cs.CV cs.AI

    Attack-SAM: Towards Attacking Segment Anything Model With Adversarial Examples

    Authors: Chenshuang Zhang, Chaoning Zhang, Taegoo Kang, Donghun Kim, Sung-Ho Bae, In So Kweon

    Abstract: Segment Anything Model (SAM) has attracted significant attention recently, due to its impressive performance on various downstream tasks in a zero-shot manner. Computer vision (CV) area might follow the natural language processing (NLP) area to embark on a path from task-specific vision models toward foundation models. However, deep vision models are widely recognized as vulnerable to adversarial…

    Submitted 8 May, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

    Comments: The first work to attack Segment Anything Model with adversarial examples

  16. arXiv:2304.06488  [pdf, other]

    cs.CY cs.AI cs.CL cs.CV cs.LG

    One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era

    Authors: Chaoning Zhang, Chenshuang Zhang, Chenghao Li, Yu Qiao, Sheng Zheng, Sumit Kumar Dam, Mengchun Zhang, Jung Uk Kim, Seong Tae Kim, Jinwoo Choi, Gyeong-Moon Park, Sung-Ho Bae, Lik-Hang Lee, Pan Hui, In So Kweon, Choong Seon Hong

    Abstract: OpenAI has recently released GPT-4 (a.k.a. ChatGPT plus), which is demonstrated to be one small step for generative AI (GAI), but one giant leap for artificial general intelligence (AGI). Since its official release in November 2022, ChatGPT has quickly attracted numerous users with extensive media coverage. Such unprecedented attention has also motivated numerous researchers to investigate ChatGPT…

    Submitted 4 April, 2023; originally announced April 2023.

    Comments: A Survey on ChatGPT and GPT-4, 29 pages. Feedback is appreciated ([email protected])

  17. arXiv:2304.04694  [pdf, other]

    cs.CV

    Video-kMaX: A Simple Unified Approach for Online and Near-Online Video Panoptic Segmentation

    Authors: Inkyu Shin, Dahun Kim, Qihang Yu, Jun Xie, Hong-Seok Kim, Bradley Green, In So Kweon, Kuk-Jin Yoon, Liang-Chieh Chen

    Abstract: Video Panoptic Segmentation (VPS) aims to achieve comprehensive pixel-level scene understanding by segmenting all pixels and associating objects in a video. Current solutions can be categorized into online and near-online approaches. Evolving over the time, each category has its own specialized designs, making it nontrivial to adapt models between different categories. To alleviate the discrepancy…

    Submitted 10 April, 2023; originally announced April 2023.

  18. arXiv:2303.17517  [pdf, other]

    cs.CL cs.CV cs.SD eess.AS

    Hindi as a Second Language: Improving Visually Grounded Speech with Semantically Similar Samples

    Authors: Hyeonggon Ryu, Arda Senocak, In So Kweon, Joon Son Chung

    Abstract: The objective of this work is to explore the learning of visually grounded speech models (VGS) from multilingual perspective. Bilingual VGS models are generally trained with an equal number of spoken captions from both languages. However, in reality, there can be an imbalance among the languages for the available spoken captions. Our key contribution in this work is to leverage the power of a high…

    Submitted 30 March, 2023; originally announced March 2023.

    Comments: ICASSP 2023

  19. arXiv:2303.17386  [pdf, other]

    cs.CV cs.AI cs.RO

    Complementary Random Masking for RGB-Thermal Semantic Segmentation

    Authors: Ukcheol Shin, Kyunghyun Lee, In So Kweon, Jean Oh

    Abstract: RGB-thermal semantic segmentation is one potential solution to achieve reliable semantic scene understanding in adverse weather and lighting conditions. However, the previous studies mostly focus on designing a multi-modal fusion module without consideration of the nature of multi-modality inputs. Therefore, the networks easily become over-reliant on a single modality, making it difficult to learn…

    Submitted 4 March, 2024; v1 submitted 30 March, 2023; originally announced March 2023.

    Comments: ICRA 2024, Our source code is available at https://github.com/UkcheolShin/CRM_RGBTSeg

  20. arXiv:2303.16730  [pdf, other]

    cs.CV

    TTA-COPE: Test-Time Adaptation for Category-Level Object Pose Estimation

    Authors: Taeyeop Lee, Jonathan Tremblay, Valts Blukis, Bowen Wen, Byeong-Uk Lee, Inkyu Shin, Stan Birchfield, In So Kweon, Kuk-Jin Yoon

    Abstract: Test-time adaptation methods have been gaining attention recently as a practical solution for addressing source-to-target domain gaps by gradually updating the model without requiring labels on the target data. In this paper, we propose a method of test-time adaptation for category-level object pose estimation called TTA-COPE. We design a pose ensemble approach with a self-training loss using pose…

    Submitted 29 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023, Project page: https://taeyeop.com/ttacope

  21. arXiv:2303.13336  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI

    Authors: Chenshuang Zhang, Chaoning Zhang, Sheng Zheng, Mengchun Zhang, Maryam Qamar, Sung-Ho Bae, In So Kweon

    Abstract: Generative AI has demonstrated impressive performance in various fields, among which speech synthesis is an interesting direction. With the diffusion model as the most popular generative model, numerous works have attempted two active tasks: text to speech and speech enhancement. This work conducts a survey on audio diffusion model, which is complementary to existing surveys that either lack the r…

    Submitted 2 April, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: 18 pages

  22. arXiv:2303.11771  [pdf, other]

    cs.CV

    Self-Sufficient Framework for Continuous Sign Language Recognition

    Authors: Youngjoon Jang, Youngtaek Oh, Jae Won Cho, Myungchul Kim, Dong-Jin Kim, In So Kweon, Joon Son Chung

    Abstract: The goal of this work is to develop self-sufficient framework for Continuous Sign Language Recognition (CSLR) that addresses key issues of sign language recognition. These include the need for complex multi-scale features such as hands, face, and mouth for understanding, and absence of frame-level annotations. To this end, we propose (1) Divide and Focus Convolution (DFConv) which extracts both ma…

    Submitted 21 March, 2023; originally announced March 2023.

  23. arXiv:2303.11717  [pdf, other]

    cs.AI cs.CV cs.LG cs.MM

    A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?

    Authors: Chaoning Zhang, Chenshuang Zhang, Sheng Zheng, Yu Qiao, Chenghao Li, Mengchun Zhang, Sumit Kumar Dam, Chu Myaet Thwal, Ye Lin Tun, Le Luang Huy, Donguk Kim, Sung-Ho Bae, Lik-Hang Lee, Yang Yang, Heng Tao Shen, In So Kweon, Choong Seon Hong

    Abstract: As ChatGPT goes viral, generative AI (AIGC, a.k.a AI-generated content) has made headlines everywhere because of its ability to analyze and create text, images, and beyond. With such overwhelming media coverage, it is almost impossible for us to miss the opportunity to glimpse AIGC from a certain angle. In the era of AI transitioning from pure analysis to creation, it is worth noting that ChatGPT,…

    Submitted 21 March, 2023; originally announced March 2023.

    Comments: 56 pages, 548 citations

  24. arXiv:2303.07909  [pdf, other]

    cs.CV cs.AI cs.LG

    Text-to-image Diffusion Models in Generative AI: A Survey

    Authors: Chenshuang Zhang, Chaoning Zhang, Mengchun Zhang, In So Kweon

    Abstract: This survey reviews text-to-image diffusion models in the context that diffusion models have emerged to be popular for a wide range of generative tasks. As a self-contained work, this survey starts with a brief introduction of how a basic diffusion model works for image synthesis, followed by how condition or guidance improves learning. Based on that, we present a review of state-of-the-art method…

    Submitted 2 April, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: First survey on the recent progress of text-to-image generation based on the diffusion model (under progress)

  25. arXiv:2303.01904  [pdf, other]

    cs.CV

    EcoTTA: Memory-Efficient Continual Test-time Adaptation via Self-distilled Regularization

    Authors: Junha Song, Jungsoo Lee, In So Kweon, Sungha Choi

    Abstract: This paper presents a simple yet effective approach that improves continual test-time adaptation (TTA) in a memory-efficient manner. TTA may primarily be conducted on edge devices with limited memory, so reducing memory is crucial but has been overlooked in previous TTA studies. In addition, long-term adaptation often leads to catastrophic forgetting and error accumulation, which hinders applying…

    Submitted 23 May, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023, Project page: https://sites.google.com/view/junha/ecotta

  26. arXiv:2302.12831  [pdf, other]

    eess.IV cs.CV

    CDPMSR: Conditional Diffusion Probabilistic Models for Single Image Super-Resolution

    Authors: Axi Niu, Kang Zhang, Trung X. Pham, Jinqiu Sun, Yu Zhu, In So Kweon, Yanning Zhang

    Abstract: Diffusion probabilistic models (DPM) have been widely adopted in image-to-image translation to generate high-quality images. Prior attempts at applying the DPM to image super-resolution (SR) have shown that iteratively refining a pure Gaussian noise with a conditional image using a U-Net trained on denoising at various-level noises can help obtain a satisfied high-resolution image for the low-reso…

    Submitted 14 February, 2023; originally announced February 2023.

    Comments: 4 pages, 4 figures

  27. arXiv:2301.11174  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Semi-Supervised Image Captioning by Adversarially Propagating Labeled Data

    Authors: Dong-Jin Kim, Tae-Hyun Oh, Jinsoo Choi, In So Kweon

    Abstract: We present a novel data-efficient semi-supervised framework to improve the generalization of image captioning models. Constructing a large-scale labeled image captioning dataset is an expensive task in terms of labor, time, and cost. In contrast to manually annotating all the training samples, separately collecting uni-modal datasets is immensely easier, e.g., a large-scale image dataset and a sen…

    Submitted 26 January, 2023; originally announced January 2023.

    Comments: Journal extension of our EMNLP 2019 paper (arXiv:1909.02201)

  28. arXiv:2301.00808  [pdf, other]

    cs.CV

    ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders

    Authors: Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie

    Abstract: Driven by improved architectures and better representation learning frameworks, the field of visual recognition has enjoyed rapid modernization and performance boost in the early 2020s. For example, modern ConvNets, represented by ConvNeXt, have demonstrated strong performance in various scenarios. While these models were originally designed for supervised learning with ImageNet labels, they can a…

    Submitted 2 January, 2023; originally announced January 2023.

    Comments: Code and models available at https://github.com/facebookresearch/ConvNeXt-V2

  29. arXiv:2212.10149  [pdf, other]

    cs.CV

    Tracking by Associating Clips

    Authors: Sanghyun Woo, Kwanyong Park, Seoung Wug Oh, In So Kweon, Joon-Young Lee

    Abstract: The tracking-by-detection paradigm today has become the dominant method for multi-object tracking and works by detecting objects in each frame and then performing data association across frames. However, its sequential frame-wise matching property fundamentally suffers from the intermediate interruptions in a video, such as object occlusions, fast camera movements, and abrupt light changes. Moreov…

    Submitted 20 December, 2022; originally announced December 2022.

    Comments: ECCV 2022

  30. arXiv:2212.10147  [pdf, other]

    cs.CV

    Bridging Images and Videos: A Simple Learning Framework for Large Vocabulary Video Object Detection

    Authors: Sanghyun Woo, Kwanyong Park, Seoung Wug Oh, In So Kweon, Joon-Young Lee

    Abstract: Scaling object taxonomies is one of the important steps toward a robust real-world deployment of recognition systems. We have faced remarkable progress in images since the introduction of the LVIS benchmark. To continue this success in videos, a new video benchmark, TAO, was recently presented. Given the recent encouraging results from both detection and tracking communities, we are interested in…

    Submitted 20 December, 2022; originally announced December 2022.

    Comments: ECCV 2022

  31. arXiv:2212.08356  [pdf, other]

    cs.CV

    Test-time Adaptation in the Dynamic World with Compound Domain Knowledge Management

    Authors: Junha Song, Kwanyong Park, InKyu Shin, Sanghyun Woo, Chaoning Zhang, In So Kweon

    Abstract: Prior to the deployment of robotic systems, pre-training the deep-recognition models on all potential visual cases is infeasible in practice. Hence, test-time adaptation (TTA) allows the model to adapt itself to novel environments and improve its performance during test time (i.e., lifelong adaptation). Several works for TTA have shown promising adaptation performances in continuously changing env…

    Submitted 15 April, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

    Comments: 8 pages

  32. arXiv:2212.08355  [pdf, other]

    cs.CV

    Learning Classifiers of Prototypes and Reciprocal Points for Universal Domain Adaptation

    Authors: Sungsu Hur, Inkyu Shin, Kwanyong Park, Sanghyun Woo, In So Kweon

    Abstract: Universal Domain Adaptation aims to transfer the knowledge between the datasets by handling two shifts: domain-shift and category-shift. The main challenge is correctly distinguishing the unknown target samples while adapting the distribution of known class knowledge from source to target. Most existing methods approach this problem by first training the target adapted known classifier and then re…

    Submitted 16 December, 2022; originally announced December 2022.

    Comments: Accepted at WACV 2023

  33. arXiv:2211.11432  [pdf, other]

    cs.CV

    MATE: Masked Autoencoders are Online 3D Test-Time Learners

    Authors: M. Jehanzeb Mirza, Inkyu Shin, Wei Lin, Andreas Schriebl, Kunyang Sun, Jaesung Choe, Horst Possegger, Mateusz Kozinski, In So Kweon, Kun-Jin Yoon, Horst Bischof

    Abstract: Our MATE is the first Test-Time-Training (TTT) method designed for 3D data, which makes deep networks trained for point cloud classification robust to distribution shifts occurring in test data. Like existing TTT methods from the 2D image domain, MATE also leverages test data for adaptation. Its test-time objective is that of a Masked Autoencoder: a large portion of each test point cloud is remove…

    Submitted 20 March, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: Code is available at this repository: https://github.com/jmiemirza/MATE

  34. arXiv:2211.00448  [pdf, other]

    cs.CV

    Signing Outside the Studio: Benchmarking Background Robustness for Continuous Sign Language Recognition

    Authors: Youngjoon Jang, Youngtaek Oh, Jae Won Cho, Dong-Jin Kim, Joon Son Chung, In So Kweon

    Abstract: The goal of this work is background-robust continuous sign language recognition. Most existing Continuous Sign Language Recognition (CSLR) benchmarks have fixed backgrounds and are filmed in studios with a static monochromatic background. However, signing is not limited only to studios in the real world. In order to analyze the robustness of CSLR models under background shifts, we first evaluate e…

    Submitted 1 November, 2022; originally announced November 2022.

    Comments: Our dataset is available at https://github.com/art-jang/Signing-Outside-the-Studio

  35. arXiv:2210.12126  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    One-Shot Neural Fields for 3D Object Understanding

    Authors: Valts Blukis, Taeyeop Lee, Jonathan Tremblay, Bowen Wen, In So Kweon, Kuk-Jin Yoon, Dieter Fox, Stan Birchfield

    Abstract: We present a unified and compact scene representation for robotics, where each object in the scene is depicted by a latent code capturing geometry and appearance. This representation can be decoded for various tasks such as novel view rendering, 3D reconstruction (e.g. recovering depth, point clouds, or voxel maps), collision checking, and stable grasp prediction. We build our representation from…

    Submitted 8 August, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

    Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop (CVPRW) on XRNeRF: Advances in NeRF for the Metaverse 2023

  36. arXiv:2209.05771  [pdf, other]

    eess.IV cs.CV

    Moving from 2D to 3D: volumetric medical image classification for rectal cancer staging

    Authors: Joohyung Lee, Jieun Oh, Inkyu Shin, You-sung Kim, Dae Kyung Sohn, Tae-sung Kim, In So Kweon

    Abstract: Volumetric images from Magnetic Resonance Imaging (MRI) provide invaluable information in preoperative staging of rectal cancer. Above all, accurate preoperative discrimination between T2 and T3 stages is arguably both the most challenging and clinically significant task for rectal cancer treatment, as chemo-radiotherapy is usually recommended to patients with T3 (or greater) stage cancer. In this…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: 11 pages, 2 figures, accepted to MICCAI 2022

  37. arXiv:2208.01924  [pdf, other]

    cs.CV

    Per-Clip Video Object Segmentation

    Authors: Kwanyong Park, Sanghyun Woo, Seoung Wug Oh, In So Kweon, Joon-Young Lee

    Abstract: Recently, memory-based approaches show promising results on semi-supervised video object segmentation. These methods predict object masks frame-by-frame with the help of frequently updated memory of the previous mask. Different from this per-frame inference, we investigate an alternative perspective by treating video object segmentation as clip-wise mask propagation. In this per-clip inference sch…

    Submitted 3 August, 2022; originally announced August 2022.

    Comments: CVPR 2022; Code is available at https://github.com/pkyong95/PCVOS

  38. arXiv:2208.00690  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Generative Bias for Robust Visual Question Answering

    Authors: Jae Won Cho, Dong-Jin Kim, Hyeonggon Ryu, In So Kweon

    Abstract: The task of Visual Question Answering (VQA) is known to be plagued by the issue of VQA models exploiting biases within the dataset to make its final prediction. Various previous ensemble based debiasing methods have been proposed where an additional model is purposefully trained to be biased in order to train a robust target model. However, these methods compute the bias for a model simply from th…

    Submitted 22 March, 2023; v1 submitted 1 August, 2022; originally announced August 2022.

    Comments: CVPR 2023

  39. arXiv:2208.00173  [pdf, other]

    cs.CV cs.AI cs.LG

    A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond

    Authors: Chaoning Zhang, Chenshuang Zhang, Junha Song, John Seon Keun Yi, Kang Zhang, In So Kweon

    Abstract: Masked autoencoders are scalable vision learners, as the title of MAE \cite{he2022masked}, which suggests that self-supervised learning (SSL) in vision might undertake a similar trajectory as in NLP. Specifically, generative pretext tasks with the masked prediction (e.g., BERT) have become a de facto standard SSL practice in NLP. By contrast, early attempts at generative methods in vision have bee…

    Submitted 30 July, 2022; originally announced August 2022.

    Comments: First survey on masked autoencoder (under progress)

  40. arXiv:2207.10899  [pdf, other]

    cs.CV cs.AI cs.LG

    Decoupled Adversarial Contrastive Learning for Self-supervised Adversarial Robustness

    Authors: Chaoning Zhang, Kang Zhang, Chenshuang Zhang, Axi Niu, Jiu Feng, Chang D. Yoo, In So Kweon

    Abstract: Adversarial training (AT) for robust representation learning and self-supervised learning (SSL) for unsupervised representation learning are two active research fields. Integrating AT into SSL, multiple prior works have accomplished a highly significant yet challenging task: learning robust representation without labels. A widely used framework is adversarial contrastive learning which couples AT…

    Submitted 22 July, 2022; originally announced July 2022.

    Comments: Accepted by ECCV 2022 oral presentation

  41. arXiv:2207.09812  [pdf, other]

    cs.CV

    The Anatomy of Video Editing: A Dataset and Benchmark Suite for AI-Assisted Video Editing

    Authors: Dawit Mureja Argaw, Fabian Caba Heilbron, Joon-Young Lee, Markus Woodson, In So Kweon

    Abstract: Machine learning is transforming the video editing industry. Recent advances in computer vision have leveled-up video editing tasks such as intelligent reframing, rotoscoping, color grading, or applying digital makeups. However, most of the solutions have focused on video manipulation and VFX. This work introduces the Anatomy of Video Editing, a dataset, and benchmark, to foster research in AI-ass…

    Submitted 21 July, 2022; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: Code is available at: https://github.com/dawitmureja/AVE.git

  42. arXiv:2207.09045  [pdf, other]

    cs.CV

    ML-BPM: Multi-teacher Learning with Bidirectional Photometric Mixing for Open Compound Domain Adaptation in Semantic Segmentation

    Authors: Fei Pan, Sungsu Hur, Seokju Lee, Junsik Kim, In So Kweon

    Abstract: Open compound domain adaptation (OCDA) considers the target domain as the compound of multiple unknown homogeneous subdomains. The goal of OCDA is to minimize the domain gap between the labeled source domain and the unlabeled compound target domain, which benefits the model generalization to the unseen domains. Current OCDA for semantic segmentation methods adopt manual domain separation and emplo…

    Submitted 18 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022

  43. arXiv:2207.03081  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO eess.IV

    DRL-ISP: Multi-Objective Camera ISP with Deep Reinforcement Learning

    Authors: Ukcheol Shin, Kyunghyun Lee, In So Kweon

    Abstract: In this paper, we propose a multi-objective camera ISP framework that utilizes Deep Reinforcement Learning (DRL) and camera ISP toolbox that consist of network-based and conventional ISP tools. The proposed DRL-based camera ISP framework iteratively selects a proper tool from the toolbox and applies it to the image to maximize a given vision task-specific reward function. For this purpose, we impl…

    Submitted 7 July, 2022; originally announced July 2022.

    Comments: Accepted by IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022 (*The first two authors contributed equally)

  44. arXiv:2206.00181  [pdf, other]

    cs.CV

    Labeling Where Adapting Fails: Cross-Domain Semantic Segmentation with Point Supervision via Active Selection

    Authors: Fei Pan, Francois Rameau, Junsik Kim, In So Kweon

    Abstract: Training models dedicated to semantic segmentation requires a large amount of pixel-wise annotated data. Due to their costly nature, these annotations might not be available for the task at hand. To alleviate this problem, unsupervised domain adaptation approaches aim at aligning the feature distributions between the labeled source and the unlabeled target data. While these strategies lead to noti…

    Submitted 4 June, 2022; v1 submitted 31 May, 2022; originally announced June 2022.

  45. arXiv:2205.15361  [pdf, other]

    cs.CV

    TubeFormer-DeepLab: Video Mask Transformer

    Authors: Dahun Kim, Jun Xie, Huiyu Wang, Siyuan Qiao, Qihang Yu, Hong-Seok Kim, Hartwig Adam, In So Kweon, Liang-Chieh Chen

    Abstract: We present TubeFormer-DeepLab, the first attempt to tackle multiple core video segmentation tasks in a unified manner. Different video segmentation tasks (e.g., video semantic/instance/panoptic segmentation) are usually considered as distinct problems. State-of-the-art models adopted in the separate communities have diverged, and radically different approaches dominate in each task. By contrast, w…

    Submitted 5 March, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

    Comments: CVPR 2022; arXiv v2: add results on VIPSeg val/test sets and VSPW new test set

  46. arXiv:2204.12667  [pdf, other]

    cs.CV

    MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation

    Authors: Inkyu Shin, Yi-Hsuan Tsai, Bingbing Zhuang, Samuel Schulter, Buyu Liu, Sparsh Garg, In So Kweon, Kuk-Jin Yoon

    Abstract: Test-time adaptation approaches have recently emerged as a practical solution for handling domain shift without access to the source domain data. In this paper, we propose and explore a new multi-modal extension of test-time adaptation for 3D semantic segmentation. We find that directly applying existing methods usually results in performance instability at test time because multi-modal input is n…

    Submitted 26 April, 2022; originally announced April 2022.

    Comments: CVPR 2022

  47. arXiv:2204.00089  [pdf, other]

    cs.LG cs.AI cs.CR cs.CV

    Investigating Top-$k$ White-Box and Transferable Black-box Attack

    Authors: Chaoning Zhang, Philipp Benz, Adil Karjauv, Jae Won Cho, Kang Zhang, In So Kweon

    Abstract: Existing works have identified the limitation of top-$1$ attack success rate (ASR) as a metric to evaluate the attack strength but exclusively investigated it in the white-box setting, while our work extends it to a more practical black-box setting: transferable attack. It is widely reported that stronger I-FGSM transfers worse than simple FGSM, leading to a popular belief that transferability is…

    Submitted 30 March, 2022; originally announced April 2022.

    Comments: Accepted by CVPR2022

  48. arXiv:2203.17248  [pdf, other]

    cs.LG cs.AI

    Dual Temperature Helps Contrastive Learning Without Many Negative Samples: Towards Understanding and Simplifying MoCo

    Authors: Chaoning Zhang, Kang Zhang, Trung X. Pham, Axi Niu, Zhinan Qiao, Chang D. Yoo, In So Kweon

    Abstract: Contrastive learning (CL) is widely known to require many negative samples, 65536 in MoCo for instance, for which the performance of a dictionary-free framework is often inferior because the negative sample size (NSS) is limited by its mini-batch size (MBS). To decouple the NSS from the MBS, a dynamic dictionary has been adopted in a large volume of CL frameworks, among which arguably the most pop…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted by CVPR2022

  49. arXiv:2203.16262  [pdf, other]

    cs.LG cs.AI

    How Does SimSiam Avoid Collapse Without Negative Samples? A Unified Understanding with Self-supervised Contrastive Learning

    Authors: Chaoning Zhang, Kang Zhang, Chenshuang Zhang, Trung X. Pham, Chang D. Yoo, In So Kweon

    Abstract: To avoid collapse in self-supervised learning (SSL), a contrastive loss is widely used but often requires a large number of negative samples. Without negative samples yet achieving competitive performance, a recent work has attracted significant attention for providing a minimalist simple Siamese (SimSiam) method to avoid collapse. However, the reason for how it avoids collapse without negative sa…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted to ICLR 2022

  50. arXiv:2203.15427  [pdf, other]

    cs.CV

    Long-term Video Frame Interpolation via Feature Propagation

    Authors: Dawit Mureja Argaw, In So Kweon

    Abstract: Video frame interpolation (VFI) works generally predict intermediate frame(s) by first estimating the motion between inputs and then warping the inputs to the target time with the estimated motion. This approach, however, is not optimal when the temporal distance between the input sequence increases as existing motion estimation modules cannot effectively handle large motions. Hence, VFI works per…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2022