Showing 1–50 of 58 results for author: Yoo, C D

Searching in archive cs.
  1. arXiv:2406.08380

    cs.CL cs.SD eess.AS

    Towards Unsupervised Speech Recognition Without Pronunciation Models

    Authors: Junrui Ni, Liming Wang, Yang Zhang, Kaizhi Qian, Heting Gao, Mark Hasegawa-Johnson, Chang D. Yoo

    Abstract: Recent advancements in supervised automatic speech recognition (ASR) have achieved remarkable performance, largely due to the growing availability of large transcribed speech corpora. However, most languages lack sufficient paired speech and text data to effectively train these systems. In this article, we tackle the challenge of developing ASR systems without paired speech and text corpora by pro…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  2. arXiv:2406.06044

    cs.CV

    FRAG: Frequency Adapting Group for Diffusion Video Editing

    Authors: Sunjae Yoon, Gwanhyeong Koo, Geonwoo Kim, Chang D. Yoo

    Abstract: In video editing, the hallmark of a quality edit lies in its consistent and unobtrusive adjustment. Modification, when integrated, must be smooth and subtle, preserving the natural flow and aligning seamlessly with the original vision. Therefore, our primary focus is on overcoming the current challenges in high quality edit to ensure that each edit enhances the final product without disrupting its…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 16 pages, 16 figures, ICML 2024

  3. arXiv:2405.11206

    cs.LG cs.AI cs.RO

    Towards Robust Policy: Enhancing Offline Reinforcement Learning with Adversarial Attacks and Defenses

    Authors: Thanh Nguyen, Tung M. Luu, Tri Ton, Chang D. Yoo

    Abstract: Offline reinforcement learning (RL) addresses the challenge of expensive and high-risk data exploration inherent in RL by pre-training policies on vast amounts of offline data, enabling direct deployment or fine-tuning in real-world environments. However, this training paradigm can compromise policy robustness, leading to degraded performance in practical conditions due to observation perturbation…

    Submitted 18 May, 2024; originally announced May 2024.

  4. arXiv:2403.14119

    cs.CV cs.AI cs.CL cs.LG

    C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion

    Authors: Hee Suk Yoon, Eunseop Yoon, Joshua Tian Jin Tee, Mark Hasegawa-Johnson, Yingzhen Li, Chang D. Yoo

    Abstract: In deep learning, test-time adaptation has gained attention as a method for model fine-tuning without the need for labeled data. A prime exemplification is the recently proposed test-time prompt tuning for large-scale vision-language models such as CLIP. Unfortunately, these prompts have been mainly developed to improve accuracy, overlooking the importance of calibration, which is a crucial aspect…

    Submitted 31 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: ICLR 2024

  5. arXiv:2402.01516

    cs.CV

    Cross-view Masked Diffusion Transformers for Person Image Synthesis

    Authors: Trung X. Pham, Zhang Kang, Chang D. Yoo

    Abstract: We present X-MDPT ($\underline{Cross}$-view $\underline{M}$asked $\underline{D}$iffusion $\underline{P}$rediction $\underline{T}$ransformers), a novel diffusion model designed for pose-guided human image generation. X-MDPT distinguishes itself by employing masked diffusion transformers that operate on latent patches, a departure from the commonly-used Unet structures in existing works. The model c…

    Submitted 3 June, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: ICML 2024

  6. arXiv:2401.09794

    cs.CV

    Wavelet-Guided Acceleration of Text Inversion in Diffusion-Based Image Editing

    Authors: Gwanhyeong Koo, Sunjae Yoon, Chang D. Yoo

    Abstract: In the field of image editing, Null-text Inversion (NTI) enables fine-grained editing while preserving the structure of the original image by optimizing null embeddings during the DDIM sampling process. However, the NTI process is time-consuming, taking more than two minutes per image. To address this, we introduce an innovative method that maintains the principles of the NTI while accelerating th…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: The International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024

  7. arXiv:2401.09787

    cs.LG cs.AI stat.ML

    Querying Easily Flip-flopped Samples for Deep Active Learning

    Authors: Seong Jin Cho, Gwangsu Kim, Junghyun Lee, Jinwoo Shin, Chang D. Yoo

    Abstract: Active learning is a machine learning paradigm that aims to improve the performance of a model by strategically selecting and querying unlabeled data. One effective selection strategy is to base it on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is. The sample's distance to the decision boundary is a natural measure of predictive uncertainty…

    Submitted 16 May, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: 34 pages, 17 figures, 5 tables. Accepted to the 12th International Conference on Learning Representations (ICLR 2024) (ver2: fixed some typos and improved some parts of the writing)

  8. arXiv:2312.11973

    cs.CV cs.AI cs.LG

    Continual Learning: Forget-free Winning Subnetworks for Video Representations

    Authors: Haeyong Kang, Jaehong Yoon, Sung Ju Hwang, Chang D. Yoo

    Abstract: Inspired by the Lottery Ticket Hypothesis (LTH), which highlights the existence of efficient subnetworks within larger, dense networks, a high-performing Winning Subnetwork (WSN) in terms of task performance under appropriate sparsity conditions is considered for various continual learning tasks. It leverages pre-existing weights from dense networks to achieve efficient learning in Task Incrementa…

    Submitted 2 June, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2303.14962, arXiv:2306.11305

  9. arXiv:2312.09736

    cs.CL cs.SD eess.AS

    HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue

    Authors: Sunjae Yoon, Dahyun Kim, Eunseop Yoon, Hee Suk Yoon, Junyeong Kim, Chang D. Yoo

    Abstract: Video-grounded Dialogue (VGD) aims to answer questions regarding a given multi-modal input comprising video, audio, and dialogue history. Although there have been numerous efforts in developing VGD systems to improve the quality of their responses, existing systems are competent only to incorporate the information in the video and text and tend to struggle in extracting the necessary information f…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: EMNLP 2023, 14 pages, 13 figures

  10. arXiv:2312.06708

    cs.CV

    Neutral Editing Framework for Diffusion-based Video Editing

    Authors: Sunjae Yoon, Gwanhyeong Koo, Ji Woo Hong, Chang D. Yoo

    Abstract: Text-conditioned image editing has succeeded in various types of editing based on a diffusion framework. Unfortunately, this success did not carry over to a video, which continues to be challenging. Existing video editing systems are still limited to rigid-type editing such as style transfer and object overlay. To this end, this paper proposes Neutral Editing (NeuEdit) framework to enable complex…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: 18 pages, 14 figures

  11. arXiv:2312.05790

    cs.LG cs.AI eess.SP

    SimPSI: A Simple Strategy to Preserve Spectral Information in Time Series Data Augmentation

    Authors: Hyun Ryu, Sunjae Yoon, Hee Suk Yoon, Eunseop Yoon, Chang D. Yoo

    Abstract: Data augmentation is a crucial component in training neural networks to overcome the limitation imposed by data size, and several techniques have been studied for time series. Although these techniques are effective in certain tasks, they have yet to be generalized to time series benchmarks. We find that current data augmentation techniques ruin the core information contained within the frequency…

    Submitted 10 December, 2023; originally announced December 2023.

  12. arXiv:2312.05496

    cs.CR cs.LG

    Flexible Cross-Modal Steganography via Implicit Representations

    Authors: Seoyun Yang, Sojeong Song, Chang D. Yoo, Junmo Kim

    Abstract: We present INRSteg, an innovative lossless steganography framework based on a novel data form Implicit Neural Representations (INR) that is modal-agnostic. Our framework is considered for effectively hiding multiple data without altering the original INR ensuring high-quality stego data. The neural representations of secret data are first concatenated to have independent paths that do not overlap,…

    Submitted 12 December, 2023; v1 submitted 9 December, 2023; originally announced December 2023.

  13. arXiv:2312.02103

    cs.CV

    Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection

    Authors: Sunghun Kang, Junbum Cha, Jonghwan Mun, Byungseok Roh, Chang D. Yoo

    Abstract: Open-vocabulary object detection (OVOD) has recently gained significant attention as a crucial step toward achieving human-like visual intelligence. Existing OVOD methods extend target vocabulary from pre-defined categories to open-world by transferring knowledge of arbitrary concepts from vision-language pre-training models to the detectors. While previous methods have shown remarkable successes,…

    Submitted 4 December, 2023; originally announced December 2023.

  14. arXiv:2311.18508

    eess.IV cs.CV

    DifAugGAN: A Practical Diffusion-style Data Augmentation for GAN-based Single Image Super-resolution

    Authors: Axi Niu, Kang Zhang, Joshua Tian Jin Tee, Trung X. Pham, Jinqiu Sun, Chang D. Yoo, In So Kweon, Yanning Zhang

    Abstract: It is well known that the adversarial optimization of GAN-based image super-resolution (SR) methods makes the preceding SR model generate unpleasant and undesirable artifacts, leading to large distortion. We attribute the cause of such distortions to the poor calibration of the discriminator, which hampers its ability to provide meaningful feedback to the generator for learning high-quality images. To…

    Submitted 30 November, 2023; originally announced November 2023.

  15. arXiv:2310.05241

    cs.CV

    SCANet: Scene Complexity Aware Network for Weakly-Supervised Video Moment Retrieval

    Authors: Sunjae Yoon, Gwanhyeong Koo, Dahyun Kim, Chang D. Yoo

    Abstract: Video moment retrieval aims to localize moments in video corresponding to a given language query. To avoid the expensive cost of annotating the temporal moments, weakly-supervised VMR (wsVMR) systems have been studied. For such systems, generating a number of proposals as moment candidates and then selecting the most appropriate proposal has been a popular approach. These proposals are assumed to…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: 11 pages, Accepted in ICCV 2023

  16. arXiv:2310.02382

    cs.CL cs.SD eess.AS

    Unsupervised Speech Recognition with N-Skipgram and Positional Unigram Matching

    Authors: Liming Wang, Mark Hasegawa-Johnson, Chang D. Yoo

    Abstract: Training unsupervised speech recognition systems presents challenges due to GAN-associated instability, misalignment between speech and text, and significant memory demands. To tackle these challenges, we introduce a novel ASR system, ESPUM. This system harnesses the power of lower-order N-skipgrams (up to N=3) combined with positional unigram statistics gathered from a small batch of samples. Eva…

    Submitted 3 October, 2023; originally announced October 2023.

  17. DimCL: Dimensional Contrastive Learning For Improving Self-Supervised Learning

    Authors: Thanh Nguyen, Trung Pham, Chaoning Zhang, Tung Luu, Thang Vu, Chang D. Yoo

    Abstract: Self-supervised learning (SSL) has gained remarkable success, for which contrastive learning (CL) plays a key role. However, the recent development of new non-CL frameworks has achieved comparable or better performance with high improvement potential, prompting researchers to enhance these frameworks further. Assimilating CL into non-CL frameworks has been thought to be beneficial, but empirical e…

    Submitted 21 September, 2023; originally announced September 2023.

    Journal ref: IEEE Access 2023

  18. arXiv:2308.08442

    cs.CL cs.SD eess.AS

    Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction

    Authors: Eunseop Yoon, Hee Suk Yoon, Dhananjaya Gowda, SooHwan Eom, Daehyeok Kim, John Harvill, Heting Gao, Mark Hasegawa-Johnson, Chanwoo Kim, Chang D. Yoo

    Abstract: Text-to-Text Transfer Transformer (T5) has recently been considered for the Grapheme-to-Phoneme (G2P) transduction. As a follow-up, a tokenizer-free byte-level model based on T5 referred to as ByT5, recently gave promising results on word-level G2P conversion by representing each input character with its corresponding UTF-8 encoding. Although it is generally understood that sentence-level or parag…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: INTERSPEECH 2023

  19. arXiv:2306.11305

    cs.CV cs.AI cs.LG

    Progressive Fourier Neural Representation for Sequential Video Compilation

    Authors: Haeyong Kang, Jaehong Yoon, DaHyun Kim, Sung Ju Hwang, Chang D Yoo

    Abstract: Neural Implicit Representation (NIR) has recently gained significant attention due to its remarkable ability to encode complex and high-dimensional data into representation space and easily reconstruct it through a trainable mapping function. However, NIR methods assume a one-to-one mapping between the target data and representation models regardless of data relevancy or similarity. This results i…

    Submitted 6 February, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

  20. arXiv:2306.07926

    eess.AS cs.CL cs.LG cs.SD

    A Theory of Unsupervised Speech Recognition

    Authors: Liming Wang, Mark Hasegawa-Johnson, Chang D. Yoo

    Abstract: Unsupervised speech recognition (ASR-U) is the problem of learning automatic speech recognition (ASR) systems from unpaired speech-only and text-only corpora. While various algorithms exist to solve this problem, a theoretical framework is missing from studying their properties and addressing such issues as sensitivity to hyperparameters and training instability. In this paper, we proposed a gener…

    Submitted 9 June, 2023; originally announced June 2023.

  21. arXiv:2305.16371

    cs.CL cs.SD eess.AS

    INTapt: Information-Theoretic Adversarial Prompt Tuning for Enhanced Non-Native Speech Recognition

    Authors: Eunseop Yoon, Hee Suk Yoon, John Harvill, Mark Hasegawa-Johnson, Chang D. Yoo

    Abstract: Automatic Speech Recognition (ASR) systems have attained unprecedented performance with large speech models pre-trained based on self-supervised speech representation learning. However, these pre-trained speech models suffer from representational bias as they tend to better represent those prominent accents (i.e., native (L1) English accent) in the pre-training speech corpus than less represented…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: ACL2023

  22. arXiv:2303.14962

    cs.LG cs.AI cs.CV cs.NE

    Forget-free Continual Learning with Soft-Winning SubNetworks

    Authors: Haeyong Kang, Jaehong Yoon, Sultan Rizky Madjid, Sung Ju Hwang, Chang D. Yoo

    Abstract: Inspired by Regularized Lottery Ticket Hypothesis (RLTH), which states that competitive smooth (non-binary) subnetworks exist within a dense network in continual learning tasks, we investigate two proposed architecture-based continual learning methods which sequentially learn and select adaptive binary- (WSN) and non-binary Soft-Subnetworks (SoftNet) for each task. WSN and SoftNet jointly learn th…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: arXiv admin note: text overlap with arXiv:2209.07529

  23. Maximum margin learning of t-SPNs for cell classification with filtered input

    Authors: Haeyong Kang, Chang D. Yoo, Yongcheon Na

    Abstract: An algorithm based on a deep probabilistic architecture referred to as a tree-structured sum-product network (t-SPN) is considered for cell classification. The t-SPN is constructed such that the unnormalized probability is represented as conditional probabilities of a subset of most similar cell classes. The constructed t-SPN architecture is learned by maximizing the margin, which is the differenc…

    Submitted 20 March, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

    Report number: 15728223

    Journal ref: IEEE Journal of Selected Topics in Signal Processing ( Volume: 10, Issue: 1, February 2016)

  24. arXiv:2303.02472

    cs.LG cs.AI cs.CL cs.CV

    ESD: Expected Squared Difference as a Tuning-Free Trainable Calibration Measure

    Authors: Hee Suk Yoon, Joshua Tian Jin Tee, Eunseop Yoon, Sunjae Yoon, Gwangsu Kim, Yingzhen Li, Chang D. Yoo

    Abstract: Studies have shown that modern neural networks tend to be poorly calibrated due to over-confident predictions. Traditionally, post-processing methods have been used to calibrate the model after training. In recent years, various trainable calibration measures have been proposed to incorporate them directly into the training process. However, these methods all incorporate internal hyperparameters,…

    Submitted 18 January, 2024; v1 submitted 4 March, 2023; originally announced March 2023.

    Comments: ICLR 2023

  25. Skew Class-balanced Re-weighting for Unbiased Scene Graph Generation

    Authors: Haeyong Kang, Chang D. Yoo

    Abstract: An unbiased scene graph generation (SGG) algorithm referred to as Skew Class-balanced Re-weighting (SCR) is proposed for considering the unbiased predicate prediction caused by the long-tailed distribution. The prior works focus mainly on alleviating the deteriorating performances of the minority predicate predictions, showing drastic dropping recall scores, i.e., losing the majority predicate per…

    Submitted 13 March, 2023; v1 submitted 1 January, 2023; originally announced January 2023.

    Journal ref: Mach. Learn. Knowl. Extr. 2023, 5(1), 287-303

  26. arXiv:2212.07072

    cs.CL cs.LG

    SMSMix: Sense-Maintained Sentence Mixup for Word Sense Disambiguation

    Authors: Hee Suk Yoon, Eunseop Yoon, John Harvill, Sunjae Yoon, Mark Hasegawa-Johnson, Chang D. Yoo

    Abstract: Word Sense Disambiguation (WSD) is an NLP task aimed at determining the correct sense of a word in a sentence from discrete sense choices. Although current systems have attained unprecedented performances for such tasks, the nonuniform distribution of word senses during training generally results in systems performing poorly on rare senses. To this end, we consider data augmentation to increase th…

    Submitted 21 December, 2022; v1 submitted 14 December, 2022; originally announced December 2022.

    Comments: EMNLP2022

  27. Information-Theoretic Text Hallucination Reduction for Video-grounded Dialogue

    Authors: Sunjae Yoon, Eunseop Yoon, Hee Suk Yoon, Junyeong Kim, Chang D. Yoo

    Abstract: Video-grounded Dialogue (VGD) aims to decode an answer sentence to a question regarding a given video and dialogue context. Despite the recent success of multi-modal reasoning to generate answer sentences, existing dialogue systems still suffer from a text hallucination problem, which denotes indiscriminate text-copying from input texts without an understanding of the question. This is due to lear…

    Submitted 12 December, 2022; originally announced December 2022.

    Comments: 12 pages, Accepted in EMNLP 2022

  28. arXiv:2211.09861

    cs.CV

    Self-Supervised Visual Representation Learning via Residual Momentum

    Authors: Trung X. Pham, Axi Niu, Zhang Kang, Sultan Rizky Madjid, Ji Woo Hong, Daehyeok Kim, Joshua Tian Jin Tee, Chang D. Yoo

    Abstract: Self-supervised learning (SSL) approaches have shown promising capabilities in learning the representation from unlabeled data. Amongst them, momentum-based frameworks have attracted significant attention. Despite being a great success, these momentum-based SSL frameworks suffer from a large gap in representation between the online encoder (student) and the momentum encoder (teacher), which hinder…

    Submitted 21 November, 2022; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: 18 pages, 16 figures

  29. Selective Query-guided Debiasing for Video Corpus Moment Retrieval

    Authors: Sunjae Yoon, Ji Woo Hong, Eunseop Yoon, Dahyun Kim, Junyeong Kim, Hee Suk Yoon, Chang D. Yoo

    Abstract: Video moment retrieval (VMR) aims to localize target moments in untrimmed videos pertinent to a given textual query. Existing retrieval systems tend to rely on retrieval bias as a shortcut and thus, fail to sufficiently learn multi-modal interactions between query and video. This retrieval bias stems from learning frequent co-occurrence patterns between query and moments, which spuriously correlat…

    Submitted 26 November, 2022; v1 submitted 16 October, 2022; originally announced October 2022.

    Comments: 16 pages, 6 figures, Accepted in ECCV 2022

    Journal ref: In European Conference on Computer Vision (pp. 185-200). Springer, Cham (2022)

  30. arXiv:2210.08282

    cs.CV

    LAD: A Hybrid Deep Learning System for Benign Paroxysmal Positional Vertigo Disorders Diagnostic

    Authors: Trung Xuan Pham, Jin Woong Choi, Rusty John Lloyd Mina, Thanh Nguyen, Sultan Rizky Madjid, Chang Dong Yoo

    Abstract: Herein, we introduce "Look and Diagnose" (LAD), a hybrid deep learning-based system that aims to support doctors in the medical field in diagnosing effectively the Benign Paroxysmal Positional Vertigo (BPPV) disorder. Given the body postures of the patient in the Dix-Hallpike and lateral head turns test, the visual information of both eyes is captured and fed into LAD for analyzing and classifying…

    Submitted 15 October, 2022; originally announced October 2022.

    Comments: Accepted to IEEE Access 2022, 13 pages, 14 figures

  31. arXiv:2209.08263

    cs.CV

    Scalable SoftGroup for 3D Instance Segmentation on Point Clouds

    Authors: Thang Vu, Kookhoi Kim, Tung M. Luu, Thanh Nguyen, Junyeong Kim, Chang D. Yoo

    Abstract: This paper considers a network referred to as SoftGroup for accurate and scalable 3D instance segmentation. Existing state-of-the-art methods produce hard semantic predictions followed by grouping instance segmentation results. Unfortunately, errors stemming from hard decisions propagate into the grouping, resulting in poor overlap between predicted instances and ground truth and substantial false…

    Submitted 23 December, 2023; v1 submitted 17 September, 2022; originally announced September 2022.

    Comments: Accepted by TPAMI. Extension of arXiv:2203.01509

  32. arXiv:2209.07529

    cs.LG cs.AI

    On the Soft-Subnetwork for Few-shot Class Incremental Learning

    Authors: Haeyong Kang, Jaehong Yoon, Sultan Rizky Hikmawan Madjid, Sung Ju Hwang, Chang D. Yoo

    Abstract: Inspired by Regularized Lottery Ticket Hypothesis (RLTH), which hypothesizes that there exist smooth (non-binary) subnetworks within a dense network that achieve the competitive performance of the dense network, we propose a few-shot class incremental learning (FSCIL) method referred to as Soft-SubNetworks (SoftNet). Our objective is to learn a sequence of sessions incrementally, where each…

    Submitted 1 March, 2023; v1 submitted 15 September, 2022; originally announced September 2022.

    Comments: The Eleventh International Conference on Learning Representations (ICLR, 2023)

  33. arXiv:2208.05744

    cs.CV

    On the Pros and Cons of Momentum Encoder in Self-Supervised Visual Representation Learning

    Authors: Trung Pham, Chaoning Zhang, Axi Niu, Kang Zhang, Chang D. Yoo

    Abstract: Exponential Moving Average (EMA or momentum) is widely used in modern self-supervised learning (SSL) approaches, such as MoCo, for enhancing performance. We demonstrate that such momentum can also be plugged into momentum-free SSL frameworks, such as SimCLR, for a performance boost. Despite its wide use as a fundamental component in modern SSL frameworks, the benefit caused by momentum is not well…

    Submitted 11 August, 2022; originally announced August 2022.

    Comments: 35 pages

  34. arXiv:2207.10899

    cs.CV cs.AI cs.LG

    Decoupled Adversarial Contrastive Learning for Self-supervised Adversarial Robustness

    Authors: Chaoning Zhang, Kang Zhang, Chenshuang Zhang, Axi Niu, Jiu Feng, Chang D. Yoo, In So Kweon

    Abstract: Adversarial training (AT) for robust representation learning and self-supervised learning (SSL) for unsupervised representation learning are two active research fields. Integrating AT into SSL, multiple prior works have accomplished a highly significant yet challenging task: learning robust representation without labels. A widely used framework is adversarial contrastive learning which couples AT…

    Submitted 22 July, 2022; originally announced July 2022.

    Comments: Accepted by ECCV 2022 oral presentation

  35. Learning Imbalanced Datasets with Maximum Margin Loss

    Authors: Haeyong Kang, Thang Vu, Chang D. Yoo

    Abstract: A learning algorithm referred to as Maximum Margin (MM) is proposed for considering the class-imbalance data learning issue: the trained model tends to predict the majority of classes rather than the minority ones. That is, underfitting for minority classes seems to be one of the challenges of generalization. For a good generalization of the minority classes, we design a new Maximum Margin (MM) lo…

    Submitted 10 June, 2022; originally announced June 2022.

    Report number: 21731543

    Journal ref: 2021 IEEE International Conference on Image Processing (ICIP)

  36. arXiv:2203.17248

    cs.LG cs.AI

    Dual Temperature Helps Contrastive Learning Without Many Negative Samples: Towards Understanding and Simplifying MoCo

    Authors: Chaoning Zhang, Kang Zhang, Trung X. Pham, Axi Niu, Zhinan Qiao, Chang D. Yoo, In So Kweon

    Abstract: Contrastive learning (CL) is widely known to require many negative samples, 65536 in MoCo for instance, for which the performance of a dictionary-free framework is often inferior because the negative sample size (NSS) is limited by its mini-batch size (MBS). To decouple the NSS from the MBS, a dynamic dictionary has been adopted in a large volume of CL frameworks, among which arguably the most pop…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted by CVPR2022

  37. arXiv:2203.16262

    cs.LG cs.AI

    How Does SimSiam Avoid Collapse Without Negative Samples? A Unified Understanding with Self-supervised Contrastive Learning

    Authors: Chaoning Zhang, Kang Zhang, Chenshuang Zhang, Trung X. Pham, Chang D. Yoo, In So Kweon

    Abstract: To avoid collapse in self-supervised learning (SSL), a contrastive loss is widely used but often requires a large number of negative samples. Without negative samples yet achieving competitive performance, a recent work has attracted significant attention for providing a minimalist simple Siamese (SimSiam) method to avoid collapse. However, the reason for how it avoids collapse without negative sa…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted to ICLR 2022

  38. arXiv:2203.01509

    cs.CV

    SoftGroup for 3D Instance Segmentation on Point Clouds

    Authors: Thang Vu, Kookhoi Kim, Tung M. Luu, Xuan Thanh Nguyen, Chang D. Yoo

    Abstract: Existing state-of-the-art 3D instance segmentation methods perform semantic segmentation followed by grouping. The hard predictions are made when performing semantic segmentation such that each point is associated with a single class. However, the errors stemming from hard decision propagate into grouping that results in (1) low overlaps between the predicted instance with the ground truth and (2)…

    Submitted 2 March, 2022; originally announced March 2022.

    Comments: To appear in CVPR 2022

  39. arXiv:2202.05488

    cs.LG cs.AI

    Fast Adversarial Training with Noise Augmentation: A Unified Perspective on RandStart and GradAlign

    Authors: Axi Niu, Kang Zhang, Chaoning Zhang, Chenshuang Zhang, In So Kweon, Chang D. Yoo, Yanning Zhang

    Abstract: PGD-based and FGSM-based are two popular adversarial training (AT) approaches for obtaining adversarially robust models. Compared with PGD-based AT, FGSM-based one is significantly faster but fails with catastrophic overfitting (CO). For mitigating CO in such Fast AT, there are two popular existing strategies: random start (RandStart) and Gradient Alignment (GradAlign). The former works only for a…

    Submitted 5 October, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

  40. arXiv:2111.05014

    eess.IV cs.AI cs.CV

    GDCA: GAN-based single image super resolution with Dual discriminators and Channel Attention

    Authors: Thanh Nguyen, Hieu Hoang, Chang D. Yoo

    Abstract: Single Image Super-Resolution (SISR) is a very active research field. This paper addresses SISR by using a GAN-based approach with dual discriminators and incorporating it with an attention mechanism. The experimental results show that GDCA can generate sharper and more pleasing images compared to other conventional methods.

    Submitted 9 November, 2021; originally announced November 2021.

    Journal ref: Korean Association of Artificial Intelligence 2019

  41. Hindsight Goal Ranking on Replay Buffer for Sparse Reward Environment

    Authors: Tung M. Luu, Chang D. Yoo

    Abstract: This paper proposes a method for prioritizing the replay experience referred to as Hindsight Goal Ranking (HGR) in overcoming the limitation of Hindsight Experience Replay (HER) that generates hindsight goals based on uniform sampling. HGR samples with higher probability on the states visited in an episode with larger temporal difference (TD) error, which is considered as a proxy measure of the am…

    Submitted 28 October, 2021; originally announced October 2021.

    Journal ref: IEEE Access 2021

  42. arXiv:2109.11196

    stat.ML cs.CY cs.LG

    Fast and Efficient MMD-based Fair PCA via Optimization over Stiefel Manifold

    Authors: Junghyun Lee, Gwangsu Kim, Matt Olfat, Mark Hasegawa-Johnson, Chang D. Yoo

    Abstract: This paper defines fair principal component analysis (PCA) as minimizing the maximum mean discrepancy (MMD) between dimensionality-reduced conditional distributions of different protected classes. The incorporation of MMD naturally leads to an exact and tractable mathematical formulation of fairness with good statistical properties. We formulate the problem of fair PCA subject to MMD constraints a…

    Submitted 25 January, 2022; v1 submitted 23 September, 2021; originally announced September 2021.

    Comments: 24 pages, 18 figures. Accepted to the 36th AAAI Conference on Artificial Intelligence (AAAI 2022)

  43. arXiv:2108.00475  [pdf, other]

    cs.CV eess.IV

    Self-supervised Learning with Local Attention-Aware Feature

    Authors: Trung X. Pham, Rusty John Lloyd Mina, Dias Issa, Chang D. Yoo

    Abstract: In this work, we propose a novel methodology for self-supervised learning for generating global and local attention-aware visual features. Our approach is based on training a model to differentiate between specific image transformations of an input sample and the patched images. Utilizing this approach, the proposed method is able to outperform the previous best competitor by 1.03% on the Tiny-Ima…

    Submitted 1 August, 2021; originally announced August 2021.

    Comments: 5 pages, 4 figures

  44. arXiv:2103.13361  [pdf, other]

    cs.CV

    Structured Co-reference Graph Attention for Video-grounded Dialogue

    Authors: Junyeong Kim, Sunjae Yoon, Dahyun Kim, Chang D. Yoo

    Abstract: A video-grounded dialogue system referred to as the Structured Co-reference Graph Attention (SCGA) is presented for decoding the answer sequence to a question regarding a given video while keeping track of the dialogue context. Although recent efforts have made great strides in improving the quality of the response, performance is still far from satisfactory. The two main challenging issues are as…

    Submitted 24 March, 2021; originally announced March 2021.

    Comments: Accepted to AAAI2021

  45. Sample-efficient Reinforcement Learning Representation Learning with Curiosity Contrastive Forward Dynamics Model

    Authors: Thanh Nguyen, Tung M. Luu, Thang Vu, Chang D. Yoo

    Abstract: Developing an agent in reinforcement learning (RL) that is capable of performing complex control tasks directly from high-dimensional observation such as raw pixels is yet a challenge as efforts are made towards improving sample efficiency and generalization. This paper considers a learning framework for Curiosity Contrastive Forward Dynamics Model (CCFDM) in achieving a more sample-efficient RL b…

    Submitted 14 October, 2021; v1 submitted 15 March, 2021; originally announced March 2021.

    Journal ref: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  46. Robust MAML: Prioritization task buffer with adaptive learning process for model-agnostic meta-learning

    Authors: Thanh Nguyen, Tung Luu, Trung Pham, Sanzhar Rakhimkul, Chang D. Yoo

    Abstract: Model agnostic meta-learning (MAML) is a popular state-of-the-art meta-learning algorithm that provides good weight initialization of a model given a variety of learning tasks. The model initialized by provided weight can be fine-tuned to an unseen task despite only using a small amount of samples and within a few adaptation steps. MAML is simple and versatile but requires costly learning rate tun…

    Submitted 10 June, 2021; v1 submitted 15 March, 2021; originally announced March 2021.

    Journal ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  47. arXiv:2102.00831  [pdf, other]

    cs.CV cs.AI

    Semantic Grouping Network for Video Captioning

    Authors: Hobin Ryu, Sunghun Kang, Haeyong Kang, Chang D. Yoo

    Abstract: This paper considers a video caption generating network referred to as Semantic Grouping Network (SGN) that attempts (1) to group video frames with discriminating word phrases of partially decoded caption and then (2) to decode those semantically aligned groups in predicting the next word. As consecutive frames are not likely to provide unique information, prior methods have focused on discarding…

    Submitted 3 February, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

    Comments: AAAI 2021

  48. arXiv:2012.10150  [pdf, other]

    cs.CV

    SCNet: Training Inference Sample Consistency for Instance Segmentation

    Authors: Thang Vu, Haeyong Kang, Chang D. Yoo

    Abstract: Cascaded architectures have brought significant performance improvement in object detection and instance segmentation. However, there are lingering issues regarding the disparity in the Intersection-over-Union (IoU) distribution of the samples between training and inference. This disparity can potentially degrade detection accuracy. This paper proposes an architecture referred to as Sample Cons…

    Submitted 18 December, 2020; originally announced December 2020.

    Comments: To appear in AAAI 2021

  49. VLANet: Video-Language Alignment Network for Weakly-Supervised Video Moment Retrieval

    Authors: Minuk Ma, Sunjae Yoon, Junyeong Kim, Youngjoon Lee, Sunghun Kang, Chang D. Yoo

    Abstract: Video Moment Retrieval (VMR) is a task to localize the temporal moment in untrimmed video specified by natural language query. For VMR, several methods that require full supervision for training have been proposed. Unfortunately, acquiring a large number of training videos with labeled temporal boundaries for each query is a labor-intensive process. This paper explores methods for performing VMR i…

    Submitted 24 August, 2020; originally announced August 2020.

    Comments: 16 pages, 6 figures, European Conference on Computer Vision, 2020

  50. arXiv:2007.02036  [pdf, other]

    cs.CV

    Modality Shifting Attention Network for Multi-modal Video Question Answering

    Authors: Junyeong Kim, Minuk Ma, Trung Pham, Kyungsu Kim, Chang D. Yoo

    Abstract: This paper considers a network referred to as Modality Shifting Attention Network (MSAN) for Multimodal Video Question Answering (MVQA) task. MSAN decomposes the task into two sub-tasks: (1) localization of temporal moment relevant to the question, and (2) accurate prediction of the answer based on the localized moment. The modality required for temporal localization may be different from that for…

    Submitted 4 July, 2020; originally announced July 2020.

    Comments: CVPR2020 accepted; poster