Showing 1–50 of 122 results for author: Min, D

  1. arXiv:2406.15755  [pdf, other]

    cs.CV cs.AI

    Fine-grained Background Representation for Weakly Supervised Semantic Segmentation

    Authors: Xu Yin, Woobin Im, Dongbo Min, Yuchi Huo, Fei Pan, Sung-Eui Yoon

    Abstract: Generating reliable pseudo masks from image-level labels is challenging in the weakly supervised semantic segmentation (WSSS) task due to the lack of spatial information. Prevalent class activation map (CAM)-based solutions are challenged to discriminate the foreground (FG) objects from the suspicious background (BG) pixels (a.k.a. co-occurring) and learn the integral object regions. This paper pr…

    Submitted 22 June, 2024; originally announced June 2024.

  2. arXiv:2406.15725  [pdf, other]

    eess.AS cs.SD

    Self Training and Ensembling Frequency Dependent Networks with Coarse Prediction Pooling and Sound Event Bounding Boxes

    Authors: Hyeonuk Nam, Deokki Min, Seungdeok Choi, Inhan Choi, Yong-Hwa Park

    Abstract: To tackle the sound event detection (SED) task, we propose frequency dependent networks (FreDNets), which heavily leverage frequency-dependent methods. We apply frequency warping and FilterAugment, which are frequency-dependent data augmentation methods. The model architecture consists of 3 branches: audio teacher-student transformer (ATST) branch, BEATs branch and CNN branch including either partial…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: DCASE 2024 Challenge Task 4 technical report

  3. arXiv:2406.05341  [pdf, other]

    eess.AS cs.SD

    Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection

    Authors: Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Junhyeok Lee, Yong-Hwa Park

    Abstract: Frequency dynamic convolution (FDY conv) has shown state-of-the-art performance in sound event detection (SED) using frequency-adaptive kernels obtained by frequency-varying combination of basis kernels. However, FDY conv lacks an explicit means to diversify frequency-adaptive kernels, potentially limiting the performance. In addition, the size of basis kernels is limited while time-frequency patte…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024

  4. arXiv:2406.02596  [pdf, other]

    cs.LG cs.AI

    Slow and Steady Wins the Race: Maintaining Plasticity with Hare and Tortoise Networks

    Authors: Hojoon Lee, Hyeonseo Cho, Hyunseung Kim, Donghu Kim, Dugki Min, Jaegul Choo, Clare Lyle

    Abstract: This study investigates the loss of generalization ability in neural networks, revisiting warm-starting experiments from Ash & Adams. Our empirical analysis reveals that common methods designed to enhance plasticity by maintaining trainability provide limited benefits to generalization. While reinitializing the network can be effective, it also risks losing valuable prior knowledge. To this end, w…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: accepted to ICML 2024

  5. arXiv:2404.08330  [pdf, other]

    cs.CV

    Emerging Property of Masked Token for Effective Pre-training

    Authors: Hyesong Choi, Hunsang Lee, Seyoung Joung, Hyejin Park, Jiyeong Kim, Dongbo Min

    Abstract: Driven by the success of Masked Language Modeling (MLM), the realm of self-supervised learning for computer vision has been invigorated by the central role of Masked Image Modeling (MIM) in driving recent breakthroughs. Notwithstanding the achievements of MIM across various downstream tasks, its overall efficiency is occasionally hampered by the lengthy duration of the pre-training phase. This pap…

    Submitted 12 April, 2024; originally announced April 2024.

  6. arXiv:2404.08327  [pdf, other]

    cs.CV

    Salience-Based Adaptive Masking: Revisiting Token Dynamics for Enhanced Pre-training

    Authors: Hyesong Choi, Hyejin Park, Kwang Moo Yi, Sungmin Cha, Dongbo Min

    Abstract: In this paper, we introduce Saliency-Based Adaptive Masking (SBAM), a novel and cost-effective approach that significantly enhances the pre-training performance of Masked Image Modeling (MIM) approaches by prioritizing token salience. Our method provides robustness against variations in masking ratios, effectively mitigating the performance instability issues common in existing methods. This relax…

    Submitted 12 April, 2024; originally announced April 2024.

  7. arXiv:2404.00636  [pdf, other]

    cs.CV cs.AI cs.MM

    Learning to Generate Conditional Tri-plane for 3D-aware Expression Controllable Portrait Animation

    Authors: Taekyung Ki, Dongchan Min, Gyeongsu Chae

    Abstract: In this paper, we present Export3D, a one-shot 3D-aware portrait animation method that is able to control the facial expression and camera view of a given portrait image. To achieve this, we introduce a tri-plane generator that directly generates a tri-plane of 3D prior by transferring the expression parameter of 3DMM into the source image. The tri-plane is then decoded into the image of different…

    Submitted 2 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

    Comments: Project page: https://export3d.github.io

  8. arXiv:2403.19723  [pdf, other]

    cs.CL cs.AI cs.DB cs.MM

    HGT: Leveraging Heterogeneous Graph-enhanced Large Language Models for Few-shot Complex Table Understanding

    Authors: Rihui Jin, Yu Li, Guilin Qi, Nan Hu, Yuan-Fang Li, Jiaoyan Chen, Jianan Wang, Yongrui Chen, Dehai Min

    Abstract: Table understanding (TU) has achieved promising advancements, but it faces the challenges of the scarcity of manually labeled tables and the presence of complex table structures. To address these challenges, we propose HGT, a framework with a heterogeneous graph (HG)-enhanced large language model (LLM) to tackle few-shot TU tasks. It leverages the LLM by aligning the table semantics with the LLM's p…

    Submitted 27 March, 2024; originally announced March 2024.

  9. arXiv:2403.19305  [pdf, other]

    cs.CL cs.AI

    MATEval: A Multi-Agent Discussion Framework for Advancing Open-Ended Text Evaluation

    Authors: Yu Li, Shenyu Zhang, Rui Wu, Xiutian Huang, Yongrui Chen, Wenhao Xu, Guilin Qi, Dehai Min

    Abstract: Recent advancements in generative Large Language Models (LLMs) have been remarkable; however, the quality of the text generated by these models often reveals persistent issues. Evaluating the quality of text generated by these models, especially in open-ended text, has consistently presented a significant challenge. Addressing this, recent work has explored the possibility of using LLMs as evaluato…

    Submitted 15 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: This paper has been accepted as a long paper presentation by DASFAA 2024 Industrial Track

  10. arXiv:2403.13578  [pdf, other]

    cs.CL cs.LG

    Dynamic Reward Adjustment in Multi-Reward Reinforcement Learning for Counselor Reflection Generation

    Authors: Do June Min, Veronica Perez-Rosas, Kenneth Resnicow, Rada Mihalcea

    Abstract: In this paper, we study the problem of multi-reward reinforcement learning to jointly optimize for multiple text qualities for natural language generation. We focus on the task of counselor reflection generation, where we optimize the generators to simultaneously improve the fluency, coherence, and reflection quality of generated counselor responses. We introduce two novel bandit methods, DynaOpt…

    Submitted 20 March, 2024; originally announced March 2024.

  11. arXiv:2402.12869  [pdf, other]

    cs.CL

    Exploring the Impact of Table-to-Text Methods on Augmenting LLM-based Question Answering with Domain Hybrid Data

    Authors: Dehai Min, Nan Hu, Rihui Jin, Nuo Lin, Jiaoyan Chen, Yongrui Chen, Yu Li, Guilin Qi, Yun Li, Nijun Li, Qianren Wang

    Abstract: Augmenting Large Language Models (LLMs) for Question Answering (QA) with domain specific data has attracted wide attention. However, domain data often exists in a hybrid format, including text and semi-structured tables, posing challenges for the seamless integration of information. Table-to-Text Generation is a promising solution by facilitating the transformation of hybrid data into a uniformly…

    Submitted 9 April, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted to NAACL 2024 Industry Track Paper

  12. arXiv:2311.08300  [pdf, other]

    cs.CL cs.AI

    Workflow-Guided Response Generation for Task-Oriented Dialogue

    Authors: Do June Min, Paloma Sodhi, Ramya Ramakrishnan

    Abstract: Task-oriented dialogue (TOD) systems aim to achieve specific goals through interactive dialogue. Such tasks usually involve following specific workflows, i.e. executing a sequence of actions in a particular order. While prior work has focused on supervised learning methods to condition on past actions, they do not explicitly optimize for compliance to a desired workflow. In this paper, we propose…

    Submitted 14 November, 2023; originally announced November 2023.

  13. arXiv:2311.08299  [pdf, other]

    cs.CL cs.AI

    VERVE: Template-based ReflectiVE Rewriting for MotiVational IntErviewing

    Authors: Do June Min, Verónica Pérez-Rosas, Kenneth Resnicow, Rada Mihalcea

    Abstract: Reflective listening is a fundamental skill that counselors must acquire to achieve proficiency in motivational interviewing (MI). It involves responding in a manner that acknowledges and explores the meaning of what the client has expressed in the conversation. In this work, we introduce the task of counseling response rewriting, which transforms non-reflective statements into reflective response…

    Submitted 8 March, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

  14. arXiv:2310.15482  [pdf, other]

    cs.CV

    Salient Object Detection in RGB-D Videos

    Authors: Ao Mou, Yukang Lu, Jiahao He, Dingyao Min, Keren Fu, Qijun Zhao

    Abstract: Given the widespread adoption of depth-sensing acquisition devices, RGB-D videos and related data/media have gained considerable traction in various aspects of daily life. Consequently, conducting salient object detection (SOD) in RGB-D videos presents a highly promising and evolving avenue. Despite the potential of this area, SOD in RGB-D videos remains somewhat under-explored, with RGB-D SOD and…

    Submitted 21 May, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: IEEE TIP (under major revision)

  15. arXiv:2306.11427  [pdf]

    eess.AS

    Auditory Neural Response Inspired Sound Event Detection Based on Spectro-temporal Receptive Field

    Authors: Deokki Min, Hyeonuk Nam, Yong-Hwa Park

    Abstract: Sound event detection (SED) is one of the tasks that automate functions of the human auditory system, which listens to and understands auditory scenes. Therefore, we were inspired to make SED recognize sound events in the way the human auditory system does. Spectro-temporal receptive field (STRF), an approach to describe the relationship between perceived sound at ear and transformed neural response in the auditory…

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: Submitted to DCASE 2023 Workshop

  16. arXiv:2306.11277  [pdf, other]

    cs.SD eess.AS

    Frequency & Channel Attention for Computationally Efficient Sound Event Detection

    Authors: Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Yong-Hwa Park

    Abstract: We explore various attention methods on frequency and channel dimensions for sound event detection (SED) in order to enhance performance with minimal increase in computational cost while leveraging domain knowledge to address the frequency dimension of audio data. We have introduced frequency dynamic convolution (FDY conv) in a previous work to release the translational equivariance issue assoc…

    Submitted 28 August, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: Accepted to DCASE 2023 workshop

  17. arXiv:2306.01866  [pdf, ps, other]

    math.DG

    Construction of higher dimensional ALF Calabi-Yau metrics

    Authors: Daheng Min

    Abstract: Roughly speaking, an ALF metric of real dimension $4n$ should be a metric such that its asymptotic cone is $4n-1$ dimensional, the volume growth of this metric is of order $4n-1$ and its sectional curvature tends to 0 at infinity. In this paper, I will first show that the Taub-NUT deformation of a hyperkähler cone with respect to a locally free $\mathbb{S}^1-$symmetry is ALF hyperkähler. Modelle…

    Submitted 2 June, 2023; originally announced June 2023.

    MSC Class: 53C25 (Primary) 53C55; 53C26; 53C30; 53D20 (Secondary)

  18. arXiv:2305.19135  [pdf, other]

    cs.CV

    Context-Preserving Two-Stage Video Domain Translation for Portrait Stylization

    Authors: Doyeon Kim, Eunji Ko, Hyunsu Kim, Yunji Kim, Junho Kim, Dongchan Min, Junmo Kim, Sung Ju Hwang

    Abstract: Portrait stylization, which translates a real human face image into an artistically stylized image, has attracted considerable interest and many prior works have shown impressive quality in recent years. However, despite their remarkable performances in the image-level translation tasks, prior methods show unsatisfactory results when they are applied to the video domain. To address the issue, we p…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: 5 pages, 3 figures, CVPR 2023 Workshop on AI for Content Creation

  19. arXiv:2305.12544  [pdf, other]

    cs.CL cs.AI

    Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models

    Authors: Oana Ignat, Zhijing Jin, Artem Abzaliev, Laura Biester, Santiago Castro, Naihao Deng, Xinyi Gao, Aylin Gunal, Jacky He, Ashkan Kazemi, Muhammad Khalifa, Namho Koh, Andrew Lee, Siyang Liu, Do June Min, Shinka Mori, Joan Nwatu, Veronica Perez-Rosas, Siqi Shen, Zekun Wang, Winston Wu, Rada Mihalcea

    Abstract: Recent progress in large language models (LLMs) has enabled the deployment of many generative NLP applications. At the same time, it has also led to a misleading public discourse that "it's all been solved." Not surprisingly, this has, in turn, made many NLP researchers -- especially those at the beginning of their careers -- worry about what NLP research area they should focus on. Has it all be…

    Submitted 15 March, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: Accepted at COLING 2024

  20. arXiv:2305.00521  [pdf, other]

    cs.CV cs.AI cs.LG

    StyleLipSync: Style-based Personalized Lip-sync Video Generation

    Authors: Taekyung Ki, Dongchan Min

    Abstract: In this paper, we present StyleLipSync, a style-based personalized lip-sync video generative model that can generate identity-agnostic lip-synchronizing video from arbitrary audio. To generate a video of arbitrary identities, we leverage expressive lip prior from the semantically rich latent space of a pre-trained StyleGAN, where we can also design a video consistency with a linear transformation.…

    Submitted 12 February, 2024; v1 submitted 30 April, 2023; originally announced May 2023.

    Comments: International Conference on Computer Vision (ICCV) 2023. Project page: https://stylelipsync.github.io

  21. Adaptive Endpointing with Deep Contextual Multi-armed Bandits

    Authors: Do June Min, Andreas Stolcke, Anirudh Raju, Colin Vaz, Di He, Venkatesh Ravichandran, Viet Anh Trinh

    Abstract: Current endpointing (EP) solutions learn in a supervised framework, which does not allow the model to incorporate feedback and improve in an online setting. Also, it is a common practice to utilize costly grid-search to find the best configuration for an endpointing model. In this paper, we aim to provide a solution for adaptive endpointing by proposing an efficient method for choosing an optimal…

    Submitted 23 March, 2023; originally announced March 2023.

    Journal ref: Proc. IEEE ICASSP, June 2023

  22. arXiv:2303.10368  [pdf, other]

    cs.CL

    An Empirical Study of Pre-trained Language Models in Simple Knowledge Graph Question Answering

    Authors: Nan Hu, Yike Wu, Guilin Qi, Dehai Min, Jiaoyan Chen, Jeff Z. Pan, Zafar Ali

    Abstract: Large-scale pre-trained language models (PLMs) such as BERT have recently achieved great success and become a milestone in natural language processing (NLP). It is now the consensus of the NLP community to adopt PLMs as the backbone for downstream tasks. In recent works on knowledge graph question answering (KGQA), BERT or its variants have become necessary in their KGQA models. However, there is…

    Submitted 18 March, 2023; originally announced March 2023.

    Comments: Accepted by World Wide Web Journal

  23. arXiv:2303.07992  [pdf, other]

    cs.CL

    Can ChatGPT Replace Traditional KBQA Models? An In-depth Analysis of the Question Answering Performance of the GPT LLM Family

    Authors: Yiming Tan, Dehai Min, Yu Li, Wenbo Li, Nan Hu, Yongrui Chen, Guilin Qi

    Abstract: ChatGPT is a powerful large language model (LLM) that covers knowledge resources such as Wikipedia and supports natural language question answering using its own knowledge. Therefore, there is growing interest in exploring whether ChatGPT can replace traditional knowledge-based question answering (KBQA) models. Although there have been some works analyzing the question answering performance of Cha…

    Submitted 20 September, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: To be published in Proceedings of ISWC 2023, 22nd International Semantic Web Conference

  24. arXiv:2211.09383  [pdf, other]

    eess.AS cs.AI cs.SD

    Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models

    Authors: Minki Kang, Dongchan Min, Sung Ju Hwang

    Abstract: There has been significant progress in Text-To-Speech (TTS) synthesis technology in recent years, thanks to the advancement in neural generative modeling. However, existing methods on any-speaker adaptive TTS have achieved unsatisfactory performance, due to their suboptimal accuracy in mimicking the target speakers' styles. In this work, we present Grad-StyleSpeech, which is an any-speaker adapt…

    Submitted 13 March, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: ICASSP 2023

  25. arXiv:2210.02689  [pdf, other]

    cs.CV

    Neural Matching Fields: Implicit Representation of Matching Fields for Visual Correspondence

    Authors: Sunghwan Hong, Jisu Nam, Seokju Cho, Susung Hong, Sangryul Jeon, Dongbo Min, Seungryong Kim

    Abstract: Existing pipelines of semantic correspondence commonly include extracting high-level semantic features for the invariance against intra-class variations and background clutters. This architecture, however, inevitably results in a low-resolution matching field that additionally requires an ad-hoc interpolation process as a post-processing for converting it into a high-resolution one, certainly limi…

    Submitted 6 October, 2022; originally announced October 2022.

    Comments: NeurIPS2022 camera ready

  26. arXiv:2210.00223  [pdf, other]

    cs.CV

    Contour-Aware Equipotential Learning for Semantic Segmentation

    Authors: Xu Yin, Dongbo Min, Yuchi Huo, Sung-Eui Yoon

    Abstract: With increasing demands for high-quality semantic segmentation in the industry, hard-distinguishing semantic boundaries have posed a significant threat to existing solutions. Inspired by real-life experience, i.e., combining varied observations contributes to higher visual recognition confidence, we present the equipotential learning (EPL) method. This novel module transfers the predicted/ground-t…

    Submitted 1 October, 2022; originally announced October 2022.

  27. arXiv:2209.02518  [pdf]

    cs.CV

    Sequential Cross Attention Based Multi-task Learning

    Authors: Sunkyung Kim, Hyesong Choi, Dongbo Min

    Abstract: In multi-task learning (MTL) for visual scene understanding, it is crucial to transfer useful information between multiple tasks with minimal interferences. In this paper, we propose a novel architecture that effectively transfers informative features by applying the attention mechanism to the multi-scale features of the tasks. Since applying the attention module directly to all possible features…

    Submitted 6 September, 2022; originally announced September 2022.

    Comments: ICIP 2022

  28. arXiv:2208.10922  [pdf, other]

    cs.CV cs.LG eess.AS eess.IV

    StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation

    Authors: Dongchan Min, Minyoung Song, Eunji Ko, Sung Ju Hwang

    Abstract: We propose StyleTalker, a novel audio-driven talking head generation model that can synthesize a video of a talking person from a single reference image with accurately audio-synced lip shapes, realistic head poses, and eye blinks. Specifically, by leveraging a pretrained image generator and an image encoder, we estimate the latent codes of the talking head video that faithfully reflects the given…

    Submitted 15 March, 2024; v1 submitted 23 August, 2022; originally announced August 2022.

  29. arXiv:2207.13340  [pdf, other]

    cs.CV cs.LG

    PointFix: Learning to Fix Domain Bias for Robust Online Stereo Adaptation

    Authors: Kwonyoung Kim, Jungin Park, Jiyoung Lee, Dongbo Min, Kwanghoon Sohn

    Abstract: Online stereo adaptation tackles the domain shift problem, caused by different environments between synthetic (training) and real (test) datasets, to promptly adapt stereo models in dynamic real-world applications such as autonomous driving. However, previous methods often fail to counteract particular regions related to dynamic objects with more severe environmental changes. To mitigate this issu…

    Submitted 27 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022

  30. arXiv:2206.12059  [pdf]

    eess.AS cs.SD

    Data Augmentation and Squeeze-and-Excitation Network on Multiple Dimension for Sound Event Localization and Detection in Real Scenes

    Authors: Byeong-Yun Ko, Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Seung-Deok Choi, Yong-Hwa Park

    Abstract: Performance of sound event localization and detection (SELD) in real scenes is limited by the small size of the SELD dataset, due to the difficulty of obtaining a sufficient amount of realistic multi-channel audio recordings with accurate labels. We used two main strategies to solve problems arising from the small real SELD dataset. First, we applied various data augmentation methods on all data dimensions:…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: Technical Report submitted for DCASE2022 Challenge Task3

  31. arXiv:2206.11645  [pdf, ps, other]

    eess.AS

    Frequency Dependent Sound Event Detection for DCASE 2022 Challenge Task 4

    Authors: Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Byeong-Yun Ko, Seung-Deok Choi, Yong-Hwa Park

    Abstract: While many deep learning methods on other domains have been applied to sound event detection (SED), differences between original domains of the methods and SED have not been appropriately considered so far. As SED uses audio data with two dimensions (time and frequency) for input, thorough comprehension on these two dimensions is essential for application of methods from other domains on SED. Prev…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: Technical Report submitted for DCASE2022 Challenge Task4

  32. arXiv:2206.09604  [pdf, other]

    cs.CV cs.AI

    Distortion-Aware Network Pruning and Feature Reuse for Real-time Video Segmentation

    Authors: Hyunsu Rhee, Dongchan Min, Sunil Hwang, Bruno Andreis, Sung Ju Hwang

    Abstract: Real-time video segmentation is a crucial task for many real-world applications such as autonomous driving and robot control. Since state-of-the-art semantic segmentation models are often too heavy for real-time applications despite their impressive performance, researchers have proposed lightweight architectures with speed-accuracy trade-offs, achieving real-time speed at the expense of reduced a…

    Submitted 15 December, 2022; v1 submitted 20 June, 2022; originally announced June 2022.

  33. arXiv:2204.03609  [pdf, other]

    cs.CV cs.LG

    Pin the Memory: Learning to Generalize Semantic Segmentation

    Authors: Jin Kim, Jiyoung Lee, Jungin Park, Dongbo Min, Kwanghoon Sohn

    Abstract: The rise of deep neural networks has led to several breakthroughs for semantic segmentation. In spite of this, a model trained on source domain often fails to work properly in new challenging domains, which is directly concerned with the generalization capability of the model. In this paper, we present a novel memory-guided domain generalization method for semantic segmentation based on meta-learni…

    Submitted 30 May, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Accepted to CVPR 2022

  34. arXiv:2202.06060  [pdf, other]

    cs.CV

    Depth-Cooperated Trimodal Network for Video Salient Object Detection

    Authors: Yukang Lu, Dingyao Min, Keren Fu, Qijun Zhao

    Abstract: Depth can provide useful geographical cues for salient object detection (SOD), and has been proven helpful in recent RGB-D SOD methods. However, existing video salient object detection (VSOD) methods only utilize spatiotemporal information and seldom exploit depth information for detection. In this paper, we propose a depth-cooperated trimodal network, called DCTNet for VSOD, which is a pioneering…

    Submitted 11 July, 2022; v1 submitted 12 February, 2022; originally announced February 2022.

    Comments: 5 pages, 3 figures, Accepted at ICIP-2022

  35. arXiv:2110.11590  [pdf, other]

    cs.CV

    DIML/CVL RGB-D Dataset: 2M RGB-D Images of Natural Indoor and Outdoor Scenes

    Authors: Jaehoon Cho, Dongbo Min, Youngjung Kim, Kwanghoon Sohn

    Abstract: This manual is intended to provide a detailed description of the DIML/CVL RGB-D dataset. This dataset is comprised of 2M color images and their corresponding depth maps from a great variety of natural indoor and outdoor scenes. The indoor dataset was constructed using the Microsoft Kinect v2, while the outdoor dataset was built using the stereo cameras (ZED stereo camera and built-in stereo camera…

    Submitted 22 October, 2021; originally announced October 2021.

    Comments: Technical report

  36. Self-balanced Learning For Domain Generalization

    Authors: Jin Kim, Jiyoung Lee, Jungin Park, Dongbo Min, Kwanghoon Sohn

    Abstract: Domain generalization aims to learn a prediction model on multi-domain source data such that the model can generalize to a target domain with unknown statistics. Most existing approaches have been developed under the assumption that the source data is well-balanced in terms of both domain and class. However, real-world training data collected with different composition biases often exhibits severe…

    Submitted 30 August, 2021; originally announced August 2021.

    Comments: Accepted at International Conference on Image Processing (ICIP) 2021

    Journal ref: ICIP, 2021, pp. 779-783

  37. arXiv:2106.03153  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation

    Authors: Dongchan Min, Dong Bok Lee, Eunho Yang, Sung Ju Hwang

    Abstract: With rapid progress in neural text-to-speech (TTS) models, personalized speech generation is now in high demand for many applications. For practical applicability, a TTS model should generate high-quality speech with only a few audio samples from the given speaker, that are also short in length. However, existing methods either require to fine-tune the model or achieve low adaptation quality witho…

    Submitted 16 June, 2021; v1 submitted 6 June, 2021; originally announced June 2021.

    Comments: Accepted by ICML 2021

  38. arXiv:2101.00431  [pdf, other]

    cs.CV

    On the confidence of stereo matching in a deep-learning era: a quantitative evaluation

    Authors: Matteo Poggi, Seungryong Kim, Fabio Tosi, Sunok Kim, Filippo Aleotti, Dongbo Min, Kwanghoon Sohn, Stefano Mattoccia

    Abstract: Stereo matching is one of the most popular techniques to estimate dense depth maps by finding the disparity between matching pixels on two synchronized and rectified images. Alongside the development of more accurate algorithms, the research community focused on finding good strategies to estimate the reliability, i.e. the confidence, of estimated disparity maps. This information proves to b…

    Submitted 30 March, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

    Comments: TPAMI final version

  39. arXiv:2011.10897   

    cs.AI eess.SY

    Reinforcement learning with distance-based incentive/penalty (DIP) updates for highly constrained industrial control systems

    Authors: Hyungjun Park, Daiki Min, Jong-hyun Ryu, Dong Gu Choi

    Abstract: Typical reinforcement learning (RL) methods show limited applicability for real-world industrial control problems because industrial systems involve various constraints and simultaneously require continuous and discrete control. To overcome these challenges, we devise a novel RL algorithm that enables an agent to handle a highly constrained action space. This algorithm has two main features. First…

    Submitted 19 May, 2021; v1 submitted 21 November, 2020; originally announced November 2020.

    Comments: We request withdrawal of this article due to a definition error on methodology and problem definition (Section 3-4; pages 2-5)

  40. arXiv:2009.12840  [pdf, other]

    cs.CV

    Adaptive confidence thresholding for monocular depth estimation

    Authors: Hyesong Choi, Hunsang Lee, Sunkyung Kim, Sunok Kim, Seungryong Kim, Kwanghoon Sohn, Dongbo Min

    Abstract: Self-supervised monocular depth estimation has become an appealing solution to the lack of ground truth labels, but its reconstruction loss often produces over-smoothed results across object boundaries and is incapable of handling occlusion explicitly. In this paper, we propose a new approach to leverage pseudo ground truth depth maps of stereo images generated from self-supervised stereo matching…

    Submitted 23 August, 2021; v1 submitted 27 September, 2020; originally announced September 2020.

    Comments: ICCV 2021

  41. arXiv:2006.16659  [pdf, other]

    eess.SY cs.LG

    Delayed Q-update: A novel credit assignment technique for deriving an optimal operation policy for the Grid-Connected Microgrid

    Authors: Hyungjun Park, Daiki Min, Jong-hyun Ryu, Dong Gu Choi

    Abstract: A microgrid is an innovative system that integrates distributed energy resources to supply electricity demand within electrical boundaries. This study proposes an approach for deriving a desirable microgrid operation policy that enables sophisticated controls in the microgrid system using the proposed novel credit assignment technique, delayed-Q update. The technique employs novel features such as…

    Submitted 20 October, 2020; v1 submitted 30 June, 2020; originally announced June 2020.

  42. arXiv:2004.13354  [pdf, other]

    cs.CR

    SGX-SSD: A Policy-based Versioning SSD with Intel SGX

    Authors: Jinwoo Ahn, Seungjin Lee, Jinhoon Lee, Yungwoo Ko, Donghyun Min, Junghee Lee, Youngjae Kim

    Abstract: This paper demonstrates that SSDs, which perform device-level versioning, can be exposed to data tampering attacks when the retention time of data is less than the malware's dwell time. To deal with that threat, we propose SGX-SSD, an SGX-based versioning SSD which selectively preserves file history based on the given policy. The proposed system adopts Intel SGX to implement the version policy mana…

    Submitted 28 April, 2020; v1 submitted 28 April, 2020; originally announced April 2020.

    Comments: 7 pages, 4 figures

    ACM Class: E.5

  43. arXiv:1910.00754  [pdf, other]

    cs.CV

    Joint Learning of Semantic Alignment and Object Landmark Detection

    Authors: Sangryul Jeon, Dongbo Min, Seungryong Kim, Kwanghoon Sohn

    Abstract: Convolutional neural networks (CNNs) based approaches for semantic alignment and object landmark detection have improved their performance significantly. Current efforts for the two tasks focus on addressing the lack of massive training data through weakly- or unsupervised learning frameworks. In this paper, we present a joint learning approach for obtaining dense correspondences and discovering o…

    Submitted 1 October, 2019; originally announced October 2019.

    Comments: Accepted to ICCV 2019

  44. arXiv:1904.10230  [pdf, other]

    cs.CV

    A Large RGB-D Dataset for Semi-supervised Monocular Depth Estimation

    Authors: Jaehoon Cho, Dongbo Min, Youngjung Kim, Kwanghoon Sohn

    Abstract: Current self-supervised methods for monocular depth estimation are largely based on deeply nested convolutional networks that leverage stereo image pairs or monocular sequences during a training phase. However, they often exhibit inaccurate results around occluded regions and depth boundaries. In this paper, we present a simple yet effective approach for monocular depth estimation using stereo ima…

    Submitted 21 October, 2021; v1 submitted 23 April, 2019; originally announced April 2019.

    Comments: https://dimlrgbd.github.io/

  45. arXiv:1904.05012  [pdf, other]

    cs.CR cs.OS

    KEY-SSD: Access-Control Drive to Protect Files from Ransomware Attacks

    Authors: Jinwoo Ahn, Donggyu Park, Chang-Gyu Lee, Donghyun Min, Junghee Lee, Sungyong Park, Qian Chen, Youngjae Kim

    Abstract: Traditional techniques to prevent damage from ransomware attacks are to detect and block attacks by monitoring the known behaviors such as frequent name changes, recurring access to cryptographic libraries and exchange keys with remote servers. Unfortunately, intelligent ransomware can easily bypass these techniques. Another prevention technique is to recover from the backup copy when a file is in…

    Submitted 10 April, 2019; originally announced April 2019.

    Comments: 12 pages, 20 figures

  46. arXiv:1904.02969  [pdf, other]

    cs.CV

    Semantic Attribute Matching Networks

    Authors: Seungryong Kim, Dongbo Min, Somi Jeong, Sunok Kim, Sangryul Jeon, Kwanghoon Sohn

    Abstract: We present semantic attribute matching networks (SAM-Net) for jointly establishing correspondences and transferring attributes across semantically similar images, which intelligently weaves the advantages of the two tasks while overcoming their limitations. SAM-Net accomplishes this through an iterative process of establishing reliable correspondences by reducing the attribute discrepancy between…

    Submitted 5 April, 2019; originally announced April 2019.

    Comments: CVPR 2019

  47. arXiv:1812.01640  [pdf, other]

    cs.LG

    Overcoming Catastrophic Forgetting by Soft Parameter Pruning

    Authors: Jian Peng, Jiang Hao, Zhuo Li, Enqiang Guo, Xiaohong Wan, Deng Min, Qing Zhu, Haifeng Li

    Abstract: Catastrophic forgetting is a challenge issue in continual learning when a deep neural network forgets the knowledge acquired from the former task after learning on subsequent tasks. However, existing methods try to find the joint distribution of parameters shared with all tasks. This idea can be questionable because this joint distribution may not present when the number of tasks increase. On the…

    Submitted 4 December, 2018; originally announced December 2018.

    Comments: 10 pages, 12 figures

  48. arXiv:1810.12155  [pdf, other]

    cs.CV

    Recurrent Transformer Networks for Semantic Correspondence

    Authors: Seungryong Kim, Stephen Lin, Sangryul Jeon, Dongbo Min, Kwanghoon Sohn

    Abstract: We present recurrent transformer networks (RTNs) for obtaining dense correspondences between semantically similar images. Our networks accomplish this through an iterative process of estimating spatial transformations between the input images and using these transformations to generate aligned convolutional activations. By directly estimating the transformations between an image pair, rather than…

    Submitted 29 October, 2018; originally announced October 2018.

    Comments: Neural Information Processing Systems (NIPS) 2018

  49. arXiv:1807.02939  [pdf, other]

    cs.CV

    PARN: Pyramidal Affine Regression Networks for Dense Semantic Correspondence

    Authors: Sangryul Jeon, Seungryong Kim, Dongbo Min, Kwanghoon Sohn

    Abstract: This paper presents a deep architecture for dense semantic correspondence, called pyramidal affine regression networks (PARN), that estimates locally-varying affine transformation fields across images. To deal with intra-class appearance and shape variations that commonly exist among different instances within the same object category, we leverage a pyramidal model where affine transformation fiel…

    Submitted 1 August, 2018; v1 submitted 9 July, 2018; originally announced July 2018.

    Comments: To appear in ECCV 2018

  50. arXiv:1708.04568  [pdf]

    physics.optics cond-mat.mes-hall

    Low-threshold optically pumped lasing in highly strained Ge nanowires

    Authors: Shuyu Bao, Daeik Kim, Chibuzo Onwukaeme, Shashank Gupta, Krishna Saraswat, Kwang Hong Lee, Yeji Kim, Dabin Min, Yongduck Jung, Haodong Qiu, Hong Wang, Eugene A. Fitzgerald, Chuan Seng Tan, Donguk Nam

    Abstract: The integration of efficient, miniaturized group IV lasers into CMOS architecture holds the key to the realization of fully functional photonic-integrated circuits. Despite several years of progress, however, all group IV lasers reported to date exhibit impractically high thresholds owing to their unfavorable bandstructures. Highly strained germanium with its fundamentally altered bandstructure ha…

    Submitted 15 August, 2017; originally announced August 2017.

    Comments: 31 pages, 9 figures