Search | arXiv e-print repository

Pose-Diversified Augmentation with Diffusion Model for Person Re-Identification

Authors: Inès Hyeonsu Kim, JoungBin Lee, Soowon Son, Woojeong **, Kyusun Cho, Junyoung Seo, Min-Seop Kwak, Seokju Cho, JeongYeol Baek, Byeongwon Lee, Seungryong Kim

Abstract: Person re-identification (Re-ID) often faces challenges due to variations in human poses and camera viewpoints, which significantly affect the appearance of individuals across images. Existing datasets frequently lack diversity and scalability in these aspects, hindering the generalization of Re-ID models to new camera systems. Previous methods have attempted to address these issues through data a… ▽ More Person re-identification (Re-ID) often faces challenges due to variations in human poses and camera viewpoints, which significantly affect the appearance of individuals across images. Existing datasets frequently lack diversity and scalability in these aspects, hindering the generalization of Re-ID models to new camera systems. Previous methods have attempted to address these issues through data augmentation; however, they rely on human poses already present in the training dataset, failing to effectively reduce the human pose bias in the dataset. We propose Diff-ID, a novel data augmentation approach that incorporates sparse and underrepresented human pose and camera viewpoint examples into the training data, addressing the limited diversity in the original training data distribution. Our objective is to augment a training dataset that enables existing Re-ID models to learn features unbiased by human pose and camera viewpoint variations. To achieve this, we leverage the knowledge of pre-trained large-scale diffusion models. Using the SMPL model, we simultaneously capture both the desired human poses and camera viewpoints, enabling realistic human rendering. The depth information provided by the SMPL model indirectly conveys the camera viewpoints. By conditioning the diffusion model on both the human pose and camera viewpoint concurrently through the SMPL model, we generate realistic images with diverse human poses and camera viewpoints. Qualitative results demonstrate the effectiveness of our method in addressing human pose bias and enhancing the generalizability of Re-ID models compared to other data augmentation-based Re-ID approaches. The performance gains achieved by training Re-ID models on our offline augmented dataset highlight the potential of our proposed framework in improving the scalability and generalizability of person Re-ID models. △ Less

Submitted 23 June, 2024; originally announced June 2024.

Comments: The project page is available at https://ku-cvlab.github.io/Diff-ID/

arXiv:2406.12721 [pdf]

Sound event detection based on auxiliary decoder and maximum probability aggregation for DCASE Challenge 2024 Task 4

Authors: Sang Won Son, Jongyeon Park, Hong Kook Kim, Sulaiman Vesal, Jeong Eun Lim

Abstract: In this report, we propose three novel methods for develo** a sound event detection (SED) model for the DCASE 2024 Challenge Task 4. First, we propose an auxiliary decoder attached to the final convolutional block to improve feature extraction capabilities while reducing dependency on embeddings from pre-trained large models. The proposed auxiliary decoder operates independently from the main de… ▽ More In this report, we propose three novel methods for develo** a sound event detection (SED) model for the DCASE 2024 Challenge Task 4. First, we propose an auxiliary decoder attached to the final convolutional block to improve feature extraction capabilities while reducing dependency on embeddings from pre-trained large models. The proposed auxiliary decoder operates independently from the main decoder, enhancing performance of the convolutional block during the initial training stages by assigning a different weight strategy between main and auxiliary decoder losses. Next, to address the time interval issue between the DESED and MAESTRO datasets, we propose maximum probability aggregation (MPA) during the training step. The proposed MPA method enables the model's output to be aligned with soft labels of 1 s in the MAESTRO dataset. Finally, we propose a multi-channel input feature that employs various versions of logmel and MFCC features to generate time-frequency pattern. The experimental results demonstrate the efficacy of these proposed methods in a view of improving SED performance by achieving a balanced enhancement across different datasets and label types. Ultimately, this approach presents a significant step forward in develo** more robust and flexible SED models △ Less

Submitted 24 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: DCASE 2024 challenge Task4, 4 pages

arXiv:2406.12016 [pdf, other]

Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization

Authors: Seungwoo Son, Wonpyo Park, Woohyun Han, Kyuyeun Kim, Jaeho Lee

Abstract: Despite recent advances in LLM quantization, activation quantization remains to be challenging due to the activation outliers. Conventional remedies, e.g., mixing precisions for different channels, introduce extra overhead and reduce the speedup. In this work, we develop a simple yet effective strategy to facilitate per-tensor activation quantization by preventing the generation of problematic tok… ▽ More Despite recent advances in LLM quantization, activation quantization remains to be challenging due to the activation outliers. Conventional remedies, e.g., mixing precisions for different channels, introduce extra overhead and reduce the speedup. In this work, we develop a simple yet effective strategy to facilitate per-tensor activation quantization by preventing the generation of problematic tokens. Precisely, we propose a method to find a set of key-value cache, coined CushionCache, which mitigates outliers in subsequent tokens when inserted as a prefix. CushionCache works in two steps: First, we greedily search for a prompt token sequence that minimizes the maximum activation values in subsequent tokens. Then, we further tune the token cache to regularize the activations of subsequent tokens to be more quantization-friendly. The proposed method successfully addresses activation outliers of LLMs, providing a substantial performance boost for per-tensor activation quantization methods. We thoroughly evaluate our method over a wide range of models and benchmarks and find that it significantly surpasses the established baseline of per-tensor W8A8 quantization and can be seamlessly integrated with the recent activation quantization method. △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.10809 [pdf, other]

Post-hoc Utterance Refining Method by Entity Mining for Faithful Knowledge Grounded Conversations

Authors: Yoonna Jang, Suhyune Son, Jeongwoo Lee, Junyoung Son, Yuna Hur, Jungwoo Lim, Hyeonseok Moon, Kisu Yang, Heuiseok Lim

Abstract: Despite the striking advances in recent language generation performance, model-generated responses have suffered from the chronic problem of hallucinations that are either untrue or unfaithful to a given source. Especially in the task of knowledge grounded conversation, the models are required to generate informative responses, but hallucinated utterances lead to miscommunication. In particular, e… ▽ More Despite the striking advances in recent language generation performance, model-generated responses have suffered from the chronic problem of hallucinations that are either untrue or unfaithful to a given source. Especially in the task of knowledge grounded conversation, the models are required to generate informative responses, but hallucinated utterances lead to miscommunication. In particular, entity-level hallucination that causes critical misinformation and undesirable conversation is one of the major concerns. To address this issue, we propose a post-hoc refinement method called REM. It aims to enhance the quality and faithfulness of hallucinated utterances by refining them based on the source knowledge. If the generated utterance has a low source-faithfulness score with the given knowledge, REM mines the key entities in the knowledge and implicitly uses them for refining the utterances. We verify that our method reduces entity hallucination in the utterance. Also, we show the adaptability and efficacy of REM with extensive experiments and generative results. Our code is available at https://github.com/YOONNAJANG/REM. △ Less

Submitted 16 June, 2024; originally announced June 2024.

Comments: Accepted at EMNLP 2023

arXiv:2406.01431 [pdf, other]

Deep Stochastic Kinematic Models for Probabilistic Motion Forecasting in Traffic

Authors: Laura Zheng, Sanghyun Son, **g Liang, Xijun Wang, Brian Clipp, Ming C. Lin

Abstract: Kinematic priors have shown to be helpful in boosting generalization and performance in prior work on trajectory forecasting. Specifically, kinematic priors have been applied such that models predict a set of actions instead of future output trajectories. By unrolling predicted trajectories via time integration and models of kinematic dynamics, predicted trajectories are not only kinematically fea… ▽ More Kinematic priors have shown to be helpful in boosting generalization and performance in prior work on trajectory forecasting. Specifically, kinematic priors have been applied such that models predict a set of actions instead of future output trajectories. By unrolling predicted trajectories via time integration and models of kinematic dynamics, predicted trajectories are not only kinematically feasible on average but also relate uncertainty from one timestep to the next. With benchmarks supporting prediction of multiple trajectory predictions, deterministic kinematic priors are less and less applicable to current models. We propose a method for integrating probabilistic kinematic priors into modern probabilistic trajectory forecasting architectures. The primary difference between our work and previous techniques is the analytical quantification of variance, or uncertainty, in predicted trajectories. With negligible additional computational overhead, our method can be generalized and easily implemented with any modern probabilistic method that models candidate trajectories as Gaussian distributions. In particular, our method works especially well in unoptimal settings, such as with small datasets or in the presence of noise. Our method achieves up to a 50% performance boost in small dataset settings and up to an 8% performance boost in large-scale learning compared to previous kinematic prediction methods on SOTA trajectory forecasting architectures out-of-the-box, with minimal fine-tuning. In this paper, we show four analytical formulations of probabilistic kinematic priors which can be used for any Gaussian Mixture Model (GMM)-based deep learning models, quantify the error bound on linear approximations applied during trajectory unrolling, and show results to evaluate each formulation in trajectory forecasting. △ Less

Submitted 3 June, 2024; originally announced June 2024.

Comments: 8 pages

arXiv:2405.09862 [pdf, other]

Performance of Quantum Networks Using Heterogeneous Link Architectures

Authors: Kento Samuel Soon, Naphan Benchasattabuse, Michal Hajdušek, Kentaro Teramoto, Shota Nagayama, Rodney Van Meter

Abstract: The heterogeneity of quantum link architectures is an essential theme in designing quantum networks for technological interoperability and possibly performance optimization. However, the performance of heterogeneously connected quantum links has not yet been addressed. Here, we investigate the integration of two inherently different technologies, with one link where the photons flow from the nodes… ▽ More The heterogeneity of quantum link architectures is an essential theme in designing quantum networks for technological interoperability and possibly performance optimization. However, the performance of heterogeneously connected quantum links has not yet been addressed. Here, we investigate the integration of two inherently different technologies, with one link where the photons flow from the nodes toward a device in the middle of the link, and a different link where pairs of photons flow from a device in the middle towards the nodes. We utilize the quantum internet simulator QuISP to conduct simulations. We first optimize the existing photon pair protocol for a single link by taking the pulse rate into account. Here, we find that increasing the pulse rate can actually decrease the overall performance. Using our optimized links, we demonstrate that heterogeneous networks actually work. Their performance is highly dependent on link configuration, but we observe no significant decrease in generation rate compared to homogeneous networks. This work provides insights into the phenomena we likely will observe when introducing technological heterogeneity into quantum networks, which is crucial for creating a scalable and robust quantum internetwork. △ Less

Submitted 16 May, 2024; originally announced May 2024.

Comments: 10 pages, 10 figures

arXiv:2405.09861 [pdf, other]

An Implementation and Analysis of a Practical Quantum Link Architecture Utilizing Entangled Photon Sources

Authors: Kento Samuel Soon, Michal Hajdušek, Shota Nagayama, Naphan Benchasattabuse, Kentaro Teramoto, Ryosuke Satoh, Rodney Van Meter

Abstract: Quantum repeater networks play a crucial role in distributing entanglement. Various link architectures have been proposed to facilitate the creation of Bell pairs between distant nodes, with entangled photon sources emerging as a primary technology for building quantum networks. Our work advances the Memory-Source-Memory (MSM) link architecture, addressing the absence of practical implementation d… ▽ More Quantum repeater networks play a crucial role in distributing entanglement. Various link architectures have been proposed to facilitate the creation of Bell pairs between distant nodes, with entangled photon sources emerging as a primary technology for building quantum networks. Our work advances the Memory-Source-Memory (MSM) link architecture, addressing the absence of practical implementation details. We conduct numerical simulations using the Quantum Internet Simulation Package (QuISP) to analyze the performance of the MSM link and contrast it with other link architectures. We observe a saturation effect in the MSM link, where additional quantum resources do not affect the Bell pair generation rate of the link. By introducing a theoretical model, we explain the origin of this effect and characterize the parameter region where it occurs. Our work bridges theoretical insights with practical implementation, which is crucial for robust and scalable quantum networks. △ Less

Submitted 16 May, 2024; originally announced May 2024.

Comments: 8 pages, 8 figures

arXiv:2405.04537 [pdf, other]

An intuitive multi-frequency feature representation for SO(3)-equivariant networks

Authors: Dongwon Son, Jaehyung Kim, Sanghyeon Son, Beomjoon Kim

Abstract: The usage of 3D vision algorithms, such as shape reconstruction, remains limited because they require inputs to be at a fixed canonical rotation. Recently, a simple equivariant network, Vector Neuron (VN) has been proposed that can be easily used with the state-of-the-art 3D neural network (NN) architectures. However, its performance is limited because it is designed to use only three-dimensional… ▽ More The usage of 3D vision algorithms, such as shape reconstruction, remains limited because they require inputs to be at a fixed canonical rotation. Recently, a simple equivariant network, Vector Neuron (VN) has been proposed that can be easily used with the state-of-the-art 3D neural network (NN) architectures. However, its performance is limited because it is designed to use only three-dimensional features, which is insufficient to capture the details present in 3D data. In this paper, we introduce an equivariant feature representation for map** a 3D point to a high-dimensional feature space. Our feature can discern multiple frequencies present in 3D data, which is the key to designing an expressive feature for 3D vision tasks. Our representation can be used as an input to VNs, and the results demonstrate that with our feature representation, VN captures more details, overcoming the limitation raised in its original paper. △ Less

Submitted 15 March, 2024; originally announced May 2024.

Comments: ICLR 2024

arXiv:2404.13445 [pdf, other]

DMesh: A Differentiable Mesh Representation

Authors: Sanghyun Son, Matheus Gadelha, Yang Zhou, Zexiang Xu, Ming C. Lin, Yi Zhou

Abstract: We present a differentiable representation, DMesh, for general 3D triangular meshes. DMesh considers both the geometry and connectivity information of a mesh. In our design, we first get a set of convex tetrahedra that compactly tessellates the domain based on Weighted Delaunay Triangulation (WDT), and select triangular faces on the tetrahedra to define the final mesh. We formulate probability of… ▽ More We present a differentiable representation, DMesh, for general 3D triangular meshes. DMesh considers both the geometry and connectivity information of a mesh. In our design, we first get a set of convex tetrahedra that compactly tessellates the domain based on Weighted Delaunay Triangulation (WDT), and select triangular faces on the tetrahedra to define the final mesh. We formulate probability of faces to exist on the actual surface in a differentiable manner based on the WDT. This enables DMesh to represent meshes of various topology in a differentiable way, and allows us to reconstruct the mesh under various observations, such as point cloud and multi-view images using gradient-based optimization. The source code and full paper is available at: https://sonsang.github.io/dmesh-project. △ Less

Submitted 1 June, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

Comments: 35 pages, 22 figures. Updated with more analysis and experimental results

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seong** Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han , et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t… ▽ More We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in develo** their sovereign LLMs. △ Less

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2403.16049 [pdf, other]

doi 10.1016/j.chaos.2024.115032

Improving Demand Forecasting in Open Systems with Cartogram-Enhanced Deep Learning

Authors: Sangjoon Park, Yongsung Kwon, Hyungjoon Soh, Mi ** Lee, Seung-Woo Son

Abstract: Predicting temporal patterns across various domains poses significant challenges due to their nuanced and often nonlinear trajectories. To address this challenge, prediction frameworks have been continuously refined, employing data-driven statistical methods, mathematical models, and machine learning. Recently, as one of the challenging systems, shared transport systems such as public bicycles hav… ▽ More Predicting temporal patterns across various domains poses significant challenges due to their nuanced and often nonlinear trajectories. To address this challenge, prediction frameworks have been continuously refined, employing data-driven statistical methods, mathematical models, and machine learning. Recently, as one of the challenging systems, shared transport systems such as public bicycles have gained prominence due to urban constraints and environmental concerns. Predicting rental and return patterns at bicycle stations remains a formidable task due to the system's openness and imbalanced usage patterns across stations. In this study, we propose a deep learning framework to predict rental and return patterns by leveraging cartogram approaches. The cartogram approach facilitates the prediction of demand for newly installed stations with no training data as well as long-period prediction, which has not been achieved before. We apply this method to public bicycle rental-and-return data in Seoul, South Korea, employing a spatial-temporal convolutional graph attention network. Our improved architecture incorporates batch attention and modified node feature updates for better prediction accuracy across different time scales. We demonstrate the effectiveness of our framework in predicting temporal patterns and its potential applications. △ Less

Submitted 26 May, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

Comments: 11 pages, 7 figures

arXiv:2403.14191 [pdf, other]

doi 10.1016/j.compbiomed.2024.108241

PECI-Net: Bolus segmentation from video fluoroscopic swallowing study images using preprocessing ensemble and cascaded inference

Authors: Dougho Park, Younghun Kim, Harim Kang, Junmyeoung Lee, **young Choi, Taeyeon Kim, Sangeok Lee, Seokil Son, Minsol Kim, Injung Kim

Abstract: Bolus segmentation is crucial for the automated detection of swallowing disorders in videofluoroscopic swallowing studies (VFSS). However, it is difficult for the model to accurately segment a bolus region in a VFSS image because VFSS images are translucent, have low contrast and unclear region boundaries, and lack color information. To overcome these challenges, we propose PECI-Net, a network arc… ▽ More Bolus segmentation is crucial for the automated detection of swallowing disorders in videofluoroscopic swallowing studies (VFSS). However, it is difficult for the model to accurately segment a bolus region in a VFSS image because VFSS images are translucent, have low contrast and unclear region boundaries, and lack color information. To overcome these challenges, we propose PECI-Net, a network architecture for VFSS image analysis that combines two novel techniques: the preprocessing ensemble network (PEN) and the cascaded inference network (CIN). PEN enhances the sharpness and contrast of the VFSS image by combining multiple preprocessing algorithms in a learnable way. CIN reduces ambiguity in bolus segmentation by using context from other regions through cascaded inference. Moreover, CIN prevents undesirable side effects from unreliably segmented regions by referring to the context in an asymmetric way. In experiments, PECI-Net exhibited higher performance than four recently developed baseline models, outperforming TernausNet, the best among the baseline models, by 4.54\% and the widely used UNet by 10.83\%. The results of the ablation studies confirm that CIN and PEN are effective in improving bolus segmentation performance. △ Less

Submitted 21 March, 2024; originally announced March 2024.

Comments: 20 pages, 8 figures,

Journal ref: Computers in Biology and Medicine (2024)

arXiv:2403.01479 [pdf, other]

Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation

Authors: Heegon **, Seonil Son, Jemin Park, Youngseok Kim, Hyungjong Noh, Yeonsoo Lee

Abstract: The advent of scalable deep models and large datasets has improved the performance of Neural Machine Translation. Knowledge Distillation (KD) enhances efficiency by transferring knowledge from a teacher model to a more compact student model. However, KD approaches to Transformer architecture often rely on heuristics, particularly when deciding which teacher layers to distill from. In this paper, w… ▽ More The advent of scalable deep models and large datasets has improved the performance of Neural Machine Translation. Knowledge Distillation (KD) enhances efficiency by transferring knowledge from a teacher model to a more compact student model. However, KD approaches to Transformer architecture often rely on heuristics, particularly when deciding which teacher layers to distill from. In this paper, we introduce the 'Align-to-Distill' (A2D) strategy, designed to address the feature map** problem by adaptively aligning student attention heads with their teacher counterparts during training. The Attention Alignment Module in A2D performs a dense head-by-head comparison between student and teacher attention heads across layers, turning the combinatorial map** heuristics into a learning problem. Our experiments show the efficacy of A2D, demonstrating gains of up to +3.61 and +0.63 BLEU points for WMT-2022 De->Dsb and WMT-2014 En->De, respectively, compared to Transformer baselines. △ Less

Submitted 25 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

Comments: Accepted to LREC-COLING 2024

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2402.14886 [pdf]

Applying Reinforcement Learning to Optimize Traffic Light Cycles

Authors: Seungah Son, Juhee **

Abstract: Manual optimization of traffic light cycles is a complex and time-consuming task, necessitating the development of automated solutions. In this paper, we propose the application of reinforcement learning to optimize traffic light cycles in real-time. We present a case study using the Simulation Urban Mobility simulator to train a Deep Q-Network algorithm. The experimental results showed 44.16% dec… ▽ More Manual optimization of traffic light cycles is a complex and time-consuming task, necessitating the development of automated solutions. In this paper, we propose the application of reinforcement learning to optimize traffic light cycles in real-time. We present a case study using the Simulation Urban Mobility simulator to train a Deep Q-Network algorithm. The experimental results showed 44.16% decrease in the average number of Emergency stops, showing the potential of our approach to reduce traffic congestion and improve traffic flow. Furthermore, we discuss avenues for future research and enhancements to the reinforcement learning model. △ Less

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.14854 [pdf, other]

A Dual-Prompting for Interpretable Mental Health Language Models

Authors: Hyolim Jeon, Dongje Yoo, Daeun Lee, Sejung Son, Seungbae Kim, **young Han

Abstract: Despite the increasing demand for AI-based mental health monitoring tools, their practical utility for clinicians is limited by the lack of interpretability.The CLPsych 2024 Shared Task (Chim et al., 2024) aims to enhance the interpretability of Large Language Models (LLMs), particularly in mental health analysis, by providing evidence of suicidality through linguistic content. We propose a dual-p… ▽ More Despite the increasing demand for AI-based mental health monitoring tools, their practical utility for clinicians is limited by the lack of interpretability.The CLPsych 2024 Shared Task (Chim et al., 2024) aims to enhance the interpretability of Large Language Models (LLMs), particularly in mental health analysis, by providing evidence of suicidality through linguistic content. We propose a dual-prompting approach: (i) Knowledge-aware evidence extraction by leveraging the expert identity and a suicide dictionary with a mental health-specific LLM; and (ii) Evidence summarization by employing an LLM-based consistency evaluator. Comprehensive experiments demonstrate the effectiveness of combining domain-specific information, revealing performance improvements and the approach's potential to aid clinicians in assessing mental state progression. △ Less

Submitted 20 February, 2024; originally announced February 2024.

Journal ref: Proceedings of the Ninth Workshop on Computational Linguistics and Clinical Psychology 2024

arXiv:2401.15726 [pdf, other]

Long-Term Typhoon Trajectory Prediction: A Physics-Conditioned Approach Without Reanalysis Data

Authors: Young-Jae Park, Minseok Seo, Doyi Kim, Hyeri Kim, Sanghoon Choi, Beomkyu Choi, Jeongwon Ryu, Sohee Son, Hae-Gon Jeon, Yeji Choi

Abstract: In the face of escalating climate changes, typhoon intensities and their ensuing damage have surged. Accurate trajectory prediction is crucial for effective damage control. Traditional physics-based models, while comprehensive, are computationally intensive and rely heavily on the expertise of forecasters. Contemporary data-driven methods often rely on reanalysis data, which can be considered to b… ▽ More In the face of escalating climate changes, typhoon intensities and their ensuing damage have surged. Accurate trajectory prediction is crucial for effective damage control. Traditional physics-based models, while comprehensive, are computationally intensive and rely heavily on the expertise of forecasters. Contemporary data-driven methods often rely on reanalysis data, which can be considered to be the closest to the true representation of weather conditions. However, reanalysis data is not produced in real-time and requires time for adjustment because prediction models are calibrated with observational data. This reanalysis data, such as ERA5, falls short in challenging real-world situations. Optimal preparedness necessitates predictions at least 72 hours in advance, beyond the capabilities of standard physics models. In response to these constraints, we present an approach that harnesses real-time Unified Model (UM) data, sidestep** the limitations of reanalysis data. Our model provides predictions at 6-hour intervals for up to 72 hours in advance and outperforms both state-of-the-art data-driven methods and numerical weather prediction models. In line with our efforts to mitigate adversities inflicted by \rthree{typhoons}, we release our preprocessed \textit{PHYSICS TRACK} dataset, which includes ERA5 reanalysis data, typhoon best-track, and UM forecast data. △ Less

Submitted 28 January, 2024; originally announced January 2024.

Comments: This paper was accepted for a Spotlight presentation at ICLR 2024

arXiv:2401.00460 [pdf, other]

RainSD: Rain Style Diversification Module for Image Synthesis Enhancement using Feature-Level Style Distribution

Authors: Hyeonjae Jeon, Junghyun Seo, Taesoo Kim, Sungho Son, Jungki Lee, Gyeungho Choi, Yongseob Lim

Abstract: Autonomous driving technology nowadays targets to level 4 or beyond, but the researchers are faced with some limitations for develo** reliable driving algorithms in diverse challenges. To promote the autonomous vehicles to spread widely, it is important to address safety issues on this technology. Among various safety concerns, the sensor blockage problem by severe weather conditions can be one… ▽ More Autonomous driving technology nowadays targets to level 4 or beyond, but the researchers are faced with some limitations for develo** reliable driving algorithms in diverse challenges. To promote the autonomous vehicles to spread widely, it is important to address safety issues on this technology. Among various safety concerns, the sensor blockage problem by severe weather conditions can be one of the most frequent threats for multi-task learning based perception algorithms during autonomous driving. To handle this problem, the importance of the generation of proper datasets is becoming more significant. In this paper, a synthetic road dataset with sensor blockage generated from real road dataset BDD100K is suggested in the format of BDD100K annotation. Rain streaks for each frame were made by an experimentally established equation and translated utilizing the image-to-image translation network based on style transfer. Using this dataset, the degradation of the diverse multi-task networks for autonomous driving, such as lane detection, driving area segmentation, and traffic object detection, has been thoroughly evaluated and analyzed. The tendency of the performance degradation of deep neural network-based perception systems for autonomous vehicle has been analyzed in depth. Finally, we discuss the limitation and the future directions of the deep neural network-based perception algorithms and autonomous driving dataset generation based on image-to-image translation. △ Less

Submitted 31 December, 2023; originally announced January 2024.

Comments: Under Review

MSC Class: 14J60 (Autonomous Vehicles) 14F05; 14J26 (Adverse Weather Condition)

arXiv:2312.08710 [pdf, other]

Gradient Informed Proximal Policy Optimization

Authors: Sanghyun Son, Laura Yu Zheng, Ryan Sullivan, Yi-Ling Qiao, Ming C. Lin

Abstract: We introduce a novel policy learning method that integrates analytical gradients from differentiable environments with the Proximal Policy Optimization (PPO) algorithm. To incorporate analytical gradients into the PPO framework, we introduce the concept of an α-policy that stands as a locally superior policy. By adaptively modifying the α value, we can effectively manage the influence of analytica… ▽ More We introduce a novel policy learning method that integrates analytical gradients from differentiable environments with the Proximal Policy Optimization (PPO) algorithm. To incorporate analytical gradients into the PPO framework, we introduce the concept of an α-policy that stands as a locally superior policy. By adaptively modifying the α value, we can effectively manage the influence of analytical policy gradients during learning. To this end, we suggest metrics for assessing the variance and bias of analytical gradients, reducing dependence on these gradients when high variance or bias is detected. Our proposed approach outperforms baseline algorithms in various scenarios, such as function optimization, physics simulations, and traffic control environments. Our code can be found online: https://github.com/SonSang/gippo. △ Less

Submitted 14 December, 2023; originally announced December 2023.

Comments: 27 pages, NeurIPS 2023 Conference

arXiv:2310.11710 [pdf, other]

doi 10.18653/v1/2023.emnlp-main.577

Learning Co-Speech Gesture for Multimodal Aphasia Type Detection

Authors: Daeun Lee, Sejung Son, Hyolim Jeon, Seungbae Kim, **young Han

Abstract: Aphasia, a language disorder resulting from brain damage, requires accurate identification of specific aphasia types, such as Broca's and Wernicke's aphasia, for effective treatment. However, little attention has been paid to develo** methods to detect different types of aphasia. Recognizing the importance of analyzing co-speech gestures for distinguish aphasia types, we propose a multimodal gra… ▽ More Aphasia, a language disorder resulting from brain damage, requires accurate identification of specific aphasia types, such as Broca's and Wernicke's aphasia, for effective treatment. However, little attention has been paid to develo** methods to detect different types of aphasia. Recognizing the importance of analyzing co-speech gestures for distinguish aphasia types, we propose a multimodal graph neural network for aphasia type detection using speech and corresponding gesture patterns. By learning the correlation between the speech and gesture modalities for each aphasia type, our model can generate textual representations sensitive to gesture information, leading to accurate aphasia type detection. Extensive experiments demonstrate the superiority of our approach over existing methods, achieving state-of-the-art results (F1 84.2\%). We also show that gesture features outperform acoustic features, highlighting the significance of gesture expression in detecting aphasia types. We provide the codes for reproducibility purposes. △ Less

Submitted 20 October, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

Comments: EMNLP 2023 accepted

Journal ref: EMNLP 2023:Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

arXiv:2308.02596 [pdf, other]

doi 10.1007/s40042-023-00921-8

Revisiting small-world network models: Exploring technical realizations and the equivalence of the Newman-Watts and Harary models

Authors: Seora Son, Eun Ji Choi, Sang Hoon Lee

Abstract: We address the relatively less known facts on the equivalence and technical realizations surrounding two network models showing the "small-world" property, namely the Newman-Watts and the Harary models. We provide the most accurate (in terms of faithfulness to the original literature) versions of these models to clarify the deviation from them existing in their variants adopted in one of the most… ▽ More We address the relatively less known facts on the equivalence and technical realizations surrounding two network models showing the "small-world" property, namely the Newman-Watts and the Harary models. We provide the most accurate (in terms of faithfulness to the original literature) versions of these models to clarify the deviation from them existing in their variants adopted in one of the most popular network analysis packages. The difference in technical realizations of those models could be conceived as minor details, but we discover significantly notable changes caused by the possibly inadvertent modification. For the Harary model, the stochasticity in the original formulation allows a much wider range of the clustering coefficient and the average shortest path length. For the Newman-Watts model, due to the drastically different degree distributions, the clustering coefficient can also be affected, which is verified by our higher-order analytic derivation. During the process, we discover the equivalence of the Newman-Watts (better known in the network science or physics community) and the Harary (better known in the graph theory or mathematics community) models under a specific condition of restricted parity in variables, which would bridge the two relatively independently developed models in different fields. Our result highlights the importance of each detailed step in constructing network models and the possibility of deeply related models, even if they might initially appear distinct in terms of the time period or the academic disciplines from which they emerged. △ Less

Submitted 12 December, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

Comments: 11 pages, 5 figures, 1 table

Journal ref: J. Korean Phys. Soc. 83, 879 (2023)

arXiv:2307.12751 [pdf, other]

ICF-SRSR: Invertible scale-Conditional Function for Self-Supervised Real-world Single Image Super-Resolution

Authors: Reyhaneh Neshatavar, Mohsen Yavartanoo, Sanghyun Son, Kyoung Mu Lee

Abstract: Single image super-resolution (SISR) is a challenging ill-posed problem that aims to up-sample a given low-resolution (LR) image to a high-resolution (HR) counterpart. Due to the difficulty in obtaining real LR-HR training pairs, recent approaches are trained on simulated LR images degraded by simplified down-sampling operators, e.g., bicubic. Such an approach can be problematic in practice becaus… ▽ More Single image super-resolution (SISR) is a challenging ill-posed problem that aims to up-sample a given low-resolution (LR) image to a high-resolution (HR) counterpart. Due to the difficulty in obtaining real LR-HR training pairs, recent approaches are trained on simulated LR images degraded by simplified down-sampling operators, e.g., bicubic. Such an approach can be problematic in practice because of the large gap between the synthesized and real-world LR images. To alleviate the issue, we propose a novel Invertible scale-Conditional Function (ICF), which can scale an input image and then restore the original input with different scale conditions. By leveraging the proposed ICF, we construct a novel self-supervised SISR framework (ICF-SRSR) to handle the real-world SR task without using any paired/unpaired training data. Furthermore, our ICF-SRSR can generate realistic and feasible LR-HR pairs, which can make existing supervised SISR networks more robust. Extensive experiments demonstrate the effectiveness of the proposed method in handling SISR in a fully self-supervised manner. Our ICF-SRSR demonstrates superior performance compared to the existing methods trained on synthetic paired images in real-world scenarios and exhibits comparable performance compared to state-of-the-art supervised/unsupervised methods on public benchmark datasets. △ Less

Submitted 31 August, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

arXiv:2307.00995 [pdf, other]

doi 10.1145/3580305.3599917

Towards Suicide Prevention from Bipolar Disorder with Temporal Symptom-Aware Multitask Learning

Authors: Daeun Lee, Sejung Son, Hyolim Jeon, Seungbae Kim, **young Han

Abstract: Bipolar disorder (BD) is closely associated with an increased risk of suicide. However, while the prior work has revealed valuable insight into understanding the behavior of BD patients on social media, little attention has been paid to develo** a model that can predict the future suicidality of a BD patient. Therefore, this study proposes a multi-task learning model for predicting the future su… ▽ More Bipolar disorder (BD) is closely associated with an increased risk of suicide. However, while the prior work has revealed valuable insight into understanding the behavior of BD patients on social media, little attention has been paid to develo** a model that can predict the future suicidality of a BD patient. Therefore, this study proposes a multi-task learning model for predicting the future suicidality of BD patients by jointly learning current symptoms. We build a novel BD dataset clinically validated by psychiatrists, including 14 years of posts on bipolar-related subreddits written by 818 BD patients, along with the annotations of future suicidality and BD symptoms. We also suggest a temporal symptom-aware attention mechanism to determine which symptoms are the most influential for predicting future suicidality over time through a sequence of BD posts. Our experiments demonstrate that the proposed model outperforms the state-of-the-art models in both BD symptom identification and future suicidality prediction tasks. In addition, the proposed temporal symptom-aware attention provides interpretable attention weights, hel** clinicians to apprehend BD patients more comprehensively and to provide timely intervention by tracking mental state progression. △ Less

Submitted 3 July, 2023; originally announced July 2023.

Comments: KDD 2023 accepted

Journal ref: KDD 2023: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

arXiv:2306.06461 [pdf]

Semi-supervsied Learning-based Sound Event Detection using Freuqency Dynamic Convolution with Large Kernel Attention for DCASE Challenge 2023 Task 4

Authors: Ji Won Kim, Sang Won Son, Yoonah Song, Hong Kook Kim, Il Hoon Song, Jeong Eun Lim

Abstract: This report proposes a frequency dynamic convolution (FDY) with a large kernel attention (LKA)-convolutional recurrent neural network (CRNN) with a pre-trained bidirectional encoder representation from audio transformers (BEATs) embedding-based sound event detection (SED) model that employs a mean-teacher and pseudo-label approach to address the challenge of limited labeled data for DCASE 2023 Tas… ▽ More This report proposes a frequency dynamic convolution (FDY) with a large kernel attention (LKA)-convolutional recurrent neural network (CRNN) with a pre-trained bidirectional encoder representation from audio transformers (BEATs) embedding-based sound event detection (SED) model that employs a mean-teacher and pseudo-label approach to address the challenge of limited labeled data for DCASE 2023 Task 4. The proposed FDY with LKA integrates the FDY and LKA module to effectively capture time-frequency patterns, long-term dependencies, and high-level semantic information in audio signals. The proposed FDY with LKA-CRNN with a BEATs embedding network is initially trained on the entire DCASE 2023 Task 4 dataset using the mean-teacher approach, generating pseudo-labels for weakly labeled, unlabeled, and the AudioSet. Subsequently, the proposed SED model is retrained using the same pseudo-label approach. A subset of these models is selected for submission, demonstrating superior F1-scores and polyphonic SED score performance on the DCASE 2023 Challenge Task 4 validation dataset. △ Less

Submitted 10 June, 2023; originally announced June 2023.

Comments: DCASE 2023 Challenge Task 4A, 5 pages

arXiv:2305.15417 [pdf, other]

Entropy-Aware Similarity for Balanced Clustering: A Case Study with Melanoma Detection

Authors: Seok Bin Son, Soohyun Park, Joongheon Kim

Abstract: Clustering data is an unsupervised learning approach that aims to divide a set of data points into multiple groups. It is a crucial yet demanding subject in machine learning and data mining. Its successful applications span various fields. However, conventional clustering techniques necessitate the consideration of balance significance in specific applications. Therefore, this paper addresses the… ▽ More Clustering data is an unsupervised learning approach that aims to divide a set of data points into multiple groups. It is a crucial yet demanding subject in machine learning and data mining. Its successful applications span various fields. However, conventional clustering techniques necessitate the consideration of balance significance in specific applications. Therefore, this paper addresses the challenge of imbalanced clustering problems and presents a new method for balanced clustering by utilizing entropy-aware similarity, which can be defined as the degree of balances. We have coined the term, entropy-aware similarity for balanced clustering (EASB), which maximizes balance during clustering by complementary clustering of unbalanced data and incorporating entropy in a novel similarity formula that accounts for both angular differences and distances. The effectiveness of the proposed approach is evaluated on actual melanoma medial data, specifically the International Skin Imaging Collaboration (ISIC) 2019 and 2020 challenge datasets, to demonstrate how it can successfully cluster the data while preserving balance. Lastly, we can confirm that the proposed method exhibited outstanding performance in detecting melanoma, comparing to classical methods. △ Less

Submitted 11 May, 2023; originally announced May 2023.

arXiv:2305.14032 [pdf, other]

doi 10.21437/Interspeech.2023-1426

Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification

Authors: Sangmin Bae, June-Woo Kim, Won-Yang Cho, Hyerim Baek, Soyoun Son, Byungjo Lee, Changwan Ha, Kyongpil Tae, Sungnyun Kim, Se-Young Yun

Abstract: Respiratory sound contains crucial information for the early diagnosis of fatal lung diseases. Since the COVID-19 pandemic, there has been a growing interest in contact-free medical care based on electronic stethoscopes. To this end, cutting-edge deep learning models have been developed to diagnose lung diseases; however, it is still challenging due to the scarcity of medical data. In this study,… ▽ More Respiratory sound contains crucial information for the early diagnosis of fatal lung diseases. Since the COVID-19 pandemic, there has been a growing interest in contact-free medical care based on electronic stethoscopes. To this end, cutting-edge deep learning models have been developed to diagnose lung diseases; however, it is still challenging due to the scarcity of medical data. In this study, we demonstrate that the pretrained model on large-scale visual and audio datasets can be generalized to the respiratory sound classification task. In addition, we introduce a straightforward Patch-Mix augmentation, which randomly mixes patches between different samples, with Audio Spectrogram Transformer (AST). We further propose a novel and effective Patch-Mix Contrastive Learning to distinguish the mixed representations in the latent space. Our method achieves state-of-the-art performance on the ICBHI dataset, outperforming the prior leading score by an improvement of 4.08%. △ Less

Submitted 22 November, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: INTERSPEECH 2023, Code URL: https://github.com/raymin0223/patch-mix_contrastive_learning

arXiv:2305.06110 [pdf, other]

Pavlok-Nudge: A Feedback Mechanism for Atomic Behaviour Modification with Snoring Usecase

Authors: Shreya Ghosh, Md Rakibul Hasan, Pradyumna Agrawal, Zhixi Cai, Susannah Soon, Abhinav Dhall, Tom Gedeon

Abstract: This paper proposes a feedback mechanism to 'break bad habits' using the Pavlok device. Pavlok utilises beeps, vibration and shocks as a mode of aversion technique to help individuals with behaviour modification. While the device can be useful in certain periodic daily life situations, like alarms and exercise notifications, the device relies on manual operations that limit its usage. To this end,… ▽ More This paper proposes a feedback mechanism to 'break bad habits' using the Pavlok device. Pavlok utilises beeps, vibration and shocks as a mode of aversion technique to help individuals with behaviour modification. While the device can be useful in certain periodic daily life situations, like alarms and exercise notifications, the device relies on manual operations that limit its usage. To this end, we design a user interface to generate an automatic feedback mechanism that integrates Pavlok and a deep learning based model to detect certain behaviours via an integrated user interface i.e. mobile or desktop application. Our proposed solution is implemented and verified in the context of snoring, which first detects audio from the environment following a prediction of whether the audio content is a snore or not. Based on the prediction of the deep learning model, we use Pavlok to alert users for preventive measures. We believe that this simple solution can help people to change their atomic habits, which may lead to long-term benefits. △ Less

Submitted 10 May, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

Comments: Shreya Ghosh, Md Rakibul Hasan and Pradyumna Agrawal contributed equally to this research

arXiv:2302.13509 [pdf, other]

GeoLCR: Attention-based Geometric Loop Closure and Registration

Authors: **g Liang, Sanghyun Son, Ming Lin, Dinesh Manocha

Abstract: We present a novel algorithm specially designed for loop detection and registration that utilizes Lidar-based perception. Our approach to loop detection involves voxelizing point clouds, followed by an overlap calculation to confirm whether a vehicle has completed a loop. We further enhance the current pose's accuracy via an innovative point-level registration model. The efficacy of our algorithm… ▽ More We present a novel algorithm specially designed for loop detection and registration that utilizes Lidar-based perception. Our approach to loop detection involves voxelizing point clouds, followed by an overlap calculation to confirm whether a vehicle has completed a loop. We further enhance the current pose's accuracy via an innovative point-level registration model. The efficacy of our algorithm has been assessed across a range of well-known datasets, including KITTI, KITTI-360, Nuscenes, Complex Urban, NCLT, and MulRan. In comparative terms, our method exhibits up to a twofold increase in the precision of both translation and rotation estimations. Particularly noteworthy is our method's performance on challenging sequences where it outperforms others, being the first to achieve a perfect 100% success rate in loop detection. △ Less

Submitted 16 July, 2023; v1 submitted 26 February, 2023; originally announced February 2023.

arXiv:2302.10494 [pdf, other]

MaskedKD: Efficient Distillation of Vision Transformers with Masked Images

Authors: Seungwoo Son, Namhoon Lee, Jaeho Lee

Abstract: Knowledge distillation is an effective method for training lightweight models, but it introduces a significant amount of computational overhead to the training cost, as the method requires acquiring teacher supervisions on training samples. This additional cost -- called distillation cost -- is most pronounced when we employ large-scale teacher models such as vision transformers (ViTs). We present… ▽ More Knowledge distillation is an effective method for training lightweight models, but it introduces a significant amount of computational overhead to the training cost, as the method requires acquiring teacher supervisions on training samples. This additional cost -- called distillation cost -- is most pronounced when we employ large-scale teacher models such as vision transformers (ViTs). We present MaskedKD, a simple yet effective strategy that can significantly reduce the cost of distilling ViTs without sacrificing the prediction accuracy of the student model. Specifically, MaskedKD diminishes the cost of running teacher at inference by masking a fraction of image patch tokens fed to the teacher, and therefore skip** the computations required to process those patches. The mask locations are selected to prevent masking away the core features of an image that the student model uses for prediction. This mask selection mechanism operates based on some attention score of the student model, which is already computed during the student forward pass, and thus incurs almost no additional computation. Without sacrificing the final student accuracy, MaskedKD dramatically reduces the amount of computations required for distilling ViTs. We demonstrate that MaskedKD can save up the distillation cost by $50\%$ without any student performance drop, leading to approximately $28\%$ drop in the overall training FLOPs. △ Less

Submitted 31 May, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

arXiv:2211.12118 [pdf, other]

HaRiM$^+$: Evaluating Summary Quality with Hallucination Risk

Authors: Seonil Son, Junsoo Park, Jeong-in Hwang, Junghwa Lee, Hyungjong Noh, Yeonsoo Lee

Abstract: One of the challenges of develo** a summarization model arises from the difficulty in measuring the factual inconsistency of the generated text. In this study, we reinterpret the decoder overconfidence-regularizing objective suggested in (Miao et al., 2021) as a hallucination risk measurement to better estimate the quality of generated summaries. We propose a reference-free metric, HaRiM+, which… ▽ More One of the challenges of develo** a summarization model arises from the difficulty in measuring the factual inconsistency of the generated text. In this study, we reinterpret the decoder overconfidence-regularizing objective suggested in (Miao et al., 2021) as a hallucination risk measurement to better estimate the quality of generated summaries. We propose a reference-free metric, HaRiM+, which only requires an off-the-shelf summarization model to compute the hallucination risk based on token likelihoods. Deploying it requires no additional training of models or ad-hoc modules, which usually need alignment to human judgments. For summary-quality estimation, HaRiM+ records state-of-the-art correlation to human judgment on three summary-quality annotation sets: FRANK, QAGS, and SummEval. We hope that our work, which merits the use of summarization models, facilitates the progress of both automated evaluation and generation of summary. △ Less

Submitted 24 November, 2022; v1 submitted 22 November, 2022; originally announced November 2022.

Comments: 9 pages (+ 21 pages of Appendix), AACL 2022

Journal ref: Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (AACL-IJCNLP 2022), pages 895-924

arXiv:2210.10482 [pdf, other]

Effective Targeted Attacks for Adversarial Self-Supervised Learning

Authors: Minseon Kim, Hyeonjeong Ha, Sooel Son, Sung Ju Hwang

Abstract: Recently, unsupervised adversarial training (AT) has been highlighted as a means of achieving robustness in models without any label information. Previous studies in unsupervised AT have mostly focused on implementing self-supervised learning (SSL) frameworks, which maximize the instance-wise classification loss to generate adversarial examples. However, we observe that simply maximizing the self-… ▽ More Recently, unsupervised adversarial training (AT) has been highlighted as a means of achieving robustness in models without any label information. Previous studies in unsupervised AT have mostly focused on implementing self-supervised learning (SSL) frameworks, which maximize the instance-wise classification loss to generate adversarial examples. However, we observe that simply maximizing the self-supervised training loss with an untargeted adversarial attack often results in generating ineffective adversaries that may not help improve the robustness of the trained model, especially for non-contrastive SSL frameworks without negative examples. To tackle this problem, we propose a novel positive mining for targeted adversarial attack to generate effective adversaries for adversarial SSL frameworks. Specifically, we introduce an algorithm that selects the most confusing yet similar target example for a given instance based on entropy and similarity, and subsequently perturbs the given instance towards the selected target. Our method demonstrates significant enhancements in robustness when applied to non-contrastive SSL frameworks, and less but consistent robustness improvements with contrastive SSL frameworks, on the benchmark datasets. △ Less

Submitted 26 October, 2023; v1 submitted 19 October, 2022; originally announced October 2022.

Comments: NeurIPS 2023

arXiv:2210.08046 [pdf, other]

Differentiable Hybrid Traffic Simulation

Authors: Sanghyun Son, Yi-Ling Qiao, Jason Sewall, Ming C. Lin

Abstract: We introduce a novel differentiable hybrid traffic simulator, which simulates traffic using a hybrid model of both macroscopic and microscopic models and can be directly integrated into a neural network for traffic control and flow optimization. This is the first differentiable traffic simulator for macroscopic and hybrid models that can compute gradients for traffic states across time steps and i… ▽ More We introduce a novel differentiable hybrid traffic simulator, which simulates traffic using a hybrid model of both macroscopic and microscopic models and can be directly integrated into a neural network for traffic control and flow optimization. This is the first differentiable traffic simulator for macroscopic and hybrid models that can compute gradients for traffic states across time steps and inhomogeneous lanes. To compute the gradient flow between two types of traffic models in a hybrid framework, we present a novel intermediate conversion component that bridges the lanes in a differentiable manner as well. We also show that we can use analytical gradients to accelerate the overall process and enhance scalability. Thanks to these gradients, our simulator can provide more efficient and scalable solutions for complex learning and control problems posed in traffic engineering than other existing algorithms. Refer to https://sites.google.com/umd.edu/diff-hybrid-traffic-sim for our project. △ Less

Submitted 14 October, 2022; originally announced October 2022.

Comments: 13 pages, Siggraph Asia 2022 Journal Paper

ACM Class: I.6.1; I.6.3

arXiv:2210.03772 [pdf, other]

Traffic-Aware Autonomous Driving with Differentiable Traffic Simulation

Authors: Laura Zheng, Sanghyun Son, Ming C. Lin

Abstract: While there have been advancements in autonomous driving control and traffic simulation, there have been little to no works exploring their unification with deep learning. Works in both areas seem to focus on entirely different exclusive problems, yet traffic and driving are inherently related in the real world. In this paper, we present Traffic-Aware Autonomous Driving (TrAAD), a generalizable di… ▽ More While there have been advancements in autonomous driving control and traffic simulation, there have been little to no works exploring their unification with deep learning. Works in both areas seem to focus on entirely different exclusive problems, yet traffic and driving are inherently related in the real world. In this paper, we present Traffic-Aware Autonomous Driving (TrAAD), a generalizable distillation-style method for traffic-informed imitation learning that directly optimizes for faster traffic flow and lower energy consumption. TrAAD focuses on the supervision of speed control in imitation learning systems, as most driving research focuses on perception and steering. Moreover, our method addresses the lack of co-simulation between traffic and driving simulators and provides a basis for directly involving traffic simulation with autonomous driving in future work. Our results show that, with information from traffic simulation involved in the supervision of imitation learning methods, an autonomous vehicle can learn how to accelerate in a fashion that is beneficial for traffic flow and overall energy consumption for all nearby vehicles. △ Less

Submitted 6 April, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

arXiv:2209.10922 [pdf, other]

Learning to Write with Coherence From Negative Examples

Authors: Seonil Son, Jaeseo Lim, Youwon Jang, Jaeyoung Lee, Byoung-Tak Zhang

Abstract: Coherence is one of the critical factors that determine the quality of writing. We propose writing relevance (WR) training method for neural encoder-decoder natural language generation (NLG) models which improves coherence of the continuation by leveraging negative examples. WR loss regresses the vector representation of the context and generated sentence toward positive continuation by contrastin… ▽ More Coherence is one of the critical factors that determine the quality of writing. We propose writing relevance (WR) training method for neural encoder-decoder natural language generation (NLG) models which improves coherence of the continuation by leveraging negative examples. WR loss regresses the vector representation of the context and generated sentence toward positive continuation by contrasting it with the negatives. We compare our approach with Unlikelihood (UL) training in a text continuation task on commonsense natural language inference (NLI) corpora to show which method better models the coherence by avoiding unlikely continuations. The preference of our approach in human evaluation shows the efficacy of our method in improving coherence. △ Less

Submitted 22 September, 2022; originally announced September 2022.

Comments: 4+1 pages, 4 figures, 2 tables. ICASSP 2022 rejected

arXiv:2209.09777 [pdf, other]

WGICP: Differentiable Weighted GICP-Based Lidar Odometry

Authors: Sanghyun Son, **g Liang, Ming Lin, Dinesh Manocha

Abstract: We present a novel differentiable weighted generalized iterative closest point (WGICP) method applicable to general 3D point cloud data, including that from Lidar. Our method builds on differentiable generalized ICP (GICP), and we propose using the differentiable K-Nearest Neighbor (KNN) algorithm to enhance differentiability. The differentiable GICP algorithm provides the gradient of output pose… ▽ More We present a novel differentiable weighted generalized iterative closest point (WGICP) method applicable to general 3D point cloud data, including that from Lidar. Our method builds on differentiable generalized ICP (GICP), and we propose using the differentiable K-Nearest Neighbor (KNN) algorithm to enhance differentiability. The differentiable GICP algorithm provides the gradient of output pose estimation with respect to each input point, which allows us to train a neural network to predict its importance, or weight, in estimating the correct pose. In contrast to the other ICP-based methods that use voxel-based downsampling or matching methods to reduce the computational cost, our method directly reduces the number of points used for GICP by only selecting those with the highest weights and ignoring redundant ones with lower weights. We show that our method improves both accuracy and speed of the GICP algorithm for the KITTI dataset and can be used to develop a more robust and efficient SLAM system. △ Less

Submitted 3 October, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

Comments: 6 pages

arXiv:2209.06422 [pdf, other]

Language Chameleon: Transformation analysis between languages using Cross-lingual Post-training based on Pre-trained language models

Authors: Suhyune Son, Chanjun Park, Jungseob Lee, Midan Shim, Chanhee Lee, Yoonna Jang, Jaehyung Seo, Heuiseok Lim

Abstract: As pre-trained language models become more resource-demanding, the inequality between resource-rich languages such as English and resource-scarce languages is worsening. This can be attributed to the fact that the amount of available training data in each language follows the power-law distribution, and most of the languages belong to the long tail of the distribution. Some research areas attempt… ▽ More As pre-trained language models become more resource-demanding, the inequality between resource-rich languages such as English and resource-scarce languages is worsening. This can be attributed to the fact that the amount of available training data in each language follows the power-law distribution, and most of the languages belong to the long tail of the distribution. Some research areas attempt to mitigate this problem. For example, in cross-lingual transfer learning and multilingual training, the goal is to benefit long-tail languages via the knowledge acquired from resource-rich languages. Although being successful, existing work has mainly focused on experimenting on as many languages as possible. As a result, targeted in-depth analysis is mostly absent. In this study, we focus on a single low-resource language and perform extensive evaluation and probing experiments using cross-lingual post-training (XPT). To make the transfer scenario challenging, we choose Korean as the target language, as it is a language isolate and thus shares almost no typology with English. Results show that XPT not only outperforms or performs on par with monolingual models trained with orders of magnitudes more data but also is highly efficient in the transfer process. △ Less

Submitted 14 September, 2022; originally announced September 2022.

arXiv:2209.00862 [pdf, other]

Spatio-Temporal Attack Course-of-Action (COA) Search Learning for Scalable and Time-Varying Networks

Authors: Haemin Lee, Seok Bin Son, Won Joon Yun, Joongheon Kim, Soyi Jung, Dong Hwa Kim

Abstract: One of the key topics in network security research is the autonomous COA (Couse-of-Action) attack search method. Traditional COA attack search methods that passively search for attacks can be difficult, especially as the network gets bigger. To address these issues, new autonomous COA techniques are being developed, and among them, an intelligent spatial algorithm is designed in this paper for eff… ▽ More One of the key topics in network security research is the autonomous COA (Couse-of-Action) attack search method. Traditional COA attack search methods that passively search for attacks can be difficult, especially as the network gets bigger. To address these issues, new autonomous COA techniques are being developed, and among them, an intelligent spatial algorithm is designed in this paper for efficient operations in scalable networks. On top of the spatial search, a Monte-Carlo (MC)- based temporal approach is additionally considered for taking care of time-varying network behaviors. Therefore, we propose a spatio-temporal attack COA search algorithm for scalable and time-varying networks. △ Less

Submitted 2 September, 2022; originally announced September 2022.

arXiv:2205.13763 [pdf, other]

Tutorial on Course-of-Action (COA) Attack Search Methods in Computer Networks

Authors: Seok Bin Son, Soohyun Park, Haemin Lee, Joongheon Kim, Soyi Jung, Donghwa Kim

Abstract: In the literature of modern network security research, deriving effective and efficient course-of-action (COA) attach search methods are of interests in industry and academia. As the network size grows, the traditional COA attack search methods can suffer from the limitations to computing and communication resources. Therefore, various methods have been developed to solve these problems, and reinf… ▽ More In the literature of modern network security research, deriving effective and efficient course-of-action (COA) attach search methods are of interests in industry and academia. As the network size grows, the traditional COA attack search methods can suffer from the limitations to computing and communication resources. Therefore, various methods have been developed to solve these problems, and reinforcement learning (RL)-based intelligent algorithms are one of the most effective solutions. Therefore, we review the RL-based COA attack search methods for network attack scenarios in terms of the trends and their contrib △ Less

Submitted 27 May, 2022; originally announced May 2022.

arXiv:2204.04892 [pdf, other]

JORLDY: a fully customizable open source framework for reinforcement learning

Authors: Kyushik Min, Hyunho Lee, Kwansu Shin, Taehak Lee, Hojoon Lee, **won Choi, Sungho Son

Abstract: Recently, Reinforcement Learning (RL) has been actively researched in both academic and industrial fields. However, there exist only a few RL frameworks which are developed for researchers or students who want to study RL. In response, we propose an open-source RL framework "Join Our Reinforcement Learning framework for Develo** Yours" (JORLDY). JORLDY provides more than 20 widely used RL algori… ▽ More Recently, Reinforcement Learning (RL) has been actively researched in both academic and industrial fields. However, there exist only a few RL frameworks which are developed for researchers or students who want to study RL. In response, we propose an open-source RL framework "Join Our Reinforcement Learning framework for Develo** Yours" (JORLDY). JORLDY provides more than 20 widely used RL algorithms which are implemented with Pytorch. Also, JORLDY supports multiple RL environments which include OpenAI gym, Unity ML-Agents, Mujoco, Super Mario Bros and Procgen. Moreover, the algorithmic components such as agent, network, environment can be freely customized, so that the users can easily modify and append algorithmic components. We expect that JORLDY will support various RL research and contribute further advance the field of RL. The source code of JORLDY is provided on the following Github: https://github.com/kakaoenterprise/JORLDY △ Less

Submitted 11 April, 2022; originally announced April 2022.

Comments: 12 pages, 6 figures

arXiv:2203.15662 [pdf, other]

MatteFormer: Transformer-Based Image Matting via Prior-Tokens

Authors: GyuTae Park, SungJoon Son, JaeYoung Yoo, SeHo Kim, Nojun Kwak

Abstract: In this paper, we propose a transformer-based image matting model called MatteFormer, which takes full advantage of trimap information in the transformer block. Our method first introduces a prior-token which is a global representation of each trimap region (e.g. foreground, background and unknown). These prior-tokens are used as global priors and participate in the self-attention mechanism of eac… ▽ More In this paper, we propose a transformer-based image matting model called MatteFormer, which takes full advantage of trimap information in the transformer block. Our method first introduces a prior-token which is a global representation of each trimap region (e.g. foreground, background and unknown). These prior-tokens are used as global priors and participate in the self-attention mechanism of each block. Each stage of the encoder is composed of PAST (Prior-Attentive Swin Transformer) block, which is based on the Swin Transformer block, but differs in a couple of aspects: 1) It has PA-WSA (Prior-Attentive Window Self-Attention) layer, performing self-attention not only with spatial-tokens but also with prior-tokens. 2) It has prior-memory which saves prior-tokens accumulatively from the previous blocks and transfers them to the next block. We evaluate our MatteFormer on the commonly used image matting datasets: Composition-1k and Distinctions-646. Experiment results show that our proposed method achieves state-of-the-art performance with a large margin. Our codes are available at https://github.com/webtoon/matteformer. △ Less

Submitted 29 March, 2022; originally announced March 2022.

arXiv:2203.13009 [pdf, other]

CVF-SID: Cyclic multi-Variate Function for Self-Supervised Image Denoising by Disentangling Noise from Image

Authors: Reyhaneh Neshatavar, Mohsen Yavartanoo, Sanghyun Son, Kyoung Mu Lee

Abstract: Recently, significant progress has been made on image denoising with strong supervision from large-scale datasets. However, obtaining well-aligned noisy-clean training image pairs for each specific scenario is complicated and costly in practice. Consequently, applying a conventional supervised denoising network on in-the-wild noisy inputs is not straightforward. Although several studies have chall… ▽ More Recently, significant progress has been made on image denoising with strong supervision from large-scale datasets. However, obtaining well-aligned noisy-clean training image pairs for each specific scenario is complicated and costly in practice. Consequently, applying a conventional supervised denoising network on in-the-wild noisy inputs is not straightforward. Although several studies have challenged this problem without strong supervision, they rely on less practical assumptions and cannot be applied to practical situations directly. To address the aforementioned challenges, we propose a novel and powerful self-supervised denoising method called CVF-SID based on a Cyclic multi-Variate Function (CVF) module and a self-supervised image disentangling (SID) framework. The CVF module can output multiple decomposed variables of the input and take a combination of the outputs back as an input in a cyclic manner. Our CVF-SID can disentangle a clean image and noise maps from the input by leveraging various self-supervised loss terms. Unlike several methods that only consider the signal-independent noise models, we also deal with signal-dependent noise components for real-world applications. Furthermore, we do not rely on any prior assumptions about the underlying noise distribution, making CVF-SID more generalizable toward realistic noise. Extensive experiments on real-world datasets show that CVF-SID achieves state-of-the-art self-supervised image denoising performance and is comparable to other existing approaches. The code is publicly available from https://github.com/Reyhanehne/CVF-SID_PyTorch . △ Less

Submitted 29 March, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

Comments: Published at CVPR 2022

arXiv:2203.11799 [pdf, other]

AP-BSN: Self-Supervised Denoising for Real-World Images via Asymmetric PD and Blind-Spot Network

Authors: Wooseok Lee, Sanghyun Son, Kyoung Mu Lee

Abstract: Blind-spot network (BSN) and its variants have made significant advances in self-supervised denoising. Nevertheless, they are still bound to synthetic noisy inputs due to less practical assumptions like pixel-wise independent noise. Hence, it is challenging to deal with spatially correlated real-world noise using self-supervised BSN. Recently, pixel-shuffle downsampling (PD) has been proposed to r… ▽ More Blind-spot network (BSN) and its variants have made significant advances in self-supervised denoising. Nevertheless, they are still bound to synthetic noisy inputs due to less practical assumptions like pixel-wise independent noise. Hence, it is challenging to deal with spatially correlated real-world noise using self-supervised BSN. Recently, pixel-shuffle downsampling (PD) has been proposed to remove the spatial correlation of real-world noise. However, it is not trivial to integrate PD and BSN directly, which prevents the fully self-supervised denoising model on real-world images. We propose an Asymmetric PD (AP) to address this issue, which introduces different PD stride factors for training and inference. We systematically demonstrate that the proposed AP can resolve inherent trade-offs caused by specific PD stride factors and make BSN applicable to practical scenarios. To this end, we develop AP-BSN, a state-of-the-art self-supervised denoising method for real-world sRGB images. We further propose random-replacing refinement, which significantly improves the performance of our AP-BSN without any additional parameters. Extensive studies demonstrate that our method outperforms the other self-supervised and even unpaired denoising methods by a large margin, without using any additional knowledge, e.g., noise level, regarding the underlying unknown noise. △ Less

Submitted 24 March, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

Comments: Accepted to CVPR2022

arXiv:2203.11593 [pdf, other]

Unified Negative Pair Generation toward Well-discriminative Feature Space for Face Recognition

Authors: Junuk Jung, Seonhoon Lee, Heung-Seon Oh, Yongjun Park, Joochan Park, Sungbin Son

Abstract: The goal of face recognition (FR) can be viewed as a pair similarity optimization problem, maximizing a similarity set $\mathcal{S}^p$ over positive pairs, while minimizing similarity set $\mathcal{S}^n$ over negative pairs. Ideally, it is expected that FR models form a well-discriminative feature space (WDFS) that satisfies $\inf{\mathcal{S}^p} > \sup{\mathcal{S}^n}$. With regard to WDFS, the exi… ▽ More The goal of face recognition (FR) can be viewed as a pair similarity optimization problem, maximizing a similarity set $\mathcal{S}^p$ over positive pairs, while minimizing similarity set $\mathcal{S}^n$ over negative pairs. Ideally, it is expected that FR models form a well-discriminative feature space (WDFS) that satisfies $\inf{\mathcal{S}^p} > \sup{\mathcal{S}^n}$. With regard to WDFS, the existing deep feature learning paradigms (i.e., metric and classification losses) can be expressed as a unified perspective on different pair generation (PG) strategies. Unfortunately, in the metric loss (ML), it is infeasible to generate negative pairs taking all classes into account in each iteration because of the limited mini-batch size. In contrast, in classification loss (CL), it is difficult to generate extremely hard negative pairs owing to the convergence of the class weight vectors to their center. This leads to a mismatch between the two similarity distributions of the sampled pairs and all negative pairs. Thus, this paper proposes a unified negative pair generation (UNPG) by combining two PG strategies (i.e., MLPG and CLPG) from a unified perspective to alleviate the mismatch. UNPG introduces useful information about negative pairs using MLPG to overcome the CLPG deficiency. Moreover, it includes filtering the similarities of noisy negative pairs to guarantee reliable convergence and improved performance. Exhaustive experiments show the superiority of UNPG by achieving state-of-the-art performance across recent loss functions on public benchmark datasets. Our code and pretrained models are publicly available. △ Less

Submitted 18 April, 2024; v1 submitted 22 March, 2022; originally announced March 2022.

Comments: 9 pages, 6 figures, Published at BMVC22

arXiv:2202.10555 [pdf, other]

doi 10.1016/j.cageo.2022.105072

Effective Training Strategies for Deep-learning-based Precipitation Nowcasting and Estimation

Authors: Jihoon Ko, Kyuhan Lee, Hyun** Hwang, Seok-Geun Oh, Seok-Woo Son, Kijung Shin

Abstract: Deep learning has been successfully applied to precipitation nowcasting. In this work, we propose a pre-training scheme and a new loss function for improving deep-learning-based nowcasting. First, we adapt U-Net, a widely-used deep-learning model, for the two problems of interest here: precipitation nowcasting and precipitation estimation from radar images. We formulate the former as a classificat… ▽ More Deep learning has been successfully applied to precipitation nowcasting. In this work, we propose a pre-training scheme and a new loss function for improving deep-learning-based nowcasting. First, we adapt U-Net, a widely-used deep-learning model, for the two problems of interest here: precipitation nowcasting and precipitation estimation from radar images. We formulate the former as a classification problem with three precipitation intervals and the latter as a regression problem. For these tasks, we propose to pre-train the model to predict radar images in the near future without requiring ground-truth precipitation, and we also propose the use of a new loss function for fine-tuning to mitigate the class imbalance problem. We demonstrate the effectiveness of our approach using radar images and precipitation datasets collected from South Korea over seven years. It is highlighted that our pre-training scheme and new loss function improve the critical success index (CSI) of nowcasting of heavy rainfall (at least 10 mm/hr) by up to 95.7% and 43.6%, respectively, at a 5-hr lead time. We also demonstrate that our approach reduces the precipitation estimation error by up to 10.7%, compared to the conventional approach, for light rainfall (between 1 and 10 mm/hr). Lastly, we report the sensitivity of our approach to different resolutions and a detailed analysis of four cases of heavy rainfall. △ Less

Submitted 17 February, 2022; originally announced February 2022.

Comments: to appear in Computers & Geosciences

arXiv:2202.09533 [pdf, other]

C2N: Practical Generative Noise Modeling for Real-World Denoising

Authors: Geonwoon Jang, Wooseok Lee, Sanghyun Son, Kyoung Mu Lee

Abstract: Learning-based image denoising methods have been bounded to situations where well-aligned noisy and clean images are given, or samples are synthesized from predetermined noise models, e.g., Gaussian. While recent generative noise modeling methods aim to simulate the unknown distribution of real-world noise, several limitations still exist. In a practical scenario, a noise generator should learn to… ▽ More Learning-based image denoising methods have been bounded to situations where well-aligned noisy and clean images are given, or samples are synthesized from predetermined noise models, e.g., Gaussian. While recent generative noise modeling methods aim to simulate the unknown distribution of real-world noise, several limitations still exist. In a practical scenario, a noise generator should learn to simulate the general and complex noise distribution without using paired noisy and clean images. However, since existing methods are constructed on the unrealistic assumption of real-world noise, they tend to generate implausible patterns and cannot express complicated noise maps. Therefore, we introduce a Clean-to-Noisy image generation framework, namely C2N, to imitate complex real-world noise without using any paired examples. We construct the noise generator in C2N accordingly with each component of real-world noise characteristics to express a wide range of noise accurately. Combined with our C2N, conventional denoising CNNs can be trained to outperform existing unsupervised methods on challenging real-world benchmarks by a large margin. △ Less

Submitted 19 February, 2022; originally announced February 2022.

Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 2350-2359

arXiv:2201.03239 [pdf, other]

There is no rose without a thorn: Finding weaknesses on BlenderBot 2.0 in terms of Model, Data and User-Centric Approach

Authors: Jungseob Lee, Midan Shim, Suhyune Son, Chanjun Park, Yu** Kim, Heuiseok Lim

Abstract: BlenderBot 2.0 is a dialogue model that represents open-domain chatbots by reflecting real-time information and remembering user information for an extended period using an internet search module and multi-session. Nonetheless, the model still has room for improvement. To this end, we examine BlenderBot 2.0 limitations and errors from three perspectives: model, data, and user. From the data point… ▽ More BlenderBot 2.0 is a dialogue model that represents open-domain chatbots by reflecting real-time information and remembering user information for an extended period using an internet search module and multi-session. Nonetheless, the model still has room for improvement. To this end, we examine BlenderBot 2.0 limitations and errors from three perspectives: model, data, and user. From the data point of view, we highlight the unclear guidelines provided to workers during the crowdsourcing process, as well as a lack of a process for refining hate speech in the collected data and verifying the accuracy of internet-based information. From a user perspective, we identify nine types of limitations of BlenderBot 2.0, and their causes are thoroughly investigated. Furthermore, for each point of view, we propose practical improvement methods and discuss several potential future research directions. △ Less

Submitted 8 July, 2022; v1 submitted 10 January, 2022; originally announced January 2022.

Comments: English and Extention Version of "Empirical study on BlenderBot 2.0 errors analysis in terms of model, data and dialogue" (Journal of the Korea Convergence Society)

arXiv:2112.08619 [pdf, other]

Call for Customized Conversation: Customized Conversation Grounding Persona and Knowledge

Authors: Yoonna Jang, Jungwoo Lim, Yuna Hur, Dongsuk Oh, Suhyune Son, Yeonsoo Lee, Donghoon Shin, Seungryong Kim, Heuiseok Lim

Abstract: Humans usually have conversations by making use of prior knowledge about a topic and background information of the people whom they are talking to. However, existing conversational agents and datasets do not consider such comprehensive information, and thus they have a limitation in generating the utterances where the knowledge and persona are fused properly. To address this issue, we introduce a… ▽ More Humans usually have conversations by making use of prior knowledge about a topic and background information of the people whom they are talking to. However, existing conversational agents and datasets do not consider such comprehensive information, and thus they have a limitation in generating the utterances where the knowledge and persona are fused properly. To address this issue, we introduce a call For Customized conversation (FoCus) dataset where the customized answers are built with the user's persona and Wikipedia knowledge. To evaluate the abilities to make informative and customized utterances of pre-trained language models, we utilize BART and GPT-2 as well as transformer-based models. We assess their generation abilities with automatic scores and conduct human evaluations for qualitative results. We examine whether the model reflects adequate persona and knowledge with our proposed two sub-tasks, persona grounding (PG) and knowledge grounding (KG). Moreover, we show that the utterances of our data are constructed with the proper knowledge and persona through grounding quality assessment. △ Less

Submitted 16 May, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

Comments: Accepted paper at the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22)

arXiv:2111.01717 [pdf, other]

MixFace: Improving Face Verification Focusing on Fine-grained Conditions

Authors: Junuk Jung, Sungbin Son, Joochan Park, Yongjun Park, Seonhoon Lee, Heung-Seon Oh

Abstract: The performance of face recognition has become saturated for public benchmark datasets such as LFW, CFP-FP, and AgeDB, owing to the rapid advances in CNNs. However, the effects of faces with various fine-grained conditions on FR models have not been investigated because of the absence of such datasets. This paper analyzes their effects in terms of different conditions and loss functions using K-FA… ▽ More The performance of face recognition has become saturated for public benchmark datasets such as LFW, CFP-FP, and AgeDB, owing to the rapid advances in CNNs. However, the effects of faces with various fine-grained conditions on FR models have not been investigated because of the absence of such datasets. This paper analyzes their effects in terms of different conditions and loss functions using K-FACE, a recently introduced FR dataset with fine-grained conditions. We propose a novel loss function, MixFace, that combines classification and metric losses. The superiority of MixFace in terms of effectiveness and robustness is demonstrated experimentally on various benchmark datasets. △ Less

Submitted 19 June, 2022; v1 submitted 2 November, 2021; originally announced November 2021.

Comments: 9 pages, 6 figures

arXiv:2109.03444 [pdf, other]

doi 10.1109/TPAMI.2021.3106790

Toward Real-World Super-Resolution via Adaptive Downsampling Models

Authors: Sanghyun Son, Jaeha Kim, Wei-Sheng Lai, Ming-Husan Yang, Kyoung Mu Lee

Abstract: Most image super-resolution (SR) methods are developed on synthetic low-resolution (LR) and high-resolution (HR) image pairs that are constructed by a predetermined operation, e.g., bicubic downsampling. As existing methods typically learn an inverse map** of the specific function, they produce blurry results when applied to real-world images whose exact formulation is different and unknown. The… ▽ More Most image super-resolution (SR) methods are developed on synthetic low-resolution (LR) and high-resolution (HR) image pairs that are constructed by a predetermined operation, e.g., bicubic downsampling. As existing methods typically learn an inverse map** of the specific function, they produce blurry results when applied to real-world images whose exact formulation is different and unknown. Therefore, several methods attempt to synthesize much more diverse LR samples or learn a realistic downsampling model. However, due to restrictive assumptions on the downsampling process, they are still biased and less generalizable. This study proposes a novel method to simulate an unknown downsampling process without imposing restrictive prior knowledge. We propose a generalizable low-frequency loss (LFL) in the adversarial training framework to imitate the distribution of target LR images without using any paired examples. Furthermore, we design an adaptive data loss (ADL) for the downsampler, which can be adaptively learned and updated from the data during the training loops. Extensive experiments validate that our downsampling model can facilitate existing SR methods to perform more accurate reconstructions on various synthetic and real-world examples than the conventional approaches. △ Less

Submitted 8 September, 2021; originally announced September 2021.

Comments: Accepted at TPAMI

arXiv:2106.10147 [pdf, other]

Evaluating the Robustness of Trigger Set-Based Watermarks Embedded in Deep Neural Networks

Authors: Suyoung Lee, Wonho Song, Suman Jana, Meeyoung Cha, Sooel Son

Abstract: Trigger set-based watermarking schemes have gained emerging attention as they provide a means to prove ownership for deep neural network model owners. In this paper, we argue that state-of-the-art trigger set-based watermarking algorithms do not achieve their designed goal of proving ownership. We posit that this impaired capability stems from two common experimental flaws that the existing resear… ▽ More Trigger set-based watermarking schemes have gained emerging attention as they provide a means to prove ownership for deep neural network model owners. In this paper, we argue that state-of-the-art trigger set-based watermarking algorithms do not achieve their designed goal of proving ownership. We posit that this impaired capability stems from two common experimental flaws that the existing research practice has committed when evaluating the robustness of watermarking algorithms: (1) incomplete adversarial evaluation and (2) overlooked adaptive attacks. We conduct a comprehensive adversarial evaluation of 11 representative watermarking schemes against six of the existing attacks and demonstrate that each of these watermarking schemes lacks robustness against at least two non-adaptive attacks. We also propose novel adaptive attacks that harness the adversary's knowledge of the underlying watermarking algorithm of a target model. We demonstrate that the proposed attacks effectively break all of the 11 watermarking schemes, consequently allowing adversaries to obscure the ownership of any watermarked model. We encourage follow-up studies to consider our guidelines when evaluating the robustness of their watermarking schemes via conducting comprehensive adversarial evaluation that includes our adaptive attacks to demonstrate a meaningful upper bound of watermark robustness. △ Less

Submitted 19 January, 2023; v1 submitted 18 June, 2021; originally announced June 2021.

Comments: 15 pages, accepted at IEEE TDSC

arXiv:2106.03839 [pdf, other]

NTIRE 2021 Challenge on Burst Super-Resolution: Methods and Results

Authors: Goutam Bhat, Martin Danelljan, Radu Timofte, Kazutoshi Akita, Wooyeong Cho, Haoqiang Fan, Lanpeng Jia, Daeshik Kim, Bruno Lecouat, Youwei Li, Shuaicheng Liu, Ziluan Liu, Ziwei Luo, Takahiro Maeda, Julien Mairal, Christian Micheloni, Xuan Mo, Takeru Oba, Pavel Ostyakov, Jean Ponce, Sanghyeok Son, Jian Sun, Norimichi Ukita, Rao Muhammad Umer, Youliang Yan , et al. (3 additional authors not shown)

Abstract: This paper reviews the NTIRE2021 challenge on burst super-resolution. Given a RAW noisy burst as input, the task in the challenge was to generate a clean RGB image with 4 times higher resolution. The challenge contained two tracks; Track 1 evaluating on synthetically generated data, and Track 2 using real-world bursts from mobile camera. In the final testing phase, 6 teams submitted results using… ▽ More This paper reviews the NTIRE2021 challenge on burst super-resolution. Given a RAW noisy burst as input, the task in the challenge was to generate a clean RGB image with 4 times higher resolution. The challenge contained two tracks; Track 1 evaluating on synthetically generated data, and Track 2 using real-world bursts from mobile camera. In the final testing phase, 6 teams submitted results using a diverse set of solutions. The top-performing methods set a new state-of-the-art for the burst super-resolution task. △ Less

Submitted 7 June, 2021; originally announced June 2021.

Comments: NTIRE 2021 Burst Super-Resolution challenge report

Showing 1–50 of 82 results for author: Son, S