Search | arXiv e-print repository

Differentiable Modal Synthesis for Physical Modeling of Planar String Sound and Motion Simulation

Authors: ** Woo Lee, Jaehyun Park, Min Jun Choi, Kyogu Lee

Abstract: While significant advancements have been made in music generation and differentiable sound synthesis within machine learning and computer audition, the simulation of instrument vibration guided by physical laws has been underexplored. To address this gap, we introduce a novel model for simulating the spatio-temporal motion of nonlinear strings, integrating modal synthesis and spectral modeling wit… ▽ More While significant advancements have been made in music generation and differentiable sound synthesis within machine learning and computer audition, the simulation of instrument vibration guided by physical laws has been underexplored. To address this gap, we introduce a novel model for simulating the spatio-temporal motion of nonlinear strings, integrating modal synthesis and spectral modeling within a neural network framework. Our model leverages physical properties and fundamental frequencies as inputs, outputting string states across time and space that solve the partial differential equation characterizing the nonlinear string. Empirical evaluations demonstrate that the proposed architecture achieves superior accuracy in string motion simulation compared to existing baseline architectures. The code and demo are available online. △ Less

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2406.16994 [pdf, other]

Quantum Multi-Agent Reinforcement Learning for Cooperative Mobile Access in Space-Air-Ground Integrated Networks

Authors: Gyu Seon Kim, Yeryeong Cho, Jaehyun Chung, Soohyun Park, Soyi Jung, Zhu Han, Joongheon Kim

Abstract: Achieving global space-air-ground integrated network (SAGIN) access only with CubeSats presents significant challenges such as the access sustainability limitations in specific regions (e.g., polar regions) and the energy efficiency limitations in CubeSats. To tackle these problems, high-altitude long-endurance unmanned aerial vehicles (HALE-UAVs) can complement these CubeSat shortcomings for prov… ▽ More Achieving global space-air-ground integrated network (SAGIN) access only with CubeSats presents significant challenges such as the access sustainability limitations in specific regions (e.g., polar regions) and the energy efficiency limitations in CubeSats. To tackle these problems, high-altitude long-endurance unmanned aerial vehicles (HALE-UAVs) can complement these CubeSat shortcomings for providing cooperatively global access sustainability and energy efficiency. However, as the number of CubeSats and HALE-UAVs, increases, the scheduling dimension of each ground station (GS) increases. As a result, each GS can fall into the curse of dimensionality, and this challenge becomes one major hurdle for efficient global access. Therefore, this paper provides a quantum multi-agent reinforcement Learning (QMARL)-based method for scheduling between GSs and CubeSats/HALE-UAVs in order to improve global access availability and energy efficiency. The main reason why the QMARL-based scheduler can be beneficial is that the algorithm facilitates a logarithmic-scale reduction in scheduling action dimensions, which is one critical feature as the number of CubeSats and HALE-UAVs expands. Additionally, individual GSs have different traffic demands depending on their locations and characteristics, thus it is essential to provide differentiated access services. The superiority of the proposed scheduler is validated through data-intensive experiments in realistic CubeSat/HALE-UAV settings. △ Less

Submitted 24 June, 2024; originally announced June 2024.

Comments: 17 pages, 22 figures

arXiv:2406.13633 [pdf, ps, other]

Reinforcement Learning for Infinite-Horizon Average-Reward MDPs with Multinomial Logistic Function Approximation

Authors: Jaehyun Park, Dabeen Lee

Abstract: We study model-based reinforcement learning with non-linear function approximation where the transition function of the underlying Markov decision process (MDP) is given by a multinomial logistic (MNL) model. In this paper, we develop two algorithms for the infinite-horizon average reward setting. Our first algorithm \texttt{UCRL2-MNL} applies to the class of communicating MDPs and achieves an… ▽ More We study model-based reinforcement learning with non-linear function approximation where the transition function of the underlying Markov decision process (MDP) is given by a multinomial logistic (MNL) model. In this paper, we develop two algorithms for the infinite-horizon average reward setting. Our first algorithm \texttt{UCRL2-MNL} applies to the class of communicating MDPs and achieves an $\tilde{\mathcal{O}}(dD\sqrt{T})$ regret, where $d$ is the dimension of feature map**, $D$ is the diameter of the underlying MDP, and $T$ is the horizon. The second algorithm \texttt{OVIFH-MNL} is computationally more efficient and applies to the more general class of weakly communicating MDPs, for which we show a regret guarantee of $\tilde{\mathcal{O}}(d^{2/5} \mathrm{sp}(v^*)T^{4/5})$ where $\mathrm{sp}(v^*)$ is the span of the associated optimal bias function. We also prove a lower bound of $Ω(d\sqrt{DT})$ for learning communicating MDPs with MNL transitions of diameter at most $D$. Furthermore, we show a regret lower bound of $Ω(dH^{3/2}\sqrt{K})$ for learning $H$-horizon episodic MDPs with MNL function approximation where $K$ is the number of episodes, which improves upon the best-known lower bound for the finite-horizon setting. △ Less

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.11128 [pdf, other]

Model Adaptation for Time Constrained Embodied Control

Authors: Jaehyun Song, Minjong Yoo, Honguk Woo

Abstract: When adopting a deep learning model for embodied agents, it is required that the model structure be optimized for specific tasks and operational conditions. Such optimization can be static such as model compression or dynamic such as adaptive inference. Yet, these techniques have not been fully investigated for embodied control systems subject to time constraints, which necessitate sequential deci… ▽ More When adopting a deep learning model for embodied agents, it is required that the model structure be optimized for specific tasks and operational conditions. Such optimization can be static such as model compression or dynamic such as adaptive inference. Yet, these techniques have not been fully investigated for embodied control systems subject to time constraints, which necessitate sequential decision-making for multiple tasks, each with distinct inference latency limitations. In this paper, we present MoDeC, a time constraint-aware embodied control framework using the modular model adaptation. We formulate model adaptation to varying operational conditions on resource and time restrictions as dynamic routing on a modular network, incorporating these conditions as part of multi-task objectives. Our evaluation across several vision-based embodied environments demonstrates the robustness of MoDeC, showing that it outperforms other model adaptation methods in both performance and adherence to time constraints in robotic manipulation and autonomous driving applications △ Less

Submitted 16 June, 2024; originally announced June 2024.

Comments: 8 pages, 5 figures, Accepted in The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024 (CVPR 2024)

arXiv:2406.08527 [pdf, other]

Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning

Authors: Jaehyun Nam, Kyuyoung Kim, Seunghyuk Oh, Jihoon Tack, Jaehyung Kim, **woo Shin

Abstract: Learning effective representations from raw data is crucial for the success of deep learning methods. However, in the tabular domain, practitioners often prefer augmenting raw column features over using learned representations, as conventional tree-based algorithms frequently outperform competing approaches. As a result, feature engineering methods that automatically generate candidate features ha… ▽ More Learning effective representations from raw data is crucial for the success of deep learning methods. However, in the tabular domain, practitioners often prefer augmenting raw column features over using learned representations, as conventional tree-based algorithms frequently outperform competing approaches. As a result, feature engineering methods that automatically generate candidate features have been widely used. While these approaches are often effective, there remains ambiguity in defining the space over which to search for candidate features. Moreover, they often rely solely on validation scores to select good features, neglecting valuable feedback from past experiments that could inform the planning of future experiments. To address the shortcomings, we propose a new tabular learning framework based on large language models (LLMs), coined Optimizing Column feature generator with decision Tree reasoning (OCTree). Our key idea is to leverage LLMs' reasoning capabilities to find good feature generation rules without manually specifying the search space and provide language-based reasoning information highlighting past experiments as feedback for iterative rule improvements. Here, we choose a decision tree as reasoning as it can be interpreted in natural language, effectively conveying knowledge of past experiments (i.e., the prediction models trained with the generated features) to the LLM. Our empirical results demonstrate that this simple framework consistently enhances the performance of various prediction models across diverse tabular benchmarks, outperforming competing automatic feature engineering methods. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: 18 pages

arXiv:2405.11828 [pdf, other]

Federated Learning with Incomplete Sensing Modalities

Authors: Adiba Orzikulova, Jaehyun Kwak, Jaemin Shin, Sung-Ju Lee

Abstract: Many mobile sensing applications utilize data from various modalities, including motion and physiological sensors in mobile and wearable devices. Federated Learning (FL) is particularly suitable for these applications thanks to its privacy-preserving feature. However, challenges such as limited battery life, poor network conditions, and sensor malfunctions can restrict the use of all available mod… ▽ More Many mobile sensing applications utilize data from various modalities, including motion and physiological sensors in mobile and wearable devices. Federated Learning (FL) is particularly suitable for these applications thanks to its privacy-preserving feature. However, challenges such as limited battery life, poor network conditions, and sensor malfunctions can restrict the use of all available modalities for local model training. Additionally, existing multimodal FL systems also struggle with scalability and efficiency as the number of modality sources increases. To address these issues, we introduce FLISM, a framework designed to enable multimodal FL with incomplete modalities. FLISM leverages simulation technique to learn robust representations that can handle missing modalities and transfers model knowledge across clients with varying set of modalities. The evaluation results using three real-world datasets and simulations demonstrate FLISM's effective balance between model performance and system efficiency. It shows an average improvement of .067 in F1-score, while also reducing communication (2.69x faster) and computational (2.28x more efficient) overheads compared to existing methods addressing incomplete modalities. Moreover, in simulated scenarios involving tasks with a larger number of modalities, FLISM achieves a significant speedup of 3.23x~85.10x in communication and 3.73x~32.29x in computational efficiency. △ Less

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2405.07543 [pdf]

Accelerating the Evolution of Personalized Automated Lane Change through Lesson Learning

Authors: Jia Hu, Mingyue Lei, Duo Li, Zhenning Li, Jaehyun, So, Haoran Wang

Abstract: Personalization is crucial for the widespread adoption of advanced driver assistance system. To match up with each user's preference, the online evolution capability is a must. However, conventional evolution methods learn from naturalistic driving data, which requires a lot computing power and cannot be applied online. To address this challenge, this paper proposes a lesson learning approach: lea… ▽ More Personalization is crucial for the widespread adoption of advanced driver assistance system. To match up with each user's preference, the online evolution capability is a must. However, conventional evolution methods learn from naturalistic driving data, which requires a lot computing power and cannot be applied online. To address this challenge, this paper proposes a lesson learning approach: learning from driver's takeover interventions. By leveraging online takeover data, the driving zone is generated to ensure perceived safety using Gaussian discriminant analysis. Real-time corrections to trajectory planning rewards are enacted through apprenticeship learning. Guided by the objective of optimizing rewards within the constraints of the driving zone, this approach employs model predictive control for trajectory planning. This lesson learning framework is highlighted for its faster evolution capability, adeptness at experience accumulating, assurance of perceived safety, and computational efficiency. Simulation results demonstrate that the proposed system consistently achieves a successful customization without further takeover interventions. Accumulated experience yields a 24% enhancement in evolution efficiency. The average number of learning iterations is only 13.8. The average computation time is 0.08 seconds. △ Less

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.02845 [pdf, other]

Data-Efficient Molecular Generation with Hierarchical Textual Inversion

Authors: Seo** Kim, Jaehyun Nam, Sihyun Yu, Younghoon Shin, **woo Shin

Abstract: Develo** an effective molecular generation framework even with a limited number of molecules is often important for its practical deployment, e.g., drug discovery, since acquiring task-related molecular data requires expensive and time-consuming experimental costs. To tackle this issue, we introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecula… ▽ More Develo** an effective molecular generation framework even with a limited number of molecules is often important for its practical deployment, e.g., drug discovery, since acquiring task-related molecular data requires expensive and time-consuming experimental costs. To tackle this issue, we introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method. HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution. We propose to use multi-level embeddings to reflect such hierarchical features based on the adoption of the recent textual inversion technique in the visual domain, which achieves data-efficient image generation. Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution. We then generate molecules based on the interpolation of the multi-level token embeddings. Extensive experiments demonstrate the superiority of HI-Mol with notable data-efficiency. For instance, on QM9, HI-Mol outperforms the prior state-of-the-art method with 50x less training data. We also show the effectiveness of molecules generated by HI-Mol in low-shot molecular property prediction. △ Less

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2405.02499 [pdf, other]

DRAMScope: Uncovering DRAM Microarchitecture and Characteristics by Issuing Memory Commands

Authors: Hwayong Nam, Seungmin Baek, Minbok Wi, Michael Jaemin Kim, Jaehyun Park, Chihun Song, Nam Sung Kim, Jung Ho Ahn

Abstract: The demand for precise information on DRAM microarchitectures and error characteristics has surged, driven by the need to explore processing in memory, enhance reliability, and mitigate security vulnerability. Nonetheless, DRAM manufacturers have disclosed only a limited amount of information, making it difficult to find specific information on their DRAM microarchitectures. This paper addresses t… ▽ More The demand for precise information on DRAM microarchitectures and error characteristics has surged, driven by the need to explore processing in memory, enhance reliability, and mitigate security vulnerability. Nonetheless, DRAM manufacturers have disclosed only a limited amount of information, making it difficult to find specific information on their DRAM microarchitectures. This paper addresses this gap by presenting more rigorous findings on the microarchitectures of commodity DRAM chips and their impacts on the characteristics of activate-induced bitflips (AIBs), such as RowHammer and RowPress. The previous studies have also attempted to understand the DRAM microarchitectures and associated behaviors, but we have found some of their results to be misled by inaccurate address map** and internal data swizzling, or lack of a deeper understanding of the modern DRAM cell structure. For accurate and efficient reverse-engineering, we use three tools: AIBs, retention time test, and RowCopy, which can be cross-validated. With these three tools, we first take a macroscopic view of modern DRAM chips to uncover the size, structure, and operation of their subarrays, memory array tiles (MATs), and rows. Then, we analyze AIB characteristics based on the microscopic view of the DRAM microarchitecture, such as 6F^2 cell layout, through which we rectify misunderstandings regarding AIBs and discover a new data pattern that accelerates AIBs. Lastly, based on our findings at both macroscopic and microscopic levels, we identify previously unknown AIB vulnerabilities and propose a simple yet effective protection solution. △ Less

Submitted 3 May, 2024; originally announced May 2024.

Comments: To appear at the 51st IEEE/ACM International Symposium on Computer Architecture (ISCA)

arXiv:2404.15305 [pdf, other]

ADAPT^2: Adapting Pre-Trained Sensing Models to End-Users via Self-Supervision Replay

Authors: Hyungjun Yoon, Jaehyun Kwak, Biniyam Aschalew Tolera, Gaole Dai, Mo Li, Taesik Gong, Kimin Lee, Sung-Ju Lee

Abstract: Self-supervised learning has emerged as a method for utilizing massive unlabeled data for pre-training models, providing an effective feature extractor for various mobile sensing applications. However, when deployed to end-users, these models encounter significant domain shifts attributed to user diversity. We investigate the performance degradation that occurs when self-supervised models are fine… ▽ More Self-supervised learning has emerged as a method for utilizing massive unlabeled data for pre-training models, providing an effective feature extractor for various mobile sensing applications. However, when deployed to end-users, these models encounter significant domain shifts attributed to user diversity. We investigate the performance degradation that occurs when self-supervised models are fine-tuned in heterogeneous domains. To address the issue, we propose ADAPT^2, a few-shot domain adaptation framework for personalizing self-supervised models. ADAPT2 proposes self-supervised meta-learning for initial model pre-training, followed by a user-side model adaptation by replaying the self-supervision with user-specific data. This allows models to adjust their pre-trained representations to the user with only a few samples. Evaluation with four benchmarks demonstrates that ADAPT^2 outperforms existing baselines by an average F1-score of 8.8%p. Our on-device computational overhead analysis on a commodity off-the-shelf (COTS) smartphone shows that ADAPT2 completes adaptation within an unobtrusive latency (in three minutes) with only a 9.54% memory consumption, demonstrating the computational efficiency of the proposed method. △ Less

Submitted 29 March, 2024; originally announced April 2024.

arXiv:2404.13486 [pdf, ps, other]

doi 10.1145/3626183.3659974

An Optimal MPC Algorithm for Subunit-Monge Matrix Multiplication, with Applications to LIS

Authors: Jaehyun Koo

Abstract: We present an $O(1)$-round fully-scalable deterministic massively parallel algorithm for computing the min-plus matrix multiplication of unit-Monge matrices. We use this to derive a $O(\log n)$-round fully-scalable massively parallel algorithm for solving the exact longest increasing subsequence (LIS) problem. For a fully-scalable MPC regime, this result substantially improves the previously known… ▽ More We present an $O(1)$-round fully-scalable deterministic massively parallel algorithm for computing the min-plus matrix multiplication of unit-Monge matrices. We use this to derive a $O(\log n)$-round fully-scalable massively parallel algorithm for solving the exact longest increasing subsequence (LIS) problem. For a fully-scalable MPC regime, this result substantially improves the previously known algorithm of $O(\log^4 n)$-round complexity, and matches the best algorithm for computing the $(1+ε)$-approximation of LIS. △ Less

Submitted 20 April, 2024; originally announced April 2024.

Comments: To appear in SPAA 2024

arXiv:2404.13081 [pdf, other]

SuRe: Summarizing Retrievals using Answer Candidates for Open-domain QA of LLMs

Authors: Jaehyung Kim, Jaehyun Nam, Sangwoo Mo, Jong** Park, Sang-Woo Lee, Minjoon Seo, Jung-Woo Ha, **woo Shin

Abstract: Large language models (LLMs) have made significant advancements in various natural language processing tasks, including question answering (QA) tasks. While incorporating new information with the retrieval of relevant passages is a promising way to improve QA with LLMs, the existing methods often require additional fine-tuning which becomes infeasible with recent LLMs. Augmenting retrieved passage… ▽ More Large language models (LLMs) have made significant advancements in various natural language processing tasks, including question answering (QA) tasks. While incorporating new information with the retrieval of relevant passages is a promising way to improve QA with LLMs, the existing methods often require additional fine-tuning which becomes infeasible with recent LLMs. Augmenting retrieved passages via prompting has the potential to address this limitation, but this direction has been limitedly explored. To this end, we design a simple yet effective framework to enhance open-domain QA (ODQA) with LLMs, based on the summarized retrieval (SuRe). SuRe helps LLMs predict more accurate answers for a given question, which are well-supported by the summarized retrieval that could be viewed as an explicit rationale extracted from the retrieved passages. Specifically, SuRe first constructs summaries of the retrieved passages for each of the multiple answer candidates. Then, SuRe confirms the most plausible answer from the candidate set by evaluating the validity and ranking of the generated summaries. Experimental results on diverse ODQA benchmarks demonstrate the superiority of SuRe, with improvements of up to 4.6% in exact match (EM) and 4.0% in F1 score over standard prompting approaches. SuRe also can be integrated with a broad range of retrieval methods and LLMs. Finally, the generated summaries from SuRe show additional advantages to measure the importance of retrieved passages and serve as more preferred rationales by models and humans. △ Less

Submitted 16 April, 2024; originally announced April 2024.

Comments: Accepted at ICLR 2024

arXiv:2404.08173 [pdf, ps, other]

doi 10.4230/LIPIcs.FUN.2024.19

Anarchy in the APSP: Algorithm and Hardness for Incorrect Implementation of Floyd-Warshall

Authors: Jaehyun Koo

Abstract: The celebrated Floyd-Warshall algorithm efficiently computes the all-pairs shortest path, and its simplicity made it a staple in computer science classes. Frequently, students discover a variant of this Floyd-Warshall algorithm by mixing up the loop order, ending up with the incorrect APSP matrix. This paper considers a computational problem of computing this incorrect APSP matrix. We will propose… ▽ More The celebrated Floyd-Warshall algorithm efficiently computes the all-pairs shortest path, and its simplicity made it a staple in computer science classes. Frequently, students discover a variant of this Floyd-Warshall algorithm by mixing up the loop order, ending up with the incorrect APSP matrix. This paper considers a computational problem of computing this incorrect APSP matrix. We will propose efficient algorithms for this problem and prove that this incorrect variant is APSP-complete. △ Less

Submitted 11 April, 2024; originally announced April 2024.

Comments: To appear in FUN 2024

arXiv:2403.04504 [pdf, other]

Improving Matrix Completion by Exploiting Rating Ordinality in Graph Neural Networks

Authors: Jaehyun Lee, SeongKu Kang, Hwanjo Yu

Abstract: Matrix completion is an important area of research in recommender systems. Recent methods view a rating matrix as a user-item bi-partite graph with labeled edges denoting observed ratings and predict the edges between the user and item nodes by using the graph neural network (GNN). Despite their effectiveness, they treat each rating type as an independent relation type and thus cannot sufficiently… ▽ More Matrix completion is an important area of research in recommender systems. Recent methods view a rating matrix as a user-item bi-partite graph with labeled edges denoting observed ratings and predict the edges between the user and item nodes by using the graph neural network (GNN). Despite their effectiveness, they treat each rating type as an independent relation type and thus cannot sufficiently consider the ordinal nature of the ratings. In this paper, we explore a new approach to exploit rating ordinality for GNN, which has not been studied well in the literature. We introduce a new method, called ROGMC, to leverage Rating Ordinality in GNN-based Matrix Completion. It uses cumulative preference propagation to directly incorporate rating ordinality in GNN's message passing, allowing for users' stronger preferences to be more emphasized based on inherent orders of rating types. This process is complemented by interest regularization which facilitates preference learning using the underlying interest information. Our extensive experiments show that ROGMC consistently outperforms the existing strategies of using rating types for GNN. We expect that our attempt to explore the feasibility of utilizing rating ordinality for GNN may stimulate further research in this direction. △ Less

Submitted 7 March, 2024; originally announced March 2024.

Comments: 4 pages, 2 figures, 3 tables

arXiv:2401.12019 [pdf, other]

Stereo-Matching Knowledge Distilled Monocular Depth Estimation Filtered by Multiple Disparity Consistency

Authors: Woonghyun Ka, Jae Young Lee, Jaehyun Choi, Junmo Kim

Abstract: In stereo-matching knowledge distillation methods of the self-supervised monocular depth estimation, the stereo-matching network's knowledge is distilled into a monocular depth network through pseudo-depth maps. In these methods, the learning-based stereo-confidence network is generally utilized to identify errors in the pseudo-depth maps to prevent transferring the errors. However, the learning-b… ▽ More In stereo-matching knowledge distillation methods of the self-supervised monocular depth estimation, the stereo-matching network's knowledge is distilled into a monocular depth network through pseudo-depth maps. In these methods, the learning-based stereo-confidence network is generally utilized to identify errors in the pseudo-depth maps to prevent transferring the errors. However, the learning-based stereo-confidence networks should be trained with ground truth (GT), which is not feasible in a self-supervised setting. In this paper, we propose a method to identify and filter errors in the pseudo-depth map using multiple disparity maps by checking their consistency without the need for GT and a training process. Experimental results show that the proposed method outperforms the previous methods and works well on various configurations by filtering out erroneous areas where the stereo-matching is vulnerable, especially such as textureless regions, occlusion boundaries, and reflective surfaces. △ Less

Submitted 22 January, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

Comments: ICASSP 2024. The first two authors are equally contributed

arXiv:2401.12001 [pdf, other]

Modeling Stereo-Confidence Out of the End-to-End Stereo-Matching Network via Disparity Plane Sweep

Authors: Jae Young Lee, Woonghyun Ka, Jaehyun Choi, Junmo Kim

Abstract: We propose a novel stereo-confidence that can be measured externally to various stereo-matching networks, offering an alternative input modality choice of the cost volume for learning-based approaches, especially in safety-critical systems. Grounded in the foundational concepts of disparity definition and the disparity plane sweep, the proposed stereo-confidence method is built upon the idea that… ▽ More We propose a novel stereo-confidence that can be measured externally to various stereo-matching networks, offering an alternative input modality choice of the cost volume for learning-based approaches, especially in safety-critical systems. Grounded in the foundational concepts of disparity definition and the disparity plane sweep, the proposed stereo-confidence method is built upon the idea that any shift in a stereo-image pair should be updated in a corresponding amount shift in the disparity map. Based on this idea, the proposed stereo-confidence method can be summarized in three folds. 1) Using the disparity plane sweep, multiple disparity maps can be obtained and treated as a 3-D volume (predicted disparity volume), like the cost volume is constructed. 2) One of these disparity maps serves as an anchor, allowing us to define a desirable (or ideal) disparity profile at every spatial point. 3) By comparing the desirable and predicted disparity profiles, we can quantify the level of matching ambiguity between left and right images for confidence measurement. Extensive experimental results using various stereo-matching networks and datasets demonstrate that the proposed stereo-confidence method not only shows competitive performance on its own but also consistent performance improvements when it is used as an input modality for learning-based stereo-confidence methods. △ Less

Submitted 22 January, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

Comments: AAAI 2024. The first two authors contributed equally

arXiv:2312.15288 [pdf, other]

Understanding normalization in contrastive representation learning and out-of-distribution detection

Authors: Tai Le-Gia, Jaehyun Ahn

Abstract: Contrastive representation learning has emerged as an outstanding approach for anomaly detection. In this work, we explore the $\ell_2$-norm of contrastive features and its applications in out-of-distribution detection. We propose a simple method based on contrastive learning, which incorporates out-of-distribution data by discriminating against normal samples in the contrastive layer space. Our a… ▽ More Contrastive representation learning has emerged as an outstanding approach for anomaly detection. In this work, we explore the $\ell_2$-norm of contrastive features and its applications in out-of-distribution detection. We propose a simple method based on contrastive learning, which incorporates out-of-distribution data by discriminating against normal samples in the contrastive layer space. Our approach can be applied flexibly as an outlier exposure (OE) approach, where the out-of-distribution data is a huge collective of random images, or as a fully self-supervised learning approach, where the out-of-distribution data is self-generated by applying distribution-shifting transformations. The ability to incorporate additional out-of-distribution samples enables a feasible solution for datasets where AD methods based on contrastive learning generally underperform, such as aerial images or microscopy images. Furthermore, the high-quality features learned through contrastive learning consistently enhance performance in OE scenarios, even when the available out-of-distribution dataset is not diverse enough. Our extensive experiments demonstrate the superiority of our proposed method under various scenarios, including unimodal and multimodal settings, with various image datasets. △ Less

Submitted 8 April, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

arXiv:2312.04885 [pdf, other]

VISAGE: Video Instance Segmentation with Appearance-Guided Enhancement

Authors: Hanjung Kim, Jaehyun Kang, Miran Heo, Sukjun Hwang, Seoung Wug Oh, Seon Joo Kim

Abstract: In recent years, online Video Instance Segmentation (VIS) methods have shown remarkable advancement with their powerful query-based detectors. Utilizing the output queries of the detector at the frame-level, these methods achieve high accuracy on challenging benchmarks. However, our observations demonstrate that these methods heavily rely on location information, which often causes incorrect assoc… ▽ More In recent years, online Video Instance Segmentation (VIS) methods have shown remarkable advancement with their powerful query-based detectors. Utilizing the output queries of the detector at the frame-level, these methods achieve high accuracy on challenging benchmarks. However, our observations demonstrate that these methods heavily rely on location information, which often causes incorrect associations between objects. This paper presents that a key axis of object matching in trackers is appearance information, which becomes greatly instructive under conditions where positional cues are insufficient for distinguishing their identities. Therefore, we suggest a simple yet powerful extension to object decoders that explicitly extract embeddings from backbone features and drive queries to capture the appearances of objects, which greatly enhances instance association accuracy. Furthermore, recognizing the limitations of existing benchmarks in fully evaluating appearance awareness, we have constructed a synthetic dataset to rigorously validate our method. By effectively resolving the over-reliance on location information, we achieve state-of-the-art results on YouTube-VIS 2019/2021 and Occluded VIS (OVIS). Code is available at https://github.com/KimHanjung/VISAGE. △ Less

Submitted 8 March, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

Comments: Technical report

arXiv:2312.03005 [pdf, other]

Few-Shot Anomaly Detection with Adversarial Loss for Robust Feature Representations

Authors: Jae Young Lee, Wonjun Lee, Jaehyun Choi, Yongkwi Lee, Young Seog Yoon

Abstract: Anomaly detection is a critical and challenging task that aims to identify data points deviating from normal patterns and distributions within a dataset. Various methods have been proposed using a one-class-one-model approach, but these techniques often face practical problems such as memory inefficiency and the requirement of sufficient data for training. In particular, few-shot anomaly detection… ▽ More Anomaly detection is a critical and challenging task that aims to identify data points deviating from normal patterns and distributions within a dataset. Various methods have been proposed using a one-class-one-model approach, but these techniques often face practical problems such as memory inefficiency and the requirement of sufficient data for training. In particular, few-shot anomaly detection presents significant challenges in industrial applications, where limited samples are available before mass production. In this paper, we propose a few-shot anomaly detection method that integrates adversarial training loss to obtain more robust and generalized feature representations. We utilize the adversarial loss previously employed in domain adaptation to align feature distributions between source and target domains, to enhance feature robustness and generalization in few-shot anomaly detection tasks. We hypothesize that adversarial loss is effective when applied to features that should have similar characteristics, such as those from the same layer in a Siamese network's parallel branches or input-output pairs of reconstruction-based methods. Experimental results demonstrate that the proposed method generally achieves better performance when utilizing the adversarial loss. △ Less

Submitted 4 December, 2023; originally announced December 2023.

Comments: BMVC 2023

arXiv:2311.09585 [pdf, other]

LifeTox: Unveiling Implicit Toxicity in Life Advice

Authors: Minbeom Kim, Jahyun Koo, Hwanhee Lee, Joonsuk Park, Hwaran Lee, Kyomin Jung

Abstract: As large language models become increasingly integrated into daily life, detecting implicit toxicity across diverse contexts is crucial. To this end, we introduce LifeTox, a dataset designed for identifying implicit toxicity within a broad range of advice-seeking scenarios. Unlike existing safety datasets, LifeTox comprises diverse contexts derived from personal experiences through open-ended ques… ▽ More As large language models become increasingly integrated into daily life, detecting implicit toxicity across diverse contexts is crucial. To this end, we introduce LifeTox, a dataset designed for identifying implicit toxicity within a broad range of advice-seeking scenarios. Unlike existing safety datasets, LifeTox comprises diverse contexts derived from personal experiences through open-ended questions. Experiments demonstrate that RoBERTa fine-tuned on LifeTox matches or surpasses the zero-shot performance of large language models in toxicity classification tasks. These results underscore the efficacy of LifeTox in addressing the complex challenges inherent in implicit toxicity. We open-sourced the dataset\footnote{\url{https://huggingface.co/datasets/mbkim/LifeTox}} and the LifeTox moderator family; 350M, 7B, and 13B. △ Less

Submitted 18 March, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

Comments: 11 pages, 5 figures, NAACL 2024

arXiv:2311.07223 [pdf, other]

Wasm SpecTec: Engineering a Formal Language Standard

Authors: Joachim Breitner, Philippa Gardner, Jaehyun Lee, Sam Lindley, Matija Pretnar, Xiaojia Rao, Andreas Rossberg, Sukyoung Ryu, Wonho Shin, Conrad Watt, Dongjun Youn

Abstract: WebAssembly (Wasm) is a low-level bytecode language and virtual machine, intended as a compilation target for a wide range of programming languages, which is seeing increasing adoption across diverse ecosystems. As a young technology, Wasm continues to evolve -- it reached version 2.0 last year and another major update is expected soon. For a new feature to be standardised in Wasm, four key arte… ▽ More WebAssembly (Wasm) is a low-level bytecode language and virtual machine, intended as a compilation target for a wide range of programming languages, which is seeing increasing adoption across diverse ecosystems. As a young technology, Wasm continues to evolve -- it reached version 2.0 last year and another major update is expected soon. For a new feature to be standardised in Wasm, four key artefacts must be presented: a formal (mathematical) specification of the feature, an accompanying prose pseudocode description, an implementation in the official reference interpreter, and a suite of unit tests. This rigorous process helps to avoid errors in the design and implementation of new Wasm features, and Wasm's distinctive formal specification in particular has facilitated machine-checked proofs of various correctness properties for the language. However, manually crafting all of these artefacts requires expert knowledge combined with repetitive and tedious labor, which is a burden on the language's standardization process and authoring of the specification. This paper presents Wasm SpecTec, a technology to express the formal specification of Wasm through a domain-specific language. This DSL allows all of Wasm's currently handwritten specification artefacts to be error-checked and generated automatically from a single source of truth, and is designed to be easy to write, read, compare, and review. We believe that Wasm SpecTec's automation and meta-level error checking will significantly ease the current burden of the language's specification authors. We demonstrate the current capabilities of Wasm SpecTec by showcasing its proficiency in generating various artefacts, and describe our work towards replacing the manually written official Wasm specification document with specifications generated by Wasm SpecTec. △ Less

Submitted 13 November, 2023; originally announced November 2023.

Comments: 5 pages, 7 figures

arXiv:2311.03722 [pdf, other]

Inertial Guided Uncertainty Estimation of Feature Correspondence in Visual-Inertial Odometry/SLAM

Authors: Seongwook Yoon, Jaehyun Kim, Sanghoon Sull

Abstract: Visual odometry and Simultaneous Localization And Map** (SLAM) has been studied as one of the most important tasks in the areas of computer vision and robotics, to contribute to autonomous navigation and augmented reality systems. In case of feature-based odometry/SLAM, a moving visual sensor observes a set of 3D points from different viewpoints, correspondences between the projected 2D points i… ▽ More Visual odometry and Simultaneous Localization And Map** (SLAM) has been studied as one of the most important tasks in the areas of computer vision and robotics, to contribute to autonomous navigation and augmented reality systems. In case of feature-based odometry/SLAM, a moving visual sensor observes a set of 3D points from different viewpoints, correspondences between the projected 2D points in each image are usually established by feature tracking and matching. However, since the corresponding point could be erroneous and noisy, reliable uncertainty estimation can improve the accuracy of odometry/SLAM methods. In addition, inertial measurement unit is utilized to aid the visual sensor in terms of Visual-Inertial fusion. In this paper, we propose a method to estimate the uncertainty of feature correspondence using an inertial guidance robust to image degradation caused by motion blur, illumination change and occlusion. Modeling a guidance distribution to sample possible correspondence, we fit the distribution to an energy function based on image error, yielding more robust uncertainty than conventional methods. We also demonstrate the feasibility of our approach by incorporating it into one of recent visual-inertial odometry/SLAM algorithms for public datasets. △ Less

Submitted 6 November, 2023; originally announced November 2023.

Comments: 12 pages

arXiv:2311.01018 [pdf, other]

Expanding Expressiveness of Diffusion Models with Limited Data via Self-Distillation based Fine-Tuning

Authors: Jiwan Hur, Jaehyun Choi, Gyo** Han, Dong-Jae Lee, Junmo Kim

Abstract: Training diffusion models on limited datasets poses challenges in terms of limited generation capacity and expressiveness, leading to unsatisfactory results in various downstream tasks utilizing pretrained diffusion models, such as domain translation and text-guided image manipulation. In this paper, we propose Self-Distillation for Fine-Tuning diffusion models (SDFT), a methodology to address the… ▽ More Training diffusion models on limited datasets poses challenges in terms of limited generation capacity and expressiveness, leading to unsatisfactory results in various downstream tasks utilizing pretrained diffusion models, such as domain translation and text-guided image manipulation. In this paper, we propose Self-Distillation for Fine-Tuning diffusion models (SDFT), a methodology to address these challenges by leveraging diverse features from diffusion models pretrained on large source datasets. SDFT distills more general features (shape, colors, etc.) and less domain-specific features (texture, fine details, etc) from the source model, allowing successful knowledge transfer without disturbing the training process on target datasets. The proposed method is not constrained by the specific architecture of the model and thus can be generally adopted to existing frameworks. Experimental results demonstrate that SDFT enhances the expressiveness of the diffusion model with limited datasets, resulting in improved generation capabilities across various downstream tasks. △ Less

Submitted 2 November, 2023; originally announced November 2023.

Comments: WACV 2024

arXiv:2310.08929 [pdf, other]

Leveraging Image Augmentation for Object Manipulation: Towards Interpretable Controllability in Object-Centric Learning

Authors: **woo Kim, Janghyuk Choi, Jaehyun Kang, Changyeon Lee, Ho-** Choi, Seon Joo Kim

Abstract: The binding problem in artificial neural networks is actively explored with the goal of achieving human-level recognition skills through the comprehension of the world in terms of symbol-like entities. Especially in the field of computer vision, object-centric learning (OCL) is extensively researched to better understand complex scenes by acquiring object representations or slots. While recent stu… ▽ More The binding problem in artificial neural networks is actively explored with the goal of achieving human-level recognition skills through the comprehension of the world in terms of symbol-like entities. Especially in the field of computer vision, object-centric learning (OCL) is extensively researched to better understand complex scenes by acquiring object representations or slots. While recent studies in OCL have made strides with complex images or videos, the interpretability and interactivity over object representation remain largely uncharted, still holding promise in the field of OCL. In this paper, we introduce a novel method, Slot Attention with Image Augmentation (SlotAug), to explore the possibility of learning interpretable controllability over slots in a self-supervised manner by utilizing an image augmentation strategy. We also devise the concept of sustainability in controllable slots by introducing iterative and reversible controls over slots with two proposed submethods: Auxiliary Identity Manipulation and Slot Consistency Loss. Extensive empirical studies and theoretical validation confirm the effectiveness of our approach, offering a novel capability for interpretable and sustainable control of object representations. △ Less

Submitted 1 March, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

arXiv:2310.06765 [pdf, other]

Efficient Graduated Non-Convexity for Pose Graph Optimization

Authors: Wonseok Kang, Jaehyun Kim, Jiseong Chung, Seungwon Choi, Tae-wan Kim

Abstract: We propose a novel approach to Graduated Non-Convexity (GNC) and demonstrate its efficacy through its application in robust pose graph optimization, a key component in SLAM backends. Traditional GNC methods often rely on heuristic methods for GNC schedule, updating control parameter μ for escalating the non-convexity. In contrast, our approach leverages the properties of convex functions and conve… ▽ More We propose a novel approach to Graduated Non-Convexity (GNC) and demonstrate its efficacy through its application in robust pose graph optimization, a key component in SLAM backends. Traditional GNC methods often rely on heuristic methods for GNC schedule, updating control parameter μ for escalating the non-convexity. In contrast, our approach leverages the properties of convex functions and convex optimization to identify the boundary points beyond which convexity is no longer guaranteed, thereby eliminating redundant optimization steps in existing methodologies and enhancing both speed and robustness. We show that our method outperforms the state-of-the-art method in terms of speed and accuracy when used for robust back-end pose graph optimization via GNC. Our work builds upon and enhances the open-source riSAM framework. Our implementation can be accessed from: https://github.com/SNU-DLLAB/EGNC-PGO △ Less

Submitted 10 October, 2023; originally announced October 2023.

Comments: 6 pages, 6 figures

arXiv:2310.06541 [pdf, other]

Realizing Stabilized Landing for Computation-Limited Reusable Rockets: A Quantum Reinforcement Learning Approach

Authors: Gyu Seon Kim, JaeHyun Chung, Soohyun Park

Abstract: The advent of reusable rockets has heralded a new era in space exploration, reducing the costs of launching satellites by a significant factor. Traditional rockets were disposable, but the design of reusable rockets for repeated use has revolutionized the financial dynamics of space missions. The most critical phase of reusable rockets is the landing stage, which involves managing the tremendous s… ▽ More The advent of reusable rockets has heralded a new era in space exploration, reducing the costs of launching satellites by a significant factor. Traditional rockets were disposable, but the design of reusable rockets for repeated use has revolutionized the financial dynamics of space missions. The most critical phase of reusable rockets is the landing stage, which involves managing the tremendous speed and attitude for safe recovery. The complexity of this task presents new challenges for control systems, specifically in terms of precision and adaptability. Classical control systems like the proportional-integral-derivative (PID) controller lack the flexibility to adapt to dynamic system changes, making them costly and time-consuming to redesign of controller. This paper explores the integration of quantum reinforcement learning into the control systems of reusable rockets as a promising alternative. Unlike classical reinforcement learning, quantum reinforcement learning uses quantum bits that can exist in superposition, allowing for more efficient information encoding and reducing the number of parameters required. This leads to increased computational efficiency, reduced memory requirements, and more stable and predictable performance. Due to the nature of reusable rockets, which must be light, heavy computers cannot fit into them. In the reusable rocket scenario, quantum reinforcement learning, which has reduced memory requirements due to fewer parameters, is a good solution. △ Less

Submitted 10 October, 2023; originally announced October 2023.

Comments: 5 pages, 5 figures

arXiv:2308.13327 [pdf, other]

3D Face Alignment Through Fusion of Head Pose Information and Features

Authors: Jaehyun So, Youngjoon Han

Abstract: The ability of humans to infer head poses from face shapes, and vice versa, indicates a strong correlation between the two. Accordingly, recent studies on face alignment have employed head pose information to predict facial landmarks in computer vision tasks. In this study, we propose a novel method that employs head pose information to improve face alignment performance by fusing said information… ▽ More The ability of humans to infer head poses from face shapes, and vice versa, indicates a strong correlation between the two. Accordingly, recent studies on face alignment have employed head pose information to predict facial landmarks in computer vision tasks. In this study, we propose a novel method that employs head pose information to improve face alignment performance by fusing said information with the feature maps of a face alignment network, rather than simply using it to initialize facial landmarks. Furthermore, the proposed network structure performs robust face alignment through a dual-dimensional network using multidimensional features represented by 2D feature maps and a 3D heatmap. For effective dense face alignment, we also propose a prediction method for facial geometric landmarks through training based on knowledge distillation using predicted keypoints. We experimentally assessed the correlation between the predicted facial landmarks and head pose information, as well as variations in the accuracy of facial landmarks with respect to the quality of head pose information. In addition, we demonstrated the effectiveness of the proposed method through a competitive performance comparison with state-of-the-art methods on the AFLW2000-3D, AFLW, and BIWI datasets. △ Less

Submitted 25 August, 2023; originally announced August 2023.

arXiv:2308.11444 [pdf, other]

Adaptive Graduated Non-Convexity for Pose Graph Optimization

Authors: Seungwon Choi, Wonseok Kang, Jiseong Chung, Jaehyun Kim, Tae-wan Kim

Abstract: We present a novel approach to robust pose graph optimization based on Graduated Non-Convexity (GNC). Unlike traditional GNC-based methods, the proposed approach employs an adaptive shape function using B-spline to optimize the shape of the robust kernel. This aims to reduce GNC iterations, boosting computational speed without compromising accuracy. When integrated with the open-source riSAM algor… ▽ More We present a novel approach to robust pose graph optimization based on Graduated Non-Convexity (GNC). Unlike traditional GNC-based methods, the proposed approach employs an adaptive shape function using B-spline to optimize the shape of the robust kernel. This aims to reduce GNC iterations, boosting computational speed without compromising accuracy. When integrated with the open-source riSAM algorithm, the method demonstrates enhanced efficiency across diverse datasets. Accompanying open-source code aims to encourage further research in this area. https://github.com/SNU-DLLAB/AGNC-PGO △ Less

Submitted 23 September, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

Comments: 4 pages, 3 figures. Accepted for the workshop on Robotic Perception and Map**(ROPEM): Frontier Vision & Learning Techniques, organized at the 2023 International Conference on Intelligent Robots and Systems (IROS)

arXiv:2308.09092 [pdf, other]

Watch Out! Smartwatches as criminal tool and digital forensic investigations

Authors: Seungjae Jeon, Jaehyun Chung, Doowon Jeong

Abstract: In the rapidly advancing technological landscape, smartwatches have materialized as multifunctional devices integral to our daily routines. Smartwatches store a substantial amount of personal information, potentially serving as repositories of digital evidence. Thus, digital forensic researchers have devoted considerable effort to exploring smartwatch forensic techniques. However, it has been obse… ▽ More In the rapidly advancing technological landscape, smartwatches have materialized as multifunctional devices integral to our daily routines. Smartwatches store a substantial amount of personal information, potentially serving as repositories of digital evidence. Thus, digital forensic researchers have devoted considerable effort to exploring smartwatch forensic techniques. However, it has been observed that prior studies have primarily treated smartwatches as mere storage mediums for digital evidence, neglecting their potential role in criminal activities. This paper presents the information leakage perpetrated through smartwatches. We represent crime scenarios in an environment where smartphones are not available, considering that the perception that smartphones can be used as tools for criminal behavior prevails in many organizations, while the potential of similar-use smartwatches is often overlooked. We detail mechanisms for information leakage via file transfer and camera control using smartwatches. Additionally, we present methods to investigate each crime incident through smartwatch forensics. Finally, we describe the limitations of post-incident responses and propose proactive measures to prepare for potential crimes involving smartwatches. Keywords: Information Leakage, Smartwatch Forensics, Android Forensics, Mobile Device Management, Security Policy △ Less

Submitted 17 August, 2023; originally announced August 2023.

arXiv:2308.04723 [pdf, other]

doi 10.13089/JKIISC.2023.33.4.671

A Forensic Methodology for Detecting Image Manipulations

Authors: Jiwon Lee, Seungjae Jeon, Yunji Park, Jaehyun Chung, Doowon Jeong

Abstract: By applying artificial intelligence to image editing technology, it has become possible to generate high-quality images with minimal traces of manipulation. However, since these technologies can be misused for criminal activities such as dissemination of false information, destruction of evidence, and denial of facts, it is crucial to implement strong countermeasures. In this study, image file and… ▽ More By applying artificial intelligence to image editing technology, it has become possible to generate high-quality images with minimal traces of manipulation. However, since these technologies can be misused for criminal activities such as dissemination of false information, destruction of evidence, and denial of facts, it is crucial to implement strong countermeasures. In this study, image file and mobile forensic artifacts analysis were conducted for detecting image manipulation. Image file analysis involves parsing the metadata of manipulated images (e.g., Exif, DQT, and Filename Signature) and comparing them with a Reference DB to detect manipulation. The Reference DB is a database that collects manipulation-related traces left in image metadata, which serves as a criterion for detecting image manipulation. In the mobile forensic artifacts analysis, packages related to image editing tools were extracted and analyzed to aid the detection of image manipulation. The proposed methodology overcomes the limitations of existing graphic feature-based analysis and combines with image processing techniques, providing the advantage of reducing false positives. The research results demonstrate the significant role of such methodology in digital forensic investigation and analysis. Additionally, We provide the code for parsing image metadata and the Reference DB along with the dataset of manipulated images, aiming to contribute to related research. △ Less

Submitted 9 August, 2023; originally announced August 2023.

Journal ref: Journal of The Korea Institute of Information Security and Cryptology (2023)

arXiv:2307.08671 [pdf, other]

Deep Cross-Modal Steganography Using Neural Representations

Authors: Gyo** Han, Dong-Jae Lee, Jiwan Hur, Jaehyun Choi, Junmo Kim

Abstract: Steganography is the process of embedding secret data into another message or data, in such a way that it is not easily noticeable. With the advancement of deep learning, Deep Neural Networks (DNNs) have recently been utilized in steganography. However, existing deep steganography techniques are limited in scope, as they focus on specific data types and are not effective for cross-modal steganogra… ▽ More Steganography is the process of embedding secret data into another message or data, in such a way that it is not easily noticeable. With the advancement of deep learning, Deep Neural Networks (DNNs) have recently been utilized in steganography. However, existing deep steganography techniques are limited in scope, as they focus on specific data types and are not effective for cross-modal steganography. Therefore, We propose a deep cross-modal steganography framework using Implicit Neural Representations (INRs) to hide secret data of various formats in cover images. The proposed framework employs INRs to represent the secret data, which can handle data of various modalities and resolutions. Experiments on various secret datasets of diverse types demonstrate that the proposed approach is expandable and capable of accommodating different modalities. △ Less

Submitted 7 October, 2023; v1 submitted 2 July, 2023; originally announced July 2023.

Comments: ICIP 2023 Oral

arXiv:2306.11526 [pdf, other]

Understanding Contrastive Learning Through the Lens of Margins

Authors: Daniel Rho, TaeSoo Kim, Sooill Park, Jaehyun Park, JaeHan Park

Abstract: Contrastive learning, along with its variations, has been a highly effective self-supervised learning method across diverse domains. Contrastive learning measures the distance between representations using cosine similarity and uses cross-entropy for representation learning. Within the same framework of cosine-similarity-based representation learning, margins have played a significant role in enha… ▽ More Contrastive learning, along with its variations, has been a highly effective self-supervised learning method across diverse domains. Contrastive learning measures the distance between representations using cosine similarity and uses cross-entropy for representation learning. Within the same framework of cosine-similarity-based representation learning, margins have played a significant role in enhancing face and speaker recognition tasks. Interestingly, despite the shared reliance on the same similarity metrics and objective functions, contrastive learning has not actively adopted margins. Furthermore, decision-boundary-based explanations are the only ones that have been used to explain the effect of margins in contrastive learning. In this work, we propose a new perspective to understand the role of margins based on gradient analysis. Based on the new perspective, we analyze how margins affect gradients of contrastive learning and separate the effect into more elemental levels. We separately analyze each and provide possible directions for improving contrastive learning. Our experimental results demonstrate that emphasizing positive samples and scaling gradients depending on positive sample angles and logits are the keys to improving the generalization performance of contrastive learning in both seen and unseen datasets, and other factors can only marginally improve performance. △ Less

Submitted 10 October, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

arXiv:2306.08204 [pdf, other]

Unraveling the ARC Puzzle: Mimicking Human Solutions with Object-Centric Decision Transformer

Authors: Jaehyun Park, Jaegyun Im, Sanha Hwang, Mintaek Lim, Sabina Ualibekova, Se** Kim, Sundong Kim

Abstract: In the pursuit of artificial general intelligence (AGI), we tackle Abstraction and Reasoning Corpus (ARC) tasks using a novel two-pronged approach. We employ the Decision Transformer in an imitation learning paradigm to model human problem-solving, and introduce an object detection algorithm, the Push and Pull clustering method. This dual strategy enhances AI's ARC problem-solving skills and provi… ▽ More In the pursuit of artificial general intelligence (AGI), we tackle Abstraction and Reasoning Corpus (ARC) tasks using a novel two-pronged approach. We employ the Decision Transformer in an imitation learning paradigm to model human problem-solving, and introduce an object detection algorithm, the Push and Pull clustering method. This dual strategy enhances AI's ARC problem-solving skills and provides insights for AGI progression. Yet, our work reveals the need for advanced data collection tools, robust training datasets, and refined model structures. This study highlights potential improvements for Decision Transformers and propels future AGI research. △ Less

Submitted 13 June, 2023; originally announced June 2023.

arXiv:2306.04732 [pdf, other]

Online Multi-Contact Receding Horizon Planning via Value Function Approximation

Authors: Jiayi Wang, Sanghyun Kim, Teguh Santoso Lembono, Wenqian Du, Jaehyun Shim, Saeid Samadi, Ke Wang, Vladimir Ivan, Sylvain Calinon, Sethu Vijayakumar, Steve Tonneau

Abstract: Planning multi-contact motions in a receding horizon fashion requires a value function to guide the planning with respect to the future, e.g., building momentum to traverse large obstacles. Traditionally, the value function is approximated by computing trajectories in a prediction horizon (never executed) that foresees the future beyond the execution horizon. However, given the non-convex dynamics… ▽ More Planning multi-contact motions in a receding horizon fashion requires a value function to guide the planning with respect to the future, e.g., building momentum to traverse large obstacles. Traditionally, the value function is approximated by computing trajectories in a prediction horizon (never executed) that foresees the future beyond the execution horizon. However, given the non-convex dynamics of multi-contact motions, this approach is computationally expensive. To enable online Receding Horizon Planning (RHP) of multi-contact motions, we find efficient approximations of the value function. Specifically, we propose a trajectory-based and a learning-based approach. In the former, namely RHP with Multiple Levels of Model Fidelity, we approximate the value function by computing the prediction horizon with a convex relaxed model. In the latter, namely Locally-Guided RHP, we learn an oracle to predict local objectives for locomotion tasks, and we use these local objectives to construct local value functions for guiding a short-horizon RHP. We evaluate both approaches in simulation by planning centroidal trajectories of a humanoid robot walking on moderate slopes, and on large slopes where the robot cannot maintain static balance. Our results show that locally-guided RHP achieves the best computation efficiency (95\%-98.6\% cycles converge online). This computation advantage enables us to demonstrate online receding horizon planning of our real-world humanoid robot Talos walking in dynamic environments that change on-the-fly. △ Less

Submitted 17 April, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

arXiv:2306.03366 [pdf, other]

doi 10.1109/LCA.2023.3296153

X-ray: Discovering DRAM Internal Structure and Error Characteristics by Issuing Memory Commands

Authors: Hwayong Nam, Seungmin Baek, Minbok Wi, Michael Jaemin Kim, Jaehyun Park, Chihun Song, Nam Sung Kim, Jung Ho Ahn

Abstract: The demand for accurate information about the internal structure and characteristics of dynamic random-access memory (DRAM) has been on the rise. Recent studies have explored the structure and characteristics of DRAM to improve processing in memory, enhance reliability, and mitigate a vulnerability known as rowhammer. However, DRAM manufacturers only disclose limited information through official d… ▽ More The demand for accurate information about the internal structure and characteristics of dynamic random-access memory (DRAM) has been on the rise. Recent studies have explored the structure and characteristics of DRAM to improve processing in memory, enhance reliability, and mitigate a vulnerability known as rowhammer. However, DRAM manufacturers only disclose limited information through official documents, making it difficult to find specific information about actual DRAM devices. This paper presents reliable findings on the internal structure and characteristics of DRAM using activate-induced bitflips (AIBs), retention time test, and row-copy operation. While previous studies have attempted to understand the internal behaviors of DRAM devices, they have only shown results without identifying the causes or have analyzed DRAM modules rather than individual chips. We first uncover the size, structure, and operation of DRAM subarrays and verify our findings on the characteristics of DRAM. Then, we correct misunderstood information related to AIBs and demonstrate experimental results supporting the cause of rowhammer. We expect that the information we uncover about the structure, behavior, and characteristics of DRAM will help future DRAM research. △ Less

Submitted 12 August, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: 4 pages, 7 figures, accepted at IEEE Computer Architecture Letters

arXiv:2304.07675 [pdf, other]

Multimodal Representation Learning of Cardiovascular Magnetic Resonance Imaging

Authors: Jielin Qiu, Peide Huang, Makiya Nakashima, Jaehyun Lee, Jiacheng Zhu, Wilson Tang, Pohao Chen, Christopher Nguyen, Byung-Hak Kim, Debbie Kwon, Douglas Weber, Ding Zhao, David Chen

Abstract: Self-supervised learning is crucial for clinical imaging applications, given the lack of explicit labels in healthcare. However, conventional approaches that rely on precise vision-language alignment are not always feasible in complex clinical imaging modalities, such as cardiac magnetic resonance (CMR). CMR provides a comprehensive visualization of cardiac anatomy, physiology, and microstructure,… ▽ More Self-supervised learning is crucial for clinical imaging applications, given the lack of explicit labels in healthcare. However, conventional approaches that rely on precise vision-language alignment are not always feasible in complex clinical imaging modalities, such as cardiac magnetic resonance (CMR). CMR provides a comprehensive visualization of cardiac anatomy, physiology, and microstructure, making it challenging to interpret. Additionally, CMR reports require synthesizing information from sequences of images and different views, resulting in potentially weak alignment between the study and diagnosis report pair. To overcome these challenges, we propose \textbf{CMRformer}, a multimodal learning framework to jointly learn sequences of CMR images and associated cardiologist's reports. Moreover, one of the major obstacles to improving CMR study is the lack of large, publicly available datasets. To bridge this gap, we collected a large \textbf{CMR dataset}, which consists of 13,787 studies from clinical cases. By utilizing our proposed CMRformer and our collected dataset, we achieved remarkable performance in real-world clinical tasks, such as CMR image retrieval and diagnosis report retrieval. Furthermore, the learned representations are evaluated to be practically helpful for downstream applications, such as disease classification. Our work could potentially expedite progress in the CMR study and lead to more accurate and effective diagnosis and treatment. △ Less

Submitted 15 April, 2023; originally announced April 2023.

Comments: 24 pages

arXiv:2304.04625 [pdf, other]

Reinforcement Learning-Based Black-Box Model Inversion Attacks

Authors: Gyo** Han, Jaehyun Choi, Haeil Lee, Junmo Kim

Abstract: Model inversion attacks are a type of privacy attack that reconstructs private data used to train a machine learning model, solely by accessing the model. Recently, white-box model inversion attacks leveraging Generative Adversarial Networks (GANs) to distill knowledge from public datasets have been receiving great attention because of their excellent attack performance. On the other hand, current… ▽ More Model inversion attacks are a type of privacy attack that reconstructs private data used to train a machine learning model, solely by accessing the model. Recently, white-box model inversion attacks leveraging Generative Adversarial Networks (GANs) to distill knowledge from public datasets have been receiving great attention because of their excellent attack performance. On the other hand, current black-box model inversion attacks that utilize GANs suffer from issues such as being unable to guarantee the completion of the attack process within a predetermined number of query accesses or achieve the same level of performance as white-box attacks. To overcome these limitations, we propose a reinforcement learning-based black-box model inversion attack. We formulate the latent space search as a Markov Decision Process (MDP) problem and solve it with reinforcement learning. Our method utilizes the confidence scores of the generated images to provide rewards to an agent. Finally, the private data can be reconstructed using the latent vectors found by the agent trained in the MDP. The experiment results on various datasets and models demonstrate that our attack successfully recovers the private information of the target model by achieving state-of-the-art attack performance. We emphasize the importance of studies on privacy-preserving machine learning by proposing a more advanced black-box model inversion attack. △ Less

Submitted 10 April, 2023; originally announced April 2023.

Comments: CVPR 2023, Accepted

arXiv:2304.00715 [pdf, other]

Guaranteeing the Õ(AGM/OUT) Runtime for Uniform Sampling and OUT Size Estimation over Joins

Authors: Kyoungmin Kim, Jaehyun Ha, George Fletcher, Wook-Shin Han

Abstract: We propose a new method for estimating the number of answers OUT of a small join query Q in a large database D, and for uniform sampling over joins. Our method is the first to satisfy all the following statements. - Support arbitrary Q, which can be either acyclic or cyclic, and contain binary and non-binary relations. - Guarantee an arbitrary small error with a high probability always in Õ(AGM/OU… ▽ More We propose a new method for estimating the number of answers OUT of a small join query Q in a large database D, and for uniform sampling over joins. Our method is the first to satisfy all the following statements. - Support arbitrary Q, which can be either acyclic or cyclic, and contain binary and non-binary relations. - Guarantee an arbitrary small error with a high probability always in Õ(AGM/OUT) time, where AGM is the AGM bound OUT (an upper bound of OUT), and Õ hides the polylogarithmic factor of input size. We also explain previous join size estimators in a unified framework. All methods including ours rely on certain indexes on relations in D, which take linear time to build offline. Additionally, we extend our method using generalized hypertree decompositions (GHDs) to achieve a lower complexity than Õ(AGM/OUT) when OUT is small, and present optimization techniques for improving estimation efficiency and accuracy. △ Less

Submitted 9 April, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

Comments: 19 pages

arXiv:2303.13726 [pdf, other]

Topology-Based MPC for Automatic Footstep Placement and Contact Surface Selection

Authors: Jaehyun Shim, Carlos Mastalli, Thomas Corbères, Steve Tonneau, Vladimir Ivan, Sethu Vijayakumar

Abstract: State-of-the-art approaches to footstep planning assume reduced-order dynamics when solving the combinatorial problem of selecting contact surfaces in real time. However, in exchange for computational efficiency, these approaches ignore joint torque limits and limb dynamics. In this work, we address these limitations by presenting a topology-based approach that enables model predictive control (MP… ▽ More State-of-the-art approaches to footstep planning assume reduced-order dynamics when solving the combinatorial problem of selecting contact surfaces in real time. However, in exchange for computational efficiency, these approaches ignore joint torque limits and limb dynamics. In this work, we address these limitations by presenting a topology-based approach that enables model predictive control (MPC) to simultaneously plan full-body motions, torque commands, footstep placements, and contact surfaces in real time. To determine if a robot's foot is inside a contact surface, we borrow the winding number concept from topology. We then use this winding number and potential field to create a contact-surface penalty function. By using this penalty function, MPC can select a contact surface from all candidate surfaces in the vicinity and determine footstep placements within it. We demonstrate the benefits of our approach by showing the impact of considering full-body dynamics, which includes joint torque limits and limb dynamics, on the selection of footstep placements and contact surfaces. Furthermore, we validate the feasibility of deploying our topology-based approach in an MPC scheme and explore its potential capabilities through a series of experimental and simulation trials. △ Less

Submitted 29 July, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: 7 pages, 6 figures

Journal ref: IEEE International Conference on Robotics and Automation (ICRA), 2023

arXiv:2303.11545 [pdf, other]

Fix the Noise: Disentangling Source Feature for Controllable Domain Translation

Authors: Dongyeun Lee, Jae Young Lee, Doyeon Kim, Jaehyun Choi, Jaejun Yoo, Junmo Kim

Abstract: Recent studies show strong generative performance in domain translation especially by using transfer learning techniques on the unconditional generator. However, the control between different domain features using a single model is still challenging. Existing methods often require additional models, which is computationally demanding and leads to unsatisfactory visual quality. In addition, they ha… ▽ More Recent studies show strong generative performance in domain translation especially by using transfer learning techniques on the unconditional generator. However, the control between different domain features using a single model is still challenging. Existing methods often require additional models, which is computationally demanding and leads to unsatisfactory visual quality. In addition, they have restricted control steps, which prevents a smooth transition. In this paper, we propose a new approach for high-quality domain translation with better controllability. The key idea is to preserve source features within a disentangled subspace of a target feature space. This allows our method to smoothly control the degree to which it preserves source features while generating images from an entirely new domain using only a single model. Our extensive experiments show that the proposed method can produce more consistent and realistic images than previous works and maintain precise controllability over different levels of transformation. The code is available at https://github.com/LeeDongYeun/FixNoise. △ Less

Submitted 20 March, 2023; originally announced March 2023.

Comments: Accepted by CVPR 2023. The code is available at https://github.com/LeeDongYeun/FixNoise. Extended from arXiv:2204.14079 (AICC workshop at CVPR 2022)

arXiv:2303.08610 [pdf, other]

Blind Estimation of Audio Processing Graph

Authors: Sungho Lee, Jaehyun Park, Seungryeol Paik, Kyogu Lee

Abstract: Musicians and audio engineers sculpt and transform their sounds by connecting multiple processors, forming an audio processing graph. However, most deep-learning methods overlook this real-world practice and assume fixed graph settings. To bridge this gap, we develop a system that reconstructs the entire graph from a given reference audio. We first generate a realistic graph-reference pair dataset… ▽ More Musicians and audio engineers sculpt and transform their sounds by connecting multiple processors, forming an audio processing graph. However, most deep-learning methods overlook this real-world practice and assume fixed graph settings. To bridge this gap, we develop a system that reconstructs the entire graph from a given reference audio. We first generate a realistic graph-reference pair dataset and train a simple blind estimation system composed of a convolutional reference encoder and a transformer-based graph decoder. We apply our model to singing voice effects and drum mixing estimation tasks. Evaluation results show that our method can reconstruct complex signal routings, including multi-band processing and sidechaining. △ Less

Submitted 7 May, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: Accepted to ICASSP 2023

arXiv:2303.00918 [pdf, other]

STUNT: Few-shot Tabular Learning with Self-generated Tasks from Unlabeled Tables

Authors: Jaehyun Nam, Jihoon Tack, Kyungmin Lee, Hankook Lee, **woo Shin

Abstract: Learning with few labeled tabular samples is often an essential requirement for industrial machine learning applications as varieties of tabular data suffer from high annotation costs or have difficulties in collecting new samples for novel tasks. Despite the utter importance, such a problem is quite under-explored in the field of tabular learning, and existing few-shot learning schemes from other… ▽ More Learning with few labeled tabular samples is often an essential requirement for industrial machine learning applications as varieties of tabular data suffer from high annotation costs or have difficulties in collecting new samples for novel tasks. Despite the utter importance, such a problem is quite under-explored in the field of tabular learning, and existing few-shot learning schemes from other domains are not straightforward to apply, mainly due to the heterogeneous characteristics of tabular data. In this paper, we propose a simple yet effective framework for few-shot semi-supervised tabular learning, coined Self-generated Tasks from UNlabeled Tables (STUNT). Our key idea is to self-generate diverse few-shot tasks by treating randomly chosen columns as a target label. We then employ a meta-learning scheme to learn generalizable knowledge with the constructed tasks. Moreover, we introduce an unsupervised validation scheme for hyperparameter search (and early stop**) by generating a pseudo-validation set using STUNT from unlabeled data. Our experimental results demonstrate that our simple framework brings significant performance gain under various tabular few-shot learning benchmarks, compared to prior semi- and self-supervised baselines. Code is available at https://github.com/jaehyun513/STUNT. △ Less

Submitted 1 March, 2023; originally announced March 2023.

Comments: ICLR 2023 (Spotlight)

arXiv:2301.06392 [pdf, other]

I See-Through You: A Framework for Removing Foreground Occlusion in Both Sparse and Dense Light Field Images

Authors: Jiwan Hur, Jae Young Lee, Jaehyun Choi, Junmo Kim

Abstract: Light field (LF) camera captures rich information from a scene. Using the information, the LF de-occlusion (LF-DeOcc) task aims to reconstruct the occlusion-free center view image. Existing LF-DeOcc studies mainly focus on the sparsely sampled (sparse) LF images where most of the occluded regions are visible in other views due to the large disparity. In this paper, we expand LF-DeOcc in more chall… ▽ More Light field (LF) camera captures rich information from a scene. Using the information, the LF de-occlusion (LF-DeOcc) task aims to reconstruct the occlusion-free center view image. Existing LF-DeOcc studies mainly focus on the sparsely sampled (sparse) LF images where most of the occluded regions are visible in other views due to the large disparity. In this paper, we expand LF-DeOcc in more challenging datasets, densely sampled (dense) LF images, which are taken by a micro-lens-based portable LF camera. Due to the small disparity ranges of dense LF images, most of the background regions are invisible in any view. To apply LF-DeOcc in both LF datasets, we propose a framework, ISTY, which is defined and divided into three roles: (1) extract LF features, (2) define the occlusion, and (3) inpaint occluded regions. By dividing the framework into three specialized components according to the roles, the development and analysis can be easier. Furthermore, an explainable intermediate representation, an occlusion mask, can be obtained in the proposed framework. The occlusion mask is useful for comprehensive analysis of the model and other applications by manipulating the mask. In experiments, qualitative and quantitative results show that the proposed framework outperforms state-of-the-art LF-DeOcc methods in both sparse and dense LF datasets. △ Less

Submitted 16 January, 2023; originally announced January 2023.

Comments: WACV 2023

arXiv:2211.15875 [pdf, other]

Data Poisoning Attack Aiming the Vulnerability of Continual Learning

Authors: Gyo** Han, Jaehyun Choi, Hyeong Gwon Hong, Junmo Kim

Abstract: Generally, regularization-based continual learning models limit access to the previous task data to imitate the real-world constraints related to memory and privacy. However, this introduces a problem in these models by not being able to track the performance on each task. In essence, current continual learning methods are susceptible to attacks on previous tasks. We demonstrate the vulnerability… ▽ More Generally, regularization-based continual learning models limit access to the previous task data to imitate the real-world constraints related to memory and privacy. However, this introduces a problem in these models by not being able to track the performance on each task. In essence, current continual learning methods are susceptible to attacks on previous tasks. We demonstrate the vulnerability of regularization-based continual learning methods by presenting a simple task-specific data poisoning attack that can be used in the learning process of a new task. Training data generated by the proposed attack causes performance degradation on a specific task targeted by the attacker. We experiment with the attack on the two representative regularization-based continual learning methods, Elastic Weight Consolidation (EWC) and Synaptic Intelligence (SI), trained with variants of MNIST dataset. The experiment results justify the vulnerability proposed in this paper and demonstrate the importance of develo** continual learning models that are robust to adversarial attacks. △ Less

Submitted 3 July, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

Comments: ICIP 2023 (NeurIPS 2022 ML Safety Workshop accepted paper)

arXiv:2211.02686 [pdf, ps, other]

LightNorm: Area and Energy-Efficient Batch Normalization Hardware for On-Device DNN Training

Authors: Seock-Hwan Noh, Junsang Park, Dahoon Park, Jahyun Koo, Jeik Choi, Jaeha Kung

Abstract: When training early-stage deep neural networks (DNNs), generating intermediate features via convolution or linear layers occupied most of the execution time. Accordingly, extensive research has been done to reduce the computational burden of the convolution or linear layers. In recent mobile-friendly DNNs, however, the relative number of operations involved in processing these layers has significa… ▽ More When training early-stage deep neural networks (DNNs), generating intermediate features via convolution or linear layers occupied most of the execution time. Accordingly, extensive research has been done to reduce the computational burden of the convolution or linear layers. In recent mobile-friendly DNNs, however, the relative number of operations involved in processing these layers has significantly reduced. As a result, the proportion of the execution time of other layers, such as batch normalization layers, has increased. Thus, in this work, we conduct a detailed analysis of the batch normalization layer to efficiently reduce the runtime overhead in the batch normalization process. Backed up by the thorough analysis, we present an extremely efficient batch normalization, named LightNorm, and its associated hardware module. In more detail, we fuse three approximation techniques that are i) low bit-precision, ii) range batch normalization, and iii) block floating point. All these approximate techniques are carefully utilized not only to maintain the statistics of intermediate feature maps, but also to minimize the off-chip memory accesses. By using the proposed LightNorm hardware, we can achieve significant area and energy savings during the DNN training without hurting the training accuracy. This makes the proposed hardware a great candidate for the on-device training. △ Less

Submitted 4 November, 2022; originally announced November 2022.

Comments: The paper is going to appearin the 40th IEEE International Conference on Computer Design (ICCD), 2022

arXiv:2210.01323 [pdf, other]

ASAP: Accurate semantic segmentation for real time performance

Authors: Jaehyun Park, Subin Lee, Eon Kim, Byeongjun Moon, Dabeen Yu, Yeonseung Yu, Junghwan Kim

Abstract: Feature fusion modules from encoder and self-attention module have been adopted in semantic segmentation. However, the computation of these modules is costly and has operational limitations in real-time environments. In addition, segmentation performance is limited in autonomous driving environments with a lot of contextual information perpendicular to the road surface, such as people, buildings,… ▽ More Feature fusion modules from encoder and self-attention module have been adopted in semantic segmentation. However, the computation of these modules is costly and has operational limitations in real-time environments. In addition, segmentation performance is limited in autonomous driving environments with a lot of contextual information perpendicular to the road surface, such as people, buildings, and general objects. In this paper, we propose an efficient feature fusion method, Feature Fusion with Different Norms (FFDN) that utilizes rich global context of multi-level scale and vertical pooling module before self-attention that preserves most contextual information while reducing the complexity of global context encoding in the vertical direction. By doing this, we could handle the properties of representation in global space and reduce additional computational cost. In addition, we analyze low performance in challenging cases including small and vertically featured objects. We achieve the mean Interaction of-union(mIoU) of 73.1 and the Frame Per Second(FPS) of 191, which are comparable results with state-of-the-arts on Cityscapes test datasets. △ Less

Submitted 3 October, 2022; originally announced October 2022.

Comments: 5 pages, 4 figures

arXiv:2207.00555 [pdf, other]

FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised Learning

Authors: Yeonghyeon Lee, Kangwook Jang, Jahyun Goo, Youngmoon Jung, Hoirin Kim

Abstract: Large-scale speech self-supervised learning (SSL) has emerged to the main field of speech processing, however, the problem of computational cost arising from its vast size makes a high entry barrier to academia. In addition, existing distillation techniques of speech SSL models compress the model by reducing layers, which induces performance degradation in linguistic pattern recognition tasks such… ▽ More Large-scale speech self-supervised learning (SSL) has emerged to the main field of speech processing, however, the problem of computational cost arising from its vast size makes a high entry barrier to academia. In addition, existing distillation techniques of speech SSL models compress the model by reducing layers, which induces performance degradation in linguistic pattern recognition tasks such as phoneme recognition (PR). In this paper, we propose FitHuBERT, which makes thinner in dimension throughout almost all model components and deeper in layer compared to prior speech SSL distillation works. Moreover, we employ a time-reduction layer to speed up inference time and propose a method of hint-based distillation for less performance degradation. Our method reduces the model to 23.8% in size and 35.9% in inference time compared to HuBERT. Also, we achieve 12.1% word error rate and 13.3% phoneme error rate on the SUPERB benchmark which is superior than prior work. △ Less

Submitted 1 July, 2022; originally announced July 2022.

Comments: Accepted to Interspeech 2022

arXiv:2204.14079 [pdf, other]

Fix the Noise: Disentangling Source Feature for Transfer Learning of StyleGAN

Authors: Dongyeun Lee, Jae Young Lee, Doyeon Kim, Jaehyun Choi, Junmo Kim

Abstract: Transfer learning of StyleGAN has recently shown great potential to solve diverse tasks, especially in domain translation. Previous methods utilized a source model by swap** or freezing weights during transfer learning, however, they have limitations on visual quality and controlling source features. In other words, they require additional models that are computationally demanding and have restr… ▽ More Transfer learning of StyleGAN has recently shown great potential to solve diverse tasks, especially in domain translation. Previous methods utilized a source model by swap** or freezing weights during transfer learning, however, they have limitations on visual quality and controlling source features. In other words, they require additional models that are computationally demanding and have restricted control steps that prevent a smooth transition. In this paper, we propose a new approach to overcome these limitations. Instead of swap** or freezing, we introduce a simple feature matching loss to improve generation quality. In addition, to control the degree of source features, we train a target model with the proposed strategy, FixNoise, to preserve the source features only in a disentangled subspace of a target feature space. Owing to the disentangled feature space, our method can smoothly control the degree of the source features in a single model. Extensive experiments demonstrate that the proposed method can generate more consistent and realistic images than previous works. △ Less

Submitted 21 March, 2023; v1 submitted 29 April, 2022; originally announced April 2022.

Comments: Full CVPR 2023 paper is available at arXiv:2303.11545. Best paper of CVPRW AICC 2022 (CVPR 2022 Workshop on AI for Content Creation). The code is available at https://github.com/LeeDongYeun/FixNoise

arXiv:2204.03872 [pdf, other]

Controllable Missingness from Uncontrollable Missingness: Joint Learning Measurement Policy and Imputation

Authors: Seongwook Yoon, Jaehyun Kim, Heejeong Lim, Sanghoon Sull

Abstract: Due to the cost or interference of measurement, we need to control measurement system. Assuming that each variable can be measured sequentially, there exists optimal policy choosing next measurement for the former observations. Though optimal measurement policy is actually dependent on the goal of measurement, we mainly focus on retrieving complete data, so called as imputation. Also, we adapt the… ▽ More Due to the cost or interference of measurement, we need to control measurement system. Assuming that each variable can be measured sequentially, there exists optimal policy choosing next measurement for the former observations. Though optimal measurement policy is actually dependent on the goal of measurement, we mainly focus on retrieving complete data, so called as imputation. Also, we adapt the imputation method to missingness varying with measurement policy. However, learning measurement policy and imputation requires complete data which is impossible to be observed, unfortunately. To tackle this problem, we propose a data generation method and joint learning algorithm. The main idea is that 1) the data generation method is inherited by imputation method, and 2) the adaptation of imputation encourages measurement policy to learn more than individual learning. We implemented some variations of proposed algorithm for two different datasets and various missing rates. From the experimental results, we demonstrate that our algorithm is generally applicable and outperforms baseline methods. △ Less

Submitted 8 April, 2022; originally announced April 2022.

arXiv:2203.07554 [pdf, other]

Agile Maneuvers in Legged Robots: a Predictive Control Approach

Authors: Carlos Mastalli, Wolfgang Merkt, Guiyang Xin, Jaehyun Shim, Michael Mistry, Ioannis Havoutis, Sethu Vijayakumar

Abstract: Planning and execution of agile locomotion maneuvers have been a longstanding challenge in legged robotics. It requires to derive motion plans and local feedback policies in real-time to handle the nonholonomy of the kinetic momenta. To achieve so, we propose a hybrid predictive controller that considers the robot's actuation limits and full-body dynamics. It combines the feedback policies with ta… ▽ More Planning and execution of agile locomotion maneuvers have been a longstanding challenge in legged robotics. It requires to derive motion plans and local feedback policies in real-time to handle the nonholonomy of the kinetic momenta. To achieve so, we propose a hybrid predictive controller that considers the robot's actuation limits and full-body dynamics. It combines the feedback policies with tactile information to locally predict future actions. It converges within a few milliseconds thanks to a feasibility-driven approach. Our predictive controller enables ANYmal robots to generate agile maneuvers in realistic scenarios. A crucial element is to track the local feedback policies as, in contrast to whole-body control, they achieve the desired angular momentum. To the best of our knowledge, our predictive controller is the first to handle actuation limits, generate agile locomotion maneuvers, and execute optimal feedback policies for low level torque control without the use of a separate whole-body controller. △ Less

Submitted 18 July, 2022; v1 submitted 14 March, 2022; originally announced March 2022.

Comments: 20 pages, 16 figures

Showing 1–50 of 77 results for author: Jaehyun