Search | arXiv e-print repository

arXiv:2407.01073 [pdf, other]

No More Potentially Dynamic Objects: Static Point Cloud Map Generation based on 3D Object Detection and Ground Projection

Authors: Soo** Woo, Donghwi Jung, Seong-Woo Kim

Abstract: In this paper, we propose an algorithm to generate a static point cloud map based on LiDAR point cloud data. Our proposed pipeline detects dynamic objects using 3D object detectors and projects points of dynamic objects onto the ground. Typically, point cloud data acquired in real-time serves as a snapshot of the surrounding areas containing both static objects and dynamic objects. The static obje… ▽ More In this paper, we propose an algorithm to generate a static point cloud map based on LiDAR point cloud data. Our proposed pipeline detects dynamic objects using 3D object detectors and projects points of dynamic objects onto the ground. Typically, point cloud data acquired in real-time serves as a snapshot of the surrounding areas containing both static objects and dynamic objects. The static objects include buildings and trees, otherwise, the dynamic objects contain objects such as parked cars that change their position over time. Removing dynamic objects from the point cloud map is crucial as they can degrade the quality and localization accuracy of the map. To address this issue, in this paper, we propose an algorithm that creates a map only consisting of static objects. We apply a 3D object detection algorithm to the point cloud data which are obtained from LiDAR to implement our pipeline. We then stack the points to create the map after performing ground segmentation and projection. As a result, not only we can eliminate currently dynamic objects at the time of map generation but also potentially dynamic objects such as parked vehicles. We validate the performance of our method using two kinds of datasets collected on real roads: KITTI and our dataset. The result demonstrates the capability of our proposal to create an accurate static map excluding dynamic objects from input point clouds. Also, we verified the improved performance of localization using a generated map based on our method. △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.00289 [pdf, other]

Personalised Outfit Recommendation via History-aware Transformers

Authors: David Jung, Julien Monteil, Philip Schulz, Volodymyr Vaskovych

Abstract: We present the history-aware transformer (HAT), a transformer-based model that uses shoppers' purchase history to personalise outfit predictions. The aim of this work is to recommend outfits that are internally coherent while matching an individual shopper's style and taste. To achieve this, we stack two transformer models, one that produces outfit representations and another one that processes th… ▽ More We present the history-aware transformer (HAT), a transformer-based model that uses shoppers' purchase history to personalise outfit predictions. The aim of this work is to recommend outfits that are internally coherent while matching an individual shopper's style and taste. To achieve this, we stack two transformer models, one that produces outfit representations and another one that processes the history of purchased outfits for a given shopper. We use these models to score an outfit's compatibility in the context of a shopper's preferences as inferred from their previous purchases. During training, the model learns to discriminate between purchased and random outfits using 3 losses: the focal loss for outfit compatibility typically used in the literature, a contrastive loss to bring closer learned outfit embeddings from a shopper's history, and an adaptive margin loss to facilitate learning from weak negatives. Together, these losses enable the model to make personalised recommendations based on a shopper's purchase history. Our experiments on the IQON3000 and Polyvore datasets show that HAT outperforms strong baselines on the outfit Compatibility Prediction (CP) and the Fill In The Blank (FITB) tasks. The model improves AUC for the CP hard task by 15.7% (IQON3000) and 19.4% (Polyvore) compared to previous SOTA results. It further improves accuracy on the FITB hard task by 6.5% and 9.7%, respectively. We provide ablation studies on the personalisation, constrastive loss, and adaptive margin loss that highlight the importance of these modelling choices. △ Less

Submitted 28 June, 2024; originally announced July 2024.

arXiv:2406.19848 [pdf, other]

3D Operation of Autonomous Excavator based on Reinforcement Learning through Independent Reward for Individual Joints

Authors: Yoonkyu Yoo, Donghwi Jung, Seong-Woo Kim

Abstract: In this paper, we propose a control algorithm based on reinforcement learning, employing independent rewards for each joint to control excavators in a 3D space. The aim of this research is to address the challenges associated with achieving precise control of excavators, which are extensively utilized in construction sites but prove challenging to control with precision due to their hydraulic stru… ▽ More In this paper, we propose a control algorithm based on reinforcement learning, employing independent rewards for each joint to control excavators in a 3D space. The aim of this research is to address the challenges associated with achieving precise control of excavators, which are extensively utilized in construction sites but prove challenging to control with precision due to their hydraulic structures. Traditional methods relied on operator expertise for precise excavator operation, occasionally resulting in safety accidents. Therefore, there have been endeavors to attain precise excavator control through equation-based control algorithms. However, these methods had the limitation of necessitating prior information related to physical values of the excavator, rendering them unsuitable for the diverse range of excavators used in the field. To overcome these limitations, we have explored reinforcement learning-based control methods that do not demand prior knowledge of specific equipment but instead utilize data to train models. Nevertheless, existing reinforcement learning-based methods overlooked cabin swing rotation and confined the bucket's workspace to a 2D plane. Control confined within such a limited area diminishes the applicability of the algorithm in construction sites. We address this issue by expanding the previous 2D plane workspace of the bucket operation into a 3D space, incorporating cabin swing rotation. By expanding the workspace into 3D, excavators can execute continuous operations without requiring human intervention. To accomplish this objective, distinct targets were established for each joint, facilitating the training of action values for each joint independently, regardless of the progress of other joint learning. △ Less

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.17869 [pdf, other]

Burst Image Super-Resolution with Base Frame Selection

Authors: Sanghyun Kim, Min Jung Lee, Woohyeok Kim, Deunsol Jung, Jaesung Rim, Sunghyun Cho, Minsu Cho

Abstract: Burst image super-resolution has been a topic of active research in recent years due to its ability to obtain a high-resolution image by using complementary information between multiple frames in the burst. In this work, we explore using burst shots with non-uniform exposures to confront real-world practical scenarios by introducing a new benchmark dataset, dubbed Non-uniformly Exposed Burst Image… ▽ More Burst image super-resolution has been a topic of active research in recent years due to its ability to obtain a high-resolution image by using complementary information between multiple frames in the burst. In this work, we explore using burst shots with non-uniform exposures to confront real-world practical scenarios by introducing a new benchmark dataset, dubbed Non-uniformly Exposed Burst Image (NEBI), that includes the burst frames at varying exposure times to obtain a broader range of irradiance and motion characteristics within a scene. As burst shots with non-uniform exposures exhibit varying levels of degradation, fusing information of the burst shots into the first frame as a base frame may not result in optimal image quality. To address this limitation, we propose a Frame Selection Network (FSN) for non-uniform scenarios. This network seamlessly integrates into existing super-resolution methods in a plug-and-play manner with low computational costs. The comparative analysis reveals the effectiveness of the nonuniform setting for the practical scenario and our FSN on synthetic-/real- NEBI datasets. △ Less

Submitted 25 June, 2024; originally announced June 2024.

Comments: CVPR2024W NTIRE accepted

arXiv:2406.17256 [pdf, other]

Disentangled Motion Modeling for Video Frame Interpolation

Authors: Jaihyun Lew, Jooyoung Choi, Chaehun Shin, Dahuin Jung, Sungroh Yoon

Abstract: Video frame interpolation (VFI) aims to synthesize intermediate frames in between existing frames to enhance visual smoothness and quality. Beyond the conventional methods based on the reconstruction loss, recent works employ the high quality generative models for perceptual quality. However, they require complex training and large computational cost for modeling on the pixel space. In this paper,… ▽ More Video frame interpolation (VFI) aims to synthesize intermediate frames in between existing frames to enhance visual smoothness and quality. Beyond the conventional methods based on the reconstruction loss, recent works employ the high quality generative models for perceptual quality. However, they require complex training and large computational cost for modeling on the pixel space. In this paper, we introduce disentangled Motion Modeling (MoMo), a diffusion-based approach for VFI that enhances visual quality by focusing on intermediate motion modeling. We propose disentangled two-stage training process, initially training a frame synthesis model to generate frames from input pairs and their optical flows. Subsequently, we propose a motion diffusion model, equipped with our novel diffusion U-Net architecture designed for optical flow, to produce bi-directional flows between frames. This method, by leveraging the simpler low-frequency representation of motions, achieves superior perceptual quality with reduced computational demands compared to generative modeling methods on the pixel space. Our method surpasses state-of-the-art methods in perceptual metrics across various benchmarks, demonstrating its efficacy and efficiency in VFI. Our code is available at: https://github.com/JHLew/MoMo △ Less

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2404.04819 [pdf, other]

Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer

Authors: Hyeong** Nam, Daniel Sungho Jung, Gyeongsik Moon, Kyoung Mu Lee

Abstract: Human-object contact serves as a strong cue to understand how humans physically interact with objects. Nevertheless, it is not widely explored to utilize human-object contact information for the joint reconstruction of 3D human and object from a single image. In this work, we present a novel joint 3D human-object reconstruction method (CONTHO) that effectively exploits contact information between… ▽ More Human-object contact serves as a strong cue to understand how humans physically interact with objects. Nevertheless, it is not widely explored to utilize human-object contact information for the joint reconstruction of 3D human and object from a single image. In this work, we present a novel joint 3D human-object reconstruction method (CONTHO) that effectively exploits contact information between humans and objects. There are two core designs in our system: 1) 3D-guided contact estimation and 2) contact-based 3D human and object refinement. First, for accurate human-object contact estimation, CONTHO initially reconstructs 3D humans and objects and utilizes them as explicit 3D guidance for contact estimation. Second, to refine the initial reconstructions of 3D human and object, we propose a novel contact-based refinement Transformer that effectively aggregates human features and object features based on the estimated human-object contact. The proposed contact-based refinement prevents the learning of erroneous correlation between human and object, which enables accurate 3D reconstruction. As a result, our CONTHO achieves state-of-the-art performance in both human-object contact estimation and joint reconstruction of 3D human and object. The code is publicly available at https://github.com/dqj5182/CONTHO_RELEASE. △ Less

Submitted 7 April, 2024; originally announced April 2024.

Comments: Published at CVPR 2024, 19 pages including the supplementary material

arXiv:2404.00450 [pdf, other]

Planning and Editing What You Retrieve for Enhanced Tool Learning

Authors: Tenghao Huang, Dongwon Jung, Muhao Chen

Abstract: Recent advancements in integrating external tools with Large Language Models (LLMs) have opened new frontiers, with applications in mathematical reasoning, code generators, and smart assistants. However, existing methods, relying on simple one-time retrieval strategies, fall short on effectively and accurately shortlisting relevant tools. This paper introduces a novel PLUTO (Planning, Learning, an… ▽ More Recent advancements in integrating external tools with Large Language Models (LLMs) have opened new frontiers, with applications in mathematical reasoning, code generators, and smart assistants. However, existing methods, relying on simple one-time retrieval strategies, fall short on effectively and accurately shortlisting relevant tools. This paper introduces a novel PLUTO (Planning, Learning, and Understanding for TOols) approach, encompassing `Plan-and-Retrieve (P&R)` and `Edit-and-Ground (E&G)` paradigms. The P&R paradigm consists of a neural retrieval module for shortlisting relevant tools and an LLM-based query planner that decomposes complex queries into actionable tasks, enhancing the effectiveness of tool utilization. The E&G paradigm utilizes LLMs to enrich tool descriptions based on user scenarios, bridging the gap between user queries and tool functionalities. Experiment results demonstrate that these paradigms significantly improve the recall and NDCG in tool retrieval tasks, significantly surpassing current state-of-the-art models. △ Less

Submitted 4 April, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

Comments: This paper is accepted at NAACL-Findings 2024

arXiv:2403.10911 [pdf, other]

Efficient Diffusion-Driven Corruption Editor for Test-Time Adaptation

Authors: Yeongtak Oh, Jonghyun Lee, Jooyoung Choi, Dahuin Jung, Uiwon Hwang, Sungroh Yoon

Abstract: Test-time adaptation (TTA) addresses the unforeseen distribution shifts occurring during test time. In TTA, both performance and, memory and time consumption serve as crucial considerations. A recent diffusion-based TTA approach for restoring corrupted images involves image-level updates. However, using pixel space diffusion significantly increases resource requirements compared to conventional mo… ▽ More Test-time adaptation (TTA) addresses the unforeseen distribution shifts occurring during test time. In TTA, both performance and, memory and time consumption serve as crucial considerations. A recent diffusion-based TTA approach for restoring corrupted images involves image-level updates. However, using pixel space diffusion significantly increases resource requirements compared to conventional model updating TTA approaches, revealing limitations as a TTA method. To address this, we propose a novel TTA method by leveraging a latent diffusion model (LDM) based image editing model and fine-tuning it with our newly introduced corruption modeling scheme. This scheme enhances the robustness of the diffusion model against distribution shifts by creating (clean, corrupted) image pairs and fine-tuning the model to edit corrupted images into clean ones. Moreover, we introduce a distilled variant to accelerate the model for corruption editing using only 4 network function evaluations (NFEs). We extensively validated our method across various architectures and datasets including image and video domains. Our model achieves the best performance with a 100 times faster runtime than that of a diffusion-based baseline. Furthermore, it outpaces the speed of the model updating TTA method based on data augmentation threefold, rendering an image-level updating approach more practical. △ Less

Submitted 18 March, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

arXiv:2403.09055 [pdf, other]

StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control

Authors: Jaerin Lee, Daniel Sungho Jung, Kanggeon Lee, Kyoung Mu Lee

Abstract: The enormous success of diffusion models in text-to-image synthesis has made them promising candidates for the next generation of end-user applications for image generation and editing. Previous works have focused on improving the usability of diffusion models by reducing the inference time or increasing user interactivity by allowing new, fine-grained controls such as region-based text prompts. H… ▽ More The enormous success of diffusion models in text-to-image synthesis has made them promising candidates for the next generation of end-user applications for image generation and editing. Previous works have focused on improving the usability of diffusion models by reducing the inference time or increasing user interactivity by allowing new, fine-grained controls such as region-based text prompts. However, we empirically find that integrating both branches of works is nontrivial, limiting the potential of diffusion models. To solve this incompatibility, we present StreamMultiDiffusion, the first real-time region-based text-to-image generation framework. By stabilizing fast inference techniques and restructuring the model into a newly proposed multi-prompt stream batch architecture, we achieve $\times 10$ faster panorama generation than existing solutions, and the generation speed of 1.57 FPS in region-based text-to-image synthesis on a single RTX 2080 Ti GPU. Our solution opens up a new paradigm for interactive image generation named semantic palette, where high-quality images are generated in real-time from given multiple hand-drawn regions, encoding prescribed semantic meanings (e.g., eagle, girl). Our code and demo application are available at https://github.com/ironjr/StreamMultiDiffusion. △ Less

Submitted 1 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

Comments: 29 pages, 16 figures. v2: typos corrected, references added. Project page: https://jaerinlee.com/research/StreamMultiDiffusion

arXiv:2403.07366 [pdf, other]

Entropy is not Enough for Test-Time Adaptation: From the Perspective of Disentangled Factors

Authors: Jonghyun Lee, Dahuin Jung, Saehyung Lee, Junsung Park, Juhyeon Shin, Uiwon Hwang, Sungroh Yoon

Abstract: Test-time adaptation (TTA) fine-tunes pre-trained deep neural networks for unseen test data. The primary challenge of TTA is limited access to the entire test dataset during online updates, causing error accumulation. To mitigate it, TTA methods have utilized the model output's entropy as a confidence metric that aims to determine which samples have a lower likelihood of causing error. Through exp… ▽ More Test-time adaptation (TTA) fine-tunes pre-trained deep neural networks for unseen test data. The primary challenge of TTA is limited access to the entire test dataset during online updates, causing error accumulation. To mitigate it, TTA methods have utilized the model output's entropy as a confidence metric that aims to determine which samples have a lower likelihood of causing error. Through experimental studies, however, we observed the unreliability of entropy as a confidence metric for TTA under biased scenarios and theoretically revealed that it stems from the neglect of the influence of latent disentangled factors of data on predictions. Building upon these findings, we introduce a novel TTA method named Destroy Your Object (DeYO), which leverages a newly proposed confidence metric named Pseudo-Label Probability Difference (PLPD). PLPD quantifies the influence of the shape of an object on prediction by measuring the difference between predictions before and after applying an object-destructive transformation. DeYO consists of sample selection and sample weighting, which employ entropy and PLPD concurrently. For robust adaptation, DeYO prioritizes samples that dominantly incorporate shape information when making predictions. Our extensive experiments demonstrate the consistent superiority of DeYO over baseline methods across various scenarios, including biased and wild. Project page is publicly available at https://whitesnowdrop.github.io/DeYO/. △ Less

Submitted 12 March, 2024; originally announced March 2024.

Comments: ICLR 2024 Spotlight; 26 pages, 9 figures, 20 tables;

arXiv:2402.04535 [pdf, other]

MuNES: Multifloor Navigation Including Elevators and Stairs

Authors: Donghwi Jung, Chan Kim, Jae-Kyung Cho, Seong-Woo Kim

Abstract: We propose a scheme called MuNES for single map** and trajectory planning including elevators and stairs. Optimized multifloor trajectories are important for optimal interfloor movements of robots. However, given two or more options of moving between floors, it is difficult to select the best trajectory because there are no suitable indoor multifloor maps in the existing methods. To solve this p… ▽ More We propose a scheme called MuNES for single map** and trajectory planning including elevators and stairs. Optimized multifloor trajectories are important for optimal interfloor movements of robots. However, given two or more options of moving between floors, it is difficult to select the best trajectory because there are no suitable indoor multifloor maps in the existing methods. To solve this problem, MuNES creates a single multifloor map including elevators and stairs by estimating altitude changes based on pressure data. In addition, the proposed method performs floor-based loop detection for faster and more accurate loop closure. The single multifloor map is then voxelized leaving only the parts needed for trajectory planning. An optimal and realistic multifloor trajectory is generated by exploring the voxels using an A* algorithm based on the proposed cost function, which affects realistic factors. We tested this algorithm using data acquired from around a campus and note that a single accurate multifloor map could be created. Furthermore, optimal and realistic multifloor trajectory could be found by selecting the means of motion between floors between elevators and stairs according to factors such as the starting point, ending point, and elevator waiting time. The code and data used in this work are available at https://github.com/donghwijung/MuNES. △ Less

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2401.14616 [pdf, other]

Alternative Speech: Complementary Method to Counter-Narrative for Better Discourse

Authors: Seungyoon Lee, Dahyun Jung, Chanjun Park, Seolhwa Lee, Heuiseok Lim

Abstract: We introduce the concept of "Alternative Speech" as a new way to directly combat hate speech and complement the limitations of counter-narrative. An alternative speech provides practical alternatives to hate speech in real-world scenarios by offering speech-level corrections to speakers while considering the surrounding context and promoting speakers to reform. Further, an alternative speech can c… ▽ More We introduce the concept of "Alternative Speech" as a new way to directly combat hate speech and complement the limitations of counter-narrative. An alternative speech provides practical alternatives to hate speech in real-world scenarios by offering speech-level corrections to speakers while considering the surrounding context and promoting speakers to reform. Further, an alternative speech can combat hate speech alongside counter-narratives, offering a useful tool to address social issues such as racial discrimination and gender inequality. We propose the new concept and provide detailed guidelines for constructing the necessary dataset. Through discussion, we demonstrate that combining alternative speech and counter-narrative can be a more effective strategy for combating hate speech by complementing specificity and guiding capacity of counter-narrative. This paper presents another perspective for dealing with hate speech, offering viable remedies to complement the constraints of current approaches to mitigating harmful bias. △ Less

Submitted 25 January, 2024; originally announced January 2024.

Comments: Accepted for The First Workshop on Data-Centric AI (DCAI) at ICDM 2023

arXiv:2401.04143 [pdf, other]

RHOBIN Challenge: Reconstruction of Human Object Interaction

Authors: Xianghui Xie, Xi Wang, Nikos Athanasiou, Bharat Lal Bhatnagar, Chun-Hao P. Huang, Kaichun Mo, Hao Chen, Xia Jia, Zerui Zhang, Liangxian Cui, Xiao Lin, Bingqiao Qian, Jie Xiao, Wenfei Yang, Hyeong** Nam, Daniel Sungho Jung, Kihoon Kim, Kyoung Mu Lee, Otmar Hilliges, Gerard Pons-Moll

Abstract: Modeling the interaction between humans and objects has been an emerging research direction in recent years. Capturing human-object interaction is however a very challenging task due to heavy occlusion and complex dynamics, which requires understanding not only 3D human pose, and object pose but also the interaction between them. Reconstruction of 3D humans and objects has been two separate resear… ▽ More Modeling the interaction between humans and objects has been an emerging research direction in recent years. Capturing human-object interaction is however a very challenging task due to heavy occlusion and complex dynamics, which requires understanding not only 3D human pose, and object pose but also the interaction between them. Reconstruction of 3D humans and objects has been two separate research fields in computer vision for a long time. We hence proposed the first RHOBIN challenge: reconstruction of human-object interactions in conjunction with the RHOBIN workshop. It was aimed at bringing the research communities of human and object reconstruction as well as interaction modeling together to discuss techniques and exchange ideas. Our challenge consists of three tracks of 3D reconstruction from monocular RGB images with a focus on dealing with challenging interaction scenarios. Our challenge attracted more than 100 participants with more than 300 submissions, indicating the broad interest in the research communities. This paper describes the settings of our challenge and discusses the winning methods of each track in more detail. We observe that the human reconstruction task is becoming mature even under heavy occlusion settings while object pose estimation and joint reconstruction remain challenging tasks. With the growing interest in interaction modeling, we hope this report can provide useful insights and foster future research in this direction. Our workshop website can be found at \href{https://rhobin-challenge.github.io/}{https://rhobin-challenge.github.io/}. △ Less

Submitted 7 January, 2024; originally announced January 2024.

Comments: 14 pages, 5 tables, 7 figure. Technical report of the CVPR'23 workshop: RHOBIN challenge (https://rhobin-challenge.github.io/)

arXiv:2312.15924 [pdf, other]

Modeling and Analysis of GEO Satellite Networks

Authors: Dong-Hyun Jung, Hongjae Nam, Junil Choi, David J. Love

Abstract: The extensive coverage offered by satellites makes them effective in enhancing service continuity for users on dynamic airborne and maritime platforms, such as airplanes and ships. In particular, geosynchronous Earth orbit (GEO) satellites ensure stable connectivity for terrestrial users due to their stationary characteristics when observed from Earth. This paper introduces a novel approach to mod… ▽ More The extensive coverage offered by satellites makes them effective in enhancing service continuity for users on dynamic airborne and maritime platforms, such as airplanes and ships. In particular, geosynchronous Earth orbit (GEO) satellites ensure stable connectivity for terrestrial users due to their stationary characteristics when observed from Earth. This paper introduces a novel approach to model and analyze GEO satellite networks using stochastic geometry. We model the distribution of GEO satellites in the geostationary orbit according to a binomial point process (BPP) and examine satellite visibility depending on the terminal's latitude. Then, we identify potential distribution cases for GEO satellites and derive case probabilities based on the properties of the BPP. We also obtain the distance distributions between the terminal and GEO satellites and derive the coverage probability of the network. We further approximate the derived expressions using the Poisson limit theorem. Monte Carlo simulations are performed to validate the analytical findings, demonstrating a strong alignment between the analyses and simulations. The simplified analytical results can be used to estimate the coverage performance of GEO satellite networks by effectively modeling the positions of GEO satellites. △ Less

Submitted 26 December, 2023; originally announced December 2023.

Comments: 12 pages, 9 figures, submitted to IEEE Transactions on Wireless Communications

arXiv:2312.04266 [pdf, other]

Activity Grammars for Temporal Action Segmentation

Authors: Dayoung Gong, Joonseok Lee, Deunsol Jung, Suha Kwak, Minsu Cho

Abstract: Sequence prediction on temporal data requires the ability to understand compositional structures of multi-level semantics beyond individual and contextual properties. The task of temporal action segmentation, which aims at translating an untrimmed activity video into a sequence of action segments, remains challenging for this reason. This paper addresses the problem by introducing an effective act… ▽ More Sequence prediction on temporal data requires the ability to understand compositional structures of multi-level semantics beyond individual and contextual properties. The task of temporal action segmentation, which aims at translating an untrimmed activity video into a sequence of action segments, remains challenging for this reason. This paper addresses the problem by introducing an effective activity grammar to guide neural predictions for temporal action segmentation. We propose a novel grammar induction algorithm that extracts a powerful context-free grammar from action sequence data. We also develop an efficient generalized parser that transforms frame-level probability distributions into a reliable sequence of actions according to the induced grammar with recursive rules. Our approach can be combined with any neural network for temporal action segmentation to enhance the sequence prediction and discover its compositional structure. Experimental results demonstrate that our method significantly improves temporal action segmentation in terms of both performance and interpretability on two standard benchmarks, Breakfast and 50 Salads. △ Less

Submitted 7 December, 2023; originally announced December 2023.

Comments: Accepted to NeurIPS 2023

arXiv:2311.15890 [pdf, other]

Stability-Informed Initialization of Neural Ordinary Differential Equations

Authors: Theodor Westny, Arman Mohammadi, Daniel Jung, Erik Frisk

Abstract: This paper addresses the training of Neural Ordinary Differential Equations (neural ODEs), and in particular explores the interplay between numerical integration techniques, stability regions, step size, and initialization techniques. It is shown how the choice of integration technique implicitly regularizes the learned model, and how the solver's corresponding stability region affects training an… ▽ More This paper addresses the training of Neural Ordinary Differential Equations (neural ODEs), and in particular explores the interplay between numerical integration techniques, stability regions, step size, and initialization techniques. It is shown how the choice of integration technique implicitly regularizes the learned model, and how the solver's corresponding stability region affects training and prediction performance. From this analysis, a stability-informed parameter initialization technique is introduced. The effectiveness of the initialization method is displayed across several learning benchmarks and industrial applications. △ Less

Submitted 1 December, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

arXiv:2310.16492 [pdf, other]

On the Powerfulness of Textual Outlier Exposure for Visual OoD Detection

Authors: Sangha Park, Jisoo Mok, Dahuin Jung, Saehyung Lee, Sungroh Yoon

Abstract: Successful detection of Out-of-Distribution (OoD) data is becoming increasingly important to ensure safe deployment of neural networks. One of the main challenges in OoD detection is that neural networks output overconfident predictions on OoD data, make it difficult to determine OoD-ness of data solely based on their predictions. Outlier exposure addresses this issue by introducing an additional… ▽ More Successful detection of Out-of-Distribution (OoD) data is becoming increasingly important to ensure safe deployment of neural networks. One of the main challenges in OoD detection is that neural networks output overconfident predictions on OoD data, make it difficult to determine OoD-ness of data solely based on their predictions. Outlier exposure addresses this issue by introducing an additional loss that encourages low-confidence predictions on OoD data during training. While outlier exposure has shown promising potential in improving OoD detection performance, all previous studies on outlier exposure have been limited to utilizing visual outliers. Drawing inspiration from the recent advancements in vision-language pre-training, this paper venture out to the uncharted territory of textual outlier exposure. First, we uncover the benefits of using textual outliers by replacing real or virtual outliers in the image-domain with textual equivalents. Then, we propose various ways of generating preferable textual outliers. Our extensive experiments demonstrate that generated textual outliers achieve competitive performance on large-scale OoD and hard OoD benchmarks. Furthermore, we conduct empirical analyses of textual outliers to provide primary criteria for designing advantageous textual outliers: near-distribution, descriptiveness, and inclusion of visual semantics. △ Less

Submitted 25 October, 2023; originally announced October 2023.

Comments: Accepted by NeurIPS 2023

arXiv:2310.10088 [pdf, other]

PUCA: Patch-Unshuffle and Channel Attention for Enhanced Self-Supervised Image Denoising

Authors: Hyemi Jang, Junsung Park, Dahuin Jung, Jaihyun Lew, Ho Bae, Sungroh Yoon

Abstract: Although supervised image denoising networks have shown remarkable performance on synthesized noisy images, they often fail in practice due to the difference between real and synthesized noise. Since clean-noisy image pairs from the real world are extremely costly to gather, self-supervised learning, which utilizes noisy input itself as a target, has been studied. To prevent a self-supervised deno… ▽ More Although supervised image denoising networks have shown remarkable performance on synthesized noisy images, they often fail in practice due to the difference between real and synthesized noise. Since clean-noisy image pairs from the real world are extremely costly to gather, self-supervised learning, which utilizes noisy input itself as a target, has been studied. To prevent a self-supervised denoising model from learning identical map**, each output pixel should not be influenced by its corresponding input pixel; This requirement is known as J-invariance. Blind-spot networks (BSNs) have been a prevalent choice to ensure J-invariance in self-supervised image denoising. However, constructing variations of BSNs by injecting additional operations such as downsampling can expose blinded information, thereby violating J-invariance. Consequently, convolutions designed specifically for BSNs have been allowed only, limiting architectural flexibility. To overcome this limitation, we propose PUCA, a novel J-invariant U-Net architecture, for self-supervised denoising. PUCA leverages patch-unshuffle/shuffle to dramatically expand receptive fields while maintaining J-invariance and dilated attention blocks (DABs) for global context incorporation. Experimental results demonstrate that PUCA achieves state-of-the-art performance, outperforming existing methods in self-supervised image denoising. △ Less

Submitted 16 October, 2023; originally announced October 2023.

Comments: Accepted to NeurIPS 2023

arXiv:2309.01943 [pdf, other]

Extract-and-Adaptation Network for 3D Interacting Hand Mesh Recovery

Authors: JoonKyu Park, Daniel Sungho Jung, Gyeongsik Moon, Kyoung Mu Lee

Abstract: Understanding how two hands interact with each other is a key component of accurate 3D interacting hand mesh recovery. However, recent Transformer-based methods struggle to learn the interaction between two hands as they directly utilize two hand features as input tokens, which results in distant token problem. The distant token problem represents that input tokens are in heterogeneous spaces, lea… ▽ More Understanding how two hands interact with each other is a key component of accurate 3D interacting hand mesh recovery. However, recent Transformer-based methods struggle to learn the interaction between two hands as they directly utilize two hand features as input tokens, which results in distant token problem. The distant token problem represents that input tokens are in heterogeneous spaces, leading Transformer to fail in capturing correlation between input tokens. Previous Transformer-based methods suffer from the problem especially when poses of two hands are very different as they project features from a backbone to separate left and right hand-dedicated features. We present EANet, extract-and-adaptation network, with EABlock, the main component of our network. Rather than directly utilizing two hand features as input tokens, our EABlock utilizes two complementary types of novel tokens, SimToken and JoinToken, as input tokens. Our two novel tokens are from a combination of separated two hand features; hence, it is much more robust to the distant token problem. Using the two type of tokens, our EABlock effectively extracts interaction feature and adapts it to each hand. The proposed EANet achieves the state-of-the-art performance on 3D interacting hands benchmarks. The codes are available at https://github.com/jkpark0825/EANet. △ Less

Submitted 5 September, 2023; originally announced September 2023.

Comments: Accepted at ICCVW 2023

arXiv:2308.06554 [pdf, other]

Cyclic Test-Time Adaptation on Monocular Video for 3D Human Mesh Reconstruction

Authors: Hyeong** Nam, Daniel Sungho Jung, Yeonguk Oh, Kyoung Mu Lee

Abstract: Despite recent advances in 3D human mesh reconstruction, domain gap between training and test data is still a major challenge. Several prior works tackle the domain gap problem via test-time adaptation that fine-tunes a network relying on 2D evidence (e.g., 2D human keypoints) from test images. However, the high reliance on 2D evidence during adaptation causes two major issues. First, 2D evidence… ▽ More Despite recent advances in 3D human mesh reconstruction, domain gap between training and test data is still a major challenge. Several prior works tackle the domain gap problem via test-time adaptation that fine-tunes a network relying on 2D evidence (e.g., 2D human keypoints) from test images. However, the high reliance on 2D evidence during adaptation causes two major issues. First, 2D evidence induces depth ambiguity, preventing the learning of accurate 3D human geometry. Second, 2D evidence is noisy or partially non-existent during test time, and such imperfect 2D evidence leads to erroneous adaptation. To overcome the above issues, we introduce CycleAdapt, which cyclically adapts two networks: a human mesh reconstruction network (HMRNet) and a human motion denoising network (MDNet), given a test video. In our framework, to alleviate high reliance on 2D evidence, we fully supervise HMRNet with generated 3D supervision targets by MDNet. Our cyclic adaptation scheme progressively elaborates the 3D supervision targets, which compensate for imperfect 2D evidence. As a result, our CycleAdapt achieves state-of-the-art performance compared to previous test-time adaptation methods. The codes are available at https://github.com/hygenie1228/CycleAdapt_RELEASE. △ Less

Submitted 12 August, 2023; originally announced August 2023.

Comments: Published at ICCV 2023, 16 pages including the supplementary material

arXiv:2307.14856 [pdf, other]

Exploiting the Potential of Seq2Seq Models as Robust Few-Shot Learners

Authors: Jihyeon Lee, Dain Kim, Doohae Jung, Boseop Kim, Kyoung-Woon On

Abstract: In-context learning, which offers substantial advantages over fine-tuning, is predominantly observed in decoder-only models, while encoder-decoder (i.e., seq2seq) models excel in methods that rely on weight updates. Recently, a few studies have demonstrated the feasibility of few-shot learning with seq2seq models; however, this has been limited to tasks that align well with the seq2seq architectur… ▽ More In-context learning, which offers substantial advantages over fine-tuning, is predominantly observed in decoder-only models, while encoder-decoder (i.e., seq2seq) models excel in methods that rely on weight updates. Recently, a few studies have demonstrated the feasibility of few-shot learning with seq2seq models; however, this has been limited to tasks that align well with the seq2seq architecture, such as summarization and translation. Inspired by these initial studies, we provide a first-ever extensive experiment comparing the in-context few-shot learning capabilities of decoder-only and encoder-decoder models on a broad range of tasks. Furthermore, we propose two methods to more effectively elicit in-context learning ability in seq2seq models: objective-aligned prompting and a fusion-based approach. Remarkably, our approach outperforms a decoder-only model that is six times larger and exhibits significant performance improvements compared to conventional seq2seq models across a variety of settings. We posit that, with the right configuration and prompt design, seq2seq models can be highly effective few-shot learners for a wide spectrum of applications. △ Less

Submitted 27 July, 2023; originally announced July 2023.

arXiv:2306.14470 [pdf, other]

Knowledge Graph-Augmented Korean Generative Commonsense Reasoning

Authors: Dahyun Jung, Jaehyung Seo, Jaewook Lee, Chanjun Park, Heuiseok Lim

Abstract: Generative commonsense reasoning refers to the task of generating acceptable and logical assumptions about everyday situations based on commonsense understanding. By utilizing an existing dataset such as Korean CommonGen, language generation models can learn commonsense reasoning specific to the Korean language. However, language models often fail to consider the relationships between concepts and… ▽ More Generative commonsense reasoning refers to the task of generating acceptable and logical assumptions about everyday situations based on commonsense understanding. By utilizing an existing dataset such as Korean CommonGen, language generation models can learn commonsense reasoning specific to the Korean language. However, language models often fail to consider the relationships between concepts and the deep knowledge inherent to concepts. To address these limitations, we propose a method to utilize the Korean knowledge graph data for text generation. Our experimental result shows that the proposed method can enhance the efficiency of Korean commonsense inference, thereby underlining the significance of employing supplementary data. △ Less

Submitted 26 June, 2023; originally announced June 2023.

Comments: Accepted for Data-centric Machine Learning Research (DMLR) Workshop at ICML 2023

arXiv:2306.05067 [pdf, other]

Improving Visual Prompt Tuning for Self-supervised Vision Transformers

Authors: Seungryong Yoo, Eunji Kim, Dahuin Jung, Jungbeom Lee, Sungroh Yoon

Abstract: Visual Prompt Tuning (VPT) is an effective tuning method for adapting pretrained Vision Transformers (ViTs) to downstream tasks. It leverages extra learnable tokens, known as prompts, which steer the frozen pretrained ViTs. Although VPT has demonstrated its applicability with supervised vision transformers, it often underperforms with self-supervised ones. Through empirical observations, we deduce… ▽ More Visual Prompt Tuning (VPT) is an effective tuning method for adapting pretrained Vision Transformers (ViTs) to downstream tasks. It leverages extra learnable tokens, known as prompts, which steer the frozen pretrained ViTs. Although VPT has demonstrated its applicability with supervised vision transformers, it often underperforms with self-supervised ones. Through empirical observations, we deduce that the effectiveness of VPT hinges largely on the ViT blocks with which the prompt tokens interact. Specifically, VPT shows improved performance on image classification tasks for MAE and MoCo v3 when the prompt tokens are inserted into later blocks rather than the first block. These observations suggest that there exists an optimal location of blocks for the insertion of prompt tokens. Unfortunately, identifying the optimal blocks for prompts within each self-supervised ViT for diverse future scenarios is a costly process. To mitigate this problem, we propose a simple yet effective method that learns a gate for each ViT block to adjust its intervention into the prompt tokens. With our method, prompt tokens are selectively influenced by blocks that require steering for task adaptation. Our method outperforms VPT variants in FGVC and VTAB image classification and ADE20K semantic segmentation. The code is available at https://github.com/ryongithub/GatedPromptTuning. △ Less

Submitted 8 June, 2023; originally announced June 2023.

Comments: International Conference on Machine Learning (ICML) 2023

arXiv:2306.01574 [pdf, other]

Probabilistic Concept Bottleneck Models

Authors: Eunji Kim, Dahuin Jung, Sangha Park, Siwon Kim, Sungroh Yoon

Abstract: Interpretable models are designed to make decisions in a human-interpretable manner. Representatively, Concept Bottleneck Models (CBM) follow a two-step process of concept prediction and class prediction based on the predicted concepts. CBM provides explanations with high-level concepts derived from concept predictions; thus, reliable concept predictions are important for trustworthiness. In this… ▽ More Interpretable models are designed to make decisions in a human-interpretable manner. Representatively, Concept Bottleneck Models (CBM) follow a two-step process of concept prediction and class prediction based on the predicted concepts. CBM provides explanations with high-level concepts derived from concept predictions; thus, reliable concept predictions are important for trustworthiness. In this study, we address the ambiguity issue that can harm reliability. While the existence of a concept can often be ambiguous in the data, CBM predicts concepts deterministically without considering this ambiguity. To provide a reliable interpretation against this ambiguity, we propose Probabilistic Concept Bottleneck Models (ProbCBM). By leveraging probabilistic concept embeddings, ProbCBM models uncertainty in concept prediction and provides explanations based on the concept and its corresponding uncertainty. This uncertainty enhances the reliability of the explanations. Furthermore, as class uncertainty is derived from concept uncertainty in ProbCBM, we can explain class uncertainty by means of concept uncertainty. Code is publicly available at https://github.com/ejkim47/prob-cbm. △ Less

Submitted 2 June, 2023; originally announced June 2023.

Comments: International Conference on Machine Learning (ICML) 2023

arXiv:2305.18726 [pdf, other]

Diffusion-Stego: Training-free Diffusion Generative Steganography via Message Projection

Authors: Daegyu Kim, Chaehun Shin, Jooyoung Choi, Dahuin Jung, Sungroh Yoon

Abstract: Generative steganography is the process of hiding secret messages in generated images instead of cover images. Existing studies on generative steganography use GAN or Flow models to obtain high hiding message capacity and anti-detection ability over cover images. However, they create relatively unrealistic stego images because of the inherent limitations of generative models. We propose Diffusion-… ▽ More Generative steganography is the process of hiding secret messages in generated images instead of cover images. Existing studies on generative steganography use GAN or Flow models to obtain high hiding message capacity and anti-detection ability over cover images. However, they create relatively unrealistic stego images because of the inherent limitations of generative models. We propose Diffusion-Stego, a generative steganography approach based on diffusion models which outperform other generative models in image generation. Diffusion-Stego projects secret messages into latent noise of diffusion models and generates stego images with an iterative denoising process. Since the naive hiding of secret messages into noise boosts visual degradation and decreases extracted message accuracy, we introduce message projection, which hides messages into noise space while addressing these issues. We suggest three options for message projection to adjust the trade-off between extracted message accuracy, anti-detection ability, and image quality. Diffusion-Stego is a training-free approach, so we can apply it to pre-trained diffusion models which generate high-quality images, or even large-scale text-to-image models, such as Stable diffusion. Diffusion-Stego achieved a high capacity of messages (3.0 bpp of binary messages with 98% accuracy, and 6.0 bpp with 90% accuracy) as well as high quality (with a FID score of 2.77 for 1.0 bpp on the FFHQ 64$\times$64 dataset) that makes it challenging to distinguish from real images in the PNG format. △ Less

Submitted 30 May, 2023; originally announced May 2023.

arXiv:2305.04670 [pdf, other]

Analysis of Numerical Integration in RNN-Based Residuals for Fault Diagnosis of Dynamic Systems

Authors: Arman Mohammadi, Theodor Westny, Daniel Jung, Mattias Krysander

Abstract: Data-driven modeling and machine learning are widely used to model the behavior of dynamic systems. One application is the residual evaluation of technical systems where model predictions are compared with measurement data to create residuals for fault diagnosis applications. While recurrent neural network models have been shown capable of modeling complex non-linear dynamic systems, they are limi… ▽ More Data-driven modeling and machine learning are widely used to model the behavior of dynamic systems. One application is the residual evaluation of technical systems where model predictions are compared with measurement data to create residuals for fault diagnosis applications. While recurrent neural network models have been shown capable of modeling complex non-linear dynamic systems, they are limited to fixed steps discrete-time simulation. Modeling using neural ordinary differential equations, however, make it possible to evaluate the state variables at specific times, compute gradients when training the model and use standard numerical solvers to explicitly model the underlying dynamic of the time-series data. Here, the effect of solver selection on the performance of neural ordinary differential equation residuals during training and evaluation is investigated. The paper includes a case study of a heavy-duty truck's after-treatment system to highlight the potential of these techniques for improving fault diagnosis performance. △ Less

Submitted 8 May, 2023; originally announced May 2023.

arXiv:2305.01955 [pdf, other]

Satellite Clusters Flying in Formation: Orbital Configuration-Dependent Performance Analyses

Authors: Dong-Hyun Jung, Joon-Gyu Ryu, Junil Choi

Abstract: This paper considers a downlink satellite communication system where a satellite cluster, i.e., a satellite swarm consisting of one leader and multiple follower satellites, serves a ground terminal. The satellites in the cluster form either a linear or circular formation moving in a group and cooperatively send their signals by maximum ratio transmission precoding. We first conduct a coordinate tr… ▽ More This paper considers a downlink satellite communication system where a satellite cluster, i.e., a satellite swarm consisting of one leader and multiple follower satellites, serves a ground terminal. The satellites in the cluster form either a linear or circular formation moving in a group and cooperatively send their signals by maximum ratio transmission precoding. We first conduct a coordinate transformation to effectively capture the relative positions of satellites in the cluster. Next, we derive an exact expression for the orbital configuration-dependent outage probability under the Nakagami fading by using the distribution of the sum of independent Gamma random variables. In addition, we obtain a simpler approximated expression for the outage probability with the help of second-order moment-matching. We also analyze asymptotic behavior in the high signal-to-noise ratio regime and the diversity order of the outage performance. Finally, we verify the analytical results through Monte Carlo simulations. Our analytical results provide the performance of satellite cluster-based communication systems based on specific orbital configurations, which can be used to design reliable satellite clusters in terms of cluster size, formation, and orbits. △ Less

Submitted 3 May, 2023; originally announced May 2023.

Comments: 5 pages, 5 figures, submitted to IEEE Transactions on Vehicular Technology

arXiv:2304.04997 [pdf, other]

Relational Context Learning for Human-Object Interaction Detection

Authors: Sanghyun Kim, Deunsol Jung, Minsu Cho

Abstract: Recent state-of-the-art methods for HOI detection typically build on transformer architectures with two decoder branches, one for human-object pair detection and the other for interaction classification. Such disentangled transformers, however, may suffer from insufficient context exchange between the branches and lead to a lack of context information for relational reasoning, which is critical in… ▽ More Recent state-of-the-art methods for HOI detection typically build on transformer architectures with two decoder branches, one for human-object pair detection and the other for interaction classification. Such disentangled transformers, however, may suffer from insufficient context exchange between the branches and lead to a lack of context information for relational reasoning, which is critical in discovering HOI instances. In this work, we propose the multiplex relation network (MUREN) that performs rich context exchange between three decoder branches using unary, pairwise, and ternary relations of human, object, and interaction tokens. The proposed method learns comprehensive relational contexts for discovering HOI instances, achieving state-of-the-art performance on two standard benchmarks for HOI detection, HICO-DET and V-COCO. △ Less

Submitted 11 April, 2023; originally announced April 2023.

Comments: accepted to CVPR2023

arXiv:2304.03495 [pdf, other]

Devil's on the Edges: Selective Quad Attention for Scene Graph Generation

Authors: Deunsol Jung, Sanghyun Kim, Won Hwa Kim, Minsu Cho

Abstract: Scene graph generation aims to construct a semantic graph structure from an image such that its nodes and edges respectively represent objects and their relationships. One of the major challenges for the task lies in the presence of distracting objects and relationships in images; contextual reasoning is strongly distracted by irrelevant objects or backgrounds and, more importantly, a vast number… ▽ More Scene graph generation aims to construct a semantic graph structure from an image such that its nodes and edges respectively represent objects and their relationships. One of the major challenges for the task lies in the presence of distracting objects and relationships in images; contextual reasoning is strongly distracted by irrelevant objects or backgrounds and, more importantly, a vast number of irrelevant candidate relations. To tackle the issue, we propose the Selective Quad Attention Network (SQUAT) that learns to select relevant object pairs and disambiguate them via diverse contextual interactions. SQUAT consists of two main components: edge selection and quad attention. The edge selection module selects relevant object pairs, i.e., edges in the scene graph, which helps contextual reasoning, and the quad attention module then updates the edge features using both edge-to-node and edge-to-edge cross-attentions to capture contextual information between objects and object pairs. Experiments demonstrate the strong performance and robustness of SQUAT, achieving the state of the art on the Visual Genome and Open Images v6 benchmarks. △ Less

Submitted 7 April, 2023; originally announced April 2023.

Comments: Accepted at CVPR 2023; Project page at https://cvlab.postech.ac.kr/research/SQUAT/

arXiv:2303.15060 [pdf, other]

TMO: Textured Mesh Acquisition of Objects with a Mobile Device by using Differentiable Rendering

Authors: Jaehoon Choi, Dongki Jung, Taejae Lee, Sangwook Kim, Youngdong Jung, Dinesh Manocha, Donghwan Lee

Abstract: We present a new pipeline for acquiring a textured mesh in the wild with a single smartphone which offers access to images, depth maps, and valid poses. Our method first introduces an RGBD-aided structure from motion, which can yield filtered depth maps and refines camera poses guided by corresponding depth. Then, we adopt the neural implicit surface reconstruction method, which allows for high-qu… ▽ More We present a new pipeline for acquiring a textured mesh in the wild with a single smartphone which offers access to images, depth maps, and valid poses. Our method first introduces an RGBD-aided structure from motion, which can yield filtered depth maps and refines camera poses guided by corresponding depth. Then, we adopt the neural implicit surface reconstruction method, which allows for high-quality mesh and develops a new training process for applying a regularization provided by classical multi-view stereo methods. Moreover, we apply a differentiable rendering to fine-tune incomplete texture maps and generate textures which are perceptually closer to the original scene. Our pipeline can be applied to any common objects in the real world without the need for either in-the-lab environments or accurate mask images. We demonstrate results of captured objects with complex shapes and validate our method numerically against existing 3D reconstruction and texture map** methods. △ Less

Submitted 27 March, 2023; originally announced March 2023.

Comments: Accepted to CVPR23. Project Page: https://jh-choi.github.io/TMO/

arXiv:2303.11853 [pdf, other]

LoRCoN-LO: Long-term Recurrent Convolutional Network-based LiDAR Odometry

Authors: Donghwi Jung, Jae-Kyung Cho, Younghwa Jung, Soohyun Shin, Seong-Woo Kim

Abstract: We propose a deep learning-based LiDAR odometry estimation method called LoRCoN-LO that utilizes the long-term recurrent convolutional network (LRCN) structure. The LRCN layer is a structure that can process spatial and temporal information at once by using both CNN and LSTM layers. This feature is suitable for predicting continuous robot movements as it uses point clouds that contain spatial info… ▽ More We propose a deep learning-based LiDAR odometry estimation method called LoRCoN-LO that utilizes the long-term recurrent convolutional network (LRCN) structure. The LRCN layer is a structure that can process spatial and temporal information at once by using both CNN and LSTM layers. This feature is suitable for predicting continuous robot movements as it uses point clouds that contain spatial information. Therefore, we built a LoRCoN-LO model using the LRCN layer, and predicted the pose of the robot through this model. For performance verification, we conducted experiments exploiting a public dataset (KITTI). The results of the experiment show that LoRCoN-LO displays accurate odometry prediction in the dataset. The code is available at https://github.com/donghwijung/LoRCoN-LO. △ Less

Submitted 21 March, 2023; originally announced March 2023.

Comments: 4 pages, ICEIC 2023

arXiv:2303.07846 [pdf, other]

Sample-efficient Adversarial Imitation Learning

Authors: Dahuin Jung, Hyungyu Lee, Sungroh Yoon

Abstract: Imitation learning, in which learning is performed by demonstration, has been studied and advanced for sequential decision-making tasks in which a reward function is not predefined. However, imitation learning methods still require numerous expert demonstration samples to successfully imitate an expert's behavior. To improve sample efficiency, we utilize self-supervised representation learning, wh… ▽ More Imitation learning, in which learning is performed by demonstration, has been studied and advanced for sequential decision-making tasks in which a reward function is not predefined. However, imitation learning methods still require numerous expert demonstration samples to successfully imitate an expert's behavior. To improve sample efficiency, we utilize self-supervised representation learning, which can generate vast training signals from the given data. In this study, we propose a self-supervised representation-based adversarial imitation learning method to learn state and action representations that are robust to diverse distortions and temporally predictive, on non-image control tasks. In particular, in comparison with existing self-supervised learning methods for tabular data, we propose a different corruption method for state and action representations that is robust to diverse distortions. We theoretically and empirically observe that making an informative feature manifold with less sample complexity significantly improves the performance of imitation learning. The proposed method shows a 39% relative improvement over existing adversarial imitation learning methods on MuJoCo in a setting limited to 100 expert state-action pairs. Moreover, we conduct comprehensive ablations and additional experiments using demonstrations with varying optimality to provide insights into a range of factors. △ Less

Submitted 23 January, 2024; v1 submitted 14 March, 2023; originally announced March 2023.

Comments: Published at JMLR (Journal of Machine Learning Research), A preliminary version of this manuscript was presented at Deep RL Workshop, NeurIPS 2022

arXiv:2302.08741 [pdf, other]

New Insights for the Stability-Plasticity Dilemma in Online Continual Learning

Authors: Dahuin Jung, Dong** Lee, Sunwon Hong, Hyemi Jang, Ho Bae, Sungroh Yoon

Abstract: The aim of continual learning is to learn new tasks continuously (i.e., plasticity) without forgetting previously learned knowledge from old tasks (i.e., stability). In the scenario of online continual learning, wherein data comes strictly in a streaming manner, the plasticity of online continual learning is more vulnerable than offline continual learning because the training signal that can be ob… ▽ More The aim of continual learning is to learn new tasks continuously (i.e., plasticity) without forgetting previously learned knowledge from old tasks (i.e., stability). In the scenario of online continual learning, wherein data comes strictly in a streaming manner, the plasticity of online continual learning is more vulnerable than offline continual learning because the training signal that can be obtained from a single data point is limited. To overcome the stability-plasticity dilemma in online continual learning, we propose an online continual learning framework named multi-scale feature adaptation network (MuFAN) that utilizes a richer context encoding extracted from different levels of a pre-trained network. Additionally, we introduce a novel structure-wise distillation loss and replace the commonly used batch normalization layer with a newly proposed stability-plasticity normalization module to train MuFAN that simultaneously maintains high plasticity and stability. MuFAN outperforms other state-of-the-art continual learning methods on the SVHN, CIFAR100, miniImageNet, and CORe50 datasets. Extensive experiments and ablation studies validate the significance and scalability of each proposed component: 1) multi-scale feature maps from a pre-trained encoder, 2) the structure-wise distillation loss, and 3) the stability-plasticity normalization module in MuFAN. Code is publicly available at https://github.com/whitesnowdrop/MuFAN. △ Less

Submitted 17 February, 2023; originally announced February 2023.

Comments: Accepted to ICLR2023

arXiv:2301.08386 [pdf, other]

doi 10.1109/MVT.2023.3262360

Satellite Clustering for Non-Terrestrial Networks: Concept, Architectures, and Applications

Authors: Dong-Hyun Jung, Gyeongrae Im, Joon-Gyu Ryu, Seungkeun Park, Heejung Yu, Junil Choi

Abstract: Recently, mega-constellations with a massive number of low Earth orbit (LEO) satellites are being considered as a possible solution for providing global coverage due to relatively low latency and high throughput compared to geosynchronous orbit satellites. However, as the number of satellites and operators participating in the LEO constellation increases, inter-satellite interference will become m… ▽ More Recently, mega-constellations with a massive number of low Earth orbit (LEO) satellites are being considered as a possible solution for providing global coverage due to relatively low latency and high throughput compared to geosynchronous orbit satellites. However, as the number of satellites and operators participating in the LEO constellation increases, inter-satellite interference will become more severe, which may yield marginal improvement or even decrement in network throughput. In this article, we introduce the concept of satellite clusters that can enhance network performance through satellites' cooperative transmissions. The characteristics, formation types, and transmission schemes for the satellite clusters are highlighted. Simulation results evaluate the impact of clustering from coverage and capacity perspectives, showing that when the number of satellites is large, the performance of clustered networks outperforms the unclustered ones. The viable network architectures of the satellite cluster are proposed based on the 3GPP standard. Finally, the future applications of clustered satellite networks are discussed. △ Less

Submitted 19 January, 2023; originally announced January 2023.

Comments: 7 pages, 7 figures, 1 table, submitted to IEEE Vehicular Technology Magazine

Journal ref: IEEE Vehicular Technology Magazine, vol. 18, no. 3, pp. 29-37, Sep. 2023

arXiv:2211.07116 [pdf, other]

Few-shot Metric Learning: Online Adaptation of Embedding for Retrieval

Authors: Deunsol Jung, Dahyun Kang, Suha Kwak, Minsu Cho

Abstract: Metric learning aims to build a distance metric typically by learning an effective embedding function that maps similar objects into nearby points in its embedding space. Despite recent advances in deep metric learning, it remains challenging for the learned metric to generalize to unseen classes with a substantial domain gap. To tackle the issue, we explore a new problem of few-shot metric learni… ▽ More Metric learning aims to build a distance metric typically by learning an effective embedding function that maps similar objects into nearby points in its embedding space. Despite recent advances in deep metric learning, it remains challenging for the learned metric to generalize to unseen classes with a substantial domain gap. To tackle the issue, we explore a new problem of few-shot metric learning that aims to adapt the embedding function to the target domain with only a few annotated data. We introduce three few-shot metric learning baselines and propose the Channel-Rectifier Meta-Learning (CRML), which effectively adapts the metric space online by adjusting channels of intermediate layers. Experimental analyses on miniImageNet, CUB-200-2011, MPII, as well as a new dataset, miniDeepFashion, demonstrate that our method consistently improves the learned metric by adapting it to target classes and achieves a greater gain in image retrieval when the domain gap from the source classes is larger. △ Less

Submitted 14 November, 2022; originally announced November 2022.

Comments: Accepted at ACCV 2022

arXiv:2211.06390 [pdf, other]

The BlackParrot BedRock Cache Coherence System

Authors: Mark Wyse, Daniel Petrisko, Farzam Gilani, Yuan-Mao Chueh, Paul Gao, Dai Cheol Jung, Sripathi Muralitharan, Shashank Vijaya Ranga, Mark Oskin, Michael Taylor

Abstract: This paper presents BP-BedRock, the open-source cache coherence protocol and system implemented within the BlackParrot 64-bit RISC-V multicore processor. BP-BedRock implements the BedRock directory-based MOESIF cache coherence protocol and includes two different open-source coherence protocol engines, one FSM-based and the other microcode programmable. Both coherence engines support coherent uncac… ▽ More This paper presents BP-BedRock, the open-source cache coherence protocol and system implemented within the BlackParrot 64-bit RISC-V multicore processor. BP-BedRock implements the BedRock directory-based MOESIF cache coherence protocol and includes two different open-source coherence protocol engines, one FSM-based and the other microcode programmable. Both coherence engines support coherent uncacheable access to cacheable memory and L1-based atomic read-modify-write operations. Fitted within the BlackParrot multicore, BP-BedRock has been silicon validated in a GlobalFoundries 12nm FinFET process and FPGA validated with both coherence engines in 8-core configurations, booting Linux and running off the shelf benchmarks. After describing BP-BedRock and the design of the two coherence engines, we study their performance by analyzing processing occupancy and running the Splash-3 benchmarks on the 8-core FPGA implementations. Careful design and coherence-specific ISA extensions enable the programmable controller to achieve performance within 1% of the fixed-function FSM controller on average (2.3% worst-case) as demonstrated on our FPGA test system. Analysis shows that the programmable coherence engine increases die area by only 4% in an ASIC process and increases logic utilization by only 6.3% on FPGA with one additional block RAM added per core. △ Less

Submitted 11 November, 2022; originally announced November 2022.

arXiv:2210.14226 [pdf, other]

FedClassAvg: Local Representation Learning for Personalized Federated Learning on Heterogeneous Neural Networks

Authors: Jaehee Jang, Heonseok Ha, Dahuin Jung, Sungroh Yoon

Abstract: Personalized federated learning is aimed at allowing numerous clients to train personalized models while participating in collaborative training in a communication-efficient manner without exchanging private data. However, many personalized federated learning algorithms assume that clients have the same neural network architecture, and those for heterogeneous models remain understudied. In this st… ▽ More Personalized federated learning is aimed at allowing numerous clients to train personalized models while participating in collaborative training in a communication-efficient manner without exchanging private data. However, many personalized federated learning algorithms assume that clients have the same neural network architecture, and those for heterogeneous models remain understudied. In this study, we propose a novel personalized federated learning method called federated classifier averaging (FedClassAvg). Deep neural networks for supervised learning tasks consist of feature extractor and classifier layers. FedClassAvg aggregates classifier weights as an agreement on decision boundaries on feature spaces so that clients with not independently and identically distributed (non-iid) data can learn about scarce labels. In addition, local feature representation learning is applied to stabilize the decision boundaries and improve the local feature extraction capabilities for clients. While the existing methods require the collection of auxiliary data or model weights to generate a counterpart, FedClassAvg only requires clients to communicate with a couple of fully connected layers, which is highly communication-efficient. Moreover, FedClassAvg does not require extra optimization problems such as knowledge transfer, which requires intensive computation overhead. We evaluated FedClassAvg through extensive experiments and demonstrated it outperforms the current state-of-the-art algorithms on heterogeneous personalized federated learning tasks. △ Less

Submitted 26 October, 2022; v1 submitted 25 October, 2022; originally announced October 2022.

Comments: Accepted to ICPP 2022. Code: https://github.com/hukla/fedclassavg

arXiv:2206.12006 [pdf, ps, other]

doi 10.1109/TIFS.2022.3188150

When Satellites Work as Eavesdroppers

Authors: Dong-Hyun Jung, Joon-Gyu Ryu, Junil Choi

Abstract: This paper considers satellite eavesdroppers in uplink satellite communication systems where the eavesdroppers are randomly distributed at arbitrary altitudes according to homogeneous binomial point processes and attempt to overhear signals that a ground terminal transmits to a serving satellite. Non-colluding eavesdrop** satellites are assumed, i.e., they do not cooperate with each other, so th… ▽ More This paper considers satellite eavesdroppers in uplink satellite communication systems where the eavesdroppers are randomly distributed at arbitrary altitudes according to homogeneous binomial point processes and attempt to overhear signals that a ground terminal transmits to a serving satellite. Non-colluding eavesdrop** satellites are assumed, i.e., they do not cooperate with each other, so that their received signals are not combined but are decoded individually. Directional beamforming with two types of antennas: fixed- and steerable-beam antennas, is adopted at the eavesdrop** satellites. The possible distribution cases for the eavesdrop** satellites and the distributions of the distances between the terminal and the satellites are analyzed. The distributions of the signal-to-noise ratios (SNRs) at both the serving satellite and the most detrimental eavesdrop** satellite are derived as closed-form expressions. The ergodic and outage secrecy capacities of the systems are derived with the secrecy outage probability using the SNR distributions. Simpler approximate expressions for the secrecy performance are obtained based on the Poisson limit theorem, and asymptotic analyses are also carried out in the high-SNR regime. Monte-Carlo simulations verify the analytical results for the secrecy performance. The analytical results are expected to be used to evaluate the secrecy performance and design secure satellite constellations by considering the impact of potential threats from malicious satellite eavesdroppers. △ Less

Submitted 23 June, 2022; originally announced June 2022.

Comments: 16 pages, 11 figures, 1 table, accepted by IEEE Transactions on Information Forensics and Security

Journal ref: IEEE Transactions on Information Forensics and Security, vol. 17, pp. 2784-2799, July 2022

arXiv:2206.07567 [pdf, other]

A Unifying Approach to Efficient (Near)-Gathering of Disoriented Robots with Limited Visibility

Authors: Jannik Castenow, Jonas Harbig, Daniel Jung, Peter Kling, Till Knollmann, Friedhelm Meyer auf der Heide

Abstract: We consider a swarm of $n$ robots in \mathbb{R}^d. The robots are oblivious, disoriented (no common coordinate system/compass), and have limited visibility (observe other robots up to a constant distance). The basic formation task gathering requires that all robots reach the same, not predefined position. In the related near-gathering task, they must reach distinct positions such that every robot… ▽ More We consider a swarm of $n$ robots in \mathbb{R}^d. The robots are oblivious, disoriented (no common coordinate system/compass), and have limited visibility (observe other robots up to a constant distance). The basic formation task gathering requires that all robots reach the same, not predefined position. In the related near-gathering task, they must reach distinct positions such that every robot sees the entire swarm. In the considered setting, gathering can be solved in $\mathcal{O}(n + Δ^2)$ synchronous rounds both in two and three dimensions, where $Δ$ denotes the initial maximal distance of two robots. In this work, we formalize a key property of efficient gathering protocols and use it to define $λ$-contracting protocols. Any such protocol gathers $n$ robots in the $d$-dimensional space in $\mathcal{O}(Δ^2)$ synchronous rounds. Moreover, we prove a corresponding lower bound stating that any protocol in which robots move to target points inside of the local convex hulls of their neighborhoods -- $λ$-contracting protocols have this property -- requires $Ω(Δ^2)$ rounds to gather all robots. Among others, we prove that the $d$-dimensional generalization of the GtC-protocol is $λ$-contracting. Remarkably, our improved and generalized runtime bound is independent of $n$ and $d$. The independence of $d$ answers an open research question. We also introduce an approach to make any $λ$-contracting protocol collisionfree to solve near-gathering. The resulting protocols maintain the runtime of $Θ(Δ^2)$ and work even in the semi-synchronous model. △ Less

Submitted 9 September, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

arXiv:2206.06640 [pdf, other]

Confidence Score for Source-Free Unsupervised Domain Adaptation

Authors: Jonghyun Lee, Dahuin Jung, Junho Yim, Sungroh Yoon

Abstract: Source-free unsupervised domain adaptation (SFUDA) aims to obtain high performance in the unlabeled target domain using the pre-trained source model, not the source data. Existing SFUDA methods assign the same importance to all target samples, which is vulnerable to incorrect pseudo-labels. To differentiate between sample importance, in this study, we propose a novel sample-wise confidence score,… ▽ More Source-free unsupervised domain adaptation (SFUDA) aims to obtain high performance in the unlabeled target domain using the pre-trained source model, not the source data. Existing SFUDA methods assign the same importance to all target samples, which is vulnerable to incorrect pseudo-labels. To differentiate between sample importance, in this study, we propose a novel sample-wise confidence score, the Joint Model-Data Structure (JMDS) score for SFUDA. Unlike existing confidence scores that use only one of the source or target domain knowledge, the JMDS score uses both knowledge. We then propose a Confidence score Weighting Adaptation using the JMDS (CoWA-JMDS) framework for SFUDA. CoWA-JMDS consists of the JMDS scores as sample weights and weight Mixup that is our proposed variant of Mixup. Weight Mixup promotes the model make more use of the target domain knowledge. The experimental results show that the JMDS score outperforms the existing confidence scores. Moreover, CoWA-JMDS achieves state-of-the-art performance on various SFUDA scenarios: closed, open, and partial-set scenarios. △ Less

Submitted 14 June, 2022; originally announced June 2022.

Comments: ICML 2022 camera ready

arXiv:2203.10996 [pdf, other]

Technologies for AI-Driven Fashion Social Networking Service with E-Commerce

Authors: **seok Seol, Seongjae Kim, Sungchan Park, Holim Lim, Hyunsoo Na, Eunyoung Park, Dohee Jung, Soyoung Park, Kangwoo Lee, Sang-goo Lee

Abstract: The rapid growth of the online fashion market brought demands for innovative fashion services and commerce platforms. With the recent success of deep learning, many applications employ AI technologies such as visual search and recommender systems to provide novel and beneficial services. In this paper, we describe applied technologies for AI-driven fashion social networking service that incorporat… ▽ More The rapid growth of the online fashion market brought demands for innovative fashion services and commerce platforms. With the recent success of deep learning, many applications employ AI technologies such as visual search and recommender systems to provide novel and beneficial services. In this paper, we describe applied technologies for AI-driven fashion social networking service that incorporate fashion e-commerce. In the application, people can share and browse their outfit-of-the-day (OOTD) photos, while AI analyzes them and suggests similar style OOTDs and related products. To this end, we trained deep learning based AI models for fashion and integrated them to build a fashion visual search system and a recommender system for OOTD. With aforementioned technologies, the AI-driven fashion SNS platform, iTOO, has been successfully launched. △ Less

Submitted 10 March, 2022; originally announced March 2022.

Comments: 16 pages, accepted in International Semantic Intelligence Conference (ISIC) 2022, The Applications and Deployment Track

arXiv:2203.05332 [pdf, other]

SelfTune: Metrically Scaled Monocular Depth Estimation through Self-Supervised Learning

Authors: Jaehoon Choi, Dongki Jung, Yonghan Lee, Deokhwa Kim, Dinesh Manocha, Donghwan Lee

Abstract: Monocular depth estimation in the wild inherently predicts depth up to an unknown scale. To resolve scale ambiguity issue, we present a learning algorithm that leverages monocular simultaneous localization and map** (SLAM) with proprioceptive sensors. Such monocular SLAM systems can provide metrically scaled camera poses. Given these metric poses and monocular sequences, we propose a self-superv… ▽ More Monocular depth estimation in the wild inherently predicts depth up to an unknown scale. To resolve scale ambiguity issue, we present a learning algorithm that leverages monocular simultaneous localization and map** (SLAM) with proprioceptive sensors. Such monocular SLAM systems can provide metrically scaled camera poses. Given these metric poses and monocular sequences, we propose a self-supervised learning method for the pre-trained supervised monocular depth networks to enable metrically scaled depth estimation. Our approach is based on a teacher-student formulation which guides our network to predict high-quality depths. We demonstrate that our approach is useful for various applications such as mobile robot navigation and is applicable to diverse environments. Our full system shows improvements over recent self-supervised depth estimation and completion methods on EuRoC, OpenLORIS, and ScanNet datasets. △ Less

Submitted 10 March, 2022; originally announced March 2022.

arXiv:2110.09286 [pdf, other]

Gait-based Human Identification through Minimum Gait-phases and Sensors

Authors: Muhammad Zeeshan Arshad, Dawoon Jung, Mina Park, Kyung-Ryoul Mun, **wook Kim

Abstract: Human identification is one of the most common and critical tasks for condition monitoring, human-machine interaction, and providing assistive services in smart environments. Recently, human gait has gained new attention as a biometric for identification to achieve contactless identification from a distance robust to physical appearances. However, an important aspect of gait identification through… ▽ More Human identification is one of the most common and critical tasks for condition monitoring, human-machine interaction, and providing assistive services in smart environments. Recently, human gait has gained new attention as a biometric for identification to achieve contactless identification from a distance robust to physical appearances. However, an important aspect of gait identification through wearables and image-based systems alike is accurate identification when limited information is available, for example, when only a fraction of the whole gait cycle or only a part of the subject body is visible. In this paper, we present a gait identification technique based on temporal and descriptive statistic parameters of different gait phases as the features and we investigate the performance of using only single gait phases for the identification task using a minimum number of sensors. It was shown that it is possible to achieve high accuracy of over 95.5 percent by monitoring a single phase of the whole gait cycle through only a single sensor. It was also shown that the proposed methodology could be used to achieve 100 percent identification accuracy when the whole gait cycle was monitored through pelvis and foot sensors combined. The ANN was found to be more robust to fewer data features compared to SVM and was concluded as the best machine algorithm for the purpose. △ Less

Submitted 14 October, 2021; originally announced October 2021.

Comments: Accepted in 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2021)

ACM Class: I.2

arXiv:2110.07821 [pdf, other]

Gait-based Frailty Assessment using Image Representation of IMU Signals and Deep CNN

Authors: Muhammad Zeeshan Arshad, Dawoon Jung, Mina Park, Hyungeun Shin, **wook Kim, Kyung-Ryoul Mun

Abstract: Frailty is a common and critical condition in elderly adults, which may lead to further deterioration of health. However, difficulties and complexities exist in traditional frailty assessments based on activity-related questionnaires. These can be overcome by monitoring the effects of frailty on the gait. In this paper, it is shown that by encoding gait signals as images, deep learning-based model… ▽ More Frailty is a common and critical condition in elderly adults, which may lead to further deterioration of health. However, difficulties and complexities exist in traditional frailty assessments based on activity-related questionnaires. These can be overcome by monitoring the effects of frailty on the gait. In this paper, it is shown that by encoding gait signals as images, deep learning-based models can be utilized for the classification of gait type. Two deep learning models (a) SS-CNN, based on single stride input images, and (b) MS-CNN, based on 3 consecutive strides were proposed. It was shown that MS-CNN performs best with an accuracy of 85.1\%, while SS-CNN achieved an accuracy of 77.3\%. This is because MS-CNN can observe more features corresponding to stride-to-stride variations which is one of the key symptoms of frailty. Gait signals were encoded as images using STFT, CWT, and GAF. While the MS-CNN model using GAF images achieved the best overall accuracy and precision, CWT has a slightly better recall. This study demonstrates how image encoded gait data can be used to exploit the full potential of deep learning CNN models for the assessment of frailty. △ Less

Submitted 14 October, 2021; originally announced October 2021.

Comments: Accepted in 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2021)

ACM Class: I.2

arXiv:2110.06383 [pdf, other]

Real-time Drift Detection on Time-series Data

Authors: Nandini Ramanan, Rasool Tahmasbi, Marjorie Sayer, Deokwoo Jung, Shalini Hemachandran, Claudionor Nunes Coelho Jr

Abstract: Practical machine learning applications involving time series data, such as firewall log analysis to proactively detect anomalous behavior, are concerned with real time analysis of streaming data. Consequently, we need to update the ML models as the statistical characteristics of such data may shift frequently with time. One alternative explored in the literature is to retrain models with updated… ▽ More Practical machine learning applications involving time series data, such as firewall log analysis to proactively detect anomalous behavior, are concerned with real time analysis of streaming data. Consequently, we need to update the ML models as the statistical characteristics of such data may shift frequently with time. One alternative explored in the literature is to retrain models with updated data whenever the models accuracy is observed to degrade. However, these methods rely on near real time availability of ground truth, which is rarely fulfilled. Further, in applications with seasonal data, temporal concept drift is confounded by seasonal variation. In this work, we propose an approach called Unsupervised Temporal Drift Detector or UTDD to flexibly account for seasonal variation, efficiently detect temporal concept drift in time series data in the absence of ground truth, and subsequently adapt our ML models to concept drift for better generalization. △ Less

Submitted 12 October, 2021; originally announced October 2021.

Comments: 5 pages, 5 figures

arXiv:2108.05615 [pdf, other]

DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes

Authors: Dongki Jung, Jaehoon Choi, Yonghan Lee, Deokhwa Kim, Changick Kim, Dinesh Manocha, Donghwan Lee

Abstract: We present a novel approach for estimating depth from a monocular camera as it moves through complex and crowded indoor environments, e.g., a department store or a metro station. Our approach predicts absolute scale depth maps over the entire scene consisting of a static background and multiple moving people, by training on dynamic scenes. Since it is difficult to collect dense depth maps from cro… ▽ More We present a novel approach for estimating depth from a monocular camera as it moves through complex and crowded indoor environments, e.g., a department store or a metro station. Our approach predicts absolute scale depth maps over the entire scene consisting of a static background and multiple moving people, by training on dynamic scenes. Since it is difficult to collect dense depth maps from crowded indoor environments, we design our training framework without requiring depths produced from depth sensing devices. Our network leverages RGB images and sparse depth maps generated from traditional 3D reconstruction methods to estimate dense depth maps. We use two constraints to handle depth for non-rigidly moving people without tracking their motion explicitly. We demonstrate that our approach offers consistent improvements over recent depth estimation methods on the NAVERLABS dataset, which includes complex and crowded scenes. △ Less

Submitted 12 August, 2021; originally announced August 2021.

arXiv:2108.02949 [pdf, other]

Auxiliary Class Based Multiple Choice Learning

Authors: Sihwan Kim, Dae Yon Jung, Taejang Park

Abstract: The merit of ensemble learning lies in having different outputs from many individual models on a single input, i.e., the diversity of the base models. The high quality of diversity can be achieved when each model is specialized to different subsets of the whole dataset. Moreover, when each model explicitly knows to which subsets it is specialized, more opportunities arise to improve diversity. In… ▽ More The merit of ensemble learning lies in having different outputs from many individual models on a single input, i.e., the diversity of the base models. The high quality of diversity can be achieved when each model is specialized to different subsets of the whole dataset. Moreover, when each model explicitly knows to which subsets it is specialized, more opportunities arise to improve diversity. In this paper, we propose an advanced ensemble method, called Auxiliary class based Multiple Choice Learning (AMCL), to ultimately specialize each model under the framework of multiple choice learning (MCL). The advancement of AMCL is originated from three novel techniques which control the framework from different directions: 1) the concept of auxiliary class to provide more distinct information through the labels, 2) the strategy, named memory-based assignment, to determine the association between the inputs and the models, and 3) the feature fusion module to achieve generalized features. To demonstrate the performance of our method compared to all variants of MCL methods, we conduct extensive experiments on the image classification and segmentation tasks. Overall, the performance of AMCL exceeds all others in most of the public datasets trained with various networks as members of the ensembles. △ Less

Submitted 7 December, 2021; v1 submitted 6 August, 2021; originally announced August 2021.

arXiv:2106.07473 [pdf, other]

Time Series Anomaly Detection with label-free Model Selection

Authors: Deokwoo Jung, Nandini Ramanan, Mehrnaz Amjadi, Sankeerth Rao Karingula, Jake Taylor, Claudionor Nunes Coelho Jr

Abstract: Anomaly detection for time-series data becomes an essential task for many data-driven applications fueled with an abundance of data and out-of-the-box machine-learning algorithms. In many real-world settings, develo** a reliable anomaly model is highly challenging due to insufficient anomaly labels and the prohibitively expensive cost of obtaining anomaly examples. It imposes a significant bottl… ▽ More Anomaly detection for time-series data becomes an essential task for many data-driven applications fueled with an abundance of data and out-of-the-box machine-learning algorithms. In many real-world settings, develo** a reliable anomaly model is highly challenging due to insufficient anomaly labels and the prohibitively expensive cost of obtaining anomaly examples. It imposes a significant bottleneck to evaluate model quality for model selection and parameter tuning reliably. As a result, many existing anomaly detection algorithms fail to show their promised performance after deployment. In this paper, we propose LaF-AD, a novel anomaly detection algorithm with label-free model selection for unlabeled times-series data. Our proposed algorithm performs a fully unsupervised ensemble learning across a large number of candidate parametric models. We develop a model variance metric that quantifies the sensitivity of anomaly probability with a bootstrap** method. Then it makes a collective decision for anomaly events by model learners using the model variance. Our algorithm is easily parallelizable, more robust for ill-conditioned and seasonal data, and highly scalable for a large number of anomaly models. We evaluate our algorithm against other state-of-the-art methods on a synthetic domain and a benchmark public data set. △ Less

Submitted 10 June, 2021; originally announced June 2021.

Comments: 11 pages, 1 Figure, 4 tables

arXiv:2106.05319 [pdf, other]

Stein Latent Optimization for Generative Adversarial Networks

Authors: Uiwon Hwang, Heeseung Kim, Dahuin Jung, Hyemi Jang, Hyungyu Lee, Sungroh Yoon

Abstract: Generative adversarial networks (GANs) with clustered latent spaces can perform conditional generation in a completely unsupervised manner. In the real world, the salient attributes of unlabeled data can be imbalanced. However, most of existing unsupervised conditional GANs cannot cluster attributes of these data in their latent spaces properly because they assume uniform distributions of the attr… ▽ More Generative adversarial networks (GANs) with clustered latent spaces can perform conditional generation in a completely unsupervised manner. In the real world, the salient attributes of unlabeled data can be imbalanced. However, most of existing unsupervised conditional GANs cannot cluster attributes of these data in their latent spaces properly because they assume uniform distributions of the attributes. To address this problem, we theoretically derive Stein latent optimization that provides reparameterizable gradient estimations of the latent distribution parameters assuming a Gaussian mixture prior in a continuous latent space. Structurally, we introduce an encoder network and novel unsupervised conditional contrastive loss to ensure that data generated from a single mixture component represent a single attribute. We confirm that the proposed method, named Stein Latent Optimization for GANs (SLOGAN), successfully learns balanced or imbalanced attributes and achieves state-of-the-art unsupervised conditional generation performance even in the absence of attribute information (e.g., the imbalance ratio). Moreover, we demonstrate that the attributes to be learned can be manipulated using a small amount of probe data. △ Less

Submitted 15 March, 2022; v1 submitted 9 June, 2021; originally announced June 2021.

Comments: ICLR 2022 camera ready

arXiv:2104.13010 [pdf, ps, other]

doi 10.1109/TCOMM.2022.3142290

Performance Analysis of Satellite Communication System Under the Shadowed-Rician Fading: A Stochastic Geometry Approach

Authors: Dong-Hyun Jung, Joon-Gyu Ryu, Woo-** Byun, Junil Choi

Abstract: In this paper, we consider downlink low Earth orbit (LEO) satellite communication systems where multiple LEO satellites are uniformly distributed over a sphere at a certain altitude according to a homogeneous binomial point process (BPP). Based on the characteristics of the BPP, we analyze the distance distributions and the distribution cases for the serving satellite. We analytically derive the e… ▽ More In this paper, we consider downlink low Earth orbit (LEO) satellite communication systems where multiple LEO satellites are uniformly distributed over a sphere at a certain altitude according to a homogeneous binomial point process (BPP). Based on the characteristics of the BPP, we analyze the distance distributions and the distribution cases for the serving satellite. We analytically derive the exact outage probability, and its approximated expression is obtained using the Poisson limit theorem. With these derived expressions, the system throughput maximization problem is formulated under the satellite-visibility and outage constraints. To solve this problem, we reformulate it with bounded feasible sets and propose an iterative algorithm to obtain near-optimal solutions. Simulation results perfectly match the derived exact expressions for the outage probability and system throughput. The analytical results of the approximated expressions are fairly close to those of the exact ones. It is also shown that the proposed algorithm for the throughput maximization is very close to the optimal performance obtained by a two-dimensional exhaustive search. △ Less

Submitted 23 June, 2022; v1 submitted 27 April, 2021; originally announced April 2021.

Comments: 15 pages, 14 figures, 1 table, accepted by IEEE Transactions on Communications

Journal ref: IEEE Transactions on Communications, vol. 70, no. 4, pp. 2707-2721, Apr. 2022

Showing 1–50 of 81 results for author: Jung, D