Search | arXiv e-print repository

Sensor Data Augmentation from Skeleton Pose Sequences for Improving Human Activity Recognition

Authors: Parham Zolfaghari, Vitor Fortes Rey, Lala Ray, Hyun Kim, Sungho Suh, Paul Lukowicz

Abstract: The proliferation of deep learning has significantly advanced various fields, yet Human Activity Recognition (HAR) has not fully capitalized on these developments, primarily due to the scarcity of labeled datasets. Despite the integration of advanced Inertial Measurement Units (IMUs) in ubiquitous wearable devices like smartwatches and fitness trackers, which offer self-labeled activity data from… ▽ More The proliferation of deep learning has significantly advanced various fields, yet Human Activity Recognition (HAR) has not fully capitalized on these developments, primarily due to the scarcity of labeled datasets. Despite the integration of advanced Inertial Measurement Units (IMUs) in ubiquitous wearable devices like smartwatches and fitness trackers, which offer self-labeled activity data from users, the volume of labeled data remains insufficient compared to domains where deep learning has achieved remarkable success. Addressing this gap, in this paper, we propose a novel approach to improve wearable sensor-based HAR by introducing a pose-to-sensor network model that generates sensor data directly from 3D skeleton pose sequences. our method simultaneously trains the pose-to-sensor network and a human activity classifier, optimizing both data reconstruction and activity recognition. Our contributions include the integration of simultaneous training, direct pose-to-sensor generation, and a comprehensive evaluation on the MM-Fit dataset. Experimental results demonstrate the superiority of our framework with significant performance improvements over baseline methods. △ Less

Submitted 25 April, 2024; originally announced June 2024.

Comments: Accepted in IEEE 6th International Conference on Activity and Behavior Computing (ABC 2024)

arXiv:2406.16716 [pdf, other]

One-Class Learning with Adaptive Centroid Shift for Audio Deepfake Detection

Authors: Hyun Myung Kim, Kangwook Jang, Hoirin Kim

Abstract: As speech synthesis systems continue to make remarkable advances in recent years, the importance of robust deepfake detection systems that perform well in unseen systems has grown. In this paper, we propose a novel adaptive centroid shift (ACS) method that updates the centroid representation by continually shifting as the weighted average of bonafide representations. Our approach uses only bonafid… ▽ More As speech synthesis systems continue to make remarkable advances in recent years, the importance of robust deepfake detection systems that perform well in unseen systems has grown. In this paper, we propose a novel adaptive centroid shift (ACS) method that updates the centroid representation by continually shifting as the weighted average of bonafide representations. Our approach uses only bonafide samples to define their centroid, which can yield a specialized centroid for one-class learning. Integrating our ACS with one-class learning gathers bonafide representations into a single cluster, forming well-separated embeddings robust to unseen spoofing attacks. Our proposed method achieves an equal error rate (EER) of 2.19% on the ASVspoof 2021 deepfake dataset, outperforming all existing systems. Furthermore, the t-SNE visualization illustrates that our method effectively maps the bonafide embeddings into a single cluster and successfully disentangles the bonafide and spoof classes. △ Less

Submitted 24 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2406.13935 [pdf, other]

CONMOD: Controllable Neural Frame-based Modulation Effects

Authors: Gyubin Lee, Hounsu Kim, Junwon Lee, Juhan Nam

Abstract: Deep learning models have seen widespread use in modelling LFO-driven audio effects, such as phaser and flanger. Although existing neural architectures exhibit high-quality emulation of individual effects, they do not possess the capability to manipulate the output via control parameters. To address this issue, we introduce Controllable Neural Frame-based Modulation Effects (CONMOD), a single blac… ▽ More Deep learning models have seen widespread use in modelling LFO-driven audio effects, such as phaser and flanger. Although existing neural architectures exhibit high-quality emulation of individual effects, they do not possess the capability to manipulate the output via control parameters. To address this issue, we introduce Controllable Neural Frame-based Modulation Effects (CONMOD), a single black-box model which emulates various LFO-driven effects in a frame-wise manner, offering control over LFO frequency and feedback parameters. Additionally, the model is capable of learning the continuous embedding space of two distinct phaser effects, enabling us to steer between effects and achieve creative outputs. Our model outperforms previous work while possessing both controllability and universality, presenting opportunities to enhance creativity in modern LFO-driven audio effects. △ Less

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.12721 [pdf]

Sound event detection based on auxiliary decoder and maximum probability aggregation for DCASE Challenge 2024 Task 4

Authors: Sang Won Son, Jongyeon Park, Hong Kook Kim, Sulaiman Vesal, Jeong Eun Lim

Abstract: In this report, we propose three novel methods for develo** a sound event detection (SED) model for the DCASE 2024 Challenge Task 4. First, we propose an auxiliary decoder attached to the final convolutional block to improve feature extraction capabilities while reducing dependency on embeddings from pre-trained large models. The proposed auxiliary decoder operates independently from the main de… ▽ More In this report, we propose three novel methods for develo** a sound event detection (SED) model for the DCASE 2024 Challenge Task 4. First, we propose an auxiliary decoder attached to the final convolutional block to improve feature extraction capabilities while reducing dependency on embeddings from pre-trained large models. The proposed auxiliary decoder operates independently from the main decoder, enhancing performance of the convolutional block during the initial training stages by assigning a different weight strategy between main and auxiliary decoder losses. Next, to address the time interval issue between the DESED and MAESTRO datasets, we propose maximum probability aggregation (MPA) during the training step. The proposed MPA method enables the model's output to be aligned with soft labels of 1 s in the MAESTRO dataset. Finally, we propose a multi-channel input feature that employs various versions of logmel and MFCC features to generate time-frequency pattern. The experimental results demonstrate the efficacy of these proposed methods in a view of improving SED performance by achieving a balanced enhancement across different datasets and label types. Ultimately, this approach presents a significant step forward in develo** more robust and flexible SED models △ Less

Submitted 24 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: DCASE 2024 challenge Task4, 4 pages

arXiv:2406.11248 [pdf]

Performance Improvement of Language-Queried Audio Source Separation Based on Caption Augmentation From Large Language Models for DCASE Challenge 2024 Task 9

Authors: Do Hyun Lee, Yoonah Song, Hong Kook Kim

Abstract: We present a prompt-engineering-based text-augmentation approach applied to a language-queried audio source separation (LASS) task. To enhance the performance of LASS, the proposed approach utilizes large language models (LLMs) to generate multiple captions corresponding to each sentence of the training dataset. To this end, we first perform experiments to identify the most effective prompts for c… ▽ More We present a prompt-engineering-based text-augmentation approach applied to a language-queried audio source separation (LASS) task. To enhance the performance of LASS, the proposed approach utilizes large language models (LLMs) to generate multiple captions corresponding to each sentence of the training dataset. To this end, we first perform experiments to identify the most effective prompts for caption augmentation with a smaller number of captions. A LASS model trained with these augmented captions demonstrates improved performance on the DCASE 2024 Task 9 validation set compared to that trained without augmentation. This study highlights the effectiveness of LLM-based caption augmentation in advancing language-queried audio source separation. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: DCASE 2024 Challenge Task 9, 4 pages

arXiv:2406.10549 [pdf, other]

Lightweight Audio Segmentation for Long-form Speech Translation

Authors: Jaesong Lee, Soyoon Kim, Hanbyul Kim, Joon Son Chung

Abstract: Speech segmentation is an essential part of speech translation (ST) systems in real-world scenarios. Since most ST models are designed to process speech segments, long-form audio must be partitioned into shorter segments before translation. Recently, data-driven approaches for the speech segmentation task have been developed. Although the approaches improve overall translation quality, a performan… ▽ More Speech segmentation is an essential part of speech translation (ST) systems in real-world scenarios. Since most ST models are designed to process speech segments, long-form audio must be partitioned into shorter segments before translation. Recently, data-driven approaches for the speech segmentation task have been developed. Although the approaches improve overall translation quality, a performance gap exists due to a mismatch between the models and ST systems. In addition, the prior works require large self-supervised speech models, which consume significant computational resources. In this work, we propose a segmentation model that achieves better speech translation quality with a small model size. We propose an ASR-with-punctuation task as an effective pre-training strategy for the segmentation model. We also show that proper integration of the speech segmentation model into the underlying ST system is critical to improve overall translation quality at inference time. △ Less

Submitted 15 June, 2024; originally announced June 2024.

Comments: Accepted to Interspeech 2024

arXiv:2406.07909 [pdf, other]

Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation

Authors: Eungbeom Kim, Hantae Kim, Kyogu Lee

Abstract: Transformer encoder with connectionist temporal classification (CTC) framework is widely used for automatic speech recognition (ASR). However, knowledge distillation (KD) for ASR displays a problem of disagreement between teacher-student models in frame-level alignment which ultimately hinders it from improving the student model's performance. In order to resolve this problem, this paper introduce… ▽ More Transformer encoder with connectionist temporal classification (CTC) framework is widely used for automatic speech recognition (ASR). However, knowledge distillation (KD) for ASR displays a problem of disagreement between teacher-student models in frame-level alignment which ultimately hinders it from improving the student model's performance. In order to resolve this problem, this paper introduces a self-knowledge distillation (SKD) method that guides the frame-level alignment during the training time. In contrast to the conventional method using separate teacher and student models, this study introduces a simple and effective method sharing encoder layers and applying the sub-model as the student model. Overall, our approach is effective in improving both the resource efficiency as well as performance. We also conducted an experimental analysis of the spike timings to illustrate that the proposed method improves performance by reducing the alignment disagreement. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2406.02479 [pdf]

Applying Fine-Tuned LLMs for Reducing Data Needs in Load Profile Analysis

Authors: Yi Hu, Hyeon** Kim, Kai Ye, Ning Lu

Abstract: This paper presents a novel method for utilizing fine-tuned Large Language Models (LLMs) to minimize data requirements in load profile analysis, demonstrated through the restoration of missing data in power system load profiles. A two-stage fine-tuning strategy is proposed to adapt a pre-trained LLMs, i.e., GPT-3.5, for missing data restoration tasks. Through empirical evaluation, we demonstrate t… ▽ More This paper presents a novel method for utilizing fine-tuned Large Language Models (LLMs) to minimize data requirements in load profile analysis, demonstrated through the restoration of missing data in power system load profiles. A two-stage fine-tuning strategy is proposed to adapt a pre-trained LLMs, i.e., GPT-3.5, for missing data restoration tasks. Through empirical evaluation, we demonstrate the effectiveness of the fine-tuned model in accurately restoring missing data, achieving comparable performance to state-of-the-art specifically designed models such as BERT-PIN. Key findings include the importance of prompt engineering and the optimal utilization of fine-tuning samples, highlighting the efficiency of few-shot learning in transferring knowledge from general user cases to specific target users. Furthermore, the proposed approach demonstrates notable cost-effectiveness and time efficiency compared to training models from scratch, making it a practical solution for scenarios with limited data availability and computing resources. This research has significant potential for application to other power system load profile analysis tasks. Consequently, it advances the use of LLMs in power system analytics, offering promising implications for enhancing the resilience and efficiency of power distribution systems. △ Less

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.03684 [pdf, other]

All-in-One Deep Learning Framework for MR Image Reconstruction

Authors: Geunu Jeong, Hyeonsoo Kim, Joonyoung Yang, Kyungeun Jang, Jeewook Kim

Abstract: We introduce a novel, all-in-one deep learning framework for MR image reconstruction, enabling a single model to enhance image quality across multiple aspects of k-space sampling and to be effective across a wide range of clinical and technical scenarios. This DICOM-based algorithm serves as the core of SwiftMR (AIRS Medical, Seoul, Korea), which is FDA-cleared, CE-certified, and commercially avai… ▽ More We introduce a novel, all-in-one deep learning framework for MR image reconstruction, enabling a single model to enhance image quality across multiple aspects of k-space sampling and to be effective across a wide range of clinical and technical scenarios. This DICOM-based algorithm serves as the core of SwiftMR (AIRS Medical, Seoul, Korea), which is FDA-cleared, CE-certified, and commercially available. We first detail the comprehensive development process of the model, including data collection, training pair preparation, model architecture design, and DICOM inference. We then assess the model's capability to enhance image quality in a multi-dimensional manner, specifically across various aspects of k-space sampling. Subsequently, we evaluate several features of the multi-dimensional enhancement: the accuracy of tunable denoising, the effectiveness of super-resolution in each encoding direction, and the reduction of artifacts that become more prominent at lower spatial resolutions. Additionally, we assess its compatibility with various scan parameter sets and its generalizability across scanner vendors not seen during training. Finally, we present specific cases demonstrating the model's utility in reducing scan time across anatomical regions in conjunction with protocol optimization. The proposed model is compatible with a broad spectrum of scenarios, including various vendors, pulse sequences, scan parameters, and anatomical regions. Its DICOM-based operation particularly enhances its applicability for real-world applications. Given its demonstrated effectiveness and versatility, we expect its use to expand in the field of clinical MRI. △ Less

Submitted 26 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

Comments: 22 pages, 9 figures; number of collected MR raw data corrected

arXiv:2404.19075 [pdf, other]

Distributed Stochastic Optimization of a Neural Representation Network for Time-Space Tomography Reconstruction

Authors: K. Aditya Mohan, Massimiliano Ferrucci, Chuck Divin, Garrett A. Stevenson, Hyo** Kim

Abstract: 4D time-space reconstruction of dynamic events or deforming objects using X-ray computed tomography (CT) is an extremely ill-posed inverse problem. Existing approaches assume that the object remains static for the duration of several tens or hundreds of X-ray projection measurement images (reconstruction of consecutive limited-angle CT scans). However, this is an unrealistic assumption for many in… ▽ More 4D time-space reconstruction of dynamic events or deforming objects using X-ray computed tomography (CT) is an extremely ill-posed inverse problem. Existing approaches assume that the object remains static for the duration of several tens or hundreds of X-ray projection measurement images (reconstruction of consecutive limited-angle CT scans). However, this is an unrealistic assumption for many in-situ experiments that causes spurious artifacts and inaccurate morphological reconstructions of the object. To solve this problem, we propose to perform a 4D time-space reconstruction using a distributed implicit neural representation (DINR) network that is trained using a novel distributed stochastic training algorithm. Our DINR network learns to reconstruct the object at its output by iterative optimization of its network parameters such that the measured projection images best match the output of the CT forward measurement model. We use a continuous time and space forward measurement model that is a function of the DINR outputs at a sparsely sampled set of continuous valued object coordinates. Unlike existing state-of-the-art neural representation architectures that forward and back propagate through dense voxel grids that sample the object's entire time-space coordinates, we only propagate through the DINR at a small subset of object coordinates in each iteration resulting in an order-of-magnitude reduction in memory and compute for training. DINR leverages distributed computation across several compute nodes and GPUs to produce high-fidelity 4D time-space reconstructions even for extremely large CT data sizes. We use both simulated parallel-beam and experimental cone-beam X-ray CT datasets to demonstrate the superior performance of our approach. △ Less

Submitted 29 April, 2024; originally announced April 2024.

Comments: submitted to Nature Machine Intelligence

arXiv:2404.17736 [pdf, other]

Diffusion-Aided Joint Source Channel Coding For High Realism Wireless Image Transmission

Authors: Mingyu Yang, Bowen Liu, Boyang Wang, Hun-Seok Kim

Abstract: Deep learning-based joint source-channel coding (deep JSCC) has been demonstrated as an effective approach for wireless image transmission. Nevertheless, current research has concentrated on minimizing a standard distortion metric such as Mean Squared Error (MSE), which does not necessarily improve the perceptual quality. To address this issue, we propose DiffJSCC, a novel framework that leverages… ▽ More Deep learning-based joint source-channel coding (deep JSCC) has been demonstrated as an effective approach for wireless image transmission. Nevertheless, current research has concentrated on minimizing a standard distortion metric such as Mean Squared Error (MSE), which does not necessarily improve the perceptual quality. To address this issue, we propose DiffJSCC, a novel framework that leverages pre-trained text-to-image diffusion models to enhance the realism of images transmitted over the channel. The proposed DiffJSCC utilizes prior deep JSCC frameworks to deliver an initial reconstructed image at the receiver. Then, the spatial and textual features are extracted from the initial reconstruction, which, together with the channel state information (e.g., signal-to-noise ratio, SNR), are passed to a control module to fine-tune the pre-trained Stable Diffusion model. Extensive experiments on the Kodak dataset reveal that our method significantly surpasses both conventional methods and prior deep JSCC approaches on perceptual metrics such as LPIPS and FID scores, especially with poor channel conditions and limited bandwidth. Notably, DiffJSCC can achieve highly realistic reconstructions for 768x512 pixel Kodak images with only 3072 symbols (<0.008 symbols per pixel) under 1dB SNR. Our code will be released in https://github.com/mingyuyng/DiffJSCC. △ Less

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.17585 [pdf, other]

NeuroNet: A Novel Hybrid Self-Supervised Learning Framework for Sleep Stage Classification Using Single-Channel EEG

Authors: Cheol-Hui Lee, Hakseung Kim, Hyun-jee Han, Min-Kyung Jung, Byung C. Yoon, Dong-Joo Kim

Abstract: The classification of sleep stages is a pivotal aspect of diagnosing sleep disorders and evaluating sleep quality. However, the conventional manual scoring process, conducted by clinicians, is time-consuming and prone to human bias. Recent advancements in deep learning have substantially propelled the automation of sleep stage classification. Nevertheless, challenges persist, including the need fo… ▽ More The classification of sleep stages is a pivotal aspect of diagnosing sleep disorders and evaluating sleep quality. However, the conventional manual scoring process, conducted by clinicians, is time-consuming and prone to human bias. Recent advancements in deep learning have substantially propelled the automation of sleep stage classification. Nevertheless, challenges persist, including the need for large datasets with labels and the inherent biases in human-generated annotations. This paper introduces NeuroNet, a self-supervised learning (SSL) framework designed to effectively harness unlabeled single-channel sleep electroencephalogram (EEG) signals by integrating contrastive learning tasks and masked prediction tasks. NeuroNet demonstrates superior performance over existing SSL methodologies through extensive experimentation conducted across three polysomnography (PSG) datasets. Additionally, this study proposes a Mamba-based temporal context module to capture the relationships among diverse EEG epochs. Combining NeuroNet with the Mamba-based temporal context module has demonstrated the capability to achieve, or even surpass, the performance of the latest supervised learning methodologies, even with a limited amount of labeled data. This study is expected to establish a new benchmark in sleep stage classification, promising to guide future research and applications in the field of sleep analysis. △ Less

Submitted 13 May, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

Comments: 14 pages, 4 figures

arXiv:2404.08175 [pdf, ps, other]

A Novel Vision Transformer based Load Profile Analysis using Load Images as Inputs

Authors: Hyeon** Kim, Yi Hu, Kai Ye, Ning Lu

Abstract: This paper introduces ViT4LPA, an innovative Vision Transformer (ViT) based approach for Load Profile Analysis (LPA). We transform time-series load profiles into load images. This allows us to leverage the ViT architecture, originally designed for image processing, as a pre-trained image encoder to uncover latent patterns within load data. ViT is pre-trained using an extensive load image dataset,… ▽ More This paper introduces ViT4LPA, an innovative Vision Transformer (ViT) based approach for Load Profile Analysis (LPA). We transform time-series load profiles into load images. This allows us to leverage the ViT architecture, originally designed for image processing, as a pre-trained image encoder to uncover latent patterns within load data. ViT is pre-trained using an extensive load image dataset, comprising 1M load images derived from smart meter data collected over a two-year period from 2,000 residential users. The training methodology is self-supervised, masked image modeling, wherein masked load images are restored to reveal hidden relationships among image patches. The pre-trained ViT encoder is then applied to various downstream tasks, including the identification of electric vehicle (EV) charging loads and behind-the-meter solar photovoltaic (PV) systems and load disaggregation. Simulation results illustrate ViT4LPA's superior performance compared to existing neural network models in downstream tasks. Additionally, we conduct an in-depth analysis of the attention weights within the ViT4LPA model to gain insights into its information flow mechanisms. △ Less

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.07021 [pdf, other]

A 4x32Gb/s 1.8pJ/bit Collaborative Baud-Rate CDR with Background Eye-Climbing Algorithm and Low-Power Global Clock Distribution

Authors: Jihee Kim, Jia Park, Jiwon Shin, Hanseok Kim, Kahyun Kim, Haengbeom Shin, Ha-Jung Park, Woo-Seok Choi

Abstract: This paper presents design techniques for an energy-efficient multi-lane receiver (RX) with baud-rate clock and data recovery (CDR), which is essential for high-throughput low-latency communication in high-performance computing systems. The proposed low-power global clock distribution not only significantly reduces power consumption across multi-lane RXs but is capable of compensating for the freq… ▽ More This paper presents design techniques for an energy-efficient multi-lane receiver (RX) with baud-rate clock and data recovery (CDR), which is essential for high-throughput low-latency communication in high-performance computing systems. The proposed low-power global clock distribution not only significantly reduces power consumption across multi-lane RXs but is capable of compensating for the frequency offset without any phase interpolators. To this end, a fractional divider controlled by CDR is placed close to the global phase locked loop. Moreover, in order to address the sub-optimal lock point of conventional baud-rate phase detectors, the proposed CDR employs a background eye-climbing algorithm, which optimizes the sampling phase and maximizes the vertical eye margin (VEM). Fabricated in a 28nm CMOS process, the proposed 4x32Gb/s RX shows a low integrated fractional spur of -40.4dBc at a 2500ppm frequency offset. Furthermore, it improves bit-error-rate performance by increasing the VEM by 17%. The entire RX achieves the energy efficiency of 1.8pJ/bit with the aggregate data rate of 128Gb/s. △ Less

Submitted 22 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.06452 [pdf, other]

PAAM: A Framework for Coordinated and Priority-Driven Accelerator Management in ROS 2

Authors: Daniel Enright, Yecheng Xiang, Hyunjong Choi, Hyoseung Kim

Abstract: This paper proposes a Priority-driven Accelerator Access Management (PAAM) framework for multi-process robotic applications built on top of the Robot Operating System (ROS) 2 middleware platform. The framework addresses the issue of predictable execution of time- and safety-critical callback chains that require hardware accelerators such as GPUs and TPUs. PAAM provides a standalone ROS executor th… ▽ More This paper proposes a Priority-driven Accelerator Access Management (PAAM) framework for multi-process robotic applications built on top of the Robot Operating System (ROS) 2 middleware platform. The framework addresses the issue of predictable execution of time- and safety-critical callback chains that require hardware accelerators such as GPUs and TPUs. PAAM provides a standalone ROS executor that acts as an accelerator resource server, arbitrating accelerator access requests from all other callbacks at the application layer. This approach enables coordinated and priority-driven accelerator access management in multi-process robotic systems. The framework design is directly applicable to all types of accelerators and enables granular control over how specific chains access accelerators, making it possible to achieve predictable real-time support for accelerators used by safety-critical callback chains without making changes to underlying accelerator device drivers. The paper shows that PAAM also offers a theoretical analysis that can upper bound the worst-case response time of safety-critical callback chains that necessitate accelerator access. This paper also demonstrates that complex robotic systems with extensive accelerator usage that are integrated with PAAM may achieve up to a 91\% reduction in end-to-end response time of their critical callback chains. △ Less

Submitted 9 April, 2024; originally announced April 2024.

Comments: 14 Pages, 14 Figures

arXiv:2404.05119 [pdf, other]

A 0.65-pJ/bit 3.6-TB/s/mm I/O Interface with XTalk Minimizing Affine Signaling for Next-Generation HBM with High Interconnect Density

Authors: Hyunjun Park, Jiwon Shin, Hanseok Kim, Jihee Kim, Haengbeom Shin, Taehoon Kim, Jung-Hun Park, Woo-Seok Choi

Abstract: This paper presents an I/O interface with Xtalk Minimizing Affine Signaling (XMAS), which is designed to support high-speed data transmission in die-to-die communication over silicon interposers or similar high-density interconnects susceptible to crosstalk. The operating principles of XMAS are elucidated through rigorous analyses, and its advantages over existing signaling are validated through n… ▽ More This paper presents an I/O interface with Xtalk Minimizing Affine Signaling (XMAS), which is designed to support high-speed data transmission in die-to-die communication over silicon interposers or similar high-density interconnects susceptible to crosstalk. The operating principles of XMAS are elucidated through rigorous analyses, and its advantages over existing signaling are validated through numerical experiments. XMAS not only demonstrates exceptional crosstalk removing capabilities but also exhibits robustness against noise, especially simultaneous switching noise. Fabricated in a 28-nm CMOS process, the prototype XMAS transceiver achieves an edge density of 3.6TB/s/mm and an energy efficiency of 0.65pJ/b. Compared to the single-ended signaling, the crosstalk-induced peak-to-peak jitter of the received eye with XMAS is reduced by 75% at 10GS/s/pin data rate, and the horizontal eye opening extends to 0.2UI at a bit error rate < 10$^{-12}$. △ Less

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.04096 [pdf, other]

Machine Learning-Aided Cooperative Localization under Dense Urban Environment

Authors: Hoon Lee, Hong Ki Kim, Seung Hyun Oh, Sang Hyun Lee

Abstract: Future wireless network technology provides automobiles with the connectivity feature to consolidate the concept of vehicular networks that collaborate on conducting cooperative driving tasks. The full potential of connected vehicles, which promises road safety and quality driving experience, can be leveraged if machine learning models guarantee the robustness in performing core functions includin… ▽ More Future wireless network technology provides automobiles with the connectivity feature to consolidate the concept of vehicular networks that collaborate on conducting cooperative driving tasks. The full potential of connected vehicles, which promises road safety and quality driving experience, can be leveraged if machine learning models guarantee the robustness in performing core functions including localization and controls. Location awareness, in particular, lends itself to the deployment of location-specific services and the improvement of the operation performance. The localization entails direct communication to the network infrastructure, and the resulting centralized positioning solutions readily become intractable as the network scales up. As an alternative to the centralized solutions, this article addresses decentralized principle of vehicular localization reinforced by machine learning techniques in dense urban environments with frequent inaccessibility to reliable measurement. As such, the collaboration of multiple vehicles enhances the positioning performance of machine learning approaches. A virtual testbed is developed to validate this machine learning model for real-map vehicular networks. Numerical results demonstrate universal feasibility of cooperative localization, in particular, for dense urban area configurations. △ Less

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.02342 [pdf, other]

A Computational Analysis of Lyric Similarity Perception

Authors: Haven Kim, Taketo Akama

Abstract: In musical compositions that include vocals, lyrics significantly contribute to artistic expression. Consequently, previous studies have introduced the concept of a recommendation system that suggests lyrics similar to a user's favorites or personalized preferences, aiding in the discovery of lyrics among millions of tracks. However, many of these systems do not fully consider human perceptions of… ▽ More In musical compositions that include vocals, lyrics significantly contribute to artistic expression. Consequently, previous studies have introduced the concept of a recommendation system that suggests lyrics similar to a user's favorites or personalized preferences, aiding in the discovery of lyrics among millions of tracks. However, many of these systems do not fully consider human perceptions of lyric similarity, primarily due to limited research in this area. To bridge this gap, we conducted a comparative analysis of computational methods for modeling lyric similarity with human perception. Results indicated that computational models based on similarities between embeddings from pre-trained BERT-based models, the audio from which the lyrics are derived, and phonetic components are indicative of perceptual lyric similarity. This finding underscores the importance of semantic, stylistic, and phonetic similarities in human perception about lyric similarity. We anticipate that our findings will enhance the development of similarity-based lyric recommendation systems by offering pseudo-labels for neural network development and introducing objective evaluation metrics. △ Less

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.01661 [pdf, other]

Interaction-Aware Vehicle Motion Planning with Collision Avoidance Constraints in Highway Traffic

Authors: Dongryul Kim, Hyeonjeong Kim, Kyoungseok Han

Abstract: This paper proposes collision-free optimal trajectory planning for autonomous vehicles in highway traffic, where vehicles need to deal with the interaction among each other. To address this issue, a novel optimal control framework is suggested, which couples the trajectory of surrounding vehicles with collision avoidance constraints. Additionally, we describe a trajectory optimization technique un… ▽ More This paper proposes collision-free optimal trajectory planning for autonomous vehicles in highway traffic, where vehicles need to deal with the interaction among each other. To address this issue, a novel optimal control framework is suggested, which couples the trajectory of surrounding vehicles with collision avoidance constraints. Additionally, we describe a trajectory optimization technique under state constraints, utilizing a planner based on Pontryagin's Minimum Principle, capable of numerically solving collision avoidance scenarios with surrounding vehicles. Simulation results demonstrate the effectiveness of the proposed approach regarding interaction-based motion planning for different scenarios. △ Less

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2403.19785 [pdf, other]

Integrated Communication, Localization, and Sensing in 6G D-MIMO Networks

Authors: Hao Guo, Henk Wymeersch, Behrooz Makki, Hui Chen, Yibo Wu, Giuseppe Durisi, Musa Furkan Keskin, Mohammad H. Moghaddam, Charitha Madapatha, Han Yu, Peter Hammarberg, Hyowon Kim, Tommy Svensson

Abstract: Future generations of mobile networks call for concurrent sensing and communication functionalities in the same hardware and/or spectrum. Compared to communication, sensing services often suffer from limited coverage, due to the high path loss of the reflected signal and the increased infrastructure requirements. To provide a more uniform quality of service, distributed multiple input multiple out… ▽ More Future generations of mobile networks call for concurrent sensing and communication functionalities in the same hardware and/or spectrum. Compared to communication, sensing services often suffer from limited coverage, due to the high path loss of the reflected signal and the increased infrastructure requirements. To provide a more uniform quality of service, distributed multiple input multiple output (D-MIMO) systems deploy a large number of distributed nodes and efficiently control them, making distributed integrated sensing and communications (ISAC) possible. In this paper, we investigate ISAC in D-MIMO through the lens of different design architectures and deployments, revealing both conflicts and synergies. In addition, simulation and demonstration results reveal both opportunities and challenges towards the implementation of ISAC in D-MIMO. △ Less

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.12726 [pdf]

Small Distance Increment Method for Measuring Complex Permittivity With mmWave Radar

Authors: Hang Song, Hyun Joon Kim, Mingxia Wan, Bo Wei, Takamaro Kikkawa, Jun-ichi Takada

Abstract: Measuring the complex permittivity of material is essential in many scenarios such as quality check and component analysis. Generally, measurement methods for characterizing the material are based on the usage of vector network analyzer, which is large and not easy for on-site measurement, especially in high frequency range such as millimeter wave (mmWave). In addition, some measurement methods re… ▽ More Measuring the complex permittivity of material is essential in many scenarios such as quality check and component analysis. Generally, measurement methods for characterizing the material are based on the usage of vector network analyzer, which is large and not easy for on-site measurement, especially in high frequency range such as millimeter wave (mmWave). In addition, some measurement methods require the destruction of samples, which is not suitable for non-destructive inspection. In this work, a small distance increment (SDI) method is proposed to non-destructively measure the complex permittivity of material. In SDI, the transmitter and receiver are formed as the monostatic radar, which is facing towards the material under test (MUT). During the measurement, the distance between radar and MUT changes with small increments and the signals are recorded at each position. A mathematical model is formulated to depict the relationship among the complex permittivity, distance increment, and measured signals. By fitting the model, the complex permittivity of MUT is estimated. To implement and evaluate the proposed SDI method, a commercial off-the-shelf mmWave radar is utilized and the measurement system is developed. Then, the evaluation was carried out on the acrylic plate. With the proposed method, the estimated complex permittivity of acrylic plate shows good agreement with the literature values, demonstrating the efficacy of SDI method for characterizing the complex permittivity of material. △ Less

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.08187 [pdf, other]

Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of Speech Sound Disorders in Korean children

Authors: Taekyung Ahn, Yeonjung Hong, Younggon Im, Do Hyung Kim, Dayoung Kang, Joo Won Jeong, Jae Won Kim, Min Jung Kim, Ah-ra Cho, Dae-Hyun Jang, Hosung Nam

Abstract: This study presents a model of automatic speech recognition (ASR) designed to diagnose pronunciation issues in children with speech sound disorders (SSDs) to replace manual transcriptions in clinical procedures. Since ASR models trained for general purposes primarily predict input speech into real words, employing a well-known high-performance ASR model for evaluating pronunciation in children wit… ▽ More This study presents a model of automatic speech recognition (ASR) designed to diagnose pronunciation issues in children with speech sound disorders (SSDs) to replace manual transcriptions in clinical procedures. Since ASR models trained for general purposes primarily predict input speech into real words, employing a well-known high-performance ASR model for evaluating pronunciation in children with SSDs is impractical. We fine-tuned the wav2vec 2.0 XLS-R model to recognize speech as pronounced rather than as existing words. The model was fine-tuned with a speech dataset from 137 children with inadequate speech production pronouncing 73 Korean words selected for actual clinical diagnosis. The model's predictions of the pronunciations of the words matched the human annotations with about 90% accuracy. While the model still requires improvement in recognizing unclear pronunciation, this study demonstrates that ASR models can streamline complex pronunciation error diagnostic procedures in clinical fields. △ Less

Submitted 12 March, 2024; originally announced March 2024.

Comments: 12 pages, 2 figures

ACM Class: I.2.7

arXiv:2403.06397 [pdf, other]

DeepSafeMPC: Deep Learning-Based Model Predictive Control for Safe Multi-Agent Reinforcement Learning

Authors: Xuefeng Wang, Henglin Pu, Hyung Jun Kim, Husheng Li

Abstract: Safe Multi-agent reinforcement learning (safe MARL) has increasingly gained attention in recent years, emphasizing the need for agents to not only optimize the global return but also adhere to safety requirements through behavioral constraints. Some recent work has integrated control theory with multi-agent reinforcement learning to address the challenge of ensuring safety. However, there have bee… ▽ More Safe Multi-agent reinforcement learning (safe MARL) has increasingly gained attention in recent years, emphasizing the need for agents to not only optimize the global return but also adhere to safety requirements through behavioral constraints. Some recent work has integrated control theory with multi-agent reinforcement learning to address the challenge of ensuring safety. However, there have been only very limited applications of Model Predictive Control (MPC) methods in this domain, primarily due to the complex and implicit dynamics characteristic of multi-agent environments. To bridge this gap, we propose a novel method called Deep Learning-Based Model Predictive Control for Safe Multi-Agent Reinforcement Learning (DeepSafeMPC). The key insight of DeepSafeMPC is leveraging a entralized deep learning model to well predict environmental dynamics. Our method applies MARL principles to search for optimal solutions. Through the employment of MPC, the actions of agents can be restricted within safe states concurrently. We demonstrate the effectiveness of our approach using the Safe Multi-agent MuJoCo environment, showcasing significant advancements in addressing safety concerns in MARL. △ Less

Submitted 11 March, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

Comments: 8 pages, 5 figures

arXiv:2403.05136 [pdf, other]

DeRO: Dead Reckoning Based on Radar Odometry With Accelerometers Aided for Robot Localization

Authors: Hoang Viet Do, Yong Hun Kim, Joo Han Lee, Min Ho Lee, ** Woo Song

Abstract: In this paper, we propose a radar odometry structure that directly utilizes radar velocity measurements for dead reckoning while maintaining its ability to update estimations within the Kalman filter framework. Specifically, we employ the Doppler velocity obtained by a 4D Frequency Modulated Continuous Wave (FMCW) radar in conjunction with gyroscope data to calculate poses. This approach helps mit… ▽ More In this paper, we propose a radar odometry structure that directly utilizes radar velocity measurements for dead reckoning while maintaining its ability to update estimations within the Kalman filter framework. Specifically, we employ the Doppler velocity obtained by a 4D Frequency Modulated Continuous Wave (FMCW) radar in conjunction with gyroscope data to calculate poses. This approach helps mitigate high drift resulting from accelerometer biases and double integration. Instead, tilt angles measured by gravitational force are utilized alongside relative distance measurements from radar scan matching for the filter's measurement update. Additionally, to further enhance the system's accuracy, we estimate and compensate for the radar velocity scale factor. The performance of the proposed method is verified through five real-world open-source datasets. The results demonstrate that our approach reduces position error by 47% and rotation error by 52% on average compared to the state-of-the-art radar-inertial fusion method in terms of absolute trajectory error. △ Less

Submitted 8 March, 2024; originally announced March 2024.

Comments: 9 pages, 5 figures, 1 table, conference

ACM Class: I.2.9

arXiv:2402.05965 [pdf, other]

Hybrid Neural Representations for Spherical Data

Authors: Hyomin Kim, Yunhui Jang, Jaeho Lee, Sungsoo Ahn

Abstract: In this paper, we study hybrid neural representations for spherical data, a domain of increasing relevance in scientific research. In particular, our work focuses on weather and climate data as well as comic microwave background (CMB) data. Although previous studies have delved into coordinate-based neural representations for spherical signals, they often fail to capture the intricate details of h… ▽ More In this paper, we study hybrid neural representations for spherical data, a domain of increasing relevance in scientific research. In particular, our work focuses on weather and climate data as well as comic microwave background (CMB) data. Although previous studies have delved into coordinate-based neural representations for spherical signals, they often fail to capture the intricate details of highly nonlinear signals. To address this limitation, we introduce a novel approach named Hybrid Neural Representations for Spherical data (HNeR-S). Our main idea is to use spherical feature-grids to obtain positional features which are combined with a multilayer perception to predict the target signal. We consider feature-grids with equirectangular and hierarchical equal area isolatitude pixelization structures that align with weather data and CMB data, respectively. We extensively verify the effectiveness of our HNeR-S for regression, super-resolution, temporal interpolation, and compression tasks. △ Less

Submitted 5 February, 2024; originally announced February 2024.

Comments: 13 pages, 8 figures

arXiv:2402.05706 [pdf, other]

Unified Speech-Text Pretraining for Spoken Dialog Modeling

Authors: Heeseung Kim, Soonshin Seo, Kyeongseok Jeong, Ohsung Kwon, Jungwhan Kim, Jaehong Lee, Eunwoo Song, Myungwoo Oh, Sungroh Yoon, Kang Min Yoo

Abstract: While recent work shows promising results in expanding the capabilities of large language models (LLM) to directly understand and synthesize speech, an LLM-based strategy for modeling spoken dialogs remains elusive and calls for further investigation. This work proposes an extensive speech-text LLM framework, named the Unified Spoken Dialog Model (USDM), to generate coherent spoken responses with… ▽ More While recent work shows promising results in expanding the capabilities of large language models (LLM) to directly understand and synthesize speech, an LLM-based strategy for modeling spoken dialogs remains elusive and calls for further investigation. This work proposes an extensive speech-text LLM framework, named the Unified Spoken Dialog Model (USDM), to generate coherent spoken responses with organic prosodic features relevant to the given input speech without relying on automatic speech recognition (ASR) or text-to-speech (TTS) solutions. Our approach employs a multi-step speech-text inference scheme that leverages chain-of-reasoning capabilities exhibited by the underlying LLM. We also propose a generalized speech-text pretraining scheme that helps with capturing cross-modal semantics. Automatic and human evaluations show that the proposed approach is effective in generating natural-sounding spoken responses, outperforming both prior and cascaded baselines. Detailed comparative studies reveal that, despite the cascaded approach being stronger in individual components, the joint speech-text modeling improves robustness against recognition errors and speech quality. Demo is available at https://unifiedsdm.github.io. △ Less

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.01116 [pdf, other]

Scalable Multi-modal Model Predictive Control via Duality-based Interaction Predictions

Authors: Hansung Kim, Siddharth H. Nair, Francesco Borrelli

Abstract: We propose a hierarchical architecture designed for scalable real-time Model Predictive Control (MPC) in complex, multi-modal traffic scenarios. This architecture comprises two key components: 1) RAID-Net, a novel attention-based Recurrent Neural Network that predicts relevant interactions along the MPC prediction horizon between the autonomous vehicle and the surrounding vehicles using Lagrangian… ▽ More We propose a hierarchical architecture designed for scalable real-time Model Predictive Control (MPC) in complex, multi-modal traffic scenarios. This architecture comprises two key components: 1) RAID-Net, a novel attention-based Recurrent Neural Network that predicts relevant interactions along the MPC prediction horizon between the autonomous vehicle and the surrounding vehicles using Lagrangian duality, and 2) a reduced Stochastic MPC problem that eliminates irrelevant collision avoidance constraints, enhancing computational efficiency. Our approach is demonstrated in a simulated traffic intersection with interactive surrounding vehicles, showcasing a 12x speed-up in solving the motion planning problem. A video demonstrating the proposed architecture in multiple complex traffic scenarios can be found here: https://youtu.be/-pRiOnPb9_c. GitHub: https://github.com/MPC-Berkeley/hmpc_raidnet △ Less

Submitted 2 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

Comments: Accepted at IEEE Intelligent Vehicles Symposium 2024

arXiv:2401.13498 [pdf, other]

Expressive Acoustic Guitar Sound Synthesis with an Instrument-Specific Input Representation and Diffusion Outpainting

Authors: Hounsu Kim, Soonbeom Choi, Juhan Nam

Abstract: Synthesizing performing guitar sound is a highly challenging task due to the polyphony and high variability in expression. Recently, deep generative models have shown promising results in synthesizing expressive polyphonic instrument sounds from music scores, often using a generic MIDI input. In this work, we propose an expressive acoustic guitar sound synthesis model with a customized input repre… ▽ More Synthesizing performing guitar sound is a highly challenging task due to the polyphony and high variability in expression. Recently, deep generative models have shown promising results in synthesizing expressive polyphonic instrument sounds from music scores, often using a generic MIDI input. In this work, we propose an expressive acoustic guitar sound synthesis model with a customized input representation to the instrument, which we call guitarroll. We implement the proposed approach using diffusion-based outpainting which can generate audio with long-term consistency. To overcome the lack of MIDI/audio-paired datasets, we used not only an existing guitar dataset but also collected data from a high quality sample-based guitar synthesizer. Through quantitative and qualitative evaluations, we show that our proposed model has higher audio quality than the baseline model and generates more realistic timbre sounds than the previous leading work. △ Less

Submitted 24 January, 2024; originally announced January 2024.

Comments: Accepted to ICASSP 2024

arXiv:2401.10288 [pdf, other]

CLAN: A Contrastive Learning based Novelty Detection Framework for Human Activity Recognition

Authors: Hyunju Kim, Dongman Lee

Abstract: In ambient assisted living, human activity recognition from time series sensor data mainly focuses on predefined activities, often overlooking new activity patterns. We propose CLAN, a two-tower contrastive learning-based novelty detection framework with diverse types of negative pairs for human activity recognition. It is tailored to challenges with human activity characteristics, including the s… ▽ More In ambient assisted living, human activity recognition from time series sensor data mainly focuses on predefined activities, often overlooking new activity patterns. We propose CLAN, a two-tower contrastive learning-based novelty detection framework with diverse types of negative pairs for human activity recognition. It is tailored to challenges with human activity characteristics, including the significance of temporal and frequency features, complex activity dynamics, shared features across activities, and sensor modality variations. The framework aims to construct invariant representations of known activity robust to the challenges. To generate suitable negative pairs, it selects data augmentation methods according to the temporal and frequency characteristics of each dataset. It derives the key representations against meaningless dynamics by contrastive and classification losses-based representation learning and score function-based novelty detection that accommodate dynamic numbers of the different types of augmented samples. The proposed two-tower model extracts the representations in terms of time and frequency, mutually enhancing expressiveness for distinguishing between new and known activities, even when they share common features. Experiments on four real-world human activity datasets show that CLAN surpasses the best performance of existing novelty detection methods, improving by 8.3%, 13.7%, and 53.3% in AUROC, balanced accuracy, and [email protected] metrics respectively. △ Less

Submitted 16 January, 2024; originally announced January 2024.

arXiv:2401.08962 [pdf, other]

DOO-RE: A dataset of ambient sensors in a meeting room for activity recognition

Authors: Hyunju Kim, Geon Kim, Taehoon Lee, Kisoo Kim, Dongman Lee

Abstract: With the advancement of IoT technology, recognizing user activities with machine learning methods is a promising way to provide various smart services to users. High-quality data with privacy protection is essential for deploying such services in the real world. Data streams from surrounding ambient sensors are well suited to the requirement. Existing ambient sensor datasets only support constrain… ▽ More With the advancement of IoT technology, recognizing user activities with machine learning methods is a promising way to provide various smart services to users. High-quality data with privacy protection is essential for deploying such services in the real world. Data streams from surrounding ambient sensors are well suited to the requirement. Existing ambient sensor datasets only support constrained private spaces and those for public spaces have yet to be explored despite growing interest in research on them. To meet this need, we build a dataset collected from a meeting room equipped with ambient sensors. The dataset, DOO-RE, includes data streams from various ambient sensor types such as Sound and Projector. Each sensor data stream is segmented into activity units and multiple annotators provide activity labels through a cross-validation annotation process to improve annotation quality. We finally obtain 9 types of activities. To our best knowledge, DOO-RE is the first dataset to support the recognition of both single and group activities in a real meeting room with reliable annotations. △ Less

Submitted 16 January, 2024; originally announced January 2024.

arXiv:2312.15914 [pdf, other]

Improving One-Shot Transmission in NR Sidelink Resource Allocation for V2X Communication

Authors: Hojeong Lee, Hyogon Kim

Abstract: The Society of Automotive Engineers (SAE) has specified a wireless channel congestion control algorithm for cellular vehicle-to-everything (C-V2X) communication in J3161/1. A notable aspect of J3161/1 standard is that it addresses persistent packet collisions between neighboring vehicles. Although the chances are slim, the persistent collisions can cause so called the wireless blind spot once the… ▽ More The Society of Automotive Engineers (SAE) has specified a wireless channel congestion control algorithm for cellular vehicle-to-everything (C-V2X) communication in J3161/1. A notable aspect of J3161/1 standard is that it addresses persistent packet collisions between neighboring vehicles. Although the chances are slim, the persistent collisions can cause so called the wireless blind spot once the event takes place. Then the involved vehicles cannot inform their presence to neighboring vehicles, an extremely dangerous condition for driving safety. J3161's solution to the problem is stochastic one-shot transmission, where the transmission occasionally occurs in a resource that is not originally reserved. Through the one-shot transmission, the worst-case packet inter-reception time (PIR) is bounded and the wireless blind spot problem can be effectively mitigated. Interestingly, the standard one-shot transmission scheme does not resolve the persistent collision relation itself. This paper shows that by breaking out of the relation as soon as the persistent collision condition is identified, vehicles can improve the worst-case PIR by approximately 500 ms, the number of packet collisions per persistent collision event by 10%, and the number of total collisions by 15% to 57% over the standard one-shot transmission. △ Less

Submitted 26 December, 2023; originally announced December 2023.

arXiv:2312.09456 [pdf, other]

Pioneering EEG Motor Imagery Classification Through Counterfactual Analysis

Authors: Kang Yin, Hye-Bin Shin, Hee-Dong Kim, Seong-Whan Lee

Abstract: The application of counterfactual explanation (CE) techniques in the realm of electroencephalography (EEG) classification has been relatively infrequent in contemporary research. In this study, we attempt to introduce and explore a novel non-generative approach to CE, specifically tailored for the analysis of EEG signals. This innovative approach assesses the model's decision-making process by str… ▽ More The application of counterfactual explanation (CE) techniques in the realm of electroencephalography (EEG) classification has been relatively infrequent in contemporary research. In this study, we attempt to introduce and explore a novel non-generative approach to CE, specifically tailored for the analysis of EEG signals. This innovative approach assesses the model's decision-making process by strategically swap** patches derived from time-frequency analyses. By meticulously examining the variations and nuances introduced in the classification outcomes through this method, we aim to derive insights that can enhance interpretability. The empirical results obtained from our experimental investigations serve not only to validate the efficacy of our proposed approach but also to reinforce human confidence in the model's predictive capabilities. Consequently, these findings underscore the significance and potential value of conducting further, more extensive research in this promising direction. △ Less

Submitted 10 November, 2023; originally announced December 2023.

arXiv:2312.09040 [pdf, other]

STaR: Distilling Speech Temporal Relation for Lightweight Speech Self-Supervised Learning Models

Authors: Kangwook Jang, Sungnyun Kim, Hoirin Kim

Abstract: Albeit great performance of Transformer-based speech selfsupervised learning (SSL) models, their large parameter size and computational cost make them unfavorable to utilize. In this study, we propose to compress the speech SSL models by distilling speech temporal relation (STaR). Unlike previous works that directly match the representation for each speech frame, STaR distillation transfers tempor… ▽ More Albeit great performance of Transformer-based speech selfsupervised learning (SSL) models, their large parameter size and computational cost make them unfavorable to utilize. In this study, we propose to compress the speech SSL models by distilling speech temporal relation (STaR). Unlike previous works that directly match the representation for each speech frame, STaR distillation transfers temporal relation between speech frames, which is more suitable for lightweight student with limited capacity. We explore three STaR distillation objectives and select the best combination as the final STaR loss. Our model distilled from HuBERT BASE achieves an overall score of 79.8 on SUPERB benchmark, the best performance among models with up to 27 million parameters. We show that our method is applicable across different speech SSL models and maintains robust performance with further reduced parameters. △ Less

Submitted 25 April, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: ICASSP 2024 Best Student Paper Awarded. Code URL: https://github.com/sungnyun/ARMHuBERT

arXiv:2312.02937 [pdf, other]

doi 10.2514/6.2024-1167

Synergistic Perception and Control Simplex for Verifiable Safe Vertical Landing

Authors: Ayoosh Bansal, Yang Zhao, James Zhu, Sheng Cheng, Yuliang Gu, Hyung-** Yoon, Hunmin Kim, Naira Hovakimyan, Lui Sha

Abstract: Perception, Planning, and Control form the essential components of autonomy in advanced air mobility. This work advances the holistic integration of these components to enhance the performance and robustness of the complete cyber-physical system. We adapt Perception Simplex, a system for verifiable collision avoidance amidst obstacle detection faults, to the vertical landing maneuver for autonomou… ▽ More Perception, Planning, and Control form the essential components of autonomy in advanced air mobility. This work advances the holistic integration of these components to enhance the performance and robustness of the complete cyber-physical system. We adapt Perception Simplex, a system for verifiable collision avoidance amidst obstacle detection faults, to the vertical landing maneuver for autonomous air mobility vehicles. We improve upon this system by replacing static assumptions of control capabilities with dynamic confirmation, i.e., real-time confirmation of control limitations of the system, ensuring reliable fulfillment of safety maneuvers and overrides, without dependence on overly pessimistic assumptions. Parameters defining control system capabilities and limitations, e.g., maximum deceleration, are continuously tracked within the system and used to make safety-critical decisions. We apply these techniques to propose a verifiable collision avoidance solution for autonomous aerial mobility vehicles operating in cluttered and potentially unsafe environments. △ Less

Submitted 5 December, 2023; originally announced December 2023.

Comments: To appear in AIAA SciTech 2024

ACM Class: C.3; C.4; J.7

Journal ref: AIAA SCITECH 2024 Forum, p. 1167

arXiv:2312.02753 [pdf, other]

C3: High-performance and low-complexity neural compression from a single image or video

Authors: Hyunjik Kim, Matthias Bauer, Lucas Theis, Jonathan Richard Schwarz, Emilien Dupont

Abstract: Most neural compression models are trained on large datasets of images or videos in order to generalize to unseen data. Such generalization typically requires large and expressive architectures with a high decoding complexity. Here we introduce C3, a neural compression method with strong rate-distortion (RD) performance that instead overfits a small model to each image or video separately. The res… ▽ More Most neural compression models are trained on large datasets of images or videos in order to generalize to unseen data. Such generalization typically requires large and expressive architectures with a high decoding complexity. Here we introduce C3, a neural compression method with strong rate-distortion (RD) performance that instead overfits a small model to each image or video separately. The resulting decoding complexity of C3 can be an order of magnitude lower than neural baselines with similar RD performance. C3 builds on COOL-CHIC (Ladune et al.) and makes several simple and effective improvements for images. We further develop new methodology to apply C3 to videos. On the CLIC2020 image benchmark, we match the RD performance of VTM, the reference implementation of the H.266 codec, with less than 3k MACs/pixel for decoding. On the UVG video benchmark, we match the RD performance of the Video Compression Transformer (Mentzer et al.), a well-established neural video codec, with less than 5k MACs/pixel for decoding. △ Less

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2312.01285 [pdf, other]

A Literature Review on the Smart Wheelchair Systems

Authors: Yane Kim, Bharath Velamala, Youngseo Choi, Yu** Kim, Hyunkin Kim, Nishad Kulkarni, Eung-Joo Lee

Abstract: This study offers an in-depth analysis of smart wheelchair (SW) systems, charting their progression from early developments to future innovations. It delves into various Brain-Computer Interface (BCI) systems, including mu rhythm, event-related potential, and steady-state visual evoked potential. The paper addresses challenges in signal categorization, proposing the sparse Bayesian extreme learnin… ▽ More This study offers an in-depth analysis of smart wheelchair (SW) systems, charting their progression from early developments to future innovations. It delves into various Brain-Computer Interface (BCI) systems, including mu rhythm, event-related potential, and steady-state visual evoked potential. The paper addresses challenges in signal categorization, proposing the sparse Bayesian extreme learning machine as an innovative solution. Additionally, it explores the integration of emotional states in BCI systems, the application of alternative control methods such as EMG-based systems, and the deployment of intelligent adaptive interfaces utilizing recurrent quantum neural networks. The study also covers advancements in autonomous navigation, assistance, and map**, emphasizing their importance in SW systems. The human aspect of SW interaction receives considerable attention, specifically in terms of privacy, physiological factors, and the refinement of control mechanisms. The paper acknowledges the commercial challenges faced, like the limitations of indoor usage and the necessity for user training. For future applications, the research explores the potential of autonomous systems adept at adapting to changing environments and user needs. This exploration includes reinforcement learning and various control methods, such as eye and voice control, to improve adaptability and interaction. The potential integration with smart home technologies, including advanced features such as robotic arms, is also considered, aiming to further enhance user accessibility and independence. Ultimately, this study seeks to provide a thorough overview of SW systems, presenting extensive research to detail their historical evolution, current state, and future prospects. △ Less

Submitted 3 December, 2023; originally announced December 2023.

arXiv:2311.17358 [pdf, other]

OpenSense: An Open-World Sensing Framework for Incremental Learning and Dynamic Sensor Scheduling on Embedded Edge Devices

Authors: Abdulrahman Bukhari, Seyedmehdi Hosseinimotlagh, Hyoseung Kim

Abstract: Recent advances in Internet-of-Things (IoT) technologies have sparked significant interest towards develo** learning-based sensing applications on embedded edge devices. These efforts, however, are being challenged by the complexities of adapting to unforeseen conditions in an open-world environment, mainly due to the intensive computational and energy demands exceeding the capabilities of edge… ▽ More Recent advances in Internet-of-Things (IoT) technologies have sparked significant interest towards develo** learning-based sensing applications on embedded edge devices. These efforts, however, are being challenged by the complexities of adapting to unforeseen conditions in an open-world environment, mainly due to the intensive computational and energy demands exceeding the capabilities of edge devices. In this paper, we propose OpenSense, an open-world time-series sensing framework for making inferences from time-series sensor data and achieving incremental learning on an embedded edge device with limited resources. The proposed framework is able to achieve two essential tasks, inference and incremental learning, eliminating the necessity for powerful cloud servers. In addition, to secure enough time for incremental learning and reduce energy consumption, we need to schedule sensing activities without missing any events in the environment. Therefore, we propose two dynamic sensor scheduling techniques: (i) a class-level period assignment scheduler that finds an appropriate sensing period for each inferred class, and (ii) a Q-learning-based scheduler that dynamically determines the sensing interval for each classification moment by learning the patterns of event classes. With this framework, we discuss the design choices made to ensure satisfactory learning performance and efficient resource usage. Experimental results demonstrate the ability of the system to incrementally adapt to unforeseen conditions and to efficiently schedule to run on a resource-constrained device. △ Less

Submitted 21 February, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.10306 [pdf, other]

MPSeg : Multi-Phase strategy for coronary artery Segmentation

Authors: Jonghoe Ku, Yong-Hee Lee, Junsup Shin, In Kyu Lee, Hyun-Woo Kim

Abstract: Accurate segmentation of coronary arteries is a pivotal process in assessing cardiovascular diseases. However, the intricate structure of the cardiovascular system presents significant challenges for automatic segmentation, especially when utilizing methodologies like the SYNTAX Score, which relies extensively on detailed structural information for precise risk stratification. To address these dif… ▽ More Accurate segmentation of coronary arteries is a pivotal process in assessing cardiovascular diseases. However, the intricate structure of the cardiovascular system presents significant challenges for automatic segmentation, especially when utilizing methodologies like the SYNTAX Score, which relies extensively on detailed structural information for precise risk stratification. To address these difficulties and cater to this need, we present MPSeg, an innovative multi-phase strategy designed for coronary artery segmentation. Our approach specifically accommodates these structural complexities and adheres to the principles of the SYNTAX Score. Initially, our method segregates vessels into two categories based on their unique morphological characteristics: Left Coronary Artery (LCA) and Right Coronary Artery (RCA). Specialized ensemble models are then deployed for each category to execute the challenging segmentation task. Due to LCA's higher complexity over RCA, a refinement model is utilized to scrutinize and correct initial class predictions on segmented areas. Notably, our approach demonstrated exceptional effectiveness when evaluated in the Automatic Region-based Coronary Artery Disease diagnostics using x-ray angiography imagEs (ARCADE) Segmentation Detection Algorithm challenge at MICCAI 2023. △ Less

Submitted 16 November, 2023; originally announced November 2023.

Comments: MICCAI 2023 Conference ARCADE Challenge

arXiv:2311.07227 [pdf, other]

CARTOS: A Charging-Aware Real-Time Operating System for Intermittent Batteryless Devices

Authors: Mohsen Karimi, Yidi Wang, Youngbin Kim, Yoo** Lim, Hyoseung Kim

Abstract: This paper presents CARTOS, a charging-aware real-time operating system designed to enhance the functionality of intermittently-powered batteryless devices (IPDs) for various Internet of Things (IoT) applications. While IPDs offer significant advantages such as extended lifespan and operability in extreme environments, they pose unique challenges, including the need to ensure forward progress of p… ▽ More This paper presents CARTOS, a charging-aware real-time operating system designed to enhance the functionality of intermittently-powered batteryless devices (IPDs) for various Internet of Things (IoT) applications. While IPDs offer significant advantages such as extended lifespan and operability in extreme environments, they pose unique challenges, including the need to ensure forward progress of program execution amidst variable energy availability and maintaining reliable real-time time behavior during power disruptions. To address these challenges, CARTOS introduces a mixed-preemption scheduling model that classifies tasks into computational and peripheral tasks, and ensures their efficient and timely execution by adopting just-in-time checkpointing for divisible computation tasks and uninterrupted execution for indivisible peripheral tasks. CARTOS also supports processing chains of tasks with precedence constraints and adapts its scheduling in response to environmental changes to offer continuous execution under diverse conditions. CARTOS is implemented with new APIs and components added to FreeRTOS but is designed for portability to other embedded RTOSs. Through real hardware experiments and simulations, CARTOS exhibits superior performance over state-of-the-art methods, demonstrating that it can serve as a practical platform for develo** resilient, real-time sensing applications on IPDs. △ Less

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2311.05878 [pdf]

Central Angle Optimization for 360-degree Holographic 3D Content

Authors: Hakdong Kim, Minsung Yoon, Cheongwon Kim

Abstract: In this study, we propose a method to find an optimal central angle in deep learning-based depth map estimation used to produce realistic holographic content. The acquisition of RGB-depth map images as detailed as possible must be performed to generate holograms of high quality, despite the high computational cost. Therefore, we introduce a novel pipeline designed to analyze various values of cent… ▽ More In this study, we propose a method to find an optimal central angle in deep learning-based depth map estimation used to produce realistic holographic content. The acquisition of RGB-depth map images as detailed as possible must be performed to generate holograms of high quality, despite the high computational cost. Therefore, we introduce a novel pipeline designed to analyze various values of central angles between adjacent camera viewpoints equidistant from the origin of an object-centered environment. Then we propose the optimal central angle to generate high-quality holographic content. The proposed pipeline comprises key steps such as comparing estimated depth maps and comparing reconstructed CGHs (Computer-Generated Holograms) from RGB images and estimated depth maps. We experimentally demonstrate and discuss the relationship between the central angle and the quality of digital holographic content. △ Less

Submitted 10 November, 2023; originally announced November 2023.

arXiv:2310.19264 [pdf, other]

Sound of Story: Multi-modal Storytelling with Audio

Authors: Jaeyeon Bae, Seokhoon Jeong, Seokun Kang, Namgi Han, Jae-Yon Lee, Hyounghun Kim, Taehwan Kim

Abstract: Storytelling is multi-modal in the real world. When one tells a story, one may use all of the visualizations and sounds along with the story itself. However, prior studies on storytelling datasets and tasks have paid little attention to sound even though sound also conveys meaningful semantics of the story. Therefore, we propose to extend story understanding and telling areas by establishing a new… ▽ More Storytelling is multi-modal in the real world. When one tells a story, one may use all of the visualizations and sounds along with the story itself. However, prior studies on storytelling datasets and tasks have paid little attention to sound even though sound also conveys meaningful semantics of the story. Therefore, we propose to extend story understanding and telling areas by establishing a new component called "background sound" which is story context-based audio without any linguistic information. For this purpose, we introduce a new dataset, called "Sound of Story (SoS)", which has paired image and text sequences with corresponding sound or background music for a story. To the best of our knowledge, this is the largest well-curated dataset for storytelling with sound. Our SoS dataset consists of 27,354 stories with 19.6 images per story and 984 hours of speech-decoupled audio such as background music and other sounds. As benchmark tasks for storytelling with sound and the dataset, we propose retrieval tasks between modalities, and audio generation tasks from image-text sequences, introducing strong baselines for them. We believe the proposed dataset and tasks may shed light on the multi-modal understanding of storytelling in terms of sound. Downloading the dataset and baseline codes for each task will be released in the link: https://github.com/Sosdatasets/SoS_Dataset. △ Less

Submitted 30 October, 2023; originally announced October 2023.

Comments: Findings of EMNLP 2023, project: https://github.com/Sosdatasets/SoS_Dataset/

arXiv:2310.18882 [pdf, other]

Differentiable Learning of Generalized Structured Matrices for Efficient Deep Neural Networks

Authors: Changwoo Lee, Hun-Seok Kim

Abstract: This paper investigates efficient deep neural networks (DNNs) to replace dense unstructured weight matrices with structured ones that possess desired properties. The challenge arises because the optimal weight matrix structure in popular neural network models is obscure in most cases and may vary from layer to layer even in the same network. Prior structured matrices proposed for efficient DNNs we… ▽ More This paper investigates efficient deep neural networks (DNNs) to replace dense unstructured weight matrices with structured ones that possess desired properties. The challenge arises because the optimal weight matrix structure in popular neural network models is obscure in most cases and may vary from layer to layer even in the same network. Prior structured matrices proposed for efficient DNNs were mostly hand-crafted without a generalized framework to systematically learn them. To address this issue, we propose a generalized and differentiable framework to learn efficient structures of weight matrices by gradient descent. We first define a new class of structured matrices that covers a wide range of structured matrices in the literature by adjusting the structural parameters. Then, the frequency-domain differentiable parameterization scheme based on the Gaussian-Dirichlet kernel is adopted to learn the structural parameters by proximal gradient descent. On the image and language tasks, our method learns efficient DNNs with structured matrices, achieving lower complexity and/or higher performance than prior approaches that employ low-rank, block-sparse, or block-low-rank matrices. △ Less

Submitted 7 March, 2024; v1 submitted 28 October, 2023; originally announced October 2023.

arXiv:2310.17742 [pdf]

BERT-PIN: A BERT-based Framework for Recovering Missing Data Segments in Time-series Load Profiles

Authors: Yi Hu, Kai Ye, Hyeon** Kim, Ning Lu

Abstract: Inspired by the success of the Transformer model in natural language processing and computer vision, this paper introduces BERT-PIN, a Bidirectional Encoder Representations from Transformers (BERT) powered Profile Inpainting Network. BERT-PIN recovers multiple missing data segments (MDSs) using load and temperature time-series profiles as inputs. To adopt a standard Transformer model structure for… ▽ More Inspired by the success of the Transformer model in natural language processing and computer vision, this paper introduces BERT-PIN, a Bidirectional Encoder Representations from Transformers (BERT) powered Profile Inpainting Network. BERT-PIN recovers multiple missing data segments (MDSs) using load and temperature time-series profiles as inputs. To adopt a standard Transformer model structure for profile inpainting, we segment the load and temperature profiles into line segments, treating each segment as a word and the entire profile as a sentence. We incorporate a top candidates selection process in BERT-PIN, enabling it to produce a sequence of probability distributions, based on which users can generate multiple plausible imputed data sets, each reflecting different confidence levels. We develop and evaluate BERT-PIN using real-world dataset for two applications: multiple MDSs recovery and demand response baseline estimation. Simulation results show that BERT-PIN outperforms the existing methods in accuracy while is capable of restoring multiple MDSs within a longer window. BERT-PIN, served as a pre-trained model, can be fine-tuned for conducting many downstream tasks, such as classification and super resolution. △ Less

Submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.08660 [pdf, other]

Learning RL-Policies for Joint Beamforming Without Exploration: A Batch Constrained Off-Policy Approach

Authors: Heasung Kim, Sravan Kumar Ankireddy

Abstract: In this work, we consider the problem of network parameter optimization for rate maximization. We frame this as a joint optimization problem of power control, beam forming, and interference cancellation. We consider the setting where multiple Base Stations (BSs) communicate with multiple user equipment (UEs). Because of the exponential computational complexity of brute force search, we instead sol… ▽ More In this work, we consider the problem of network parameter optimization for rate maximization. We frame this as a joint optimization problem of power control, beam forming, and interference cancellation. We consider the setting where multiple Base Stations (BSs) communicate with multiple user equipment (UEs). Because of the exponential computational complexity of brute force search, we instead solve this nonconvex optimization problem using deep reinforcement learning (RL) techniques. Modern communication systems are notorious for their difficulty in exactly modeling their behavior. This limits us in using RL-based algorithms as interaction with the environment is needed for the agent to explore and learn efficiently. Further, it is ill-advised to deploy the algorithm in the real world for exploration and learning because of the high cost of failure. In contrast to the previous RL-based solutions proposed, such as deep-Q network (DQN) based control, we suggest an offline model-based approach. We specifically consider discrete batch-constrained deep Q-learning (BCQ) and show that performance similar to DQN can be achieved with only a fraction of the data without exploring. This maximizes sample efficiency and minimizes risk in deploying a new algorithm to commercial networks. We provide the entire project resource, including code and data, at the following link: https://github.com/Heasung-Kim/ safe-rl-deployment-for-5g. △ Less

Submitted 11 November, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

Comments: 10 pages, 8 figures

arXiv:2310.07974 [pdf, other]

Causality-based Cost Allocation for Peer-to-Peer Energy Trading in Distribution System

Authors: Hyun Joong Kim, Yong Hyun Song, Jip Kim

Abstract: While peer-to-peer energy trading has the potential to harness the capabilities of small-scale energy resources, a peer-matching process often overlooks power grid conditions, yielding increased losses, line congestion, and voltage problems. This imposes a great challenge on the distribution system operator (DSO), which can eventually limit peer-to-peer energy trading. To align the peer-matching p… ▽ More While peer-to-peer energy trading has the potential to harness the capabilities of small-scale energy resources, a peer-matching process often overlooks power grid conditions, yielding increased losses, line congestion, and voltage problems. This imposes a great challenge on the distribution system operator (DSO), which can eventually limit peer-to-peer energy trading. To align the peer-matching process with the physical grid conditions, this paper proposes a cost causality-based network cost allocation method and the grid-aware peer-matching process. Building on the cost causality principle, the proposed model utilizes the network cost (loss, congestion, and voltage) as a signal to encourage peers to adjust their preferences ensuring that matches are more in line with grid conditions, leading to enhanced social welfare. Additionally, this paper presents mathematical proof showing the superiority of the causality-based cost allocation over existing methods. △ Less

Submitted 20 February, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

Comments: 7 pages, 7 figures

arXiv:2310.04010 [pdf, other]

Excision And Recovery: Visual Defect Obfuscation Based Self-Supervised Anomaly Detection Strategy

Authors: YeongHyeon Park, Sungho Kang, Myung ** Kim, Yeonho Lee, Hyeong Seok Kim, Juneho Yi

Abstract: Due to scarcity of anomaly situations in the early manufacturing stage, an unsupervised anomaly detection (UAD) approach is widely adopted which only uses normal samples for training. This approach is based on the assumption that the trained UAD model will accurately reconstruct normal patterns but struggles with unseen anomalous patterns. To enhance the UAD performance, reconstruction-by-inpainti… ▽ More Due to scarcity of anomaly situations in the early manufacturing stage, an unsupervised anomaly detection (UAD) approach is widely adopted which only uses normal samples for training. This approach is based on the assumption that the trained UAD model will accurately reconstruct normal patterns but struggles with unseen anomalous patterns. To enhance the UAD performance, reconstruction-by-inpainting based methods have recently been investigated, especially on the masking strategy of suspected defective regions. However, there are still issues to overcome: 1) time-consuming inference due to multiple masking, 2) output inconsistency by random masking strategy, and 3) inaccurate reconstruction of normal patterns when the masked area is large. Motivated by this, we propose a novel reconstruction-by-inpainting method, dubbed Excision And Recovery (EAR), that features single deterministic masking based on the ImageNet pre-trained DINO-ViT and visual obfuscation for hint-providing. Experimental results on the MVTec AD dataset show that deterministic masking by pre-trained attention effectively cuts out suspected defective regions and resolve the aforementioned issues 1 and 2. Also, hint-providing by mosaicing proves to enhance the UAD performance than emptying those regions by binary masking, thereby overcomes issue 3. Our approach achieves a high UAD performance without any change of the neural network structure. Thus, we suggest that EAR be adopted in various manufacturing industries as a practically deployable solution. △ Less

Submitted 9 November, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

Comments: 10 pages, 5 figures, 5 tables

arXiv:2309.14967 [pdf, other]

A novel approach for holographic 3D content generation without depth map

Authors: Hakdong Kim, Minkyu Jee, Yurim Lee, Kyudam Choi, MinSung Yoon, Cheongwon Kim

Abstract: In preparation for observing holographic 3D content, acquiring a set of RGB color and depth map images per scene is necessary to generate computer-generated holograms (CGHs) when using the fast Fourier transform (FFT) algorithm. However, in real-world situations, these paired formats of RGB color and depth map images are not always fully available. We propose a deep learning-based method to synthe… ▽ More In preparation for observing holographic 3D content, acquiring a set of RGB color and depth map images per scene is necessary to generate computer-generated holograms (CGHs) when using the fast Fourier transform (FFT) algorithm. However, in real-world situations, these paired formats of RGB color and depth map images are not always fully available. We propose a deep learning-based method to synthesize the volumetric digital holograms using only the given RGB image, so that we can overcome environments where RGB color and depth map images are partially provided. The proposed method uses only the input of RGB image to estimate its depth map and then generate its CGH sequentially. Through experiments, we demonstrate that the volumetric hologram generated through our proposed model is more accurate than that of competitive models, under the situation that only RGB color data can be provided. △ Less

Submitted 26 September, 2023; originally announced September 2023.

arXiv:2309.12047 [pdf, other]

doi 10.1145/3610548.3618140

Self-Calibrating, Fully Differentiable NLOS Inverse Rendering

Authors: Kiseok Choi, Inchul Kim, Dongyoung Choi, Julio Marco, Diego Gutierrez, Min H. Kim

Abstract: Existing time-resolved non-line-of-sight (NLOS) imaging methods reconstruct hidden scenes by inverting the optical paths of indirect illumination measured at visible relay surfaces. These methods are prone to reconstruction artifacts due to inversion ambiguities and capture noise, which are typically mitigated through the manual selection of filtering functions and parameters. We introduce a fully… ▽ More Existing time-resolved non-line-of-sight (NLOS) imaging methods reconstruct hidden scenes by inverting the optical paths of indirect illumination measured at visible relay surfaces. These methods are prone to reconstruction artifacts due to inversion ambiguities and capture noise, which are typically mitigated through the manual selection of filtering functions and parameters. We introduce a fully-differentiable end-to-end NLOS inverse rendering pipeline that self-calibrates the imaging parameters during the reconstruction of hidden scenes, using as input only the measured illumination while working both in the time and frequency domains. Our pipeline extracts a geometric representation of the hidden scene from NLOS volumetric intensities and estimates the time-resolved illumination at the relay wall produced by such geometric information using differentiable transient rendering. We then use gradient descent to optimize imaging parameters by minimizing the error between our simulated time-resolved illumination and the measured illumination. Our end-to-end differentiable pipeline couples diffraction-based volumetric NLOS reconstruction with path-space light transport and a simple ray marching technique to extract detailed, dense sets of surface points and normals of hidden scenes. We demonstrate the robustness of our method to consistently reconstruct geometry and albedo, even under significant noise levels. △ Less

Submitted 25 September, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

Journal ref: Proceedings of ACM SIGGRAPH Asia 2023 (December 2023)

arXiv:2309.07152 [pdf]

Novel Smart N95 Filtering Facepiece Respirator with Real-time Adaptive Fit Functionality and Wireless Humidity Monitoring for Enhanced Wearable Comfort

Authors: Kangkyu Kwon, Yoon Jae Lee, Yeongju Jung, Ira Soltis, Chanyeong Choi, Yewon Na, Lissette Romero, Myung Chul Kim, Nathan Rodeheaver, Hodam Kim, Michael S. Lloyd, Ziqing Zhuang, William King, Susan Xu, Seung-Hwan Ko, **woo Lee, Woon-Hong Yeo

Abstract: The widespread emergence of the COVID-19 pandemic has transformed our lifestyle, and facial respirators have become an essential part of daily life. Nevertheless, the current respirators possess several limitations such as poor respirator fit because they are incapable of covering diverse human facial sizes and shapes, potentially diminishing the effect of wearing respirators. In addition, the cur… ▽ More The widespread emergence of the COVID-19 pandemic has transformed our lifestyle, and facial respirators have become an essential part of daily life. Nevertheless, the current respirators possess several limitations such as poor respirator fit because they are incapable of covering diverse human facial sizes and shapes, potentially diminishing the effect of wearing respirators. In addition, the current facial respirators do not inform the user of the air quality within the smart facepiece respirator in case of continuous long-term use. Here, we demonstrate the novel smart N-95 filtering facepiece respirator that incorporates the humidity sensor and pressure sensory feedback-enabled self-fit adjusting functionality for the effective performance of the facial respirator to prevent the transmission of airborne pathogens. The laser-induced graphene (LIG) constitutes the humidity sensor, and the pressure sensor array based on the dielectric elastomeric sponge monitors the respirator contact on the face of the user, providing the sensory information for a closed-loop feedback mechanism. As a result of the self-fit adjusting mode along with elastomeric lining, the fit factor is increased by 3.20 and 5 times at average and maximum respectively. We expect that the experimental proof-of-concept of this work will offer viable solutions to the current commercial respirators to address the limitations. △ Less

Submitted 8 September, 2023; originally announced September 2023.

Comments: 20 pages, 5 figures, 1 table, submitted for possible publication

MSC Class: 92C55

arXiv:2309.06770 [pdf, other]

Deep Learning-based Synthetic High-Resolution In-Depth Imaging Using an Attachable Dual-element Endoscopic Ultrasound Probe

Authors: Hah Min Lew, Jae Seong Kim, Moon Hwan Lee, Jaegeun Park, Sangyeon Youn, Hee Man Kim, Jihun Kim, Jae Youn Hwang

Abstract: Endoscopic ultrasound (EUS) imaging has a trade-off between resolution and penetration depth. By considering the in-vivo characteristics of human organs, it is necessary to provide clinicians with appropriate hardware specifications for precise diagnosis. Recently, super-resolution (SR) ultrasound imaging studies, including the SR task in deep learning fields, have been reported for enhancing ultr… ▽ More Endoscopic ultrasound (EUS) imaging has a trade-off between resolution and penetration depth. By considering the in-vivo characteristics of human organs, it is necessary to provide clinicians with appropriate hardware specifications for precise diagnosis. Recently, super-resolution (SR) ultrasound imaging studies, including the SR task in deep learning fields, have been reported for enhancing ultrasound images. However, most of those studies did not consider ultrasound imaging natures, but rather they were conventional SR techniques based on downsampling of ultrasound images. In this study, we propose a novel deep learning-based high-resolution in-depth imaging probe capable of offering low- and high-frequency ultrasound image pairs. We developed an attachable dual-element EUS probe with customized low- and high-frequency ultrasound transducers under small hardware constraints. We also designed a special geared structure to enable the same image plane. The proposed system was evaluated with a wire phantom and a tissue-mimicking phantom. After the evaluation, 442 ultrasound image pairs from the tissue-mimicking phantom were acquired. We then applied several deep learning models to obtain synthetic high-resolution in-depth images, thus demonstrating the feasibility of our approach for clinical unmet needs. Furthermore, we quantitatively and qualitatively analyzed the results to find a suitable deep-learning model for our task. The obtained results demonstrate that our proposed dual-element EUS probe with an image-to-image translation network has the potential to provide synthetic high-frequency ultrasound images deep inside tissues. △ Less

Submitted 13 September, 2023; originally announced September 2023.

Comments: 10 pages, 9 figures

Showing 1–50 of 304 results for author: Kim, H