Showing 1–50 of 158 results for author: Kim, K

Searching in archive eess.
  1. arXiv:2406.12721  [pdf]

    eess.AS cs.SD

    Sound event detection based on auxiliary decoder and maximum probability aggregation for DCASE Challenge 2024 Task 4

    Authors: Sang Won Son, Jongyeon Park, Hong Kook Kim, Sulaiman Vesal, Jeong Eun Lim

    Abstract: In this report, we propose three novel methods for developing a sound event detection (SED) model for the DCASE 2024 Challenge Task 4. First, we propose an auxiliary decoder attached to the final convolutional block to improve feature extraction capabilities while reducing dependency on embeddings from pre-trained large models. The proposed auxiliary decoder operates independently from the main de…

    Submitted 24 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: DCASE 2024 challenge Task4, 4 pages

  2. arXiv:2406.11248  [pdf]

    eess.AS cs.AI cs.SD

    Performance Improvement of Language-Queried Audio Source Separation Based on Caption Augmentation From Large Language Models for DCASE Challenge 2024 Task 9

    Authors: Do Hyun Lee, Yoonah Song, Hong Kook Kim

    Abstract: We present a prompt-engineering-based text-augmentation approach applied to a language-queried audio source separation (LASS) task. To enhance the performance of LASS, the proposed approach utilizes large language models (LLMs) to generate multiple captions corresponding to each sentence of the training dataset. To this end, we first perform experiments to identify the most effective prompts for c…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: DCASE 2024 Challenge Task 9, 4 pages

  3. arXiv:2406.09345  [pdf, other]

    cs.CL cs.SD eess.AS

    DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding

    Authors: Suwon Shon, Kwangyoun Kim, Yi-Te Hsu, Prashant Sridhar, Shinji Watanabe, Karen Livescu

    Abstract: The integration of pre-trained text-based large language models (LLM) with speech input has enabled instruction-following capabilities for diverse speech tasks. This integration requires the use of a speech encoder, a speech adapter, and an LLM, trained on diverse tasks. We propose the use of discrete speech units (DSU), rather than continuous-valued speech encoder outputs, that are converted to t…

    Submitted 13 June, 2024; originally announced June 2024.

  4. arXiv:2406.02000  [pdf, other]

    cs.NI eess.SP

    Advancing Ultra-Reliable 6G: Transformer and Semantic Localization Empowered Robust Beamforming in Millimeter-Wave Communications

    Authors: Avi Deb Raha, Kitae Kim, Apurba Adhikary, Mrityunjoy Gain, Choong Seon Hong

    Abstract: Advancements in 6G wireless technology have elevated the importance of beamforming, especially for attaining ultra-high data rates via millimeter-wave (mmWave) frequency deployment. Although promising, mmWave bands require substantial beam training to achieve precise beamforming. While initial deep learning models that use RGB camera images demonstrated promise in reducing beam training overhead,…

    Submitted 21 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  5. arXiv:2405.19771  [pdf, other]

    cs.NI eess.SP

    Data Service Maximization in Integrated Terrestrial-Non-Terrestrial 6G Networks: A Deep Reinforcement Learning Approach

    Authors: Nway Nway Ei, Kitae Kim, Yan Kyaw Tun, Choong Seon Hong

    Abstract: Integrating terrestrial and non-terrestrial networks has emerged as a promising paradigm to fulfill the constantly growing demand for connectivity, low transmission delay, and quality of services (QoS). This integration brings together the strengths of terrestrial and non-terrestrial networks, such as the reliability of terrestrial networks, broad coverage, and service continuity of non-terrestria…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 5 pages, 4 figures

  6. arXiv:2405.05787  [pdf, other]

    cs.RO cs.CV eess.SY

    Autonomous Robotic Ultrasound System for Liver Follow-up Diagnosis: Pilot Phantom Study

    Authors: Tianpeng Zhang, Sekeun Kim, Jerome Charton, Haitong Ma, Kyungsang Kim, Na Li, Quanzheng Li

    Abstract: The paper introduces a novel autonomous robot ultrasound (US) system targeting liver follow-up scans for outpatients in local communities. Given a computed tomography (CT) image with specific target regions of interest, the proposed system carries out the autonomous follow-up scan in three steps: (i) initial robot contact to surface, (ii) coordinate mapping between CT image and robot, and (iii) ta…

    Submitted 9 May, 2024; originally announced May 2024.

  7. arXiv:2405.03905  [pdf, other]

    cs.AR cs.CV cs.SD eess.AS

    A 65nm 36nJ/Decision Bio-inspired Temporal-Sparsity-Aware Digital Keyword Spotting IC with 0.6V Near-Threshold SRAM

    Authors: Qinyu Chen, Kwantae Kim, Chang Gao, Sheng Zhou, Taekwang Jang, Tobi Delbruck, Shih-Chii Liu

    Abstract: This paper introduces, to the best of the authors' knowledge, the first fine-grained temporal sparsity-aware keyword spotting (KWS) IC leveraging temporal similarities between neighboring feature vectors extracted from input frames and network hidden states, eliminating unnecessary operations and memory accesses. This KWS IC, featuring a bio-inspired delta-gated recurrent neural network (ΔRNN) cla…

    Submitted 6 May, 2024; originally announced May 2024.

  8. arXiv:2404.07021  [pdf, other]

    eess.SP

    A 4x32Gb/s 1.8pJ/bit Collaborative Baud-Rate CDR with Background Eye-Climbing Algorithm and Low-Power Global Clock Distribution

    Authors: Jihee Kim, Jia Park, Jiwon Shin, Hanseok Kim, Kahyun Kim, Haengbeom Shin, Ha-Jung Park, Woo-Seok Choi

    Abstract: This paper presents design techniques for an energy-efficient multi-lane receiver (RX) with baud-rate clock and data recovery (CDR), which is essential for high-throughput low-latency communication in high-performance computing systems. The proposed low-power global clock distribution not only significantly reduces power consumption across multi-lane RXs but is capable of compensating for the freq…

    Submitted 22 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

  9. arXiv:2404.04096  [pdf, other]

    cs.IT eess.SP

    Machine Learning-Aided Cooperative Localization under Dense Urban Environment

    Authors: Hoon Lee, Hong Ki Kim, Seung Hyun Oh, Sang Hyun Lee

    Abstract: Future wireless network technology provides automobiles with the connectivity feature to consolidate the concept of vehicular networks that collaborate on conducting cooperative driving tasks. The full potential of connected vehicles, which promises road safety and quality driving experience, can be leveraged if machine learning models guarantee the robustness in performing core functions includin…

    Submitted 5 April, 2024; originally announced April 2024.

  10. arXiv:2404.01517  [pdf, other]

    cs.LG eess.SP

    Addressing Heterogeneity in Federated Load Forecasting with Personalization Layers

    Authors: Shourya Bose, Yu Zhang, Kibaek Kim

    Abstract: The advent of smart meters has enabled pervasive collection of energy consumption data for training short-term load forecasting models. In response to privacy concerns, federated learning (FL) has been proposed as a privacy-preserving approach for training, but the quality of trained models degrades as client data becomes heterogeneous. In this paper we propose the use of personalization layers fo…

    Submitted 1 April, 2024; originally announced April 2024.

  11. arXiv:2404.01464  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Data-Efficient Unsupervised Interpolation Without Any Intermediate Frame for 4D Medical Images

    Authors: JungEun Kim, Hangyul Yoon, Geondo Park, Kyungsu Kim, Eunho Yang

    Abstract: 4D medical images, which represent 3D images with temporal information, are crucial in clinical practice for capturing dynamic changes and monitoring long-term disease progression. However, acquiring 4D medical images poses challenges due to factors such as radiation exposure and imaging duration, necessitating a balance between achieving high temporal resolution and minimizing adverse effects. Gi…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  12. arXiv:2402.17790  [pdf, other]

    eess.SP cs.LG

    EEG classifier cross-task transfer to avoid training sessions in robot-assisted rehabilitation

    Authors: Niklas Kueper, Su Kyoung Kim, Elsa Andrea Kirchner

    Abstract: Background: For an individualized support of patients during rehabilitation, learning of individual machine learning models from the human electroencephalogram (EEG) is required. Our approach allows labeled training data to be recorded without the need for a specific training session. For this, the planned exoskeleton-assisted rehabilitation enables bilateral mirror therapy, in which movement inte…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: 11 pages, 6 figures, 1 table

    MSC Class: 68

  13. arXiv:2401.15313  [pdf, other]

    cs.RO cs.CV eess.SY math.OC

    Multi-Robot Relative Pose Estimation in SE(2) with Observability Analysis: A Comparison of Extended Kalman Filtering and Robust Pose Graph Optimization

    Authors: Kihoon Shin, Hyunjae Sim, Seungwon Nam, Yonghee Kim, Jae Hu, Kwang-Ki K. Kim

    Abstract: In this study, we address multi-robot localization issues, with a specific focus on cooperative localization and observability analysis of relative pose estimation. Cooperative localization involves enhancing each robot's information through a communication network and message passing. If odometry data from a target robot can be transmitted to the ego robot, observability of their relative pose es…

    Submitted 4 February, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

    Comments: 20 pages, 21 figures

    MSC Class: 93C85; 93E11; 93E24; 90C26; 93E10; 62M20;

  14. arXiv:2401.08962  [pdf, other]

    cs.HC cs.LG cs.SD eess.AS

    DOO-RE: A dataset of ambient sensors in a meeting room for activity recognition

    Authors: Hyunju Kim, Geon Kim, Taehoon Lee, Kisoo Kim, Dongman Lee

    Abstract: With the advancement of IoT technology, recognizing user activities with machine learning methods is a promising way to provide various smart services to users. High-quality data with privacy protection is essential for deploying such services in the real world. Data streams from surrounding ambient sensors are well suited to the requirement. Existing ambient sensor datasets only support constrain…

    Submitted 16 January, 2024; originally announced January 2024.

  15. arXiv:2401.08835  [pdf, other]

    cs.CL eess.AS

    Improving ASR Contextual Biasing with Guided Attention

    Authors: Jiyang Tang, Kwangyoun Kim, Suwon Shon, Felix Wu, Prashant Sridhar, Shinji Watanabe

    Abstract: In this paper, we propose a Guided Attention (GA) auxiliary training loss, which improves the effectiveness and robustness of automatic speech recognition (ASR) contextual biasing without introducing additional parameters. A common challenge in previous literature is that the word error rate (WER) reduction brought by contextual biasing diminishes as the number of bias phrases increases. To addres…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted at ICASSP 2024

  16. arXiv:2312.14939  [pdf, other]

    q-bio.NC cs.CV cs.LG eess.IV

    Large-scale Graph Representation Learning of Dynamic Brain Connectome with Transformers

    Authors: Byung-Hoon Kim, Jungwon Choi, EungGu Yun, Kyungsang Kim, Xiang Li, Juho Lee

    Abstract: Graph Transformers have recently been successful in various graph representation learning tasks, providing a number of advantages over message-passing Graph Neural Networks. Utilizing Graph Transformers for learning the representation of the brain functional connectivity network is also gaining interest. However, studies to date have underlooked the temporal dynamics of functional connectivity, wh…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023 Temporal Graph Learning Workshop

  17. arXiv:2312.09895  [pdf, other]

    cs.CL cs.SD eess.AS

    Generative Context-aware Fine-tuning of Self-supervised Speech Models

    Authors: Suwon Shon, Kwangyoun Kim, Prashant Sridhar, Yi-Te Hsu, Shinji Watanabe, Karen Livescu

    Abstract: When performing tasks like automatic speech recognition or spoken language understanding for a given utterance, access to preceding text or audio provides contextual information can improve performance. Considering the recent advances in generative large language models (LLM), we hypothesize that an LLM could generate useful context information using the preceding text. With appropriate prompts, L…

    Submitted 15 December, 2023; originally announced December 2023.

  18. arXiv:2312.01004  [pdf, other]

    eess.SY

    Learning-based Ecological Adaptive Cruise Control of Autonomous Electric Vehicles: A Comparison of ADP, DQN and DDPG Approaches

    Authors: Sunwoo Kim, Kwang-Ki K. Kim

    Abstract: This paper presents model-based and model-free learning methods for economic and ecological adaptive cruise control (Eco-ACC) of connected and autonomous electric vehicles. For model-based optimal control of Eco-ACC, we considered longitudinal vehicle dynamics and a quasi-steady-state powertrain model including the physical limits of a commercial electric vehicle. We used adaptive dynamic programm…

    Submitted 1 December, 2023; originally announced December 2023.

    MSC Class: 93E20; 68T20; 49M37; 90-08

  19. arXiv:2311.10224  [pdf, other]

    eess.IV cs.CV cs.LG

    CV-Attention UNet: Attention-based UNet for 3D Cerebrovascular Segmentation of Enhanced TOF-MRA Images

    Authors: Syed Farhan Abbas, Nguyen Thanh Duc, Yoonguu Song, Kyungwon Kim, Ekta Srivastava, Boreom Lee

    Abstract: Due to the lack of automated methods, to diagnose cerebrovascular disease, time-of-flight magnetic resonance angiography (TOF-MRA) is assessed visually, making it time-consuming. The commonly used encoder-decoder architectures for cerebrovascular segmentation utilize redundant features, eventually leading to the extraction of low-level features multiple times. Additionally, convolutional neural ne…

    Submitted 19 June, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

  20. Deep Video Inpainting Guided by Audio-Visual Self-Supervision

    Authors: Kyuyeon Kim, Junsik Jung, Woo Jae Kim, Sung-Eui Yoon

    Abstract: Humans can easily imagine a scene from auditory information based on their prior knowledge of audio-visual events. In this paper, we mimic this innate human ability in deep learning models to improve the quality of video inpainting. To implement the prior knowledge, we first train the audio-visual network, which learns the correspondence between auditory and visual information. Then, the audio-vis…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: Accepted at ICASSP 2022

  21. arXiv:2310.02467  [pdf]

    physics.optics eess.SP physics.app-ph

    Dual-Polarization Phase Retrieval Receiver in Silicon Photonics

    Authors: Brian Stern, Hanzi Huang, Haoshuo Chen, Kwangwoong Kim, Mohamad Hossein Idjadi

    Abstract: We demonstrate a silicon photonic dual-polarization phase retrieval receiver. The receiver recovers phase from intensity-only measurements without a local oscillator or transmitted carrier. We design silicon waveguides providing long delays and microring resonators with large dispersion to enable symbol-to-symbol interference and dispersive projection in the phase retrieval algorithm. We retrieve…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: 11 pages, 7 figures

  22. arXiv:2309.13539  [pdf, other]

    eess.IV

    MediViSTA-SAM: Zero-shot Medical Video Analysis with Spatio-temporal SAM Adaptation for Echocardiography

    Authors: Sekeun Kim, Kyungsang Kim, Jiang Hu, Cheng Chen, Zhiliang Lyu, Ren Hui, Sunghwan Kim, Zhengliang Liu, Aoxiao Zhong, Xiang Li, Tianming Liu, Quanzheng Li

    Abstract: The Segmentation Anything Model (SAM) has gained significant attention for its robust generalization capabilities across diverse downstream tasks. However, the performance of SAM is noticeably diminished in medical images due to the substantial disparity between natural and medical image domain. In this paper, we present a zero-shot generalization model specifically designed for echocardiography a…

    Submitted 6 April, 2024; v1 submitted 23 September, 2023; originally announced September 2023.

  23. arXiv:2309.12566  [pdf, other]

    cs.RO eess.SY math.OC

    Recent Advances in Path Integral Control for Trajectory Optimization: An Overview in Theoretical and Algorithmic Perspectives

    Authors: Muhammad Kazim, JunGee Hong, Min-Gyeom Kim, Kwang-Ki K. Kim

    Abstract: This paper presents a tutorial overview of path integral (PI) control approaches for stochastic optimal control and trajectory optimization. We concisely summarize the theoretical development of path integral control to compute a solution for stochastic optimal control and provide algorithmic descriptions of the cross-entropy (CE) method, an open-loop controller using the receding horizon scheme k…

    Submitted 1 December, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

    Comments: 16 pages, 9 figures

    MSC Class: 68T40; 13P25 ACM Class: I.2.9; I.2.8; G.1.6; G.4

  24. arXiv:2308.07788  [pdf, ps, other]

    eess.AS

    GIST-AiTeR Speaker Diarization System for VoxCeleb Speaker Recognition Challenge (VoxSRC) 2023

    Authors: Dongkeon Park, Ji Won Kim, Kang Ryeol Kim, Do Hyun Lee, Hong Kook Kim

    Abstract: This report describes the submission system by the GIST-AiTeR team for the VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC-23) Track 4. Our submission system focuses on implementing diverse speaker diarization (SD) techniques, including ResNet293 and MFA-Conformer with different combinations of segment and hop length. Then, those models are combined into an ensemble model. The ResNet293 and MF…

    Submitted 25 August, 2023; v1 submitted 15 August, 2023; originally announced August 2023.

    Comments: VoxSRC 2023 Track4

  25. arXiv:2308.02416  [pdf, other]

    eess.SP cs.LG

    Local-Global Temporal Fusion Network with an Attention Mechanism for Multiple and Multiclass Arrhythmia Classification

    Authors: Yun Kwan Kim, Minji Lee, Kunwook Jo, Hee Seok Song, Seong-Whan Lee

    Abstract: Clinical decision support systems (CDSSs) have been widely utilized to support the decisions made by cardiologists when detecting and classifying arrhythmia from electrocardiograms (ECGs). However, forming a CDSS for the arrhythmia classification task is challenging due to the varying lengths of arrhythmias. Although the onset time of arrhythmia varies, previously developed methods have not consid…

    Submitted 13 October, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

    Comments: 14 pages, 6 figures

    MSC Class: 68T07; 92C55

  26. arXiv:2307.16207  [pdf, other]

    eess.SP

    Trajectory Optimization for Cellular-Enabled UAV with Connectivity and Battery Constraints

    Authors: Hyeon-Seong Im, Kyu-Yeong Kim, Si-Hyeon Lee

    Abstract: In this paper, we address the problem of path planning for a cellular-enabled UAV with connectivity and battery constraints. The UAV's mission is to deliver a payload from an initial point to a final point as soon as possible, while maintaining connectivity with a BS and adhering to the battery constraint. The UAV's battery can be replaced by a fully charged battery at a charging station, which ma…

    Submitted 6 October, 2023; v1 submitted 30 July, 2023; originally announced July 2023.

    Comments: This article was presented in part at the IEEE Vehicular Technology Conference (VTC) 2023-Fall

  27. arXiv:2307.10667  [pdf, other]

    eess.IV cs.CV

    Efficient Unified Demosaicing for Bayer and Non-Bayer Patterned Image Sensors

    Authors: Haechang Lee, Dongwon Park, Wongi Jeong, Kijeong Kim, Hyunwoo Je, Dongil Ryu, Se Young Chun

    Abstract: As the physical size of recent CMOS image sensors (CIS) gets smaller, the latest mobile cameras are adopting unique non-Bayer color filter array (CFA) patterns (e.g., Quad, Nona, QxQ), which consist of homogeneous color units with adjacent pixels. These non-Bayer sensors are superior to conventional Bayer CFA thanks to their changeable pixel-bin sizes for different light conditions but may introdu…

    Submitted 20 July, 2023; originally announced July 2023.

  28. arXiv:2306.16739  [pdf, other]

    eess.SP

    Sparse RF Lens Antenna Array Design for AoA Estimation in Wideband Systems: Placement Optimization and Performance Analysis

    Authors: Joo-Hyun Jo, Jae-Nam Shim, Chan-Byoung Chae, Dong Ku Kim, Robert W. Heath Jr

    Abstract: In this paper, we propose a novel architecture for a lens antenna array (LAA) designed to work with a small number of antennas and enable angle-of-arrival (AoA) estimation for advanced 5G vehicle-to-everything (V2X) use cases that demand wider bandwidths and higher data rates. We derive a received signal in terms of optical analysis to consider the variability of the focal region for different car…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: 15 pages, 10 figures

  29. arXiv:2306.16721  [pdf, other]

    eess.SP

    AoA-based Position and Orientation Estimation Using Lens MIMO in Cooperative Vehicle-to-Vehicle Systems

    Authors: Joo-Hyun Jo, Jae-Nam Shim, Byoungnam Kim, Chan-Byoung Chae, Dong Ku Kim

    Abstract: Positioning accuracy is a critical requirement for vehicle-to-everything (V2X) use cases. Therefore, this paper derives the theoretical limits of estimation for the position and orientation of vehicles in a cooperative vehicle-to-vehicle (V2V) scenario, using a lens-based multiple-input multiple-output (lens-MIMO) system. Following this, we analyze the Cramér-Rao lower bounds (CRL…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: 16 pages, 11 figures

  30. arXiv:2306.06461  [pdf]

    eess.AS cs.SD

    Semi-supervsied Learning-based Sound Event Detection using Freuqency Dynamic Convolution with Large Kernel Attention for DCASE Challenge 2023 Task 4

    Authors: Ji Won Kim, Sang Won Son, Yoonah Song, Hong Kook Kim, Il Hoon Song, Jeong Eun Lim

    Abstract: This report proposes a frequency dynamic convolution (FDY) with a large kernel attention (LKA)-convolutional recurrent neural network (CRNN) with a pre-trained bidirectional encoder representation from audio transformers (BEATs) embedding-based sound event detection (SED) model that employs a mean-teacher and pseudo-label approach to address the challenge of limited labeled data for DCASE 2023 Tas…

    Submitted 10 June, 2023; originally announced June 2023.

    Comments: DCASE 2023 Challenge Task 4A, 5 pages

  31. arXiv:2305.15349  [pdf, other]

    cs.LG eess.SP math.OC stat.CO stat.ML

    On the Convergence of Black-Box Variational Inference

    Authors: Kyurae Kim, Jisu Oh, Kaiwen Wu, Yi-An Ma, Jacob R. Gardner

    Abstract: We provide the first convergence guarantee for full black-box variational inference (BBVI), also known as Monte Carlo variational inference. While preliminary investigations worked on simplified versions of BBVI (e.g., bounded domain, bounded support, only optimizing for the scale, and such), our setup does not need any such algorithmic modifications. Our results hold for log-smooth posterior dens…

    Submitted 10 January, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted to NeurIPS'23; previous title: "Black-Box Variational Inference Converges"

  32. arXiv:2305.11073  [pdf, other]

    cs.CL cs.SD eess.AS

    A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks

    Authors: Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, Shinji Watanabe

    Abstract: Conformer, a convolution-augmented Transformer variant, has become the de facto encoder architecture for speech processing due to its superior performance in various tasks, including automatic speech recognition (ASR), speech translation (ST) and spoken language understanding (SLU). Recently, a new encoder called E-Branchformer has outperformed Conformer in the LibriSpeech ASR benchmark, making it…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted at INTERSPEECH 2023. Code: https://github.com/espnet/espnet

  33. arXiv:2305.05085  [pdf, other]

    physics.optics eess.IV

    Tensorial tomographic Fourier Ptychography with applications to muscle tissue imaging

    Authors: Shiqi Xu, Xiang Dai, Paul Ritter, Kyung Chul Lee, Xi Yang, Lucas Kreiss, Kevin C. Zhou, Kanghyun Kim, Amey Chaware, Jadee Neff, Carolyn Glass, Seung Ah Lee, Oliver Friedrich, Roarke Horstmeyer

    Abstract: We report Tensorial tomographic Fourier Ptychography (ToFu), a new non-scanning label-free tomographic microscopy method for simultaneous imaging of quantitative phase and anisotropic specimen information in 3D. Built upon Fourier Ptychography, a quantitative phase imaging technique, ToFu additionally highlights the vectorial nature of light. The imaging setup consists of a standard microscope equ…

    Submitted 13 May, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

    Journal ref: Tensorial tomographic Fourier Ptychography with applications to muscle tissue imaging, Adv. Photon. 6(2), 026004 (2024)

  34. A numerically efficient output-only system-identification framework for stochastically forced self-sustained oscillators

    Authors: Minwoo Lee, Kyu Tae Kim, Jongho Park

    Abstract: Self-sustained oscillations are ubiquitous in nature and engineering. In this paper, we propose a novel output-only system-identification framework for identifying the system parameters of a self-sustained oscillator affected by Gaussian white noise. A Langevin model that characterizes the self-sustained oscillator is postulated, and the corresponding Fokker–Planck equation is derived from stocha…

    Submitted 16 August, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

    Comments: 17 pages, 10 figures

    MSC Class: 90C53; 90C56; 90C26; 65Z05

    Journal ref: Probabilistic Eng. Mech. 74 (2023) Paper No. 103516

  35. X-CANIDS: Signal-Aware Explainable Intrusion Detection System for Controller Area Network-Based In-Vehicle Network

    Authors: Seonghoon Jeong, Sangho Lee, Hwejae Lee, Huy Kang Kim

    Abstract: Controller Area Network (CAN) is an essential networking protocol that connects multiple electronic control units (ECUs) in a vehicle. However, CAN-based in-vehicle networks (IVNs) face security risks owing to the CAN mechanisms. An adversary can sabotage a vehicle by leveraging the security risks if they can access the CAN bus. Thus, recent actions and cybersecurity regulations (e.g., UNR 155) re…

    Submitted 14 March, 2024; v1 submitted 21 March, 2023; originally announced March 2023.

    Comments: This is the Accepted version of an article for publication in IEEE TVT

    Journal ref: IEEE Transactions on Vehicular Technology, Vol. 73, No. 3, pp. 3230-3246, Mar. 2024

  36. arXiv:2303.11535  [pdf, other]

    cs.RO cs.MA eess.SY

    Adaptive Goal Management System of Robots

    Authors: Muhammad Kazim, Michael Muldoon, Kwang-Ki K. Kim

    Abstract: This paper considers the problem of managing single or multiple robots and proposes a cloud-based robot fleet manager, Adaptive Goal Management (AGM) System, for teams of unmanned mobile robots. The AGM system uses an adaptive goal execution approach and provides a restful API for communication between single or multiple robots, enabling real-time monitoring and control. The overarching goal of AG…

    Submitted 20 March, 2023; originally announced March 2023.

  37. arXiv:2303.08140  [pdf, other]

    eess.IV cs.LG physics.bio-ph

    Digital staining in optical microscopy using deep learning -- a review

    Authors: Lucas Kreiss, Shaowei Jiang, Xiang Li, Shiqi Xu, Kevin C. Zhou, Alexander Mühlberg, Kyung Chul Lee, Kanghyun Kim, Amey Chaware, Michael Ando, Laura Barisoni, Seung Ah Lee, Guoan Zheng, Kyle Lafata, Oliver Friedrich, Roarke Horstmeyer

    Abstract: Until recently, conventional biochemical staining had the undisputed status as well-established benchmark for most biomedical problems related to clinical diagnostics, fundamental research and biotechnology. Despite this role as gold-standard, staining protocols face several challenges, such as a need for extensive, manual processing of samples, substantial time delays, altered tissue homeostasis,…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: Review article, 4 main Figures, 3 Tables, 2 supplementary figures

  38. arXiv:2303.04661  [pdf, other]

    eess.IV cs.CV

    DULDA: Dual-domain Unsupervised Learned Descent Algorithm for PET image reconstruction

    Authors: Rui Hu, Yunmei Chen, Kyungsang Kim, Marcio Aloisio Bezerra Cavalcanti Rockenbach, Quanzheng Li, Huafeng Liu

    Abstract: Deep learning based PET image reconstruction methods have achieved promising results recently. However, most of these methods follow a supervised learning paradigm, which rely heavily on the availability of high-quality training labels. In particular, the long scanning time required and high radiation exposure associated with PET scans make obtaining this labels impractical. In this paper, we prop…

    Submitted 9 March, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

  39. arXiv:2302.14132  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and Understanding

    Authors: Yifan Peng, Kwangyoun Kim, Felix Wu, Prashant Sridhar, Shinji Watanabe

    Abstract: Self-supervised speech representation learning (SSL) has shown to be effective in various downstream tasks, but SSL models are usually large and slow. Model compression techniques such as pruning aim to reduce the model size and computation without degradation in accuracy. Prior studies focus on the pruning of Transformers; however, speech models not only utilize a stack of Transformer blocks, but…

    Submitted 27 February, 2023; originally announced February 2023.

    Comments: Accepted at ICASSP 2023

  40. arXiv:2302.03022  [pdf, other]

    cs.CV cs.RO eess.IV

    SurgT challenge: Benchmark of Soft-Tissue Trackers for Robotic Surgery

    Authors: Joao Cartucho, Alistair Weld, Samyakh Tukra, Haozheng Xu, Hiroki Matsuzaki, Taiyo Ishikawa, Minjun Kwon, Yong Eun Jang, Kwang-Ju Kim, Gwang Lee, Bizhe Bai, Lueder Kahrs, Lars Boecking, Simeon Allmendinger, Leopold Muller, Yitong Zhang, Yueming Jin, Sophia Bano, Francisco Vasconcelos, Wolfgang Reiter, Jonas Hajek, Bruno Silva, Estevao Lima, Joao L. Vilaca, Sandro Queiros, et al. (1 additional authors not shown)

    Abstract: This paper introduces the "SurgT: Surgical Tracking" challenge which was organised in conjunction with MICCAI 2022. There were two purposes for the creation of this challenge: (1) the establishment of the first standardised benchmark for the research community to assess soft-tissue trackers; and (2) to encourage the development of unsupervised deep learning methods, given the lack of annotated da…

    Submitted 30 August, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

  41. arXiv:2301.08351  [pdf, other]

    physics.optics eess.IV physics.bio-ph

    Parallelized computational 3D video microscopy of freely moving organisms at multiple gigapixels per second

    Authors: Kevin C. Zhou, Mark Harfouche, Colin L. Cooke, Jaehee Park, Pavan C. Konda, Lucas Kreiss, Kanghyun Kim, Joakim Jönsson, Jed Doman, Paul Reamey, Veton Saliu, Clare B. Cook, Maxwell Zheng, Jack P. Bechtel, Aurélien Bègue, Matthew McCarroll, Jennifer Bagwell, Gregor Horstmeyer, Michel Bagnat, Roarke Horstmeyer

    Abstract: To study the behavior of freely moving model organisms such as zebrafish (Danio rerio) and fruit flies (Drosophila) across multiple spatial scales, it would be ideal to use a light microscope that can resolve 3D information over a wide field of view (FOV) at high speed and high spatial resolution. However, it is challenging to design an optical instrument to achieve all of these properties simulta…

    Submitted 19 January, 2023; originally announced January 2023.

  42. arXiv:2301.05777  [pdf]

    cs.LG eess.IV q-bio.TO

    Lung airway geometry as an early predictor of autism: A preliminary machine learning-based study

    Authors: Asef Islam, Anthony Ronco, Stephen M. Becker, Jeremiah Blackburn, Johannes C. Schittny, Kyoungmi Kim, Rebecca Stein-Wexler, Anthony S. Wexler

    Abstract: The goal of this study is to assess the feasibility of airway geometry as a biomarker for ASD. Chest CT images of children with a documented diagnosis of ASD as well as healthy controls were identified retrospectively. 54 scans were obtained for analysis, including 31 ASD cases and 23 age and sex-matched controls. A feature selection and classification procedure using principal component analysis…

    Submitted 9 February, 2023; v1 submitted 13 January, 2023; originally announced January 2023.

  43. arXiv:2212.08542  [pdf, other]

    eess.AS cs.CL

    Context-aware Fine-tuning of Self-supervised Speech Models

    Authors: Suwon Shon, Felix Wu, Kwangyoun Kim, Prashant Sridhar, Karen Livescu, Shinji Watanabe

    Abstract: Self-supervised pre-trained transformers have improved the state of the art on a variety of speech tasks. Due to the quadratic time and space complexity of self-attention, they usually operate at the level of relatively short (e.g., utterance) segments. In this paper, we study the use of context, i.e., surrounding segments, during fine-tuning and propose a new approach called context-aware fine-tu…

    Submitted 28 March, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

  44. arXiv:2212.03824  [pdf, other]

    eess.SP

    Adaptive Bayesian Beamforming for Imaging by Marginalizing the Speed of Sound

    Authors: Kyurae Kim, Simon Maskell, Jason F. Ralph

    Abstract: Imaging methods based on array signal processing often require a fixed propagation speed of the medium, or speed of sound (SoS) for methods based on acoustic signals. The resolution of the images formed using these methods is strongly affected by the assumed SoS, which, due to multipath, nonlinear propagation, and non-uniform mediums, is challenging at best to select. In this letter, we propose a…

    Submitted 8 December, 2022; v1 submitted 7 December, 2022; originally announced December 2022.

  45. arXiv:2212.00027  [pdf, other]

    eess.IV physics.optics

    Imaging across multiple spatial scales with the multi-camera array microscope

    Authors: Mark Harfouche, Kanghyun Kim, Kevin C. Zhou, Pavan Chandra Konda, Sunanda Sharma, Eric E. Thomson, Colin Cooke, Shiqi Xu, Lucas Kreiss, Amey Chaware, Xi Yang, Xing Yao, Vinayak Pathak, Martin Bohlen, Ron Appel, Aurélien Bègue, Clare Cook, Jed Doman, John Efromson, Gregor Horstmeyer, Jaehee Park, Paul Reamey, Veton Saliu, Eva Naumann, Roarke Horstmeyer

    Abstract: This article experimentally examines different configurations of a novel multi-camera array microscope (MCAM) imaging technology. The MCAM is based upon a densely packed array of "micro-cameras" to jointly image across a large field-of-view at high resolution. Each micro-camera within the array images a unique area of a sample of interest, and then all acquired data with 54 micro-cameras are digit…

    Submitted 28 February, 2023; v1 submitted 30 November, 2022; originally announced December 2022.

  46. arXiv:2211.15950  [pdf, other]

    eess.IV cs.CV

    Enhanced artificial intelligence-based diagnosis using CBCT with internal denoising: Clinical validation for discrimination of fungal ball, sinusitis, and normal cases in the maxillary sinus

    Authors: Kyungsu Kim, Chae Yeon Lim, Joong Bo Shin, Myung Jin Chung, Yong Gi Jung

    Abstract: Cone-beam computed tomography (CBCT) provides 3D volumetric imaging of a target with low radiation dose and cost compared with conventional computed tomography, and it is widely used in the detection of paranasal sinus disease. However, it lacks the sensitivity to detect soft tissue lesions owing to reconstruction constraints. Consequently, only physicians with expertise in CBCT reading can di…

    Submitted 29 November, 2022; originally announced November 2022.

  47. arXiv:2211.07951  [pdf, other]

    cs.SD cs.LG eess.AS

    Show Me the Instruments: Musical Instrument Retrieval from Mixture Audio

    Authors: Kyungsu Kim, Minju Park, Haesun Joung, Yunkee Chae, Yeongbeom Hong, Seonghyeon Go, Kyogu Lee

    Abstract: As digital music production has become mainstream, the selection of appropriate virtual instruments plays a crucial role in determining the quality of music. To search for the musical instrument samples or virtual instruments that make one's desired sound, music producers use their ears to listen and compare each instrument sample in their collection, which is time-consuming and inefficient. In this p…

    Submitted 15 November, 2022; originally announced November 2022.

    Comments: 5 pages, 4 figures, submitted to ICASSP 2023

  48. arXiv:2211.05910  [pdf, other]

    eess.IV cs.CV

    Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report

    Authors: Andrey Ignatov, Radu Timofte, Maurizio Denna, Abdel Younes, Ganzorig Gankhuyag, Jingang Huh, Myeong Kyun Kim, Kihwan Yoon, Hyeon-Cheol Moon, Seungho Lee, Yoonsik Choe, Jinwoo Jeong, Sungjei Kim, Maciej Smyl, Tomasz Latkowski, Pawel Kubik, Michal Sokolski, Yujie Ma, Jiahao Chao, Zhou Zhou, Hongfan Gao, Zhengfeng Yang, Zhenbing Zeng, Zhengyang Zhuge, Chenghua Li, et al. (71 additional authors not shown)

    Abstract: Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and propose…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: text overlap with arXiv:2105.07825, arXiv:2105.08826, arXiv:2211.04470, arXiv:2211.03885, arXiv:2211.05256

  49. arXiv:2210.17143  [pdf, other]

    cs.SD cs.CL eess.AS

    Exploring Train and Test-Time Augmentations for Audio-Language Learning

    Authors: Eungbeom Kim, Jinhee Kim, Yoori Oh, Kyungsu Kim, Minju Park, Jaeheon Sim, Jinwoo Lee, Kyogu Lee

    Abstract: In this paper, we aim to unveil the impact of data augmentation in audio-language multi-modal learning, which has not been explored despite its importance. We explore various augmentation methods at not only train-time but also test-time and find that proper data augmentation can lead to substantial improvements. Specifically, applying our proposed audio-language paired augmentation PairMix, w…

    Submitted 23 May, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

    Comments: 5 pages, 4 figures

  50. arXiv:2210.12938  [pdf]

    eess.IV cs.CV

    GradMix for nuclei segmentation and classification in imbalanced pathology image datasets

    Authors: Tan Nhu Nhat Doan, Kyungeun Kim, Boram Song, Jin Tae Kwak

    Abstract: Automated segmentation and classification of nuclei is an essential task in digital pathology. The current deep learning-based approaches require a vast amount of data annotated by pathologists. However, the existing datasets are generally imbalanced among different types of nuclei, leading to a substantial performance degradation. In this paper, we propose a simple but effective data augm…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: submitted to MICCAI2022