Showing 1–50 of 166 results for author: Lee, Y

Searching in archive eess.
  1. arXiv:2407.00888  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Papez: Resource-Efficient Speech Separation with Auditory Working Memory

    Authors: Hyunseok Oh, Juheon Yi, Youngki Lee

    Abstract: Transformer-based models recently reached state-of-the-art single-channel speech separation accuracy; however, their extreme computational load makes it difficult to deploy them in resource-constrained mobile or IoT devices. We thus present Papez, a lightweight and computation-efficient single-channel speech separation model. Papez is based on three key techniques. We first replace the inter-chunk…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 5 pages. Accepted by ICASSP 2023

  2. arXiv:2407.00762  [pdf, other]

    eess.SY cs.MA

    Guarding a Target Area from a Heterogeneous Group of Cooperative Attackers

    Authors: Yoonjae Lee, Goutam Das, Daigo Shishika, Efstathios Bakolas

    Abstract: In this paper, we investigate a multi-agent target guarding problem in which a single defender seeks to capture multiple attackers aiming to reach a high-value target area. In contrast to previous studies, the attackers herein are assumed to be heterogeneous in the sense that they have not only different speeds but also different weights representing their respective degrees of importance (e.g., t…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: This is the revised version of the paper, with the same title, to be presented at American Control Conference (ACC) 2024

  3. arXiv:2406.17310  [pdf, other]

    eess.AS

    High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model

    Authors: Joun Yeop Lee, Myeonghun Jeong, Minchan Kim, Ji-Hyun Lee, Hoon-Young Cho, Nam Soo Kim

    Abstract: We propose a novel two-stage text-to-speech (TTS) framework with two types of discrete tokens, i.e., semantic and acoustic tokens, for high-fidelity speech synthesis. It features two core components: the Interpreting module, which processes text and a speech prompt into semantic tokens focusing on linguistic contents and alignment, and the Speaking module, which captures the timbre of the target v…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  4. arXiv:2406.09894  [pdf, other]

    eess.AS cs.SD

    Period Singer: Integrating Periodic and Aperiodic Variational Autoencoders for Natural-Sounding End-to-End Singing Voice Synthesis

    Authors: Taewoo Kim, Choongsang Cho, Young Han Lee

    Abstract: In this paper, we present Period Singer, a novel end-to-end singing voice synthesis (SVS) model that utilizes variational inference for periodic and aperiodic components, aimed at producing natural-sounding waveforms. Recent end-to-end SVS models have demonstrated the capability of synthesizing high-fidelity singing voices. However, owing to deterministic pitch conditioning, they do not fully addr…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  5. A Deep Learning-Augmented Stand-off Radar Scheme for Rapidly Detecting Tree Defects

    Authors: Jiwei Qian, Yee Hui Lee, Kaixuan Cheng, Qiqi Dai, Mohamed Lokman Mohd Yusof, Daryl Lee, Abdulkadir C. Yucel

    Abstract: Tree defect detection is crucial for the structural health screening of trees. Existing nondestructive testing (NDT) techniques for tree defect detection require time-consuming and labor-intensive measurement campaigns. This discourages their application for the routine structural health screening of whole populations of managed urban trees. To address this issue, this study proposes a deep-learni…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: Accepted and to be published in IEEE Transactions on Geoscience and Remote Sensing

  6. arXiv:2404.19167  [pdf]

    eess.IV physics.med-ph

    Advancing low-field MRI with a universal denoising imaging transformer: Towards fast and high-quality imaging

    Authors: Zheren Zhu, Azaan Rehman, Xiaozhi Cao, Congyu Liao, Yoo ** Lee, Michael Ohliger, Hui Xue, Yang Yang

    Abstract: Recent developments in low-field (LF) magnetic resonance imaging (MRI) systems present remarkable opportunities for affordable and widespread MRI access. A robust denoising method to overcome the intrinsic low signal-to-noise ratio (SNR) barrier is critical to the success of LF MRI. However, current data-driven MRI denoising methods predominantly handle magnitude images and rely on customized models…

    Submitted 29 April, 2024; originally announced April 2024.

  7. arXiv:2404.03991  [pdf, other]

    eess.IV cs.CV cs.LG

    Towards Efficient and Accurate CT Segmentation via Edge-Preserving Probabilistic Downsampling

    Authors: Shahzad Ali, Yu Rim Lee, Soo Young Park, Won Young Tak, Soon Ki Jung

    Abstract: Downsampling images and labels, often necessitated by limited resources or to expedite network training, leads to the loss of small objects and thin boundaries. This undermines the segmentation network's capacity to interpret images accurately and predict detailed labels, resulting in diminished performance compared to processing at original resolutions. This situation exemplifies the trade-off be…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: 5 pages (4 figures, 1 table); This work has been submitted to the IEEE Signal Processing Letters. Copyright may be transferred without notice, after which this version may no longer be accessible

  8. arXiv:2403.17938  [pdf, other]

    cs.NE eess.SY

    Circuit-centric Genetic Algorithm (CGA) for Analog and Radio-Frequency Circuit Optimization

    Authors: Mingi Kwon, Yeonjun Lee, Ickhyun Song

    Abstract: This paper presents an automated method for optimizing parameters in analog/high-frequency circuits, aiming to maximize performance parameters of a radio-frequency (RF) receiver. The design target includes a reduction of power consumption and noise figure and an increase in conversion gain. This study investigates the use of an artificial algorithm for the optimization of a receiver, illustrating…

    Submitted 18 November, 2023; originally announced March 2024.

    Comments: 15 pages, 6 figures, submission to Circuits, Systems and Signal Processing

  9. arXiv:2402.12412  [pdf, other]

    cs.HC cs.AI cs.MM eess.SP

    Dynamic and Super-Personalized Media Ecosystem Driven by Generative AI: Unpredictable Plays Never Repeating The Same

    Authors: Sungjun Ahn, Hyun-Jeong Yim, Youngwan Lee, Sung-Ik Park

    Abstract: This paper introduces a media service model that exploits artificial intelligence (AI) video generators at the receive end. This proposal deviates from the traditional multimedia ecosystem, completely relying on in-house production, by shifting part of the content creation onto the receiver. We bring a semantic process into the framework, allowing the distribution network to provide service elemen…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: 13 pages, 7 figures

  10. arXiv:2401.16963  [pdf, ps, other]

    math.OC eess.SY physics.space-ph

    Sub-Optimal Fast Fourier Series Approximation for Initial Trajectory Design

    Authors: Caleb Gunsaulus, Carl De Vries, William Brown, Youngro Lee, Madhusudan Vijayakumar, Ossama Abdelkhalik

    Abstract: The Finite Fourier Series (FFS) Shape-Based (SB) trajectory approximation method has been used to rapidly generate initial trajectories that satisfy the dynamics, trajectory boundary conditions, and limitation on maximum thrust acceleration. The FFS SB approach solves a nonlinear programming problem (NLP) in searching for feasible trajectories. This paper extends the development of the FFS SB appr…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: 2021 AAS/AIAA Astrodynamics Specialist Conference, Big Sky, Virtual, August 9-11, 2021

  11. arXiv:2401.12473  [pdf, other]

    eess.AS cs.SD

    Boosting Unknown-number Speaker Separation with Transformer Decoder-based Attractor

    Authors: Younglo Lee, Shukjae Choi, Byeong-Yeol Kim, Zhong-Qiu Wang, Shinji Watanabe

    Abstract: We propose a novel speech separation model designed to separate mixtures with an unknown number of speakers. The proposed model stacks 1) a dual-path processing block that can model spectro-temporal patterns, 2) a transformer decoder-based attractor (TDA) calculation module that can deal with an unknown number of speakers, and 3) triple-path processing blocks that can model inter-speaker relations…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 5 pages, 4 figures, accepted by ICASSP 2024

  12. arXiv:2401.03564  [pdf]

    physics.optics eess.SY

    Experimental Demonstration of Imperfection-Agnostic Local Learning Rules on Photonic Neural Networks with Mach-Zehnder Interferometric Meshes

    Authors: Luis El Srouji, Mehmet Berkay On, Yun-Jhu Lee, Mahmoud Abdelghany, S. J. Ben Yoo

    Abstract: Mach-Zehnder Interferometric meshes are attractive for low-loss photonic matrix multiplication but are challenging to program. Using least-squares optimization of directional derivatives, we experimentally demonstrate that desired matrix updates can be implemented agnostic to hardware imperfections. © 2024 The Author(s)

    Submitted 7 January, 2024; originally announced January 2024.

  13. arXiv:2401.01498  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction

    Authors: Minchan Kim, Myeonghun Jeong, Byoung ** Choi, Semin Kim, Joun Yeop Lee, Nam Soo Kim

    Abstract: We propose a novel text-to-speech (TTS) framework centered around a neural transducer. Our approach divides the whole TTS pipeline into semantic-level sequence-to-sequence (seq2seq) modeling and fine-grained acoustic modeling stages, utilizing discrete semantic tokens obtained from wav2vec2.0 embeddings. For a robust and efficient alignment modeling, we employ a neural transducer named token trans…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  14. arXiv:2401.01099  [pdf, other]

    eess.AS cs.AI cs.LG

    Efficient Parallel Audio Generation using Group Masked Language Modeling

    Authors: Myeonghun Jeong, Minchan Kim, Joun Yeop Lee, Nam Soo Kim

    Abstract: We present a fast and high-quality codec language model for parallel audio generation. While SoundStorm, a state-of-the-art parallel audio generation model, accelerates inference speed compared to autoregressive models, it still suffers from slow inference due to iterative sampling. To resolve this problem, we propose Group-Masked Language Modeling (G-MLM) and Group Iterative Parallel Decoding (G-…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  15. arXiv:2312.05814  [pdf, other]

    cs.AI cs.SD eess.AS

    Neural Speech Embeddings for Speech Synthesis Based on Deep Generative Networks

    Authors: Seo-Hyun Lee, Young-Eun Lee, Soowon Kim, Byung-Kwan Ko, Jun-Young Kim, Seong-Whan Lee

    Abstract: Brain-to-speech technology represents a fusion of interdisciplinary applications encompassing fields of artificial intelligence, brain-computer interfaces, and speech synthesis. Neural representation learning based intention decoding and speech synthesis directly connects the neural activity to the means of human linguistic communication, which may greatly enhance the naturalness of communication.…

    Submitted 26 February, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

    Comments: 4 pages

  16. arXiv:2311.17923  [pdf, other]

    eess.AS cs.HC

    Enhanced Generative Adversarial Networks for Unseen Word Generation from EEG Signals

    Authors: Young-Eun Lee, Seo-Hyun Lee, Soowon Kim, Jung-Sun Lee, Deok-Seon Kim, Seong-Whan Lee

    Abstract: Recent advances in brain-computer interface (BCI) technology, particularly based on generative adversarial networks (GAN), have shown great promise for improving decoding performance for BCI. Within the realm of Brain-Computer Interfaces (BCI), GANs find application in addressing many areas. They serve as a valuable tool for data augmentation, which can solve the challenge of limited data availabi…

    Submitted 13 November, 2023; originally announced November 2023.

    Comments: 5 pages, 2 figures

  17. arXiv:2311.15474  [pdf, other]

    eess.SY

    Demonstration of Programmable Brain-Inspired Optoelectronic Neuron in Photonic Spiking Neural Network with Neural Heterogeneity

    Authors: Yun-Jhu Lee, Mehmet Berkay On, Luis El Srouji, Li Zhang, Mahmoud Abdelghany, S. J. Ben Yoo

    Abstract: Photonic Spiking Neural Networks (PSNN) composed of the co-integrated CMOS and photonic elements can offer low loss, low power, highly-parallel, and high-throughput computing for brain-inspired neuromorphic systems. In addition, heterogeneity of neuron dynamics can also bring greater diversity and expressivity to brain-inspired networks, potentially allowing for the implementation of complex funct…

    Submitted 26 November, 2023; originally announced November 2023.

  18. arXiv:2311.10430  [pdf, other]

    eess.IV cs.CV cs.LG

    Deep Residual CNN for Multi-Class Chest Infection Diagnosis

    Authors: Ryan Donghan Kwon, Dohyun Lim, Yoonha Lee, Seung Won Lee

    Abstract: The advent of deep learning has significantly propelled the capabilities of automated medical image diagnosis, providing valuable tools and resources in the realm of healthcare and medical diagnostics. This research delves into the development and evaluation of a Deep Residual Convolutional Neural Network (CNN) for the multi-class diagnosis of chest infections, utilizing chest X-ray images. The im…

    Submitted 17 November, 2023; originally announced November 2023.

  19. arXiv:2311.10306  [pdf, other]

    eess.IV cs.CV cs.LG

    MPSeg : Multi-Phase strategy for coronary artery Segmentation

    Authors: Jonghoe Ku, Yong-Hee Lee, Junsup Shin, In Kyu Lee, Hyun-Woo Kim

    Abstract: Accurate segmentation of coronary arteries is a pivotal process in assessing cardiovascular diseases. However, the intricate structure of the cardiovascular system presents significant challenges for automatic segmentation, especially when utilizing methodologies like the SYNTAX Score, which relies extensively on detailed structural information for precise risk stratification. To address these dif…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: MICCAI 2023 Conference ARCADE Challenge

  20. arXiv:2311.07487  [pdf, other]

    eess.SP eess.SY

    Vertiport Navigation Requirements and Multisensor Architecture Considerations for Urban Air Mobility

    Authors: Omar Garcia Crespillo, Chen Zhu, Maximilian Simonetti, Daniel Gerbeth, Young-Hee Lee, Wenhan Hao

    Abstract: Communication, Navigation and Surveillance (CNS) technologies are key enablers for future safe operation of drones in urban environments. However, the design of navigation technologies for these new applications is more challenging compared to, e.g., civil aviation. On the one hand, the use cases and operations in urban environments are expected to have stringent requirements in terms of accuracy,…

    Submitted 13 November, 2023; originally announced November 2023.

  21. arXiv:2311.05889  [pdf, other]

    eess.IV cs.CV cs.LG

    Semantic Map Guided Synthesis of Wireless Capsule Endoscopy Images using Diffusion Models

    Authors: Hae** Lee, Jeongwoo Ju, Jonghyuck Lee, Yeoun Joo Lee, Heechul Jung

    Abstract: Wireless capsule endoscopy (WCE) is a non-invasive method for visualizing the gastrointestinal (GI) tract, crucial for diagnosing GI tract diseases. However, interpreting WCE results can be time-consuming and tiring. Existing studies have employed deep neural networks (DNNs) for automatic GI tract lesion detection, but acquiring sufficient training examples, particularly due to privacy concerns, r…

    Submitted 10 November, 2023; originally announced November 2023.

  22. arXiv:2310.15463  [pdf, other]

    eess.SY

    Nested Control Co-design of a Spar Buoy Horizontal-axis Floating Offshore Wind Turbine

    Authors: Saeid Bayat, Yong Hoon Lee, James T. Allison

    Abstract: Floating offshore wind turbine (FOWT) systems involve several coupled physical analysis disciplines, including aeroelasticity, multi-body structural dynamics, hydrodynamics, and controls. Conventionally, physical structure (plant) and control design decisions are treated as two separate problems, and generally, control design is performed after the plant design is complete. However, this sequentia…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: 21 pages, 15 figures, 5 tables

  23. arXiv:2310.14506  [pdf, other]

    eess.SP cs.DB

    Label Space Partition Selection for Multi-Object Tracking Using Two-Layer Partitioning

    Authors: Ji Youn Lee, Changbeom Shim, Hoa Van Nguyen, Tran Thien Dat Nguyen, Hyun** Choi, Youngho Kim

    Abstract: Estimating the trajectories of multi-objects poses a significant challenge due to data association ambiguity, which leads to a substantial increase in computational requirements. To address such problems, a divide-and-conquer manner has been employed with parallel computation. In this strategy, distinguished objects that have unique labels are grouped based on their statistical dependencies, the i…

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: 6 pages, 4 figures

  24. Open-Loop Control Co-Design of Semisubmersible Floating Offshore Wind Turbines using Linear Parameter-Varying Models

    Authors: Athul Krishna Sundarrajan, Yong Hoon Lee, James T Allison, Daniel Zalkind, Daniel Herber

    Abstract: This paper discusses a framework to design elements of the plant and control systems for floating offshore wind turbines in an integrated manner using linear parameter-varying models. Multiple linearized models derived from aeroelastic simulation software in different operating regions characterized by the incoming wind speed are combined to construct an approximate low-fidelity model of the syste…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: 16 pages, 47 figures

  25. arXiv:2310.08619  [pdf, ps, other]

    eess.IV

    Unlocking the capabilities of explainable fewshot learning in remote sensing

    Authors: Gao Yu Lee, Tanmoy Dam, Md Meftahul Ferdaus, Daniel Puiu Poenar, Vu N Duong

    Abstract: Recent advancements have significantly improved the efficiency and effectiveness of deep learning methods for image-based remote sensing tasks. However, the requirement for large amounts of labeled data can limit the applicability of deep neural networks to existing remote sensing datasets. To overcome this challenge, few-shot learning has emerged as a valuable approach for enabling learning with li…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: Under review, once the paper is accepted, the copyright will be transferred to the corresponding journal

  26. arXiv:2310.06546  [pdf, other]

    cs.SD cs.CL eess.AS

    AutoCycle-VC: Towards Bottleneck-Independent Zero-Shot Cross-Lingual Voice Conversion

    Authors: Haeyun Choi, Jio Gim, Yuho Lee, Youngin Kim, Young-Joo Suh

    Abstract: This paper proposes a simple and robust zero-shot voice conversion system with a cycle structure and mel-spectrogram pre-processing. Previous works suffer from information loss and poor synthesis quality due to their reliance on a carefully designed bottleneck structure. Moreover, models relying solely on self-reconstruction loss struggled with reproducing different speakers' voices. To address th…

    Submitted 10 October, 2023; originally announced October 2023.

  27. arXiv:2310.04010  [pdf, other]

    cs.CV cs.AI eess.IV

    Excision And Recovery: Visual Defect Obfuscation Based Self-Supervised Anomaly Detection Strategy

    Authors: YeongHyeon Park, Sungho Kang, Myung ** Kim, Yeonho Lee, Hyeong Seok Kim, Juneho Yi

    Abstract: Due to scarcity of anomaly situations in the early manufacturing stage, an unsupervised anomaly detection (UAD) approach is widely adopted which only uses normal samples for training. This approach is based on the assumption that the trained UAD model will accurately reconstruct normal patterns but struggles with unseen anomalous patterns. To enhance the UAD performance, reconstruction-by-inpainti…

    Submitted 9 November, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: 10 pages, 5 figures, 5 tables

  28. arXiv:2310.03538  [pdf, other]

    eess.AS

    Latent Filling: Latent Space Data Augmentation for Zero-shot Speech Synthesis

    Authors: Jae-Sung Bae, Joun Yeop Lee, Ji-Hyun Lee, Seongkyu Mun, Taehwa Kang, Hoon-Young Cho, Chanwoo Kim

    Abstract: Previous works in zero-shot text-to-speech (ZS-TTS) have attempted to enhance its systems by enlarging the training data through crowd-sourcing or augmenting existing speech data. However, the use of low-quality data has led to a decline in the overall system performance. To avoid such degradation, instead of directly augmenting the input data, we propose a latent filling (LF) method that adopts s…

    Submitted 22 January, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

    Comments: Accepted to ICASSP 2024

  29. arXiv:2310.02486  [pdf, other]

    eess.IV cs.CV cs.LG

    OCU-Net: A Novel U-Net Architecture for Enhanced Oral Cancer Segmentation

    Authors: Ahmed Albishri, Syed Jawad Hussain Shah, Yugyung Lee, Rong Wang

    Abstract: Accurate detection of oral cancer is crucial for improving patient outcomes. However, the field faces two key challenges: the scarcity of deep learning-based image segmentation research specifically targeting oral cancer and the lack of annotated data. Our study proposes OCU-Net, a pioneering U-Net image segmentation architecture exclusively designed to detect oral cancer in hematoxylin and eosin…

    Submitted 3 October, 2023; originally announced October 2023.

  30. arXiv:2309.14967  [pdf, other]

    cs.CV eess.IV

    A novel approach for holographic 3D content generation without depth map

    Authors: Hakdong Kim, Minkyu Jee, Yurim Lee, Kyudam Choi, MinSung Yoon, Cheongwon Kim

    Abstract: In preparation for observing holographic 3D content, acquiring a set of RGB color and depth map images per scene is necessary to generate computer-generated holograms (CGHs) when using the fast Fourier transform (FFT) algorithm. However, in real-world situations, these paired formats of RGB color and depth map images are not always fully available. We propose a deep learning-based method to synthe…

    Submitted 26 September, 2023; originally announced September 2023.

  31. arXiv:2309.13664  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    VoiceLDM: Text-to-Speech with Environmental Context

    Authors: Yeonghyeon Lee, Inmo Yeon, Juhan Nam, Joon Son Chung

    Abstract: This paper presents VoiceLDM, a model designed to produce audio that accurately follows two distinct natural language text prompts: the description prompt and the content prompt. The former provides information about the overall environmental context of the audio, while the latter conveys the linguistic content. To achieve this, we adopt a text-to-audio (TTA) model based on latent diffusion models…

    Submitted 24 September, 2023; originally announced September 2023.

    Comments: Demos and code are available at https://voiceldm.github.io

  32. arXiv:2309.07152  [pdf]

    eess.SP physics.med-ph

    Novel Smart N95 Filtering Facepiece Respirator with Real-time Adaptive Fit Functionality and Wireless Humidity Monitoring for Enhanced Wearable Comfort

    Authors: Kangkyu Kwon, Yoon Jae Lee, Yeongju Jung, Ira Soltis, Chanyeong Choi, Yewon Na, Lissette Romero, Myung Chul Kim, Nathan Rodeheaver, Hodam Kim, Michael S. Lloyd, Ziqing Zhuang, William King, Susan Xu, Seung-Hwan Ko, **woo Lee, Woon-Hong Yeo

    Abstract: The widespread emergence of the COVID-19 pandemic has transformed our lifestyle, and facial respirators have become an essential part of daily life. Nevertheless, the current respirators possess several limitations such as poor respirator fit because they are incapable of covering diverse human facial sizes and shapes, potentially diminishing the effect of wearing respirators. In addition, the cur…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: 20 pages, 5 figures, 1 table, submitted for possible publication

    MSC Class: 92C55

  33. arXiv:2309.06096  [pdf, other]

    eess.AS eess.SP

    iPhonMatchNet: Zero-Shot User-Defined Keyword Spotting Using Implicit Acoustic Echo Cancellation

    Authors: Yong-Hyeok Lee, Namhyun Cho

    Abstract: In response to the increasing interest in human–machine communication across various domains, this paper introduces a novel approach called iPhonMatchNet, which addresses the challenge of barge-in scenarios, wherein user speech overlaps with device playback audio, thereby creating a self-referencing problem. The proposed model leverages implicit acoustic echo cancellation (iAEC) techniques to inc…

    Submitted 13 December, 2023; v1 submitted 12 September, 2023; originally announced September 2023.

    Comments: Accepted to ICASSP 2024

  34. arXiv:2309.04655  [pdf]

    cs.RO cs.LG eess.SP eess.SY

    Intelligent upper-limb exoskeleton integrated with soft wearable bioelectronics and deep-learning for human intention-driven strength augmentation based on sensory feedback

    Authors: **woo Lee, Kangkyu Kwon, Ira Soltis, Jared Matthews, Yoonjae Lee, Hojoong Kim, Lissette Romero, Nathan Zavanelli, Young** Kwon, Shinjae Kwon, Jimin Lee, Yewon Na, Sung Hoon Lee, Ki Jun Yu, Minoru Shinohara, Frank L. Hammond, Woon-Hong Yeo

    Abstract: The age and stroke-associated decline in musculoskeletal strength degrades the ability to perform daily human tasks using the upper extremities. Although there are a few examples of exoskeletons, they need manual operations due to the absence of sensor feedback and no intention prediction of movements. Here, we introduce an intelligent upper-limb exoskeleton system that uses cloud-based deep learn…

    Submitted 26 January, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

    Comments: 15 pages, 6 figures, 1 table, published in npj flexible electronics journals

    MSC Class: 68T40 (Primary) 92C55; 68T99 (Secondary)

  35. arXiv:2308.16511  [pdf, other]

    eess.AS cs.SD eess.SP

    PhonMatchNet: Phoneme-Guided Zero-Shot Keyword Spotting for User-Defined Keywords

    Authors: Yong-Hyeok Lee, Namhyun Cho

    Abstract: This study presents a novel zero-shot user-defined keyword spotting model that utilizes the audio-phoneme relationship of the keyword to improve performance. Unlike the previous approach that estimates at utterance level, we use both utterance and phoneme level information. Our proposed method comprises a two-stream speech encoder architecture, self-attention-based pattern extractor, and phoneme-l…

    Submitted 31 August, 2023; originally announced August 2023.

    Journal ref: Proc. INTERSPEECH 2023, 3964-3968

  36. arXiv:2307.14389  [pdf, other]

    eess.AS cs.CL cs.HC cs.LG

    Diff-E: Diffusion-based Learning for Decoding Imagined Speech EEG

    Authors: Soowon Kim, Young-Eun Lee, Seo-Hyun Lee, Seong-Whan Lee

    Abstract: Decoding EEG signals for imagined speech is a challenging task due to the high-dimensional nature of the data and low signal-to-noise ratio. In recent years, denoising diffusion probabilistic models (DDPMs) have emerged as promising approaches for representation learning in various domains. Our study proposes a novel method for decoding EEG signals for imagined speech using DDPMs and a conditional…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: Accepted to Interspeech 2023

    MSC Class: 68T10

  37. arXiv:2306.05682  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO eess.IV

    Lightweight Monocular Depth Estimation via Token-Sharing Transformer

    Authors: Dong-Jae Lee, Jae Young Lee, Hyounguk Shon, Eo**dl Yi, Yeong-Hun Park, Sung-Sik Cho, Junmo Kim

    Abstract: Depth estimation is an important task in various robotics systems and applications. In mobile robotics systems, monocular depth estimation is desirable since a single RGB camera can be deployable at a low cost and compact size. Due to its significant and growing needs, many lightweight monocular depth estimation networks have been proposed for mobile robotics systems. While most lightweight monocu…

    Submitted 9 June, 2023; originally announced June 2023.

    Comments: ICRA 2023

  38. arXiv:2305.18775  [pdf]

    eess.SP

    A Depth-Adaptive Filtering Method for Effective GPR Tree Roots Detection in Tropical Area

    Authors: Wenhao Luo, Yee Hui Lee, Mohamed Lokman Mohd Yusof, Abdulkadir C. Yucel

    Abstract: This study presents a technique for processing step-frequency continuous wave (SFCW) ground penetrating radar (GPR) data to detect tree roots. SFCW GPR is portable and enables precise control of energy levels, balancing depth and resolution trade-offs. However, the high-frequency components of the transmission band suffer from poor penetrating capability and generate noise that interferes with ro…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: 10 pages, 12 figures, Accepted by IEEE TIM

  39. arXiv:2305.05532  [pdf, other]

    eess.SP cs.AI cs.LG stat.AP stat.ML

    An ensemble of convolution-based methods for fault detection using vibration signals

    Authors: Xian Yeow Lee, Aman Kumar, Lasitha Vidyaratne, Aniruddha Rajendra Rao, Ahmed Farahat, Chetan Gupta

    Abstract: This paper focuses on solving a fault detection problem using multivariate time series of vibration signals collected from planetary gearboxes in a test rig. Various traditional machine learning and deep learning methods have been proposed for multivariate time-series classification, including distance-based, functional data-oriented, feature-driven, and convolution kernel-based methods. Recent st…

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: 12 Pages, 9 Figures, 2 Tables. Accepted at ICPHM 2023

    Journal ref: 2023 IEEE International Conference on Prognostics and Health Management (ICPHM)

  40. 3DInvNet: A Deep Learning-Based 3D Ground-Penetrating Radar Data Inversion

    Authors: Qiqi Dai, Yee Hui Lee, Hai-Han Sun, Genevieve Ow, Mohamed Lokman Mohd Yusof, Abdulkadir C. Yucel

    Abstract: The reconstruction of the 3D permittivity map from ground-penetrating radar (GPR) data is of great importance for mapping subsurface environments and inspecting underground structural integrity. Traditional iterative 3D reconstruction algorithms suffer from strong non-linearity, ill-posedness, and high computational cost. To tackle these issues, a 3D deep learning scheme, called 3DInvNet, is propo…

    Submitted 9 May, 2023; originally announced May 2023.

  41. WATT-EffNet: A Lightweight and Accurate Model for Classifying Aerial Disaster Images

    Authors: Gao Yu Lee, Tanmoy Dam, Md Meftahul Ferdaus, Daniel Puiu Poenar, Vu N. Duong

    Abstract: Incorporating deep learning (DL) classification models into unmanned aerial vehicles (UAVs) can significantly augment search-and-rescue operations and disaster management efforts. In such critical situations, the UAV's ability to promptly comprehend the crisis and optimally utilize its limited power and processing resources to narrow down search areas is crucial. Therefore, developing an efficient…

    Submitted 1 May, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

    Comments: This paper has been accepted in IEEE Geoscience and Remote Sensing Letters (GRSL)

  42. arXiv:2304.08707  [pdf, other]

    eess.AS cs.SD

    Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling

    Authors: Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe

    Abstract: We propose FSB-LSTM, a novel long short-term memory (LSTM) based architecture that integrates full- and sub-band (FSB) modeling, for single- and multi-channel speech enhancement in the short-time Fourier transform (STFT) domain. The model maintains an information highway to flow an over-complete input representation through multiple FSB-LSTM modules. Each FSB-LSTM module consists of a full-band bl…

    Submitted 17 April, 2023; originally announced April 2023.

    Comments: in ICASSP 2023

  43. arXiv:2303.10770  [pdf, other]

    cs.CV cs.AI eess.IV

    RN-Net: Reservoir Nodes-Enabled Neuromorphic Vision Sensing Network

    Authors: Sangmin Yoo, Eric Yeu-Jer Lee, Ziyu Wang, Xinxin Wang, Wei D. Lu

    Abstract: Event-based cameras are inspired by the sparse and asynchronous spike representation of the biological visual system. However, processing the event data requires either using expensive feature descriptors to transform spikes into frames, or using spiking neural networks that are expensive to train. In this work, we propose a neural network architecture, Reservoir Nodes-enabled neuromorphic vision…

    Submitted 24 May, 2024; v1 submitted 19 March, 2023; originally announced March 2023.

    Comments: 12 pages, 5 figures, 4 tables

  44. arXiv:2303.08329  [pdf, other]

    cs.SD cs.CL eess.AS

    Cross-speaker Emotion Transfer by Manipulating Speech Style Latents

    Authors: Suhee Jo, Younggun Lee, Yookyung Shin, Yeongtae Hwang, Taesu Kim

    Abstract: In recent years, emotional text-to-speech has shown considerable progress. However, it requires a large amount of labeled data, which is not easily accessible. Even if it is possible to acquire an emotional speech dataset, there is still a limitation in controlling emotion intensity. In this work, we propose a novel method for cross-speaker emotion transfer and manipulation using vector arithmetic…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: accepted to ICASSP 2023

  45. arXiv:2302.14624  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    The 2022 NIST Language Recognition Evaluation

    Authors: Yooyoung Lee, Craig Greenberg, Eliot Godard, Asad A. Butt, Elliot Singer, Trang Nguyen, Lisa Mason, Douglas Reynolds

    Abstract: In 2022, the U.S. National Institute of Standards and Technology (NIST) conducted the latest Language Recognition Evaluation (LRE) in an ongoing series administered by NIST since 1996 to foster research in language recognition and to measure state-of-the-art technology. Similar to previous LREs, LRE22 focused on conversational telephone speech (CTS) and broadcast narrowband speech (BNBS) data. LRE…

    Submitted 28 February, 2023; originally announced February 2023.

    Comments: 5 pages, 10 figures

  46. arXiv:2302.14273  [pdf, other]

    cs.RO eess.SY

    QP Chaser: Polynomial Trajectory Generation for Autonomous Aerial Tracking

    Authors: Yunwoo Lee, Jungwon Park, Seungwoo Jung, Boseong Jeon, Dahyun Oh, H. Jin Kim

    Abstract: Maintaining the visibility of targets is one of the major objectives of aerial tracking applications. This paper proposes QP Chaser, a trajectory planning pipeline that can enhance the visibility of single and dual targets in both static and dynamic environments. As the name suggests, the proposed planner generates a target-visible trajectory by solving quadratic programming problems. First, the pred…

    Submitted 27 February, 2023; originally announced February 2023.

    Comments: 15 pages, 13 figures

  47. Varianceflow: High-Quality and Controllable Text-to-Speech using Variance Information via Normalizing Flow

    Authors: Yoonhyung Lee, Jinhyeok Yang, Kyomin Jung

    Abstract: Non-autoregressive text-to-speech models can effectively learn the one-to-many relationship between text and speech in two ways. The first is to use an advanced generative framework such as normalizing flow (NF). The second is to use variance information, such as pitch or energy, when generating speech. For the second type, it is also possible to control the va…

    Submitted 26 February, 2023; originally announced February 2023.

    Comments: Accepted for ICASSP 2022

  48. arXiv:2302.12172  [pdf, other]

    eess.IV cs.CV cs.LG

    Vision-Language Generative Model for View-Specific Chest X-ray Generation

    Authors: Hyungyung Lee, Da Young Lee, Wonjae Kim, Jin-Hwa Kim, Tackeun Kim, Jihang Kim, Leonard Sunwoo, Edward Choi

    Abstract: Synthetic medical data generation has opened up new possibilities in the healthcare domain, offering a powerful tool for simulating clinical scenarios, enhancing diagnostic and treatment quality, gaining granular medical knowledge, and accelerating the development of unbiased algorithms. In this context, we present a novel approach called ViewXGen, designed to overcome the limitations of existing…

    Submitted 29 April, 2024; v1 submitted 23 February, 2023; originally announced February 2023.

    Comments: Accepted at CHIL 2024

  49. arXiv:2302.01738  [pdf, other]

    eess.IV cs.LG

    AIROGS: Artificial Intelligence for RObust Glaucoma Screening Challenge

    Authors: Coen de Vente, Koenraad A. Vermeer, Nicolas Jaccard, He Wang, Hongyi Sun, Firas Khader, Daniel Truhn, Temirgali Aimyshev, Yerkebulan Zhanibekuly, Tien-Dung Le, Adrian Galdran, Miguel Ángel González Ballester, Gustavo Carneiro, Devika R G, Hrishikesh P S, Densen Puthussery, Hong Liu, Zekang Yang, Satoshi Kondo, Satoshi Kasai, Edward Wang, Ashritha Durvasula, Jónathan Heras, Miguel Ángel Zapata, Teresa Araújo, et al. (11 additional authors not shown)

    Abstract: The early detection of glaucoma is essential in preventing visual impairment. Artificial intelligence (AI) can be used to analyze color fundus photographs (CFPs) in a cost-effective manner, making glaucoma screening more accessible. While AI models for glaucoma screening from CFPs have shown promising results in laboratory settings, their performance decreases significantly in real-world scenarios…

    Submitted 10 February, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

    Comments: 19 pages, 8 figures, 3 tables

  50. arXiv:2301.13173  [pdf, other]

    cs.CV eess.IV

    Shape-aware Text-driven Layered Video Editing

    Authors: Yao-Chih Lee, Ji-Ze Genevieve Jang, Yi-Ting Chen, Elizabeth Qiu, Jia-Bin Huang

    Abstract: Temporal consistency is essential for video editing applications. Existing work on layered video representations allows edits to be propagated consistently to each frame. These methods, however, can edit only object appearance, not object shape, because they rely on a fixed UV mapping field for the texture atlas. We present a shape-aware, text-driven video editing method to tackl…

    Submitted 30 January, 2023; originally announced January 2023.

    Comments: Project page: https://text-video-edit.github.io/