Showing 1–50 of 68 results for author: Lee, G

Searching in archive eess.
  1. arXiv:2406.15723  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Acoustic Feature Mixup for Balanced Multi-aspect Pronunciation Assessment

    Authors: Heejin Do, Wonjun Lee, Gary Geunbae Lee

    Abstract: In automated pronunciation assessment, recent emphasis progressively lies on evaluating multiple aspects to provide enriched feedback. However, acquiring multi-aspect-score labeled data for non-native language learners' speech poses challenges; moreover, it often leads to score-imbalanced distributions. In this paper, we propose two Acoustic Feature Mixup strategies, linearly and non-linearly inte…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024

  2. arXiv:2406.13935  [pdf, other]

    eess.AS cs.AI cs.SD

    CONMOD: Controllable Neural Frame-based Modulation Effects

    Authors: Gyubin Lee, Hounsu Kim, Junwon Lee, Juhan Nam

    Abstract: Deep learning models have seen widespread use in modelling LFO-driven audio effects, such as phaser and flanger. Although existing neural architectures exhibit high-quality emulation of individual effects, they do not possess the capability to manipulate the output via control parameters. To address this issue, we introduce Controllable Neural Frame-based Modulation Effects (CONMOD), a single blac…

    Submitted 19 June, 2024; originally announced June 2024.

  3. arXiv:2406.06650  [pdf, other]

    eess.IV cs.CV

    Predicting the risk of early-stage breast cancer recurrence using H&E-stained tissue images

    Authors: Geongyu Lee, Joonho Lee, Tae-Yeong Kwak, Sun Woo Kim, Youngmee Kwon, Chungyeul Kim, Hyeyoon Chang

    Abstract: Accurate prediction of the likelihood of recurrence is important in the selection of postoperative treatment for patients with early-stage breast cancer. In this study, we investigated whether deep learning algorithms can predict patients' risk of recurrence by analyzing the pathology images of their cancer histology. A total of 125 hematoxylin and eosin stained breast cancer whole slide images la…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 12 pages, 7 figures

  4. arXiv:2404.02592  [pdf]

    cs.CL cs.SD eess.AS

    Leveraging the Interplay Between Syntactic and Acoustic Cues for Optimizing Korean TTS Pause Formation

    Authors: Yejin Jeon, Yunsu Kim, Gary Geunbae Lee

    Abstract: Contemporary neural speech synthesis models have indeed demonstrated remarkable proficiency in synthetic speech generation as they have attained a level of quality comparable to that of human-produced speech. Nevertheless, it is important to note that these achievements have predominantly been verified within the context of high-resource languages such as English. Furthermore, the Tacotron and Fas…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Accepted to LREC-COLING 2024

  5. arXiv:2403.04111  [pdf]

    cs.SD eess.AS

    Multi-Level Attention Aggregation for Language-Agnostic Speaker Replication

    Authors: Yejin Jeon, Gary Geunbae Lee

    Abstract: This paper explores the task of language-agnostic speaker replication, a novel endeavor that seeks to replicate a speaker's voice irrespective of the language they are speaking. Towards this end, we introduce a multi-level attention aggregation approach that systematically probes and amplifies various speaker-specific attributes in a hierarchical manner. Through rigorous evaluations across a wide…

    Submitted 3 April, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

    Comments: Accepted to EACL Main 2024

  6. arXiv:2402.00325  [pdf]

    eess.SY

    Using digital twins for managing change in complex projects

    Authors: Jennifer Whyte, Ranjith Soman, Rafael Sacks, Neda Mohammadi, Nader Naderpajouh, Wei-Ting Hong, Ghang Lee

    Abstract: Complex systems are not entirely decomposable, hence interdependences arise at the interfaces in complex projects. When changes occur, significant risks arise at these interfaces as it is hard to identify, manage and visualise the systemic consequences of changes. Particularly problematic are the interfaces in which there are multiple interdependencies, which occur where the boundaries between des…

    Submitted 30 May, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: 11 pages, 5 figures

  7. arXiv:2401.13146  [pdf, other]

    eess.AS cs.CL cs.SD

    Locality enhanced dynamic biasing and sampling strategies for contextual ASR

    Authors: Md Asif Jalal, Pablo Peso Parada, George Pavlidis, Vasileios Moschopoulos, Karthikeyan Saravanan, Chrysovalantis-Giorgos Kontoulis, Jisi Zhang, Anastasios Drosou, Gil Ho Lee, Jungin Lee, Seokyeong Jung

    Abstract: Automatic Speech Recognition (ASR) still face challenges when recognizing time-variant rare-phrases. Contextual biasing (CB) modules bias ASR model towards such contextually-relevant phrases. During training, a list of biasing phrases are selected from a large pool of phrases following a sampling strategy. In this work we firstly analyse different sampling strategies to provide insights into the t…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: Accepted for IEEE ASRU 2023

  8. arXiv:2401.12085  [pdf, other]

    eess.AS cs.SD

    Consistency Based Unsupervised Self-training For ASR Personalisation

    Authors: Jisi Zhang, Vandana Rajan, Haaris Mehmood, David Tuckey, Pablo Peso Parada, Md Asif Jalal, Karthikeyan Saravanan, Gil Ho Lee, Jungin Lee, Seokyeong Jung

    Abstract: On-device Automatic Speech Recognition (ASR) models trained on speech data of a large population might underperform for individuals unseen during training. This is due to a domain shift between user data and the original training data, differed by user's speaking characteristics and environmental acoustic conditions. ASR personalisation is a solution that aims to exploit user data to improve model…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: Accepted for IEEE ASRU 2023

  9. arXiv:2401.11429  [pdf, ps, other]

    cs.IT eess.SP

    Joint Downlink and Uplink Optimization for RIS-Aided FDD MIMO Communication Systems

    Authors: Gyoseung Lee, Hyeongtaek Lee, Donghwan Kim, Jaehoon Chung, A. Lee Swindlehurst, Junil Choi

    Abstract: This paper investigates reconfigurable intelligent surface (RIS)-aided frequency division duplexing (FDD) communication systems. Since the downlink and uplink signals are simultaneously transmitted in FDD, the phase shifts at the RIS should be designed to support both transmissions. Considering a single-user multiple-input multiple-output system, we formulate a weighted sum-rate maximization probl…

    Submitted 21 January, 2024; originally announced January 2024.

    Comments: Accepted to IEEE Transactions on Wireless Communications

  10. arXiv:2401.02014  [pdf, other]

    cs.SD eess.AS

    Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations

    Authors: Yejin Jeon, Yunsu Kim, Gary Geunbae Lee

    Abstract: Zero-shot multi-speaker TTS aims to synthesize speech with the voice of a chosen target speaker without any fine-tuning. Prevailing methods, however, encounter limitations at adapting to new speakers of out-of-domain settings, primarily due to inadequate speaker disentanglement and content leakage. To overcome these constraints, we propose an innovative negation feature learning paradigm that mode…

    Submitted 5 March, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: Accepted to AAAI 2024

  11. arXiv:2312.03312  [pdf, other]

    cs.CL cs.SD eess.AS

    Optimizing Two-Pass Cross-Lingual Transfer Learning: Phoneme Recognition and Phoneme to Grapheme Translation

    Authors: Wonjun Lee, Gary Geunbae Lee, Yunsu Kim

    Abstract: This research optimizes two-pass cross-lingual transfer learning in low-resource languages by enhancing phoneme recognition and phoneme-to-grapheme translation models. Our approach optimizes these two stages to improve speech recognition across languages. We optimize phoneme vocabulary coverage by merging phonemes based on shared articulatory characteristics, thus improving recognition accuracy. A…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 8 pages, ASRU 2023 Accepted

  12. arXiv:2312.01842  [pdf, other]

    cs.SD cs.AI eess.AS

    Exploring the Viability of Synthetic Audio Data for Audio-Based Dialogue State Tracking

    Authors: Jihyun Lee, Yejin Jeon, Wonjun Lee, Yunsu Kim, Gary Geunbae Lee

    Abstract: Dialogue state tracking plays a crucial role in extracting information in task-oriented dialogue systems. However, preceding research are limited to textual modalities, primarily due to the shortage of authentic human audio datasets. We address this by investigating synthetic audio data for audio-based DST. To this end, we develop cascading and end-to-end models, train them with our synthetic audi…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: Accepted in ASRU 2023

  13. arXiv:2310.08619  [pdf, ps, other]

    eess.IV

    Unlocking the capabilities of explainable fewshot learning in remote sensing

    Authors: Gao Yu Lee, Tanmoy Dam, Md Meftahul Ferdaus, Daniel Puiu Poenar, Vu N Duong

    Abstract: Recent advancements have significantly improved the efficiency and effectiveness of deep learning methods for imagebased remote sensing tasks. However, the requirement for large amounts of labeled data can limit the applicability of deep neural networks to existing remote sensing datasets. To overcome this challenge, fewshot learning has emerged as a valuable approach for enabling learning with li…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: Under review, once the paper is accepted, the copyright will be transferred to the corresponding journal

  14. arXiv:2308.06332  [pdf, other]

    eess.IV cs.CV

    Revolutionizing Space Health (Swin-FSR): Advancing Super-Resolution of Fundus Images for SANS Visual Assessment Technology

    Authors: Khondker Fariha Hossain, Sharif Amit Kamran, Joshua Ong, Andrew G. Lee, Alireza Tavakkoli

    Abstract: The rapid accessibility of portable and affordable retinal imaging devices has made early differential diagnosis easier. For example, color funduscopy imaging is readily available in remote villages, which can help to identify diseases like age-related macular degeneration (AMD), glaucoma, or pathological myopia (PM). On the other hand, astronauts at the International Space Station utilize this ca…

    Submitted 11 August, 2023; originally announced August 2023.

    Comments: Accepted in 26th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2023

  15. arXiv:2308.05864  [pdf, other]

    eess.IV cs.CV cs.LG q-bio.QM

    The Multi-modality Cell Segmentation Challenge: Towards Universal Solutions

    Authors: Jun Ma, Ronald Xie, Shamini Ayyadhury, Cheng Ge, Anubha Gupta, Ritu Gupta, Song Gu, Yao Zhang, Gihun Lee, Joonkee Kim, Wei Lou, Haofeng Li, Eric Upschulte, Timo Dickscheid, José Guilherme de Almeida, Yixin Wang, Lin Han, Xin Yang, Marco Labagnara, Vojislav Gligorovski, Maxime Scheder, Sahand Jamal Rahi, Carly Kempster, Alice Pollitt, Leon Espinosa, et al. (15 additional authors not shown)

    Abstract: Cell segmentation is a critical step for quantitative single-cell analysis in microscopy images. Existing cell segmentation methods are often tailored to specific modalities or require manual interventions to specify hyper-parameters in different experimental settings. Here, we present a multi-modality cell segmentation benchmark, comprising over 1500 labeled images derived from more than 50 diver…

    Submitted 1 April, 2024; v1 submitted 10 August, 2023; originally announced August 2023.

    Comments: NeurIPS22 Cell Segmentation Challenge: https://neurips22-cellseg.grand-challenge.org/ . Nature Methods (2024)

  16. arXiv:2306.15681  [pdf, other]

    q-bio.QM cs.LG eess.SP

    ECG-QA: A Comprehensive Question Answering Dataset Combined With Electrocardiogram

    Authors: Jungwoo Oh, Gyubok Lee, Seongsu Bae, Joon-myoung Kwon, Edward Choi

    Abstract: Question answering (QA) in the field of healthcare has received much attention due to significant advancements in natural language processing. However, existing healthcare QA datasets primarily focus on medical images, clinical notes, or structured electronic health record tables. This leaves the vast potential of combining electrocardiogram (ECG) data with these systems largely untapped. To addre…

    Submitted 10 October, 2023; v1 submitted 21 June, 2023; originally announced June 2023.

    Comments: Accepted at NeurIPS 2023 Datasets and Benchmarks Track (10 pages for main text, 2 pages for references, 28 pages for supplementary materials)

  17. arXiv:2306.14411  [pdf, other]

    cs.LG eess.SP

    Score-based Source Separation with Applications to Digital Communication Signals

    Authors: Tejas Jayashankar, Gary C. F. Lee, Alejandro Lancho, Amir Weiss, Yury Polyanskiy, Gregory W. Wornell

    Abstract: We propose a new method for separating superimposed sources using diffusion-based generative models. Our method relies only on separately trained statistical priors of independent sources to establish a new objective function guided by maximum a posteriori estimation with an $α$-posterior, across multiple levels of Gaussian smoothing. Motivated by applications in radio-frequency (RF) systems, we a…

    Submitted 17 January, 2024; v1 submitted 26 June, 2023; originally announced June 2023.

    Comments: 34 pages, 18 figures, for associated project webpage see https://alpha-rgs.github.io

  18. Score-balanced Loss for Multi-aspect Pronunciation Assessment

    Authors: Heejin Do, Yunsu Kim, Gary Geunbae Lee

    Abstract: With rapid technological growth, automatic pronunciation assessment has transitioned toward systems that evaluate pronunciation in various aspects, such as fluency and stress. However, despite the highly imbalanced score labels within each aspect, existing studies have rarely tackled the data imbalance problem. In this paper, we suggest a novel loss function, score-balanced loss, to address the pr…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: Accepted at Interspeech 2023

  19. WATT-EffNet: A Lightweight and Accurate Model for Classifying Aerial Disaster Images

    Authors: Gao Yu Lee, Tanmoy Dam, Md Meftahul Ferdaus, Daniel Puiu Poenar, Vu N. Duong

    Abstract: Incorporating deep learning (DL) classification models into unmanned aerial vehicles (UAVs) can significantly augment search-and-rescue operations and disaster management efforts. In such critical situations, the UAV's ability to promptly comprehend the crisis and optimally utilize its limited power and processing resources to narrow down search areas is crucial. Therefore, developing an efficient…

    Submitted 1 May, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

    Comments: This paper is accepted in IEEE Trans. GRSL

  20. arXiv:2304.08983  [pdf, other]

    eess.SY

    Complexity reduction for resilient state estimation of uniformly observable nonlinear systems

    Authors: Junsoo Kim, Jin Gyu Lee, Henrik Sandberg, Karl H. Johansson

    Abstract: A resilient state estimation scheme for uniformly observable nonlinear systems, based on a method for local identification of sensor attacks, is presented. The estimation problem is combinatorial in nature, and so many methods require substantial computational and storage resources as the number of sensors increases. To reduce the complexity, the proposed method performs the attack identification…

    Submitted 18 April, 2023; originally announced April 2023.

    Comments: 12 pages, 4 figures, submitted to IEEE Transactions on Automatic Control

  21. On Neural Architectures for Deep Learning-based Source Separation of Co-Channel OFDM Signals

    Authors: Gary C. F. Lee, Amir Weiss, Alejandro Lancho, Yury Polyanskiy, Gregory W. Wornell

    Abstract: We study the single-channel source separation problem involving orthogonal frequency-division multiplexing (OFDM) signals, which are ubiquitous in many modern-day digital communication systems. Related efforts have been pursued in monaural source separation, where state-of-the-art neural architectures have been adopted to train an end-to-end separator for audio signals (as 1-dimensional time serie…

    Submitted 15 March, 2023; v1 submitted 11 March, 2023; originally announced March 2023.

  22. Channel Estimation for Reconfigurable Intelligent Surface with a few Active Elements

    Authors: Gyoseung Lee, Hyeongtaek Lee, Jaeky Oh, Jaehoon Chung, Junil Choi

    Abstract: In this paper, a channel estimation technique for reconfigurable intelligent surface (RIS)-aided multi-user multiple-input single-output communication systems is proposed. By deploying a small number of active elements at the RIS, the RIS can receive and process the training signals. Through the partial channel state information (CSI) obtained from the active elements, the overall training overhea…

    Submitted 8 February, 2023; originally announced February 2023.

    Comments: Accepted to IEEE Transactions on Vehicular Technology

    Journal ref: IEEE Transactions on Vehicular Technology, early access, Feb. 03, 2023

  23. arXiv:2302.03022  [pdf, other]

    cs.CV cs.RO eess.IV

    SurgT challenge: Benchmark of Soft-Tissue Trackers for Robotic Surgery

    Authors: Joao Cartucho, Alistair Weld, Samyakh Tukra, Haozheng Xu, Hiroki Matsuzaki, Taiyo Ishikawa, Minjun Kwon, Yong Eun Jang, Kwang-Ju Kim, Gwang Lee, Bizhe Bai, Lueder Kahrs, Lars Boecking, Simeon Allmendinger, Leopold Muller, Yitong Zhang, Yueming Jin, Sophia Bano, Francisco Vasconcelos, Wolfgang Reiter, Jonas Hajek, Bruno Silva, Estevao Lima, Joao L. Vilaca, Sandro Queiros, et al. (1 additional author not shown)

    Abstract: This paper introduces the "SurgT: Surgical Tracking" challenge which was organised in conjunction with MICCAI 2022. There were two purposes for the creation of this challenge: (1) the establishment of the first standardised benchmark for the research community to assess soft-tissue trackers; and (2) to encourage the development of unsupervised deep learning methods, given the lack of annotated da…

    Submitted 30 August, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

  24. Hierarchical Pronunciation Assessment with Multi-Aspect Attention

    Authors: Heejin Do, Yunsu Kim, Gary Geunbae Lee

    Abstract: Automatic pronunciation assessment is a major component of a computer-assisted pronunciation training system. To provide in-depth feedback, scoring pronunciation at various levels of granularity such as phoneme, word, and utterance, with diverse aspects such as accuracy, fluency, and completeness, is essential. However, existing multi-aspect multi-granularity methods simultaneously predict all asp…

    Submitted 26 May, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: Accepted at ICASSP 2023

  25. arXiv:2211.03371  [pdf, other]

    cs.SD eess.AS

    Hi,KIA: A Speech Emotion Recognition Dataset for Wake-Up Words

    Authors: Taesu Kim, SeungHeon Doh, Gyunpyo Lee, Hyungseok Jeon, Juhan Nam, Hyeon-Jeong Suk

    Abstract: Wake-up words (WUW) is a short sentence used to activate a speech recognition system to receive the user's speech input. WUW utterances include not only the lexical information for waking up the system but also non-lexical information such as speaker identity or emotion. In particular, recognizing the user's emotional state may elaborate the voice communication. However, there is few dataset where…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2022

  26. arXiv:2210.08218  [pdf]

    cs.IT eess.SP

    Massive MIMO Evolution Towards 3GPP Release 18

    Authors: Huangping Jin, Kunpeng Liu, Gilwon Lee, Emad J. Farag, Min Zhang, Dalin Zhu, Leiming Zhang, Eko Onggosanusi, Mansoor Shafi, Harsh Tataria

    Abstract: Since the introduction of fifth-generation new radio (5G-NR) in Third Generation Partnership Project (3GPP) Release 15, swift progress has been made to evolve 5G with 3GPP Release 18 emerging. A critical aspect is the design of massive multiple-input multiple-output (MIMO) technology. In this line, this paper makes several important contributions: We provide a comprehensive overview of the evoluti…

    Submitted 15 October, 2022; originally announced October 2022.

    Comments: 23 pages, 37 Figures, one fig in the annex

  27. A Design Method of Distributed Algorithms via Discrete-time Blended Dynamics Theorem

    Authors: Jeong Woo Kim, Jin Gyu Lee, Donggil Lee, Hyungbo Shim

    Abstract: We develop a discrete-time version of the blended dynamics theorem for the use of designing distributed computation algorithms. The blended dynamics theorem enables to predict the behavior of heterogeneous multi-agent systems. Therefore, once we get a blended dynamics for a particular computational task, design idea of node dynamics for individual heterogeneous agents can easily occur. In the cont…

    Submitted 11 October, 2022; originally announced October 2022.

    Journal ref: Automatica, vol. 159, pp. 111371, Jan 2024

  28. arXiv:2210.00263  [pdf, other]

    eess.AS cs.LG cs.SD

    Fine-tuning Wav2vec for Vocal-burst Emotion Recognition

    Authors: Dang-Khanh Nguyen, Sudarshan Pant, Ngoc-Huynh Ho, Guee-Sang Lee, Soo-Hyung Kim, Hyung-Jeong Yang

    Abstract: The ACII Affective Vocal Bursts (A-VB) competition introduces a new topic in affective computing, which is understanding emotional expression using the non-verbal sound of humans. We are familiar with emotion recognition via verbal vocal or facial expression. However, the vocal bursts such as laughs, cries, and signs, are not exploited even though they are very informative for behavior analysis. T…

    Submitted 1 October, 2022; originally announced October 2022.

  29. arXiv:2209.07629  [pdf, other]

    cs.SD cs.LG eess.AS

    Self-Relation Attention and Temporal Awareness for Emotion Recognition via Vocal Burst

    Authors: Dang-Linh Trinh, Minh-Cong Vo, Guee-Sang Lee

    Abstract: The technical report presents our emotion recognition pipeline for high-dimensional emotion task (A-VB High) in The ACII Affective Vocal Bursts (A-VB) 2022 Workshop & Competition. Our proposed method contains three stages. Firstly, we extract the latent features from the raw audio signal and its Mel-spectrogram by self-supervised learning methods. Then, the features from the raw signal are fed to…

    Submitted 26 September, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

  30. Data-Driven Blind Synchronization and Interference Rejection for Digital Communication Signals

    Authors: Alejandro Lancho, Amir Weiss, Gary C. F. Lee, Jennifer Tang, Yuheng Bu, Yury Polyanskiy, Gregory W. Wornell

    Abstract: We study the potential of data-driven deep learning methods for separation of two communication signals from an observation of their mixture. In particular, we assume knowledge on the generation process of one of the signals, dubbed signal of interest (SOI), and no knowledge on the generation process of the second signal, referred to as interference. This form of the single-channel source separati…

    Submitted 11 September, 2022; originally announced September 2022.

    Comments: 9 pages, 6 figures, accepted at IEEE GLOBECOM 2022 (this version contains extended proofs)

  31. arXiv:2209.04440  [pdf, other]

    eess.SY

    Open-loop contraction design

    Authors: Jin Gyu Lee, Thiago B. Burghi, Rodolphe Sepulchre

    Abstract: Given a non-contracting trajectory of a nonlinear system, we consider the question of designing an input perturbation that makes the perturbed trajectory contracting. This paper stresses the analogy of this question with the classical question of feedback stabilization. In particular, it is shown that the existence of an output variable that ensures contraction of the inverse system facilitates th…

    Submitted 9 September, 2022; originally announced September 2022.

    Comments: 7 pages, 6 figures

  32. Exploiting Temporal Structures of Cyclostationary Signals for Data-Driven Single-Channel Source Separation

    Authors: Gary C. F. Lee, Amir Weiss, Alejandro Lancho, Jennifer Tang, Yuheng Bu, Yury Polyanskiy, Gregory W. Wornell

    Abstract: We study the problem of single-channel source separation (SCSS), and focus on cyclostationary signals, which are particularly suitable in a variety of application domains. Unlike classical SCSS approaches, we consider a setting where only examples of the sources are available rather than their models, inspiring a data-driven approach. For source models with underlying cyclostationary Gaussian cons…

    Submitted 22 August, 2022; originally announced August 2022.

  33. Rapid and robust synchronization via weak synaptic coupling Extended arXiv version

    Authors: Jin Gyu Lee, Rodolphe Sepulchre

    Abstract: This paper examines how weak synaptic coupling can achieve rapid synchronization in heterogeneous networks. The assumptions aim at capturing the key mathematical properties that make this possible for biophysical networks. In particular, the combination of nodal excitability and synaptic coupling are shown to be essential to the phenomenon.

    Submitted 17 October, 2023; v1 submitted 18 July, 2022; originally announced July 2022.

    Comments: 20 pages, 3 figures

    Journal ref: Automatica, vol. 160, pp. 111416, Feb 2024

  34. arXiv:2207.00003  [pdf, other]

    cs.LG cs.CV eess.IV

    A Multi-stage Framework with Mean Subspace Computation and Recursive Feedback for Online Unsupervised Domain Adaptation

    Authors: Jihoon Moon, Debasmit Das, C. S. George Lee

    Abstract: In this paper, we address the Online Unsupervised Domain Adaptation (OUDA) problem and propose a novel multi-stage framework to solve real-world situations when the target data are unlabeled and arriving online sequentially in batches. To project the data from the source and the target domains to a common subspace and manipulate the projected data in real-time, our proposed framework institutes a…

    Submitted 23 June, 2022; originally announced July 2022.

  35. arXiv:2206.11558  [pdf, other]

    eess.AS cs.SD

    Adversarial Multi-Task Learning for Disentangling Timbre and Pitch in Singing Voice Synthesis

    Authors: Tae-Woo Kim, Min-Su Kang, Gyeong-Hoon Lee

    Abstract: Recently, deep learning-based generative models have been introduced to generate singing voices. One approach is to predict the parametric vocoder features consisting of explicit speech parameters. This approach has the advantage that the meaning of each feature is explicitly distinguished. Another approach is to predict mel-spectrograms for a neural vocoder. However, parametric vocoders have limi…

    Submitted 13 June, 2024; v1 submitted 23 June, 2022; originally announced June 2022.

    Comments: Accepted by Interspeech 2022

  36. arXiv:2203.03166  [pdf]

    eess.AS cs.SD eess.SP

    HRTF measurement for accurate sound localization cues

    Authors: Gyeong-Tae Lee, Sang-Min Choi, Byeong-Yun Ko, Yong-Hwa Park

    Abstract: A new database of head-related transfer functions (HRTFs) for accurate sound source localization is presented through precise measurement and post-processing in terms of improved frequency bandwidth and causality of head-related impulse responses (HRIRs) for accurate spectral cue (SC) and interaural time difference (ITD), respectively. The improvement effects of the proposed methods on binaural so…

    Submitted 5 April, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: 39 pages, 27 figures, and 1 table

  37. arXiv:2202.10456  [pdf, other]

    cs.LG cs.CR cs.CV eess.IV

    Feasibility Study of Multi-Site Split Learning for Privacy-Preserving Medical Systems under Data Imbalance Constraints in COVID-19, X-Ray, and Cholesterol Dataset

    Authors: Yoo Jeong Ha, Gusang Lee, Minjae Yoo, Soyi Jung, Seehwan Yoo, Joongheon Kim

    Abstract: It seems as though progressively more people are in the race to upload content, data, and information online; and hospitals haven't neglected this trend either. Hospitals are now at the forefront for multi-site medical data sharing to provide groundbreaking advancements in the way health records are shared and patients are diagnosed. Sharing of medical data is essential in modern medical research.…

    Submitted 20 February, 2022; originally announced February 2022.

  38. arXiv:2202.02759  [pdf, other]

    eess.SY

    Node-wise monotone barrier coupling law for formation control

    Authors: Jin Gyu Lee, Cyrus Mostajeran, Graham Van Goffrier

    Abstract: We study a node-wise monotone barrier coupling law, motivated by the synaptic coupling of neural central pattern generators. It is illustrated that this coupling imitates the desirable properties of neural central pattern generators. In particular, the coupling law 1) allows us to assign multiple central patterns on the circle and 2) allows for rapid switching between different patterns via simple…

    Submitted 1 February, 2024; v1 submitted 6 February, 2022; originally announced February 2022.

    Comments: 25 pages, 8 figures

    MSC Class: 93A16; 93B51; 34D06; 34D45; 37E10

  39. arXiv:2112.12343  [pdf, other]

    cs.SD eess.AS

    Graph attentive feature aggregation for text-independent speaker verification

    Authors: Hye-jin Shim, Jungwoo Heo, Jae-han Park, Ga-hui Lee, Ha-Jin Yu

    Abstract: The objective of this paper is to combine multiple frame-level features into a single utterance-level representation considering pairwise relationship. For this purpose, we propose a novel graph attentive feature aggregation module by interpreting each frame-level feature as a node of a graph. The inter-relationship between all possible pairs of features, typically exploited indirectly, can be dir…

    Submitted 22 December, 2021; originally announced December 2021.

    Comments: 5 pages, 1 figure, 6 tables, submitted to ICASSP 2022

  40. Edge-wise funnel output synchronization of heterogeneous agents with relative degree one

    Authors: Jin Gyu Lee, Thomas Berger, Stephan Trenn, Hyungbo Shim

    Abstract: When a group of heterogeneous node dynamics are diffusively coupled with a high coupling gain, the group exhibits a collective emergent behavior which is governed by a simple algebraic average of the node dynamics called the blended dynamics. This finding has been utilized for designing heterogeneous multi-agent systems by building the desired blended dynamics first and then splitting it into the…

    Submitted 16 January, 2023; v1 submitted 11 October, 2021; originally announced October 2021.

    Comments: 14 pages, 3 figures

    Journal ref: Automatica, vol. 156, pp. 111204, Oct 2023

  41. arXiv:2108.10147  [pdf, other

    cs.LG cs.AI eess.IV

    Spatio-Temporal Split Learning for Privacy-Preserving Medical Platforms: Case Studies with COVID-19 CT, X-Ray, and Cholesterol Data

    Authors: Yoo Jeong Ha, Minjae Yoo, Gusang Lee, Soyi Jung, Sae Won Choi, Joongheon Kim, Seehwan Yoo

    Abstract: Machine learning requires a large volume of sample data, especially when it is used in high-accuracy medical applications. However, patient records are one of the most sensitive private information that is not usually shared among institutes. This paper presents spatio-temporal split learning, a distributed deep neural network framework, which is a turning point in allowing collaboration among pri…

    Submitted 20 August, 2021; originally announced August 2021.

  42. Deep learning based cough detection camera using enhanced features

    Authors: Gyeong-Tae Lee, Hyeonuk Nam, Seong-Hu Kim, Sang-Min Choi, Youngkey Kim, Yong-Hwa Park

    Abstract: Coughing is a typical symptom of COVID-19. To detect and localize coughing sounds remotely, a convolutional neural network (CNN) based deep learning model was developed in this work and integrated with a sound camera for the visualization of the cough sounds. The cough detection model is a binary classifier of which the input is a two second acoustic feature and the output is one of two inferences…

    Submitted 24 May, 2022; v1 submitted 28 July, 2021; originally announced July 2021.

    Comments: 28 pages, 20 figures, and 14 tables

    Journal ref: Expert Systems With Applications, Vol. 206, No. 15, pp. 1-20, 2022

  43. arXiv:2107.04526  [pdf, ps, other]

    cs.NI eess.SY

    A Dual-Connection based Handover Scheme for Ultra-Dense Millimeter-Wave Cellular Networks

    Authors: Seongjoon Kang, Siyoung Choi, Goodsol Lee, Saewoong Bahk

    Abstract: Mobile users in an ultra-dense millimeter-wave cellular network experience handover events more frequently than in conventional networks, which results in increased service interruption time and performance degradation due to blockages. Multi-connectivity has been proposed to resolve this, and it also extends the coverage of millimeter-wave communications. In this paper, we propose a dual-connecti…

    Submitted 9 July, 2021; originally announced July 2021.

  44. arXiv:2107.03649  [pdf]

    eess.AS cs.SD

    Heavily Augmented Sound Event Detection utilizing Weak Predictions

    Authors: Hyeonuk Nam, Byeong-Yun Ko, Gyeong-Tae Lee, Seong-Hu Kim, Won-Ho Jung, Sang-Min Choi, Yong-Hwa Park

    Abstract: The performances of Sound Event Detection (SED) systems are greatly limited by the difficulty in generating large strongly labeled dataset. In this work, we used two main approaches to overcome the lack of strongly labeled data. First, we applied heavy data augmentation on input features. Data augmentation methods used include not only conventional methods used in speech/audio domains but also our…

    Submitted 14 September, 2021; v1 submitted 8 July, 2021; originally announced July 2021.

    Comments: Won 3rd place on IEEE DCASE 2021 Task 4

  45. arXiv:2106.15205  [pdf, other]

    eess.AS cs.SD

    N-Singer: A Non-Autoregressive Korean Singing Voice Synthesis System for Pronunciation Enhancement

    Authors: Gyeong-Hoon Lee, Tae-Woo Kim, Hanbin Bae, Min-Ji Lee, Young-Ik Kim, Hoon-Young Cho

    Abstract: Recently, end-to-end Korean singing voice systems have been designed to generate realistic singing voices. However, these systems still suffer from a lack of robustness in terms of pronunciation accuracy. In this paper, we propose N-Singer, a non-autoregressive Korean singing voice system, to synthesize accurate and pronounced Korean singing voices in parallel. N-Singer consists of a Transformer-b…

    Submitted 21 February, 2022; v1 submitted 29 June, 2021; originally announced June 2021.

    Comments: Accepted to INTERSPEECH 2021

  46. arXiv:2106.09899  [pdf, other]

    eess.SY

    Networks obtained by Implicit-Explicit Method: Discrete-time distributed median solver

    Authors: Jin Gyu Lee

    Abstract: In the purpose of making the consensus algorithm robust to outliers, consensus on the median value has recently attracted some attention. It has its applicability in for instance constructing a resilient distributed state estimator. Meanwhile, most of the existing works consider continuous-time algorithms and uses high-gain and discontinuous vector fields. This issues a problem of the need for sma…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: 5 pages, 3 figures

  47. arXiv:2103.13952  [pdf, other]

    cs.RO eess.SY

    Estimation of Closest In-Path Vehicle (CIPV) by Low-Channel LiDAR and Camera Sensor Fusion for Autonomous Vehicle

    Authors: Hyunjin Bae, Gu Lee, Jaeseung Yang, Gwanjun Shin, Yongseob Lim, Gyeungho Choi

    Abstract: In autonomous driving, using a variety of sensors to recognize preceding vehicles in middle and long distance is helpful for improving driving performance and developing various functions. However, if only LiDAR or camera is used in the recognition stage, it is difficult to obtain necessary data due to the limitations of each sensor. In this paper, we proposed a method of converting the tracking d…

    Submitted 25 March, 2021; originally announced March 2021.

    Comments: 13 pages, 19 figures, submitted to MDPI Sensors

  48. Design of heterogeneous multi-agent system for distributed computation

    Authors: Jin Gyu Lee, Hyungbo Shim

    Abstract: A group behavior of a heterogeneous multi-agent system is studied which obeys an "average of individual vector fields" under strong couplings among the agents. Under stability of the averaged dynamics (not asking stability of individual agents), the behavior of heterogeneous multi-agent system can be estimated by the solution to the averaged dynamics. A following idea is to "design" individual age…

    Submitted 19 September, 2021; v1 submitted 31 December, 2020; originally announced January 2021.

    Comments: arXiv admin note: text overlap with arXiv:1804.00638

  49. Synchronization with prescribed transient behavior: Heterogeneous multi-agent systems under funnel coupling Extended arXiv version

    Authors: Jin Gyu Lee, Stephan Trenn, Hyungbo Shim

    Abstract: In this paper, we introduce a nonlinear time-varying coupling law, which can be designed in a fully decentralized manner and achieves approximate synchronization with arbitrary precision, under only mild assumptions on the individual vector fields and the underlying (undirected) graph structure. The proposed coupling law is motivated by the so-called funnel control method studied in adaptive contr…

    Submitted 11 October, 2021; v1 submitted 28 December, 2020; originally announced December 2020.

    Journal ref: Automatica, vol. 141, pp. 110276, July 2022

  50. arXiv:2012.02636  [pdf, other]

    eess.SY

    Adaptive Charging Networks: A Framework for Smart Electric Vehicle Charging

    Authors: Zachary J. Lee, George Lee, Ted Lee, Cheng Jin, Rand Lee, Zhi Low, Daniel Chang, Christine Ortega, Steven H. Low

    Abstract: We describe the architecture and algorithms of the Adaptive Charging Network (ACN), which was first deployed on the Caltech campus in early 2016 and is currently operating at over 100 other sites in the United States. The architecture enables real-time monitoring and control and supports electric vehicle (EV) charging at scale. The ACN adopts a flexible Adaptive Scheduling Algorithm based on conve…

    Submitted 4 December, 2020; originally announced December 2020.

    Comments: 11 pages, 8 figures