
Showing 1–50 of 250 results for author: Lee, K

Searching in archive eess.
  1. arXiv:2406.17376  [pdf, other]

    cs.SD cs.AI eess.AS

    Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection

    Authors: Duc-Tuan Truong, Ruijie Tao, Tuan Nguyen, Hieu-Thi Luong, Kong Aik Lee, Eng Siong Chng

    Abstract: Recent synthetic speech detectors leveraging the Transformer model have superior performance compared to the convolutional neural network counterparts. This improvement could be due to the powerful modeling ability of the multi-head self-attention (MHSA) in the Transformer model, which learns the temporal relationship of each input token. However, artifacts of synthetic speech can be located in sp…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  2. Improving Rehabilitative Assessment with Statistical and Shape Preserving Surrogate Data and Singular Spectrum Analysis

    Authors: T. K. M. Lee, H. W. Chan, K. H. Leo, E. Chew, Ling Zhao, S. Sanei

    Abstract: Time series data are collected in temporal order and are widely used to train systems for prediction, modeling and classification to name a few. These systems require large amounts of data to improve generalization and prevent over-fitting. However there is a comparative lack of time series data due to operational constraints. This situation is alleviated by synthesizing data which have a suitable…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: This version of the paper under the same title, acknowledges the data source and the funding for current research using this data. arXiv admin note: substantial text overlap with arXiv:2404.14211

    Journal ref: 2022 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), Poznan, Poland, 2022, pp. 58-63

  3. arXiv:2406.14176  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    A Multi-Stream Fusion Approach with One-Class Learning for Audio-Visual Deepfake Detection

    Authors: Kyungbok Lee, You Zhang, Zhiyao Duan

    Abstract: This paper addresses the challenge of developing a robust audio-visual deepfake detection model. In practical use cases, new generation algorithms are continually emerging, and these algorithms are not encountered during the development of detection methods. This calls for the generalization ability of the method. Additionally, to ensure the credibility of detection methods, it is beneficial for t…

    Submitted 20 June, 2024; originally announced June 2024.

  4. arXiv:2406.11427  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer

    Authors: Keon Lee, Dong Won Kim, Jaehyeon Kim, Jaewoong Cho

    Abstract: Large-scale diffusion models have shown outstanding generative abilities across multiple modalities including images, videos, and audio. However, text-to-speech (TTS) systems typically involve domain-specific modeling factors (e.g., phonemes and phoneme-level durations) to ensure precise temporal alignments between text and speech, which hinders the efficiency and scalability of diffusion models f…

    Submitted 17 June, 2024; originally announced June 2024.

  5. arXiv:2406.10836  [pdf, other]

    eess.AS cs.SD

    Revisiting and Improving Scoring Fusion for Spoofing-aware Speaker Verification Using Compositional Data Analysis

    Authors: Xin Wang, Tomi Kinnunen, Kong Aik Lee, Paul-Gauthier Noé, Junichi Yamagishi

    Abstract: Fusing outputs from automatic speaker verification (ASV) and spoofing countermeasure (CM) is expected to make an integrated system robust to zero-effort imposters and synthesized spoofing attacks. Many score-level fusion methods have been proposed, but many remain heuristic. This paper revisits score-level fusion using tools from decision theory and presents three main findings. First, fusion by s…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024 Accepted. https://github.com/nii-yamagishilab/SpeechSPC-mini

  6. arXiv:2406.08200  [pdf, other]

    cs.SD cs.AI eess.AS

    Asynchronous Voice Anonymization Using Adversarial Perturbation On Speaker Embedding

    Authors: Rui Wang, Liping Chen, Kong AiK Lee, Zhen-Hua Ling

    Abstract: Voice anonymization has been developed as a technique for preserving privacy by replacing the speaker's voice in a speech signal with that of a pseudo-speaker, thereby obscuring the original voice attributes from machine recognition and human perception. In this paper, we focus on altering the voice attributes against machine recognition while retaining human perception. We referred to this as the…

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  7. arXiv:2406.07909  [pdf, other]

    eess.AS cs.CL cs.SD stat.ML

    Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation

    Authors: Eungbeom Kim, Hantae Kim, Kyogu Lee

    Abstract: Transformer encoder with connectionist temporal classification (CTC) framework is widely used for automatic speech recognition (ASR). However, knowledge distillation (KD) for ASR displays a problem of disagreement between teacher-student models in frame-level alignment which ultimately hinders it from improving the student model's performance. In order to resolve this problem, this paper introduce…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  8. arXiv:2405.00367  [pdf, other]

    cs.IR cs.AI cs.SD eess.AS

    Distance Sampling-based Paraphraser Leveraging ChatGPT for Text Data Manipulation

    Authors: Yoori Oh, Yoseob Han, Kyogu Lee

    Abstract: There has been growing interest in audio-language retrieval research, where the objective is to establish the correlation between audio and text modalities. However, most audio-text paired datasets often lack rich expression of the text data compared to the audio samples. One of the significant challenges facing audio-text datasets is the presence of similar or identical captions despite different…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted at SIGIR 2024 short paper track

  9. arXiv:2404.15305  [pdf, other]

    eess.SP cs.LG

    ADAPT^2: Adapting Pre-Trained Sensing Models to End-Users via Self-Supervision Replay

    Authors: Hyungjun Yoon, Jaehyun Kwak, Biniyam Aschalew Tolera, Gaole Dai, Mo Li, Taesik Gong, Kimin Lee, Sung-Ju Lee

    Abstract: Self-supervised learning has emerged as a method for utilizing massive unlabeled data for pre-training models, providing an effective feature extractor for various mobile sensing applications. However, when deployed to end-users, these models encounter significant domain shifts attributed to user diversity. We investigate the performance degradation that occurs when self-supervised models are fine…

    Submitted 29 March, 2024; originally announced April 2024.

  10. arXiv:2404.15302  [pdf, other]

    eess.SP math.OC math.ST

    Robust Phase Retrieval by Alternating Minimization

    Authors: Seonho Kim, Kiryung Lee

    Abstract: We consider a least absolute deviation (LAD) approach to the robust phase retrieval problem that aims to recover a signal from its absolute measurements corrupted with sparse noise. To solve the resulting non-convex optimization problem, we propose a robust alternating minimization (Robust-AM) derived as an unconstrained Gauss-Newton method. To solve the inner optimization arising in each step of…

    Submitted 28 March, 2024; originally announced April 2024.

  11. Fidelitous Augmentation of Human Accelerometric Data for Deep Learning

    Authors: Tracey K. M. Lee, H. W. Chan, K. H. Leo, Effie Chew, L. Zhao, Saeid Sanei

    Abstract: Time series (TS) data have consistently been in short supply, yet their demand remains high for training systems in prediction, modeling, classification, and various other applications. Synthesis can serve to expand the sample population, yet it is crucial to maintain the statistical characteristics between the synthesized and the original TS: this ensures consistent sampling of data for both tra…

    Submitted 22 April, 2024; originally announced April 2024.

  12. arXiv:2404.03296  [pdf, other]

    cs.CV eess.IV

    AdaBM: On-the-Fly Adaptive Bit Mapping for Image Super-Resolution

    Authors: Cheeun Hong, Kyoung Mu Lee

    Abstract: Although image super-resolution (SR) problem has experienced unprecedented restoration accuracy with deep neural networks, it has yet limited versatile applications due to the substantial computational costs. Since different input images for SR face different restoration difficulties, adapting computational costs based on the input image, referred to as adaptive inference, has emerged as a promisi…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  13. arXiv:2404.02781  [pdf, other]

    eess.AS cs.SD

    CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech

    Authors: Jaehyeon Kim, Keon Lee, Seungjun Chung, Jaewoong Cho

    Abstract: With the emergence of neural audio codecs, which encode multiple streams of discrete tokens from audio, large language models have recently gained attention as a promising approach for zero-shot Text-to-Speech (TTS) synthesis. Despite the ongoing rush towards scaling paradigms, audio tokenization ironically amplifies the scalability challenge, stemming from its long sequence length and the complex…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: ICLR 2024

  14. arXiv:2404.01636  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO eess.SY

    Learning to Control Camera Exposure via Reinforcement Learning

    Authors: Kyunghyun Lee, Ukcheol Shin, Byeong-Uk Lee

    Abstract: Adjusting camera exposure in arbitrary lighting conditions is the first step to ensure the functionality of computer vision applications. Poorly adjusted camera exposure often leads to critical failure and performance degradation. Traditional camera exposure control methods require multiple convergence steps and time-consuming processes, making them unsuitable for dynamic lighting conditions. In t…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024, *First two authors contributed equally to this work. Project page link: https://sites.google.com/view/drl-ae

  15. arXiv:2404.00856  [pdf, other]

    cs.SD cs.AI eess.AS

    Removing Speaker Information from Speech Representation using Variable-Length Soft Pooling

    Authors: Injune Hwang, Kyogu Lee

    Abstract: Recently, there have been efforts to encode the linguistic information of speech using a self-supervised framework for speech synthesis. However, predicting representations from surrounding representations can inadvertently entangle speaker information in the speech representation. This paper aims to remove speaker information by exploiting the structured nature of speech, composed of discrete uni…

    Submitted 31 March, 2024; originally announced April 2024.

  16. arXiv:2403.06404  [pdf, other]

    cs.SD cs.LG eess.AS

    Cosine Scoring with Uncertainty for Neural Speaker Embedding

    Authors: Qiongqiong Wang, Kong Aik Lee

    Abstract: Uncertainty modeling in speaker representation aims to learn the variability present in speech utterances. While the conventional cosine-scoring is computationally efficient and prevalent in speaker recognition, it lacks the capability to handle uncertainty. To address this challenge, this paper proposes an approach for estimating uncertainty at the speaker embedding front-end and propagating it t…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: 5 pages, 4 figures

    Journal ref: IEEE Signal Processing Letters 2024

  17. arXiv:2403.03294  [pdf, other]

    eess.SP

    Small-Noise Sensitivity Analysis of Locating Pulses in the Presence of Adversarial Perturbation

    Authors: Meghna Kalra, Maxime Ferreira Da Costa, Kiryung Lee

    Abstract: A fundamental small-noise sensitivity analysis of spike localization in the presence of adversarial perturbations and arbitrary point spread function (PSF) is presented. The analysis leverages the local Lipschitz property of the inverse map from measurement noise to parameter estimate. In the small noise regime, the local Lipschitz constant converges to the spectral norm of the noiseless Jacobian…

    Submitted 5 March, 2024; originally announced March 2024.

  18. arXiv:2403.00529  [pdf, other]

    cs.SD cs.LG eess.AS

    VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech Synthesis

    Authors: Weiwei Lin, Chenhang He, Man-Wai Mak, Jiachen Lian, Kong Aik Lee

    Abstract: Achieving nuanced and accurate emulation of human voice has been a longstanding goal in artificial intelligence. Although significant progress has been made in recent years, the mainstream of speech synthesis models still relies on supervised speaker modeling and explicit reference utterances. However, there are many aspects of human voice, such as emotion, intonation, and speaking style, for whic…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: preprint

  19. arXiv:2402.03399  [pdf, other]

    eess.IV cs.CV

    Rethinking RGB Color Representation for Image Restoration Models

    Authors: Jaerin Lee, JoonKyu Park, Sungyong Baik, Kyoung Mu Lee

    Abstract: Image restoration models are typically trained with a pixel-wise distance loss defined over the RGB color representation space, which is well known to be a source of blurry and unrealistic textures in the restored images. The reason, we believe, is that the three-channel RGB space is insufficient for supervising the restoration models. To this end, we augment the representation to hold structural…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 31 pages (11 pages main manuscript + 20 pages appendices), 22 figures

  20. arXiv:2402.01298  [pdf, other]

    eess.AS cs.AI cs.SD

    Learning Semantic Information from Raw Audio Signal Using Both Contextual and Phonetic Representations

    Authors: Jaeyeon Kim, Injune Hwang, Kyogu Lee

    Abstract: We propose a framework to learn semantics from raw audio signals using two types of representations, encoding contextual and phonetic information respectively. Specifically, we introduce a speech-to-unit processing pipeline that captures two types of representations with different time resolutions. For the language model, we adopt a dual-channel architecture to incorporate both types of representa…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: Accepted to ICASSP 2024

  21. arXiv:2401.15323  [pdf, other]

    cs.SD cs.AI cs.IR eess.AS

    Music Auto-Tagging with Robust Music Representation Learned via Domain Adversarial Training

    Authors: Haesun Joung, Kyogu Lee

    Abstract: Music auto-tagging is crucial for enhancing music discovery and recommendation. Existing models in Music Information Retrieval (MIR) struggle with real-world noise such as environmental and speech sounds in multimedia content. This study proposes a method inspired by speech-related tasks to enhance music auto-tagging performance in noisy settings. The approach integrates Domain Adversarial Trainin…

    Submitted 27 January, 2024; originally announced January 2024.

    Comments: 5 pages, 3 figures, accepted to ICASSP 2024

  22. Adversarial speech for voice privacy protection from Personalized Speech generation

    Authors: Shihao Chen, Liping Chen, Jie Zhang, KongAik Lee, Zhenhua Ling, Lirong Dai

    Abstract: The rapid progress in personalized speech generation technology, including personalized text-to-speech (TTS) and voice conversion (VC), poses a challenge in distinguishing between generated and real speech for human listeners, resulting in an urgent demand in protecting speakers' voices from malicious misuse. In this regard, we propose a speaker protection method based on adversarial attacks. The…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  23. arXiv:2401.11156  [pdf, other]

    cs.CR cs.AI cs.SD eess.AS

    Generalizing Speaker Verification for Spoof Awareness in the Embedding Space

    Authors: Xuechen Liu, Md Sahidullah, Kong Aik Lee, Tomi Kinnunen

    Abstract: It is now well-known that automatic speaker verification (ASV) systems can be spoofed using various types of adversaries. The usual approach to counteract ASV systems against such attacks is to develop a separate spoofing countermeasure (CM) module to classify speech input either as a bonafide, or a spoofed utterance. Nevertheless, such a design requires additional computation and utilization effo…

    Submitted 27 January, 2024; v1 submitted 20 January, 2024; originally announced January 2024.

    Comments: Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing (doi updated)

  24. arXiv:2401.03850  [pdf, other]

    eess.AS cs.SD

    Inverse Nonlinearity Compensation of Hyperelastic Deformation in Dielectric Elastomer for Acoustic Actuation

    Authors: Jin Woo Lee, Gwang Seok An, Jeong-Yun Sun, Kyogu Lee

    Abstract: This paper delves into the analysis of nonlinear deformation induced by dielectric actuation in pre-stressed ideal dielectric elastomers. It formulates a nonlinear ordinary differential equation governing this deformation based on the hyperelastic model under dielectric stress. Through numerical integration and neural network approximations, the relationship between voltage and stretch is establis…

    Submitted 8 January, 2024; originally announced January 2024.

  25. arXiv:2401.03650  [pdf, other]

    eess.AS cs.SD eess.SP

    DDD: A Perceptually Superior Low-Response-Time DNN-based Declipper

    Authors: Jayeon Yi, Junghyun Koo, Kyogu Lee

    Abstract: Clipping is a common nonlinear distortion that occurs whenever the input or output of an audio system exceeds the supported range. This phenomenon undermines not only the perception of speech quality but also downstream processes utilizing the disrupted signal. Therefore, a real-time-capable, robust, and low-response-time method for speech declipping (SD) is desired. In this work, we introduce DDD…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: To appear, ICASSP 2024. Demo samples at https://stet-stet.github.io/DDD, repo at https://github.com/stet-stet/DDD

  26. arXiv:2401.02626  [pdf, other]

    cs.SD eess.AS

    Gradient weighting for speaker verification in extremely low Signal-to-Noise Ratio

    Authors: Yi Ma, Kong Aik Lee, Ville Hautamäki, Meng Ge, Haizhou Li

    Abstract: Speaker verification is hampered by background noise, particularly at extremely low Signal-to-Noise Ratio (SNR) under 0 dB. It is difficult to suppress noise without introducing unwanted artifacts, which adversely affects speaker verification. We proposed the mechanism called Gradient Weighting (Grad-W), which dynamically identifies and reduces artifact noise during prediction. The mechanism is ba…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  27. arXiv:2312.15400  [pdf, other]

    cs.SD cs.AI eess.AS

    Combinatorial music generation model with song structure graph analysis

    Authors: Seonghyeon Go, Kyogu Lee

    Abstract: In this work, we propose a symbolic music generation model with the song structure graph analysis network. We construct a graph that uses information such as note sequence and instrument as node features, while the correlation between note sequences acts as the edge feature. We trained a Graph Neural Network to obtain node representation in the graph, then we use node representation as input of Un…

    Submitted 23 December, 2023; originally announced December 2023.

    Comments: 5 pages (4 pages of paper and 1 page of references), 3 figures

  28. arXiv:2312.09842  [pdf, ps, other]

    cs.SD eess.AS

    On the compression of shallow non-causal ASR models using knowledge distillation and tied-and-reduced decoder for low-latency on-device speech recognition

    Authors: Nagaraj Adiga, Jinhwan Park, Chintigari Shiva Kumar, Shatrughan Singh, Kyungmin Lee, Chanwoo Kim, Dhananjaya Gowda

    Abstract: Recently, the cascaded two-pass architecture has emerged as a strong contender for on-device automatic speech recognition (ASR). A cascade of causal and shallow non-causal encoders coupled with a shared decoder enables operation in both streaming and look-ahead modes. In this paper, we propose shallow cascaded model by combining various model compression techniques such as knowledge distillation,…

    Submitted 15 December, 2023; originally announced December 2023.

  29. arXiv:2312.03620  [pdf, other]

    eess.AS cs.SD

    Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification

    Authors: Tianchi Liu, Kong Aik Lee, Qiongqiong Wang, Haizhou Li

    Abstract: Previous studies demonstrate the impressive performance of residual neural networks (ResNet) in speaker verification. The ResNet models treat the time and frequency dimensions equally. They follow the default stride configuration designed for image recognition, where the horizontal and vertical axes exhibit similarities. This approach ignores the fact that time and frequency are asymmetric in spee…

    Submitted 24 April, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: Accepted to IEEE/ACM Transactions on Audio, Speech, and Language Processing. Open Access: https://ieeexplore.ieee.org/abstract/document/10497864

  30. arXiv:2311.18505  [pdf, other]

    cs.SD eess.AS eess.SP

    String Sound Synthesizer on GPU-accelerated Finite Difference Scheme

    Authors: Jin Woo Lee, Min Jun Choi, Kyogu Lee

    Abstract: This paper introduces a nonlinear string sound synthesizer, based on a finite difference simulation of the dynamic behavior of strings under various excitations. The presented synthesizer features a versatile string simulation engine capable of stochastic parameterization, encompassing fundamental frequency modulation, stiffness, tension, frequency-dependent loss, and excitation control. This open…

    Submitted 8 January, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: To appear in ICASSP 2024

  31. arXiv:2311.13687  [pdf, other]

    cs.LG cs.MM cs.SD eess.AS

    Beat-Aligned Spectrogram-to-Sequence Generation of Rhythm-Game Charts

    Authors: Jayeon Yi, Sungho Lee, Kyogu Lee

    Abstract: In the heart of "rhythm games" - games where players must perform actions in sync with a piece of music - are "charts", the directives to be given to players. We newly formulate chart generation as a sequence generation task and train a Transformer using a large dataset. We also introduce tempo-informed preprocessing and training procedures, some of which are suggested to be integral for a success…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: ISMIR 2023 LBD. Demo videos and code at stet-stet.github.io/goct

  32. arXiv:2311.10306  [pdf, other]

    eess.IV cs.CV cs.LG

    MPSeg : Multi-Phase strategy for coronary artery Segmentation

    Authors: Jonghoe Ku, Yong-Hee Lee, Junsup Shin, In Kyu Lee, Hyun-Woo Kim

    Abstract: Accurate segmentation of coronary arteries is a pivotal process in assessing cardiovascular diseases. However, the intricate structure of the cardiovascular system presents significant challenges for automatic segmentation, especially when utilizing methodologies like the SYNTAX Score, which relies extensively on detailed structural information for precise risk stratification. To address these dif…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: MICCAI 2023 Conference ARCADE Challenge

  33. arXiv:2311.06712  [pdf, other]

    eess.IV

    PuzzleTuning: Explicitly Bridge Pathological and Natural Image with Puzzles

    Authors: Tianyi Zhang, Shangqing Lyu, Yanli Lei, Sicheng Chen, Nan Ying, Yufang He, Yu Zhao, Yunlu Feng, Hwee Kuan Lee, Guanglei Zhang

    Abstract: Pathological image analysis is a crucial field in computer vision. Due to the annotation scarcity in the pathological field, pre-training with self-supervised learning (SSL) is widely applied to learn on unlabeled images. However, the current SSL-based pathological pre-training: (1) does not explicitly explore the essential focuses of the pathological field, and (2) does not effectively bridge wit…

    Submitted 22 April, 2024; v1 submitted 11 November, 2023; originally announced November 2023.

    Comments: 13 pages, 9 figures, 8 tables

  34. arXiv:2311.02581  [pdf, other]

    cs.SD eess.AS

    Yet Another Generative Model For Room Impulse Response Estimation

    Authors: Sungho Lee, Hyeong-Seok Choi, Kyogu Lee

    Abstract: Recent neural room impulse response (RIR) estimators typically comprise an encoder for reference audio analysis and a generator for RIR synthesis. Especially, it is the performance of the generator that directly influences the overall estimation quality. In this context, we explore an alternate generator architecture for improved performance. We first train an autoencoder with residual quantizatio…

    Submitted 5 November, 2023; originally announced November 2023.

    Comments: WASPAA 2023

  35. arXiv:2310.03457  [pdf, other]

    cs.AI eess.IV

    A Quantitatively Interpretable Model for Alzheimer's Disease Prediction Using Deep Counterfactuals

    Authors: Kwanseok Oh, Da-Woon Heo, Ahmad Wisnu Mulyadi, Wonsik Jung, Eunsong Kang, Kun Ho Lee, Heung-Il Suk

    Abstract: Deep learning (DL) for predicting Alzheimer's disease (AD) has provided timely intervention in disease progression yet still demands attentive interpretability to explain how their DL models make definitive decisions. Recently, counterfactual reasoning has gained increasing attention in medical research because of its ability to provide a refined visual explanatory map. However, such visual explan…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: 15 pages, 5 figures, 4 tables

  36. arXiv:2310.01128  [pdf, other]

    eess.AS cs.AI

    Disentangling Voice and Content with Self-Supervision for Speaker Recognition

    Authors: Tianchi Liu, Kong Aik Lee, Qiongqiong Wang, Haizhou Li

    Abstract: For speaker recognition, it is difficult to extract an accurate speaker representation from speech because of its mixture of speaker traits and content. This paper proposes a disentanglement framework that simultaneously models speaker traits and content variability in speech. It is realized with the use of three Gaussian inference layers, each consisting of a learnable transition model that extra…

    Submitted 1 November, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: Accepted to NeurIPS 2023 (main track)

  37. Emphasized Non-Target Speaker Knowledge in Knowledge Distillation for Automatic Speaker Verification

    Authors: Duc-Tuan Truong, Ruijie Tao, Jia Qi Yip, Kong Aik Lee, Eng Siong Chng

    Abstract: Knowledge distillation (KD) is used to enhance automatic speaker verification performance by ensuring consistency between large teacher networks and lightweight student networks at the embedding level or label level. However, the conventional label-level KD overlooks the significant knowledge from non-target speakers, particularly their classification probabilities, which can be crucial for automa…

    Submitted 14 January, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

    Comments: Accepted by ICASSP 2024

    Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024, pp. 10336-10340

  38. arXiv:2309.13573  [pdf, other]

    cs.SD eess.AS

    The second multi-channel multi-party meeting transcription challenge (M2MeT 2.0): A benchmark for speaker-attributed ASR

    Authors: Yuhao Liang, Mohan Shi, Fan Yu, Yangze Li, Shiliang Zhang, Zhihao Du, Qian Chen, Lei Xie, Yanmin Qian, Jian Wu, Zhuo Chen, Kong Aik Lee, Zhijie Yan, Hui Bu

    Abstract: With the success of the first Multi-channel Multi-party Meeting Transcription challenge (M2MeT), the second M2MeT challenge (M2MeT 2.0) held in ASRU2023 particularly aims to tackle the complex task of speaker-attributed ASR (SA-ASR), which directly addresses the practical and challenging problem of "who spoke what at when" at typical meeting scenario. We particularly established two sub-tr…

    Submitted 5 October, 2023; v1 submitted 24 September, 2023; originally announced September 2023.

    Comments: 8 pages, Accepted by ASRU2023

  39. arXiv:2309.12237  [pdf, other]

    cs.CR cs.LG cs.SD eess.AS eess.IV stat.CO

    t-EER: Parameter-Free Tandem Evaluation of Countermeasures and Biometric Comparators

    Authors: Tomi Kinnunen, Kong Aik Lee, Hemlata Tak, Nicholas Evans, Andreas Nautsch

    Abstract: Presentation attack (spoofing) detection (PAD) typically operates alongside biometric verification to improve reliability in the face of spoofing attacks. Even though the two sub-systems operate in tandem to solve the single task of reliable biometric verification, they address different detection tasks and are hence typically evaluated separately. Evidence shows that this approach is suboptimal. W…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence. For associated codes, see https://github.com/TakHemlata/T-EER (Github) and https://colab.research.google.com/drive/1ga7eiKFP11wOFMuZjThLJlkBcwEG6_4m?usp=sharing (Google Colab)

  40. arXiv:2308.16265  [pdf, other]

    eess.SP

    Stable estimation of pulses of unknown shape from multiple snapshots via ESPRIT

    Authors: Meghna Kalra, Kiryung Lee

    Abstract: We consider the problem of resolving overlapping pulses from noisy multi-snapshot measurements, which has been a problem central to various applications including medical imaging and array signal processing. ESPRIT algorithm has been used to estimate the pulse locations. However, existing theoretical analysis is restricted to ideal assumptions on signal and measurement models. We present a novel p…

    Submitted 30 August, 2023; originally announced August 2023.

  41. arXiv:2308.13001  [pdf, other]

    math.OC eess.SY

    On Correcting Errors in Existing Mathematical Approaches for UAV Trajectory Design Considering No-Fly-Zones

    Authors: Kanghyun Heo, Gitae Park, Kisong Lee

    Abstract: Motivated by the fact that current mathematical methods for the trajectory design of an unmanned aerial vehicle (UAV) considering no-fly-zones (NFZs) cannot perfectly avoid NFZs throughout the entire continuous trajectory, this study introduces a new constraint that ensures the complete avoidance of NFZs. Moreover, we provide mathematical proof demonstrating that a UAV operating within the propose…

    Submitted 11 August, 2023; originally announced August 2023.

    Comments: 6 pages, 6 figures

  42. arXiv:2308.12599  [pdf, other]

    cs.SD cs.LG eess.AS

    Exploiting Time-Frequency Conformers for Music Audio Enhancement

    Authors: Yunkee Chae, Junghyun Koo, Sungho Lee, Kyogu Lee

    Abstract: With the proliferation of video platforms on the internet, recording musical performances by mobile devices has become commonplace. However, these recordings often suffer from degradation such as noise and reverberation, which negatively impact the listening experience. Consequently, the necessity for music audio enhancement (referred to as music enhancement from this point onward), involving the… ▽ More

    Submitted 24 August, 2023; originally announced August 2023.

    Comments: Accepted by ACM Multimedia 2023

  43. arXiv:2308.01594  [pdf, other

    cs.CV eess.IV

    Reference-Free Isotropic 3D EM Reconstruction using Diffusion Models

    Authors: Kyungryun Lee, Won-Ki Jeong

    Abstract: Electron microscopy (EM) images exhibit anisotropic axial resolution due to the characteristics inherent to the imaging modality, presenting challenges in analysis and downstream tasks. In this paper, we propose a diffusion-model-based framework that overcomes the limitations of requiring reference data or prior knowledge about the degradation process. Our approach utilizes 2D diffusion models to c… ▽ More

    Submitted 3 August, 2023; originally announced August 2023.

  44. arXiv:2308.01187  [pdf, other

    cs.SD eess.AS

    Music De-limiter Networks via Sample-wise Gain Inversion

    Authors: Chang-Bin Jeon, Kyogu Lee

    Abstract: The loudness war, an ongoing phenomenon in the music industry characterized by the increasing final loudness of music while reducing its dynamic range, has been a controversial topic for decades. Music mastering engineers have used limiters to heavily compress and make music louder, which can induce ear fatigue and hearing loss in listeners. In this paper, we introduce music de-limiter networks th… ▽ More

    Submitted 23 June, 2024; v1 submitted 2 August, 2023; originally announced August 2023.

    Comments: Results corrected as some bugs were found in the previous codes and dataset. Presented at WASPAA 2023

  45. arXiv:2307.13337  [pdf, other

    cs.CV eess.IV

    Overcoming Distribution Mismatch in Quantizing Image Super-Resolution Networks

    Authors: Cheeun Hong, Kyoung Mu Lee

    Abstract: Quantization is a promising approach to reduce the high computational complexity of image super-resolution (SR) networks. However, compared to high-level tasks like image classification, low-bit quantization leads to severe accuracy loss in SR networks. This is because feature distributions of SR networks are significantly divergent for each channel or input image, and it is thus difficult to determi… ▽ More

    Submitted 25 July, 2023; originally announced July 2023.

  46. arXiv:2307.12751  [pdf, other

    eess.IV cs.CV

    ICF-SRSR: Invertible scale-Conditional Function for Self-Supervised Real-world Single Image Super-Resolution

    Authors: Reyhaneh Neshatavar, Mohsen Yavartanoo, Sanghyun Son, Kyoung Mu Lee

    Abstract: Single image super-resolution (SISR) is a challenging ill-posed problem that aims to up-sample a given low-resolution (LR) image to a high-resolution (HR) counterpart. Due to the difficulty in obtaining real LR-HR training pairs, recent approaches are trained on simulated LR images degraded by simplified down-sampling operators, e.g., bicubic. Such an approach can be problematic in practice becaus… ▽ More

    Submitted 31 August, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

  47. arXiv:2307.12644  [pdf, other

    eess.IV cs.AI cs.CV cs.LG eess.SP

    Remote Bio-Sensing: Open Source Benchmark Framework for Fair Evaluation of rPPG

    Authors: Dae-Yeol Kim, Eunsu Goh, KwangKee Lee, JongEui Chae, JongHyeon Mun, Junyeong Na, Chae-bong Sohn, Do-Yup Kim

    Abstract: rPPG (Remote photoplethysmography) is a technology that measures and analyzes BVP (Blood Volume Pulse) by using the light absorption characteristics of hemoglobin captured through a camera. Analyzing the measured BVP can derive various physiological signals such as heart rate, stress level, and blood pressure, which can be applied to various applications such as telemedicine, remote patient monito… ▽ More

    Submitted 18 August, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

    Comments: 20 pages, 10 figures

    MSC Class: 68T45; 68T07 ACM Class: I.4.9; I.5.4; I.2

  48. arXiv:2307.12576  [pdf, other

    eess.AS cs.IR cs.LG cs.SD

    Self-refining of Pseudo Labels for Music Source Separation with Noisy Labeled Data

    Authors: Junghyun Koo, Yunkee Chae, Chang-Bin Jeon, Kyogu Lee

    Abstract: Music source separation (MSS) faces challenges due to the limited availability of correctly-labeled individual instrument tracks. With the push to acquire larger datasets to improve MSS performance, the inevitability of encountering mislabeled individual instrument tracks becomes a significant challenge to address. This paper introduces an automated technique for refining the labels in a partially… ▽ More

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: 24th International Society for Music Information Retrieval Conference (ISMIR 2023)

  49. arXiv:2307.05748  [pdf, other

    eess.SP

    Dual-Polarized IRS-Assisted MIMO Network

    Authors: Muteen Munawar, Kyungchun Lee

    Abstract: This study considers a dual-polarized intelligent reflecting surface (DP-IRS)-assisted multiple-input multiple-output (MIMO) single-user wireless communication system. The transmitter and receiver are equipped with DP antennas, and each antenna features a separate phase shifter for each polarization. We attempt to maximize the system's spectral efficiency (SE) by optimizing the operations of the r… ▽ More

    Submitted 11 July, 2023; originally announced July 2023.

    Comments: 32 pages, 13 figures, 1 table

    MSC Class: 15B52 ACM Class: H.0

  50. arXiv:2306.12978  [pdf, other

    cs.IT eess.SP

    Rate-Splitting Multiple Access for 6G Networks: Ten Promising Scenarios and Applications

    Authors: Jeonghun Park, Byungju Lee, Jinseok Choi, Hoon Lee, Namyoon Lee, Seok-Hwan Park, Kyoung-Jae Lee, Junil Choi, Sung Ho Chae, Sang-Woon Jeon, Kyung Sup Kwak, Bruno Clerckx, Wonjae Shin

    Abstract: In the upcoming 6G era, multiple access (MA) will play an essential role in achieving high throughput performances required in a wide range of wireless applications. Since MA and interference management are closely related issues, the conventional MA techniques are limited in that they cannot provide near-optimal performance in universal interference regimes. Recently, rate-splitting multiple acce… ▽ More

    Submitted 22 June, 2023; originally announced June 2023.

    Comments: 17 pages, 6 figures, submitted to IEEE Network Magazine