Showing 1–50 of 159 results for author: Lee, C

Searching in archive eess.
  1. arXiv:2406.19666  [pdf, other]

    cs.CV eess.IV

    CSAKD: Knowledge Distillation with Cross Self-Attention for Hyperspectral and Multispectral Image Fusion

    Authors: Chih-Chung Hsu, Chih-Chien Ni, Chia-Ming Lee, Li-Wei Kang

    Abstract: Hyperspectral imaging, capturing detailed spectral information for each pixel, is pivotal in diverse scientific and industrial applications. Yet, the acquisition of high-resolution (HR) hyperspectral images (HSIs) often needs to be addressed due to the hardware limitations of existing imaging systems. A prevalent workaround involves capturing both a high-resolution multispectral image (HR-MSI) and…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Submitted to TIP 2024

  2. arXiv:2406.15160  [pdf, other]

    eess.AS eess.SP

    Exploring Audio-Visual Information Fusion for Sound Event Localization and Detection In Low-Resource Realistic Scenarios

    Authors: Ya Jiang, Qing Wang, Jun Du, Maocheng Hu, Pengfei Hu, Zeyan Liu, Shi Cheng, Zhaoxu Nian, Yuxuan Dong, Mingqi Cai, Xin Fang, Chin-Hui Lee

    Abstract: This study presents an audio-visual information fusion approach to sound event localization and detection (SELD) in low-resource scenarios. We aim at utilizing audio and video modality information through cross-modal learning and multi-modal fusion. First, we propose a cross-modal teacher-student learning (TSL) framework to transfer information from an audio-only teacher model, trained on a rich c…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted by ICME 2024

  3. arXiv:2406.09317  [pdf, other]

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, **ming Guo, Xiaolin Chen, **gcheng Wang, Yih Chung Tham, et al. (24 additional authors not shown)

    Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. For RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources…

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  4. arXiv:2406.05065  [pdf, ps, other]

    eess.AS

    Emo-bias: A Large Scale Evaluation of Social Bias on Speech Emotion Recognition

    Authors: Yi-Cheng Lin, Haibin Wu, Huang-Cheng Chou, Chi-Chun Lee, Hung-yi Lee

    Abstract: The rapid growth of Speech Emotion Recognition (SER) has diverse global applications, from improving human-computer interactions to aiding mental health diagnostics. However, SER models might contain social bias toward gender, leading to unfair outcomes. This study analyzes gender bias in SER models trained with Self-Supervised Learning (SSL) at scale, exploring factors influencing it. SSL-based S…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  5. arXiv:2406.02488  [pdf, other]

    eess.AS cs.CL cs.SD

    Language-Universal Speech Attributes Modeling for Zero-Shot Multilingual Spoken Keyword Recognition

    Authors: Hao Yen, Pin-Jui Ku, Sabato Marco Siniscalchi, Chin-Hui Lee

    Abstract: We propose a novel language-universal approach to end-to-end automatic spoken keyword recognition (SKR) leveraging upon (i) a self-supervised pre-trained model, and (ii) a set of universal speech attributes (manner and place of articulation). Specifically, Wav2Vec2.0 is used to generate robust speech representations, followed by a linear output layer to produce attribute sequences. A non-trainable…

    Submitted 4 June, 2024; originally announced June 2024.

  6. arXiv:2405.01264  [pdf, other]

    eess.SY

    Model Predictive Guidance for Fuel-Optimal Landing of Reusable Launch Vehicles

    Authors: Ki-Wook Jung, Sang-Don Lee, Cheol-Goo Jung, Chang-Hun Lee

    Abstract: This paper introduces a landing guidance strategy for reusable launch vehicles (RLVs) using a model predictive approach based on sequential convex programming (SCP). The proposed approach devises two distinct optimal control problems (OCPs): planning a fuel-optimal landing trajectory that accommodates practical path constraints specific to RLVs, and determining real-time optimal tracking commands.…

    Submitted 2 May, 2024; originally announced May 2024.

  7. arXiv:2404.17585  [pdf, other]

    cs.HC cs.AI cs.LG eess.SP

    NeuroNet: A Novel Hybrid Self-Supervised Learning Framework for Sleep Stage Classification Using Single-Channel EEG

    Authors: Cheol-Hui Lee, Hakseung Kim, Hyun-jee Han, Min-Kyung Jung, Byung C. Yoon, Dong-Joo Kim

    Abstract: The classification of sleep stages is a pivotal aspect of diagnosing sleep disorders and evaluating sleep quality. However, the conventional manual scoring process, conducted by clinicians, is time-consuming and prone to human bias. Recent advancements in deep learning have substantially propelled the automation of sleep stage classification. Nevertheless, challenges persist, including the need fo…

    Submitted 13 May, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: 14 pages, 4 figures

  8. arXiv:2404.15781  [pdf, other]

    cs.CV cs.AI eess.IV

    Real-Time Compressed Sensing for Joint Hyperspectral Image Transmission and Restoration for CubeSat

    Authors: Chih-Chung Hsu, Chih-Yu Jian, Eng-Shen Tu, Chia-Ming Lee, Guan-Lin Chen

    Abstract: This paper addresses the challenges associated with hyperspectral image (HSI) reconstruction from miniaturized satellites, which often suffer from stripe effects and are computationally resource-limited. We propose a Real-Time Compressed Sensing (RTCS) network designed to be lightweight and require only relatively few training samples for efficient and robust HSI reconstruction in the presence of…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Accepted by TGRS 2024

  9. arXiv:2404.15181  [pdf]

    cs.SD cs.HC eess.AS

    Tailors: New Music Timbre Visualizer to Entertain Music Through Imagery

    Authors: ChungHa Lee

    Abstract: In this paper, I have implemented a timbre visualization system called Tailors. Through the experiment with 27 MIR users, Tailors was found to be effective in conveying timbral warmth, brightness, depth, shallowness, hardness, roughness, and sharpness features of music compared to the only music condition and basic visualization. All scores of Tailors in the music imagery and music entertainment s…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 47 pages, 9 figures, 5 tables

    ACM Class: J.5

  10. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  11. arXiv:2404.01643  [pdf, other]

    eess.IV cs.CV cs.LG

    A Closer Look at Spatial-Slice Features Learning for COVID-19 Detection

    Authors: Chih-Chung Hsu, Chia-Ming Lee, Yang Fan Chiang, Yi-Shiuan Chou, Chih-Yu Jiang, Shen-Chieh Tai, Chi-Han Tsai

    Abstract: Conventional Computed Tomography (CT) imaging recognition faces two significant challenges: (1) There is often considerable variability in the resolution and size of each CT scan, necessitating strict requirements for the input size and adaptability of models. (2) CT scans contain a large number of out-of-distribution (OOD) slices. The crucial features may only be present in specific spatial regions…

    Submitted 20 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Camera-ready version, accepted by DEF-AI-MIA workshop, in conjunction with CVPR2024

  12. arXiv:2403.18707  [pdf, other]

    math.OC eess.SY

    Connections between Reachability and Time Optimality

    Authors: Juho Bae, Ji Hoon Bai, Byung-Yoon Lee, Jun-Yong Lee, Chang-Hun Lee

    Abstract: This paper presents the concept of an equivalence relation between the set of optimal control problems. By leveraging this concept, we show that the boundary of the reachability set can be constructed by the solutions of time optimal problems. Alongside, a more generalized equivalence theorem is presented together. The findings facilitate the use of solution structures from a certain class of opti…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Submitted to Automatica

  13. arXiv:2403.11230  [pdf, other]

    eess.IV cs.CV cs.LG

    Simple 2D Convolutional Neural Network-based Approach for COVID-19 Detection

    Authors: Chih-Chung Hsu, Chia-Ming Lee, Yang Fan Chiang, Yi-Shiuan Chou, Chih-Yu Jiang, Shen-Chieh Tai, Chi-Han Tsai

    Abstract: This study explores the use of deep learning techniques for analyzing lung Computed Tomography (CT) images. Classic deep learning approaches face challenges with varying slice counts and resolutions in CT images, a diversity arising from the utilization of assorted scanning equipment. Typically, predictions are made on single slices which are then combined for a comprehensive outcome. Yet, this me…

    Submitted 17 March, 2024; originally announced March 2024.

  14. arXiv:2403.09009  [pdf, other]

    eess.SY

    A Geometric Approach to Resilient Distributed Consensus Accounting for State Imprecision and Adversarial Agents

    Authors: Christopher A. Lee, Waseem Abbas

    Abstract: This paper presents a novel approach for resilient distributed consensus in multiagent networks when dealing with adversarial agents and imprecision in states observed by normal agents. Traditional resilient distributed consensus algorithms often presume that agents have exact knowledge of their neighbors' states, which is unrealistic in practical scenarios. We show that such existing methods are inad…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: American Control Conference (ACC), 2024

  15. arXiv:2403.04245  [pdf, other]

    cs.SD cs.CV cs.LG cs.MM eess.AS

    A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition

    Authors: Yusheng Dai, Hang Chen, Jun Du, Ruoyu Wang, Shihao Chen, Jiefeng Ma, Haotian Wang, Chin-Hui Lee

    Abstract: Advanced Audio-Visual Speech Recognition (AVSR) systems have been observed to be sensitive to missing video frames, performing even worse than single-modality models. While applying the dropout technique to the video modality enhances robustness to missing frames, it simultaneously results in a performance loss when dealing with complete data input. In this paper, we investigate this contrasting p…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  16. arXiv:2402.16144  [pdf, ps, other]

    eess.SY physics.optics

    100 Gbps Indoor Access and 4.8 Gbps Outdoor Point-to-Point LiFi Transmission Systems using Laser-based Light Sources

    Authors: Cheng Cheng, Sovan Das, Stefan Videv, Adrian Spark, Sina Babadi, Aravindh Krishnamoorthy, Changmin Lee, Daniel Grieder, Kathleen Hartnett, Paul Rudy, James Raring, Marzieh Najafi, Vasilis K. Papanikolaou, Robert Schober, Harald Haas

    Abstract: In this paper, we demonstrate the communication capabilities of light-fidelity (LiFi) systems based on high-brightness and high-bandwidth integrated laser-based sources in a surface mount device (SMD) packaging platform. The laser-based source is able to deliver 450 lumens of white light illumination and the resultant light brightness is over 1000 cd/mm². It is demonstrated that a wavelength divisio…

    Submitted 25 February, 2024; originally announced February 2024.

  17. arXiv:2402.13018  [pdf, other]

    eess.AS cs.SD

    EMO-SUPERB: An In-depth Look at Speech Emotion Recognition

    Authors: Haibin Wu, Huang-Cheng Chou, Kai-Wei Chang, Lucas Goncalves, Jiawei Du, Jyh-Shing Roger Jang, Chi-Chun Lee, Hung-Yi Lee

    Abstract: Speech emotion recognition (SER) is a pivotal technology for human-computer interaction systems. However, 80.77% of SER papers yield results that cannot be reproduced. We develop EMO-SUPERB, short for EMOtion Speech Universal PERformance Benchmark, which aims to enhance open-source initiatives for SER. EMO-SUPERB includes a user-friendly codebase to leverage 15 state-of-the-art speech self-supervi…

    Submitted 12 March, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: webpage: https://emosuperb.github.io/

  18. arXiv:2402.05482  [pdf, other]

    eess.SP cs.LG

    A Non-Intrusive Neural Quality Assessment Model for Surface Electromyography Signals

    Authors: Cho-Yuan Lee, Kuan-Chen Wang, Kai-Chun Liu, Yu-Te Wang, Xugang Lu, **-Cheng Yeh, Yu Tsao

    Abstract: In practical scenarios involving the measurement of surface electromyography (sEMG) in muscles, particularly those areas near the heart, one of the primary sources of contamination is the presence of electrocardiogram (ECG) signals. To assess the quality of real-world sEMG data more effectively, this study proposes QASE-net, a new non-intrusive model that predicts the SNR of sEMG signals. QASE-net…

    Submitted 13 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: 5 pages, 4 figures

  19. arXiv:2402.00313  [pdf, other]

    cs.LG eess.SY

    Control in Stochastic Environment with Delays: A Model-based Reinforcement Learning Approach

    Authors: Zhiyuan Yao, Ionut Florescu, Chihoon Lee

    Abstract: In this paper we are introducing a new reinforcement learning method for control problems in environments with delayed feedback. Specifically, our method employs stochastic planning, versus previous methods that used deterministic planning. This allows us to embed risk preference in the policy optimization problem. We show that this formulation can recover the optimal policy for problems with dete…

    Submitted 31 January, 2024; originally announced February 2024.

    Comments: Under Review

  20. arXiv:2401.13766  [pdf, ps, other]

    eess.AS cs.SD

    Bayesian adaptive learning to latent variables via Variational Bayes and Maximum a Posteriori

    Authors: Hu Hu, Sabato Marco Siniscalchi, Chin-Hui Lee

    Abstract: In this work, we aim to establish a Bayesian adaptive learning framework by focusing on estimating latent variables in deep neural network (DNN) models. Latent variables indeed encode both transferable distributional information and structural relationships. Thus the distributions of the source latent variables (prior) can be combined with the knowledge learned from the target data (likelihood) to…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: ASRU2023 Bayesian Symposium. arXiv admin note: text overlap with arXiv:2110.08598

  21. arXiv:2401.11371  [pdf, other]

    cs.RO eess.SY

    Modeling Considerations for Developing Deep Space Autonomous Spacecraft and Simulators

    Authors: Christopher Agia, Guillem Casadesus Vila, Saptarshi Bandyopadhyay, David S. Bayard, Kar-Ming Cheung, Charles H. Lee, Eric Wood, Ian Aenishanslin, Steven Ardito, Lorraine Fesq, Marco Pavone, Issa A. D. Nesnas

    Abstract: To extend the limited scope of autonomy used in prior missions for operation in distant and complex environments, there is a need to further develop and mature autonomy that jointly reasons over multiple subsystems, which we term system-level autonomy. System-level autonomy establishes situational awareness that resolves conflicting information across subsystems, which may necessitate the refineme…

    Submitted 20 January, 2024; originally announced January 2024.

    Comments: Project page: https://sites.google.com/stanford.edu/spacecraft-models. 20 pages, 8 figures. Accepted to the IEEE Conference on Aerospace (AeroConf) 2024

    ACM Class: I.2.8; I.2.9; I.6.1; I.6.3; I.6.4; I.6.6; J.2

  22. arXiv:2401.08864  [pdf, other]

    eess.AS cs.LG cs.SD

    Binaural Angular Separation Network

    Authors: Yang Yang, George Sung, Shao-Fu Shih, Hakan Erdogan, Chehung Lee, Matthias Grundmann

    Abstract: We propose a neural network model that can separate target speech sources from interfering sources at different angular regions using two microphones. The model is trained with simulated room impulse responses (RIRs) using omni-directional microphones without needing to collect real RIRs. By relying on specific angular regions and multiple room simulations, the model utilizes consistent time diffe…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024

  23. arXiv:2312.09799  [pdf, other]

    eess.IV cs.AI cs.CV

    IQNet: Image Quality Assessment Guided Just Noticeable Difference Prefiltering For Versatile Video Coding

    Authors: Yu-Han Sun, Chiang Lo-Hsuan Lee, Tian-Sheuan Chang

    Abstract: Image prefiltering with just noticeable distortion (JND) improves coding efficiency in a visual lossless way by filtering the perceptually redundant information prior to compression. However, real JND cannot be well modeled with inaccurate masking equations in traditional approaches or image-level subject tests in deep learning approaches. Thus, this paper proposes a fine-grained JND prefiltering…

    Submitted 15 December, 2023; originally announced December 2023.

  24. arXiv:2312.09576  [pdf, other]

    eess.IV cs.CV

    SegRap2023: A Benchmark of Organs-at-Risk and Gross Tumor Volume Segmentation for Radiotherapy Planning of Nasopharyngeal Carcinoma

    Authors: Xiangde Luo, Jia Fu, Yunxin Zhong, Shuolin Liu, Bing Han, Mehdi Astaraki, Simone Bendazzoli, Iuliana Toma-Dasu, Yiwen Ye, Ziyang Chen, Yong Xia, Yanzhou Su, ** Ye, Junjun He, Zhaohu Xing, Hongqiu Wang, Lei Zhu, Kaixiang Yang, Xin Fang, Zhiwei Wang, Chan Woong Lee, Sang Joon Park, Jaehee Chun, Constantin Ulrich, Klaus H. Maier-Hein, et al. (17 additional authors not shown)

    Abstract: Radiation therapy is a primary and effective NasoPharyngeal Carcinoma (NPC) treatment strategy. The precise delineation of Gross Tumor Volumes (GTVs) and Organs-At-Risk (OARs) is crucial in radiation treatment, directly impacting patient prognosis. Previously, the delineation of GTVs and OARs was performed by experienced radiation oncologists. Recently, deep learning has achieved promising results…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: A challenge report of SegRap2023 (organized in conjunction with MICCAI2023)

  25. arXiv:2311.17936  [pdf]

    eess.SY

    Diagnostics Using Nuclear Plant Cyber Attack Analysis Toolkit

    Authors: Japan K. Patel, Athi Varuttamaseni, Robert W. Youngblood III, John C. Lee

    Abstract: A Python interface is developed for the GPWR Simulator to automatically simulate cyber-spoofing of different steam generator parameters and plant operation. Specifically, steam generator water level, feedwater flowrate, steam flowrate, valve position, and steam generator controller parameters, including controller gain and time constant, can be directly attacked using command inject, denial of ser…

    Submitted 4 February, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: Paper has been submitted to ANS for review

  26. arXiv:2311.16604  [pdf, other]

    eess.AS cs.LG

    LC4SV: A Denoising Framework Learning to Compensate for Unseen Speaker Verification Models

    Authors: Chi-Chang Lee, Hong-Wei Chen, Chu-Song Chen, Hsin-Min Wang, Tsung-Te Liu, Yu Tsao

    Abstract: The performance of speaker verification (SV) models may drop dramatically in noisy environments. A speech enhancement (SE) module can be used as a front-end strategy. However, existing SE methods may fail to bring performance improvements to downstream SV systems due to artifacts in the predicted signals of SE models. To compensate for artifacts, we propose a generic denoising framework named LC4S…

    Submitted 28 November, 2023; originally announced November 2023.

  27. arXiv:2311.16595  [pdf, other]

    cs.SD cs.LG eess.AS

    D4AM: A General Denoising Framework for Downstream Acoustic Models

    Authors: Chi-Chang Lee, Yu Tsao, Hsin-Min Wang, Chu-Song Chen

    Abstract: The performance of acoustic models degrades notably in noisy environments. Speech enhancement (SE) can be used as a front-end strategy to aid automatic speech recognition (ASR) systems. However, existing training objectives of SE methods are not fully effective at integrating speech-text and noisy-clean paired data for training toward unseen ASR systems. In this study, we propose a general denoisi…

    Submitted 28 November, 2023; originally announced November 2023.

  28. arXiv:2311.11745  [pdf, other]

    cs.SD cs.CL eess.AS

    ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis

    Authors: Jungil Kong, Junmo Lee, Jeongmin Kim, Beomjeong Kim, Jihoon Park, Dohee Kong, Changheon Lee, Sang** Kim

    Abstract: In this work, we propose a novel method for modeling numerous speakers, which enables expressing the overall characteristics of speakers in detail like a trained multi-speaker model without additional training on the target speaker's dataset. Although various works with similar purposes have been actively studied, their performance has not yet reached that of trained multi-speaker models due to th…

    Submitted 31 May, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

    Comments: ICML 2024

  29. arXiv:2311.06834  [pdf, other]

    eess.IV cs.CV

    Osteoporosis Prediction from Hand and Wrist X-rays using Image Segmentation and Self-Supervised Learning

    Authors: Hyungeun Lee, Ung Hwang, Seungwon Yu, Chang-Hun Lee, Kijung Yoon

    Abstract: Osteoporosis is a widespread and chronic metabolic bone disease that often remains undiagnosed and untreated due to limited access to bone mineral density (BMD) tests like Dual-energy X-ray absorptiometry (DXA). In response to this challenge, current advancements are pivoting towards detecting osteoporosis by examining alternative indicators from peripheral bone areas, with the goal of increasing…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2023, December 10th, 2023, New Orleans, United States, 10 pages

  30. arXiv:2310.18882  [pdf, other]

    cs.LG cs.AI cs.CV eess.IV eess.SP

    Differentiable Learning of Generalized Structured Matrices for Efficient Deep Neural Networks

    Authors: Changwoo Lee, Hun-Seok Kim

    Abstract: This paper investigates efficient deep neural networks (DNNs) to replace dense unstructured weight matrices with structured ones that possess desired properties. The challenge arises because the optimal weight matrix structure in popular neural network models is obscure in most cases and may vary from layer to layer even in the same network. Prior structured matrices proposed for efficient DNNs we…

    Submitted 7 March, 2024; v1 submitted 28 October, 2023; originally announced October 2023.

  31. arXiv:2310.15883  [pdf, other]

    eess.SY

    Attitude Takeover Control for Noncooperative Space Targets Based on Gaussian Processes with Online Model Learning

    Authors: Yuhan Liu, Pengyu Wang, Chang-Hun Lee, Roland Tóth

    Abstract: One major challenge for autonomous attitude takeover control for on-orbit servicing of spacecraft is that an accurate dynamic motion model of the combined vehicles is highly nonlinear, complex and often costly to identify online, which makes traditional model-based control impractical for this task. To address this issue, a recursive online sparse Gaussian Process (GP)-based learning strategy for…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: 17 pages, 14 figures. Submitted to IEEE Transactions on Aerospace and Electronic Systems

  32. arXiv:2309.10278  [pdf, other]

    eess.SY cs.RO math.OC

    Parameter-Varying Koopman Operator for Nonlinear System Modeling and Control

    Authors: Changyu Lee, Kiyong Park, **whan Kim

    Abstract: This paper proposes a novel approach for modeling and controlling nonlinear systems with varying parameters. The approach introduces the use of a parameter-varying Koopman operator (PVKO) in a lifted space, which provides an efficient way to understand system behavior and design control algorithms that account for underlying dynamics and changing parameters. The PVKO builds on a conventional Koopm…

    Submitted 18 September, 2023; originally announced September 2023.

    Comments: 62nd IEEE Conference on Decision and Control (CDC 2023)

  33. arXiv:2309.09270   

    eess.AS cs.AI cs.SD

    Continuous Modeling of the Denoising Process for Speech Enhancement Based on Deep Learning

    Authors: Zilu Guo, Jun Du, Chin-Hui Lee

    Abstract: In this paper, we explore a continuous modeling approach for deep-learning-based speech enhancement, focusing on the denoising process. We use a state variable to indicate the denoising process. The starting state is noisy speech and the ending state is clean speech. The noise component in the state variable decreases with the change of the state index until the noise component is 0. During traini…

    Submitted 7 January, 2024; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: We found that the results were obtained under some incorrect experimental settings. New experiments are needed.

  34. arXiv:2309.09180  [pdf, other]

    eess.AS cs.AI cs.SD

    Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture

    Authors: Gaobin Yang, Maokui He, Shutong Niu, Ruoyu Wang, Yanyan Yue, Shuangqing Qian, Shilong Wu, Jun Du, Chin-Hui Lee

    Abstract: We propose a novel neural speaker diarization system using memory-aware multi-speaker embedding with sequence-to-sequence architecture (NSD-MS2S), which integrates the strengths of memory-aware multi-speaker embedding (MA-MSE) and sequence-to-sequence (Seq2Seq) architecture, leading to improvement in both efficiency and performance. Next, we further decrease the memory occupation of decoding by in…

    Submitted 26 December, 2023; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: Accepted by ICASSP 2024

  35. arXiv:2309.08828  [pdf, other]

    eess.AS cs.SD

    Boosting End-to-End Multilingual Phoneme Recognition through Exploiting Universal Speech Attributes Constraints

    Authors: Hao Yen, Sabato Marco Siniscalchi, Chin-Hui Lee

    Abstract: We propose a first step toward multilingual end-to-end automatic speech recognition (ASR) by integrating knowledge about speech articulators. The key idea is to leverage a rich set of fundamental units that can be defined "universally" across all spoken languages, referred to as speech attributes, namely manner and place of articulation. Specifically, several deterministic attribute-to-phoneme map…

    Submitted 15 September, 2023; originally announced September 2023.

  36. arXiv:2309.08348  [pdf, other]

    eess.AS cs.SD

    The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction

    Authors: Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, **gdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao

    Abstract: Previous Multimodal Information based Speech Processing (MISP) challenges mainly focused on audio-visual speech recognition (AVSR) with commendable success. However, the most advanced back-end recognition systems often hit performance limits due to the complex acoustic environments. This has prompted a shift in focus towards the Audio-Visual Target Speaker Extraction (AVTSE) task for the MISP 2023…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: 5 pages, 4 figures

  37. arXiv:2309.07778  [pdf, other]

    eess.IV cs.CV cs.LG q-bio.TO

    Virchow: A Million-Slide Digital Pathology Foundation Model

    Authors: Eugene Vorontsov, Alican Bozkurt, Adam Casson, George Shaikovski, Michal Zelechowski, Siqi Liu, Kristen Severson, Eric Zimmermann, James Hall, Neil Tenenholtz, Nicolo Fusi, Philippe Mathieu, Alexander van Eck, Donghun Lee, Julian Viret, Eric Robert, Yi Kan Wang, Jeremy D. Kunz, Matthew C. H. Lee, Jan Bernhard, Ran A. Godrich, Gerard Oakley, Ewan Millar, Matthew Hanna, Juan Retamero, et al. (6 additional authors not shown)

    Abstract: The use of artificial intelligence to enable precision medicine and decision support systems through the analysis of pathology images has the potential to revolutionize the diagnosis and treatment of cancer. Such applications will depend on models' abilities to capture the diverse patterns observed in pathology images. To address this challenge, we present Virchow, a foundation model for computati…

    Submitted 17 January, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

  38. arXiv:2308.16319  [pdf, other]

    eess.SP

    A Radiological Clip Design Using Ultrasound Identification to Improve Localization

    Authors: Jenna Cario, Zhengchang Kou, Rita J. Miller, April Dickenson, Christine U. Lee, Michael L. Oelze

    Abstract: Objective: We demonstrate the use of ultrasound to receive an acoustic signal transmitted from a radiological clip designed from a custom circuit. This signal encodes an identification number and is localized and identified wirelessly by the ultrasound imaging system. Methods: We designed and constructed the test platform with a Teensy 4.0 microcontroller core to detect ultrasonic imaging pulses r…

    Submitted 1 February, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

    Comments: 8 pages, 6 figures, for associated .gif files, see https://drive.google.com/drive/folders/1yhRTtPJQ6mDHKmcxeqGnqy1oCQVsSDwC?usp=drive_link, submitted to IEEE Transactions on Biomedical Engineering (TBME) Revised 2/1/24: two figures converted to tables, introduction revised, results and discussion revised for n = 3 trials, in vivo experiment data added, added Rita J. Miller as author

  39. arXiv:2308.14638  [pdf, other]

    eess.AS cs.SD

    The USTC-NERCSLIP Systems for the CHiME-7 DASR Challenge

    Authors: Ruoyu Wang, Maokui He, Jun Du, Hengshun Zhou, Shutong Niu, Hang Chen, Yanyan Yue, Gaobin Yang, Shilong Wu, Lei Sun, Yanhui Tu, Haitao Tang, Shuangqing Qian, Tian Gao, Mengzhi Wang, Genshun Wan, Jia Pan, Jianqing Gao, Chin-Hui Lee

    Abstract: This technical report details our submission system to the CHiME-7 DASR Challenge, which focuses on speaker diarization and speech recognition under complex multi-speaker scenarios. Additionally, it also evaluates the efficiency of systems in handling diverse array devices. To address these issues, we implemented an end-to-end speaker diarization system and introduced a rectification strategy base…

    Submitted 10 October, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: Accepted by 2023 CHiME Workshop, Oral

  40. arXiv:2308.08488  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder

    Authors: Yusheng Dai, Hang Chen, Jun Du, Xiaofei Ding, Ning Ding, Feijun Jiang, Chin-Hui Lee

    Abstract: In recent research, slight performance improvement is observed from automatic speech recognition systems to audio-visual speech recognition systems in the end-to-end framework with low-quality videos. Unmatching convergence rates and specialized input representations between audio and visual modalities are considered to cause the problem. In this paper, we propose two novel techniques to improve a…

    Submitted 8 March, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: 6 pages, 2 figures, published in ICME2023

  41. arXiv:2307.13241  [pdf, other]

    eess.IV

    A Visual Quality Assessment Method for Raster Images in Scanned Document

    Authors: Justin Yang, Peter Bauer, Todd Harris, Changhyung Lee, Hyeon Seok Seo, Jan P Allebach, Fengqing Zhu

    Abstract: Image quality assessment (IQA) is an active research area in the field of image processing. Most prior works focus on visual quality of natural images captured by cameras. In this paper, we explore visual quality of scanned documents, focusing on raster image areas. Different from many existing works which aim to estimate a visual quality score, we propose a machine learning based classification m…

    Submitted 25 July, 2023; originally announced July 2023.

  42. arXiv:2307.08688  [pdf, other]

    eess.AS

    Semi-supervised multi-channel speaker diarization with cross-channel attention

    Authors: Shilong Wu, Jun Du, Maokui He, Shutong Niu, Hang Chen, Haitao Tang, Chin-Hui Lee

    Abstract: Most neural speaker diarization systems rely on sufficient manual training data labels, which are hard to collect under real-world scenarios. This paper proposes a semi-supervised speaker diarization system to utilize large-scale multi-channel training data by generating pseudo-labels for unlabeled data. Furthermore, we introduce cross-channel attention into the Neural Speaker Diarization Using Me…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: 8 pages, 3 figures

  43. arXiv:2307.05914  [pdf, other]

    cs.NI cs.LG eess.SP

    FIS-ONE: Floor Identification System with One Label for Crowdsourced RF Signals

    Authors: Weipeng Zhuo, Ka Ho Chiu, Jierun Chen, Ziqi Zhao, S. -H. Gary Chan, Sangtae Ha, Chul-Ho Lee

    Abstract: Floor labels of crowdsourced RF signals are crucial for many smart-city applications, such as multi-floor indoor localization, geofencing, and robot surveillance. To build a prediction model to identify the floor number of a new RF signal upon its measurement, conventional approaches using the crowdsourced RF signals assume that at least few labeled signal samples are available on each floor. In t…

    Submitted 12 July, 2023; originally announced July 2023.

    Comments: Accepted by IEEE ICDCS 2023

  44. arXiv:2306.14411  [pdf, other]

    cs.LG eess.SP

    Score-based Source Separation with Applications to Digital Communication Signals

    Authors: Tejas Jayashankar, Gary C. F. Lee, Alejandro Lancho, Amir Weiss, Yury Polyanskiy, Gregory W. Wornell

    Abstract: We propose a new method for separating superimposed sources using diffusion-based generative models. Our method relies only on separately trained statistical priors of independent sources to establish a new objective function guided by maximum a posteriori estimation with an $α$-posterior, across multiple levels of Gaussian smoothing. Motivated by applications in radio-frequency (RF) systems, we a…

    Submitted 17 January, 2024; v1 submitted 26 June, 2023; originally announced June 2023.

    Comments: 34 pages, 18 figures, for associated project webpage see https://alpha-rgs.github.io

  45. arXiv:2306.08527  [pdf, other]

    eess.AS cs.AI cs.SD

    Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement

    Authors: Zilu Guo, Jun Du, Chin-Hui Lee, Yu Gao, Wenbin Zhang

    Abstract: The goal of this study is to implement diffusion models for speech enhancement (SE). The first step is to emphasize the theoretical foundation of variance-preserving (VP)-based interpolation diffusion under continuous conditions. Subsequently, we present a more concise framework that encapsulates both the VP- and variance-exploding (VE)-based interpolation diffusion methods. We demonstrate that th…

    Submitted 17 September, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

  46. arXiv:2306.05283  [pdf, other]

    eess.SP cs.LG

    A Method for Detecting Murmurous Heart Sounds based on Self-similar Properties

    Authors: Dixon Vimalajeewa, Chihoon Lee, Brani Vidakovic

    Abstract: A heart murmur is an atypical sound produced by the flow of blood through the heart. It can be a sign of a serious heart condition, so detecting heart murmurs is critical for identifying and managing cardiovascular diseases. However, current methods for identifying murmurous heart sounds do not fully utilize the valuable insights that can be gained by exploring intrinsic properties of heart sound…

    Submitted 28 May, 2023; originally announced June 2023.

  47. arXiv:2306.03297  [pdf, other]

    cs.IT eess.SP

    ISI-Mitigating Character Encoding for Molecular communications via Diffusion

    Authors: Haewoong Hyun, Changmin Lee, Miaowen Wen, Sang-Hyo Kim, Chan-Byoung Chae

    Abstract: This letter introduces a novel algorithm for generating codebooks in molecular communications (MC). The proposed algorithm utilizes character entropy to effectively mitigate inter-symbol interference (ISI) during MC via diffusion. Based on Huffman coding, the algorithm ensures that consecutive bit-1s are avoided in the resulting codebook. Additionally, the error-correction process at the receiver…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: 5 pages, 4 figures

  48. arXiv:2306.02634  [pdf, other]

    physics.optics cs.CV cs.LG eess.IV

    Computational 3D topographic microscopy from terabytes of data per sample

    Authors: Kevin C. Zhou, Mark Harfouche, Maxwell Zheng, Joakim Jönsson, Kyung Chul Lee, Ron Appel, Paul Reamey, Thomas Doman, Veton Saliu, Gregor Horstmeyer, Roarke Horstmeyer

    Abstract: We present a large-scale computational 3D topographic microscope that enables 6-gigapixel profilometric 3D imaging at micron-scale resolution across $>$110 cm$^2$ areas over multi-millimeter axial ranges. Our computational microscope, termed STARCAM (Scanning Topographic All-in-focus Reconstruction with a Computational Array Microscope), features a parallelized, 54-camera architecture with 3-axis…

    Submitted 5 June, 2023; originally announced June 2023.

  49. arXiv:2306.00331  [pdf, other]

    eess.AS cs.AI cs.SD eess.SP eess.SY

    A Multi-dimensional Deep Structured State Space Approach to Speech Enhancement Using Small-footprint Models

    Authors: Pin-Jui Ku, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee

    Abstract: We propose a multi-dimensional structured state space (S4) approach to speech enhancement. To better capture the spectral dependencies across the frequency axis, we focus on modifying the multi-dimensional S4 layer with whitening transformation to build new small-footprint models that also achieve good performance. We explore several S4-based deep architectures in time (T) and time-frequency (TF)…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: Accepted to Interspeech 2023. Code will be released at https://github.com/Kuray107/S4ND-U-Net_speech_enhancement

  50. arXiv:2305.05085  [pdf, other]

    physics.optics eess.IV

    Tensorial tomographic Fourier Ptychography with applications to muscle tissue imaging

    Authors: Shiqi Xu, Xiang Dai, Paul Ritter, Kyung Chul Lee, Xi Yang, Lucas Kreiss, Kevin C. Zhou, Kanghyun Kim, Amey Chaware, Jadee Neff, Carolyn Glass, Seung Ah Lee, Oliver Friedrich, Roarke Horstmeyer

    Abstract: We report Tensorial tomographic Fourier Ptychography (ToFu), a new non-scanning label-free tomographic microscopy method for simultaneous imaging of quantitative phase and anisotropic specimen information in 3D. Built upon Fourier Ptychography, a quantitative phase imaging technique, ToFu additionally highlights the vectorial nature of light. The imaging setup consists of a standard microscope equ…

    Submitted 13 May, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

    Journal ref: Tensorial tomographic Fourier Ptychography with applications to muscle tissue imaging, Adv. Photon. 6(2), 026004 (2024)