Showing 1–50 of 129 results for author: Yu, C

Searching in archive eess.
  1. arXiv:2406.11619  [pdf, other]

    eess.AS cs.LG

    AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling

    Authors: Vahid Ahmadi Kalkhorani, Cheng Yu, Anurag Kumar, Ke Tan, Buye Xu, DeLiang Wang

    Abstract: Adding visual cues to audio-based speech separation can improve separation performance. This paper introduces AV-CrossNet, an audiovisual (AV) system for speech enhancement, target speaker extraction, and multi-talker speaker separation. AV-CrossNet is extended from the CrossNet architecture, which is a recently proposed network that performs complex spectral mapping for speech separation by lever… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 10 pages, 4 Figures, and 4 Tables

  2. arXiv:2406.06252  [pdf, other]

    eess.SP cs.CR

    Random Time-hopping Secure Ranging Strategy Against Distance-Reduction Attacks in UWB

    Authors: Wenlong Gou, Chuanhang Yu, Gang Wu

    Abstract: In order to mitigate the distance reduction attack in Ultra-Wide Band (UWB) ranging, this paper proposes a secure ranging scheme based on a random time-hopping mechanism without redundant signaling overhead. Additionally, a secure ranging strategy is designed for backward compatibility with existing standards such as IEEE 802.15.4a/z, combined with an attack detection scheme. The effectiveness and… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    ACM Class: H.1.1

  3. arXiv:2406.05128  [pdf, other]

    eess.AS cs.SD

    Differentiable Time-Varying Linear Prediction in the Context of End-to-End Analysis-by-Synthesis

    Authors: Chin-Yun Yu, György Fazekas

    Abstract: Training the linear prediction (LP) operator end-to-end for audio synthesis in modern deep learning frameworks is slow due to its recursive formulation. In addition, frame-wise approximation as an acceleration method cannot generalise well to test time conditions where the LP is computed sample-wise. Efficient differentiable sample-wise LP for end-to-end training is the key to removing this barrie… ▽ More

    Submitted 18 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  4. arXiv:2406.02126  [pdf, other]

    eess.SY cs.AI cs.LG cs.MA

    CityLight: A Universal Model Towards Real-world City-scale Traffic Signal Control Coordination

    Authors: **wei Zeng, Chao Yu, Xinyi Yang, Wenxuan Ao, Jian Yuan, Yong Li, Yu Wang, Huazhong Yang

    Abstract: Traffic signal control (TSC) is a promising low-cost measure to enhance transportation efficiency without affecting existing road infrastructure. While various reinforcement learning-based TSC methods have been proposed and experimentally outperform conventional rule-based methods, none of them has been deployed in the real world. An essential gap lies in the oversimplification of the scenarios in… ▽ More

    Submitted 6 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  5. arXiv:2405.18255  [pdf, other]

    cs.CR cs.SI eess.SP

    Channel Reciprocity Based Attack Detection for Securing UWB Ranging by Autoencoder

    Authors: Wenlong Gou, Chuanhang Yu, Juntao Ma, Gang Wu, Vladimir Mordachev

    Abstract: A variety of ranging threats represented by Ghost Peak attack have raised concerns regarding the security performance of Ultra-Wide Band (UWB) systems with the finalization of the IEEE 802.15.4z standard. Based on channel reciprocity, this paper proposes a low complexity attack detection scheme that compares Channel Impulse Response (CIR) features of both ranging sides utilizing an autoencoder wit… ▽ More

    Submitted 10 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    ACM Class: H.1.1

  6. arXiv:2405.06804  [pdf, other]

    cs.SD eess.AS eess.SP

    Time-of-arrival Estimation and Phase Unwrapping of Head-related Transfer Functions With Integer Linear Programming

    Authors: Chin-Yun Yu, Johan Pauwels, György Fazekas

    Abstract: In binaural audio synthesis, aligning head-related impulse responses (HRIRs) in time has been an important pre-processing step, enabling accurate spatial interpolation and efficient data compression. The maximum correlation time delay between spatially nearby HRIRs has previously been used to get accurate and smooth alignment by solving a matrix equation in which the solution has the minimum Eucli… ▽ More

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: Accepted to be presented at Audio Engineering Society 156th Convention, 2024 June, Madrid, Spain

  7. arXiv:2404.07970  [pdf, other]

    eess.AS cs.LG cs.SD

    Differentiable All-pole Filters for Time-varying Audio Systems

    Authors: Chin-Yun Yu, Christopher Mitcheltree, Alistair Carson, Stefan Bilbao, Joshua D. Reiss, György Fazekas

    Abstract: Infinite impulse response filters are an essential building block of many time-varying audio systems, such as audio effects and synthesisers. However, their recursive structure impedes end-to-end training of these systems using automatic differentiation. Although non-recursive filter approximations like frequency sampling and frame-based processing have been proposed and widely used in previous wo… ▽ More

    Submitted 18 June, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: Accepted at DAFx 2024

  8. arXiv:2403.08434  [pdf, other]

    cs.RO eess.SY

    GRF-based Predictive Flocking Control with Dynamic Pattern Formation

    Authors: Chenghao Yu, Dengyu Zhang, Qingrui Zhang

    Abstract: It is promising but challenging to design flocking control for a robot swarm to autonomously follow changing patterns or shapes in an optimal distributed manner. The optimal flocking control with dynamic pattern formation is, therefore, investigated in this paper. A predictive flocking control algorithm is proposed based on a Gibbs random field (GRF), where bio-inspired potential energies are used… ▽ More

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: Accepted by ICRA 2024

  9. arXiv:2402.17247  [pdf, ps, other]

    eess.SY math.OC

    Inverse Optimal Control for Linear Quadratic Tracking with Unknown Target States

    Authors: Yao Li, Chengpu Yu, Hao Fang, Jie Chen

    Abstract: This paper addresses the inverse optimal control for the linear quadratic tracking problem with a fixed but unknown target state, which aims to estimate the possible triplets comprising the target state, the state weight matrix, and the input weight matrix from observed optimal control input and the corresponding state trajectories. Sufficient conditions have been provided for the unique determina… ▽ More

    Submitted 27 February, 2024; originally announced February 2024.

  10. arXiv:2311.07345  [pdf, other]

    eess.AS cs.SD

    Zero-Shot Duet Singing Voices Separation with Diffusion Models

    Authors: Chin-Yun Yu, Emilian Postolache, Emanuele Rodolà, György Fazekas

    Abstract: In recent studies, diffusion models have shown promise as priors for solving audio inverse problems. These models allow us to sample from the posterior distribution of a target signal given an observed signal by manipulating the diffusion process. However, when separating audio sources of the same type, such as duet singing voices, the prior learned by the diffusion process may not be sufficient t… ▽ More

    Submitted 13 November, 2023; originally announced November 2023.

    Comments: 9 pages, 1 figure. Published at Sound Demixing Workshop 2023

  11. arXiv:2311.07169  [pdf, other]

    eess.SP

    CASTER: A Computer-Vision-Assisted Wireless Channel Simulator for Gesture Recognition

    Authors: Zhenyu Ren, Guoliang Li, Chenqing Ji, Chao Yu, Shuai Wang, Rui Wang

    Abstract: In this paper, a computer-vision-assisted simulation method is proposed to address the issue of training dataset acquisition for wireless hand gesture recognition. In the existing literature, in order to classify gestures via the wireless channel estimation, massive training samples should be measured in a consistent environment, consuming significant efforts. In the proposed CASTER simulator, how… ▽ More

    Submitted 17 April, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

    Comments: 10 pages, 11 figures

  12. arXiv:2311.06916  [pdf]

    eess.SY cs.AI

    TSViT: A Time Series Vision Transformer for Fault Diagnosis

    Authors: Shouhua Zhang, Jiehan Zhou, Xue Ma, Chenglin Wen, Susanna Pirttikangas, Chen Yu, Weishan Zhang, Chunsheng Yang

    Abstract: Traditional fault diagnosis methods using Convolutional Neural Networks (CNNs) face limitations in capturing temporal features (i.e., the variation of vibration signals over time). To address this issue, this paper introduces a novel model, the Time Series Vision Transformer (TSViT), specifically designed for fault diagnosis. On one hand, TSViT model integrates a convolutional layer to segment vib… ▽ More

    Submitted 12 November, 2023; originally announced November 2023.

  13. arXiv:2311.02554  [pdf, other]

    cs.CR eess.SP

    Pilot-Based Key Distribution and Encryption for Secure Coherent Passive Optical Networks

    Authors: Haide Wang, Ji Zhou, Qingxin Lu, Jianrui Zeng, Yongqing Liao, Wei** Liu, Changyuan Yu, Zhaohui Li

    Abstract: The security issues of passive optical networks (PONs) have always been a concern due to broadcast transmission. Physical-layer security enhancement for the coherent PON should be as significant as improving transmission performance. In this paper, we propose the advanced encryption standard (AES) algorithm and geometric constellation shaping four-level pulse amplitude modulation (GCS-PAM4) pilot-… ▽ More

    Submitted 25 December, 2023; v1 submitted 4 November, 2023; originally announced November 2023.

    Comments: The paper has been submitted to the Journal of Lightwave Technology

  14. arXiv:2311.01781  [pdf, other]

    eess.SY

    Passive Handwriting Tracking via Weak mmWave Communication Signals

    Authors: Chao Yu, Yan Luo, Renqi Chen, Rui Wang

    Abstract: In this letter, a cooperative sensing framework based on millimeter wave (mmWave) communication systems is proposed to detect tiny motions with a millimeter-level resolution. Particularly, the cooperative sensing framework is facilitated with one transmitter and two receivers. There are two radio frequency (RF) chains at each receiver. Hence, the Doppler effect due to the tiny motions can be detec… ▽ More

    Submitted 3 November, 2023; originally announced November 2023.

  15. arXiv:2310.17864  [pdf, other]

    eess.AS cs.SD

    TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

    Authors: Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, **chuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao, Robin Scheibler, Samuele Cornell, Sean Kim, Stavros Petridis

    Abstract: TorchAudio is an open-source audio and speech processing library built for PyTorch. It aims to accelerate the research and development of audio and speech technologies by providing well-designed, easy-to-use, and performant PyTorch components. Its contributors routinely engage with users to understand their needs and fulfill them by developing impactful features. Here, we survey TorchAudio's devel… ▽ More

    Submitted 26 October, 2023; originally announced October 2023.

  16. arXiv:2310.12709  [pdf, other]

    cs.IT eess.SP

    Capacity Limitation and Optimization Strategy for Flexible Point-to-Multi-Point Optical Networks

    Authors: Ji Zhou, Haide Wang, Liangchuan Li, Wei** Liu, Changyuan Yu, Zhaohui Li

    Abstract: Point-to-multi-point (PtMP) optical networks become the main solutions for network-edge applications such as passive optical networks and radio access networks. Entropy-loading digital subcarrier multiplexing (DSCM) is the core technology to achieve low latency and approach high capacity for flexible PtMP optical networks. However, the high peak-to-average power ratio of the entropy-loading DSCM s… ▽ More

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: The paper has been submitted to the IEEE Transactions on Communications

  17. Audio Event-Relational Graph Representation Learning for Acoustic Scene Classification

    Authors: Yuanbo Hou, Siyang Song, Chuang Yu, Wenwu Wang, Dick Botteldooren

    Abstract: Most deep learning-based acoustic scene classification (ASC) approaches identify scenes based on acoustic features converted from audio clips containing mixed information entangled by polyphonic audio events (AEs). However, these approaches have difficulties in explaining what cues they use to identify scenes. This paper conducts the first study on disclosing the relationship between real-life aco… ▽ More

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: IEEE Signal Processing Letters, doi: 10.1109/LSP.2023.3319233

  18. arXiv:2309.10299  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Using fine-tuning and min lookahead beam search to improve Whisper

    Authors: Andrea Do, Oscar Brown, Zhengjie Wang, Nikhil Mathew, Zixin Liu, Jawwad Ahmed, Cheng Yu

    Abstract: The performance of Whisper in low-resource languages is still far from perfect. In addition to a lack of training data on low-resource languages, we identify some limitations in the beam search algorithm used in Whisper. To address these issues, we fine-tune Whisper on additional data and propose an improved decoding algorithm. On the Vietnamese language, fine-tuning Whisper-Tiny with LoRA leads t… ▽ More

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: 8 pages, submitted to IEEE ICASSP 2024

  19. arXiv:2309.04156  [pdf, other]

    cs.SD cs.CL eess.AS

    Cross-Utterance Conditioned VAE for Speech Generation

    Authors: Yang Li, Cheng Yu, Guangzhi Sun, Weiqin Zu, Zheng Tian, Ying Wen, Wei Pan, Chao Zhang, Jun Wang, Yang Yang, Fanglei Sun

    Abstract: Speech synthesis systems powered by neural networks hold promise for multimedia production, but frequently face issues with producing expressive speech and seamless editing. In response, we present the Cross-Utterance Conditioned Variational Autoencoder speech synthesis (CUC-VAE S2) framework to enhance prosody and ensure natural speech generation. This framework leverages the powerful representat… ▽ More

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: 13 pages

  20. arXiv:2309.00853  [pdf]

    eess.IV cs.CV

    Correlated and Multi-frequency Diffusion Modeling for Highly Under-sampled MRI Reconstruction

    Authors: Yu Guan, Chuanming Yu, Shiyu Lu, Zhuoxu Cui, Dong Liang, Qiegen Liu

    Abstract: Most existing MRI reconstruction methods perform targeted reconstruction of the entire MR image without taking specific tissue regions into consideration. This may fail to emphasize the reconstruction accuracy on important tissues for diagnosis. In this study, leveraging a combination of the properties of k-space data and the diffusion process, our novel scheme focuses on mining the multi-frequ… ▽ More

    Submitted 2 September, 2023; originally announced September 2023.

  21. arXiv:2308.03772  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG eess.IV

    Improved Neural Radiance Fields Using Pseudo-depth and Fusion

    Authors: **gliang Li, Qiang Zhou, Chaohui Yu, Zhengda Lu, Jun Xiao, Zhibin Wang, Fan Wang

    Abstract: Since the advent of Neural Radiance Fields, novel view synthesis has received tremendous attention. The existing approach for the generalization of radiance field reconstruction primarily constructs an encoding volume from nearby source images as additional inputs. However, these approaches cannot efficiently encode the geometric information of real scenes with various scale objects/structures. In… ▽ More

    Submitted 27 July, 2023; originally announced August 2023.

  22. arXiv:2308.03382  [pdf, ps, other]

    eess.IV cs.CV cs.LG

    Enhancing Nucleus Segmentation with HARU-Net: A Hybrid Attention Based Residual U-Blocks Network

    Authors: Junzhou Chen, Qian Huang, Yulin Chen, Linyi Qian, Chengyuan Yu

    Abstract: Nucleus image segmentation is a crucial step in the analysis, pathological diagnosis, and classification, which heavily relies on the quality of nucleus segmentation. However, the complexity of issues such as variations in nucleus size, blurred nucleus contours, uneven staining, cell clustering, and overlapping cells poses significant challenges. Current methods for nucleus segmentation primarily… ▽ More

    Submitted 10 August, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

    Comments: Nucleus segmentation, Deep learning, Instance segmentation, Medical imaging, Dual-Branch network

  23. arXiv:2307.14491  [pdf, other]

    cs.MM cs.SD eess.AS

    A Unified Framework for Modality-Agnostic Deepfakes Detection

    Authors: Cai Yu, Peng Chen, Jiahe Tian, ** Liu, Jiao Dai, Xi Wang, Yesheng Chai, Shan Jia, Siwei Lyu, Jizhong Han

    Abstract: As AI-generated content (AIGC) thrives, deepfakes have expanded from single-modality falsification to cross-modal fake content creation, where either audio or visual components can be manipulated. While using two unimodal detectors can detect audio-visual deepfakes, cross-modal forgery clues could be overlooked. Existing multimodal deepfake detection methods typically establish correspondence betw… ▽ More

    Submitted 24 October, 2023; v1 submitted 26 July, 2023; originally announced July 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  24. arXiv:2307.11320  [pdf, ps, other]

    stat.ME eess.SY

    Sparse plus low-rank identification for dynamical latent-variable graphical AR models

    Authors: Junyao You, Chengpu Yu

    Abstract: This paper focuses on the identification of graphical autoregressive models with dynamical latent variables. The dynamical structure of latent variables is described by a matrix polynomial transfer function. Taking account of the sparse interactions between the observed variables and the low-rank property of the latent-variable model, a new sparse plus low-rank optimization problem is formulated t… ▽ More

    Submitted 20 July, 2023; originally announced July 2023.

  25. arXiv:2306.17252  [pdf, other]

    eess.AS cs.SD

    Singing Voice Synthesis Using Differentiable LPC and Glottal-Flow-Inspired Wavetables

    Authors: Chin-Yun Yu, György Fazekas

    Abstract: This paper introduces GlOttal-flow LPC Filter (GOLF), a novel method for singing voice synthesis (SVS) that exploits the physical characteristics of the human voice using differentiable digital signal processing. GOLF employs a glottal model as the harmonic source and IIR filters to simulate the vocal tract, resulting in an interpretable and efficient approach. We show it is competitive with state… ▽ More

    Submitted 12 July, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

    Comments: 9 pages, 4 figures. Accepted at ISMIR 2023

  26. arXiv:2306.15167  [pdf, other]

    cs.IT eess.SP

    An Efficient Global Algorithm for One-Bit Maximum-Likelihood MIMO Detection

    Authors: Cheng-Yang Yu, Mingjie Shao, Wei-Kun Chen, Ya-Feng Liu, Wing-Kin Ma

    Abstract: There has been growing interest in implementing massive MIMO systems by one-bit analog-to-digital converters (ADCs), which have the benefit of reducing the power consumption and hardware complexity. One-bit MIMO detection arises in such a scenario. It aims to detect the multiuser signals from the one-bit quantized received signals in an uplink channel. In this paper, we consider one-bit maximum-li… ▽ More

    Submitted 3 July, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

  27. arXiv:2306.11325  [pdf, other]

    cs.IT eess.SP

    Non-Integer-Oversampling Digital Signal Processing for Coherent Passive Optical Networks

    Authors: Haide Wang, Ji Zhou, **yang Yang, Jianrui Zeng, Wei** Liu, Changyuan Yu, Fan Li, Zhaohui Li

    Abstract: Beyond 100G passive optical networks (PONs) will be required to meet the ever-increasing traffic demand in the future. Coherent optical technologies are the competitive solutions for the future beyond 100G PON but also face challenges such as the high computational complexity of digital signal processing (DSP). A high oversampling rate in coherent optical technologies results in the high computati… ▽ More

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: This paper has been submitted to the Journal of Optical Communications and Networking

  28. arXiv:2304.04938  [pdf, other]

    eess.SP

    Timing Recovery for Point-to-Multi-Point Coherent Passive Optical Networks

    Authors: Ji Zhou, **yang Yang, Haide Wang, Jianrui Zeng, Changyuan Yu

    Abstract: We propose a timing recovery for point-to-multi-point coherent passive optical networks. The results show that the proposed algorithm has low complexity and better robustness against the residual chromatic dispersion.

    Submitted 10 April, 2023; originally announced April 2023.

    Comments: The article has been submitted to the SPPCom conference

  29. arXiv:2303.04667  [pdf, other]

    eess.IV cs.CV

    STPDnet: Spatial-temporal convolutional primal dual network for dynamic PET image reconstruction

    Authors: Rui Hu, Jianan Cui, Cheng** Yu, Yunmei Chen, Huafeng Liu

    Abstract: Dynamic positron emission tomography (dPET) image reconstruction is extremely challenging due to the limited counts received in individual frame. In this paper, we propose a spatial-temporal convolutional primal dual network (STPDnet) for dynamic PET image reconstruction. Both spatial and temporal correlations are encoded by 3D convolution operators. The physical projection of PET is embedded in t… ▽ More

    Submitted 8 March, 2023; originally announced March 2023.

    Comments: ISBI2023 accepted

  30. arXiv:2302.11203  [pdf, other]

    eess.SP

    mmAlert: mmWave Link Blockage Prediction via Passive Sensing

    Authors: Chao Yu, Yifei Sun, Yan Luo, Rui Wang

    Abstract: In this letter, the mmAlert system, predicting millimeter wave (mmWave) link blockage during data communication, is elaborated and demonstrated. The passive sensing method is adopted for mobile blocker detection, where two receive beams with separated radio frequency (RF) chains are equipped at the data communication receiver. One receive beam is aligned to the direction of line-of-sight (LoS) pat… ▽ More

    Submitted 22 February, 2023; originally announced February 2023.

  31. arXiv:2212.07599  [pdf]

    eess.IV cs.CV

    Universal Generative Modeling in Dual-domain for Dynamic MR Imaging

    Authors: Chuanming Yu, Yu Guan, Ziwen Ke, Dong Liang, Qiegen Liu

    Abstract: Dynamic magnetic resonance image reconstruction from incomplete k-space data has generated great research interest due to its capability to reduce scan time. Nevertheless, the reconstruction problem is still challenging due to its ill-posed nature. Recently, diffusion models especially score-based generative models have exhibited great potential in algorithm robustness and usage flexibility. Mo… ▽ More

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: 12 pages, 11 figures

  32. arXiv:2211.09731  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech

    Authors: Xin Zhang, Iván Vallés-Pérez, Andreas Stolcke, Chengzhu Yu, Jasha Droppo, Olabanji Shonibare, Roberto Barra-Chicote, Venkatesh Ravichandran

    Abstract: Stuttering is a speech disorder where the natural flow of speech is interrupted by blocks, repetitions or prolongations of syllables, words and phrases. The majority of existing automatic speech recognition (ASR) interfaces perform poorly on utterances with stutter, mainly due to lack of matched training data. Synthesis of speech with stutter thus presents an opportunity to improve ASR for this ty… ▽ More

    Submitted 4 November, 2022; originally announced November 2022.

    Comments: 8 pages, 3 figures, 2 tables

    Journal ref: NeurIPS Workshop on SyntheticData4ML, December 2022

  33. arXiv:2211.05396  [pdf]

    cs.CV eess.IV

    Learning Visual Representation of Underwater Acoustic Imagery Using Transformer-Based Style Transfer Method

    Authors: Xiaoteng Zhou, Changli Yu, Shihao Yuan, Xin Yuan, Hangchi Yu, Citong Luo

    Abstract: Underwater automatic target recognition (UATR) has been a challenging research topic in ocean engineering. Although deep learning brings opportunities for target recognition on land and in the air, underwater target recognition techniques based on deep learning have lagged due to sensor performance and the size of trainable data. This letter proposed a framework for learning the visual representat… ▽ More

    Submitted 10 November, 2022; originally announced November 2022.

    Comments: 11 pages, 9 figures, conference

  34. arXiv:2211.00887  [pdf, other]

    quant-ph cs.LG cs.NE eess.SP

    Certified Robustness of Quantum Classifiers against Adversarial Examples through Quantum Noise

    Authors: Jhih-Cing Huang, Yu-Lin Tsai, Chao-Han Huck Yang, Cheng-Fang Su, Chia-Mu Yu, Pin-Yu Chen, Sy-Yen Kuo

    Abstract: Recently, quantum classifiers have been found to be vulnerable to adversarial attacks, in which quantum classifiers are deceived by imperceptible noises, leading to misclassification. In this paper, we propose the first theoretical study demonstrating that adding quantum random rotation noise can improve robustness in quantum classifiers against adversarial attacks. We link the definition of diffe… ▽ More

    Submitted 28 April, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: Accepted to IEEE ICASSP 2023

  35. arXiv:2210.15793  [pdf, ps, other]

    eess.AS cs.SD eess.SP

    Conditioning and Sampling in Variational Diffusion Models for Speech Super-Resolution

    Authors: Chin-Yun Yu, Sung-Lin Yeh, György Fazekas, Hao Tang

    Abstract: Recently, diffusion models (DMs) have been increasingly used in audio processing tasks, including speech super-resolution (SR), which aims to restore high-frequency content given low-resolution speech utterances. This is commonly achieved by conditioning the network of noise predictor with low-resolution audio. In this paper, we propose a novel sampling algorithm that communicates the information… ▽ More

    Submitted 24 November, 2022; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: Submitted to ICASSP 2023

  36. arXiv:2210.15366  [pdf, other]

    eess.AS cs.SD

    Multi-dimensional Edge-based Audio Event Relational Graph Representation Learning for Acoustic Scene Classification

    Authors: Yuanbo Hou, Siyang Song, Chuang Yu, Yuxin Song, Wenwu Wang, Dick Botteldooren

    Abstract: Most existing deep learning-based acoustic scene classification (ASC) approaches directly utilize representations extracted from spectrograms to identify target scenes. However, these approaches pay little attention to the audio events occurring in the scene despite they provide crucial semantic information. This paper conducts the first study that investigates whether real-life acoustic scenes ca… ▽ More

    Submitted 1 November, 2022; v1 submitted 27 October, 2022; originally announced October 2022.

  37. arXiv:2210.09378  [pdf, other]

    cs.RO cs.AI cs.MA eess.SY

    Learning Control Admissibility Models with Graph Neural Networks for Multi-Agent Navigation

    Authors: Chenning Yu, Hongzhan Yu, Sicun Gao

    Abstract: Deep reinforcement learning in continuous domains focuses on learning control policies that map states to distributions over actions that ideally concentrate on the optimal choices in each step. In multi-agent navigation problems, the optimal actions depend heavily on the agents' density. Their interaction patterns grow exponentially with respect to such density, making it hard for learning-based… ▽ More

    Submitted 17 October, 2022; originally announced October 2022.

  38. arXiv:2209.03272  [pdf]

    eess.SP

    Compact and Robust Deep Learning Architecture for Fluorescence Lifetime Imaging and FPGA Implementation

    Authors: Zhenya Zang, Dong Xiao, Quan Wang, Ziao Jiao, Chen Yu, David Day-Uei Li

    Abstract: This paper reported a bespoke adder-based deep learning network for time-domain fluorescence lifetime imaging (FLIM). By leveraging the l1-norm extraction method, we propose a 1-D Fluorescence Lifetime AdderNet (FLAN) without multiplication-based convolutions to reduce the computational complexity. Further, we compressed fluorescence decays in temporal dimension using a log-scale merging technique… ▽ More

    Submitted 9 September, 2022; v1 submitted 7 September, 2022; originally announced September 2022.

    Comments: 13 pages, 14 figures

  39. arXiv:2208.01702  [pdf, other]

    eess.IV cs.CV physics.optics

    Non-Line-of-Sight Tracking and Mapping with an Active Corner Camera

    Authors: Sheila Seidel, Hoover Rueda-Chacon, Iris Cusini, Federica Villa, Franco Zappa, Christopher Yu, Vivek K Goyal

    Abstract: The ability to form non-line-of-sight (NLOS) images of changing scenes could be transformative in a variety of fields, including search and rescue, autonomous vehicle navigation, and reconnaissance. Most existing active NLOS methods illuminate the hidden scene using a pulsed laser directed at a relay surface and collect time-resolved measurements of returning light. The prevailing approaches inclu… ▽ More

    Submitted 2 August, 2022; originally announced August 2022.

  40. Towards Better Dermoscopic Image Feature Representation Learning for Melanoma Classification

    Authors: ChengHui Yu, MingKang Tang, ShengGe Yang, MingQing Wang, Zhe Xu, JiangPeng Yan, HanMo Chen, Yu Yang, Xiao-Jun Zeng, Xiu Li

    Abstract: Deep learning-based melanoma classification with dermoscopic images has recently shown great potential in automatic early-stage melanoma diagnosis. However, limited by the significant data imbalance and obvious extraneous artifacts, i.e., the hair and ruler markings, discriminative feature extraction from dermoscopic images is very challenging. In this study, we seek to resolve these problems resp… ▽ More

    Submitted 15 July, 2022; originally announced July 2022.

    Comments: ICONIP 2021 conference

  41. arXiv:2207.03105  [pdf]

    q-bio.TO cs.CV eess.IV physics.med-ph

    Uncertainty-Aware Self-supervised Neural Network for Liver $T_{1ρ}$ Mapping with Relaxation Constraint

    Authors: Chaoxing Huang, Yurui Qian, Simon Chun Ho Yu, Jian Hou, Baiyan Jiang, Queenie Chan, Vincent Wai-Sun Wong, Winnie Chiu-Wing Chu, Weitian Chen

    Abstract: $T_{1ρ}$ mapping is a promising quantitative MRI technique for the non-invasive assessment of tissue properties. Learning-based approaches can map $T_{1ρ}$ from a reduced number of $T_{1ρ}$ weighted images, but requires significant amounts of high quality training data. Moreover, existing methods do not provide the confidence level of the $T_{1ρ}… ▽ More

    Submitted 25 October, 2022; v1 submitted 7 July, 2022; originally announced July 2022.

    Comments: Provisionally accepted by Physics in Medicine and Biology

  42. arXiv:2206.07364  [pdf, other]

    eess.IV cs.CV

    Seeking Common Ground While Reserving Differences: Multiple Anatomy Collaborative Framework for Undersampled MRI Reconstruction

    Authors: Jiangpeng Yan, Chenghui Yu, Hanbo Chen, Zhe Xu, Junzhou Huang, Xiu Li, Jianhua Yao

    Abstract: Recently, deep neural networks have greatly advanced undersampled Magnetic Resonance Image (MRI) reconstruction, wherein most studies follow the one-anatomy-one-network fashion, i.e., each expert network is trained and evaluated for a specific anatomy. Apart from the inefficiency of training multiple independent models, such a convention ignores the shared de-aliasing knowledge across various anatomies…

    Submitted 15 June, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: submitted to an IEEE journal

  43. arXiv:2205.04120  [pdf, other]

    cs.SD cs.CL eess.AS

    Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech

    Authors: Yang Li, Cheng Yu, Guangzhi Sun, Hua Jiang, Fanglei Sun, Weiqin Zu, Ying Wen, Yang Yang, Jun Wang

    Abstract: Modelling prosody variation is critical for synthesizing natural and expressive speech in end-to-end text-to-speech (TTS) systems. In this paper, a cross-utterance conditional VAE (CUC-VAE) is proposed to estimate a posterior probability distribution of the latent prosody features for each phoneme by conditioning on acoustic features, speaker information, and text features obtained from both past…

    Submitted 9 May, 2022; originally announced May 2022.

    Comments: ACL 2022 camera ready

  44. arXiv:2204.03204  [pdf]

    eess.IV cs.CV

    Convolutional Neural Network for Early Pulmonary Embolism Detection via Computed Tomography Pulmonary Angiography

    Authors: Ching-Yuan Yu, Ming-Che Chang, Yun-Chien Cheng, Chin Kuo

    Abstract: This study was conducted to develop a computer-aided detection (CAD) system for triaging patients with pulmonary embolism (PE). The purpose of the system was to reduce the death rate during the waiting period. Computed tomography pulmonary angiography (CTPA) is used for PE diagnosis. Because CTPA reports require a radiologist to review the case and suggest further management, this creates a waitin…

    Submitted 7 April, 2022; originally announced April 2022.

  45. arXiv:2203.17152  [pdf, other]

    cs.SD cs.CL eess.AS

    Perceptual Contrast Stretching on Target Feature for Speech Enhancement

    Authors: Rong Chao, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao

    Abstract: Speech enhancement (SE) performance has improved considerably owing to the use of deep learning models as a base function. Herein, we propose a perceptual contrast stretching (PCS) approach to further improve SE performance. The PCS is derived based on the critical band importance function and is applied to modify the targets of the SE model. Specifically, the contrast of target features is stretc…

    Submitted 15 July, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: Accepted by Interspeech 2022

  46. arXiv:2203.14588  [pdf, other]

    eess.SP

    Passive Motion Detection via mmWave Communication System

    Authors: Jie Li, Chao Yu, Yan Luo, Yifei Sun, Rui Wang

    Abstract: In this paper, an integrated passive sensing and communication system working in the 60 GHz band is elaborated, and the sensing performance is investigated in an application of hand gesture recognition. Specifically, in this integrated system, there are two radio frequency (RF) chains at the receiver and one at the transmitter. Each RF chain is connected with one phased array for analog beamforming. T…

    Submitted 28 March, 2022; originally announced March 2022.

  47. arXiv:2203.13699  [pdf, other]

    cs.CV eess.IV

    Unsupervised Image Deraining: Optimization Model Driven Deep CNN

    Authors: Changfeng Yu, Yi Chang, Yi Li, Xile Zhao, Luxin Yan

    Abstract: The deep convolutional neural network has achieved significant progress for single image rain streak removal. However, most of the data-driven learning methods are fully supervised or semi-supervised, unexpectedly suffering from significant performance drops when dealing with real rain. These data-driven learning methods are representative yet generalize poorly to real rain. The opposite holds true…

    Submitted 25 March, 2022; originally announced March 2022.

    Comments: Accepted to ACM MM 2021

  48. arXiv:2203.02118  [pdf, other]

    cs.RO eess.SY

    OmniWheg: An Omnidirectional Wheel-Leg Transformable Robot

    Authors: Ruixiang Cao, Jun Gu, Chen Yu, Andre Rosendo

    Abstract: This paper presents the design, analysis, and performance evaluation of an omnidirectional transformable wheel-leg robot called OmniWheg. We design a novel mechanism consisting of a separable omni-wheel and 4-bar linkages, allowing the robot to transform between omni-wheeled and legged modes smoothly. In wheeled mode, the robot can move in all directions and efficiently adjust the relative positio…

    Submitted 25 July, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: 6 pages, 10 figures, IROS

  49. arXiv:2202.12611  [pdf, ps, other]

    cond-mat.mtrl-sci eess.IV

    Phase Object Reconstruction for 4D-STEM using Deep Learning

    Authors: Thomas Friedrich, Chu-Ping Yu, Jo Verbeeck, Sandra Van Aert

    Abstract: In this study we explore the possibility of using deep learning for the reconstruction of phase images from 4D scanning transmission electron microscopy (4D-STEM) data. The process can be divided into two main steps. First, the complex electron wave function is recovered for a convergent beam electron diffraction pattern (CBED) using a convolutional neural network (CNN). Subsequently a corresponding…

    Submitted 30 August, 2022; v1 submitted 25 February, 2022; originally announced February 2022.

  50. arXiv:2202.05256  [pdf, other]

    eess.AS cs.LG cs.SD

    Conditional Diffusion Probabilistic Model for Speech Enhancement

    Authors: Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao

    Abstract: Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs. While generative models have shown strong potential in speech synthesis, they are still lagging behind in speech enhancement. This work leverages recent advances in diffusion probabilistic models, and proposes a novel speech enhancement algorit…

    Submitted 10 February, 2022; originally announced February 2022.