Showing 1–50 of 197 results for author: Li, R

Searching in archive eess.
  1. arXiv:2406.19205  [pdf, other]

    eess.SP

    Coordinated RSMA for Integrated Sensing and Communication in Emergency UAV Systems

    Authors: Binghan Yao, Ruoguang Li, Yingyang Chen, Li Wang

    Abstract: Recently, unmanned aerial vehicle (UAV)-enabled integrated sensing and communication (ISAC) is emerging as a promising technique for achieving robust and rapid emergency response capabilities. Such a novel framework offers high-quality and cost-efficient C&S services due to the intrinsic flexibility and mobility of UAVs. In parallel, rate-splitting multiple access (RSMA) is able to achieve a tail… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.12164  [pdf, other]

    cs.SD cs.AI eess.AS

    A Mel Spectrogram Enhancement Paradigm Based on CWT in Speech Synthesis

    Authors: Guoqiang Hu, Huaning Tan, Ruilai Li

    Abstract: Acoustic features play an important role in improving the quality of the synthesised speech. Currently, the Mel spectrogram is a widely employed acoustic feature in most acoustic models. However, due to the fine-grained loss caused by its Fourier transform process, the clarity of speech synthesised by Mel spectrogram is compromised in mutant signals. In order to obtain a more detailed Mel spectrog… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  3. Interpretable modulated differentiable STFT and physics-informed balanced spectrum metric for freight train wheelset bearing cross-machine transfer fault diagnosis under speed fluctuations

    Authors: Chao He, Hongmei Shi, Ruixin Li, Jianbo Li, ZuJun Yu

    Abstract: The service conditions of wheelset bearings has a direct impact on the safe operation of railway heavy haul freight trains as the key components. However, speed fluctuation of the trains and few fault samples are the two main problems that restrict the accuracy of bearing fault diagnosis. Therefore, a cross-machine transfer diagnosis (pyDSN) network coupled with interpretable modulated differentia… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Journal ref: Advanced Engineering Informatics, 2024

  4. arXiv:2406.09082  [pdf]

    eess.SY cs.AI

    Data-driven modeling and supervisory control system optimization for plug-in hybrid electric vehicles

    Authors: Hao Zhang, Nuo Lei, Boli Chen, Bingbing Li, Rulong Li, Zhi Wang

    Abstract: Learning-based intelligent energy management systems for plug-in hybrid electric vehicles (PHEVs) are crucial for achieving efficient energy utilization. However, their application faces system reliability challenges in the real world, which prevents widespread acceptance by original equipment manufacturers (OEMs). This paper begins by establishing a PHEV model based on physical and data-driven mo… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  5. arXiv:2406.02429  [pdf, other]

    eess.AS cs.SD

    Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion

    Authors: Ruiqi Li, Rongjie Huang, Yongqi Wang, Zhiqing Hong, Zhou Zhao

    Abstract: Speech-to-singing voice conversion (STS) task always suffers from data scarcity, because it requires paired speech and singing data. Compounding this issue are the challenges of content-pitch alignment and the suboptimal quality of generated outputs, presenting significant hurdles in STS research. This paper presents SVPT, an STS approach boosted by a self-supervised singing voice pre-training mod… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 13 pages

  6. arXiv:2406.01922  [pdf, ps, other]

    eess.SP cs.IT

    Performance Analysis of Hybrid Cellular and Cell-free MIMO Network

    Authors: Zhuoyin Dai, Jingran Xu, Xiaoli Xu, Ruoguang Li, Yong Zeng

    Abstract: Cell-free wireless communication is envisioned as one of the most promising network architectures, which can achieve stable and uniform communication performance while improving the system energy and spectrum efficiency. The deployment of cell-free networks is envisioned to be a longterm evolutionary process, in which cell-free access points (APs) will be gradually introduced into the communicatio… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  7. arXiv:2406.00320  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Frieren: Efficient Video-to-Audio Generation with Rectified Flow Matching

    Authors: Yongqi Wang, Wenxiang Guo, Rongjie Huang, Jiawei Huang, Zehan Wang, Fuming You, Ruiqi Li, Zhou Zhao

    Abstract: Video-to-audio (V2A) generation aims to synthesize content-matching audio from silent video, and it remains challenging to build V2A models with high generation quality, efficiency, and visual-audio temporal synchrony. We propose Frieren, a V2A model based on rectified flow matching. Frieren regresses the conditional transport vector field from noise to spectrogram latent with straight paths and c… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  8. arXiv:2405.14210  [pdf, other]

    cs.CV eess.IV

    Eidos: Efficient, Imperceptible Adversarial 3D Point Clouds

    Authors: Hanwei Zhang, Luo Cheng, Qisong He, Wei Huang, Renjue Li, Ronan Sicre, Xiaowei Huang, Holger Hermanns, Lijun Zhang

    Abstract: Classification of 3D point clouds is a challenging machine learning (ML) task with important real-world applications in a spectrum from autonomous driving and robot-assisted surgery to earth observation from low orbit. As with other ML tasks, classification models are notoriously brittle in the presence of adversarial attacks. These are rooted in imperceptible changes to inputs with the effect tha… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Preprint

  9. arXiv:2405.10068  [pdf, other]

    eess.IV cs.CV

    MrRegNet: Multi-resolution Mask Guided Convolutional Neural Network for Medical Image Registration with Large Deformations

    Authors: Ruizhe Li, Grazziela Figueredo, Dorothee Auer, Christian Wagner, Xin Chen

    Abstract: Deformable image registration (alignment) is highly sought after in numerous clinical applications, such as computer aided diagnosis and disease progression analysis. Deep Convolutional Neural Network (DCNN)-based image registration methods have demonstrated advantages in terms of registration accuracy and computational speed. However, while most methods excel at global alignment, they often perfo… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Accepted for publication at IEEE International Symposium on Biomedical Imaging (ISBI) 2024

  10. arXiv:2405.10025  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models

    Authors: Yuchen Hu, Chen Chen, Chengwei Qin, Qiushi Zhu, Eng Siong Chng, Ruizhe Li

    Abstract: Recent advances in large language models (LLMs) have promoted generative error correction (GER) for automatic speech recognition (ASR), which aims to predict the ground-truth transcription from the decoded N-best hypotheses. Thanks to the strong language generation ability of LLMs and rich information in the N-best list, GER shows great effectiveness in enhancing ASR results. However, it still suf… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: 14 pages, Accepted by ACL 2024

  11. arXiv:2405.09940  [pdf, other]

    eess.AS cs.SD

    Robust Singing Voice Transcription Serves Synthesis

    Authors: Ruiqi Li, Yu Zhang, Yongqi Wang, Zhiqing Hong, Rongjie Huang, Zhou Zhao

    Abstract: Note-level Automatic Singing Voice Transcription (AST) converts singing recordings into note sequences, facilitating the automatic annotation of singing datasets for Singing Voice Synthesis (SVS) applications. Current AST methods, however, struggle with accuracy and robustness when used for practical annotation. This paper presents ROSVOT, the first robust AST model that serves SVS, incorporating… ▽ More

    Submitted 3 June, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: ACL 2024

  12. arXiv:2404.10312  [pdf, other]

    cs.CV eess.IV

    OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model

    Authors: Runyi Li, Xuhan Sheng, Weiqi Li, Jian Zhang

    Abstract: Omnidirectional images (ODIs) are commonly used in real-world visual tasks, and high-resolution ODIs help improve the performance of related visual tasks. Most existing super-resolution methods for ODIs use end-to-end learning strategies, resulting in inferior realness of generated images and a lack of effective out-of-domain generalization capabilities in training methods. Image generation method… ▽ More

    Submitted 17 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  13. arXiv:2404.09506  [pdf, other]

    cs.IT eess.SP

    Performance analysis of satellite-terrestrial integrated radio access networks based on stochastic geometry

    Authors: Yaohua Sun, Ruiwen Li

    Abstract: To enhance coverage and improve service continuity, satellite-terrestrial integrated radio access network (STIRAN) has been seen as an essential trend in the development of 6G. However, there is still a lack of theoretical analysis on its coverage performance. To fill this gap, we first establish a system model to characterize a typical scenario where low-earth-orbit (LEO) satellites and terrestri… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

  14. arXiv:2404.09313  [pdf, other]

    eess.AS cs.AI

    Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment

    Authors: Zhiqing Hong, Rongjie Huang, Xize Cheng, Yongqi Wang, Ruiqi Li, Fuming You, Zhou Zhao, Zhimeng Zhang

    Abstract: A song is a combination of singing voice and accompaniment. However, existing works focus on singing voice synthesis and music generation independently. Little attention was paid to explore song synthesis. In this work, we propose a novel task called text-to-song synthesis which incorporating both vocals and accompaniments generation. We develop Melodist, a two-stage text-to-song method that consi… ▽ More

    Submitted 20 May, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: ACL 2024 Main

  15. arXiv:2404.07473  [pdf]

    eess.IV cs.CV cs.LG

    LUCF-Net: Lightweight U-shaped Cascade Fusion Network for Medical Image Segmentation

    Authors: Songkai Sun, Qingshan She, Yuliang Ma, Rihui Li, Yingchun Zhang

    Abstract: In this study, the performance of existing U-shaped neural network architectures was enhanced for medical image segmentation by adding Transformer. Although Transformer architectures are powerful at extracting global information, its ability to capture local information is limited due to its high complexity. To address this challenge, we proposed a new lightweight U-shaped cascade fusion network (… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

  16. arXiv:2403.20168  [pdf, other]

    eess.IV cs.CV

    Unsupervised Tumor-Aware Distillation for Multi-Modal Brain Image Translation

    Authors: Chuan Huang, Jia Wei, Rui Li

    Abstract: Multi-modal brain images from MRI scans are widely used in clinical diagnosis to provide complementary information from different modalities. However, obtaining fully paired multi-modal images in practice is challenging due to various factors, such as time, cost, and artifacts, resulting in modality-missing brain images. To address this problem, unsupervised multi-modal brain image translation has… ▽ More

    Submitted 24 April, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

    Comments: 8 pages, 5 figures. It has been provisionally accepted for IJCNN 2024

  17. arXiv:2403.15770  [pdf, other]

    eess.IV cs.CV

    Graph Image Prior for Unsupervised Dynamic Cardiac Cine MRI Reconstruction

    Authors: Zhongsen Li, Wenxuan Chen, Shuai Wang, Chuyu Liu, Qing Zou, Rui Li

    Abstract: The inductive bias of the convolutional neural network (CNN) can be a strong prior for image restoration, which is known as the Deep Image Prior (DIP). Recently, DIP is utilized in unsupervised dynamic MRI reconstruction, which adopts a generative model from the latent space to the image space. However, existing methods usually use a pyramid-shaped CNN generator shared by all frames, embedding the… ▽ More

    Submitted 25 June, 2024; v1 submitted 23 March, 2024; originally announced March 2024.

  18. arXiv:2403.14011  [pdf, other]

    eess.SY cs.GT

    A Unified Toll Lane Framework for Autonomous and High-Occupancy Vehicles in Interactive Mixed Autonomy

    Authors: Ruolin Li, Philip N. Brown, Roberto Horowitz

    Abstract: In this study, we introduce a toll lane framework that optimizes the mixed flow of autonomous and high-occupancy vehicles on freeways, where human-driven and autonomous vehicles of varying commuter occupancy share a segment. Autonomous vehicles, with their ability to maintain shorter headways, boost traffic throughput. Our framework designates a toll lane for autonomous vehicles with high occupanc… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

  19. arXiv:2403.12487  [pdf]

    eess.SY

    Unveiling Four Key Factors for Tire Force Control Allocation in 4WID-4WIS Electric Vehicles at Handling Limits

    Authors: Ao Lu, Runfeng Li, Yunchang Yu, Ziwang Lu, Guangyu Tian

    Abstract: The four-wheel independent drive and four-wheel independent steering (4WID-4WIS) configurations enhance control flexibility and dynamic performance potential for more integrated electric vehicles. This paper comprehensively analyzes the impacts of four key factors on tire force control allocation: vertical load estimation, actuator dynamic characteristics, tire force constraints, and wheel steerin… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

  20. arXiv:2403.11780  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt

    Authors: Yongqi Wang, Ruofan Hu, Rongjie Huang, Zhiqing Hong, Ruiqi Li, Wenrui Liu, Fuming You, Tao Jin, Zhou Zhao

    Abstract: Recent singing-voice-synthesis (SVS) methods have achieved remarkable audio quality and naturalness, yet they lack the capability to control the style attributes of the synthesized singing explicitly. We propose Prompt-Singer, the first SVS method that enables attribute controlling on singer gender, vocal range and volume with natural language. We adopt a model architecture based on a decoder-only… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted by NAACL 2024 (main conference)

  21. arXiv:2403.11542  [pdf, ps, other]

    eess.SY

    Topology Data Analysis-based Error Detection for Semantic Image Transmission with Incremental Knowledge-based HARQ

    Authors: Fei Ni, Rongpeng Li, Zhifeng Zhao, Honggang Zhang

    Abstract: Semantic communication (SemCom) aims to achieve high fidelity information delivery under low communication consumption by only guaranteeing semantic accuracy. Nevertheless, semantic communication still suffers from unexpected channel volatility and thus developing a re-transmission mechanism (e.g., hybrid automatic repeat request [HARQ]) is indispensable. In that regard, instead of discarding prev… ▽ More

    Submitted 23 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  22. arXiv:2403.10518  [pdf, other]

    cs.CV cs.GR cs.SD eess.AS

    Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives

    Authors: Ronghui Li, YuXiang Zhang, Yachao Zhang, Hongwen Zhang, Jie Guo, Yan Zhang, Yebin Liu, Xiu Li

    Abstract: We propose Lodge, a network capable of generating extremely long dance sequences conditioned on given music. We design Lodge as a two-stage coarse to fine diffusion architecture, and propose the characteristic dance primitives that possess significant expressiveness as intermediate representations between two diffusion models. The first stage is global diffusion, which focuses on comprehending the… ▽ More

    Submitted 19 April, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024, Project page: https://li-ronghui.github.io/lodge

  23. arXiv:2402.06894  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators

    Authors: Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Dong Zhang, Zhehuai Chen, Eng Siong Chng

    Abstract: Recent advances in large language models (LLMs) have stepped forward the development of multilingual speech and machine translation by its reduced representation errors and incorporated external knowledge. However, both translation tasks typically utilize beam search decoding and top-1 hypothesis selection for inference. These techniques struggle to fully exploit the rich information in the divers… ▽ More

    Submitted 16 May, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

    Comments: 18 pages, Accepted by ACL 2024. This work is open sourced at: https://github.com/YUCHEN005/GenTranslate

  24. arXiv:2402.05457  [pdf, other]

    cs.CL cs.AI cs.MM cs.SD eess.AS

    It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

    Authors: Chen Chen, Ruizhe Li, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Ensiong Chng, Chao-Han Huck Yang

    Abstract: Recent studies have successfully shown that large language models (LLMs) can be successfully used for generative error correction (GER) on top of the automatic speech recognition (ASR) output. Specifically, an LLM is utilized to carry out a direct mapping from the N-best hypotheses list generated by an ASR system to the predicted output transcription. However, despite its effectiveness, GER introd… ▽ More

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: Accepted to ICLR 2024, 17 pages. This work will be open sourced under MIT license

  25. arXiv:2401.15344  [pdf, other]

    cs.IT eess.SP

    IRS Aided Millimeter-Wave Sensing and Communication: Beam Scanning, Beam Splitting, and Performance Analysis

    Authors: Renwang Li, Xiaodan Shao, Shu Sun, Meixia Tao, Rui Zhang

    Abstract: Integrated sensing and communication (ISAC) has attracted growing interests for enabling the future 6G wireless networks, due to its capability of sharing spectrum and hardware resources between communication and sensing systems. However, existing works on ISAC usually need to modify the communication protocol to cater for the new sensing performance requirement, which may be difficult to implemen… ▽ More

    Submitted 27 January, 2024; originally announced January 2024.

    Comments: submitted to IEEE TWC

  26. arXiv:2401.10446  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Large Language Models are Efficient Learners of Noise-Robust Speech Recognition

    Authors: Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Chao Zhang, Pin-Yu Chen, EnSiong Chng

    Abstract: Recent advances in large language models (LLMs) have promoted generative error correction (GER) for automatic speech recognition (ASR), which leverages the rich linguistic knowledge and powerful reasoning ability of LLMs to improve recognition results. The latest work proposes a GER benchmark with HyPoradise dataset to learn the mapping from ASR N-best hypotheses to ground-truth transcription by e… ▽ More

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted to ICLR 2024, Spotlight top 5%, 24 pages. This work will be open sourced at: https://github.com/YUCHEN005/RobustGER under MIT license

  27. arXiv:2401.09013  [pdf, other]

    cs.NI eess.SP

    An Improved Virtual Force Approach for UAV Deployment and Resource Allocation in Emergency Communications

    Authors: Hongying Guo, Li Wang, Ruoguang Li, Luyang Hou, Lianming Xu, Aiguo Fei

    Abstract: In this paper, we consider an unmanned aerial vehicle (UAV)-enabled emergency communication system, which establishes temporary communication link with users equipment (UEs) in a typical disaster environment with mountainous forest and obstacles. Towards this end, a joint deployment, power allocation, and user association optimization problem is formulated to maximize the total transmission rate,… ▽ More

    Submitted 17 January, 2024; originally announced January 2024.

  28. arXiv:2401.07001  [pdf, other]

    cs.NI eess.SP

    UAV-assisted Emergency Integrated Sensing and Communication Networks: A CNN-based Rapid Deployment Approach

    Authors: Zao Wang, Lianming Xu, Luyang Hou, Ruoguang Li, Li Wang

    Abstract: UAV-assisted integrated sensing and communication (ISAC) network is crucial for post-disaster emergency rescue. The speed of UAV deployment will directly impact rescue results. However, the ISAC UAV deployment in emergency scenarios is difficult to solve, which contradicts the rapid deployment. In this paper, we propose a two-stage deployment framework to achieve rapid ISAC UAV deployment in emerg… ▽ More

    Submitted 13 January, 2024; originally announced January 2024.

  29. arXiv:2401.03680  [pdf]

    eess.SY

    Decision-Oriented Learning for Future Power System Decision-Making under Uncertainty

    Authors: Ran Li, Haipeng Zhang, Mingyang Sun, Fei Teng, Can Wan, Salvador Pineda, Georges Kariniotakis

    Abstract: Better forecasts may not lead to better decision-making. To address this challenge, decision-oriented learning (DOL) has been proposed as a new branch of machine learning that replaces traditional statistical loss with a decision loss to form an end-to-end model. Applications of DOL in power systems have been developed in recent years. For renewable-rich power systems, uncertainties propagate thro… ▽ More

    Submitted 7 April, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

  30. Exploring Multi-Modal Control in Music-Driven Dance Generation

    Authors: Ronghui Li, Yuqin Dai, Yachao Zhang, Jun Li, Jian Yang, Jie Guo, Xiu Li

    Abstract: Existing music-driven 3D dance generation methods mainly concentrate on high-quality dance generation, but lack sufficient control during the generation process. To address these issues, we propose a unified framework capable of generating high-quality dance movements and supporting multi-modal control, including genre control, semantic control, and spatial control. First, we decouple the dance ge… ▽ More

    Submitted 1 January, 2024; originally announced January 2024.

  31. arXiv:2312.15639  [pdf]

    eess.SY

    Coordinated Planning of Offshore Charging Stations and Electrified Ships: A Case Study on Shanghai-Busan Maritime Route

    Authors: Hao Li, Hanqi Tao, Wentao Huang, Hongcai Zhang, Ran Li

    Abstract: Despite the success of electric vehicles on land, electrification of maritime ships is challenged by the dilemma of range anxiety and cargo-carrying capacity. The longer range requires larger batteries, which inevitably eat up the precious cargo space and weight. This paper breaks new ground by proposing a coordinated planning model for offshore charging stations (OCSs) and electric ships (ESs), m… ▽ More

    Submitted 25 December, 2023; originally announced December 2023.

  32. arXiv:2312.15177  [pdf, other]

    eess.SY

    Stochastic Data-Driven Predictive Control with Equivalence to Stochastic MPC

    Authors: Ruiqi Li, John W. Simpson-Porco, Stephen L. Smith

    Abstract: We propose a data-driven receding-horizon control method dealing with the chance-constrained output-tracking problem of unknown stochastic linear time-invariant (LTI) systems with partial state observation. The proposed method takes into account the statistics of the process noise, the measurement noise and the uncertain initial condition, following an analogous framework to Stochastic Model Predi… ▽ More

    Submitted 23 December, 2023; originally announced December 2023.

    Comments: 20 pages, 4 figures. The extended version of a submission to IEEE Transactions on Automatic Control

  33. arXiv:2312.13501  [pdf]

    eess.SY

    Adaptive Decision-Objective Loss for Forecast-then-Optimize in Power Systems

    Authors: Haipeng Zhang, Ran Li, Mingyang Sun, Teng Fei

    Abstract: Forecast-then-optimize is a widely-used framework for decision-making problems in power systems. Traditionally, statistical losses have been employed to train forecasting models, but recent research demonstrated that improved decision utility in downstream optimization tasks can be achieved by using decision loss as an alternative. However, the implementation of decision loss in power systems face… ▽ More

    Submitted 20 December, 2023; originally announced December 2023.

  34. arXiv:2312.10741  [pdf, other]

    eess.AS cs.CL cs.SD

    StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis

    Authors: Yu Zhang, Rongjie Huang, Ruiqi Li, JinZheng He, Yan Xia, Feiyang Chen, Xinyu Duan, Baoxing Huai, Zhou Zhao

    Abstract: Style transfer for out-of-domain (OOD) singing voice synthesis (SVS) focuses on generating high-quality singing voices with unseen styles (such as timbre, emotion, pronunciation, and articulation skills) derived from reference singing voice samples. However, the endeavor to model the intricate nuances of singing voice styles is an arduous task, as singing voices possess a remarkable degree of expr… ▽ More

    Submitted 2 January, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  35. arXiv:2312.04398  [pdf]

    cs.CV cs.AI cs.LG eess.IV stat.ML

    Intelligent Anomaly Detection for Lane Rendering Using Transformer with Self-Supervised Pre-Training and Customized Fine-Tuning

    Authors: Yongqi Dong, Xingmin Lu, Ruohan Li, Wei Song, Bart van Arem, Haneen Farah

    Abstract: The burgeoning navigation services using digital maps provide great convenience to drivers. Nevertheless, the presence of anomalies in lane rendering map images occasionally introduces potential hazards, as such anomalies can be misleading to human drivers and consequently contribute to unsafe driving conditions. In response to this concern and to accurately and effectively detect the anomalies, t… ▽ More

    Submitted 29 May, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: 22 pages, 6 figures, accepted by the 103rd Transportation Research Board (TRB) Annual Meeting, under review by Transportation Research Record: Journal of the Transportation Research Board

  36. arXiv:2312.01423  [pdf, other]

    eess.SP

    Self-Critical Alternate Learning based Semantic Broadcast Communication

    Authors: Zhilin Lu, Rongpeng Li, Ming Lei, Chan Wang, Zhifeng Zhao, Honggang Zhang

    Abstract: Semantic communication (SemCom) has been deemed as a promising communication paradigm to break through the bottleneck of traditional communications. Nonetheless, most of the existing works focus more on point-to-point communication scenarios and its extension to multi-user scenarios is not that straightforward due to its cost-inefficiencies to directly scale the JSCC framework to the multi-user co… ▽ More

    Submitted 3 December, 2023; originally announced December 2023.

  37. arXiv:2312.00082  [pdf, other]

    eess.IV cs.CV

    A Compact Implicit Neural Representation for Efficient Storage of Massive 4D Functional Magnetic Resonance Imaging

    Authors: Ruoran Li, Runzhao Yang, Wenxin Xiang, Yuxiao Cheng, Tingxiong Xiao, Jinli Suo

    Abstract: Functional Magnetic Resonance Imaging (fMRI) data is a widely used kind of four-dimensional biomedical data, which requires effective compression. However, fMRI compressing poses unique challenges due to its intricate temporal dynamics, low signal-to-noise ratio, and complicated underlying redundancies. This paper reports a novel compression paradigm specifically tailored for fMRI data based on Im… ▽ More

    Submitted 29 February, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

  38. arXiv:2311.12525  [pdf]

    eess.SY

    Power System Capacity Planning Considering Seasonal Hydrogen Storage by Salt Caverns

    Authors: Xueqian He, Tianguang Lu, Jing Li, Wanxing Sheng, Rui Li

    Abstract: In China, air conditioning in summer and electric heating in winter lead to seasonal volatility in load power. Therefore, it is urgent to develop economic and efficient long-term energy storage systems to enhance peak regulation. Power-to-hydrogen technology is a perspective solution to balance seasonal power fluctuation. However, current hydrogen storage methods have shortcomings such as small st… ▽ More

    Submitted 21 November, 2023; originally announced November 2023.

  39. arXiv:2310.14182  [pdf, ps, other]

    eess.SP

    A Coordinate Descent Approach to Atomic Norm Minimization

    Authors: Ruifu Li, Danijela Cabric

    Abstract: Atomic norm minimization is of great interest in various applications of sparse signal processing including super-resolution line-spectral estimation and signal denoising. In practice, atomic norm minimization (ANM) is formulated as semi-definite programming (SDP) that is generally hard to solve. This work introduces a low-complexity solver for a type of ANM known as atomic norm soft thresholding… ▽ More

    Submitted 5 June, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

    Comments: 14 pages, 12 figures, manuscript currently under review

  40. arXiv:2309.16205  [pdf, other]

    cs.CV eess.IV

    DiffGAN-F2S: Symmetric and Efficient Denoising Diffusion GANs for Structural Connectivity Prediction from Brain fMRI

    Authors: Qiankun Zuo, Ruiheng Li, Yi Di, Hao Tian, Changhong Jing, Xuhang Chen, Shuqiang Wang

    Abstract: Mapping from functional connectivity (FC) to structural connectivity (SC) can facilitate multimodal brain network fusion and discover potential biomarkers for clinical implications. However, it is challenging to directly bridge the reliable non-linear mapping relations between SC and functional magnetic resonance imaging (fMRI). In this paper, a novel diffusion generative adversarial network-bas… ▽ More

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: 12 pages

  41. arXiv:2309.13238  [pdf, other]

    eess.SP

    How to Differentiate between Near Field and Far Field: Revisiting the Rayleigh Distance

    Authors: Shu Sun, Renwang Li, Xingchen Liu, Liuxun Xue, Chong Han, Meixia Tao

    Abstract: Future wireless communication systems are likely to adopt extremely large aperture arrays and millimeter-wave/sub-THz frequency bands to achieve higher throughput, lower latency, and higher energy efficiency. Conventional wireless systems predominantly operate in the far field (FF) of the radiation source of signals. As the array size increases and the carrier wavelength shrinks, however, the near… ▽ More

    Submitted 22 September, 2023; originally announced September 2023.

  42. arXiv:2309.08070  [pdf, ps, other]

    eess.SY eess.SP

    Exploration into Optimal State Estimation with Event-triggered Communication

    Authors: Xiaolei Bian, Huimin Chen, X. Rong Li

    Abstract: This paper deals with the problem of remote estimation of the state of a discrete-time stochastic linear system observed by a sensor with computational capacity to calculate local estimates. We design an event-triggered communication (ETC) scheme and a remote state estimator to optimally calibrate the tradeoff between system performance and limited communication resources. The novel communication… ▽ More

    Submitted 14 September, 2023; originally announced September 2023.

  43. arXiv:2309.07566  [pdf, other]

    cs.SD cs.AI eess.AS

    Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer

    Authors: Yongqi Wang, Jionghao Bai, Rongjie Huang, Ruiqi Li, Zhiqing Hong, Zhou Zhao

    Abstract: Direct speech-to-speech translation (S2ST) with discrete self-supervised representations has achieved remarkable accuracy, but is unable to preserve the speaker timbre of the source speech during translation. Meanwhile, the scarcity of high-quality speaker-parallel data poses a challenge for learning style transfer between source and target speech. We propose an S2ST framework with an acoustic lan…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: 5 pages, 1 figure. Submitted to ICASSP 2024

  44. arXiv:2308.08767  [pdf, other]

    eess.AS cs.SD

    Graph Neural Network Backend for Speaker Recognition

    Authors: Liang He, Ruida Li, Mengqi Niu

    Abstract: Currently, most speaker recognition backends, such as cosine, linear discriminant analysis (LDA), or probabilistic linear discriminant analysis (PLDA), make decisions by calculating similarity or distance between enrollment and test embeddings which are already extracted from neural networks. However, for each embedding, the local structure of itself and its neighbor embeddings in the low-dimensio…

    Submitted 16 August, 2023; originally announced August 2023.

  45. arXiv:2307.13220  [pdf]

    eess.IV cs.AI physics.med-ph

    One for Multiple: Physics-informed Synthetic Data Boosts Generalizable Deep Learning for Fast MRI Reconstruction

    Authors: Zi Wang, Xiaotong Yu, Chengyan Wang, Weibo Chen, Jiazheng Wang, Ying-Hua Chu, Hongwei Sun, Rushuai Li, Peiyong Li, Fan Yang, Haiwei Han, Taishan Kang, Jianzhong Lin, Chen Yang, Shufu Chang, Zhang Shi, Sha Hua, Yan Li, Juan Hu, Liuhong Zhu, Jianjun Zhou, Meijing Lin, Jiefeng Guo, Congbo Cai, Zhong Chen, et al. (3 additional authors not shown)

    Abstract: Magnetic resonance imaging (MRI) is a widely used radiological modality renowned for its radiation-free, comprehensive insights into the human body, facilitating medical diagnoses. However, the drawback of prolonged scan times hinders its accessibility. The k-space undersampling offers a solution, yet the resultant artifacts necessitate meticulous removal during image reconstruction. Although Deep…

    Submitted 28 February, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

    Comments: 38 pages, 19 figures, 5 tables

  46. arXiv:2307.09775  [pdf, other]

    cs.IR cs.SD eess.AS

    DisCover: Disentangled Music Representation Learning for Cover Song Identification

    Authors: Jiahao Xun, Shengyu Zhang, Yanting Yang, Jieming Zhu, Liqun Deng, Zhou Zhao, Zhenhua Dong, Ruiqi Li, Lichao Zhang, Fei Wu

    Abstract: In the field of music information retrieval (MIR), cover song identification (CSI) is a challenging task that aims to identify cover versions of a query song from a massive collection. Existing works still suffer from high intra-song variances and inter-song correlations, due to the entangled nature of version-specific and version-invariant factors in their modeling. In this work, we set the goal…

    Submitted 19 July, 2023; originally announced July 2023.

  47. arXiv:2307.08029  [pdf, other]

    eess.AS cs.LG cs.SD

    Noise-aware Speech Enhancement using Diffusion Probabilistic Model

    Authors: Yuchen Hu, Chen Chen, Ruizhe Li, Qiushi Zhu, Eng Siong Chng

    Abstract: With recent advances of diffusion model, generative speech enhancement (SE) has attracted a surge of research interest due to its great potential for unseen testing noises. However, existing efforts mainly focus on inherent properties of clean speech, underexploiting the varying noise information in real world. In this paper, we propose a noise-aware speech enhancement (NASE) approach that extract…

    Submitted 4 June, 2024; v1 submitted 16 July, 2023; originally announced July 2023.

    Comments: 5 pages, 2 figures, Accepted by InterSpeech 2024

  48. arXiv:2307.00200  [pdf, other]

    cs.IT eess.SP

    Beam Scanning for Integrated Sensing and Communication in IRS-aided mmWave Systems

    Authors: Renwang Li, Xiaodan Shao, Shu Sun, Meixia Tao, Rui Zhang

    Abstract: This paper investigates an intelligent reflecting surface (IRS) aided millimeter-wave integrated sensing and communication (ISAC) system. Specifically, based on the passive beam scanning in the downlink, the IRS finds the optimal beam for reflecting the signals from the base station to a communication user. Meanwhile, the IRS estimates the angle of a nearby target based on its echo signal received…

    Submitted 30 June, 2023; originally announced July 2023.

    Comments: Accepted by IEEE SPAWC

  49. arXiv:2306.10567  [pdf, other]

    eess.AS cs.CV cs.MM cs.SD

    MIR-GAN: Refining Frame-Level Modality-Invariant Representations with Adversarial Network for Audio-Visual Speech Recognition

    Authors: Yuchen Hu, Chen Chen, Ruizhe Li, Heqing Zou, Eng Siong Chng

    Abstract: Audio-visual speech recognition (AVSR) attracts a surge of research interest recently by leveraging multimodal signals to understand human speech. Mainstream approaches addressing this task have developed sophisticated architectures and techniques for multi-modality fusion and representation learning. However, the natural heterogeneity of different modalities causes distribution gap between their…

    Submitted 18 June, 2023; originally announced June 2023.

    Comments: 14 pages, 5 figures, Accepted by ACL 2023

  50. arXiv:2306.10563  [pdf, other]

    eess.AS cs.CV cs.MM cs.SD

    Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition

    Authors: Yuchen Hu, Ruizhe Li, Chen Chen, Chengwei Qin, Qiushi Zhu, Eng Siong Chng

    Abstract: Audio-visual speech recognition (AVSR) provides a promising solution to ameliorate the noise-robustness of audio-only speech recognition with visual information. However, most existing efforts still focus on audio modality to improve robustness considering its dominance in AVSR task, with noise adaptation techniques such as front-end denoise processing. Though effective, these methods are usually…

    Submitted 18 June, 2023; originally announced June 2023.

    Comments: 19 pages, 9 figures, Accepted by ACL 2023