Showing 1–50 of 174 results for author: Xue, L

Searching in archive eess.
  1. arXiv:2406.17286  [pdf]

    cs.RO eess.SY

    Prioritized experience replay-based DDQN for Unmanned Vehicle Path Planning

    Authors: Liu Lipeng, Letian Xu, Jiabei Liu, Haopeng Zhao, Tongzhou Jiang, Tianyao Zheng

    Abstract: The path planning module is a key component of autonomous vehicle navigation and directly affects its operating efficiency and safety. In complex environments with many obstacles, traditional planning algorithms often cannot meet the requirements of intelligent navigation, which may lead to problems such as dead zones for unmanned vehicles. This paper proposes a path planning algorithm based on DDQN and combines it wi…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 4 pages, 6 figures, 2024 5th International Conference on Information Science, Parallel and Distributed Systems

  2. arXiv:2406.12707  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction

    Authors: Haoqiu Yan, Yongxin Zhu, Kai Zheng, Bing Liu, Haoyu Cao, Deqiang Jiang, Linli Xu

    Abstract: Large Language Model (LLM)-enhanced agents are becoming increasingly prevalent in Human-AI communication, offering vast potential from entertainment to professional domains. However, current multi-modal dialogue systems overlook the acoustic information present in speech, which is crucial for understanding human communication nuances. This oversight can lead to misinterpretations of speakers' intentions…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 9 pages, 3 figures, accepted to ACL 2024

  3. arXiv:2406.07422  [pdf, other]

    eess.AS

    Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation

    Authors: Hanzhao Li, Liumeng Xue, Haohan Guo, Xinfa Zhu, Yuanjun Lv, Lei Xie, Yunlin Chen, Hao Yin, Zhifei Li

    Abstract: The multi-codebook speech codec enables the application of large language models (LLMs) in TTS, but its multi-sequence prediction creates bottlenecks in efficiency and robustness. To avoid this obstacle, we propose Single-Codec, a single-codebook single-sequence codec, which employs a disentangled VQ-VAE to decouple speech into a time-invariant embedding and a phonetically-rich discrete sequence. Furthermor…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  4. arXiv:2406.05763  [pdf, other]

    eess.AS

    WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark

    Authors: Linhan Ma, Dake Guo, Kun Song, Yuepeng Jiang, Shuai Wang, Liumeng Xue, Weiming Xu, Huan Zhao, Binbin Zhang, Lei Xie

    Abstract: With the development of large text-to-speech (TTS) models and the scale-up of training data, state-of-the-art TTS systems have achieved impressive performance. In this paper, we present WenetSpeech4TTS, a multi-domain Mandarin corpus derived from the open-sourced WenetSpeech dataset. Tailored for text-to-speech tasks, we refined WenetSpeech by adjusting segment boundaries, enhancing the audio…

    Submitted 19 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH2024

  5. arXiv:2406.05672  [pdf, other]

    eess.AS

    Text-aware and Context-aware Expressive Audiobook Speech Synthesis

    Authors: Dake Guo, Xinfa Zhu, Liumeng Xue, Yongmao Zhang, Wenjie Tian, Lei Xie

    Abstract: Recent advances in text-to-speech have significantly improved the expressiveness of synthetic speech. However, a major challenge remains in generating speech that captures the diverse styles exhibited by professional narrators in audiobooks without relying on manually labeled data or reference speech. To address this problem, we propose a text-aware and context-aware (TACA) style modeling approach…

    Submitted 12 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH2024

  6. arXiv:2406.03391  [pdf, other]

    eess.SP

    Joint Association, Beamforming, and Resource Allocation for Multi-IRS Enabled MU-MISO Systems With RSMA

    Authors: Chunjie Wang, Xuhui Zhang, Huijun Xing, Liang Xue, Shuqiang Wang, Yanyan Shen, Bo Yang, ** Guan

    Abstract: Intelligent reflecting surface (IRS) and rate-splitting multiple access (RSMA) technologies are at the forefront of enhancing spectrum and energy efficiency in the next generation multi-antenna communication systems. This paper explores an RSMA system with multiple IRSs, and proposes two purpose-driven scheduling schemes, i.e., the exhaustive IRS-aided (EIA) and opportunistic IRS-aided (OIA) scheme…

    Submitted 5 June, 2024; originally announced June 2024.

  7. arXiv:2406.00976  [pdf, other]

    cs.CL cs.SD eess.AS

    Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer

    Authors: Yongxin Zhu, Dan Su, Liqiang He, Linli Xu, Dong Yu

    Abstract: Although speech language models have made significant progress recently, they face remarkable challenges in modeling the long acoustic sequences of neural audio codecs. In this paper, we introduce Generative Pre-trained Speech Transformer (GPST), a hierarchical transformer designed for efficient speech language modeling. GPST quantizes audio wavef…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 (main conference)

  8. arXiv:2405.17297  [pdf, other]

    eess.SP

    Enhanced Automotive Radar Collaborative Sensing By Exploiting Constructive Interference

    Authors: Lifan Xu, Shunqiao Sun, A. Lee Swindlehurst

    Abstract: Automotive radar has emerged as a crucial sensor for autonomous vehicle perception. As more cars are equipped with radars, radar interference is an unavoidable challenge. Unlike conventional approaches such as interference mitigation and interference-avoiding technologies, this paper introduces an innovative collaborative sensing scheme with multiple automotive radars that exploits constructive interferenc…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: paper accepted by IEEE SAM Workshop 2024

  9. arXiv:2405.06463  [pdf, other]

    eess.IV cs.CV cs.LG

    MRSegmentator: Robust Multi-Modality Segmentation of 40 Classes in MRI and CT Sequences

    Authors: Hartmut Häntze, Lina Xu, Felix J. Dorfner, Leonhard Donle, Daniel Truhn, Hugo Aerts, Mathias Prokop, Bram van Ginneken, Alessa Hering, Lisa C. Adams, Keno K. Bressem

    Abstract: Purpose: To introduce a deep learning model capable of multi-organ segmentation in MRI scans, offering a solution to the current limitations in MRI analysis due to challenges in resolution, standardized intensity values, and variability in sequences. Materials and Methods: The model was trained on 1,200 manually annotated MRI scans from the UK Biobank, 221 in-house MRI scans and 1,228 CT scans, le…

    Submitted 13 May, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

    Comments: 13 pages, 6 figures; corrected co-author info

    ACM Class: J.3

  10. arXiv:2405.05715  [pdf, other]

    eess.SP

    Shifting the ISAC Trade-Off with Fluid Antenna Systems

    Authors: Jiaqi Zou, Hao Xu, Chao Wang, Lvxin Xu, Songlin Sun, Kaitao Meng, Christos Masouros, Kai-Kit Wong

    Abstract: As an emerging antenna technology, a fluid antenna system (FAS) enhances spatial diversity to improve both sensing and communication performance by shifting the active antennas among available ports. In this letter, we study the potential of shifting the integrated sensing and communication (ISAC) trade-off with FAS. We propose a model for FAS-enabled ISAC and jointly optimize the transmit beamf…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 5 pages, 5 figures

  11. arXiv:2405.03713  [pdf, other]

    eess.IV cs.CV cs.LG

    Improve Cross-Modality Segmentation by Treating MRI Images as Inverted CT Scans

    Authors: Hartmut Häntze, Lina Xu, Leonhard Donle, Felix J. Dorfner, Alessa Hering, Lisa C. Adams, Keno K. Bressem

    Abstract: Computed tomography (CT) segmentation models frequently include classes that are not currently supported by magnetic resonance imaging (MRI) segmentation models. In this study, we show that a simple image inversion technique can significantly improve the segmentation quality of CT segmentation models on MRI data, by using the TotalSegmentator model, applied to T1-weighted MRI images, as an example. I…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 3 pages, 2 figures

    ACM Class: J.3

  12. arXiv:2404.17280  [pdf, other]

    cs.SD eess.AS

    Device Feature based on Graph Fourier Transformation with Logarithmic Processing For Detection of Replay Speech Attacks

    Authors: Mingrui He, Longting Xu, Han Wang, Mingjun Zhang, Rohan Kumar Das

    Abstract: The most common spoofing attacks on automatic speaker verification systems are replay speech attacks. Detection of replay speech heavily relies on replay configuration information. Previous studies have shown that graph Fourier transform-derived features can effectively detect replay speech but ignore device and environmental noise effects. In this work, we propose a new feature, the graph frequen…

    Submitted 26 April, 2024; originally announced April 2024.

  13. arXiv:2404.17161  [pdf, other]

    cs.SD eess.AS eess.SP

    An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder

    Authors: Yicheng Gu, Xueyao Zhang, Liumeng Xue, Haizhou Li, Zhizheng Wu

    Abstract: Generative Adversarial Network (GAN) based vocoders are superior in both inference speed and synthesis quality when reconstructing an audible waveform from an acoustic representation. This study focuses on improving the discriminator for GAN-based vocoders. Most existing Time-Frequency Representation (TFR)-based discriminators are rooted in Short-Time Fourier Transform (STFT), which has a constan…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.14957

  14. arXiv:2404.13640  [pdf, other]

    cs.MM cs.CV eess.IV

    Beyond Alignment: Blind Video Face Restoration via Parsing-Guided Temporal-Coherent Transformer

    Authors: Kepeng Xu, Li Xu, Gang He, Wenxin Yu, Yunsong Li

    Abstract: Multiple complex degradations are coupled in low-quality video faces in the real world. Therefore, blind video face restoration is a highly challenging ill-posed problem, requiring not only hallucinating high-fidelity details but also enhancing temporal coherence across diverse pose variations. Restoring each frame independently in a naive manner inevitably introduces temporal incoherence and arti…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 9 pages

  15. arXiv:2404.09500  [pdf]

    physics.optics eess.IV

    On-chip Real-time Hyperspectral Imager with Full CMOS Resolution Enabled by Massively Parallel Neural Network

    Authors: Junren Wen, Haiqi Gao, Weiming Shi, Shuaibo Feng, Lingyun Hao, Yujie Liu, Liang Xu, Yuchuan Shao, Yueguang Zhang, Weidong Shen, Chenying Yang

    Abstract: Traditional spectral imaging methods are constrained by the time-consuming scanning process, limiting the application in dynamic scenarios. One-shot spectral imaging based on reconstruction has been a hot research topic recently and the primary challenges still lie in both efficient fabrication techniques suitable for mass production and the high-speed, high-accuracy reconstruction algorithm for r…

    Submitted 15 April, 2024; originally announced April 2024.

  16. arXiv:2404.03253  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    A dataset of primary nasopharyngeal carcinoma MRI with multi-modalities segmentation

    Authors: Yin Li, Qi Chen, Kai Wang, Meige Li, Li** Si, Yingwei Guo, Yu Xiong, Qixing Wang, Yang Qin, Ling Xu, Patrick van der Smagt, Jun Tang, Nutan Chen

    Abstract: Multi-modality magnetic resonance imaging data with various sequences facilitate the early diagnosis, tumor segmentation, and disease staging in the management of nasopharyngeal carcinoma (NPC). The lack of publicly available, comprehensive datasets limits advancements in diagnosis, treatment planning, and the development of machine learning algorithms for NPC. Addressing this critical need, we in…

    Submitted 4 April, 2024; originally announced April 2024.

  17. arXiv:2404.00549  [pdf]

    eess.IV cs.CV

    Pneumonia App: a mobile application for efficient pediatric pneumonia diagnosis using explainable convolutional neural networks (CNN)

    Authors: Jiaming Deng, Zhenglin Chen, Minjiang Chen, Lulu Xu, Jiaqi Yang, Zhendong Luo, Peiwu Qin

    Abstract: Mycoplasma pneumoniae pneumonia (MPP) poses significant diagnostic challenges in pediatric healthcare, especially in regions like China where it's prevalent. We introduce PneumoniaAPP, a mobile application leveraging deep learning techniques for rapid MPP detection. Our approach capitalizes on convolutional neural networks (CNNs) trained on a comprehensive dataset comprising 3345 chest X-ray (CXR)…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 27 pages, 7 figures

    MSC Class: 68

    ACM Class: J.3

  18. arXiv:2403.17793  [pdf, other]

    eess.SY

    Neural Exponential Stabilization of Control-affine Nonlinear Systems

    Authors: Muhammad Zakwan, Liang Xu, Giancarlo Ferrari-Trecate

    Abstract: This paper proposes a novel learning-based approach for achieving exponential stabilization of nonlinear control-affine systems. We leverage the Control Contraction Metrics (CCMs) framework to co-synthesize Neural Contraction Metrics (NCMs) and Neural Network (NN) controllers. First, we transform the infinite-dimensional semi-definite program (SDP) for CCM computation into a tractable inequality f…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: This paper has been submitted to CDC 2024 for possible publication

  19. arXiv:2403.16397  [pdf, other]

    eess.SP cs.AI

    RadioGAT: A Joint Model-based and Data-driven Framework for Multi-band Radiomap Reconstruction via Graph Attention Networks

    Authors: Xiaojie Li, Songyang Zhang, Hang Li, Xiaoyang Li, Lexi Xu, Haigao Xu, Hui Mei, Guangxu Zhu, Nan Qi, Ming Xiao

    Abstract: Multi-band radiomap reconstruction (MB-RMR) is a key component in wireless communications for tasks such as spectrum management and network planning. However, traditional machine-learning-based MB-RMR methods, which rely heavily on simulated data or complete structured ground truth, face significant deployment challenges. These challenges stem from the differences between simulated and actual data…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: submitted to an IEEE journal for possible publication

  20. arXiv:2403.16353  [pdf, other]

    cs.IT eess.SP

    Energy-Efficient Hybrid Beamforming with Dynamic On-off Control for Integrated Sensing, Communications, and Powering

    Authors: Zeyu Hao, Yuan Fang, Xianghao Yu, Jie Xu, Ling Qiu, Lexi Xu, Shuguang Cui

    Abstract: This paper investigates the energy-efficient hybrid beamforming design for a multi-functional integrated sensing, communications, and powering (ISCAP) system. In this system, a base station (BS) with a hybrid analog-digital (HAD) architecture sends unified wireless signals to communicate with multiple information receivers (IRs), sense multiple point targets, and wirelessly charge multiple energy…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: 13 pages, 6 figures, submitted to IEEE Transactions on Communications

  21. arXiv:2403.11575  [pdf, other]

    eess.SP

    Task-Oriented Hybrid Beamforming for OFDM-DFRC Systems with Flexibly Controlled Space-Frequency Spectra

    Authors: Lingyun Xu, Bowen Wang, Ziyang Cheng

    Abstract: This paper investigates hybrid beamforming design for the orthogonal frequency division multiplexing (OFDM) dual-function radar-communication (DFRC) system in multi-task scenarios involving a radar scanning and detection task and a target tracking task. To meet different task requirements of the DFRC system, we introduce two novel radar beampattern metrics, the average integrated…

    Submitted 18 March, 2024; originally announced March 2024.

  22. Enhancing Physical Layer Security in Dual-Function Radar-Communication Systems with Hybrid Beamforming Architecture

    Authors: Lingyun Xu, Bowen Wang, Huiyong Li, Ziyang Cheng

    Abstract: In this letter, we investigate enhancing the physical layer security (PLS) for the dual-function radar-communication (DFRC) system with hybrid beamforming (HBF) architecture, where the base station (BS) achieves downlink communication and radar target detection simultaneously. We consider an eavesdropper intercepting the information transmitted from the BS to the downlink communication users with…

    Submitted 4 April, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Journal ref: IEEE Wireless Communications Letters, 2024

  23. arXiv:2403.00381  [pdf, other]

    cs.RO cs.LG eess.SY

    Structured Deep Neural Networks-Based Backstepping Trajectory Tracking Control for Lagrangian Systems

    Authors: Jiajun Qian, Liang Xu, Xiaoqiang Ren, Xiaofan Wang

    Abstract: Deep neural networks (DNN) are increasingly being used to learn controllers due to their excellent approximation capabilities. However, their black-box nature poses significant challenges to closed-loop stability guarantees and performance analysis. In this paper, we introduce a structured DNN-based controller for the trajectory tracking control of Lagrangian systems using backstepping techniques. By p…

    Submitted 1 March, 2024; originally announced March 2024.

  24. arXiv:2402.17550  [pdf, other]

    cs.NI cs.AI eess.SP

    Emergency Caching: Coded Caching-based Reliable Map Transmission in Emergency Networks

    Authors: Zeyu Tian, Lianming Xu, Liang Li, Li Wang, Aiguo Fei

    Abstract: Many rescue missions demand effective perception and real-time decision making, which highly rely on effective data collection and processing. In this study, we propose a three-layer architecture of emergency caching networks focusing on data collection and reliable transmission, by leveraging efficient perception and edge caching technologies. Based on this architecture, we propose a disaster map…

    Submitted 27 February, 2024; originally announced February 2024.

  25. arXiv:2402.16153  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG cs.MM eess.AS

    ChatMusician: Understanding and Generating Music Intrinsically with LLM

    Authors: Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, **gcheng Wu, Chenghua Lin, Qifeng Liu , et al. (10 additional authors not shown)

    Abstract: While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the…

    Submitted 25 February, 2024; originally announced February 2024.

    Comments: GitHub: https://shanghaicannon.github.io/ChatMusician/

  26. arXiv:2402.12660  [pdf, other]

    cs.SD cs.HC eess.AS

    SingVisio: Visual Analytics of Diffusion Model for Singing Voice Conversion

    Authors: Liumeng Xue, Chaoren Wang, Mingxuan Wang, Xueyao Zhang, Jun Han, Zhizheng Wu

    Abstract: In this study, we present SingVisio, an interactive visual analysis system that aims to explain the diffusion model used in singing voice conversion. SingVisio provides a visual display of the generation process in diffusion models, showcasing the step-by-step denoising of the noisy spectrum and its transformation into a clean spectrum that captures the desired singer's timbre. The system also fac…

    Submitted 19 February, 2024; originally announced February 2024.

  27. arXiv:2402.09372  [pdf, other]

    eess.IV cs.AI cs.CV

    Deep Rib Fracture Instance Segmentation and Classification from CT on the RibFrac Challenge

    Authors: Jiancheng Yang, Rui Shi, Liang **, Xiaoyang Huang, Kaiming Kuang, Donglai Wei, Shixuan Gu, Jianying Liu, Pengfei Liu, Zhizhong Chai, Yongjie Xiao, Hao Chen, Liming Xu, Bang Du, Xiangyi Yan, Hao Tang, Adam Alessio, Gregory Holste, Jiapeng Zhang, Xiaoming Wang, Jianye He, Lixuan Che, Hanspeter Pfister, Ming Li, Bingbing Ni

    Abstract: Rib fractures are a common and potentially severe injury that can be challenging and labor-intensive to detect in CT scans. While there have been efforts to address this field, the lack of large-scale annotated datasets and evaluation benchmarks has hindered the development and validation of deep learning algorithms. To address this issue, the RibFrac Challenge was introduced, providing a benchmar…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: Challenge paper for MICCAI RibFrac Challenge (https://ribfrac.grand-challenge.org/)

  28. arXiv:2402.02245  [pdf, other]

    cs.CV cs.LG eess.IV

    Revisiting Generative Adversarial Networks for Binary Semantic Segmentation on Imbalanced Datasets

    Authors: Lei Xu, Moncef Gabbouj

    Abstract: Anomalous crack region detection is a typical binary semantic segmentation task, which aims to detect pixels representing cracks on pavement surface images automatically by algorithms. Although existing deep learning-based methods have achieved outstanding results on specific public pavement datasets, their performance deteriorates dramatically on imbalanced datasets. The input datasets used in s…

    Submitted 7 March, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

  29. arXiv:2402.02146  [pdf, other]

    cs.AI cs.LG cs.NI eess.SP

    Emergency Computing: An Adaptive Collaborative Inference Method Based on Hierarchical Reinforcement Learning

    Authors: Weiqi Fu, Lianming Xu, Xin Wu, Li Wang, Aiguo Fei

    Abstract: In achieving effective emergency response, the timely acquisition of environmental information, seamless command data transmission, and prompt decision-making are crucial. This necessitates the establishment of a resilient emergency communication dedicated network, capable of providing communication and sensing services even in the absence of basic infrastructure. In this paper, we propose an Emer…

    Submitted 3 February, 2024; originally announced February 2024.

  30. arXiv:2402.01271  [pdf, other]

    eess.AS cs.SD

    An Intra-BRNN and GB-RVQ Based END-TO-END Neural Audio Codec

    Authors: Lin** Xu, Jiawei Jiang, Dejun Zhang, Xianjun Xia, Li Chen, Yijian Xiao, Piao Ding, Shenyi Song, Sixing Yin, Ferdous Sohel

    Abstract: Recently, neural networks have proven to be effective in performing speech coding tasks at low bitrates. However, under-utilization of intra-frame correlations and quantizer error degrade the reconstructed audio quality. To improve the coding quality, we present an end-to-end neural speech codec, namely CBRC (Convolutional and Bidirectional Recurrent neural Codec). An interleave…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: INTERSPEECH 2023

  31. arXiv:2402.00616  [pdf]

    eess.SP

    Dual-Tap Optical-Digital Feedforward Equalization Enabling High-Speed Optical Transmission in IM/DD Systems

    Authors: Yu Guo, Yangbo Wu, Zhao Yang, Lei Xue, Ning Liang, Yang Ren, Zhengrui Tu, Jia Feng, Qunbi Zhuge

    Abstract: Intensity-modulation and direct-detection (IM/DD) transmission is widely adopted for high-speed optical transmission scenarios due to its cost-effectiveness and simplicity. However, as the data rate increases, the fiber chromatic dispersion (CD) would induce a serious power fading effect, and direct detection could generate inter-symbol interference (ISI). Moreover, the ISI becomes more severe wit…

    Submitted 1 February, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: 6 pages, 7 figures, journal

  32. arXiv:2401.16564  [pdf]

    eess.SP

    Data and Physics driven Deep Learning Models for Fast MRI Reconstruction: Fundamentals and Methodologies

    Authors: Jiahao Huang, Yinzhe Wu, Fanwen Wang, Yingying Fang, Yang Nan, Cagan Alkan, Lei Xu, Zhifan Gao, Weiwen Wu, Lei Zhu, Zhaolin Chen, Peter Lally, Neal Bangerter, Kawin Setsompop, Yike Guo, Daniel Rueckert, Ge Wang, Guang Yang

    Abstract: Magnetic Resonance Imaging (MRI) is a pivotal clinical diagnostic tool, yet its extended scanning times often compromise patient comfort and image quality, especially in volumetric, temporal and quantitative scans. This review elucidates recent advances in MRI acceleration via data and physics-driven models, leveraging techniques from algorithm unrolling models, enhancement-based models, and plug-…

    Submitted 29 January, 2024; originally announced January 2024.

  33. arXiv:2401.14183  [pdf, other]

    cs.AI cs.MA eess.SY math.OC

    Towards Autonomous Supply Chains: Definition, Characteristics, Conceptual Framework, and Autonomy Levels

    Authors: Liming Xu, Stephen Mak, Yaniv Proselkov, Alexandra Brintrup

    Abstract: Recent global disruptions, such as the pandemic and geopolitical conflicts, have profoundly exposed vulnerabilities in traditional supply chains, requiring exploration of more resilient alternatives. Autonomous supply chains (ASCs) have emerged as a potential solution, offering increased visibility, flexibility, and resilience in turbulent trade environments. Despite discussions in industry and ac…

    Submitted 13 October, 2023; originally announced January 2024.

    Comments: This paper includes 20 pages and 8 figures

  34. arXiv:2401.13062  [pdf]

    cs.RO eess.SY physics.bio-ph

    Force sensing to reconstruct potential energy landscapes for cluttered large obstacle traversal

    Authors: Yaqing Wang, Ling Xu, Chen Li

    Abstract: Visual sensing of environmental geometry allows robots to use artificial potential fields to avoid sparse obstacles. Yet robots must further traverse cluttered large obstacles for applications like search and rescue through rubble and planetary exploration across Martian rocks. Recent studies discovered that to traverse cluttered large obstacles, multi-legged insects and insect-inspired robots mak…

    Submitted 23 January, 2024; originally announced January 2024.

  35. arXiv:2401.11224  [pdf, other]

    eess.IV cs.CV

    Susceptibility of Adversarial Attack on Medical Image Segmentation Models

    Authors: Zhongxuan Wang, Leo Xu

    Abstract: The nature of deep neural networks has given rise to a variety of attacks, but little work has been done to address the effect of adversarial attacks on segmentation models trained on MRI datasets. In light of the grave consequences that such attacks could cause, we explore four models from the U-Net family and examine their responses to the Fast Gradient Sign Method (FGSM) attack. We conduct FGSM…

    Submitted 20 January, 2024; originally announced January 2024.

    Comments: 6 pages, 8 figures, presented at 2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI) conference

  36. arXiv:2401.10418  [pdf, other]

    eess.SY

    Hazard resistance-based spatiotemporal risk analysis for distribution network outages during hurricanes

    Authors: Luo Xu, Ning Lin, Dazhi Xi, Kairui Feng, H. Vincent Poor

    Abstract: Blackouts in recent decades show an increasing prevalence of power outages due to extreme weather events such as hurricanes. Precisely assessing the spatiotemporal outages in distribution networks, the most vulnerable part of power systems, is critical to enhance power system resilience. The Sequential Monte Carlo (SMC) simulation method is widely used for spatiotemporal risk analysis of power sys…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: 10 pages, 10 figures

  37. arXiv:2401.10070  [pdf, other]

    cs.CL cs.SD eess.AS

    Communication-Efficient Personalized Federated Learning for Speech-to-Text Tasks

    Authors: Yichao Du, Zhirui Zhang, Linan Yue, Xu Huang, Yuqing Zhang, Tong Xu, Linli Xu, Enhong Chen

    Abstract: To protect privacy and meet legal regulations, federated learning (FL) has gained significant attention for training speech-to-text (S2T) systems, including automatic speech recognition (ASR) and speech translation (ST). However, the commonly used FL approach (i.e., FedAvg) in S2T tasks typically suffers from extensive communication overhead due to multi-round interactions based on the wh…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: ICASSP 2024

  38. arXiv:2401.09013  [pdf, other]

    cs.NI eess.SP

    An Improved Virtual Force Approach for UAV Deployment and Resource Allocation in Emergency Communications

    Authors: Hongying Guo, Li Wang, Ruoguang Li, Luyang Hou, Lianming Xu, Aiguo Fei

    Abstract: In this paper, we consider an unmanned aerial vehicle (UAV)-enabled emergency communication system, which establishes temporary communication links with user equipment (UEs) in a typical disaster environment with mountainous forest and obstacles. Towards this end, a joint deployment, power allocation, and user association optimization problem is formulated to maximize the total transmission rate,…

    Submitted 17 January, 2024; originally announced January 2024.

  39. arXiv:2401.07001  [pdf, other]

    cs.NI eess.SP

    UAV-assisted Emergency Integrated Sensing and Communication Networks: A CNN-based Rapid Deployment Approach

    Authors: Zao Wang, Lianming Xu, Luyang Hou, Ruoguang Li, Li Wang

    Abstract: UAV-assisted integrated sensing and communication (ISAC) networks are crucial for post-disaster emergency rescue. The speed of UAV deployment directly impacts rescue results. However, ISAC UAV deployment in emergency scenarios is a difficult problem to solve, which conflicts with the need for rapid deployment. In this paper, we propose a two-stage deployment framework to achieve rapid ISAC UAV deployment in emerg…

    Submitted 13 January, 2024; originally announced January 2024.

  40. arXiv:2401.03538  [pdf, other]

    cs.CL cs.SD eess.AS

    Transfer the linguistic representations from TTS to accent conversion with non-parallel data

    Authors: Xi Chen, Jiakun Pei, Liumeng Xue, Mingyang Zhang

    Abstract: Accent conversion aims to convert the accent of a source speech to a target accent while preserving the speaker's identity. This paper introduces a novel non-autoregressive framework for accent conversion that learns accent-agnostic linguistic representations and employs them to convert the accent in the source speech. Specifically, the proposed system aligns speech representations with lingu…

    Submitted 7 January, 2024; originally announced January 2024.

  41. arXiv:2312.09911  [pdf, other]

    cs.SD eess.AS

    Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

    Authors: Xueyao Zhang, Liumeng Xue, Yicheng Gu, Yuancheng Wang, Haorui He, Chaoren Wang, Xi Chen, Zihao Fang, Haopeng Chen, Junan Zhang, Tze Ying Tang, Lexiao Zou, Mingxuan Wang, Jun Han, Kai Chen, Haizhou Li, Zhizheng Wu

    Abstract: Amphion is an open-source toolkit for Audio, Music, and Speech Generation, targeting to ease the way for junior researchers and engineers into these fields. It presents a unified framework that is inclusive of diverse generation tasks and models, with the added bonus of being easily extendable for new incorporation. The toolkit is designed with beginner-friendly workflows and pre-trained models, a…

    Submitted 22 February, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: Amphion Website: https://github.com/open-mmlab/Amphion

  42. arXiv:2311.15846  [pdf, other]

    cs.CV eess.IV

    Learning with Noisy Low-Cost MOS for Image Quality Assessment via Dual-Bias Calibration

    Authors: Lei Wang, Qingbo Wu, Desen Yuan, King Ngi Ngan, Hongliang Li, Fanman Meng, Linfeng Xu

    Abstract: Learning based image quality assessment (IQA) models have obtained impressive performance with the help of reliable subjective quality labels, where mean opinion score (MOS) is the most popular choice. However, in view of the subjective bias of individual annotators, the labor-abundant MOS (LA-MOS) typically requires a large collection of opinion scores from multiple annotators for each image, whi…

    Submitted 27 November, 2023; originally announced November 2023.

  43. arXiv:2311.14957  [pdf, other]

    cs.SD eess.AS

    Multi-Scale Sub-Band Constant-Q Transform Discriminator for High-Fidelity Vocoder

    Authors: Yicheng Gu, Xueyao Zhang, Liumeng Xue, Zhizheng Wu

    Abstract: Generative Adversarial Network (GAN) based vocoders are superior in inference speed and synthesis quality when reconstructing an audible waveform from an acoustic representation. This study focuses on improving the discriminator to promote GAN-based vocoders. Most existing time-frequency-representation-based discriminators are rooted in Short-Time Fourier Transform (STFT), whose time-frequency res…

    Submitted 25 November, 2023; originally announced November 2023.

  44. arXiv:2311.08661  [pdf, other]

    stat.ML cs.CV cs.LG eess.IV

    Deep Neural Network Identification of Limnonectes Species and New Class Detection Using Image Data

    Authors: Li Xu, Yili Hong, Eric P. Smith, David S. McLeod, Xinwei Deng, Laura J. Freeman

    Abstract: As is true of many complex tasks, the work of discovering, describing, and understanding the diversity of life on Earth (viz., biological systematics and taxonomy) requires many tools. Some of this work can be accomplished as it has been done in the past, but some aspects present us with challenges which traditional knowledge and tools cannot adequately resolve. One such challenge is presented by…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: 26 pages, 11 Figures

  45. arXiv:2311.07912  [pdf, other]

    cs.CV eess.SP

    Detection of Small Targets in Sea Clutter Based on RepVGG and Continuous Wavelet Transform

    Authors: **gchen Ni, Haoru Li, Lilin Xu, **g Liang

    Abstract: Constructing a high-performance target detector under the background of sea clutter is always necessary and important. In this work, we propose a RepVGGA0-CWT detector, where RepVGG is a residual network that gains a high detection accuracy. Different from traditional residual networks, RepVGG keeps an acceptable calculation speed. Giving consideration to both accuracy and speed, the RepVGGA0 is s…

    Submitted 14 November, 2023; originally announced November 2023.

  46. arXiv:2311.07179  [pdf, other]

    cs.SD eess.AS

    SponTTS: modeling and transferring spontaneous style for TTS

    Authors: Hanzhao Li, Xinfa Zhu, Liumeng Xue, Yang Song, Yunlin Chen, Lei Xie

    Abstract: Spontaneous speaking style exhibits notable differences from other speaking styles due to various spontaneous phenomena (e.g., filled pauses, prolongation) and substantial prosody variation (e.g., diverse pitch and duration variation, occasional non-verbal speech like a smile), posing challenges to modeling and prediction of spontaneous style. Moreover, the limitation of high-quality spontaneous d…

    Submitted 8 January, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

    Comments: 5 pages, 3 figures, Accepted by ICASSP2024

  47. arXiv:2311.02958  [pdf, other]

    eess.SP

    Optimization of RIS Placement for Satellite-to-Ground Coverage Enhancement

    Authors: Xingchen Liu, Liuxun Xue, Shu Sun, Meixia Tao

    Abstract: In satellite-to-ground communication, ensuring reliable and efficient connectivity poses significant challenges. The reconfigurable intelligent surface (RIS) offers a promising solution due to its ability to manipulate wireless propagation environments and thus enhance communication performance. In this paper, we propose a method for optimizing the placement of RISs on building facets to improve s…

    Submitted 6 November, 2023; originally announced November 2023.

  48. arXiv:2310.20151  [pdf, other]

    cs.CL cs.RO eess.SY

    Multi-Agent Consensus Seeking via Large Language Models

    Authors: Huaben Chen, Wenkang Ji, Lufeng Xu, Shiyu Zhao

    Abstract: Multi-agent systems driven by large language models (LLMs) have shown promising abilities for solving complex tasks in a collaborative manner. This work considers a fundamental problem in multi-agent collaboration: consensus seeking. When multiple agents work together, we are interested in how they can reach a consensus through inter-agent negotiation. To that end, this work studies a consensus-se…

    Submitted 30 October, 2023; originally announced October 2023.

  49. arXiv:2310.11160  [pdf, other]

    cs.SD eess.AS

    Leveraging Diverse Semantic-based Audio Pretrained Models for Singing Voice Conversion

    Authors: Xueyao Zhang, Yicheng Gu, Haopeng Chen, Zihao Fang, Lexiao Zou, Junan Zhang, Liumeng Xue, **chao Zhang, Jie Zhou, Zhizheng Wu

    Abstract: Singing Voice Conversion (SVC) is a technique that enables any singer to perform any song. To achieve this, it is essential to obtain speaker-agnostic representations from the source audio, which poses a significant challenge. A common solution involves utilizing a semantic-based audio pretrained model as a feature extractor. However, the degree to which the extracted features can meet the SVC req…

    Submitted 27 May, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

  50. arXiv:2310.06339  [pdf, other]

    eess.IV cs.LG

    Automatic nodule identification and differentiation in ultrasound videos to facilitate per-nodule examination

    Authors: Siyuan Jiang, Yan Ding, Yuling Wang, Lei Xu, Wenli Dai, Wanru Chang, Jianfeng Zhang, Jie Yu, Jianqiao Zhou, Chunquan Zhang, ** Liang, Dexing Kong

    Abstract: Ultrasound is a vital diagnostic technique in health screening, with the advantages of non-invasive, cost-effective, and radiation free, and therefore is widely applied in the diagnosis of nodules. However, it relies heavily on the expertise and clinical experience of the sonographer. In ultrasound images, a single nodule might present heterogeneous appearances in different cross-sectional views w…

    Submitted 10 October, 2023; originally announced October 2023.