Showing 1–50 of 303 results for author: Wang, D

Searching in archive eess.
  1. arXiv:2407.00995  [pdf, other]

    cs.CY eess.SY physics.app-ph

    Data on the Move: Traffic-Oriented Data Trading Platform Powered by AI Agent with Common Sense

    Authors: Yi Yu, Shengyue Yao, Tianchen Zhou, Yexuan Fu, Jingru Yu, Ding Wang, Xuhong Wang, Cen Chen, Yilun Lin

    Abstract: In the digital era, data has become a pivotal asset, advancing technologies such as autonomous driving. Despite this, data trading faces challenges like the absence of robust pricing methods and the lack of trustworthy trading mechanisms. To address these challenges, we introduce a traffic-oriented data trading platform named Data on The Move (DTM), integrating traffic simulation, data trading, an…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.19590  [pdf, other]

    eess.SP

    Joint Beamforming and Antenna Position Optimization for Movable Antenna-Assisted Spectrum Sharing

    Authors: Xin Wei, Weidong Mei, Dong Wang, Boyu Ning, Zhi Chen

    Abstract: Fluid antennas (FAs) and movable antennas (MAs) have drawn increasing attention in wireless communications recently due to their ability to create favorable channel conditions via local antenna movement within a confined region. In this letter, we advance their application for cognitive radio to facilitate efficient spectrum sharing between primary and secondary communication systems. In particula…

    Submitted 27 June, 2024; originally announced June 2024.

  3. arXiv:2406.11619  [pdf, other]

    eess.AS cs.LG

    AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling

    Authors: Vahid Ahmadi Kalkhorani, Cheng Yu, Anurag Kumar, Ke Tan, Buye Xu, DeLiang Wang

    Abstract: Adding visual cues to audio-based speech separation can improve separation performance. This paper introduces AV-CrossNet, an audiovisual (AV) system for speech enhancement, target speaker extraction, and multi-talker speaker separation. AV-CrossNet is extended from the CrossNet architecture, which is a recently proposed network that performs complex spectral mapping for speech separation by lever…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 10 pages, 4 Figures, and 4 Tables

  4. arXiv:2406.11519  [pdf, other]

    cs.CV eess.IV

    HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

    Authors: Di Wang, Meiqi Hu, Yao Jin, Yuchun Miao, Jiaqi Yang, Yichu Xu, Xiaolei Qin, Jiaqi Ma, Lingyu Sun, Chenxing Li, Chuan Fu, Hongruixuan Chen, Chengxi Han, Naoto Yokoya, Jing Zhang, Minqiang Xu, Lin Liu, Lefei Zhang, Chen Wu, Bo Du, Dacheng Tao, Liangpei Zhang

    Abstract: Foundation models (FMs) are revolutionizing the analysis and understanding of remote sensing (RS) scenes, including aerial RGB, multispectral, and SAR images. However, hyperspectral images (HSIs), which are rich in spectral information, have not seen much application of FMs, with existing methods often restricted to specific tasks and lacking generality. To fill this gap, we introduce HyperSIGMA,…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: The code and models will be released at https://github.com/WHU-Sigma/HyperSIGMA

  5. arXiv:2406.08336  [pdf, other]

    cs.SD cs.CV eess.AS

    CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction

    Authors: Xueyuan Chen, Dongchao Yang, Dingdong Wang, Xixin Wu, Zhiyong Wu, Helen Meng

    Abstract: Dysarthric speech reconstruction (DSR) aims to transform dysarthric speech into normal speech. It still suffers from low speaker similarity and poor prosody naturalness. In this paper, we propose a multi-modal DSR model by leveraging neural codec language modeling to improve the reconstruction results, especially for the speaker similarity and prosody naturalness. Our proposed model consists of: (…

    Submitted 24 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  6. arXiv:2406.08268  [pdf, other]

    eess.SY

    Multi-Static ISAC based on Network-Assisted Full-Duplex Cell-Free Networks: Performance Analysis and Duplex Mode Optimization

    Authors: Fan Zeng, Ruoyun Liu, Xiaoyu Sun, Jingxuan Yu, Jiamin Li, Pengchen Zhu, Dongming Wang, Xiaohu You

    Abstract: Multi-static integrated sensing and communication (ISAC) technology, which can achieve a wider coverage range and avoid self-interference, is an important trend for the future development of ISAC. Existing multi-static ISAC designs are unable to support the asymmetric uplink (UL)/downlink (DL) communication requirements in the scenario while simultaneously achieving optimal sensing performance. Th…

    Submitted 12 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  7. arXiv:2406.07854  [pdf, other]

    cs.SD cs.MM eess.AS

    Zero-Shot Fake Video Detection by Audio-Visual Consistency

    Authors: Xiaolou Li, Zehua Liu, Chen Chen, Lantian Li, Li Guo, Dong Wang

    Abstract: Recent studies have advocated the detection of fake videos as a one-class detection task, predicated on the hypothesis that the consistency between audio and visual modalities of genuine data is more significant than that of fake data. This methodology, which solely relies on genuine audio-visual data while negating the need for forged counterparts, is thus delineated as a 'zero-shot' detection pa…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: to be published in INTERSPEECH 2024

  8. arXiv:2406.07832  [pdf, other]

    cs.SD eess.AS

    SE/BN Adapter: Parametric Efficient Domain Adaptation for Speaker Recognition

    Authors: Tianhao Wang, Lantian Li, Dong Wang

    Abstract: Deploying a well-optimized pre-trained speaker recognition model in a new domain often leads to a significant decline in performance. While fine-tuning is a commonly employed solution, it demands ample adaptation data and suffers from parameter inefficiency, rendering it impractical for real-world applications with limited data available for model adaptation. Drawing inspiration from the success o…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: to be published in INTERSPEECH 2024

  9. arXiv:2406.07421  [pdf, other]

    cs.SD eess.AS

    A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition

    Authors: Zhenyu Zhou, Shibiao Xu, Shi Yin, Lantian Li, Dong Wang

    Abstract: Data augmentation (DA) has played a pivotal role in the success of deep speaker recognition. Current DA techniques primarily focus on speaker-preserving augmentation, which does not change the speaker trait of the speech and does not create new speakers. Recent research has shed light on the potential of speaker augmentation, which generates new speakers to enrich the training dataset. In this stu…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: to be published in INTERSPEECH 2024

  10. arXiv:2406.02328  [pdf, other]

    cs.SD eess.AS

    SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models

    Authors: Dongchao Yang, Dingdong Wang, Haohan Guo, Xueyuan Chen, Xixin Wu, Helen Meng

    Abstract: In this study, we propose a simple and efficient Non-Autoregressive (NAR) text-to-speech (TTS) system based on diffusion, named SimpleSpeech. Its simpleness shows in three aspects: (1) It can be trained on the speech-only dataset, without any alignment information; (2) It directly takes plain text as input and generates speech through an NAR way; (3) It tries to model speech in a finite and compac…

    Submitted 14 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by InterSpeech 2024

  11. arXiv:2406.01331  [pdf, other]

    cs.IT eess.SP

    Performance Trade-off of Integrated Sensing and Communications for Multi-User Backscatter Systems

    Authors: Yuanming Tian, Dan Wang, Chuan Huang, Wei Zhang

    Abstract: This paper studies the performance trade-off in a multi-user backscatter communication (BackCom) system for integrated sensing and communications (ISAC), where the multi-antenna ISAC transmitter sends excitation signals to power multiple single-antenna passive backscatter devices (BD), and the multi-antenna ISAC receiver performs joint sensing (localization) and communication tasks based on the ba…

    Submitted 3 June, 2024; originally announced June 2024.

  12. arXiv:2406.00444  [pdf, other]

    eess.SP

    Exploring Channel Estimation and Signal Detection for ODDM-based ISAC Systems

    Authors: Dezhi Wang, Chongwen Huang, Lei Liu, Xiaoming Chen, Wei Wang, Zhaoyang Zhang, Chau Yuen, Mérouane Debbah

    Abstract: Inspired by providing reliable communications for high-mobility scenarios, in this letter, we investigate the channel estimation and signal detection in integrated sensing and communication (ISAC) systems based on the orthogonal delay-Doppler multiplexing (ODDM) modulation, which consists of a pulse-train that can achieve the orthogonality with respect to the resolution of the delay-Doppler (DD) p…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: accepted by IEEE Wireless Communications Letters

  13. arXiv:2405.18775  [pdf, other]

    eess.SP

    Synchronization Scheme based on Pilot Sharing in Cell-Free Massive MIMO Systems

    Authors: Qihao Peng, Hong Ren, Zhendong Peng, Cunhua Pan, Maged Elkashlan, Dongming Wang, Jiangzhou Wang, Xiaohu You

    Abstract: This paper analyzes the impact of pilot-sharing scheme on synchronization performance in a scenario where several slave access points (APs) with uncertain carrier frequency offsets (CFOs) and timing offsets (TOs) share a common pilot sequence. First, the Cramer-Rao bound (CRB) with pilot contamination is derived for pilot-pairing estimation. Furthermore, a maximum likelihood algorithm is presented…

    Submitted 30 May, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Submitted to IEEE Journal for pos

  14. arXiv:2405.18435  [pdf, other]

    eess.IV cs.CV

    QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

    Authors: Hongwei Bran Li, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag, et al. (55 additional authors not shown)

    Abstract: Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de…

    Submitted 24 June, 2024; v1 submitted 19 March, 2024; originally announced May 2024.

    Comments: initial technical report

  15. arXiv:2405.17809  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

    Authors: Chenyang Le, Yao Qian, Dongmei Wang, Long Zhou, Shujie Liu, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Sheng Zhao, Michael Zeng

    Abstract: There is a rising interest and trend in research towards directly translating speech from one language to another, known as end-to-end speech-to-speech translation. However, most end-to-end models struggle to outperform cascade models, i.e., a pipeline framework by concatenating speech recognition, machine translation and text-to-speech models. The primary challenges stem from the inherent complex…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Work in progress

  16. arXiv:2405.17441  [pdf, other]

    cs.NI cs.AI cs.CL eess.SY

    When Large Language Models Meet Optical Networks: Paving the Way for Automation

    Authors: Danshi Wang, Yidi Wang, Xiaotian Jiang, Yao Zhang, Yue Pang, Min Zhang

    Abstract: Since the advent of GPT, large language models (LLMs) have brought about revolutionary advancements in all walks of life. As a superior natural language processing (NLP) technology, LLMs have consistently achieved state-of-the-art performance on numerous areas. However, LLMs are considered to be general-purpose models for NLP tasks, which may encounter challenges when applied to complex tasks in s…

    Submitted 24 June, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

  17. arXiv:2405.16797  [pdf]

    cs.SD cs.AI eess.AS

    A Real-Time Voice Activity Detection Based On Lightweight Neural

    Authors: Jidong Jia, Pei Zhao, Di Wang

    Abstract: Voice activity detection (VAD) is the task of detecting speech in an audio stream, which is challenging due to numerous unseen noises and low signal-to-noise ratios in real environments. Recently, neural network-based VADs have alleviated the degradation of performance to some extent. However, the majority of existing studies have employed excessively large models and incorporated future context,…

    Submitted 26 May, 2024; originally announced May 2024.

  18. arXiv:2405.14770  [pdf, other]

    eess.IV

    Physics-informed Score-based Diffusion Model for Limited-angle Reconstruction of Cardiac Computed Tomography

    Authors: Shuo Han, Yongshun Xu, Dayang Wang, Bahareh Morovati, Li Zhou, Jonathan S. Maltz, Ge Wang, Hengyong Yu

    Abstract: Cardiac computed tomography (CT) has emerged as a major imaging modality for the diagnosis and monitoring of cardiovascular diseases. High temporal resolution is essential to ensure diagnostic accuracy. Limited-angle data acquisition can reduce scan time and improve temporal resolution, but typically leads to severe image degradation and motivates for improved reconstruction techniques. In this pa…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 12 pages

  19. arXiv:2405.09470  [pdf, other]

    cs.SD cs.CR cs.LG eess.AS

    Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer

    Authors: Weifei Jin, Yuxin Cao, Junjie Su, Qi Shen, Kai Ye, Derui Wang, Jie Hao, Ziyao Liu

    Abstract: In light of the widespread application of Automatic Speech Recognition (ASR) systems, their security concerns have received much more attention than ever before, primarily due to the susceptibility of Deep Neural Networks. Previous studies have illustrated that surreptitiously crafting adversarial perturbations enables the manipulation of speech recognition systems, resulting in the production of…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: Accepted to SecTL (AsiaCCS Workshop) 2024

  20. arXiv:2405.02660  [pdf, other]

    cs.IT eess.SP

    AFDM Channel Estimation in Multi-Scale Multi-Lag Channels

    Authors: Rongyou Cao, Yuheng Zhong, Jiangbin Lyu, Deqing Wang, Liqun Fu

    Abstract: Affine Frequency Division Multiplexing (AFDM) is a brand new chirp-based multi-carrier (MC) waveform for high mobility communications, with promising advantages over Orthogonal Frequency Division Multiplexing (OFDM) and other MC waveforms. Existing AFDM research focuses on wireless communication at high carrier frequency (CF), which typically considers only Doppler frequency shift (DFS) as a resul…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 6 pages, 6 figures. Investigate AFDM under underwater multi-scale multi-lag channels. Derive the new input-output formula with the impact of Doppler time scaling. Propose two new channel estimation methods to tackle different levels of Doppler factors. Perform diversity analysis based on CFR overlap probability (COP) and mutual incoherent property (MIP)

  21. arXiv:2404.07989  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.SD eess.AS

    Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding

    Authors: Yiwen Tang, Ray Zhang, Jiaming Liu, Zoey Guo, Dong Wang, Zhigang Wang, Bin Zhao, Shanghang Zhang, Peng Gao, Hongsheng Li, Xuelong Li

    Abstract: Large foundation models have recently emerged as a prominent focus of interest, attaining superior performance in widespread scenarios. Due to the scarcity of 3D data, many efforts have been made to adapt pre-trained transformers from vision to 3D domains. However, such 2D-to-3D approaches are still limited, due to the potential loss of spatial geometries and high computation cost. More importantl…

    Submitted 30 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: Code and models are released at https://github.com/Ivan-Tang-3D/Any2Point

  22. arXiv:2404.06690  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

    Authors: Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng

    Abstract: Recent advancements in zero-shot text-to-speech (TTS) modeling have led to significant strides in generating high-fidelity and diverse speech. However, dialogue generation, along with achieving human-like naturalness in speech, continues to be a challenge. In this paper, we introduce CoVoMix: Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speaker, multi-rou…

    Submitted 29 May, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  23. Network-Assisted Full-Duplex Cell-Free mmWave Networks: Hybrid MIMO Processing and Multi-Agent DRL-Based Power Allocation

    Authors: Qingrui Fan, Yu Zhang, Jiamin Li, Dongming Wang, Hongbiao Zhang, Xiaohu You

    Abstract: This paper investigates the network-assisted full-duplex (NAFD) cell-free millimeter-wave (mmWave) networks, where the distribution of the transmitting access points (T-APs) and receiving access points (R-APs) across distinct geographical locations mitigates cross-link interference, facilitating the attainment of a truly flexible duplex mode. To curtail deployment expenses and power consumption fo…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: 14 pages, 9 figures, published on Physical Communication

    Journal ref: Physical Communication, volume 64, pages 102350, 2024

  24. arXiv:2403.06901  [pdf, other]

    eess.IV cs.AI cs.LG

    LIBR+: Improving Intraoperative Liver Registration by Learning the Residual of Biomechanics-Based Deformable Registration

    Authors: Dingrong Wang, Soheil Azadvar, Jon Heiselman, Xiajun Jiang, Michael Miga, Linwei Wang

    Abstract: The surgical environment imposes unique challenges to the intraoperative registration of organ shapes to their preoperatively-imaged geometry. Biomechanical model-based registration remains popular, while deep learning solutions remain limited due to the sparsity and variability of intraoperative measurements and the limited ground-truth deformation of an organ that can be obtained during the surg…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 12 pages, Medical Image Computing and Computer Assisted Intervention 2024

  25. arXiv:2403.06387  [pdf, other]

    cs.SD eess.AS

    Towards Decoupling Frontend Enhancement and Backend Recognition in Monaural Robust ASR

    Authors: Yufeng Yang, Ashutosh Pandey, DeLiang Wang

    Abstract: It has been shown that the intelligibility of noisy speech can be improved by speech enhancement (SE) algorithms. However, monaural SE has not been established as an effective frontend for automatic speech recognition (ASR) in noisy conditions compared to an ASR model trained on noisy speech directly. The divide between SE and ASR impedes the progress of robust ASR systems, especially as SE has ma…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing. arXiv admin note: text overlap with arXiv:2210.13318

  26. arXiv:2403.03411  [pdf, other]

    cs.SD eess.AS

    CrossNet: Leveraging Global, Cross-Band, Narrow-Band, and Positional Encoding for Single- and Multi-Channel Speaker Separation

    Authors: Vahid Ahmadi Kalkhorani, DeLiang Wang

    Abstract: We introduce CrossNet, a complex spectral mapping approach to speaker separation and enhancement in reverberant and noisy conditions. The proposed architecture comprises an encoder layer, a global multi-head self-attention module, a cross-band module, a narrow-band module, and an output layer. CrossNet captures global, cross-band, and narrow-band correlations in the time-frequency domain. To addre…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: 9 pages

  27. arXiv:2403.02028  [pdf, other]

    eess.SP

    Target Localization and Performance Trade-Offs in Cooperative ISAC Systems: A Scheme Based on 5G NR OFDM Signals

    Authors: Zhenkun Zhang, Hong Ren, Cunhua Pan, Sheng Hong, Dongming Wang, Jiangzhou Wang, Xiaohu You

    Abstract: The integration of sensing capabilities into communication systems, by sharing physical resources, has a significant potential for reducing spectrum, hardware, and energy costs while inspiring innovative applications. Cooperative networks, in particular, are expected to enhance sensing services by enlarging the coverage area and enriching sensing measurements, thus improving the service availabili…

    Submitted 4 March, 2024; originally announced March 2024.

  28. arXiv:2403.01153  [pdf, other]

    eess.SP

    Transfer Learning-Enhanced Instantaneous Multi-Person Indoor Localization by CSI

    Authors: Zhiyuan He, Ke Deng, Jiangchao Gong, Yi Zhou, Desheng Wang

    Abstract: Passive indoor localization, integral to smart buildings, emergency response, and indoor navigation, has traditionally been limited by a focus on single-target localization and reliance on multi-packet CSI. We introduce a novel Multi-target loss, notably enhancing multi-person localization. Utilizing this loss function, our instantaneous CSI-ResNet achieves an impressive 99.21% accuracy at 0.6m pr…

    Submitted 2 March, 2024; originally announced March 2024.

  29. arXiv:2402.19124  [pdf, other]

    eess.SP

    Analysis of Processing Pipelines for Indoor Human Tracking using FMCW radar

    Authors: Dingyang Wang, Francesco Fioranelli, Alexander Yarovoy

    Abstract: In this paper, the problem of formulating effective processing pipelines for indoor human tracking is investigated, with the usage of a Multiple Input Multiple Output (MIMO) Frequency Modulated Continuous Wave (FMCW) radar. Specifically, two processing pipelines starting with detections on the Range-Azimuth (RA) maps and the Range-Doppler (RD) maps are formulated and compared, together with subseq…

    Submitted 15 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: This paper has been accepted for presentation at IEEE RadarConf'24, Denver, USA

  30. arXiv:2402.18017  [pdf, other]

    eess.SP

    Hy-DAT: A Tool to Address Hydropower Modeling Gaps Using Interdependency, Efficiency Curves, and Unit Dispatch Models

    Authors: Dewei Wang, Bhaskar Mitra, Sameer Nekkalapu, Sohom Datta, Bibi Matthew, Rounak Meyur, Heng Wang, Slaven Kincic

    Abstract: As the power system continues to be flooded with intermittent resources, it becomes more important to accurately assess the role of hydro and its impact on the power grid. While hydropower generation has been studied for decades, dependency of power generation on water availability and constraints in hydro operation are not well represented in power system models used in the planning and operation…

    Submitted 5 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  31. arXiv:2402.09423  [pdf, other]

    eess.SP physics.data-an

    Online Mean Estimation for Multi-frame Optical Fiber Signals On Highways

    Authors: Linlin Wang, Mingxue Quan, Wei Wang, Dezhao Wang, Shanwen Wang

    Abstract: In the era of Big Data, prompt analysis and processing of data sets is critical. Meanwhile, statistical methods provide key tools and techniques to extract valuable insights and knowledge from complex data sets. This paper creatively applies statistical methods to the field of traffic, particularly focusing on the preprocessing of multi-frame signals obtained by optical fiber-based Distributed Aco…

    Submitted 22 May, 2024; v1 submitted 20 January, 2024; originally announced February 2024.

    Comments: 10 pages, 11 figures

  32. arXiv:2402.09422  [pdf, other]

    eess.SP

    Traffic Flow and Speed Monitoring Based On Optical Fiber Distributed Acoustic Sensor

    Authors: Linlin Wang, Shixin Wang, Peng Wang, Wei Wang, Dezhao Wang, Yongcai Wang, Shanwen Wang

    Abstract: In the realm of intelligent transportation systems, accurate and reliable traffic monitoring is crucial. Traditional devices, such as cameras and lidars, face limitations in adverse weather conditions and complex traffic scenarios, prompting the need for more resilient technologies. This paper presents a traffic flow monitoring method using optical fiber-based Distributed Acoustic Sensors (DAS). An…

    Submitted 20 January, 2024; originally announced February 2024.

    Comments: 10 pages, 23 figures, references added

  33. arXiv:2402.02730  [pdf, ps, other]

    cs.SD eess.AS

    How phonemes contribute to deep speaker models?

    Authors: Pengqi Li, Tianhao Wang, Lantian Li, Askar Hamdulla, Dong Wang

    Abstract: Which phonemes convey more speaker traits is a long-standing question, and various perception experiments were conducted with human subjects. For speaker recognition, studies were conducted with the conventional statistical models and the drawn conclusions are more or less consistent with the perception results. However, which phonemes are more important with modern deep neural models is still une…

    Submitted 5 February, 2024; originally announced February 2024.

  34. arXiv:2402.02699  [pdf, other]

    cs.SD cs.LG eess.AS

    Adversarial Data Augmentation for Robust Speaker Verification

    Authors: Zhenyu Zhou, Junhui Chen, Namin Wang, Lantian Li, Dong Wang

    Abstract: Data augmentation (DA) has gained widespread popularity in deep speaker models due to its ease of implementation and significant effectiveness. It enriches training data by simulating real-life acoustic variations, enabling deep neural networks to learn speaker-related representations while disregarding irrelevant acoustic variations, thereby improving robustness and generalization. However, a pot…

    Submitted 4 February, 2024; originally announced February 2024.

  35. arXiv:2401.17796  [pdf, other]

    cs.SD eess.AS

    Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction

    Authors: Xueyuan Chen, Yuejiao Wang, Xixin Wu, Disong Wang, Zhiyong Wu, Xunying Liu, Helen Meng

    Abstract: Dysarthric speech reconstruction (DSR) aims to transform dysarthric speech into normal speech by improving the intelligibility and naturalness. This is a challenging task especially for patients with severe dysarthria and speaking in complex, noisy acoustic environments. To address these challenges, we propose a novel multi-modal framework to utilize visual information, e.g., lip movements, in DSR…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  36. arXiv:2401.14664  [pdf, other]

    cs.SD cs.CL eess.AS

    UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization

    Authors: Yuejiao Wang, Xixin Wu, Disong Wang, Lingwei Meng, Helen Meng

    Abstract: Dysarthric speech reconstruction (DSR) systems aim to automatically convert dysarthric speech into normal-sounding speech. The technology eases communication with speakers affected by the neuromotor disorder and enhances their social inclusion. NED-based (Neural Encoder-Decoder) systems have significantly improved the intelligibility of the reconstructed speech as compared with GAN-based (Generati…

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024

  37. arXiv:2401.14269  [pdf, other]

    eess.AS cs.SD

    Combined Generative and Predictive Modeling for Speech Super-resolution

    Authors: Heming Wang, Eric W. Healy, DeLiang Wang

    Abstract: Speech super-resolution (SR) is the task that restores high-resolution speech from low-resolution input. Existing models employ simulated data and constrained experimental settings, which limit generalization to real-world SR. Predictive models are known to perform well in fixed experimental settings, but can introduce artifacts in adverse conditions. On the other hand, generative models learn the…

    Submitted 25 January, 2024; originally announced January 2024.

  38. arXiv:2401.00444  [pdf, other]

    eess.SP

    RIS-Enabled Integrated Sensing and Communication for 6G Systems

    Authors: Dexin Wang, Ahmad Bazzi, Marwa Chafii

    Abstract: The following paper proposes a new target localization system design using an architecture based on reconfigurable intelligent surfaces (RISs) and passive radars (PRs) for integrated sensing and communications systems. The preamble of the communication signal is exploited in order to perform target sensing tasks, which involve detection and localization. The RIS in this case can aid the PR in sens…

    Submitted 31 December, 2023; originally announced January 2024.

    Journal ref: IEEE Wireless Communications and Networking Conference, 2024

  39. Implementing Digital Twin in Field-Deployed Optical Networks: Uncertain Factors, Operational Guidance, and Field-Trial Demonstration

    Authors: Yuchen Song, Min Zhang, Yao Zhang, Yan Shi, Shikui Shen, Bingli Guo, Shanguo Huang, Danshi Wang

    Abstract: Digital twin has revolutionized optical communication networks by enabling their full life-cycle management, including design, troubleshooting, optimization, upgrade, and prediction. While extensive literature exists on frameworks, standards, and applications of digital twin, there is a pressing need in implementing digital twin in field-deployed optical networks operating in real-world environmen…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 10 pages, 5 figures. Accepted by IEEE Network Magazine, early access

  40. arXiv:2312.03129  [pdf, other]

    eess.AS cs.SD

    Leveraging Laryngograph Data for Robust Voicing Detection in Speech

    Authors: Yixuan Zhang, Heming Wang, DeLiang Wang

    Abstract: Accurately detecting voiced intervals in speech signals is a critical step in pitch tracking and has numerous applications. While conventional signal processing methods and deep learning algorithms have been proposed for this task, their need to fine-tune threshold parameters for different datasets and limited generalization restrict their utility in real-world applications. To address these chall…

    Submitted 5 December, 2023; originally announced December 2023.

  41. arXiv:2311.17624  [pdf, other]

    eess.SP cs.NI

    Combating Multi-path Interference to Improve Chirp-based Underwater Acoustic Communication

    Authors: Wenjun Xie, Enqi Zhang, Lizhao You, Deqing Wang, Zhaorui Wang, Liqun Fu

    Abstract: Linear chirp-based underwater acoustic communication has been widely used due to its reliability and long-range transmission capability. However, unlike the counterpart chirp technology in wireless -- LoRa, its throughput is severely limited by the number of modulated chirps in a symbol. The fundamental challenge lies in the underwater multi-path channel, where the delayed copies of one symbol may…

    Submitted 29 November, 2023; originally announced November 2023.

  42. arXiv:2311.16810  [pdf, other]

    eess.SP cs.NI eess.SY

    A Short Overview of 6G V2X Communication Standards

    Authors: Donglin Wang, Yann Nana Nganso, Hans D. Schotten

    Abstract: We are on the verge of a new age of linked autonomous cars with unheard-of user experiences, dramatically improved air quality and road safety, extremely varied transportation settings, and a plethora of cutting-edge apps. A substantially improved Vehicle-to-Everything (V2X) communication network that can simultaneously support massive hyper-fast, ultra-reliable, and low-latency information exchan…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: 7 pages, 2 figures, IEEE ICN 2023

  43. arXiv:2311.13787  [pdf, other]

    eess.SP

    A Fast Power Spectrum Sensing Solution for Generalized Coprime Sampling

    Authors: Kaili Jiang, Dechang Wang, Kailun Tian, Hancong Feng, Yuxin Zhao, Junyu Yuan, Bin Tang

    Abstract: With the growing scarcity of spectrum resources, wideband spectrum sensing is required to process a prohibitive volume of data at a high sampling rate. For some applications, spectrum estimation only requires second-order statistics. In this case, a fast power spectrum sensing solution is proposed based on the generalized coprime sampling. By exploring the sensing vector inherent structure, the autocor…

    Submitted 22 November, 2023; originally announced November 2023.

  44. arXiv:2311.13626  [pdf, other]

    eess.IV cs.AI cs.IR physics.optics

    Physics-driven generative adversarial networks empower single-pixel infrared hyperspectral imaging

    Authors: Dong-Yin Wang, Shu-Hang Bie, Xi-Hao Chen, Wen-Kai Yu

    Abstract: A physics-driven generative adversarial network (GAN) was established here for single-pixel hyperspectral imaging (HSI) in the infrared spectrum, to eliminate the extensive data training work required by traditional data-driven models. Within the GAN framework, the physical process of single-pixel imaging (SPI) was integrated into the generator, and the actual and estimated one-dimensional (1D) buc…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: 14 pages, 8 figures

  45. arXiv:2311.08630  [pdf, other]

    eess.AS cs.SD

    Multi-channel Conversational Speaker Separation via Neural Diarization

    Authors: Hassan Taherian, DeLiang Wang

    Abstract: When dealing with overlapped speech, the performance of automatic speech recognition (ASR) systems substantially degrades as they are designed for single-talker speech. To enhance ASR performance in conversational or meeting environments, continuous speaker separation (CSS) is commonly employed. However, CSS requires a short separation window to avoid many speakers inside the window and sequential…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: 10 pages, 4 figures

  46. arXiv:2311.07758  [pdf, other]

    eess.SY

    Synchrophasor Data Anomaly Detection on Grid Edge by 5G Communication and Adjacent Compute

    Authors: Chuan Qin, Dexin Wang, Kishan Prudhvi Guddanti, Xiaoyuan Fan, Zhangshuan Hou

    Abstract: The fifth-generation mobile communication (5G) technology offers opportunities to enhance the real-time monitoring of grids. The 5G-enabled phasor measurement units (PMUs) feature flexible positioning and cost-effective long-term maintenance without the constraints of fixing wires. This paper is the first to demonstrate the applicability of 5G in PMU communication, and the experiment was carried o…

    Submitted 13 November, 2023; originally announced November 2023.

    Comments: 5 pages, 4 figures

  47. arXiv:2311.06003  [pdf, ps, other]

    eess.SP

    Passive Integrated Sensing and Communication Scheme based on RF Fingerprint Information Extraction for Cell-Free RAN

    Authors: Jingxuan Yu, Fan Zeng, Jiamin Li, Feiyang Liu, Pengcheng Zhu, Dongming Wang, Xiaohu You

    Abstract: This paper investigates how to achieve integrated sensing and communication (ISAC) based on a cell-free radio access network (CF-RAN) architecture with a minimum footprint of communication resources. We propose a new passive sensing scheme. The scheme is based on the radio frequency (RF) fingerprint learning of the RF radio unit (RRU) to build an RF fingerprint library of RRUs. The source RRU is i…

    Submitted 10 November, 2023; originally announced November 2023.

    Comments: 11 pages, 6 figures, submitted on 28-Feb-2023, China Communication, Accepted on 14-Sep-2023

  48. arXiv:2311.05101  [pdf, other]

    cs.IT eess.SP

    Integrated Sensing and Communication for Network-Assisted Full-Duplex Cell-Free Distributed Massive MIMO Systems

    Authors: Fan Zeng, Jingxuan Yu, Jiamin Li, Feiyang Liu, Dongming Wang, Xiaohu You

    Abstract: In this paper, we combine the network-assisted full-duplex (NAFD) technology and distributed radar sensing to implement integrated sensing and communication (ISAC). The ISAC system features both uplink and downlink remote radio units (RRUs) equipped with communication and sensing capabilities. We evaluate the communication and sensing performance of the system using the sum communication rates and…

    Submitted 13 November, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: 14 pages, 7 figures, submitted to China Communication February 28, 2023, date of major revision July 09, 2023

  49. arXiv:2310.12405  [pdf, other]

    eess.IV cs.CV

    LoMAE: Low-level Vision Masked Autoencoders for Low-dose CT Denoising

    Authors: Dayang Wang, Yongshun Xu, Shuo Han, Zhan Wu, Li Zhou, Bahareh Morovati, Hengyong Yu

    Abstract: Low-dose computed tomography (LDCT) offers reduced X-ray radiation exposure but at the cost of compromised image quality, characterized by increased noise and artifacts. Recently, transformer models emerged as a promising avenue to enhance LDCT image quality. However, the success of such models relies on a large amount of paired noisy and clean images, which are often scarce in clinical settings.…

    Submitted 18 October, 2023; originally announced October 2023.

  50. arXiv:2310.05352  [pdf, other]

    cs.CL cs.SD eess.AS

    A Glance is Enough: Extract Target Sentence By Looking at A keyword

    Authors: Ying Shi, Dong Wang, Lantian Li, Jiqing Han

    Abstract: This paper investigates the possibility of extracting a target sentence from multi-talker speech using only a keyword as input. For example, in social security applications, the keyword might be "help", and the goal is to identify what the person who called for help is articulating while ignoring other speakers. To address this problem, we propose using the Transformer architecture to embed both t…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: submitted to ICASSP 2024