
Showing 1–50 of 127 results for author: Dai, L

Searching in archive eess.
  1. arXiv:2406.15802  [pdf, other]

    cs.IT eess.SP

    Coded Beam Training for RIS Assisted Wireless Communications

    Authors: Yuhao Chen, Linglong Dai

    Abstract: Reconfigurable intelligent surface (RIS) is considered as one of the key technologies for future 6G communications. To fully unleash the performance of RIS, accurate channel state information (CSI) is crucial. Beam training is widely utilized to acquire the CSI. However, before aligning the beam correctly to establish stable connections, the signal-to-noise ratio (SNR) at UE is inevitably low, whi…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: In this paper, we exploit the coded beam training framework in RIS systems. By applying the idea of channel coding in the beam training process, we can leverage the error correction capability of channel coding to enhance the reliability of beam training under low SNR. Simulation codes will be provided at: http://oa.ee.tsinghua.edu.cn/dailinglong/publications/publications.html

  2. arXiv:2406.12270  [pdf, other]

    cs.IT eess.SP

    Sparse MIMO for ISAC: New Opportunities and Challenges

    Authors: Xinrui Li, Hongqi Min, Yong Zeng, Shi Jin, Linglong Dai, Yifei Yuan, Rui Zhang

    Abstract: Multiple-input multiple-output (MIMO) has been a key technology of wireless communications for decades. A typical MIMO system employs antenna arrays with the inter-antenna spacing being half of the signal wavelength, which we term as compact MIMO. Looking forward towards the future sixth-generation (6G) mobile communication networks, MIMO system will achieve even finer spatial resolution to not on…

    Submitted 18 June, 2024; originally announced June 2024.

  3. arXiv:2406.07989  [pdf, other]

    cs.IT eess.SP

    Near-Field Wideband Beam Training Based on Distance-Dependent Beam Split

    Authors: Tianyue Zheng, Mingyao Cui, Zidong Wu, Linglong Dai

    Abstract: Near-field beam training is essential for acquiring channel state information in 6G extremely large-scale multiple input multiple output (XL-MIMO) systems. To achieve low-overhead beam training, existing method has been proposed to leverage the near-field beam split effect, which deploys true-time-delay arrays to simultaneously search multiple angles of the entire angular range in a distance ring…

    Submitted 12 June, 2024; originally announced June 2024.

  4. arXiv:2406.05325  [pdf, other]

    eess.AS cs.SD

    LDM-SVC: Latent Diffusion Model Based Zero-Shot Any-to-Any Singing Voice Conversion with Singer Guidance

    Authors: Shihao Chen, Yu Gu, Jie Zhang, Na Li, Rilin Chen, Liping Chen, Lirong Dai

    Abstract: Any-to-any singing voice conversion (SVC) is an interesting audio editing technique, aiming to convert the singing voice of one singer into that of another, given only a few seconds of singing data. However, during the conversion process, the issue of timbre leakage is inevitable: the converted singing voice still sounds like the original singer's voice. To tackle this, we propose a latent diffusi…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  5. arXiv:2406.04881  [pdf, other]

    cs.IT eess.SP

    MIMO Capacity Analysis and Channel Estimation for Electromagnetic Information Theory

    Authors: Jieao Zhu, Vincent Y. F. Tan, Linglong Dai

    Abstract: Electromagnetic information theory (EIT) is an interdisciplinary subject that serves to integrate deterministic electromagnetic theory with stochastic Shannon's information theory. Existing EIT analysis operates in the continuous space domain, which is not aligned with the practical algorithms working in the discrete space domain. This mismatch leads to a significant difficulty in application of E…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Submitted to the IEEE TWC. In this paper, we established the discrete-continuous correspondence for electromagnetic information theory (EIT), thus enabling analytical tools in the continuous space domain to be applied to discrete space MIMO architectures. Simulation codes will be provided at http://oa.ee.tsinghua.edu.cn/dailinglong/publications/publications.html

  6. arXiv:2405.11352  [pdf, other]

    cs.NI eess.SP

    Hierarchical Reinforcement Learning Empowered Task Offloading in V2I Networks

    Authors: Xinyu You, Haojie Yan, Yuedong Xu, Lifeng Wang, Liangui Dai

    Abstract: Edge computing plays an essential role in the vehicle-to-infrastructure (V2I) networks, where vehicles offload their intensive computation tasks to the road-side units to save energy and reduce latency. This paper designs the optimal task offloading policy to address the concerns involving processing delay, energy consumption and edge computing cost. Each computation task consisting of some…

    Submitted 18 May, 2024; originally announced May 2024.

  7. arXiv:2405.10496  [pdf, other]

    cs.IT eess.SP

    Electromagnetic Information Theory for Holographic MIMO Communications

    Authors: Li Wei, Tierui Gong, Chongwen Huang, Zhaoyang Zhang, Wei E. I. Sha, Zhi Ning Chen, Linglong Dai, Merouane Debbah, Chau Yuen

    Abstract: Holographic multiple-input multiple-output (HMIMO) utilizes a compact antenna array to form a nearly continuous aperture, thereby enhancing higher capacity and more flexible configurations compared with conventional MIMO systems, making it attractive in current scientific research. Key questions naturally arise regarding the potential of HMIMO to surpass Shannon's theoretical limits and how far it…

    Submitted 25 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

  8. arXiv:2404.06806  [pdf, other]

    cs.IT eess.SP

    Near-Optimal Channel Estimation for Dense Array Systems

    Authors: Mingyao Cui, Zijian Zhang, Linglong Dai, Kaibin Huang

    Abstract: By deploying a large number of antennas with sub-half-wavelength spacing in a compact space, dense array systems (DASs) can fully unleash the multiplexing-and-diversity gains of limited apertures. To acquire these gains, accurate channel state information acquisition is necessary but challenging due to the large antenna numbers. To overcome this obstacle, this paper reveals that exploiting the high…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 19 pages, 10 figures

  9. arXiv:2403.17770  [pdf, other]

    eess.IV cs.CV

    CT Synthesis with Conditional Diffusion Models for Abdominal Lymph Node Segmentation

    Authors: Yongrui Yu, Hanyu Chen, Zitian Zhang, Qiong Xiao, Wenhui Lei, Linrui Dai, Yu Fu, Hui Tan, Guan Wang, Peng Gao, Xiaofan Zhang

    Abstract: Despite the significant success achieved by deep learning methods in medical image segmentation, researchers still struggle in the computer-aided diagnosis of abdominal lymph nodes due to the complex abdominal environment, small and indistinguishable lesions, and limited annotated data. To address these problems, we present a pipeline that integrates the conditional diffusion model for lymph node…

    Submitted 26 March, 2024; originally announced March 2024.

  10. arXiv:2403.16062  [pdf]

    eess.SP

    Holography inspired self-controlled reconfigurable intelligent surface

    Authors: Jieao Zhu, Ze Gu, Qian Ma, Linglong Dai, Tie Jun Cui

    Abstract: Among various promising candidate technologies for the sixth-generation (6G) wireless communications, recent advances in microwave metasurfaces have sparked a new research area of reconfigurable intelligent surfaces (RISs). By controllably reprogramming the wireless propagation channel, RISs are envisioned to achieve low-cost wireless capacity boosting, coverage extension, and enhanced energy effi…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: Traditional BS-controlled RISs suffer from complicated control cables. To "cut" the control cables, we propose a self-controlled RIS by leveraging the holographic interference principle, thus realizing autonomous RIS beamforming

  11. arXiv:2403.12268  [pdf, other]

    cs.IT eess.SP

    Near-Field Channel Modeling for Electromagnetic Information Theory

    Authors: Zhongzhichao Wan, Jieao Zhu, Linglong Dai

    Abstract: Electromagnetic information theory (EIT) is one of the emerging topics for 6G communication due to its potential to reveal the performance limit of wireless communication systems. For EIT, the research foundation is reasonable and accurate channel modeling. Existing channel modeling works for EIT in non-line-of-sight (NLoS) scenario focus on far-field modeling, which cannot accurately capture the…

    Submitted 26 May, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: In this paper, we propose the near-field channel model for EIT based on electromagnetic scattering theory. Then, we derive the analytical expression of the correlation function of the fields and analyze its characteristics. Finally, we design a channel estimation scheme for the near-field scenario.

  12. arXiv:2403.07247  [pdf, other]

    eess.IV cs.CV cs.LG

    GuideGen: A Text-guided Framework for Joint CT Volume and Anatomical structure Generation

    Authors: Linrui Dai, Rongzhao Zhang, Zhongzhen Huang, Xiaofan Zhang

    Abstract: The annotation burden and extensive labor for gathering a large medical dataset with images and corresponding labels are rarely cost-effective and highly intimidating. This results in a lack of abundant training data that undermines downstream tasks and partially contributes to the challenge image analysis faces in the medical field. As a workaround, given the recent success of generative neural m…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: Submitted to MICCAI 2024

  13. arXiv:2403.05970  [pdf, other]

    cs.IT eess.SP

    Electromagnetic Hybrid Beamforming for Holographic Communications

    Authors: Ran Ji, Chongwen Huang, Xiaoming Chen, Wei E. I. Sha, Linglong Dai, Jiguang He, Zhaoyang Zhang, Chau Yuen, Mérouane Debbah

    Abstract: It is well known that there is inherent radiation pattern distortion for the commercial base station antenna array, which usually needs three antenna sectors to cover the whole space. To eliminate pattern distortion and further enhance beamforming performance, we propose an electromagnetic hybrid beamforming (EHB) scheme based on a three-dimensional (3D) superdirective holographic antenna array. S…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: 13 pages

  14. arXiv:2402.02688  [pdf, ps, other]

    cs.IT eess.SP eess.SY

    Successive Bayesian Reconstructor for FAS Channel Estimation

    Authors: Zijian Zhang, Jieao Zhu, Linglong Dai, Robert W. Heath Jr

    Abstract: Fluid antenna systems (FASs) can reconfigure their locations freely within a spatially continuous space. To keep favorable antenna positions, the channel state information (CSI) acquisition for FASs is essential. While some techniques have been proposed, most existing FAS channel estimators require several channel assumptions, such as slow variation and angular-domain sparsity. When these assumpti…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: Accepted by IEEE WCNC 2024. This paper proposes S-BAR as a general solution to estimate FAS channels. More insights can be found in the journal version of this paper: arXiv:2312.06551. arXiv admin note: substantial text overlap with arXiv:2312.06551

  15. Adversarial speech for voice privacy protection from Personalized Speech generation

    Authors: Shihao Chen, Liping Chen, Jie Zhang, KongAik Lee, Zhenhua Ling, Lirong Dai

    Abstract: The rapid progress in personalized speech generation technology, including personalized text-to-speech (TTS) and voice conversion (VC), poses a challenge in distinguishing between generated and real speech for human listeners, resulting in an urgent demand in protecting speakers' voices from malicious misuse. In this regard, we propose a speaker protection method based on adversarial attacks. The…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  16. arXiv:2401.03468  [pdf, other]

    eess.AS cs.SD

    Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation

    Authors: Qiushi Zhu, Jie Zhang, Yu Gu, Yuchen Hu, Lirong Dai

    Abstract: Self-supervised speech pre-training methods have developed rapidly in recent years, which show to be very effective for many near-field single-channel speech tasks. However, far-field multichannel speech processing is suffering from the scarcity of labeled multichannel data and complex ambient noises. The efficacy of self-supervised learning for far-field multichannel and multi-modal speech proces…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: Accepted by AAAI 2024

  17. arXiv:2401.01673  [pdf, other]

    cs.IT eess.SP

    Coded Beam Training

    Authors: Tianyue Zheng, Jieao Zhu, Qiumo Yu, Yongli Yan, Linglong Dai

    Abstract: In extremely large-scale multiple input multiple output (XL-MIMO) systems for future sixth-generation (6G) communications, codebook-based beam training stands out as a promising technology to acquire channel state information (CSI). Despite their effectiveness, when the pilot overhead is limited, existing beam training methods suffer from significant achievable rate degradation for remote users wi…

    Submitted 6 March, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: In this paper, we introduce channel coding theory into hierarchical beam training and propose a beam training scheme called coded beam training. By leveraging the error-correcting capability of channel codes, the proposed coded beam training method can enable reliable beam training performance for remote users with low SNR, while keeping training overhead low.

  18. arXiv:2312.06551  [pdf, ps, other]

    cs.IT eess.SP eess.SY

    Successive Bayesian Reconstructor for Channel Estimation in Fluid Antenna Systems

    Authors: Zijian Zhang, Jieao Zhu, Linglong Dai, Robert W. Heath Jr

    Abstract: Fluid antenna systems (FASs) can reconfigure their antenna locations freely within a spatially continuous space. To keep favorable antenna positions, the channel state information (CSI) acquisition for FASs is essential. While some techniques have been proposed, most existing FAS channel estimators require several channel assumptions, such as slow variation and angular-domain sparsity. When these…

    Submitted 17 January, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: 13 pages, 8 figures. This paper proposes S-BAR as a general solution to estimate FAS channels. Unlike model-based estimators, the proposed S-BAR is prior-aided, which builds the experiential kernel for CSI acquisition. Simulation codes will be provided at: http://oa.ee.tsinghua.edu.cn/dailinglong/publications/publications.html

  19. arXiv:2311.08024  [pdf, other]

    eess.IV cs.CV cs.LG

    MD-IQA: Learning Multi-scale Distributed Image Quality Assessment with Semi Supervised Learning for Low Dose CT

    Authors: Tao Song, Ruizhi Hou, Lisong Dai, Lei Xiang

    Abstract: Image quality assessment (IQA) plays a critical role in optimizing radiation dose and developing novel medical imaging techniques in computed tomography (CT). Traditional IQA methods relying on hand-crafted features have limitations in summarizing the subjective perceptual experience of image quality. Recent deep learning-based approaches have demonstrated strong modeling capabilities and potentia…

    Submitted 14 November, 2023; originally announced November 2023.

  20. arXiv:2310.15901  [pdf, other]

    eess.SP

    Enhancing Energy Efficiency for Reconfigurable Intelligent Surfaces with Practical Power Models

    Authors: Zhiyi Li, Jida Zhang, Jieao Zhu, Shi Jin, Linglong Dai

    Abstract: Reconfigurable intelligent surfaces (RISs) are widely considered a promising technology for future wireless communication systems. As an important indicator of RIS-assisted communication systems in green wireless communications, energy efficiency (EE) has recently received intensive research interest as an optimization target. However, most previous works have ignored the different power consumpti…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Reconfigurable intelligent surface is a promising 6G technology. However, RIS power models are inaccurate. In this paper, we construct a practical power model for RIS communication systems with an SDP-relaxation algorithm, achieving optimal energy efficiency

  21. arXiv:2310.12446  [pdf, other]

    cs.IT eess.SP

    Can Electromagnetic Information Theory Improve Wireless Systems? A Channel Estimation Example

    Authors: Jieao Zhu, Zhongzhichao Wan, Linglong Dai, Tie Jun Cui

    Abstract: Electromagnetic information theory (EIT) is an emerging interdisciplinary subject that integrates classical Maxwell electromagnetics and Shannon information theory. The goal of EIT is to uncover the information transmission mechanisms from an electromagnetic (EM) perspective in wireless systems. Existing works on EIT are mainly focused on the analysis of EM channel characteristics, degrees-of-free…

    Submitted 6 February, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: Electromagnetic information theory (EIT) is an emerging interdisciplinary subject, aiming at providing a unified analytical framework for wireless systems as well as guiding practical system design. This paper answers the question: "Can we improve wireless communication systems via EIT?"

  22. arXiv:2310.00687  [pdf, ps, other]

    eess.SP

    DISCO Might Not Be Funky: Random Intelligent Reflective Surface Configurations That Attack

    Authors: Huan Huang, Lipeng Dai, Hongliang Zhang, Chongfu Zhang, Zhongxing Tian, Yi Cai, A. Lee Swindlehurst, Zhu Han

    Abstract: Emerging intelligent reflective surfaces (IRSs) significantly improve system performance, but also pose a significant risk for physical layer security (PLS). Unlike the extensive research on legitimate IRS-enhanced communications, in this article we present an adversarial IRS-based fully-passive jammer (FPJ). We describe typical application scenarios for Disco IRS (DIRS)-based FPJ, where an illegi…

    Submitted 10 June, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

    Comments: This paper has been accepted by IEEE Wireless Communications. The code for the DISCO RIS is available on GitHub (https://github.com/huanhuan1799/Disco-Intelligent-Reflecting-Surfaces-Active-Channel-Aging-for-Fully-Passive-Jamming-Attacks)

  23. arXiv:2309.09242  [pdf, ps, other]

    cs.IT eess.SP

    Toward Beamfocusing-Aided Near-Field Communications: Research Advances, Potential, and Challenges

    Authors: Jiancheng An, Chau Yuen, Linglong Dai, Marco Di Renzo, Merouane Debbah, Lajos Hanzo

    Abstract: Next-generation mobile networks promise to support high throughput, massive connectivity, and improved energy efficiency. To achieve these ambitious goals, extremely large-scale antenna arrays (ELAAs) and terahertz communications constitute a pair of promising technologies. This will result in future wireless communications occurring in the near-field regions. To accurately portray the channel cha…

    Submitted 17 September, 2023; originally announced September 2023.

    Comments: 8 pages, 5 figures, 1 table

  24. arXiv:2308.15716  [pdf, ps, other]

    eess.SP

    Anti-Jamming Precoding Against Disco Intelligent Reflecting Surfaces Based Fully-Passive Jamming Attacks

    Authors: Huan Huang, Lipeng Dai, Hongliang Zhang, Zhongxing Tian, Yi Cai, Chongfu Zhang, A. Lee Swindlehurst, Zhu Han

    Abstract: Emerging intelligent reflecting surfaces (IRSs) significantly improve system performance, but also pose a huge risk for physical layer security. Existing works have illustrated that a disco IRS (DIRS), i.e., an illegitimate IRS with random time-varying reflection properties (like a "disco ball"), can be employed by an attacker to actively age the channels of legitimate users (LUs). Such active cha…

    Submitted 24 January, 2024; v1 submitted 29 August, 2023; originally announced August 2023.

    Comments: This paper has been submitted for possible publication

  25. arXiv:2308.14553  [pdf, other]

    eess.AS cs.SD

    Rep2wav: Noise Robust text-to-speech Using self-supervised representations

    Authors: Qiushi Zhu, Yu Gu, Rilin Chen, Chao Weng, Yuchen Hu, Lirong Dai, Jie Zhang

    Abstract: Benefiting from the development of deep learning, text-to-speech (TTS) techniques using clean speech have achieved significant performance improvements. The data collected from real scenes often contains noise and generally needs to be denoised by speech enhancement models. Noise-robust TTS models are often trained using the enhanced speech, which thus suffer from speech distortion and background…

    Submitted 3 September, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: 5 pages, 2 figures

  26. arXiv:2307.16518  [pdf, other]

    cs.IT eess.SP

    Continuous-Time Channel Prediction Based on Tensor Neural Ordinary Differential Equation

    Authors: Mingyao Cui, Hao Jiang, Yuhao Chen, Yang Du, Linglong Dai

    Abstract: Channel prediction is critical to address the channel aging issue in mobile scenarios. Existing channel prediction techniques are mainly designed for discrete channel prediction, which can only predict the future channel in a fixed time slot per frame, while the other intra-frame channels are usually recovered by interpolation. However, these approaches suffer from a serious interpolation loss, es…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: A tensor neural ODE based method is proposed to predict continuous-time wireless channels

  27. arXiv:2307.12307  [pdf, other]

    cs.IT eess.SP

    Robust Weighted Sum-Rate Maximization for Transmissive RIS Transmitter Enabled RSMA Networks

    Authors: Bojiang Li, Wen Chen, Zhendong Li, Qingqing Wu, Nan Cheng, Changle Li, Linglong Dai

    Abstract: Due to the low power consumption and low cost nature of transmissive reconfigurable intelligent surface (RIS), in this paper, we propose a downlink multi-user rate-splitting multiple access (RSMA) architecture based on the transmissive RIS transmitter, where the channel state information (CSI) is only acquired partially. We investigate the weighted sum-rate maximization problem by jointly optimizi…

    Submitted 23 July, 2023; originally announced July 2023.

  28. arXiv:2306.16206  [pdf, other]

    eess.SP cs.IT

    Near-Field Beam Management for Extremely Large-Scale Array Communications

    Authors: Changsheng You, Yunpu Zhang, Chenyu Wu, Yong Zeng, Beixiong Zheng, Li Chen, Linglong Dai, A. Lee Swindlehurst

    Abstract: Extremely large-scale arrays (XL-arrays) have emerged as a promising technology to achieve super-high spectral efficiency and spatial resolution in future wireless systems. The large aperture of XL-arrays means that spherical rather than planar wavefronts must be considered, and a paradigm shift from far-field to near-field communications is necessary. Unlike existing works that have mainly consid…

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: We studied the new near-field beam management for XL-arrays. This paper has been submitted to IEEE for possible publication

  29. arXiv:2306.02759  [pdf, other]

    eess.SP

    On the Role of ViT and CNN in Semantic Communications: Analysis and Prototype Validation

    Authors: Hanju Yoo, Linglong Dai, Songkuk Kim, Chan-Byoung Chae

    Abstract: Semantic communications have shown promising advancements by optimizing source and channel coding jointly. However, the dynamics of these systems remain understudied, limiting research and performance gains. Inspired by the robustness of Vision Transformers (ViTs) in handling image nuisances, we propose a ViT-based model for semantic communications. Our approach achieves a peak signal-to-noise rat…

    Submitted 5 June, 2023; originally announced June 2023.

  30. arXiv:2305.12459  [pdf, other]

    eess.AS cs.SD

    CASA-ASR: Context-Aware Speaker-Attributed ASR

    Authors: Mohan Shi, Zhihao Du, Qian Chen, Fan Yu, Yangze Li, Shiliang Zhang, Jie Zhang, Li-Rong Dai

    Abstract: Recently, speaker-attributed automatic speech recognition (SA-ASR) has attracted wide attention, which aims at answering the question "who spoke what". Different from modular systems, end-to-end (E2E) SA-ASR minimizes the speaker-dependent recognition errors directly and shows a promising applicability. In this paper, we propose a context-aware SA-ASR (CASA-ASR) model by enhancing the contextu…

    Submitted 21 May, 2023; originally announced May 2023.

    Comments: Accepted by Interspeech 2023

  31. arXiv:2305.12450  [pdf, other]

    eess.AS cs.SD

    Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction

    Authors: Mohan Shi, Yuchun Shu, Lingyun Zuo, Qian Chen, Shiliang Zhang, Jie Zhang, Li-Rong Dai

    Abstract: For speech interaction, voice activity detection (VAD) is often used as a front-end. However, traditional VAD algorithms usually need to wait for a continuous tail silence to reach a preset maximum duration before segmentation, resulting in a large latency that affects user experience. In this paper, we propose a novel semantic VAD for low-latency segmentation. Different from existing methods, a f…

    Submitted 21 May, 2023; originally announced May 2023.

    Comments: Accepted by Interspeech 2023

  32. arXiv:2305.12111  [pdf, other]

    eess.AS cs.SD

    Joint Generative-Contrastive Representation Learning for Anomalous Sound Detection

    Authors: Xiao-Min Zeng, Yan Song, Zhu Zhuo, Yu Zhou, Yu-Hong Li, Hui Xue, Li-Rong Dai, Ian McLoughlin

    Abstract: In this paper, we propose a joint generative and contrastive representation learning method (GeCo) for anomalous sound detection (ASD). GeCo exploits a Predictive AutoEncoder (PAE) equipped with self-attention as a generative model to perform frame-level prediction. The output of the PAE together with original normal samples, are used for supervised contrastive representative learning in a multi-t…

    Submitted 20 May, 2023; originally announced May 2023.

    Comments: Accepted by ICASSP 2023

  33. Reconfigurable Intelligent Surfaces for 6G: Nine Fundamental Issues and One Critical Problem

    Authors: Zijian Zhang, Linglong Dai

    Abstract: Thanks to the recent advances in metamaterials, reconfigurable intelligent surface (RIS) has emerged as a promising technology for future 6G wireless communications. Benefiting from its high array gain, low cost, and low power consumption, RISs are expected to greatly enlarge signal coverage, improve system capacity, and increase energy efficiency. In this article, we systematically overview the e…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: To appear in TST as an invited paper. This paper discusses nine fundamental issues and one critical problem of RISs. Highly related works can be found at arXiv:2103.15154

  34. arXiv:2305.02875  [pdf, other]

    eess.SP

    The Manifestation of Spatial Wideband Effect in Circular Array: From Beam Split to Beam Defocus

    Authors: Zidong Wu, Linglong Dai

    Abstract: Millimeter-wave (mmWave) and terahertz (THz) communications with hybrid precoding architectures have been regarded as energy-efficient solutions to fulfill the vision of high-speed transmissions for 6G communications. Benefiting from the advantages of providing a wide scan range and flat array gain, the uniform circular array (UCA) has attracted much attention. However, the growing bandwidth of mm…

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: In this paper, the mechanism of the beam defocus effect in circular array systems is investigated for the first time. The delay-phase precoding architecture is employed to mitigate the beam defocus effect. Simulation codes will be provided to reproduce the results: http://oa.ee.tsinghua.edu.cn/dailinglong/publications/publications.html

  35. arXiv:2305.01980  [pdf, other]

    cs.SD eess.AS

    Diverse and Vivid Sound Generation from Text Descriptions

    Authors: Guangwei Li, Xuenan Xu, Lingfeng Dai, Mengyue Wu, Kai Yu

    Abstract: Previous audio generation mainly focuses on specified sound classes such as speech or music, whose form and content are greatly restricted. In this paper, we go beyond specific audio generation by using natural language description as a clue to generate broad sounds. Unlike visual information, a text description is concise by its nature but has rich hidden meanings beneath, which poses a higher po…

    Submitted 3 May, 2023; originally announced May 2023.

  36. arXiv:2303.03689  [pdf, other]

    eess.AS cs.SD

    AST-SED: An Effective Sound Event Detection Method Based on Audio Spectrogram Transformer

    Authors: Kang Li, Yan Song, Li-Rong Dai, Ian McLoughlin, Xin Fang, Lin Liu

    Abstract: In this paper, we propose an effective sound event detection (SED) method based on the audio spectrogram transformer (AST) model, pretrained on the large-scale AudioSet for audio tagging (AT) task, termed AST-SED. Pretrained AST models have recently shown promise on DCASE2022 challenge task4 where they help mitigate a lack of sufficient real annotated data. However, mainly due to differences betwe…

    Submitted 7 March, 2023; originally announced March 2023.

    Comments: Accepted to ICASSP 2023

  37. arXiv:2301.09082  [pdf, other]

    eess.SP cs.IT

    Location Division Multiple Access for Near-Field Communications

    Authors: Zidong Wu, Linglong Dai

    Abstract: Spatial division multiple access (SDMA) is essential to improve the spectrum efficiency for multi-user multiple-input multiple-output (MIMO) communications. The classical SDMA for massive MIMO with hybrid precoding heavily relies on the angular orthogonality in the far field to distinguish multiple users at different angles, which fails to fully exploit spatial resources in the distance domain. Wi…

    Submitted 22 January, 2023; originally announced January 2023.

    Comments: Accepted by IEEE ICC 2023. This paper investigates the concept of location division multiple access (LDMA) to exploit extra spatial resources in distance domain for multiple access, exploring a new possibility to enhance spectrum efficiency. The journal version is: arXiv:2208.06349. Simulation codes are provided at: http://oa.ee.tsinghua.edu.cn/dailinglong/publications/publications.html

  38. arXiv:2301.03035  [pdf, ps, other]

    cs.IT eess.SP

    Cross Far- and Near-field Wireless Communications in Terahertz Ultra-large Antenna Array Systems

    Authors: Chong Han, Yuhang Chen, Longfei Yan, Zhi Chen, Linglong Dai

    Abstract: Terahertz (THz) band owning the abundant multi-ten-GHz bandwidth is capable to support Terabit-per-second wireless communications, which is a pillar technology for 6G and beyond systems. With sub-millimeter-long antennas, ultra-massive (UM) MIMO and intelligent surface (IS) systems with thousands of array elements are exploited to effectively combat the distance limitation and blockage problems, w…

    Submitted 3 August, 2023; v1 submitted 8 January, 2023; originally announced January 2023.

  39. arXiv:2301.00161  [pdf, other]

    cs.IT cs.AR eess.SP eess.SY

    Active RISs: Signal Modeling, Asymptotic Analysis, and Beamforming Design

    Authors: Zijian Zhang, Linglong Dai, Xibi Chen, Changhao Liu, Fan Yang, Robert Schober, H. Vincent Poor

    Abstract: Reconfigurable intelligent surfaces (RISs) have emerged as a candidate technology for future 6G networks. However, due to the "multiplicative fading" effect, the existing passive RISs only achieve a negligible capacity gain in environments with strong direct links. In this paper, the concept of active RISs is studied to overcome this fundamental limitation. Unlike the existing passive RISs that re…

    Submitted 31 December, 2022; originally announced January 2023.

    Comments: Accepted by IEEE GLOBECOM 2022. This paper includes a 64-element active RIS aided wireless communication prototype and the field test results. The journal version is at: arXiv:2103.15154. Simulation codes are provided at: http://oa.ee.tsinghua.edu.cn/dailinglong/publications/publications.html

    Journal ref: IEEE GLOBECOM 2022

  40. arXiv:2212.14654  [pdf, other]

    cs.IT eess.SP

    Enabling More Users to Benefit from Near-Field Communications: From Linear to Circular Array

    Authors: Zidong Wu, Mingyao Cui, Linglong Dai

    Abstract: Massive multiple-input multiple-output (MIMO) for 5G is evolving into the extremely large-scale antenna array (ELAA) to increase the spectrum efficiency by orders of magnitude for 6G communications. ELAA introduces spherical-wave-based near-field communications, where channel capacity can be significantly improved for single-user and multi-user scenarios. Unfortunately, the near-field region at la…

    Submitted 30 October, 2023; v1 submitted 30 December, 2022; originally announced December 2022.

    Comments: Accepted by IEEE TWC. In this paper, the rotational symmetry of UCA is leveraged to provide uniform and enlarged near-field regions, enabling more users to benefit from near-field communications. Simulation codes will be provided to reproduce the results in this paper: http://oa.ee.tsinghua.edu.cn/dailinglong/publications/publications.html

  41. arXiv:2212.08401  [pdf, other]

    cs.IT eess.SP

    Near-Field Wideband Channel Estimation for Extremely Large-Scale MIMO

    Authors: Mingyao Cui, Linglong Dai

    Abstract: Extremely large-scale multiple-input-multiple-output (XL-MIMO) at millimeter-wave (mmWave) and terahertz (THz) bands plays an important role in supporting extremely high beamforming gain as well as ultra-wideband spectrum resources. Unfortunately, accurate wideband XL-MIMO channel estimation suffers from a new challenge called the near-field beam split effect. Prior works either neglect the acc…

    Submitted 16 December, 2022; originally announced December 2022.

    Comments: This paper has been accepted by Science China Information Sciences. Simulation codes will be provided to reproduce the results in this paper: http://oa.ee.tsinghua.edu.cn/dailinglong/publications/publications.html

  42. arXiv:2211.11275  [pdf, other]

    eess.AS cs.AI cs.CL cs.CV cs.SD

    VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning

    Authors: Qiushi Zhu, Long Zhou, Ziqiang Zhang, Shujie Liu, Binxing Jiao, Jie Zhang, Lirong Dai, Daxin Jiang, Jinyu Li, Furu Wei

    Abstract: Although speech is a simple and effective way for humans to communicate with the outside world, a more realistic speech interaction contains multimodal information, e.g., vision, text. How to design a unified framework to integrate different modal information and leverage different resources (e.g., visual-audio pairs, audio-text pairs, unlabeled speech, and unlabeled text) to facilitate speech rep…

    Submitted 19 May, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: 11 pages, Accepted by IEEE Transactions on Multimedia

  43. arXiv:2211.00511  [pdf, other]

    eess.AS cs.SD

    A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings

    Authors: Mohan Shi, Jie Zhang, Zhihao Du, Fan Yu, Qian Chen, Shiliang Zhang, Li-Rong Dai

    Abstract: Speaker-attributed automatic speech recognition (SA-ASR) in multi-party meeting scenarios is one of the most valuable and challenging ASR tasks. It was shown that single-channel frame-level diarization with serialized output training (SC-FD-SOT), single-channel word-level diarization with SOT (SC-WD-SOT) and joint training of single-channel target-speaker separation and ASR (SC-TS-ASR) can be explo…

    Submitted 1 March, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

  44. arXiv:2210.15324  [pdf, other]

    eess.AS cs.SD

    Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning

    Authors: Qiu-Shi Zhu, Long Zhou, Jie Zhang, Shu-Jie Liu, Yu-Chen Hu, Li-Rong Dai

    Abstract: Self-supervised pre-training methods based on contrastive learning or regression tasks can utilize more unlabeled data to improve the performance of automatic speech recognition (ASR). However, the robustness impact of combining the two pre-training tasks and constructing different negative samples for contrastive learning still remains unclear. In this paper, we propose a noise-robust data2vec fo…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: Submitted to ICASSP 2023

  45. arXiv:2210.03730  [pdf, other]

    cs.CL eess.AS

    SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training

    Authors: Ziqiang Zhang, Long Zhou, Junyi Ao, Shujie Liu, Lirong Dai, Jinyu Li, Furu Wei

    Abstract: The rapid development of single-modal pre-training has prompted researchers to pay more attention to cross-modal pre-training methods. In this paper, we propose a unified-modal speech-unit-text pre-training model, SpeechUT, to connect the representations of a speech encoder and a text decoder with a shared unit encoder. Leveraging hidden-unit as an interface to align speech and text, we can decomp…

    Submitted 7 October, 2022; originally announced October 2022.

    Comments: 14 pages, accepted by EMNLP 2022

  46. arXiv:2209.15329  [pdf, other]

    cs.CL cs.AI eess.AS

    SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data

    Authors: Ziqiang Zhang, Sanyuan Chen, Long Zhou, Yu Wu, Shuo Ren, Shujie Liu, Zhuoyuan Yao, Xun Gong, Lirong Dai, Jinyu Li, Furu Wei

    Abstract: How to boost speech pre-training with textual data is an unsolved problem due to the fact that speech and text are very different modalities with distinct characteristics. In this paper, we propose a cross-modal Speech and Language Model (SpeechLM) to explicitly align speech and text pre-training with a pre-defined unified discrete representation. Specifically, we introduce two alternative discret…

    Submitted 15 June, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

    Comments: We have corrected the errors in the pre-training data for SpeechLM-P Base models; new results are updated

  47. arXiv:2209.07884  [pdf, ps, other]

    eess.SY cs.DC

    Workflow-based Fast Data-driven Predictive Control with Disturbance Observer in Cloud-edge Collaborative Architecture

    Authors: Runze Gao, Qiwen Li, Li Dai, Yufeng Zhan, Yuanqing Xia

    Abstract: Data-driven predictive control (DPC) has been studied and used in various scenarios, since it can generate the predicted control sequence relying only on historical input and output data. Recently, based on cloud computing, a data-driven predictive cloud control system (DPCCS) has been proposed with the advantage of sufficient computational resources. However, the existing computation mode of…

    Submitted 16 September, 2022; originally announced September 2022.

    Comments: 58 pages and 23 figures

  48. arXiv:2209.01424  [pdf, ps, other]

    eess.SP

    Dynamic Write-Voltage Design and Read-Voltage Optimization for MLC NAND Flash Memory

    Authors: Runbin Cai, Yi Fang, Zhifang Shi, Lin Dai, Guojun Han

    Abstract: To mitigate the impact of noise and interference on multi-level-cell (MLC) flash memory with the use of low-density parity-check (LDPC) codes, we propose a dynamic write-voltage design scheme considering the asymmetric property of raw bit error rate (RBER), which can obtain the optimal write voltage by minimizing a cost function. In order to further improve the decoding performance of flash memory…

    Submitted 3 September, 2022; originally announced September 2022.

    Comments: 12 pages, 6 figures, submitted to China Communications

  49. arXiv:2208.06349  [pdf, other]

    eess.SP

    Multiple access for near-field communications: SDMA or LDMA?

    Authors: Zidong Wu, Linglong Dai

    Abstract: Spatial division multiple access (SDMA) is essential to improve the spectrum efficiency for multi-user multiple-input multiple-output (MIMO) communications. The classical SDMA for massive MIMO with hybrid precoding heavily relies on the angular orthogonality in the far field to distinguish multiple users at different angles, which fails to fully exploit spatial resources in the distance domain. Wi…

    Submitted 26 June, 2023; v1 submitted 12 August, 2022; originally announced August 2022.

    Comments: Accepted by IEEE JSAC. This paper investigates the concept of location division multiple access (LDMA) to exploit extra spatial resources in the distance domain for multiple access, exploring a new possibility to enhance spectrum efficiency. Simulation codes will be provided at: http://oa.ee.tsinghua.edu.cn/dailinglong/publications/publications.html

  50. arXiv:2208.04509  [pdf, other]

    eess.SP cs.NI

    Reconfigurable Intelligent Computational Surfaces: When Wave Propagation Control Meets Computing

    Authors: Bo Yang, Xuelin Cao, Jindan Xu, Chongwen Huang, George C. Alexandropoulos, Linglong Dai, Mérouane Debbah, H. Vincent Poor, Chau Yuen

    Abstract: The envisioned sixth-generation (6G) of wireless networks will involve an intelligent integration of communications and computing, thereby meeting the urgent demands of diverse applications. To realize the concept of the smart radio environment, reconfigurable intelligent surfaces (RISs) are a promising technology for offering programmable propagation of impinging electromagnetic signals via exter…

    Submitted 3 October, 2022; v1 submitted 8 August, 2022; originally announced August 2022.