Skip to main content

Showing 1–50 of 276 results for author: Guo, Y

Searching in archive eess. Search in all archives.
.
  1. arXiv:2407.00753  [pdf, other

    eess.AS cs.SD

    FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis

    Authors: Yinlin Guo, Yening Lv, **qiao Dou, Yan Zhang, Yuehai Wang

    Abstract: While recent advances in Text-To-Speech synthesis have yielded remarkable improvements in generating high-quality speech, research on lightweight and fast models is limited. This paper introduces FLY-TTS, a new fast, lightweight and high-quality speech synthesis system based on VITS. Specifically, 1) We replace the decoder with ConvNeXt blocks that generate Fourier spectral coefficients followed b… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Accepted to Interspeech 2024. 5 pages, 1 figure

  2. arXiv:2406.14795  [pdf, other

    cs.RO eess.SY

    Design and Control of a Low-cost Non-backdrivable End-effector Upper Limb Rehabilitation Device

    Authors: Fulan Li, Yunfei Guo, Wenda Xu, Weide Zhang, Fangyun Zhao, Baiyu Wang, Huaguang Du, Chengkun Zhang

    Abstract: This paper presents the development of an upper limb end-effector based rehabilitation device for stroke patients, offering assistance or resistance along any 2-dimensional trajectory during physical therapy. It employs a non-backdrivable ball-screw-driven mechanism for enhanced control accuracy. The control system features three novel algorithms: First, the Implicit Euler velocity control algorit… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 12 pages, 15 figures

  3. arXiv:2406.06649  [pdf, other

    eess.IV cs.AI cs.CV cs.LG

    2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution

    Authors: Kai Liu, Haotong Qin, Yong Guo, Xin Yuan, Linghe Kong, Guihai Chen, Yulun Zhang

    Abstract: Low-bit quantization has become widespread for compressing image super-resolution (SR) models for edge deployment, which allows advanced SR models to enjoy compact low-bit parameters and efficient integer/bitwise constructions for storage compression and inference acceleration, respectively. However, it is notorious that low-bit quantization degrades the accuracy of SR models compared to their ful… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 9 pages, 6 figures. The code and models will be available at https://github.com/Kai-Liu001/2DQuant

  4. arXiv:2406.02164  [pdf, other

    cs.IT eess.SP

    Sparse Recovery for Holographic MIMO Channels: Leveraging the Clustered Sparsity

    Authors: Yuqing Guo, Xufeng Guo, Yuanbin Chen, Ying Wang

    Abstract: Envisioned as the next-generation transceiver technology, the holographic multiple-input-multiple-output (HMIMO) garners attention for its superior capabilities of fabricating electromagnetic (EM) waves. However, the densely packed antenna elements significantly increase the dimension of the HMIMO channel matrix, rendering traditional channel estimation methods inefficient. While the dimension cur… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: This manuscript has been submitted to IEEE journal, 5 pages, 3 figures

  5. arXiv:2405.08783  [pdf, other

    eess.IV

    The Develo** Human Connectome Project: A Fast Deep Learning-based Pipeline for Neonatal Cortical Surface Reconstruction

    Authors: Qiang Ma, Kaili Liang, Liu Li, Saga Masui, Yourong Guo, Chiara Nosarti, Emma C. Robinson, Bernhard Kainz, Daniel Rueckert

    Abstract: The Develo** Human Connectome Project (dHCP) aims to explore developmental patterns of the human brain during the perinatal period. An automated processing pipeline has been developed to extract high-quality cortical surfaces from structural brain magnetic resonance (MR) images for the dHCP neonatal dataset. However, the current implementation of the pipeline requires more than 6.5 hours to proc… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

  6. arXiv:2405.07682  [pdf, other

    cs.SD cs.AI cs.CL cs.MM eess.AS

    FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation

    Authors: Jianyi Chen, Wei Xue, Xu Tan, Zhen Ye, Qifeng Liu, Yike Guo

    Abstract: Singing Accompaniment Generation (SAG), which generates instrumental music to accompany input vocals, is crucial to develo** human-AI symbiotic art creation systems. The state-of-the-art method, SingSong, utilizes a multi-stage autoregressive (AR) model for SAG, however, this method is extremely slow as it generates semantic and acoustic tokens recursively, and this makes it impossible for real-… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: IJCAI 2024

  7. arXiv:2405.05641  [pdf, other

    eess.SP cs.IT

    Channel Estimation for Holographic MIMO: Wavenumber-Domain Sparsity Inspired Approaches

    Authors: Yuqing Guo, Yuanbin Chen, Ying Wang

    Abstract: This paper investigates the sparse channel estimation for holographic multiple-input multiple-output (HMIMO) systems. Given that the wavenumber-domain representation is based on a series of Fourier harmonics that are in essence a series of orthogonal basis functions, a novel wavenumber-domain sparsifying basis is designed to expose the sparsity inherent in HMIMO channels. Furthermore, by harnessin… ▽ More

    Submitted 12 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: This paper has been accepted for publication in IEEE WCL

  8. arXiv:2405.05522  [pdf, other

    eess.SP

    Deep Learning for CSI Feedback: One-Sided Model and Joint Multi-Module Learning Perspectives

    Authors: Yiran Guo, Wei Chen, Feifei Sun, Jiaming Cheng, Michail Matthaiou, Bo Ai

    Abstract: The use of deep learning (DL) for channel state information (CSI) feedback has garnered widespread attention across academia and industry. The mainstream DL architectures, e.g., CsiNet, deploy DL models on the base station (BS) side and the user equipment (UE) side, which are highly coupled and need to be trained jointly. However, two-sided DL models require collaborations between different networ… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

  9. arXiv:2405.00734  [pdf, other

    eess.SP cs.AI cs.LG

    EEG-MACS: Manifold Attention and Confidence Stratification for EEG-based Cross-Center Brain Disease Diagnosis under Unreliable Annotations

    Authors: Zhenxi Song, Ruihan Qin, Huixia Ren, Zhen Liang, Yi Guo, Min Zhang, Zhiguo Zhang

    Abstract: Cross-center data heterogeneity and annotation unreliability significantly challenge the intelligent diagnosis of diseases using brain signals. A notable example is the EEG-based diagnosis of neurodegenerative diseases, which features subtler abnormal neural dynamics typically observed in small-group settings. To advance this area, in this work, we introduce a transferable framework employing Mani… ▽ More

    Submitted 29 April, 2024; originally announced May 2024.

  10. arXiv:2405.00077  [pdf, other

    cs.LG eess.SP

    BrainODE: Dynamic Brain Signal Analysis via Graph-Aided Neural Ordinary Differential Equations

    Authors: Kaiqiao Han, Yi Yang, Zijie Huang, Xuan Kan, Yang Yang, Ying Guo, Lifang He, Liang Zhan, Yizhou Sun, Wei Wang, Carl Yang

    Abstract: Brain network analysis is vital for understanding the neural interactions regarding brain structures and functions, and identifying potential biomarkers for clinical phenotypes. However, widely used brain signals such as Blood Oxygen Level Dependent (BOLD) time series generated from functional Magnetic Resonance Imaging (fMRI) often manifest three challenges: (1) missing values, (2) irregular samp… ▽ More

    Submitted 30 April, 2024; originally announced May 2024.

  11. arXiv:2404.19723  [pdf, other

    eess.AS cs.SD

    Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech

    Authors: Hankun Wang, Chenpeng Du, Yiwei Guo, Shuai Wang, Xie Chen, Kai Yu

    Abstract: Recent popular decoder-only text-to-speech models are known for their ability of generating natural-sounding speech. However, such models sometimes suffer from word skip** and repeating due to the lack of explicit monotonic alignment constraints. In this paper, we notice from the attention maps that some particular attention heads of the decoder-only model indicate the alignments between speech… ▽ More

    Submitted 30 April, 2024; originally announced April 2024.

  12. arXiv:2404.18081  [pdf, other

    cs.SD cs.AI cs.CL cs.LG cs.MM eess.AS

    ComposerX: Multi-Agent Symbolic Music Composition with LLMs

    Authors: Qixin Deng, Qikai Yang, Ruibin Yuan, Yipeng Huang, Yi Wang, Xubo Liu, Zeyue Tian, Jiahao Pan, Ge Zhang, Hanfeng Lin, Yizhi Li, Yinghao Ma, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenwu Wang, Guangyu Xia, Wei Xue, Yike Guo

    Abstract: Music composition represents the creative side of humanity, and itself is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints. While demonstrating impressive capabilities in STEM subjects, current LLMs easily fail in this task, generating ill-written music even when equipped with modern techniques like In-Context-Learning and C… ▽ More

    Submitted 30 April, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

  13. arXiv:2404.16289  [pdf, other

    cs.IT eess.SP

    Deep Joint CSI Feedback and Multiuser Precoding for MIMO OFDM Systems

    Authors: Yiran Guo, Wei Chen, Jialong Xu, Lun Li, Bo Ai

    Abstract: The design of precoding plays a crucial role in achieving a high downlink sum-rate in multiuser multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) systems. In this correspondence, we propose a deep learning based joint CSI feedback and multiuser precoding method in frequency division duplex systems, aiming at maximizing the downlink sum-rate performance in an e… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

  14. arXiv:2404.14946  [pdf, other

    cs.SD cs.CL eess.AS

    StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations

    Authors: Sen Liu, Yiwei Guo, Xie Chen, Kai Yu

    Abstract: While acoustic expressiveness has long been studied in expressive text-to-speech (ETTS), the inherent expressiveness in text lacks sufficient attention, especially for ETTS of artistic works. In this paper, we introduce StoryTTS, a highly ETTS dataset that contains rich expressiveness both in acoustic and textual perspective, from the recording of a Mandarin storytelling show. A systematic and com… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted by ICASSP 2024

    Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024, pp. 11521-11525

  15. arXiv:2404.14700  [pdf, other

    eess.AS cs.AI cs.CL cs.LG cs.SD

    FlashSpeech: Efficient Zero-Shot Speech Synthesis

    Authors: Zhen Ye, Zeqian Ju, Haohe Liu, Xu Tan, Jianyi Chen, Yiwen Lu, Peiwen Sun, Jiahao Pan, Weizhen Bian, Shulin He, Qifeng Liu, Yike Guo, Wei Xue

    Abstract: Recent progress in large-scale zero-shot speech synthesis has been significantly advanced by language models and diffusion models. However, the generation process of both methods is slow and computationally intensive. Efficient speech synthesis using a lower computing budget to achieve quality on par with previous work remains a significant challenge. In this paper, we present FlashSpeech, a large… ▽ More

    Submitted 24 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Efficient zero-shot speech synthesis

  16. arXiv:2404.14386  [pdf, other

    eess.SY

    Activating the Flexibility in Distribution systems via a Unified Quantification Approach

    Authors: Yilin Wen, Yi Guo, Zechun Hu, Gabriela Hug

    Abstract: Effective quantification of the cost of flexibility and its contribution to the system is essential for activating the flexibility of the numerous distributed energy resources (DERs) in power systems operation. We first propose a unified flexibility cost quantification method for heterogonous DERs based on the adjustment ranges in both power and accumulated energy consumption. Compared with tradit… ▽ More

    Submitted 22 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  17. arXiv:2404.10343  [pdf, other

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang , et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such… ▽ More

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  18. arXiv:2404.06079  [pdf, other

    eess.AS cs.AI

    The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge

    Authors: Yiwei Guo, Chenrun Wang, Yifan Yang, Hankun Wang, Ziyang Ma, Chenpeng Du, Shuai Wang, Hanzheng Li, Shuai Fan, Hui Zhang, Xie Chen, Kai Yu

    Abstract: Discrete speech tokens have been more and more popular in multiple speech processing fields, including automatic speech recognition (ASR), text-to-speech (TTS) and singing voice synthesis (SVS). In this paper, we describe the systems developed by the SJTU X-LANCE group for the TTS (acoustic + vocoder), SVS, and ASR tracks in the Interspeech 2024 Speech Processing Using Discrete Speech Unit Challen… ▽ More

    Submitted 9 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: 5 pages, 3 figures. Report of a challenge

  19. arXiv:2404.03253  [pdf, other

    eess.IV cs.AI cs.CV cs.LG

    A dataset of primary nasopharyngeal carcinoma MRI with multi-modalities segmentation

    Authors: Yin Li, Qi Chen, Kai Wang, Meige Li, Li** Si, Yingwei Guo, Yu Xiong, Qixing Wang, Yang Qin, Ling Xu, Patrick van der Smagt, Jun Tang, Nutan Chen

    Abstract: Multi-modality magnetic resonance imaging data with various sequences facilitate the early diagnosis, tumor segmentation, and disease staging in the management of nasopharyngeal carcinoma (NPC). The lack of publicly available, comprehensive datasets limits advancements in diagnosis, treatment planning, and the development of machine learning algorithms for NPC. Addressing this critical need, we in… ▽ More

    Submitted 4 April, 2024; originally announced April 2024.

  20. arXiv:2404.02811  [pdf, other

    cs.IT eess.SP

    Wideband Beamforming for Near-Field Communications with Circular Arrays

    Authors: Yunhui Guo, Yang Zhang, Zhaolin Wang, Yuanwei Liu

    Abstract: The beamforming performance of the uniform circular array (UCA) in near-field wideband communication systems is investigated. Compared to uniform linear array (ULA), UCA exhibits uniform effective array aperture in all directions, thus enabling more users to benefit from near-field communications. In this paper, the unique beam squint effect in near-field wideband UCA systems is comprehensively an… ▽ More

    Submitted 3 April, 2024; originally announced April 2024.

  21. Multiple Joint Chance Constraints Approximation for Uncertainty Modeling in Dispatch Problems

    Authors: Yilin Wen, Yi Guo, Zechun Hu, Gabriela Hug

    Abstract: Uncertainty modeling has become increasingly important in power system decision-making. The widely-used tractable uncertainty modeling method-chance constraints with Conditional Value at Risk (CVaR) approximation, can be overconservative and even turn an originally feasible problem into an infeasible one. This paper proposes a new approximation method for multiple joint chance constraints (JCCs) t… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

  22. arXiv:2404.00729  [pdf, other

    eess.SY cs.LG

    Nonparametric End-to-End Probabilistic Forecasting of Distributed Generation Outputs Considering Missing Data Imputation

    Authors: Minghui Chen, Zichao Meng, Yan** Liu, Longbo Luo, Ye Guo, Kang Wang

    Abstract: In this paper, we introduce a nonparametric end-to-end method for probabilistic forecasting of distributed renewable generation outputs while including missing data imputation. Firstly, we employ a nonparametric probabilistic forecast model utilizing the long short-term memory (LSTM) network to model the probability distributions of distributed renewable generations' outputs. Secondly, we design a… ▽ More

    Submitted 31 March, 2024; originally announced April 2024.

  23. arXiv:2403.16643  [pdf, other

    eess.IV cs.CV

    Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution

    Authors: Qing** Zheng, Ling Zheng, Yuanfan Guo, Ying Li, Songcen Xu, Jiankang Deng, Hang Xu

    Abstract: Artifact-free super-resolution (SR) aims to translate low-resolution images into their high-resolution counterparts with a strict integrity of the original content, eliminating any distortions or synthetic details. While traditional diffusion-based SR techniques have demonstrated remarkable abilities to enhance image detail, they are prone to artifact introduction during iterative procedures. Such… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

  24. arXiv:2403.15468  [pdf, other

    eess.SP

    Human Detection in Realistic Through-the-Wall Environments using Raw Radar ADC Data and Parametric Neural Networks

    Authors: Wei Wang, Naike Du, Yuchao Guo, Chao Sun, **gyang Liu, Rencheng Song, Xiuzhu Ye

    Abstract: The radar signal processing algorithm is one of the core components in through-wall radar human detection technology. Traditional algorithms (e.g., DFT and matched filtering) struggle to adaptively handle low signal-to-noise ratio echo signals in challenging and dynamic real-world through-wall application environments, which becomes a major bottleneck in the system. In this paper, we introduce an… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 11pages,13figures

  25. arXiv:2403.05236  [pdf, other

    eess.SY

    Fault Recovery and Transient Stability of Grid-Forming Converters Equipped with Current Saturation

    Authors: Ali Arjomandi-Nezhad, Yifei Guo, Bikash C. Pal, Guangya Yang

    Abstract: When grid-forming (GFM) inverter-based resources (IBRs) experience large grid disturbances (e.g., short-circuit faults), the current limiter may be triggered and GFM IBRs enter the current saturation mode, inducing nonlinear dynamical behaviors and imposing great challenges to the post-disturbance transient angle stability. This paper presents a systematic study to reveal the fault recovery behavi… ▽ More

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 10 pages, 22 figures

  26. Fast, nonlocal and neural: a lightweight high quality solution to image denoising

    Authors: Yu Guo, Axel Davy, Gabriele Facciolo, Jean-Michel Morel, Qiyu **

    Abstract: With the widespread application of convolutional neural networks (CNNs), the traditional model based denoising algorithms are now outperformed. However, CNNs face two problems. First, they are computationally demanding, which makes their deployment especially difficult for mobile terminals. Second, experimental evidence shows that CNNs often over-smooth regular textures present in images, in contr… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 5 pages. This paper was accepted by IEEE Signal Processing Letters on July 1, 2021

    Journal ref: IEEE Signal Processing Letters, 2021, 28:1515-1519

  27. arXiv:2403.03145  [pdf, other

    cs.CV cs.LG cs.MM cs.SD eess.AS

    Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization

    Authors: Yuxin Guo, Shijie Ma, Hu Su, Zhiqing Wang, Yuhao Zhao, Wei Zou, Siyang Sun, Yun Zheng

    Abstract: Audio-Visual Source Localization (AVSL) aims to locate sounding objects within video frames given the paired audio clips. Existing methods predominantly rely on self-supervised contrastive learning of audio-visual correspondence. Without any bounding-box annotations, they struggle to achieve precise localization, especially for small objects, and suffer from blurry boundaries and false positives.… ▽ More

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: Accepted to NeurIPS2023

  28. arXiv:2403.03095  [pdf, other

    cs.CV cs.MM cs.SD eess.AS

    Cross Pseudo-Labeling for Semi-Supervised Audio-Visual Source Localization

    Authors: Yuxin Guo, Shijie Ma, Yuhao Zhao, Hu Su, Wei Zou

    Abstract: Audio-Visual Source Localization (AVSL) is the task of identifying specific sounding objects in the scene given audio cues. In our work, we focus on semi-supervised AVSL with pseudo-labeling. To address the issues with vanilla hard pseudo-labels including bias accumulation, noise sensitivity, and instability, we propose a novel method named Cross Pseudo-Labeling (XPL), wherein two models learn fro… ▽ More

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: Accepted To ICASSP2024

  29. arXiv:2403.00274  [pdf, other

    cs.CV cs.SD eess.AS

    CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation

    Authors: Xi Liu, Ying Guo, Cheng Zhen, Tong Li, Yingying Ao, Pengfei Yan

    Abstract: Listening head generation aims to synthesize a non-verbal responsive listener head by modeling the correlation between the speaker and the listener in dynamic conversion.The applications of listener agent generation in virtual interaction have promoted many works achieving the diverse and fine-grained motion generation. However, they can only manipulate motions through simple emotional labels, but… ▽ More

    Submitted 29 March, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  30. arXiv:2402.19085  [pdf, other

    cs.CL cs.AI eess.SY

    Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment

    Authors: Yiju Guo, Ganqu Cui, Lifan Yuan, Ning Ding, Jiexin Wang, Huimin Chen, Bowen Sun, Ruobing Xie, Jie Zhou, Yankai Lin, Zhiyuan Liu, Maosong Sun

    Abstract: Alignment in artificial intelligence pursues the consistency between model responses and human preferences as well as values. In practice, the multifaceted nature of human preferences inadvertently introduces what is known as the "alignment tax" -a compromise where enhancements in alignment within one objective (e.g.,harmlessness) can diminish performance in others (e.g.,helpfulness). However, exi… ▽ More

    Submitted 29 February, 2024; originally announced February 2024.

  31. arXiv:2402.16153  [pdf, other

    cs.SD cs.AI cs.CL cs.LG cs.MM eess.AS

    ChatMusician: Understanding and Generating Music Intrinsically with LLM

    Authors: Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, **gcheng Wu, Chenghua Lin, Qifeng Liu , et al. (10 additional authors not shown)

    Abstract: While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the… ▽ More

    Submitted 25 February, 2024; originally announced February 2024.

    Comments: GitHub: https://shanghaicannon.github.io/ChatMusician/

  32. arXiv:2402.09048  [pdf, other

    eess.SP

    Sensing in Bi-Static ISAC Systems with Clock Asynchronism: A Signal Processing Perspective

    Authors: Kai Wu, Jacopo Pegoraro, Francesca Meneghello, J. Andrew Zhang, Jesus O. Lacruz, Joerg Widmer, Francesco Restuccia, Michele Rossi, Xiao**g Huang, Daqing Zhang, Giuseppe Caire, Y. Jay Guo

    Abstract: Integrated Sensing and Communication (ISAC) has been identified as a pillar usage scenario for the impending 6G era. Bi-static sensing, a major type of sensing in ISAC, is promising to expedite ISAC in the near future, as it requires minimal changes to the existing network infrastructure. However, a critical challenge for bi-static sensing is clock asynchronism due to the use of different clocks a… ▽ More

    Submitted 24 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: 20 pages, 6 figures, 1 table

  33. arXiv:2402.00616  [pdf

    eess.SP

    Dual-Tap Optical-Digital Feedforward Equalization Enabling High-Speed Optical Transmission in IM/DD Systems

    Authors: Yu Guo, Yangbo Wu, Zhao Yang, Lei Xue, Ning Liang, Yang Ren, Zhengrui Tu, Jia Feng, Qunbi Zhuge

    Abstract: Intensity-modulation and direct-detection (IM/DD) transmission is widely adopted for high-speed optical transmission scenarios due to its cost-effectiveness and simplicity. However, as the data rate increases, the fiber chromatic dispersion (CD) would induce a serious power fading effect, and direct detection could generate inter-symbol interference (ISI). Moreover, the ISI becomes more severe wit… ▽ More

    Submitted 1 February, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: 6 pages, 7 gigures, journal

  34. arXiv:2401.16564  [pdf

    eess.SP

    Data and Physics driven Deep Learning Models for Fast MRI Reconstruction: Fundamentals and Methodologies

    Authors: Jiahao Huang, Yinzhe Wu, Fanwen Wang, Yingying Fang, Yang Nan, Cagan Alkan, Lei Xu, Zhifan Gao, Weiwen Wu, Lei Zhu, Zhaolin Chen, Peter Lally, Neal Bangerter, Kawin Setsompop, Yike Guo, Daniel Rueckert, Ge Wang, Guang Yang

    Abstract: Magnetic Resonance Imaging (MRI) is a pivotal clinical diagnostic tool, yet its extended scanning times often compromise patient comfort and image quality, especially in volumetric, temporal and quantitative scans. This review elucidates recent advances in MRI acceleration via data and physics-driven models, leveraging techniques from algorithm unrolling models, enhancement-based models, and plug-… ▽ More

    Submitted 29 January, 2024; originally announced January 2024.

  35. arXiv:2401.16017  [pdf, other

    eess.SP

    DMCE: Diffusion Model Channel Enhancer for Multi-User Semantic Communication Systems

    Authors: Youcheng Zeng, Xinxin He, Xu Chen, Haonan Tong, Zhaohui Yang, Yijun Guo, Jianjun Hao

    Abstract: To achieve continuous massive data transmission with significantly reduced data payload, the users can adopt semantic communication techniques to compress the redundant information by transmitting semantic features instead. However, current works on semantic communication mainly focus on high compression ratio, neglecting the wireless channel effects including dynamic distortion and multi-user int… ▽ More

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: accepted by IEEE ICC 2024

  36. arXiv:2401.14321  [pdf, other

    eess.AS cs.SD

    VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech

    Authors: Chenpeng Du, Yiwei Guo, Hankun Wang, Yifan Yang, Zhikang Niu, Shuai Wang, Hui Zhang, Xie Chen, Kai Yu

    Abstract: Recent TTS models with decoder-only Transformer architecture, such as SPEAR-TTS and VALL-E, achieve impressive naturalness and demonstrate the ability for zero-shot adaptation given a speech prompt. However, such decoder-only TTS models lack monotonic alignment constraints, sometimes leading to hallucination issues such as mispronunciation, word skip** and repeating. To address this limitation,… ▽ More

    Submitted 29 January, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

  37. arXiv:2401.13220  [pdf, other

    eess.IV cs.CV

    Segment Any Cell: A SAM-based Auto-prompting Fine-tuning Framework for Nuclei Segmentation

    Authors: Saiyang Na, Yuzhi Guo, Feng Jiang, Hehuan Ma, Junzhou Huang

    Abstract: In the rapidly evolving field of AI research, foundational models like BERT and GPT have significantly advanced language and vision tasks. The advent of pretrain-prompting models such as ChatGPT and Segmentation Anything Model (SAM) has further revolutionized image segmentation. However, their applications in specialized areas, particularly in nuclei segmentation within medical imaging, reveal a k… ▽ More

    Submitted 23 January, 2024; originally announced January 2024.

  38. arXiv:2401.09119  [pdf, other

    eess.SP

    Anchor-points Assisted Uplink Sensing in Perceptive Mobile Networks

    Authors: Yanmo Hu, J. Andrew Zhang, Weibo Deng, Y. Jay Guo

    Abstract: Uplink sensing in integrated sensing and communications (ISAC) systems, such as Perceptive Mobile Networks, is challenging due to the clock asynchronism between transmitter and receiver. Existing solutions typically require the presence of a dominating line-of-sight path and the knowledge of transmitter location at the receiver. In this paper, relaxing these requirements, we propose a novel and ef… ▽ More

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: 14 pages, 12 figures, journal paper

  39. arXiv:2401.09064  [pdf, other

    cs.IT eess.SP

    Performance Bounds and Optimization for CSI-Ratio based Bi-static Doppler Sensing in ISAC Systems

    Authors: Yanmo Hu, Kai Wu, J. Andrew Zhang, Weibo Deng, Y. Jay Guo

    Abstract: Bi-static sensing is crucial for exploring the potential of networked sensing capabilities in integrated sensing and communications (ISAC). However, it suffers from the challenging clock asynchronism issue. CSI ratio-based sensing is an effective means to address the issue. Its performance bounds, particular for Doppler sensing, have not been fully understood yet. This work endeavors to fill the r… ▽ More

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: 14 pages, 15 figures, journal paper

  40. arXiv:2401.06435  [pdf, other

    cs.IT eess.SP

    Swin Transformer-Based CSI Feedback for Massive MIMO

    Authors: Jiaming Cheng, Wei Chen, Jialong Xu, Yiran Guo, Lun Li, Bo Ai

    Abstract: For massive multiple-input multiple-output systems in the frequency division duplex (FDD) mode, accurate downlink channel state information (CSI) is required at the base station (BS). However, the increasing number of transmit antennas aggravates the feedback overhead of CSI. Recently, deep learning (DL) has shown considerable potential to reduce CSI feedback overhead. In this paper, we propose a… ▽ More

    Submitted 12 January, 2024; originally announced January 2024.

  41. arXiv:2401.03800  [pdf, other

    cs.CV eess.IV

    MvKSR: Multi-view Knowledge-guided Scene Recovery for Hazy and Rainy Degradation

    Authors: Dong Yang, Wenyu Xu, Yuan Gao, Yuxu Lu, **gming Zhang, Yu Guo

    Abstract: High-quality imaging is crucial for ensuring safety supervision and intelligent deployment in fields like transportation and industry. It enables precise and detailed monitoring of operations, facilitating timely detection of potential hazards and efficient management. However, adverse weather conditions, such as atmospheric haziness and precipitation, can have a significant impact on image qualit… ▽ More

    Submitted 8 January, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

  42. arXiv:2401.03396  [pdf

    eess.SP

    A Closed-loop Brain-Machine Interface SoC Featuring a 0.2$μ$J/class Multiplexer Based Neural Network

    Authors: Chao Zhang, Yongxiang Guo, Dawid Sheng, Zhixiong Ma, Chao Sun, Yuwei Zhang, Wenxin Zhao, Fenyan Zhang, Tongfei Wang, Xing Sheng, Milin Zhang

    Abstract: This work presents the first fabricated electrophysiology-optogenetic closed-loop bidirectional brain-machine interface (CL-BBMI) system-on-chip (SoC) with electrical neural signal recording, on-chip sleep staging and optogenetic stimulation. The first multiplexer with static assignment based table lookup solution (MUXnet) for multiplier-free NN processor was proposed. A state-of-the-art average a… ▽ More

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: 2 pages, 6 figures. Accepted by IEEE Custom Integrated Circuits Conference (CICC) 2024. The codes for the MUXnet (constructing neural networks using multiplexers instead of multipliers) will be open-sourced after the Journal version of this work is accepted

  43. arXiv:2401.01792  [pdf, other

    eess.AS cs.AI cs.LG cs.SD

    CoMoSVC: Consistency Model-based Singing Voice Conversion

    Authors: Yiwen Lu, Zhen Ye, Wei Xue, Xu Tan, Qifeng Liu, Yike Guo

    Abstract: The diffusion-based Singing Voice Conversion (SVC) methods have achieved remarkable performances, producing natural audios with high similarity to the target timbre. However, the iterative sampling process results in slow inference speed, and acceleration thus becomes crucial. In this paper, we propose CoMoSVC, a consistency model-based SVC method, which aims to achieve both high-quality generatio… ▽ More

    Submitted 3 January, 2024; originally announced January 2024.

  44. arXiv:2401.00859  [pdf, ps, other

    eess.IV cs.CV cs.LG

    Federated Multi-View Synthesizing for Metaverse

    Authors: Yiyu Guo, Zhi** Qin, Xiaoming Tao, Geoffrey Ye Li

    Abstract: The metaverse is expected to provide immersive entertainment, education, and business applications. However, virtual reality (VR) transmission over wireless networks is data- and computation-intensive, making it critical to introduce novel solutions that meet stringent quality-of-service requirements. With recent advances in edge intelligence and deep learning, we have developed a novel multi-view… ▽ More

    Submitted 18 December, 2023; originally announced January 2024.

  45. arXiv:2401.00153  [pdf, other

    eess.IV

    USFM: A Universal Ultrasound Foundation Model Generalized to Tasks and Organs towards Label Efficient Image Analysis

    Authors: **g Jiao, ** Zhou, Xiaokang Li, Menghua Xia, Yi Huang, Lihong Huang, Na Wang, Xiaofan Zhang, Shichong Zhou, Yuanyuan Wang, Yi Guo

    Abstract: Inadequate generality across different organs and tasks constrains the application of ultrasound (US) image analysis methods in smart healthcare. Building a universal US foundation model holds the potential to address these issues. Nevertheless, the development of such foundational models encounters intrinsic challenges in US analysis, i.e., insufficient databases, low quality, and ineffective fea… ▽ More

    Submitted 2 January, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

    Comments: Submit to MedIA, 17 pages, 11 figures

  46. arXiv:2312.15424  [pdf, other

    eess.SY

    Integrating Renewable Energy Sources as Reserve Providers: Modeling, Pricing, and Properties

    Authors: Wenli Wu, Ye Guo, Jiantao Shi

    Abstract: In pursuit of carbon neutrality, many countries have adopted renewable portfolio standards to facilitate the integration of renewable energy. However, increasing penetration of renewable energy resources will also pose higher requirements on system flexibility. Allowing renewable themselves to participate in the reserve market could be a viable solution. To this end, this paper proposes an optimal… ▽ More

    Submitted 24 December, 2023; originally announced December 2023.

    Comments: 13 pages, 5 figures

  47. arXiv:2312.14448  [pdf, other

    cs.NI eess.SP

    Quantum-Assisted Joint Caching and Power Allocation for Integrated Satellite-Terrestrial Networks

    Authors: Yu Zhang, Yanmin Gong, Lei Fan, Yu Wang, Zhu Han, Yuanxiong Guo

    Abstract: Low earth orbit (LEO) satellite network can complement terrestrial networks for achieving global wireless coverage and improving delay-sensitive Internet services. This paper proposes an integrated satellite-terrestrial network (ISTN) architecture to provide ground users with seamless and reliable content delivery services. For optimal service provisioning in this architecture, we formulate an opt… ▽ More

    Submitted 22 December, 2023; originally announced December 2023.

  48. arXiv:2312.11898  [pdf, other

    cs.LG eess.SP

    Short-Term Multi-Horizon Line Loss Rate Forecasting of a Distribution Network Using Attention-GCN-LSTM

    Authors: Jie Liu, Yijia Cao, Yong Li, Yixiu Guo, Wei Deng

    Abstract: Accurately predicting line loss rates is vital for effective line loss management in distribution networks, especially over short-term multi-horizons ranging from one hour to one week. In this study, we propose Attention-GCN-LSTM, a novel method that combines Graph Convolutional Networks (GCN), Long Short-Term Memory (LSTM), and a three-level attention mechanism to address this challenge. By captu… ▽ More

    Submitted 19 December, 2023; originally announced December 2023.

  49. arXiv:2312.08676  [pdf, other

    cs.SD cs.CL eess.AS

    SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention

    Authors: Junjie Li, Yiwei Guo, Xie Chen, Kai Yu

    Abstract: Zero-shot voice conversion (VC) aims to transfer the source speaker timbre to arbitrary unseen target speaker timbre, while kee** the linguistic content unchanged. Although the voice of generated speech can be controlled by providing the speaker embedding of the target speaker, the speaker similarity still lags behind the ground truth recordings. In this paper, we propose SEF-VC, a speaker embed… ▽ More

    Submitted 30 January, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: 5 pages, 2 figures, accepted to ICASSP 2024

  50. arXiv:2312.08089  [pdf, other

    eess.AS

    Audio Deepfake Detection with Self-Supervised WavLM and Multi-Fusion Attentive Classifier

    Authors: Yinlin Guo, Haofan Huang, Xi Chen, He Zhao, Yuehai Wang

    Abstract: With the rapid development of speech synthesis and voice conversion technologies, Audio Deepfake has become a serious threat to the Automatic Speaker Verification (ASV) system. Numerous countermeasures are proposed to detect this type of attack. In this paper, we report our efforts to combine the self-supervised WavLM model and Multi-Fusion Attentive classifier for audio deepfake detection. Our me… ▽ More

    Submitted 9 January, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: Accepted to ICASSP 2024. 5 pages, 1 figure