Skip to main content

Showing 1–50 of 1,495 results for author: Zhang, Y

Searching in archive eess. Search in all archives.
.
  1. arXiv:2406.19043  [pdf

    eess.IV cs.AI cs.CV cs.DB

    CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI

    Authors: Zi Wang, Fanwen Wang, Chen Qin, Jun Lyu, Ouyang Cheng, Shuo Wang, Yan Li, Mengyao Yu, Haoyu Zhang, Kunyuan Guo, Zhang Shi, Qirong Li, Ziqiang Xu, Ya**g Zhang, Hao Li, Sha Hua, Binghua Chen, Longyu Sun, Mengting Sun, Qin Li, Ying-Hua Chu, Wenjia Bai, **g Qin, Xiahai Zhuang, Claudia Prieto , et al. (7 additional authors not shown)

    Abstract: Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover h… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures, 2 tables

  2. arXiv:2406.18026  [pdf

    eess.SY

    USLC: Universal Self-Learning Control via Physical Performance Policy-Optimization Neural Network

    Authors: Yanhui Zhang, Weifang Chen

    Abstract: This study addresses the challenge of achieving real-time Universal Self-Learning Control (USLC) in nonlinear dynamic systems with uncertain models. The proposed control method incorporates a Universal Self-Learning module, which introduces a model-free online executor-evaluator framework to enable controller adaptation in the presence of unknown disturbances. By leveraging a neural network model… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 15 Pages

  3. arXiv:2406.17578  [pdf, other

    eess.IV

    Sparse-view Signal-domain Photoacoustic Tomography Reconstruction Method Based on Neural Representation

    Authors: Bowei Yao, Yi Zeng, Haizhao Dai, Qing Wu, Youshen Xiao, Fei Gao, Yuyao Zhang, **gyi Yu, Xiran Cai

    Abstract: Photoacoustic tomography is a hybrid biomedical technology, which combines the advantages of acoustic and optical imaging. However, for the conventional image reconstruction method, the image quality is affected obviously by artifacts under the condition of sparse sampling. in this paper, a novel model-based sparse reconstruction method via implicit neural representation was proposed for improving… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  4. arXiv:2406.17225  [pdf, other

    eess.IV cs.CV

    Multimodal Cross-Task Interaction for Survival Analysis in Whole Slide Pathological Images

    Authors: Songhan Jiang, Zhengyu Gan, Linghan Cai, Yifeng Wang, Yongbing Zhang

    Abstract: Survival prediction, utilizing pathological images and genomic profiles, is increasingly important in cancer analysis and prognosis. Despite significant progress, precise survival analysis still faces two main challenges: (1) The massive pixels contained in whole slide images (WSIs) complicate the process of pathological images, making it difficult to generate an effective representation of the tu… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  5. arXiv:2406.16148  [pdf, other

    cs.SD cs.AI cs.LG eess.AS

    Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking

    Authors: Yuwei Zhang, Tong Xia, **g Han, Yu Wu, Georgios Rizos, Yang Liu, Mohammed Mosuily, Jagmohan Chauhan, Cecilia Mascolo

    Abstract: Respiratory audio, such as coughing and breathing sounds, has predictive power for a wide range of healthcare applications, yet is currently under-explored. The main problem for those applications arises from the difficulty in collecting large labeled task-specific data for model development. Generalizable respiratory acoustic foundation models pretrained with unlabeled data would offer appealing… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  6. arXiv:2406.15945  [pdf, other

    eess.SP cs.IT

    Full-Space Wireless Sensing Enabled by Multi-Sector Intelligent Surfaces

    Authors: Yumeng Zhang, Xiaodan Shao, Hongyu Li, Bruno Clerckx, Rui Zhang

    Abstract: The multi-sector intelligent surface (IS), benefiting from a smarter wave manipulation capability, has been shown to enhance channel gain and offer full-space coverage in communications. However, the benefits of multi-sector IS in wireless sensing remain unexplored. This paper introduces the application of multi-sector IS for wireless sensing/localization. Specifically, we propose a new self-sensi… ▽ More

    Submitted 25 June, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

    Comments: 13 pages, 9 figures

  7. arXiv:2406.15696  [pdf

    physics.med-ph eess.SP

    Functional photoacoustic noninvasive Doppler angiography in humans

    Authors: Yang Zhang, Joshua Olick-Gibson, Karteekeya Sastry, Lihong V. Wang

    Abstract: Optical imaging of blood flow yields critical functional insights into the circulatory system, but its clinical implementation has typically been limited to shallow depths (~1 millimeter) due to light scattering in biological tissue. Here, we present photoacoustic noninvasive Doppler angiography (PANDA) for deep blood flow imaging. PANDA synergizes the photoacoustic and Doppler effects to generate… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 38 pages, 7 main figures, 10 supplementary figures

  8. arXiv:2406.15093  [pdf, other

    cs.CR cs.CV eess.IV

    ECLIPSE: Expunging Clean-label Indiscriminate Poisons via Sparse Diffusion Purification

    Authors: Xianlong Wang, Shengshan Hu, Yechao Zhang, Ziqi Zhou, Leo Yu Zhang, Peng Xu, Wei Wan, Hai **

    Abstract: Clean-label indiscriminate poisoning attacks add invisible perturbations to correctly labeled training images, thus dramatically reducing the generalization capability of the victim models. Recently, some defense mechanisms have been proposed such as adversarial training, image transformation techniques, and image purification. However, these schemes are either susceptible to adaptive attacks, bui… ▽ More

    Submitted 24 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted by ESORICS 2024

  9. arXiv:2406.14977  [pdf, other

    cs.AI eess.IV

    Trustworthy Enhanced Multi-view Multi-modal Alzheimer's Disease Prediction with Brain-wide Imaging Transcriptomics Data

    Authors: Shan Cong, Zhoujie Fan, Hongwei Liu, Yinghan Zhang, Xin Wang, Haoran Luo, Xiaohui Yao

    Abstract: Brain transcriptomics provides insights into the molecular mechanisms by which the brain coordinates its functions and processes. However, existing multimodal methods for predicting Alzheimer's disease (AD) primarily rely on imaging and sometimes genetic data, often neglecting the transcriptomic basis of brain. Furthermore, while striving to integrate complementary information between modalities,… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  10. arXiv:2406.14264  [pdf, other

    eess.IV cs.CV

    Zero-Shot Image Denoising for High-Resolution Electron Microscopy

    Authors: Xuanyu Tian, Zhuoya Dong, Xiyue Lin, Yue Gao, Hongjiang Wei, Yanhang Ma, **gyi Yu, Yuyao Zhang

    Abstract: High-resolution electron microscopy (HREM) imaging technique is a powerful tool for directly visualizing a broad range of materials in real-space. However, it faces challenges in denoising due to ultra-low signal-to-noise ratio (SNR) and scarce data availability. In this work, we propose Noise2SR, a zero-shot self-supervised learning (ZS-SSL) denoising framework for HREM. Within our framework, we… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 12 pages, 12 figures

  11. arXiv:2406.14176  [pdf, other

    cs.SD cs.AI cs.MM eess.AS

    A Multi-Stream Fusion Approach with One-Class Learning for Audio-Visual Deepfake Detection

    Authors: Kyungbok Lee, You Zhang, Zhiyao Duan

    Abstract: This paper addresses the challenge of develo** a robust audio-visual deepfake detection model. In practical use cases, new generation algorithms are continually emerging, and these algorithms are not encountered during the development of detection methods. This calls for the generalization ability of the method. Additionally, to ensure the credibility of detection methods, it is beneficial for t… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  12. arXiv:2406.13979  [pdf, other

    eess.IV cs.CV cs.LG

    Knowledge-driven Subspace Fusion and Gradient Coordination for Multi-modal Learning

    Authors: Yupei Zhang, Xiaofei Wang, Fangliangzi Meng, ** Tang, Chao Li

    Abstract: Multi-modal learning plays a crucial role in cancer diagnosis and prognosis. Current deep learning based multi-modal approaches are often limited by their abilities to model the complex correlations between genomics and histology data, addressing the intrinsic complexity of tumour ecosystem where both tumour and microenvironment contribute to malignancy. We propose a biologically interpretative an… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  13. arXiv:2406.13413  [pdf, other

    eess.IV cs.CV

    Recurrent Inference Machine for Medical Image Registration

    Authors: Yi Zhang, Yidong Zhao, Hui Xue, Peter Kellman, Stefan Klein, Qian Tao

    Abstract: Image registration is essential for medical image applications where alignment of voxels across multiple images is needed for qualitative or quantitative analysis. With recent advancements in deep neural networks and parallel computing, deep learning-based medical image registration methods become competitive with their flexible modelling and fast inference capabilities. However, compared to tradi… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Preprint

  14. arXiv:2406.13335  [pdf, other

    cs.NI eess.SP

    AI-Empowered Multiple Access for 6G: A Survey of Spectrum Sensing, Protocol Designs, and Optimizations

    Authors: Xuelin Cao, Bo Yang, Kaining Wang, Xinghua Li, Zhiwen Yu, Chau Yuen, Yan Zhang, Zhu Han

    Abstract: With the rapidly increasing number of bandwidth-intensive terminals capable of intelligent computing and communication, such as smart devices equipped with shallow neural network models, the complexity of multiple access for these intelligent terminals is increasing due to the dynamic network environment and ubiquitous connectivity in 6G systems. Traditional multiple access (MA) design and optimiz… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  15. arXiv:2406.12646  [pdf, other

    eess.IV cs.AI cs.CV

    An Empirical Study on the Fairness of Foundation Models for Multi-Organ Image Segmentation

    Authors: Qin Li, Yizhe Zhang, Yan Li, Jun Lyu, Meng Liu, Longyu Sun, Mengting Sun, Qirong Li, Wenyue Mao, Xinran Wu, Ya**g Zhang, Yinghua Chu, Shuo Wang, Chengyan Wang

    Abstract: The segmentation foundation model, e.g., Segment Anything Model (SAM), has attracted increasing interest in the medical image community. Early pioneering studies primarily concentrated on assessing and improving SAM's performance from the perspectives of overall accuracy and efficiency, yet little attention was given to the fairness considerations. This oversight raises questions about the potenti… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted to MICCAI-2024

  16. arXiv:2406.12456  [pdf, other

    eess.IV cs.CV

    Deep-learning-based groupwise registration for motion correction of cardiac $T_1$ map**

    Authors: Yi Zhang, Yidong Zhao, Lu Huang, Liming Xia, Qian Tao

    Abstract: Quantitative $T_1$ map** by MRI is an increasingly important tool for clinical assessment of cardiovascular diseases. The cardiac $T_1$ map is derived by fitting a known signal model to a series of baseline images, while the quality of this map can be deteriorated by involuntary respiratory and cardiac motion. To correct motion, a template image is often needed to register all baseline images, b… ▽ More

    Submitted 21 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024. Contents may slightly differ from the camera-ready version

  17. arXiv:2406.12426  [pdf, other

    cs.IT eess.SP

    Multi-Active-IRS-Assisted Cooperative Sensing: Cramér-Rao Bound and Joint Beamforming Design

    Authors: Yuan Fang, Xianghao Yu, Jie Xu, Ying-Jun Angela Zhang

    Abstract: This paper studies the multi-intelligent reflecting surface (IRS)-assisted cooperative sensing, in which multiple active IRSs are deployed in a distributed manner to facilitate multi-view target sensing at the non-line-of-sight (NLoS) area of the base station (BS). Different from prior works employing passive IRSs, we leverage active IRSs with the capability of amplifying the reflected signals to… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2404.13536

  18. arXiv:2406.12186  [pdf, ps, other

    eess.IV cs.CV

    Unlocking the Potential of Early Epochs: Uncertainty-aware CT Metal Artifact Reduction

    Authors: Xinquan Yang, Guanqun Zhou, Wei Sun, Youjian Zhang, Zhongya Wang, Jiahui He, Zhicheng Zhang

    Abstract: In computed tomography (CT), the presence of metallic implants in patients often leads to disruptive artifacts in the reconstructed images, hindering accurate diagnosis. Recently, a large amount of supervised deep learning-based approaches have been proposed for metal artifact reduction (MAR). However, these methods neglect the influence of initial training weights. In this paper, we have discover… ▽ More

    Submitted 20 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  19. arXiv:2406.11446  [pdf, other

    eess.SP

    Approximate Angular Domain Expression for Near-Field XL-MIMO Channel

    Authors: Hongbo Xing, Yuxiang Zhang, Jianhua Zhang, Huixin Xu, Guangyi Liu, Qixing Wang

    Abstract: As Extremely Large-Scale Multiple-Input-Multiple-Output (XL-MIMO) technology advances and frequency band rises, the near-field effects in communication are intensifying. A concise and accurate near-field XL-MIMO channel model serves as the cornerstone for investigating the near-field effects. However, existing angular domain XL-MIMO channel models under near-field conditions require non-closed-for… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  20. arXiv:2406.10434  [pdf, other

    eess.SY

    Risk-Aware Value-Oriented Net Demand Forecasting for Virtual Power Plants

    Authors: Yufan Zhang, Jiajun Han, Yuanyuan Shi

    Abstract: This paper develops a risk-aware net demand forecasting product for virtual power plants, which helps reduce the risk of high operation costs. At the training phase, a bilevel program for parameter estimation is formulated, where the upper level optimizes over the forecast model parameter to minimize the conditional value-at-risk (a risk metric) of operation costs. The lower level solves the opera… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Submitted to The 56th North American Power Symposium (NAPS 2024)

  21. arXiv:2406.10361  [pdf, other

    eess.IV

    On Efficient Neural Network Architectures for Image Compression

    Authors: Yichi Zhang, Zhihao Duan, Fengqing Zhu

    Abstract: Recent advances in learning-based image compression typically come at the cost of high complexity. Designing computationally efficient architectures remains an open challenge. In this paper, we empirically investigate the impact of different network designs in terms of rate-distortion performance and computational complexity. Our experiments involve testing various transforms, including convolutio… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 2024 IEEE International Conference on Image Processing (ICIP2024)

  22. arXiv:2406.10284  [pdf, other

    cs.CL cs.SD eess.AS

    Improving child speech recognition with augmented child-like speech

    Authors: Yuanyuan Zhang, Zhengjun Yue, Tanvina Patel, Odette Scharenborg

    Abstract: State-of-the-art ASRs show suboptimal performance for child speech. The scarcity of child speech limits the development of child speech recognition (CSR). Therefore, we studied child-to-child voice conversion (VC) from existing child speakers in the dataset and additional (new) child speakers via monolingual and cross-lingual (Dutch-to-German) VC, respectively. The results showed that cross-lingua… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 5 pages, 1 figure Accepted to INTERSPEECH 2024

  23. arXiv:2406.09989  [pdf, other

    q-bio.NC eess.SY

    Suppressing seizure via optimal electrical stimulation to the hub of epileptic brain network

    Authors: Zhichao Liang, Guanyi Zhao, Yinuo Zhang, Weiting Sun, **gzhe Lin, Jialin Wang, Quanying Liu

    Abstract: The electrical stimulation to the seizure onset zone (SOZ) serves as an efficient approach to seizure suppression. Recently, seizure dynamics have gained widespread attendance in its network propagation mechanisms. Compared with the direct stimulation to SOZ, other brain network-level approaches that can effectively suppress epileptic seizures remain under-explored. In this study, we introduce a p… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  24. arXiv:2406.09167  [pdf, other

    cs.SD eess.AS

    Vision Transformer Segmentation for Visual Bird Sound Denoising

    Authors: Sahil Kumar, Jialu Li, Youshan Zhang

    Abstract: Audio denoising, especially in the context of bird sounds, remains a challenging task due to persistent residual noise. Traditional and deep learning methods often struggle with artificial or low-frequency noise. In this work, we propose ViTVS, a novel approach that leverages the power of the vision transformer (ViT) architecture. ViTVS adeptly combines segmentation techniques to disentangle clean… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: INTERSPEECH 2024

  25. arXiv:2406.09161  [pdf, other

    cs.SD eess.AS

    Complex Image-Generative Diffusion Transformer for Audio Denoising

    Authors: Junhui Li, Pu Wang, Jialu Li, Youshan Zhang

    Abstract: The audio denoising technique has captured widespread attention in the deep neural network field. Recently, the audio denoising problem has been converted into an image generation task, and deep learning-based approaches have been applied to tackle this problem. However, its performance is still limited, leaving room for further improvement. In order to enhance audio denoising performance, this pa… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: INTERSPEECH 2024

  26. arXiv:2406.09154  [pdf, other

    cs.SD cs.CL eess.AS

    Diffusion Gaussian Mixture Audio Denoise

    Authors: Pu Wang, Junhui Li, Jialu Li, Liangdong Guo, Youshan Zhang

    Abstract: Recent diffusion models have achieved promising performances in audio-denoising tasks. The unique property of the reverse process could recover clean signals. However, the distribution of real-world noises does not comply with a single Gaussian distribution and is even unknown. The sampling of Gaussian noise conditions limits its application scenarios. To overcome these challenges, we propose a Di… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: INTERSPEECH 2024

  27. arXiv:2406.08928  [pdf, other

    cs.CV eess.IV

    Multiple Prior Representation Learning for Self-Supervised Monocular Depth Estimation via Hybrid Transformer

    Authors: Guodong Sun, Junjie Liu, Mingxuan Liu, Moyun Liu, Yang Zhang

    Abstract: Self-supervised monocular depth estimation aims to infer depth information without relying on labeled data. However, the lack of labeled information poses a significant challenge to the model's representation, limiting its ability to capture the intricate details of the scene accurately. Prior information can potentially mitigate this issue, enhancing the model's understanding of scene structure a… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 28 pages, 12 figures

  28. arXiv:2406.08806  [pdf, ps, other

    eess.SY

    Adaptive Cooperative Streaming of Holographic Video Over Wireless Networks: A Proximal Policy Optimization Solution

    Authors: Wanli Wen, Ji** Yan, Yulu Zhang, Zhen Huang, Liang Liang, Yunjian Jia

    Abstract: Adapting holographic video streaming to fluctuating wireless channels is essential to maintain consistent and satisfactory Quality of Experience (QoE) for users, which, however, is a challenging task due to the dynamic and uncertain characteristics of wireless networks. To address this issue, we propose a holographic video cooperative streaming framework designed for a generic wireless network in… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted for publication in IEEE Wireless Communications Letters

  29. arXiv:2406.08380  [pdf, other

    cs.CL cs.SD eess.AS

    Towards Unsupervised Speech Recognition Without Pronunciation Models

    Authors: Junrui Ni, Liming Wang, Yang Zhang, Kaizhi Qian, Heting Gao, Mark Hasegawa-Johnson, Chang D. Yoo

    Abstract: Recent advancements in supervised automatic speech recognition (ASR) have achieved remarkable performance, largely due to the growing availability of large transcribed speech corpora. However, most languages lack sufficient paired speech and text data to effectively train these systems. In this article, we tackle the challenge of develo** ASR systems without paired speech and text corpora by pro… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  30. arXiv:2406.08081  [pdf

    eess.SP

    CLDTA: Contrastive Learning based on Diagonal Transformer Autoencoder for Cross-Dataset EEG Emotion Recognition

    Authors: Yuan Liao, Yuhong Zhang, Shenghuan Wang, Xiruo Zhang, Yiling Zhang, Wei Chen, Yuzhe Gu, Liya Huang

    Abstract: Recent advances in non-invasive EEG technology have broadened its application in emotion recognition, yielding a multitude of related datasets. Yet, deep learning models struggle to generalize across these datasets due to variations in acquisition equipment and emotional stimulus materials. To address the pressing need for a universal model that fluidly accommodates diverse EEG dataset formats and… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  31. arXiv:2406.07773  [pdf

    eess.IV

    Development of Focused X-ray Luminescence Compute Tomography Imaging

    Authors: Yile Fang, Yibing Zhang, Changqing Li

    Abstract: X-ray luminescence is produced when contrast agents absorb energy from X-ray photons and release a portion of that energy by emitting photons in the visible and near-infrared range. X-ray luminescence computed tomography (XLCT) was introduced in the past decade as a hybrid molecular imaging modality combining the merits of both X-ray imaging (high spatial resolution) and optical imaging (high sens… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  32. arXiv:2406.07135  [pdf, other

    cs.IT eess.SP

    Smart Wireless Environment Enhanced Telecommunications: A Network Stabilisation Paradigm for Mobile Operators

    Authors: Yangyishi Zhang, Khethiwe Mhlope, Aaron Walker, Fraser Burton

    Abstract: Due to the uncontrolled and complex real-life radio propagation environments, Claude Shannon's information theory of communications describes fundamental limits to state-of-the-art 5G radio access network (RAN) capacity, with respect to fixed radio resource usage. Fortunately, recent research has found that a holographic metasurface-based new physical layer architecture may hold the key to overcom… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  33. arXiv:2406.06649  [pdf, other

    eess.IV cs.AI cs.CV cs.LG

    2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution

    Authors: Kai Liu, Haotong Qin, Yong Guo, Xin Yuan, Linghe Kong, Guihai Chen, Yulun Zhang

    Abstract: Low-bit quantization has become widespread for compressing image super-resolution (SR) models for edge deployment, which allows advanced SR models to enjoy compact low-bit parameters and efficient integer/bitwise constructions for storage compression and inference acceleration, respectively. However, it is notorious that low-bit quantization degrades the accuracy of SR models compared to their ful… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 9 pages, 6 figures. The code and models will be available at https://github.com/Kai-Liu001/2DQuant

  34. arXiv:2406.06295  [pdf, other

    cs.SD eess.AS

    Zero-Shot Audio Captioning Using Soft and Hard Prompts

    Authors: Yiming Zhang, Xuenan Xu, Ruoyi Du, Haohe Liu, Yuan Dong, Zheng-Hua Tan, Wenwu Wang, Zhanyu Ma

    Abstract: In traditional audio captioning methods, a model is usually trained in a fully supervised manner using a human-annotated dataset containing audio-text pairs and then evaluated on the test sets from the same dataset. Such methods have two limitations. First, these methods are often data-hungry and require time-consuming and expensive human annotations to obtain audio-text pairs. Second, these model… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing

  35. arXiv:2406.05672  [pdf, other

    eess.AS

    Text-aware and Context-aware Expressive Audiobook Speech Synthesis

    Authors: Dake Guo, Xinfa Zhu, Liumeng Xue, Yongmao Zhang, Wenjie Tian, Lei Xie

    Abstract: Recent advances in text-to-speech have significantly improved the expressiveness of synthetic speech. However, a major challenge remains in generating speech that captures the diverse styles exhibited by professional narrators in audiobooks without relying on manually labeled data or reference speech. To address this problem, we propose a text-aware and context-aware(TACA) style modeling approach… ▽ More

    Submitted 12 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH2024

  36. arXiv:2406.04840  [pdf, other

    cs.SD eess.AS

    TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking

    Authors: Junzuo Zhou, Jiangyan Yi, Tao Wang, Jianhua Tao, Ye Bai, Chu Yuan Zhang, Yong Ren, Zhengqi Wen

    Abstract: Various threats posed by the progress in text-to-speech (TTS) have prompted the need to reliably trace synthesized speech. However, contemporary approaches to this task involve adding watermarks to the audio separately after generation, a process that hurts both speech quality and watermark imperceptibility. In addition, these approaches are limited in robustness and flexibility. To address these… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: acceped by interspeech 2024

  37. arXiv:2406.04762  [pdf, other

    eess.SP

    Holographic Intelligence Surface Assisted Integrated Sensing and Communication

    Authors: Zhuoyang Liu, Yuchen Zhang, Haiyang Zhang, Feng Xu, Yonina C. Eldar

    Abstract: Traditional discrete-array-based systems fail to exploit interactions between closely spaced antennas, resulting in inadequate utilization of the aperture resource. In this paper, we propose a holographic intelligence surface (HIS) assisted integrated sensing and communication (HISAC) system, wherein both the transmitter and receiver are fabricated using a continuous-aperture array. A continuous-d… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  38. arXiv:2406.03961  [pdf, ps, other

    eess.IV cs.CV

    LDM-RSIC: Exploring Distortion Prior with Latent Diffusion Models for Remote Sensing Image Compression

    Authors: Junhui Li, Jutao Li, Xingsong Hou, Huake Wang, Yutao Zhang, Yujie Dun, Wenke Sun

    Abstract: Deep learning-based image compression algorithms typically focus on designing encoding and decoding networks and improving the accuracy of entropy model estimation to enhance the rate-distortion (RD) performance. However, few algorithms leverage the compression distortion prior from existing compression algorithms to improve RD performance. In this paper, we propose a latent diffusion model-based… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  39. arXiv:2406.03144  [pdf, other

    eess.SP cs.LG

    A Combination Model for Time Series Prediction using LSTM via Extracting Dynamic Features Based on Spatial Smoothing and Sequential General Variational Mode Decomposition

    Authors: Jianyu Liu, Wei Chen, Yong Zhang, Zhenfeng Chen, Bin Wan, **wei Hu

    Abstract: In order to solve the problems such as difficult to extract effective features and low accuracy of sales volume prediction caused by complex relationships such as market sales volume in time series prediction, we proposed a time series prediction method of market sales volume based on Sequential General VMD and spatial smoothing Long short-term memory neural network (SS-LSTM) combination model. Fi… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  40. arXiv:2406.02975  [pdf, other

    eess.SP

    A Shared-Aperture Dual-Band sub-6 GHz and mmWave Reconfigurable Intelligent Surface With Independent Operation

    Authors: Junhui Rao, Yujie Zhang, Shiwen Tang, Zan Li, Zhaoyang Ming, Jichen Zhang, Chi Yuk Chiu, Ross Murch

    Abstract: A novel dual-band reconfigurable intelligent surface (DBI-RIS) design that combines the functionalities of millimeter-wave (mmWave) and sub-6 GHz bands within a single aperture is proposed. This design aims to bridge the gap between current single-band reconfigurable intelligent surfaces (RISs) and wireless systems utilizing sub-6 GHz and mmWave bands that require RIS with independently reconfigur… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  41. arXiv:2406.02914  [pdf

    cs.CV eess.IV

    A Self-Supervised Denoising Strategy for Underwater Acoustic Camera Imageries

    Authors: Xiaoteng Zhou, Katsunori Mizuno, Yilong Zhang

    Abstract: In low-visibility marine environments characterized by turbidity and darkness, acoustic cameras serve as visual sensors capable of generating high-resolution 2D sonar images. However, acoustic camera images are interfered with by complex noise and are difficult to be directly ingested by downstream visual algorithms. This paper introduces a novel strategy for denoising acoustic camera images using… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 8 pages

  42. arXiv:2406.02438  [pdf, other

    eess.AS cs.MM cs.SD

    CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection

    Authors: Yongyi Zang, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu, Wenxiao Zhao, **g Guo, Tomoki Toda, Zhiyao Duan

    Abstract: Recent singing voice synthesis and conversion advancements necessitate robust singing voice deepfake detection (SVDD) models. Current SVDD datasets face challenges due to limited controllability, diversity in deepfake methods, and licensing restrictions. Addressing these gaps, we introduce CtrSVDD, a large-scale, diverse collection of bonafide and deepfake singing vocals. These vocals are synthesi… ▽ More

    Submitted 18 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  43. arXiv:2406.02430  [pdf, other

    eess.AS cs.SD

    Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

    Authors: Philip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying Huang, Yuanyuan Huo, Dongya Jia, Chumin Li, Feiya Li, Hui Li, Jiaxin Li, Xiaoyang Li, Xingxing Li, Lin Liu, Shouda Liu, Sichao Liu , et al. (21 additional authors not shown)

    Abstract: We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and sub… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  44. arXiv:2406.02247  [pdf, other

    physics.ins-det eess.SY

    A Study of the Latest Updates of the Readout System for the Hybird-Pixel Detector at HEPS

    Authors: Hangxu Li, Jie Zhang, Wei Wei, Zhenjie Li, Xiaolu Ji, Yan Zhang, Xuanzheng Yang, Shuihan Zhang, Xueke Ma, Peng Liu, Zheng Wang, Yuanbai Chen

    Abstract: The High Energy Photon Source (HEPS) represents a fourth-generation light source. This facility has made unprecedented advancements in accelerator technology, necessitating the development of new detectors to satisfy physical requirements such as single-photon resolution, large dynamic range, and high frame rates. Since 2016, the Institute of High Energy Physics has introduced the first user-exper… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  45. arXiv:2406.01850  [pdf, other

    eess.SP

    Cell-free massive MIMO Channels in an Urban Environment -- Measurements and Channel Statistics

    Authors: Yuning Zhang, Thomas Choi, Zihang Cheng, Jorge Gomez-Ponce, Issei Kanno, Masaaki Ito, Andreas F. Molisch

    Abstract: Cell-free massive MIMO (CF-mMIMO), where each user equipment (UE) is connected to multiple access points (APs), is emerging as an important component for 5G and 6G cellular systems. Accurate channel models based on measurements are required to optimize their design and deployment. This paper presents an extensive measurement campaign for CF-mMIMO in an urban environment. A new "virtual AP" techniq… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: submitted to IEEE TWC

  46. arXiv:2406.01193  [pdf, other

    eess.SY

    Programmable Multi-input Buck-Boost Converter for Photovoltaics Arrays

    Authors: Zhongting Tang, Yi Zhang, Pooya Davari

    Abstract: This paper proposes a programmable multi-input buck-boost structure method, which can enhance the operation tolerance for the PV array under extremely harsh climatic conditions. The proposed structure based on a traditional two switches buck-boost converter can connect PV panels in parallel and cascade flexibly, and also enable the individual operation of each PV panel. The active switches can be… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: This work has been accepted by the conference proceedings of IPEMC 2024-ECCE Asia

  47. arXiv:2406.00956  [pdf, other

    cs.CV cs.LG eess.IV

    Improving Segment Anything on the Fly: Auxiliary Online Learning and Adaptive Fusion for Medical Image Segmentation

    Authors: Tianyu Huang, Tao Zhou, Weidi Xie, Shuo Wang, Qi Dou, Yizhe Zhang

    Abstract: The current variants of the Segment Anything Model (SAM), which include the original SAM and Medical SAM, still lack the capability to produce sufficiently accurate segmentation for medical images. In medical imaging contexts, it is not uncommon for human experts to rectify segmentations of specific test samples after SAM generates its segmentation predictions. These rectifications typically entai… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Project Link: https://sam-auxol.github.io/AuxOL/

  48. arXiv:2406.00690  [pdf, other

    eess.SP

    Electromagnetic Wave Property Inspired Radio Environment Knowledge Construction and AI-based Verification for 6G Digital Twin Channel

    Authors: Jialin Wang, Jianhua Zhang, Yutong Sun, Yuxiang Zhang, Tao Jiang, Liang Xia

    Abstract: As the underlying foundation of a digital twin network (DTN), a digital twin channel (DTC) can accurately depict the process of radio propagation in the air interface to support the DTN-based 6G wireless network. Since radio propagation is affected by the environment, constructing the relationship between the environment and radio wave propagation is the key to improving the accuracy of DTC, and t… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

  49. arXiv:2406.00449  [pdf, other

    eess.IV cs.CV

    Dual Hyperspectral Mamba for Efficient Spectral Compressive Imaging

    Authors: Jiahua Dong, Hui Yin, Hongliu Li, Wenbo Li, Yulun Zhang, Salman Khan, Fahad Shahbaz Khan

    Abstract: Deep unfolding methods have made impressive progress in restoring 3D hyperspectral images (HSIs) from 2D measurements through convolution neural networks or Transformers in spectral compressive imaging. However, they cannot efficiently capture long-range dependencies using global receptive fields, which significantly limits their performance in HSI reconstruction. Moreover, these methods may suffe… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 13 pages, 6 figures

  50. arXiv:2406.00329  [pdf, other

    eess.IV cs.CV cs.LG

    Whole Heart 3D+T Representation Learning Through Sparse 2D Cardiac MR Images

    Authors: Yundi Zhang, Chen Chen, Suprosanna Shit, Sophie Starck, Daniel Rueckert, Jiazhen Pan

    Abstract: Cardiac Magnetic Resonance (CMR) imaging serves as the gold-standard for evaluating cardiac morphology and function. Typically, a multi-view CMR stack, covering short-axis (SA) and 2/3/4-chamber long-axis (LA) views, is acquired for a thorough cardiac assessment. However, efficiently streamlining the complex, high-dimensional 3D+T CMR data and distilling compact, coherent representation remains a… ▽ More

    Submitted 6 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.