Skip to main content

Showing 1–50 of 249 results for author: Gao, Y

Searching in archive eess. Search in all archives.
.
  1. arXiv:2407.00949  [pdf, ps, other

    cs.CV eess.IV

    SpectralKAN: Kolmogorov-Arnold Network for Hyperspectral Images Change Detection

    Authors: Yanheng Wang, Xiaohan Yu, Yongsheng Gao, Jianjun Sha, Jian Wang, Lianru Gao, Yonggang Zhang, Xianhui Rong

    Abstract: It has been verified that deep learning methods, including convolutional neural networks (CNNs), graph neural networks (GNNs), and transformers, can accurately extract features from hyperspectral images (HSIs). These algorithms perform exceptionally well on HSIs change detection (HSIs-CD). However, the downside of these impressive results is the enormous number of parameters, FLOPs, GPU memory, tr… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.18067  [pdf, other

    cs.CL eess.AS

    Exploring Energy-Based Models for Out-of-Distribution Detection in Dialect Identification

    Authors: Yaqian Hao, Chenguang Hu, Yingying Gao, Shilei Zhang, Junlan Feng

    Abstract: The diverse nature of dialects presents challenges for models trained on specific linguistic patterns, rendering them susceptible to errors when confronted with unseen or out-of-distribution (OOD) data. This study introduces a novel margin-enhanced joint energy model (MEJEM) tailored specifically for OOD detection in dialects. By integrating a generative model and the energy margin loss, our appro… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2406.18065  [pdf, other

    eess.AS cs.SD

    On Calibration of Speech Classification Models: Insights from Energy-Based Model Investigations

    Authors: Yaqian Hao, Chenguang Hu, Yingying Gao, Shilei Zhang, Junlan Feng

    Abstract: For speech classification tasks, deep learning models often achieve high accuracy but exhibit shortcomings in calibration, manifesting as classifiers exhibiting overconfidence. The significance of calibration lies in its critical role in guaranteeing the reliability of decision-making within deep learning systems. This study explores the effectiveness of Energy-Based Models in calibrating confiden… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  4. arXiv:2406.14264  [pdf, other

    eess.IV cs.CV

    Zero-Shot Image Denoising for High-Resolution Electron Microscopy

    Authors: Xuanyu Tian, Zhuoya Dong, Xiyue Lin, Yue Gao, Hongjiang Wei, Yanhang Ma, **gyi Yu, Yuyao Zhang

    Abstract: High-resolution electron microscopy (HREM) imaging technique is a powerful tool for directly visualizing a broad range of materials in real-space. However, it faces challenges in denoising due to ultra-low signal-to-noise ratio (SNR) and scarce data availability. In this work, we propose Noise2SR, a zero-shot self-supervised learning (ZS-SSL) denoising framework for HREM. Within our framework, we… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 12 pages, 12 figures

  5. arXiv:2406.13268  [pdf, other

    eess.AS cs.SD

    CEC: A Noisy Label Detection Method for Speaker Recognition

    Authors: Yao Shen, Yingying Gao, Yaqian Hao, Chenguang Hu, Fulin Zhang, Junlan Feng, Shilei Zhang

    Abstract: Noisy labels are inevitable, even in well-annotated datasets. The detection of noisy labels is of significant importance to enhance the robustness of speaker recognition models. In this paper, we propose a novel noisy label detection approach based on two new statistical metrics: Continuous Inconsistent Counting (CIC) and Total Inconsistent Counting (TIC). These metrics are calculated through Cros… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: interspeech 2024

  6. arXiv:2406.13145  [pdf, other

    eess.SY cs.LG

    Constructing and Evaluating Digital Twins: An Intelligent Framework for DT Development

    Authors: Longfei Ma, Nan Cheng, Xiucheng Wang, Jiong Chen, Yinjun Gao, Dongxiao Zhang, Jun-Jie Zhang

    Abstract: The development of Digital Twins (DTs) represents a transformative advance for simulating and optimizing complex systems in a controlled digital space. Despite their potential, the challenge of constructing DTs that accurately replicate and predict the dynamics of real-world systems remains substantial. This paper introduces an intelligent framework for the construction and evaluation of DTs, spec… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  7. arXiv:2406.12300  [pdf

    eess.IV cs.CV q-bio.NC

    IR2QSM: Quantitative Susceptibility Map** via Deep Neural Networks with Iterative Reverse Concatenations and Recurrent Modules

    Authors: Min Li, Chen Chen, Zhuang Xiong, Ying Liu, Pengfei Rong, Shanshan Shan, Feng Liu, Hongfu Sun, Yang Gao

    Abstract: Quantitative susceptibility map** (QSM) is an MRI phase-based post-processing technique to extract the distribution of tissue susceptibilities, demonstrating significant potential in studying neurological diseases. However, the ill-conditioned nature of dipole inversion makes QSM reconstruction from the tissue field prone to noise and artifacts. In this work, we propose a novel deep learning-bas… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 10 pages, 9 figures

  8. arXiv:2406.10856  [pdf, other

    cs.NI eess.SY

    LEO Satellite Networks Assisted Geo-distributed Data Processing

    Authors: Zhiyuan Zhao, Zhe Chen, Zheng Lin, Wenjun Zhu, Kun Qiu, Chaoqun You, Yue Gao

    Abstract: Nowadays, the increasing deployment of edge clouds globally provides users with low-latency services. However, connecting an edge cloud to a core cloud via optic cables in terrestrial networks poses significant barriers due to the prohibitively expensive building cost of optic cables. Fortunately, emerging Low Earth Orbit (LEO) satellite networks (e.g., Starlink) offer a more cost-effective soluti… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 6 pages, 5 figures

  9. arXiv:2406.09931  [pdf, other

    eess.IV cs.CV cs.LG

    SCKansformer: Fine-Grained Classification of Bone Marrow Cells via Kansformer Backbone and Hierarchical Attention Mechanisms

    Authors: Yifei Chen, Zhu Zhu, Shenghao Zhu, Linwei Qiu, Binfeng Zou, Fan Jia, Yunpeng Zhu, Chenyan Zhang, Zhaojie Fang, Feiwei Qin, ** Fan, Changmiao Wang, Yu Gao, Gang Yu

    Abstract: The incidence and mortality rates of malignant tumors, such as acute leukemia, have risen significantly. Clinically, hospitals rely on cytological examination of peripheral blood and bone marrow smears to diagnose malignant tumors, with accurate blood cell counting being crucial. Existing automated methods face challenges such as low feature expression capability, poor interpretability, and redund… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 15 pages, 6 figures

  10. arXiv:2406.09444  [pdf, other

    eess.AS cs.CL cs.SD

    GenDistiller: Distilling Pre-trained Language Models based on an Autoregressive Generative Model

    Authors: Yingying Gao, Shilei Zhang, Chao Deng, Junlan Feng

    Abstract: Pre-trained speech language models such as HuBERT and WavLM leverage unlabeled speech data for self-supervised learning and offer powerful representations for numerous downstream tasks. Despite the success of these models, their high requirements for memory and computing resource hinder their application on resource restricted devices. Therefore, this paper introduces GenDistiller, a novel knowled… ▽ More

    Submitted 21 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2310.13418

  11. arXiv:2406.07801  [pdf, other

    cs.CL cs.SD eess.AS

    PolySpeech: Exploring Unified Multitask Speech Models for Competitiveness with Single-task Models

    Authors: Runyan Yang, Huibao Yang, Xiqing Zhang, Tiantian Ye, Ying Liu, Yingying Gao, Shilei Zhang, Chao Deng, Junlan Feng

    Abstract: Recently, there have been attempts to integrate various speech processing tasks into a unified model. However, few previous works directly demonstrated that joint optimization of diverse tasks in multitask speech models has positive influence on the performance of individual tasks. In this paper we present a multitask speech model -- PolySpeech, which supports speech recognition, speech synthesis,… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures

  12. arXiv:2406.05692  [pdf, other

    cs.SD cs.AI eess.AS

    SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion

    Authors: Bingsong Bai, Feng** Wang, Yingming Gao, Ya Li

    Abstract: Diffusion-based singing voice conversion (SVC) models have shown better synthesis quality compared to traditional methods. However, in cross-domain SVC scenarios, where there is a significant disparity in pitch between the source and target voice domains, the models tend to generate audios with hoarseness, posing challenges in achieving high-quality vocal outputs. Therefore, in this paper, we prop… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  13. arXiv:2406.04737  [pdf, other

    eess.SP eess.SY

    Fast-Fading Channel and Power Optimization of the Magnetic Inductive Cellular Network

    Authors: Honglei Ma, Erwu Liu, Zhijun Fang, Rui Wang, Yongbin Gao, Wenjun Yu, Dongming Zhang

    Abstract: The cellular network of magnetic Induction (MI) communication holds promise in long-distance underground environments. In the traditional MI communication, there is no fast-fading channel since the MI channel is treated as a quasi-static channel. However, for the vehicle (mobile) MI (VMI) communication, the unpredictable antenna vibration brings the remarkable fast-fading. As such fast-fading cann… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE TWC for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  14. arXiv:2406.04149  [pdf

    eess.IV cs.AI

    Characterizing segregation in blast rock piles a deep-learning approach leveraging aerial image analysis

    Authors: Chengeng Liu, Sihong Liu, Chaomin Shen, Yupeng Gao, Yuxuan Liu

    Abstract: Blasted rock material serves a critical role in various engineering applications, yet the phenomenon of segregation-where particle sizes vary significantly along the gradient of a quarry pile-presents challenges for optimizing quarry material storage and handling. This study introduces an advanced image analysis methodology to characterize such segregation of rock fragments. The accurate delineati… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  15. arXiv:2406.03714  [pdf, other

    cs.SD eess.AS

    Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining

    Authors: **long Xue, Yayue Deng, Yingming Gao, Ya Li

    Abstract: Recent prompt-based text-to-speech (TTS) models can clone an unseen speaker using only a short speech prompt. They leverage a strong in-context ability to mimic the speech prompts, including speaker style, prosody, and emotion. Therefore, the selection of a speech prompt greatly influences the generated speech, akin to the importance of a prompt in large language models (LLMs). However, current pr… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  16. arXiv:2406.03706  [pdf, other

    cs.SD cs.CL eess.AS

    Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model

    Authors: **long Xue, Yayue Deng, Yicheng Han, Yingming Gao, Ya Li

    Abstract: Recent advances in large language models (LLMs) and development of audio codecs greatly propel the zero-shot TTS. They can synthesize personalized speech with only a 3-second speech of an unseen speaker as acoustic prompt. However, they only support short speech prompts and cannot leverage longer context information, as required in audiobook and conversational TTS scenarios. In this paper, we intr… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  17. arXiv:2406.00540  [pdf, ps, other

    eess.SY

    Optimal Transmission Power Scheduling for Networked Control System under DoS Attack

    Authors: Siyi Wang, Yulong Gao, Sandra Hirche

    Abstract: Designing networked control systems that are reliable and resilient against adversarial threats, is essential for ensuring the security of cyber-physical systems. This paper addresses the communication-control co-design problem for networked control systems under denial-of-service (DoS) attacks. In the wireless channel, a transmission power scheduler periodically determines the power level for sen… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  18. arXiv:2405.15705  [pdf, other

    cs.AR eess.SY

    Sums: Sniffing Unknown Multiband Signals under Low Sampling Rates

    Authors: **bo Peng, Zhe Chen, Zheng Lin, Haoxuan Yuan, Zihan Fang, Lingzhong Bao, Zihang Song, Ying Li, **g Ren, Yue Gao

    Abstract: Due to sophisticated deployments of all kinds of wireless networks (e.g., 5G, Wi-Fi, Bluetooth, LEO satellite, etc.), multiband signals distribute in a large bandwidth (e.g., from 70 MHz to 8 GHz). Consequently, for network monitoring and spectrum sharing applications, a sniffer for extracting physical layer information, such as structure of packet, with low sampling rate (especially, sub-Nyquist… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 12 pages, 9 figures

  19. arXiv:2405.15542  [pdf, other

    cs.NI cs.DC cs.LG eess.SP

    SATSense: Multi-Satellite Collaborative Framework for Spectrum Sensing

    Authors: Haoxuan Yuan, Zhe Chen, Zheng Lin, **bo Peng, Zihan Fang, Yuhang Zhong, Zihang Song, Yue Gao

    Abstract: Low Earth Orbit satellite Internet has recently been deployed, providing worldwide service with non-terrestrial networks. With the large-scale deployment of both non-terrestrial and terrestrial networks, limited spectrum resources will not be allocated enough. Consequently, dynamic spectrum sharing is crucial for their coexistence in the same spectrum, where accurate spectrum sensing is essential.… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 13 pages, 16 figures

  20. arXiv:2405.12652  [pdf, other

    cs.NI eess.SP

    Edge Information Hub-Empowered 6G NTN: Latency-Oriented Resource Orchestration and Configuration

    Authors: Yueshan Lin, Wei Feng, Yunfei Chen, Ning Ge, Zhiyong Feng, Yue Gao

    Abstract: Quick response to disasters is crucial for saving lives and reducing loss. This requires low-latency uploading of situation information to the remote command center. Since terrestrial infrastructures are often damaged in disaster areas, non-terrestrial networks (NTNs) are preferable to provide network coverage, and mobile edge computing (MEC) could be integrated to improve the latency performance.… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

  21. arXiv:2405.03956  [pdf, other

    cs.SD eess.AS

    Adaptive Speech Emotion Representation Learning Based On Dynamic Graph

    Authors: Yingxue Gao, Huan Zhao, Zixing Zhang

    Abstract: Graph representation learning has become a hot research topic due to its powerful nonlinear fitting capability in extracting representative node embeddings. However, for sequential data such as speech signals, most traditional methods merely focus on the static graph created within a sequence, and largely overlook the intrinsic evolving patterns of these data. This may reduce the efficiency of gra… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

    Journal ref: published at ICASSP 2024

  22. arXiv:2404.19201  [pdf, other

    eess.IV cs.CV cs.RO physics.optics

    Global Search Optics: Automatically Exploring Optimal Solutions to Compact Computational Imaging Systems

    Authors: Yao Gao, Qi Jiang, Shaohua Gao, Lei Sun, Kailun Yang, Kaiwei Wang

    Abstract: The popularity of mobile vision creates a demand for advanced compact computational imaging systems, which call for the development of both a lightweight optical system and an effective image reconstruction model. Recently, joint design pipelines come to the research forefront, where the two significant components are simultaneously optimized via data-driven learning to realize the optimal system… ▽ More

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: The source code will be made publicly available at https://github.com/wumengshenyou/GSO

  23. arXiv:2404.11881  [pdf, other

    cs.IT eess.SP

    Joint Transmitter and Receiver Design for Movable Antenna Enhanced Multicast Communications

    Authors: Ying Gao, Qingqing Wu, Wen Chen

    Abstract: Movable antenna (MA) is an emerging technology that utilizes localized antenna movement to pursue better channel conditions for enhancing communication performance. In this paper, we study the MA-enhanced multicast transmission from a base station equipped with multiple MAs to multiple groups of single-MA users. Our goal is to maximize the minimum weighted signal-to-interference-plus-noise ratio (… ▽ More

    Submitted 9 May, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: 13 pages, 9 figures, submitted to IEEE journal for possible publication

  24. arXiv:2404.09436  [pdf

    physics.med-ph eess.IV

    Image Reconstruction with B0 Inhomogeneity using an Interpretable Deep Unrolled Network on an Open-bore MRI-Linac

    Authors: Shanshan Shan, Yang Gao, David E. J. Waddington, Hongli Chen, Brendan Whelan, Paul Z. Y. Liu, Yaohui Wang, Chunyi Liu, Hong** Gan, Mingyuan Gao, Feng Liu

    Abstract: MRI-Linac systems require fast image reconstruction with high geometric fidelity to localize and track tumours for radiotherapy treatments. However, B0 field inhomogeneity distortions and slow MR acquisition potentially limit the quality of the image guidance and tumour treatments. In this study, we develop an interpretable unrolled network, referred to as RebinNet, to reconstruct distortion-free… ▽ More

    Submitted 14 April, 2024; originally announced April 2024.

  25. arXiv:2404.09140  [pdf, other

    cs.LG cs.IT eess.SP

    RF-Diffusion: Radio Signal Generation via Time-Frequency Diffusion

    Authors: Guoxuan Chi, Zheng Yang, Chenshu Wu, **gao Xu, Yuchong Gao, Yunhao Liu, Tony Xiao Han

    Abstract: Along with AIGC shines in CV and NLP, its potential in the wireless domain has also emerged in recent years. Yet, existing RF-oriented generative solutions are ill-suited for generating high-quality, time-series RF data due to limited representation capabilities. In this work, inspired by the stellar achievements of the diffusion model in CV and NLP, we adapt it to the RF domain and propose RF-Dif… ▽ More

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: Accepted by MobiCom 2024

    ACM Class: I.2.0

  26. Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition

    Authors: Siyuan Shen, Yu Gao, Feng Liu, Hanyang Wang, Aimin Zhou

    Abstract: The mainstream paradigm of speech emotion recognition (SER) is identifying the single emotion label of the entire utterance. This line of works neglect the emotion dynamics at fine temporal granularity and mostly fail to leverage linguistic information of speech signal explicitly. In this paper, we propose Emotion Neural Transducer for fine-grained speech emotion recognition with automatic speech… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted by 49th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)

  27. arXiv:2403.14070  [pdf

    eess.IV cs.CV physics.med-ph

    QSMDiff: Unsupervised 3D Diffusion Models for Quantitative Susceptibility Map**

    Authors: Zhuang Xiong, Wei Jiang, Yang Gao, Feng Liu, Hongfu Sun

    Abstract: Quantitative Susceptibility Map** (QSM) dipole inversion is an ill-posed inverse problem for quantifying magnetic susceptibility distributions from MRI tissue phases. While supervised deep learning methods have shown success in specific QSM tasks, their generalizability across different acquisition scenarios remains constrained. Recent developments in diffusion models have demonstrated potential… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

  28. arXiv:2403.11505  [pdf, other

    eess.IV cs.CV cs.LG

    COVID-19 detection from pulmonary CT scans using a novel EfficientNet with attention mechanism

    Authors: Ramy Farag, Parth Upadhyay, Yixiang Gao, Jacket Demby, Katherin Garces Montoya, Seyed Mohamad Ali Tousi, Gbenga Omotara, Guilherme DeSouza

    Abstract: Manual analysis and diagnosis of COVID-19 through the examination of Computed Tomography (CT) images of the lungs can be time-consuming and result in errors, especially given high volume of patients and numerous images per patient. So, we address the need for automation of this task by develo** a new deep learning model-based pipeline. Our motivation was sparked by the CVPR Workshop on "Domain A… ▽ More

    Submitted 27 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  29. arXiv:2403.10012  [pdf, other

    cs.CV cs.RO eess.IV physics.optics

    Real-World Computational Aberration Correction via Quantized Domain-Mixing Representation

    Authors: Qi Jiang, Zhonghua Yi, Shaohua Gao, Yao Gao, Xiaolong Qian, Hao Shi, Lei Sun, Zhijie Xu, Kailun Yang, Kaiwei Wang

    Abstract: Relying on paired synthetic data, existing learning-based Computational Aberration Correction (CAC) methods are confronted with the intricate and multifaceted synthetic-to-real domain gap, which leads to suboptimal performance in real-world applications. In this paper, in contrast to improving the simulation pipeline, we deliver a novel insight into real-world CAC from the perspective of Unsupervi… ▽ More

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Codes and datasets will be made publicly available at https://github.com/zju-jiangqi/QDMR

  30. arXiv:2403.09153  [pdf, other

    eess.SY

    Fairness-Aware Multi-Server Federated Learning Task Delegation over Wireless Networks

    Authors: Yulan Gao, Chao Ren, Han Yu

    Abstract: In the rapidly advancing field of federated learning (FL), ensuring efficient FL task delegation while incentivising FL client participation poses significant challenges, especially in wireless networks where FL participants' coverage is limited. Existing Contract Theory-based methods are designed under the assumption that there is only one FL server in the system (i.e., the monopoly market assump… ▽ More

    Submitted 14 March, 2024; originally announced March 2024.

  31. arXiv:2403.06222  [pdf, other

    cs.RO eess.SY

    Robust Predictive Motion Planning by Learning Obstacle Uncertainty

    Authors: Jian Zhou, Yulong Gao, Ola Johansson, Björn Olofsson, Erik Frisk

    Abstract: Safe motion planning for robotic systems in dynamic environments is nontrivial in the presence of uncertain obstacles, where estimation of obstacle uncertainties is crucial in predicting future motions of dynamic obstacles. The worst-case characterization gives a conservative uncertainty prediction and may result in infeasible motion planning for the ego robotic system. In this paper, an efficient… ▽ More

    Submitted 10 March, 2024; originally announced March 2024.

  32. arXiv:2403.06066  [pdf

    eess.IV cs.CV cs.LG

    CausalCellSegmenter: Causal Inference inspired Diversified Aggregation Convolution for Pathology Image Segmentation

    Authors: Dawei Fan, Yifan Gao, Jiaming Yu, Yan** Chen, Wencheng Li, Chuancong Lin, Kaibin Li, Changcai Yang, Riqing Chen, Lifang Wei

    Abstract: Deep learning models have shown promising performance for cell nucleus segmentation in the field of pathology image analysis. However, training a robust model from multiple domains remains a great challenge for cell nucleus segmentation. Additionally, the shortcomings of background noise, highly overlap** between cell nucleus, and blurred edges often lead to poor performance. To address these ch… ▽ More

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: 10 pages, 5 figures, 2 tables, MICCAI

  33. arXiv:2402.15594  [pdf, other

    cs.CL cs.SD eess.AS

    Alternating Weak Triphone/BPE Alignment Supervision from Hybrid Model Improves End-to-End ASR

    Authors: **tao Jiang, Yingbo Gao, Mohammad Zeineldeen, Zoltan Tuske

    Abstract: In this paper, alternating weak triphone/BPE alignment supervision is proposed to improve end-to-end model training. Towards this end, triphone and BPE alignments are extracted using a pre-existing hybrid ASR system. Then, regularization effect is obtained by cross-entropy based intermediate auxiliary losses computed on such alignments at a mid-layer representation of the encoder for triphone alig… ▽ More

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: 5 pages, 1 figure, 3 tables

  34. arXiv:2402.12746  [pdf, ps, other

    eess.AS cs.SD

    Plugin Speech Enhancement: A Universal Speech Enhancement Framework Inspired by Dynamic Neural Network

    Authors: Yanan Chen, Zihao Cui, Yingying Gao, Junlan Feng, Chao Deng, Shilei Zhang

    Abstract: The expectation to deploy a universal neural network for speech enhancement, with the aim of improving noise robustness across diverse speech processing tasks, faces challenges due to the existing lack of awareness within static speech enhancement frameworks regarding the expected speech in downstream modules. These limitations impede the effectiveness of static speech enhancement approaches in ac… ▽ More

    Submitted 20 February, 2024; originally announced February 2024.

  35. arXiv:2402.11211  [pdf, other

    eess.IV cs.CV

    Training-free image style alignment for self-adapting domain shift on handheld ultrasound devices

    Authors: Hongye Zeng, Ke Zou, Zhihao Chen, Yuchong Gao, Hongbo Chen, Haibin Zhang, Kang Zhou, Meng Wang, Rick Siow Mong Goh, Yong Liu, Chang Jiang, Rui Zheng, Huazhu Fu

    Abstract: Handheld ultrasound devices face usage limitations due to user inexperience and cannot benefit from supervised deep learning without extensive expert annotations. Moreover, the models trained on standard ultrasound device data are constrained by training data distribution and perform poorly when directly applied to handheld device data. In this study, we propose the Training-free Image Style Align… ▽ More

    Submitted 17 February, 2024; originally announced February 2024.

  36. arXiv:2402.09463  [pdf

    eess.IV

    Multi-Center Fetal Brain Tissue Annotation (FeTA) Challenge 2022 Results

    Authors: Kelly Payette, Céline Steger, Roxane Licandro, Priscille de Dumast, Hongwei Bran Li, Matthew Barkovich, Liu Li, Maik Dannecker, Chen Chen, Cheng Ouyang, Niccolò McConnell, Alina Miron, Yongmin Li, Alena Uus, Irina Grigorescu, Paula Ramirez Gilliland, Md Mahfuzur Rahman Siddiquee, Daguang Xu, Andriy Myronenko, Haoyu Wang, Ziyan Huang, ** Ye, Mireia Alenyà, Valentin Comte, Oscar Camara , et al. (42 additional authors not shown)

    Abstract: Segmentation is a critical step in analyzing the develo** human fetal brain. There have been vast improvements in automatic segmentation methods in the past several years, and the Fetal Brain Tissue Annotation (FeTA) Challenge 2021 helped to establish an excellent standard of fetal brain segmentation. However, FeTA 2021 was a single center study, and the generalizability of algorithms across dif… ▽ More

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: Results from FeTA Challenge 2022, held at MICCAI; Manuscript submitted. Supplementary Info (including submission methods descriptions) available here: https://zenodo.org/records/10628648

  37. arXiv:2402.02936  [pdf, other

    eess.IV cs.CV cs.LG cs.MM

    Panoramic Image Inpainting With Gated Convolution And Contextual Reconstruction Loss

    Authors: Li Yu, Yanjun Gao, Farhad Pakdaman, Moncef Gabbouj

    Abstract: Deep learning-based methods have demonstrated encouraging results in tackling the task of panoramic image inpainting. However, it is challenging for existing methods to distinguish valid pixels from invalid pixels and find suitable references for corrupted areas, thus leading to artifacts in the inpainted results. In response to these challenges, we propose a panoramic image inpainting framework t… ▽ More

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: Copyright 2024 IEEE - to appear in IEEE ICASSP 2024

    Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024

  38. arXiv:2401.15803  [pdf, other

    cs.RO cs.AI cs.CV eess.SY

    GarchingSim: An Autonomous Driving Simulator with Photorealistic Scenes and Minimalist Workflow

    Authors: Liguo Zhou, Yinglei Song, Yichao Gao, Zhou Yu, Michael Sodamin, Hongshen Liu, Liang Ma, Lian Liu, Hao Liu, Yang Liu, Haichuan Li, Guang Chen, Alois Knoll

    Abstract: Conducting real road testing for autonomous driving algorithms can be expensive and sometimes impractical, particularly for small startups and research institutes. Thus, simulation becomes an important method for evaluating these algorithms. However, the availability of free and open-source simulators is limited, and the installation and configuration process can be daunting for beginners and inte… ▽ More

    Submitted 30 January, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

  39. arXiv:2401.09336  [pdf, other

    eess.IV cs.CV

    To deform or not: treatment-aware longitudinal registration for breast DCE-MRI during neoadjuvant chemotherapy via unsupervised keypoints detection

    Authors: Luyi Han, Tao Tan, Tianyu Zhang, Yuan Gao, Xin Wang, Valentina Longo, Sofía Ventura-Díaz, Anna D'Angelo, Jonas Teuwen, Ritse Mann

    Abstract: Clinicians compare breast DCE-MRI after neoadjuvant chemotherapy (NAC) with pre-treatment scans to evaluate the response to NAC. Clinical evidence supports that accurate longitudinal deformable registration without deforming treated tumor regions is key to quantifying tumor changes. We propose a conditional pyramid registration network based on unsupervised keypoint detection and selective volume-… ▽ More

    Submitted 17 January, 2024; originally announced January 2024.

  40. arXiv:2401.07120  [pdf, other

    cs.NI eess.SP quant-ph

    Generative AI-enabled Quantum Computing Networks and Intelligent Resource Allocation

    Authors: Minrui Xu, Dusit Niyato, Jiawen Kang, Zehui Xiong, Yuan Cao, Yulan Gao, Chao Ren, Han Yu

    Abstract: Quantum computing networks enable scalable collaboration and secure information exchange among multiple classical and quantum computing nodes while executing large-scale generative AI computation tasks and advanced quantum algorithms. Quantum computing networks overcome limitations such as the number of qubits and coherence time of entangled pairs and offer advantages for generative AI infrastruct… ▽ More

    Submitted 13 January, 2024; originally announced January 2024.

  41. arXiv:2401.03800  [pdf, other

    cs.CV eess.IV

    MvKSR: Multi-view Knowledge-guided Scene Recovery for Hazy and Rainy Degradation

    Authors: Dong Yang, Wenyu Xu, Yuan Gao, Yuxu Lu, **gming Zhang, Yu Guo

    Abstract: High-quality imaging is crucial for ensuring safety supervision and intelligent deployment in fields like transportation and industry. It enables precise and detailed monitoring of operations, facilitating timely detection of potential hazards and efficient management. However, adverse weather conditions, such as atmospheric haziness and precipitation, can have a significant impact on image qualit… ▽ More

    Submitted 8 January, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

  42. arXiv:2401.02118  [pdf, other

    cs.IT eess.SP

    Radio Map-Based Spectrum Sharing for Joint Communication and Sensing

    Authors: Xionran Fang, Wei Feng, Yunfei Chen, Dingxi Yang, Ning Ge, Zhiyong Feng, Yue Gao

    Abstract: The sixth-generation (6G) network is expected to provide both communication and sensing (C&S) services. However, spectrum scarcity poses a major challenge to the harmonious coexistence of C&S systems. Without effective cooperation, the interference resulting from spectrum sharing impairs the performance of both systems. This paper addresses C&S interference within a distributed network. Different… ▽ More

    Submitted 27 June, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

  43. Integrated Sensing and Communication with Massive MIMO: A Unified Tensor Approach for Channel and Target Parameter Estimation

    Authors: Ruoyu Zhang, Lei Cheng, Shuai Wang, Yi Lou, Yulong Gao, Wen Wu, Derrick Wing Kwan Ng

    Abstract: Benefitting from the vast spatial degrees of freedom, the amalgamation of integrated sensing and communication (ISAC) and massive multiple-input multiple-output (MIMO) is expected to simultaneously improve spectral and energy efficiencies as well as the sensing capability. However, a large number of antennas deployed in massive MIMO-ISAC raises critical challenges in acquiring both accurate channe… ▽ More

    Submitted 3 January, 2024; originally announced January 2024.

    Journal ref: IEEE Transactions on Wireless Communications, 2024

  44. arXiv:2401.01092  [pdf, other

    cs.IT eess.SP

    IRS-Aided Multi-Antenna Wireless Powered Communications in Interference Channels

    Authors: Ying Gao, Qingqing WU, Wen Chen

    Abstract: This paper investigates intelligent reflecting surface (IRS)-aided multi-antenna wireless powered communications in a multi-link interference channel, where multiple IRSs are deployed to enhance the downlink/uplink communications between each pair of hybrid access point (HAP) and wireless device. Our objective is to maximize the system sum throughput by optimizing the allocation of communication r… ▽ More

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: 5 pages, submitted to an IEEE journal for possible publication

  45. arXiv:2401.01044  [pdf, other

    cs.SD cs.AI cs.CL eess.AS

    Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation

    Authors: **long Xue, Yayue Deng, Yingming Gao, Ya Li

    Abstract: Recent advancements in diffusion models and large language models (LLMs) have significantly propelled the field of AIGC. Text-to-Audio (TTA), a burgeoning AIGC application designed to generate audio from natural language prompts, is attracting increasing attention. However, existing TTA studies often struggle with generation quality and text-audio alignment, especially for complex textual inputs.… ▽ More

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: Demo and implementation at https://auffusion.github.io

  46. arXiv:2312.16383  [pdf, ps, other

    cs.SD cs.AI eess.AS

    Frame-level emotional state alignment method for speech emotion recognition

    Authors: Qifei Li, Yingming Gao, Cong Wang, Yayue Deng, **long Xue, Yichen Han, Ya Li

    Abstract: Speech emotion recognition (SER) systems aim to recognize human emotional state during human-computer interaction. Most existing SER systems are trained based on utterance-level labels. However, not all frames in an audio have affective states consistent with utterance-level label, which makes it difficult for the model to distinguish the true emotion of the audio and perform poorly. To address th… ▽ More

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: Accepted by ICASSP 2024

  47. arXiv:2312.15659  [pdf, other

    eess.IV

    Perceptual Quality Assessment for Video Frame Interpolation

    Authors: **liang Han, Xiongkuo Min, Yixuan Gao, Jun Jia, Lei Sun, Zuowei Cao, Yonglin Luo, Guangtao Zhai

    Abstract: The quality of frames is significant for both research and application of video frame interpolation (VFI). In recent VFI studies, the methods of full-reference image quality assessment have generally been used to evaluate the quality of VFI frames. However, high frame rate reference videos, necessities for the full-reference methods, are difficult to obtain in most applications of VFI. To evaluate… ▽ More

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: 5 pages, 4 figures

    ACM Class: I.4.0

  48. arXiv:2311.14835  [pdf, other

    cs.SD cs.CL eess.AS

    Weak Alignment Supervision from Hybrid Model Improves End-to-end ASR

    Authors: **tao Jiang, Yingbo Gao, Zoltan Tuske

    Abstract: In this paper, we aim to create weak alignment supervision from an existing hybrid system to aid the end-to-end modeling of automatic speech recognition. Towards this end, we use the existing hybrid ASR system to produce triphone alignments of the training audios. We then create a cross-entropy loss at a certain layer of the encoder using the derived alignments. In contrast to the general one-hot… ▽ More

    Submitted 30 November, 2023; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: 7 pages, 7 figures, and 5 tables

  49. arXiv:2311.13043  [pdf, other

    eess.AS

    FedCPC: An Effective Federated Contrastive Learning Method for Privacy Preserving Early-Stage Alzheimer's Speech Detection

    Authors: Wenqing Wei, Zhengdong Yang, Yuan Gao, Jiyi Li, Chenhui Chu, Shogo Okada, Sheng Li

    Abstract: The early-stage Alzheimer's disease (AD) detection has been considered an important field of medical studies. Like traditional machine learning methods, speech-based automatic detection also suffers from data privacy risks because the data of specific patients are exclusive to each medical institution. A common practice is to use federated learning to protect the patients' data privacy. However, i… ▽ More

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: accepted in IEEE-ASRU2023

  50. arXiv:2311.11337  [pdf, other

    eess.SY math.OC

    H2 suboptimal containment control of homogeneous and heterogeneous multi-agent systems

    Authors: Yuan Gao, Junjie Jiao, Zhongkui Li, Sandra Hirche

    Abstract: This paper deals with the H2 suboptimal state containment control problem for homogeneous linear multi-agent systems and the H2 suboptimal output containment control problem for heterogeneous linear multi-agent systems. For both problems, given multiple autonomous leaders and a number of followers, we introduce suitable performance outputs and an associated H2 cost functional, respectively. The ai… ▽ More

    Submitted 19 November, 2023; originally announced November 2023.

    Comments: 15 papges, 7 figures