Showing 1–50 of 161 results for author: Gao, Z

Searching in archive eess.
  1. arXiv:2407.01517  [pdf, other]

    eess.IV cs.CV cs.LG

    Centerline Boundary Dice Loss for Vascular Segmentation

    Authors: Pengcheng Shi, Jiesi Hu, Yanwu Yang, Zilve Gao, Wei Liu, Ting Ma

    Abstract: Vascular segmentation in medical imaging plays a crucial role in analysing morphological and functional assessments. Traditional methods, like the centerline Dice (clDice) loss, ensure topology preservation but falter in capturing geometric details, especially under translation and deformation. The combination of clDice with traditional Dice loss can lead to diameter imbalance, favoring larger ves…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: accepted by MICCAI 2024

  2. arXiv:2406.16189  [pdf, other]

    eess.IV cs.CV

    Fuzzy Attention-based Border Rendering Network for Lung Organ Segmentation

    Authors: Sheng Zhang, Yang Nan, Yingying Fang, Shiyi Wang, Xiaodan Xing, Zhifan Gao, Guang Yang

    Abstract: Automatic lung organ segmentation on CT images is crucial for lung disease diagnosis. However, the unlimited voxel values and class imbalance of lung organs can lead to false-negative/positive and leakage issues in advanced methods. Additionally, some slender lung organs are easily lost during the recycled down/up-sample procedure, e.g., bronchioles & arterioles, causing severe discontinuity issue…

    Submitted 1 July, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024

  3. arXiv:2406.05839  [pdf, other]

    eess.AS cs.AI

    MaLa-ASR: Multimedia-Assisted LLM-Based ASR

    Authors: Guanrou Yang, Ziyang Ma, Fan Yu, Zhifu Gao, Shiliang Zhang, Xie Chen

    Abstract: As more and more information-rich data like video become available, utilizing multi-modal auxiliary information to enhance audio tasks has sparked widespread research interest. The recent surge in research on LLM-based audio models provides fresh perspectives for tackling audio tasks. Given that LLM can flexibly ingest multiple inputs, we propose MaLa-ASR, an LLM-based ASR model that can integrate…

    Submitted 13 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  4. arXiv:2406.03438  [pdf, other]

    cs.IT eess.SP

    CSI-GPT: Integrating Generative Pre-Trained Transformer with Federated-Tuning to Acquire Downlink Massive MIMO Channels

    Authors: Ye Zeng, Li Qiao, Zhen Gao, Tong Qin, Zhonghuai Wu, Sheng Chen, Mohsen Guizani

    Abstract: In massive multiple-input multiple-output (MIMO) systems, how to reliably acquire downlink channel state information (CSI) with low overhead is challenging. In this work, by integrating the generative pre-trained Transformer (GPT) with federated-tuning, we propose a CSI-GPT approach to realize efficient downlink CSI acquisition. Specifically, we first propose a Swin Transformer-based channel acqui…

    Submitted 5 June, 2024; originally announced June 2024.

  5. arXiv:2406.02518  [pdf, other]

    cs.CV eess.IV

    DDGS-CT: Direction-Disentangled Gaussian Splatting for Realistic Volume Rendering

    Authors: Zhongpai Gao, Benjamin Planche, Meng Zheng, Xiao Chen, Terrence Chen, Ziyan Wu

    Abstract: Digitally reconstructed radiographs (DRRs) are simulated 2D X-ray images generated from 3D CT volumes, widely used in preoperative settings but limited in intraoperative applications due to computational bottlenecks, especially for accurate but heavy physics-based Monte Carlo methods. While analytical DRR renderers offer greater efficiency, they overlook anisotropic X-ray image formation phenomena…

    Submitted 4 June, 2024; originally announced June 2024.

  6. arXiv:2406.01605  [pdf, other]

    eess.IV cs.CV

    An Enhanced Encoder-Decoder Network Architecture for Reducing Information Loss in Image Semantic Segmentation

    Authors: Zijun Gao, Qi Wang, Taiyuan Mei, Xiaohan Cheng, Yun Zi, Haowei Yang

    Abstract: The traditional SegNet architecture commonly encounters significant information loss during the sampling process, which detrimentally affects its accuracy in image semantic segmentation tasks. To counter this challenge, we introduce an innovative encoder-decoder network structure enhanced with residual connections. Our approach employs a multi-residual connection strategy designed to preserve the…

    Submitted 26 May, 2024; originally announced June 2024.

  7. arXiv:2405.19889  [pdf, other]

    eess.SP cs.IT cs.LG cs.MM

    Deep Joint Semantic Coding and Beamforming for Near-Space Airship-Borne Massive MIMO Network

    Authors: Minghui Wu, Zhen Gao, Zhaocheng Wang, Dusit Niyato, George K. Karagiannidis, Sheng Chen

    Abstract: Near-space airship-borne communication network is recognized to be an indispensable component of the future integrated ground-air-space network thanks to airships' advantage of long-term residency at stratospheric altitudes, but it urgently needs reliable and efficient Airship-to-X link. To improve the transmission efficiency and capacity, this paper proposes to integrate semantic communication wi…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Major Revision by IEEE JSAC

  8. arXiv:2405.15969  [pdf, other]

    cs.IT eess.SP

    Massive Digital Over-the-Air Computation for Communication-Efficient Federated Edge Learning

    Authors: Li Qiao, Zhen Gao, Mahdi Boloursaz Mashhadi, Deniz Gündüz

    Abstract: Over-the-air computation (AirComp) is a promising technology converging communication and computation over wireless networks, which can be particularly effective in model training, inference, and more emerging edge intelligence applications. AirComp relies on uncoded transmission of individual signals, which are added naturally over the multiple access channel thanks to the superposition property…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: To be published in the IEEE Journal on Selected Areas in Communications

  9. arXiv:2405.04476  [pdf, other]

    eess.AS cs.SD

    BERP: A Blind Estimator of Room Acoustic and Physical Parameters for Single-Channel Noisy Speech Signals

    Authors: Lijun Wang, Yixian Lu, Ziyan Gao, Kai Li, Jianqiang Huang, Yuntao Kong, Shogo Okada

    Abstract: Room acoustic parameters (RAPs) and room physical parameters (RPPs) are essential metrics for parameterizing the room acoustical characteristics (RAC) of a sound field around a listener's local environment, offering comprehensive indications for various applications. The current RAPs and RPPs estimation methods either fall short of covering broad real-world acoustic environments in the context of…

    Submitted 16 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: 13-page, Submitted to IEEE/ACM Transaction on Audio Speech and Language Processing (TASLP)

  10. arXiv:2405.02823  [pdf, other]

    cs.IT eess.SP

    Reconfigurable Massive MIMO: Precoding Design and Channel Estimation in the Electromagnetic Domain

    Authors: Keke Ying, Zhen Gao, Yu Su, Tong Qin, Michail Matthaiou, Robert Schober

    Abstract: Reconfigurable massive multiple-input multiple-output (RmMIMO) technology offers increased flexibility for future communication systems by exploiting previously untapped degrees of freedom in the electromagnetic (EM) domain. The representation of the traditional spatial domain channel state information (sCSI) limits the insights into the potential of EM domain channel properties, constraining the…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: This work is being submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  11. arXiv:2404.10240  [pdf, other]

    eess.SY

    Disturbance Rejection-Guarded Learning for Vibration Suppression of Two-Inertia Systems

    Authors: Fan Zhang, **feng Chen, Yu Hu, Zhiqiang Gao, Ge Lv, Qin Lin

    Abstract: Model uncertainty presents significant challenges in vibration suppression of multi-inertia systems, as these systems often rely on inaccurate nominal mathematical models due to system identification errors or unmodeled dynamics. An observer, such as an extended state observer (ESO), can estimate the discrepancy between the inaccurate nominal model and the true model, thus improving control perfor…

    Submitted 15 April, 2024; originally announced April 2024.

  12. arXiv:2404.00352  [pdf]

    eess.IV

    Dependability Evaluation of Stable Diffusion with Soft Errors on the Model Parameters

    Authors: Zhen Gao, Lini Yuan, Pedro Reviriego, Shanshan Liu, Fabrizio Lombardi

    Abstract: Stable Diffusion is a popular Transformer-based model for image generation from text; it applies an image information creator to the input text and the visual knowledge is added in a step-by-step fashion to create an image that corresponds to the input text. However, this diffusion process can be corrupted by errors from the underlying hardware, which are especially relevant for implementations at…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 6 pages, 16 figures

  13. arXiv:2403.17256  [pdf, other]

    cs.IT eess.SP

    Latency-Aware Generative Semantic Communications with Pre-Trained Diffusion Models

    Authors: Li Qiao, Mahdi Boloursaz Mashhadi, Zhen Gao, Chuan Heng Foh, Pei Xiao, Mehdi Bennis

    Abstract: Generative foundation AI models have recently shown great success in synthesizing natural signals with high perceptual quality using only textual prompts and conditioning signals to guide the generation process. This enables semantic communications at extremely low data rates in future wireless networks. In this paper, we develop a latency-aware semantic communications framework with pre-trained g…

    Submitted 25 March, 2024; originally announced March 2024.

  14. arXiv:2403.12813  [pdf, other]

    cs.IT eess.SP

    Knowledge and Data Dual-Driven Channel Estimation and Feedback for Ultra-Massive MIMO Systems under Hybrid Field Beam Squint Effect

    Authors: Kuiyu Wang, Zhen Gao, Sheng Chen, Boyu Ning, Gaojie Chen, Yu Su, Zhaocheng Wang, H. Vincent Poor

    Abstract: Acquiring accurate channel state information (CSI) at an access point (AP) is challenging for wideband millimeter wave (mmWave) ultra-massive multiple-input and multiple-output (UMMIMO) systems, due to the high-dimensional channel matrices, hybrid near- and far- field channel feature, beam squint effects, and imperfect hardware constraints, such as low-resolution analog-to-digital converters, and…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: 17 pages, 22 figures, 3 tables

  15. arXiv:2403.11809  [pdf, other]

    cs.IT eess.SP

    Sensing-Enhanced Channel Estimation for Near-Field XL-MIMO Systems

    Authors: Shicong Liu, Xianghao Yu, Zhen Gao, Jie Xu, Derrick Wing Kwan Ng, Shuguang Cui

    Abstract: Future sixth-generation (6G) systems are expected to leverage extremely large-scale multiple-input multiple-output (XL-MIMO) technology, which significantly expands the range of the near-field region. The spherical wavefront characteristics in the near field introduce additional degrees of freedom (DoFs), namely distance and angle, into the channel model, which leads to unique challenges in channe…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 14 pages, 9 figures

  16. arXiv:2403.05256  [pdf, other]

    eess.IV cs.CV cs.LG

    DuDoUniNeXt: Dual-domain unified hybrid model for single and multi-contrast undersampled MRI reconstruction

    Authors: Ziqi Gao, Yue Zhang, Xinwen Liu, Kaiyan Li, S. Kevin Zhou

    Abstract: Multi-contrast (MC) Magnetic Resonance Imaging (MRI) reconstruction aims to incorporate a reference image of auxiliary modality to guide the reconstruction process of the target modality. Known MC reconstruction methods perform well with a fully sampled reference image, but usually exhibit inferior performance, compared to single-contrast (SC) methods, when the reference image is missing or of low…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 11 pages, 4 figures, 2 tables

  17. arXiv:2402.12816  [pdf, other]

    eess.IV

    OMRA: Online Motion Resolution Adaptation to Remedy Domain Shift in Learned Hierarchical B-frame Coding

    Authors: Zong-Lin Gao, Sang NguyenQuang, Wen-Hsiao Peng, Xiem HoangVan

    Abstract: Learned hierarchical B-frame coding aims to leverage bi-directional reference frames for better coding efficiency. However, the domain shift between training and test scenarios due to dataset limitations poses a challenge. This issue arises from training the codec with small groups of pictures (GOP) but testing it on large GOPs. Specifically, the motion estimation network, when trained on small GO…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: 7 pages, submitted to IEEE ICIP 2024

  18. arXiv:2402.10609  [pdf, other]

    eess.IV cs.CV cs.LG

    U$^2$MRPD: Unsupervised undersampled MRI reconstruction by prompting a large latent diffusion model

    Authors: Ziqi Gao, S. Kevin Zhou

    Abstract: Implicit visual knowledge in a large latent diffusion model (LLDM) pre-trained on natural images is rich and hypothetically universal to natural and medical images. To test this hypothesis, we introduce a novel framework for Unsupervised Undersampled MRI Reconstruction by Prompting a pre-trained large latent Diffusion model (U$^2$MRPD). Existing data-driven, supervised undersampled MRI reconstruc…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: 17 pages, 6 figures, 5 tables, 2 pseudocodes

  19. arXiv:2402.08846  [pdf, other]

    cs.CL cs.AI cs.MM cs.SD eess.AS

    An Embarrassingly Simple Approach for LLM with Strong ASR Capacity

    Authors: Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, Jiaming Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen

    Abstract: In this paper, we focus on solving one of the most important tasks in the field of speech processing, i.e., automatic speech recognition (ASR), with speech foundation encoders and large language models (LLM). Recent works have complex designs such as compressing the output temporally for the speech encoder, tackling modal alignment for the projector, and utilizing parameter-efficient fine-tuning f…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: Working in progress and will open-source soon

  20. arXiv:2402.02410  [pdf, other]

    eess.SP

    Block-Sparse Tensor Recovery

    Authors: Liyang Lu, Zhaocheng Wang, Zhen Gao, Sheng Chen, H. Vincent Poor

    Abstract: This work explores the fundamental problem of the recoverability of a sparse tensor being reconstructed from its compressed embodiment. We present a generalized model of block-sparse tensor recovery as a theoretical foundation, where concepts measuring holistic mutual incoherence property (MIP) of the measurement matrix set are defined. A representative algorithm based on the orthogonal matching p…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: 53 pages, submitted to IEEE for possible publication

  21. arXiv:2401.16564  [pdf]

    eess.SP

    Data and Physics driven Deep Learning Models for Fast MRI Reconstruction: Fundamentals and Methodologies

    Authors: Jiahao Huang, Yinzhe Wu, Fanwen Wang, Yingying Fang, Yang Nan, Cagan Alkan, Lei Xu, Zhifan Gao, Weiwen Wu, Lei Zhu, Zhaolin Chen, Peter Lally, Neal Bangerter, Kawin Setsompop, Yike Guo, Daniel Rueckert, Ge Wang, Guang Yang

    Abstract: Magnetic Resonance Imaging (MRI) is a pivotal clinical diagnostic tool, yet its extended scanning times often compromise patient comfort and image quality, especially in volumetric, temporal and quantitative scans. This review elucidates recent advances in MRI acceleration via data and physics-driven models, leveraging techniques from algorithm unrolling models, enhancement-based models, and plug-…

    Submitted 29 January, 2024; originally announced January 2024.

  22. arXiv:2401.09127  [pdf, other]

    cs.IT eess.SP

    AI Empowered Channel Semantic Acquisition for 6G Integrated Sensing and Communication Networks

    Authors: Yifei Zhang, Zhen Gao, **g**g Zhao, Ziming He, Yunsheng Zhang, Chen Lu, Pei Xiao

    Abstract: Motivated by the need for increased spectral efficiency and the proliferation of intelligent applications, the sixth-generation (6G) mobile network is anticipated to integrate the dual-functions of communication and sensing (C&S). Although the millimeter wave (mmWave) communication and mmWave radar share similar multiple-input multiple-output (MIMO) architecture for integration, the full potential…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: 9 pages, 5 figures, accepted by the IEEE journal

  23. arXiv:2401.00806  [pdf, other]

    eess.SY

    Noise-Aware and Equitable Urban Air Traffic Management: An Optimization Approach

    Authors: Zhenyu Gao, Yue Yu, Qinshuang Wei, Ufuk Topcu, John-Paul Clarke

    Abstract: Urban air mobility (UAM), a transformative concept for the transport of passengers and cargo, faces several integration challenges in complex urban environments. Community acceptance of aircraft noise is among the most noticeable of these challenges when launching or scaling up a UAM system. Properly managing community noise is fundamental to establishing a UAM system that is environmentally and s…

    Submitted 1 January, 2024; originally announced January 2024.

    Comments: 30 pages, 15 figures

  24. arXiv:2401.00283  [pdf, other]

    cs.IT eess.SP

    Near-Space Communications: the Last Piece of 6G Space-Air-Ground-Sea Integrated Network Puzzle

    Authors: Hongshan Liu, Tong Qin, Zhen Gao, Tianqi Mao, Keke Ying, Ziwei Wan, Li Qiao, Rui Na, Zhongxiang Li, Chun Hu, Yikun Mei, Tuan Li, Guanghui Wen, Lei Chen, Zhonghuai Wu, Ruiqi Liu, Gaojie Chen, Shuo Wang, Dezhi Zheng

    Abstract: This article presents a comprehensive study on the emerging near-space communications (NS-COM) within the context of space-air-ground-sea integrated network (SAGSIN). Specifically, we firstly explore the recent technical developments of NS-COM, followed by the discussions about motivations behind integrating NS-COM into SAGSIN. To further demonstrate the necessity of NS-COM, a comparative analysis…

    Submitted 4 March, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

    Comments: 28 pages, 8 figures, 2 tables

  25. arXiv:2312.15829  [pdf, other]

    eess.IV

    MaskCRT: Masked Conditional Residual Transformer for Learned Video Compression

    Authors: Yi-Hsin Chen, Hong-Sheng Xie, Cheng-Wei Chen, Zong-Lin Gao, Wen-Hsiao Peng, Martin Benjak, Jörn Ostermann

    Abstract: Conditional coding has lately emerged as the mainstream approach to learned video compression. However, a recent study shows that it may perform worse than residual coding when the information bottleneck arises. Conditional residual coding was thus proposed, creating a new school of thought to improve on conditional coding. Notably, conditional residual coding relies heavily on the assumption that…

    Submitted 25 December, 2023; originally announced December 2023.

  26. arXiv:2312.15185  [pdf, other]

    cs.CL cs.HC cs.MM cs.SD eess.AS

    emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

    Authors: Ziyang Ma, Zhisheng Zheng, Jiaxin Ye, **chao Li, Zhifu Gao, Shiliang Zhang, Xie Chen

    Abstract: We propose emotion2vec, a universal speech emotion representation model. emotion2vec is pre-trained on open-source unlabeled emotion data through self-supervised online distillation, combining utterance-level loss and frame-level loss during pre-training. emotion2vec outperforms state-of-the-art pre-trained universal models and emotion specialist models by only training linear layers for the speec…

    Submitted 23 December, 2023; originally announced December 2023.

    Comments: Code, checkpoints, and extracted features are available at https://github.com/ddlBoJack/emotion2vec

  27. arXiv:2312.14303  [pdf, other]

    eess.SP cs.LG cs.NI

    Geo2SigMap: High-Fidelity RF Signal Mapping Using Geographic Databases

    Authors: Yiming Li, Zeyu Li, Zhihui Gao, Tingjun Chen

    Abstract: Radio frequency (RF) signal mapping, which is the process of analyzing and predicting the RF signal strength and distribution across specific areas, is crucial for cellular network planning and deployment. Traditional approaches to RF signal mapping rely on statistical models constructed based on measurement data, which offer low complexity but often lack accuracy, or ray tracing tools, which prov…

    Submitted 4 January, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

  28. arXiv:2312.11302  [pdf, other]

    cs.IT eess.SP

    AFDM-SCMA: A Promising Waveform for Massive Connectivity over High Mobility Channels

    Authors: Qu Luo, Pei Xiao, Zilong Liu, Ziwei Wan, Thomos Nikolaos, Zhen Gao, Ziming He

    Abstract: This paper studies the affine frequency division multiplexing (AFDM)-empowered sparse code multiple access (SCMA) system, referred to as AFDM-SCMA, for supporting massive connectivity in high-mobility environments. First, by placing the sparse codewords on the AFDM chirp subcarriers, the input-output (I/O) relation of AFDM-SCMA systems is presented. Next, we delve into the generalized receiver des…

    Submitted 11 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  29. arXiv:2312.05258  [pdf, other]

    eess.IV cs.CV cs.LG q-bio.QM

    Automated Small Kidney Cancer Detection in Non-Contrast Computed Tomography

    Authors: William McGough, Thomas Buddenkotte, Stephan Ursprung, Zeyu Gao, Grant Stewart, Mireia Crispin-Ortuzar

    Abstract: This study introduces an automated pipeline for renal cancer (RC) detection in non-contrast computed tomography (NCCT). In the development of our pipeline, we test three detections models: a shape model, a 2D-, and a 3D axial-sample model. Training (n=1348) and testing (n=64) data were gathered from open sources (KiTS23, Abdomen1k, CT-ORG) and Cambridge University Hospital (CUH). Results from cros…

    Submitted 24 November, 2023; originally announced December 2023.

  30. arXiv:2312.02372  [pdf, other]

    eess.SP cs.LG

    On the Trade-Off between Stability and Representational Capacity in Graph Neural Networks

    Authors: Zhan Gao, Amanda Prorok, Elvin Isufi

    Abstract: Analyzing the stability of graph neural networks (GNNs) under topological perturbations is key to understanding their transferability and the role of each architecture component. However, stability has been investigated only for particular architectures, questioning whether it holds for a broader spectrum of GNNs or only for a few instances. To answer this question, we study the stability of EdgeN…

    Submitted 4 December, 2023; originally announced December 2023.

  31. arXiv:2312.01573  [pdf]

    eess.IV cs.CV

    Survey on deep learning in multimodal medical imaging for cancer detection

    Authors: Yan Tian, Zhaocheng Xu, Yujun Ma, Wei** Ding, Ruili Wang, Zhihong Gao, Guohua Cheng, Linyang He, Xuran Zhao

    Abstract: The task of multimodal cancer detection is to determine the locations and categories of lesions by using different imaging techniques, which is one of the key research methods for cancer diagnosis. Recently, deep learning-based object detection has made significant developments due to its strength in semantic feature extraction and nonlinear function fitting. However, multimodal cancer detection r…

    Submitted 3 December, 2023; originally announced December 2023.

    Journal ref: Neural Computing and Applications. 2023 Nov 29:1-6

  32. arXiv:2311.18173  [pdf]

    eess.IV cs.CE cs.CV

    Quantification of cardiac capillarization in single-immunostained myocardial slices using weakly supervised instance segmentation

    Authors: Zhao Zhang, Xiwen Chen, William Richardson, Bruce Z. Gao, Abolfazl Razi, Tong Ye

    Abstract: Decreased myocardial capillary density has been reported as an important histopathological feature associated with various heart disorders. Quantitative assessment of cardiac capillarization typically involves double immunostaining of cardiomyocytes (CMs) and capillaries in myocardial slices. In contrast, single immunostaining of basement membrane components is a straightforward approach to simult…

    Submitted 29 November, 2023; originally announced November 2023.

  33. arXiv:2311.06770  [pdf, other]

    cs.IT eess.SP

    Compressive Sensing-Based Grant-Free Massive Access for 6G Massive Communication

    Authors: Zhen Gao, Malong Ke, Yikun Mei, Li Qiao, Sheng Chen, Derrick Wing Kwan Ng, H. Vincent Poor

    Abstract: The advent of the sixth-generation (6G) of wireless communications has given rise to the necessity to connect vast quantities of heterogeneous wireless devices, which requires advanced system capabilities far beyond existing network architectures. In particular, such massive communication has been recognized as a prime driver that can empower the 6G vision of future ubiquitous connectivity, suppor…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: Accepted by IEEE IoT Journal

  34. arXiv:2310.18180  [pdf, other]

    cs.IT eess.SP

    DPSS-based Codebook Design for Near-Field XL-MIMO Channel Estimation

    Authors: Shicong Liu, Xianghao Yu, Zhen Gao, Derrick Wing Kwan Ng

    Abstract: Future sixth-generation (6G) systems are expected to leverage extremely large-scale multiple-input multiple-output (XL-MIMO) technology, which significantly expands the range of the near-field region. While accurate channel estimation is essential for beamforming and data detection, the unique characteristics of near-field channels pose additional challenges to the effective acquisition of channel…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: 6 pages, 5 figures

  35. arXiv:2310.04673  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT

    Authors: Jiaming Wang, Zhihao Du, Qian Chen, Yunfei Chu, Zhifu Gao, Zerui Li, Kai Hu, Xiaohuan Zhou, ** Xu, Ziyang Ma, Wen Wang, Siqi Zheng, Chang Zhou, Zhijie Yan, Shiliang Zhang

    Abstract: Generative Pre-trained Transformer (GPT) models have achieved remarkable performance on various natural language processing tasks. However, there has been limited research on applying similar frameworks to audio tasks. Previously proposed large language models for audio tasks either lack sufficient quantitative evaluations, or are limited to tasks for recognizing and understanding audio content, o…

    Submitted 10 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: 10 pages, under review

  36. arXiv:2309.16390  [pdf, other]

    cs.CV cs.RO eess.IV

    An Enhanced Low-Resolution Image Recognition Method for Traffic Environments

    Authors: Zongcai Tan, Zhenhai Gao

    Abstract: Currently, low-resolution image recognition is confronted with a significant challenge in the field of intelligent traffic perception. Compared to high-resolution images, low-resolution images suffer from small size, low quality, and lack of detail, leading to a notable decrease in the accuracy of traditional neural network recognition algorithms. The key to low-resolution image recognition lies i…

    Submitted 28 September, 2023; originally announced September 2023.

  37. arXiv:2308.13943  [pdf, ps, other]

    eess.SY

    Robust Control Barrier Functions for Safe Control Under Uncertainty Using Extended State Observer and Output Measurement

    Authors: **feng Chen, Zhiqiang Gao, Qin Lin

    Abstract: Control barrier functions-based quadratic programming (CBF-QP) is gaining popularity as an effective controller synthesis tool for safe control. However, the provable safety is established on an accurate dynamic model and access to all states. To address such a limitation, this paper proposes a novel design combining an extended state observer (ESO) with a CBF for safe control of a system with mod…

    Submitted 26 August, 2023; originally announced August 2023.

  38. arXiv:2308.13575  [pdf]

    eess.SP cs.AI physics.optics

    FrFT based estimation of linear and nonlinear impairments using Vision Transformer

    Authors: Ting Jiang, Zheng Gao, Yizhao Chen, Zihe Hu, Ming Tang

    Abstract: To comprehensively assess optical fiber communication system conditions, it is essential to implement joint estimation of the following four critical impairments: nonlinear signal-to-noise ratio (SNRNL), optical signal-to-noise ratio (OSNR), chromatic dispersion (CD) and differential group delay (DGD). However, current studies only achieve identifying a limited number of impairments within a narro…

    Submitted 25 August, 2023; originally announced August 2023.

    Comments: 15 pages, 10 figures

  39. Aggregate Model of District Heating Network for Integrated Energy Dispatch: A Physically Informed Data-Driven Approach

    Authors: Shuai Lu, Zihang Gao, Yong Sun, Suhan Zhang, Baoju Li, Chengliang Hao, Yijun Xu, Wei Gu

    Abstract: The district heating network (DHN) is essential in enhancing the operational flexibility of integrated energy systems (IES). Yet, it is hard to obtain an accurate and concise DHN model for the operation owing to complicated network features and imperfect measurements. Considering this, this paper proposes a physically informed data-driven aggregate model (AGM) for the DHN, providing a concise des…

    Submitted 27 March, 2024; v1 submitted 21 August, 2023; originally announced August 2023.

  40. arXiv:2308.03266  [pdf, other]

    cs.SD cs.CL eess.AS

    SeACo-Paraformer: A Non-Autoregressive ASR System with Flexible and Effective Hotword Customization Ability

    Authors: Xian Shi, Yexin Yang, Zerui Li, Yanni Chen, Zhifu Gao, Shiliang Zhang

    Abstract: Hotword customization is one of the concerned issues remained in ASR field - it is of value to enable users of ASR systems to customize names of entities, persons and other phrases to obtain better experience. The past few years have seen effective modeling strategies for ASR contextualization developed, but they still exhibit space for improvement about training stability and the invisible activa…

    Submitted 25 December, 2023; v1 submitted 6 August, 2023; originally announced August 2023.

    Comments: accepted by ICASSP2024

  41. arXiv:2307.12266  [pdf, other]

    cs.CL eess.SP

    Transformer-based Joint Source Channel Coding for Textual Semantic Communication

    Authors: Shicong Liu, Zhen Gao, Gaojie Chen, Yu Su, Lu Peng

    Abstract: The Space-Air-Ground-Sea integrated network calls for more robust and secure transmission techniques against jamming. In this paper, we propose a textual semantic transmission framework for robust transmission, which utilizes the advanced natural language processing techniques to model and encode sentences. Specifically, the textual sentences are firstly split into tokens using wordpiece algorithm…

    Submitted 23 July, 2023; originally announced July 2023.

    Comments: 6 pages, 5 figures. Accepted by IEEE/CIC ICCC 2023

  42. arXiv:2307.10837  [pdf, other]

    cs.IT eess.SP

    Sensing User's Activity, Channel, and Location with Near-Field Extra-Large-Scale MIMO

    Authors: Li Qiao, Anwen Liao, Zhuoran Li, Hua Wang, Zhen Gao, Xiang Gao, Yu Su, Pei Xiao, Li You, Derrick Wing Kwan Ng

    Abstract: This paper proposes a grant-free massive access scheme based on the millimeter wave (mmWave) extra-large-scale multiple-input multiple-output (XL-MIMO) to support massive Internet-of-Things (IoT) devices with low latency, high data rate, and high localization accuracy in the upcoming sixth-generation (6G) networks. The XL-MIMO consists of multiple antenna subarrays that are widely spaced over the…

    Submitted 16 October, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: To appear in IEEE Transactions on Communications. Codes will be open to all on https://gaozhen16.github.io/ soon

  43. arXiv:2307.03070  [pdf, other]

    eess.SP cs.AI cs.IT

    Hybrid Knowledge-Data Driven Channel Semantic Acquisition and Beamforming for Cell-Free Massive MIMO

    Authors: Zhen Gao, Shicong Liu, Yu Su, Zhongxiang Li, Dezhi Zheng

    Abstract: This paper focuses on advancing outdoor wireless systems to better support ubiquitous extended reality (XR) applications, and close the gap with current indoor wireless transmission capabilities. We propose a hybrid knowledge-data driven method for channel semantic acquisition and multi-user beamforming in cell-free massive multiple-input multiple-output (MIMO) systems. Specifically, we firstly pr…

    Submitted 21 July, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: 15 pages, 15 figures

  44. arXiv:2306.05629  [pdf, other]

    cs.IT eess.SY

    R-PMAC: A Robust Preamble Based MAC Mechanism Applied in Industrial Internet of Things

    Authors: Kai Song, Biqian Feng, Yongpeng Wu, Zhen Gao, Wenjun Zhang

    Abstract: This paper proposes a novel media access control (MAC) mechanism, called the robust preamble-based MAC mechanism (R-PMAC), which can be applied to power line communication (PLC) networks in the context of the Industrial Internet of Things (IIoT). Compared with other MAC mechanisms such as P-MAC and the MAC layer of IEEE1901.1, R-PMAC has higher networking speed. Besides, it supports whitelist auth…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: This paper has been accepted by IEEE Internet of Things Journal

  45. arXiv:2306.05581  [pdf, other]

    eess.SY math.OC

    Risk-aware Urban Air Mobility Network Design with Overflow Redundancy

    Authors: Qinshuang Wei, Zhenyu Gao, John-Paul Clarke, Ufuk Topcu

    Abstract: Urban air mobility (UAM), as envisioned by aviation professionals, will transport passengers and cargo at low altitudes within urban and suburban areas. To operate in urban environments, precise air traffic management, in particular the management of traffic overflows due to physical and operational disruptions will be critical to ensuring system safety and efficiency. To this end, we propose UAM…

    Submitted 23 October, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: 44 pages, 10 figures

  46. arXiv:2306.03407  [pdf, other]

    eess.IV cs.CV

    LESS: Label-efficient Multi-scale Learning for Cytological Whole Slide Image Screening

    Authors: Beidi Zhao, Wenlong Deng, Zi Han Li, Chen Zhou, Zuhua Gao, Gang Wang, Xiaoxiao Li

    Abstract: In computational pathology, multiple instance learning (MIL) is widely used to circumvent the computational impasse in giga-pixel whole slide image (WSI) analysis. It usually consists of two stages: patch-level feature extraction and slide-level aggregation. Recently, pretrained models or self-supervised learning have been used to extract patch features, but they suffer from low effectiveness or i…

    Submitted 20 September, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: This paper was submitted to Medical Image Analysis. It is under review

  47. arXiv:2305.11260  [pdf, other]

    eess.SY cs.LG cs.MA cs.RO

    Constrained Environment Optimization for Prioritized Multi-Agent Navigation

    Authors: Zhan Gao, Amanda Prorok

    Abstract: Traditional approaches to the design of multi-agent navigation algorithms consider the environment as a fixed constraint, despite the influence of spatial constraints on agents' performance. Yet hand-designing conducive environment layouts is inefficient and potentially expensive. The goal of this paper is to consider the environment as a decision variable in a system-level optimization problem, w…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2209.11279

  48. arXiv:2305.11013  [pdf, other]

    cs.SD cs.CL eess.AS

    FunASR: A Fundamental End-to-End Speech Recognition Toolkit

    Authors: Zhifu Gao, Zerui Li, Jiaming Wang, Haoneng Luo, Xian Shi, Mengzhe Chen, Yabin Li, Lingyun Zuo, Zhihao Du, Zhangyu Xiao, Shiliang Zhang

    Abstract: This paper introduces FunASR, an open-source speech recognition toolkit designed to bridge the gap between academic research and industrial applications. FunASR offers models trained on large-scale industrial corpora and the ability to deploy them in applications. The toolkit's flagship model, Paraformer, is a non-autoregressive end-to-end speech recognition model that has been trained on a manual…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: 5 pages, 3 figures, accepted by INTERSPEECH 2023

  49. arXiv:2305.10680  [pdf, other]

    cs.SD cs.CL eess.AS

    Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System

    Authors: Xian Shi, Haoneng Luo, Zhifu Gao, Shiliang Zhang, Zhijie Yan

    Abstract: Estimating confidence scores for recognition results is a classic task in ASR field and of vital importance for kinds of downstream tasks and training strategies. Previous end-to-end (E2E) based confidence estimation models (CEM) predict score sequences of equal length with input transcriptions, leading to unreliable estimation when deletion and insertion errors occur. In this paper we proposed CI…

    Submitted 24 May, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: 5 pages, 4 figures, Interspeech2023

  50. arXiv:2305.10609  [pdf, other]

    cs.IT eess.SP

    Unsourced Massive Access-Based Digital Over-the-Air Computation for Efficient Federated Edge Learning

    Authors: Li Qiao, Zhen Gao, Zhongxiang Li, Deniz Gündüz

    Abstract: Over-the-air computation (OAC) is a promising technique to achieve fast model aggregation across multiple devices in federated edge learning (FEEL). In addition to the analog schemes, one-bit digital aggregation (OBDA) scheme was proposed to adapt OAC to modern digital wireless systems. However, one-bit quantization in OBDA can result in a serious information loss and slower convergence of FEEL. T…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: 2023 IEEE International Symposium on Information Theory (ISIT)