Skip to main content

Showing 1–50 of 186 results for author: Guo, J

Searching in archive eess. Search in all archives.
.
  1. arXiv:2406.18538  [pdf, other

    cs.CV cs.AI eess.IV

    VideoQA-SC: Adaptive Semantic Communication for Video Question Answering

    Authors: Jiangyuan Guo, Wei Chen, Yuxuan Sun, Jialong Xu, Bo Ai

    Abstract: Although semantic communication (SC) has shown its potential in efficiently transmitting multi-modal data such as text, speeches and images, SC for videos has focused primarily on pixel-level reconstruction. However, these SC systems may be suboptimal for downstream intelligent tasks. Moreover, SC systems without pixel-level video reconstruction present advantages by achieving higher bandwidth eff… ▽ More

    Submitted 17 May, 2024; originally announced June 2024.

  2. arXiv:2406.09317  [pdf, other

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, **ming Guo, Xiaolin Chen, **gcheng Wang, Yih Chung Tham , et al. (24 additional authors not shown)

    Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. To RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources… ▽ More

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  3. arXiv:2406.04791  [pdf, other

    cs.SD eess.AS

    Speaker-Smoothed kNN Speaker Adaptation for End-to-End ASR

    Authors: Shaojun Li, Daimeng Wei, Jiaxin Guo, ZongYao Li, Zhanglin Wu, Zhiqiang Rao, Yuanchang Luo, Xianghui He, Hao Yang

    Abstract: Despite recent improvements in End-to-End Automatic Speech Recognition (E2E ASR) systems, the performance can degrade due to vocal characteristic mismatches between training and testing data, particularly with limited target speaker adaptation data. We propose a novel speaker adaptation approach Speaker-Smoothed kNN that leverages k-Nearest Neighbors (kNN) retrieval techniques to improve model out… ▽ More

    Submitted 11 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  4. arXiv:2406.02438  [pdf, other

    eess.AS cs.MM cs.SD

    CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection

    Authors: Yongyi Zang, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu, Wenxiao Zhao, **g Guo, Tomoki Toda, Zhiyao Duan

    Abstract: Recent singing voice synthesis and conversion advancements necessitate robust singing voice deepfake detection (SVDD) models. Current SVDD datasets face challenges due to limited controllability, diversity in deepfake methods, and licensing restrictions. Addressing these gaps, we introduce CtrSVDD, a large-scale, diverse collection of bonafide and deepfake singing vocals. These vocals are synthesi… ▽ More

    Submitted 18 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  5. arXiv:2405.14590  [pdf, other

    eess.IV cs.CV

    MAMOC: MRI Motion Correction via Masked Autoencoding

    Authors: Lennart Alexander Van der Goten, **gyu Guo, Kevin Smith

    Abstract: The presence of motion artifacts in magnetic resonance imaging (MRI) scans poses a significant challenge, where even minor patient movements can lead to artifacts that may compromise the scan's utility. This paper introduces Masked Motion Correction (MAMOC), a novel method designed to address the issue of Retrospective Artifact Correction (RAC) in motion-affected MRI brain scans. MAMOC uses masked… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  6. arXiv:2405.07023  [pdf, other

    eess.IV cs.CV

    Efficient Real-world Image Super-Resolution Via Adaptive Directional Gradient Convolution

    Authors: Long Peng, Yang Cao, Ren**g Pei, Wenbo Li, Jiaming Guo, Xueyang Fu, Yang Wang, Zheng-Jun Zha

    Abstract: Real-SR endeavors to produce high-resolution images with rich details while mitigating the impact of multiple degradation factors. Although existing methods have achieved impressive achievements in detail recovery, they still fall short when addressing regions with complex gradient arrangements due to the intensity-based linear weighting feature extraction manner. Moreover, the stochastic artifact… ▽ More

    Submitted 11 May, 2024; originally announced May 2024.

  7. arXiv:2404.16484  [pdf, other

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, **shan Pan, Jiangxin Dong, **hui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi **, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu , et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  8. arXiv:2404.14063  [pdf, other

    cs.SD cs.LG cs.NE eess.AS

    LVNS-RAVE: Diversified audio generation with RAVE and Latent Vector Novelty Search

    Authors: **yue Guo, Anna-Maria Christodoulou, Balint Laczko, Kyrre Glette

    Abstract: Evolutionary Algorithms and Generative Deep Learning have been two of the most powerful tools for sound generation tasks. However, they have limitations: Evolutionary Algorithms require complicated designs, posing challenges in control and achieving realistic sound generation. Generative Deep Learning models often copy from the dataset and lack creativity. In this paper, we propose LVNS-RAVE, a me… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted to GECCO 24 Companion

  9. arXiv:2404.13568  [pdf

    cs.SD eess.AS

    Sparse Direction of Arrival Estimation Method Based on Vector Signal Reconstruction with a Single Vector Sensor

    Authors: Jiabin Guo

    Abstract: This study investigates the application of single vector hydrophones in underwater acoustic signal processing for Direction of Arrival (DOA) estimation. Addressing the limitations of traditional DOA estimation methods in multi-source environments and under noise interference, this research proposes a Vector Signal Reconstruction (VSR) technique. This technique transforms the covariance matrix of s… ▽ More

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 20 pages

  10. arXiv:2404.11313  [pdf, other

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Ya**g Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo , et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  11. arXiv:2404.10558  [pdf, other

    cs.AR eess.SP

    Decade-Bandwidth RF-Input Pseudo-Doherty Load Modulated Balanced Amplifier using Signal-Flow-Based Phase Alignment Design

    Authors: **zhu Gong, Jiachen Guo, Niteesh Bharadwaj Vangipurapu, Kenle Chen

    Abstract: This paper reports a first-ever decade-bandwidth pseudo-Doherty load-modulated balanced amplifier (PD-LMBA), designed for emerging 4G/5G communications and multi-band operations. By revisiting the LMBA theory using the signal-flow graph, a frequency-agnostic phase-alignment condition is found that is critical for ensuring intrinsically broadband load modulation behavior. This unique design methodo… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted for publication by IEEE Microwave and Wireless Technology Letters (not published). The IEEE copyright receipt is attached

  12. arXiv:2404.10343  [pdf, other

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang , et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such… ▽ More

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  13. arXiv:2404.01716  [pdf, other

    eess.AS cs.AI cs.CL cs.LG

    Effective internal language model training and fusion for factorized transducer model

    Authors: **xi Guo, Niko Moritz, Yingyi Ma, Frank Seide, Chunyang Wu, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer

    Abstract: The internal language model (ILM) of the neural transducer has been widely studied. In most prior work, it is mainly used for estimating the ILM score and is subsequently subtracted during inference to facilitate improved integration with external language models. Recently, various of factorized transducer models have been proposed, which explicitly embrace a standalone internal language model for… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted to ICASSP 2024

  14. arXiv:2403.20289  [pdf, other

    cs.CL cs.SD eess.AS

    Emotion-Anchored Contrastive Learning Framework for Emotion Recognition in Conversation

    Authors: Fangxu Yu, Junjie Guo, Zhen Wu, Xinyu Dai

    Abstract: Emotion Recognition in Conversation (ERC) involves detecting the underlying emotion behind each utterance within a conversation. Effectively generating representations for utterances remains a significant challenge in this task. Recent works propose various models to address this issue, but they still struggle with differentiating similar emotions such as excitement and happiness. To alleviate thi… ▽ More

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted by Findings of NAACL 2024

  15. Linear Hybrid Asymmetrical Load-Modulated Balanced Amplifier with Multi-Band Reconfigurability and Antenna-VSWR Resilience

    Authors: Jiachen Guo, Yuchen Cao, Kenle Chen

    Abstract: This paper presents the first-ever highly linear and load-insensitive three-way load-modulation power amplifier (PA) based on reconfigurable hybrid asymmetrical load modulated balanced amplifier (H-ALMBA). Through proper amplitude and phase controls, the carrier, control amplifier (CA), and two peaking balanced amplifiers (BA1 and BA2) can form a linear high-order load modulation over wide bandwid… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  16. arXiv:2403.11806  [pdf, other

    cs.IT eess.SP

    Fluid Antenna for Mobile Edge Computing

    Authors: Yi** Zuo, Jiajia Guo, Biyun Sheng, Chen Dai, Fu Xiao, Shi **

    Abstract: In the evolving environment of mobile edge computing (MEC), optimizing system performance to meet the growing demand for low-latency computing services is a top priority. Integrating fluidic antenna (FA) technology into MEC networks provides a new approach to address this challenge. This letter proposes an FA-enabled MEC scheme that aims to minimize the total system delay by leveraging the mobilit… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

  17. arXiv:2403.10518  [pdf, other

    cs.CV cs.GR cs.SD eess.AS

    Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives

    Authors: Ronghui Li, YuXiang Zhang, Yachao Zhang, Hongwen Zhang, Jie Guo, Yan Zhang, Yebin Liu, Xiu Li

    Abstract: We propose Lodge, a network capable of generating extremely long dance sequences conditioned on given music. We design Lodge as a two-stage coarse to fine diffusion architecture, and propose the characteristic dance primitives that possess significant expressiveness as intermediate representations between two diffusion models. The first stage is global diffusion, which focuses on comprehending the… ▽ More

    Submitted 19 April, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024, Project page: https://li-ronghui.github.io/lodge

  18. arXiv:2403.07274  [pdf, other

    cs.IT eess.SP

    Achievable Rate Analysis and Optimization of Double-RIS Assisted Spatially Correlated MIMO with Statistical CSI

    Authors: Kaizhe Xu, Jiajia Guo, Jun Zhang, Shi **, Shaodan Ma

    Abstract: Reconfigurable intelligent surface (RIS) is a novel meta-material which can form a smart radio environment by dynamically altering reflection directions of the im**ing electromagnetic waves. In the prior literature, the inter-RIS links which also contribute to the performance of the whole system are usually neglected when multiple RISs are deployed. In this paper we investigate a general double-… ▽ More

    Submitted 11 March, 2024; originally announced March 2024.

  19. arXiv:2402.18332  [pdf, other

    eess.SP

    Recursive GNNs for Learning Precoding Policies with Size-Generalizability

    Authors: Jia Guo, Chenyang Yang

    Abstract: Graph neural networks (GNNs) have been shown promising in optimizing power allocation and link scheduling with good size generalizability and low training complexity. These merits are important for learning wireless policies under dynamic environments, which partially come from the matched permutation equivariance (PE) properties of the GNNs to the policies to be learned. Nonetheless, it has been… ▽ More

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 37 pages, 8 figures

  20. arXiv:2402.09048  [pdf, other

    eess.SP

    Sensing in Bi-Static ISAC Systems with Clock Asynchronism: A Signal Processing Perspective

    Authors: Kai Wu, Jacopo Pegoraro, Francesca Meneghello, J. Andrew Zhang, Jesus O. Lacruz, Joerg Widmer, Francesco Restuccia, Michele Rossi, Xiao**g Huang, Daqing Zhang, Giuseppe Caire, Y. Jay Guo

    Abstract: Integrated Sensing and Communication (ISAC) has been identified as a pillar usage scenario for the impending 6G era. Bi-static sensing, a major type of sensing in ISAC, is promising to expedite ISAC in the near future, as it requires minimal changes to the existing network infrastructure. However, a critical challenge for bi-static sensing is clock asynchronism due to the use of different clocks a… ▽ More

    Submitted 24 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: 20 pages, 6 figures, 1 table

  21. arXiv:2401.09904  [pdf, ps, other

    eess.SP

    Distributed Task-Oriented Communication Networks with Multimodal Semantic Relay and Edge Intelligence

    Authors: Jie Guo, Hao Chen, Bin Song, Yuhao Chi, Chau Yuen, Fei Richard Yu, Geoffrey Ye Li, Dusit Niyato

    Abstract: In this article, we present a novel framework, named distributed task-oriented communication networks (DTCN), based on recent advances in multimodal semantic transmission and edge intelligence. In DTCN, the multimodal knowledge of semantic relays and the adaptive adjustment capability of edge intelligence can be integrated to improve task performance. Specifically, we propose the key techniques in… ▽ More

    Submitted 19 January, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: 7 pages, 5 figures, 1 table, accepted by IEEE Communications Magazine

  22. arXiv:2401.09119  [pdf, other

    eess.SP

    Anchor-points Assisted Uplink Sensing in Perceptive Mobile Networks

    Authors: Yanmo Hu, J. Andrew Zhang, Weibo Deng, Y. Jay Guo

    Abstract: Uplink sensing in integrated sensing and communications (ISAC) systems, such as Perceptive Mobile Networks, is challenging due to the clock asynchronism between transmitter and receiver. Existing solutions typically require the presence of a dominating line-of-sight path and the knowledge of transmitter location at the receiver. In this paper, relaxing these requirements, we propose a novel and ef… ▽ More

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: 14 pages, 12 figures, journal paper

  23. arXiv:2401.09064  [pdf, other

    cs.IT eess.SP

    Performance Bounds and Optimization for CSI-Ratio based Bi-static Doppler Sensing in ISAC Systems

    Authors: Yanmo Hu, Kai Wu, J. Andrew Zhang, Weibo Deng, Y. Jay Guo

    Abstract: Bi-static sensing is crucial for exploring the potential of networked sensing capabilities in integrated sensing and communications (ISAC). However, it suffers from the challenging clock asynchronism issue. CSI ratio-based sensing is an effective means to address the issue. Its performance bounds, particular for Doppler sensing, have not been fully understood yet. This work endeavors to fill the r… ▽ More

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: 14 pages, 15 figures, journal paper

  24. arXiv:2401.06485  [pdf, other

    eess.AS cs.SD

    Contrastive Learning With Audio Discrimination For Customizable Keyword Spotting In Continuous Speech

    Authors: Yu Xi, Baochen Yang, Hao Li, Jiaqi Guo, Kai Yu

    Abstract: Customizable keyword spotting (KWS) in continuous speech has attracted increasing attention due to its real-world application potential. While contrastive learning (CL) has been widely used to extract keyword representations, previous CL approaches all operate on pre-segmented isolated words and employ only audio-text representations matching strategy. However, for KWS in continuous speech, co-art… ▽ More

    Submitted 12 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP2024

  25. UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction

    Authors: Jiaxin Guo, Minghan Wang, Xiaosong Qiao, Daimeng Wei, Hengchao Shang, Zongyao Li, Zhengzhe Yu, Yinglu Li, Chang Su, Min Zhang, Shimin Tao, Hao Yang

    Abstract: Error correction techniques have been used to refine the output sentences from automatic speech recognition (ASR) models and achieve a lower word error rate (WER). Previous works usually adopt end-to-end models and has strong dependency on Pseudo Paired Data and Original Paired Data. But when only pre-training on Pseudo Paired Data, previous models have negative effect on correction. While fine-tu… ▽ More

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: Accepted in ICASSP 2023

  26. arXiv:2401.05000  [pdf

    eess.SP

    Map** Information in Feature Extraction Transformation for Chirp Signal

    Authors: Shuyi Gu, Zhenghua Luo, Lin Hu, Yilin Zhang, Junxiong Guo

    Abstract: Chirp signals have established diverse applications caused by the capable of producing time-dependent linear frequencies. Most feature extraction transformation methods for chirp signals focus on enhancing the performance of transform methods but neglecting the information derived from the transformation process. Consequently, they may fail to fully exploit the information from observations, resul… ▽ More

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: 14 pages,10 figures

  27. Exploring Multi-Modal Control in Music-Driven Dance Generation

    Authors: Ronghui Li, Yuqin Dai, Yachao Zhang, Jun Li, Jian Yang, Jie Guo, Xiu Li

    Abstract: Existing music-driven 3D dance generation methods mainly concentrate on high-quality dance generation, but lack sufficient control during the generation process. To address these issues, we propose a unified framework capable of generating high-quality dance movements and supporting multi-modal control, including genre control, semantic control, and spatial control. First, we decouple the dance ge… ▽ More

    Submitted 1 January, 2024; originally announced January 2024.

  28. arXiv:2312.10892  [pdf, other

    eess.IV cs.CV q-bio.QM

    Deep Learning-based MRI Reconstruction with Artificial Fourier Transform (AFT)-Net

    Authors: Yanting Yang, Jeffery Siyuan Tian, Matthieu Dagommer, Jia Guo

    Abstract: The deep complex-valued neural network provides a powerful way to leverage complex number operations and representations, which has succeeded in several phase-based applications. However, most previously published networks have not fully accessed the impact of complex-valued networks in the frequency domain. Here, we introduced a unified complex-valued deep learning framework - artificial Fourier… ▽ More

    Submitted 17 December, 2023; originally announced December 2023.

  29. arXiv:2312.08343  [pdf

    eess.IV cs.CV q-bio.QM

    Enhancing CT Image synthesis from multi-modal MRI data based on a multi-task neural network framework

    Authors: Zhuoyao Xin, Christopher Wu, Dong Liu, Chunming Gu, Jia Guo, Jun Hua

    Abstract: Image segmentation, real-value prediction, and cross-modal translation are critical challenges in medical imaging. In this study, we propose a versatile multi-task neural network framework, based on an enhanced Transformer U-Net architecture, capable of simultaneously, selectively, and adaptively addressing these medical image tasks. Validation is performed on a public repository of human brain MR… ▽ More

    Submitted 17 December, 2023; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: 4 pages, 3 figures, 2 tables

  30. arXiv:2312.08267  [pdf, other

    eess.IV cs.CV q-bio.QM

    TABSurfer: a Hybrid Deep Learning Architecture for Subcortical Segmentation

    Authors: Aaron Cao, Vishwanatha M. Rao, Kejia Liu, Xinru Liu, Andrew F. Laine, Jia Guo

    Abstract: Subcortical segmentation remains challenging despite its important applications in quantitative structural analysis of brain MRI scans. The most accurate method, manual segmentation, is highly labor intensive, so automated tools like FreeSurfer have been adopted to handle this task. However, these traditional pipelines are slow and inefficient for processing large datasets. In this study, we propo… ▽ More

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: 5 pages, 3 figures, 2 tables

  31. arXiv:2312.05279  [pdf

    eess.IV cs.CV

    Quantitative perfusion maps using a novelty spatiotemporal convolutional neural network

    Authors: Anbo Cao, Pin-Yu Le, Zhonghui Qie, Haseeb Hassan, Yingwei Guo, Asim Zaman, Jiaxi Lu, Xueqiang Zeng, Huihui Yang, Xiaoqiang Miao, Taiyu Han, Guangtao Huang, Yan Kang, Yu Luo, Jia Guo

    Abstract: Dynamic susceptibility contrast magnetic resonance imaging (DSC-MRI) is widely used to evaluate acute ischemic stroke to distinguish salvageable tissue and infarct core. For this purpose, traditional methods employ deconvolution techniques, like singular value decomposition, which are known to be vulnerable to noise, potentially distorting the derived perfusion parameters. However, deep learning t… ▽ More

    Submitted 8 December, 2023; originally announced December 2023.

  32. arXiv:2312.05119  [pdf, other

    eess.IV cs.CV

    Quantifying white matter hyperintensity and brain volumes in heterogeneous clinical and low-field portable MRI

    Authors: Pablo Laso, Stefano Cerri, Annabel Sorby-Adams, Jennifer Guo, Farrah Mateen, Philipp Goebl, Jiaming Wu, Peirong Liu, Hongwei Li, Sean I. Young, Benjamin Billot, Oula Puonti, Gordon Sze, Sam Payabavash, Adam DeHavenon, Kevin N. Sheth, Matthew S. Rosen, John Kirsch, Nicola Strisciuglio, Jelmer M. Wolterink, Arman Eshaghi, Frederik Barkhof, W. Taylor Kimberly, Juan Eugenio Iglesias

    Abstract: Brain atrophy and white matter hyperintensity (WMH) are critical neuroimaging features for ascertaining brain injury in cerebrovascular disease and multiple sclerosis. Automated segmentation and quantification is desirable but existing methods require high-resolution MRI with good signal-to-noise ratio (SNR). This precludes application to clinical and low-field portable MRI (pMRI) scans, thus hamp… ▽ More

    Submitted 15 February, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

  33. arXiv:2311.02865  [pdf, other

    cs.IT eess.SP

    Geometrically-Shaped Constellation for Visible Light Communications at Short Blocklength

    Authors: Jia-Ning Guo, Ru-Han Chen, Jian Zhang, Longguang Li, Xu Yang, **g Zhou

    Abstract: In this paper, we present a general framework of designing geometrically shaped constellations for short-packet visible light communications with a peak- and an average-intensity constraints. By leveraging tools from large deviation theory, we first characterize the second-order asymptotics of the optimal constellation sha** region under aforementioned intensity constraints, which serves as a go… ▽ More

    Submitted 28 April, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

  34. arXiv:2310.15548  [pdf, ps, other

    eess.SP

    Knowledge-driven Meta-learning for CSI Feedback

    Authors: Han Xiao, Wenqiang Tian, Wendong Liu, Jiajia Guo, Zhi Zhang, Shi **, Zhihua Shi, Li Guo, Jia Shen

    Abstract: Accurate and effective channel state information (CSI) feedback is a key technology for massive multiple-input and multiple-output systems. Recently, deep learning (DL) has been introduced for CSI feedback enhancement through massive collected training data and lengthy training time, which is quite costly and impractical for realistic deployment. In this article, a knowledge-driven meta-learning a… ▽ More

    Submitted 25 October, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: arXiv admin note: text overlap with arXiv:2301.13475

  35. arXiv:2309.13018  [pdf, other

    eess.AS cs.CL cs.SD

    Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model

    Authors: Jiamin Xie, Ke Li, **xi Guo, Andros Tjandra, Yuan Shangguan, Leda Sari, Chunyang Wu, Junteng Jia, Jay Mahadeokar, Ozlem Kalinli

    Abstract: Neural network pruning offers an effective method for compressing a multilingual automatic speech recognition (ASR) model with minimal performance loss. However, it entails several rounds of pruning and re-training needed to be run for each language. In this work, we propose the use of an adaptive masking approach in two scenarios for pruning a multilingual ASR model efficiently, each resulting in… ▽ More

    Submitted 11 January, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

  36. arXiv:2309.10575  [pdf, ps, other

    cs.IT eess.SY

    AI/ML for Beam Management in 5G-Advanced

    Authors: Qing Xue, Jiajia Guo, Binggui Zhou, Yongjun Xu, Zhidu Li, Shaodan Ma

    Abstract: In beamformed wireless cellular systems such as 5G New Radio (NR) networks, beam management (BM) is a crucial operation. In the second phase of 5G NR standardization, known as 5G-Advanced, which is being vigorously promoted, the key component is the use of artificial intelligence (AI) based on machine learning (ML) techniques. AI/ML for BM is selected as a representative use case. This article pro… ▽ More

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: 4 figures

  37. arXiv:2309.06981  [pdf, other

    cs.CR cs.AI cs.LG cs.SD eess.AS

    MASTERKEY: Practical Backdoor Attack Against Speaker Verification Systems

    Authors: Hanqing Guo, Xun Chen, Junfeng Guo, Li Xiao, Qiben Yan

    Abstract: Speaker Verification (SV) is widely deployed in mobile systems to authenticate legitimate users by using their voice traits. In this work, we propose a backdoor attack MASTERKEY, to compromise the SV models. Different from previous attacks, we focus on a real-world practical setting where the attacker possesses no knowledge of the intended victim. To design MASTERKEY, we investigate the limitation… ▽ More

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: Accepted by Mobicom 2023

  38. arXiv:2309.06681  [pdf

    eess.IV cs.AI

    A plug-and-play synthetic data deep learning for undersampled magnetic resonance image reconstruction

    Authors: Min Xiao, Zi Wang, Jiefeng Guo, Xiaobo Qu

    Abstract: Magnetic resonance imaging (MRI) plays an important role in modern medical diagnostic but suffers from prolonged scan time. Current deep learning methods for undersampled MRI reconstruction exhibit good performance in image de-aliasing which can be tailored to the specific k-space undersampling scenario. But it is very troublesome to configure different deep networks when the sampling setting chan… ▽ More

    Submitted 8 October, 2023; v1 submitted 12 September, 2023; originally announced September 2023.

    Comments: 5 pages, 3 figures

  39. arXiv:2309.05423  [pdf, other

    eess.AS cs.AI cs.CL cs.SD

    Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP

    Authors: **zuomu Zhong, Yang Li, Hui Huang, Korin Richmond, Jie Liu, Zhiba Su, **g Guo, Benlai Tang, Fengjie Zhu

    Abstract: In expressive and controllable Text-to-Speech (TTS), explicit prosodic features significantly improve the naturalness and controllability of synthesised speech. However, manual prosody annotation is labor-intensive and inconsistent. To address this issue, a two-stage automatic annotation pipeline is novelly proposed in this paper. In the first stage, we use contrastive pretraining of Speech-Silenc… ▽ More

    Submitted 11 June, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

  40. arXiv:2309.04976  [pdf, other

    cs.LG cs.AI eess.SY

    AVARS -- Alleviating Unexpected Urban Road Traffic Congestion using UAVs

    Authors: Jiaying Guo, Michael R. Jones, Soufiene Djahel, Shen Wang

    Abstract: Reducing unexpected urban traffic congestion caused by en-route events (e.g., road closures, car crashes, etc.) often requires fast and accurate reactions to choose the best-fit traffic signals. Traditional traffic light control systems, such as SCATS and SCOOT, are not efficient as their traffic data provided by induction loops has a low update frequency (i.e., longer than 1 minute). Moreover, th… ▽ More

    Submitted 10 September, 2023; originally announced September 2023.

  41. Reconfigurable Intelligent Surface Enabled Joint Backscattering and Communication

    Authors: **qiu Zhao, Jia Ye Shuaishuai Guo, Zhiquan Bai, Di Zhou, Abeer Mohamed

    Abstract: Reconfigurable intelligent surface (RIS) as an essential topic in the sixth-generation (6G) communications aims to enhance communication performance or mitigate undesired transmission. However, the controllability of each reflecting element on RIS also enables it to act as a passive backscatter device (BD) and transmit its information to reader devices. In this paper, we propose a RIS-enabled join… ▽ More

    Submitted 20 August, 2023; originally announced August 2023.

    Comments: 11 pages, 8 figures, published to IEEE TVT

    Journal ref: IEEE Transactions on Vehicular Technology, 2023

  42. arXiv:2308.07991  [pdf, other

    eess.SP cs.NI

    Demo: Reconfigurable Distributed Antennas and Reflecting Surface (RDARS)-aided Integrated Sensing and Communication System

    Authors: **tao Wang, Chengwang Ji, Jiajia Guo, Shaodan Ma

    Abstract: Integrated sensing and communication (ISAC) system has been envisioned as a promising technology to be applied in future applications requiring both communication and high-accuracy sensing. Different from most research focusing on theoretical analysis and optimization in the area of ISAC, we implement a reconfigurable distributed antennas and reflecting surfaces (RDARS)-aided ISAC system prototype… ▽ More

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: 2 pages, 3 figures. Accepted by IEEE/CIC International Conference on Communications in China, Dalian, China, 2023

  43. A Survey of Beam Management for mmWave and THz Communications Towards 6G

    Authors: Qing Xue, Chengwang Ji, Shaodan Ma, Jiajia Guo, Yongjun Xu, Qianbin Chen, Wei Zhang

    Abstract: Communication in millimeter wave (mmWave) and even terahertz (THz) frequency bands is ushering in a new era of wireless communications. Beam management, namely initial access and beam tracking, has been recognized as an essential technique to ensure robust mmWave/THz communications, especially for mobile scenarios. However, narrow beams at higher carrier frequency lead to huge beam measurement ove… ▽ More

    Submitted 6 February, 2024; v1 submitted 4 August, 2023; originally announced August 2023.

    Comments: accepted by IEEE Communications Surveys & Tutorials

  44. arXiv:2308.01146  [pdf, other

    cs.CV eess.IV

    UCDFormer: Unsupervised Change Detection Using a Transformer-driven Image Translation

    Authors: Qingsong Xu, Yilei Shi, Jianhua Guo, Chaojun Ouyang, Xiao Xiang Zhu

    Abstract: Change detection (CD) by comparing two bi-temporal images is a crucial task in remote sensing. With the advantages of requiring no cumbersome labeled change information, unsupervised CD has attracted extensive attention in the community. However, existing unsupervised CD approaches rarely consider the seasonal and style differences incurred by the illumination and atmospheric conditions in multi-t… ▽ More

    Submitted 2 August, 2023; originally announced August 2023.

    Comments: 16 pages, 7 figures, IEEE Transactions on Geoscience and Remote Sensing

  45. arXiv:2307.13220  [pdf

    eess.IV cs.AI physics.med-ph

    One for Multiple: Physics-informed Synthetic Data Boosts Generalizable Deep Learning for Fast MRI Reconstruction

    Authors: Zi Wang, Xiaotong Yu, Chengyan Wang, Weibo Chen, Jiazheng Wang, Ying-Hua Chu, Hongwei Sun, Rushuai Li, Peiyong Li, Fan Yang, Haiwei Han, Taishan Kang, Jianzhong Lin, Chen Yang, Shufu Chang, Zhang Shi, Sha Hua, Yan Li, Juan Hu, Liuhong Zhu, Jianjun Zhou, Mei**g Lin, Jiefeng Guo, Congbo Cai, Zhong Chen , et al. (3 additional authors not shown)

    Abstract: Magnetic resonance imaging (MRI) is a widely used radiological modality renowned for its radiation-free, comprehensive insights into the human body, facilitating medical diagnoses. However, the drawback of prolonged scan times hinders its accessibility. The k-space undersampling offers a solution, yet the resultant artifacts necessitate meticulous removal during image reconstruction. Although Deep… ▽ More

    Submitted 28 February, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

    Comments: 38 pages, 19 figures, 5 tables

  46. arXiv:2307.12799  [pdf, other

    eess.SY eess.SP

    Outage Performance of Multi-tier UAV Communication with Random Beam Misalignment

    Authors: Weihao Wang, Zesong Fei, **g Guo, Salman Durrani, Halim Yanikomeroglu

    Abstract: By exploiting the degree of freedom on the altitude, unmanned aerial vehicle (UAV) communication can provide ubiquitous communication for future wireless networks. In the case of concurrent transmission of multiple UAVs, the directional beamforming formed by multiple antennas is an effective way to reduce co-channel interference. However, factors such as airflow disturbance or estimation error for… ▽ More

    Submitted 24 July, 2023; originally announced July 2023.

  47. arXiv:2307.12480  [pdf, other

    cs.LG eess.SP

    Learning Resource Allocation Policy: Vertex-GNN or Edge-GNN?

    Authors: Yao Peng, Jia Guo, Chenyang Yang

    Abstract: Graph neural networks (GNNs) update the hidden representations of vertices (called Vertex-GNNs) or hidden representations of edges (called Edge-GNNs) by processing and pooling the information of neighboring vertices and edges and combining to exploit topology information. When learning resource allocation policies, GNNs cannot perform well if their expressive power is weak, i.e., if they cannot di… ▽ More

    Submitted 23 December, 2023; v1 submitted 23 July, 2023; originally announced July 2023.

  48. arXiv:2307.11795  [pdf, other

    eess.AS cs.AI cs.CL cs.LG

    Prompting Large Language Models with Speech Recognition Abilities

    Authors: Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Junteng Jia, Yuan Shangguan, Ke Li, **xi Guo, Wenhan Xiong, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer

    Abstract: Large language models have proven themselves highly flexible, able to solve a wide range of generative tasks, such as abstractive summarization and open-ended question answering. In this paper we extend the capabilities of LLMs by directly attaching a small audio encoder allowing it to perform speech recognition. By directly prepending a sequence of audial embeddings to the text token embeddings,… ▽ More

    Submitted 21 July, 2023; originally announced July 2023.

  49. arXiv:2307.10824  [pdf, other

    eess.IV cs.CV

    Parse and Recall: Towards Accurate Lung Nodule Malignancy Prediction like Radiologists

    Authors: Jianpeng Zhang, Xianghua Ye, Jianfeng Zhang, Yuxing Tang, Minfeng Xu, Jianfei Guo, Xin Chen, Zaiyi Liu, **gren Zhou, Le Lu, Ling Zhang

    Abstract: Lung cancer is a leading cause of death worldwide and early screening is critical for improving survival outcomes. In clinical practice, the contextual structure of nodules and the accumulated experience of radiologists are the two core elements related to the accuracy of identification of benign and malignant nodules. Contextual information provides comprehensive information about nodules such as… ▽ More

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: MICCAI 2023

  50. arXiv:2307.07366  [pdf, other

    eess.IV

    Reconstructing Three-decade Global Fine-Grained Nighttime Light Observations by a New Super-Resolution Framework

    Authors: **yu Guo, Feng Zhang, Hang Zhao, Baoxiang Pan, Linlu Mei

    Abstract: Satellite-collected nighttime light provides a unique perspective on human activities, including urbanization, population growth, and epidemics. Yet, long-term and fine-grained nighttime light observations are lacking, leaving the analysis and applications of decades of light changes in urban facilities undeveloped. To fill this gap, we developed an innovative framework and used it to design a new… ▽ More

    Submitted 14 July, 2023; originally announced July 2023.