
Showing 1–50 of 453 results for author: Li, C

Searching in archive eess.
  1. arXiv:2407.01018  [pdf, other]

    cs.IT eess.SP

    Experimental Comparison of Average-Power Constrained and Peak-Power Constrained 64QAM under Optimal Clipping in 400Gbps Unamplified Coherent Links

    Authors: Wing-Chau Ng, Chuandong Li

    Abstract: We experimentally demonstrated an end-to-end link budget optimization over clipping in 400Gbps unamplified links, showing that the clipped MB distribution outperforms the peak-power constrained 64QAM by 1dB link budget.

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Submitted to European Conference on Optical Communications (ECOC) 2024

  2. arXiv:2406.18009  [pdf, other]

    eess.AS cs.SD

    E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

    Authors: Sefik Emre Eskimez, Xiaofei Wang, Manthan Thakker, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Hemin Yang, Zirun Zhu, Min Tang, Xu Tan, Yanqing Liu, Sheng Zhao, Naoyuki Kanda

    Abstract: This paper introduces Embarrassingly Easy Text-to-Speech (E2 TTS), a fully non-autoregressive zero-shot text-to-speech system that offers human-level naturalness and state-of-the-art speaker similarity and intelligibility. In the E2 TTS framework, the text input is converted into a character sequence with filler tokens. The flow-matching-based mel spectrogram generator is then trained based on the…

    Submitted 25 June, 2024; originally announced June 2024.

  3. arXiv:2406.13979  [pdf, other]

    eess.IV cs.CV cs.LG

    Knowledge-driven Subspace Fusion and Gradient Coordination for Multi-modal Learning

    Authors: Yupei Zhang, Xiaofei Wang, Fangliangzi Meng, ** Tang, Chao Li

    Abstract: Multi-modal learning plays a crucial role in cancer diagnosis and prognosis. Current deep learning based multi-modal approaches are often limited by their abilities to model the complex correlations between genomics and histology data, addressing the intrinsic complexity of tumour ecosystem where both tumour and microenvironment contribute to malignancy. We propose a biologically interpretative an…

    Submitted 20 June, 2024; originally announced June 2024.

  4. arXiv:2406.13471  [pdf, other]

    eess.AS cs.SD

    Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement

    Authors: Chenda Li, Samuele Cornell, Shinji Watanabe, Yanmin Qian

    Abstract: Diffusion-based generative models (DGMs) have recently attracted attention in speech enhancement research (SE) as previous works showed a remarkable generalization capability. However, DGMs are also computationally intensive, as they usually require many iterations in the reverse diffusion process (RDP), making them impractical for streaming SE systems. In this paper, we propose to use discriminat…

    Submitted 19 June, 2024; originally announced June 2024.

  5. arXiv:2406.12596  [pdf, ps, other]

    eess.SP

    Beyond Near-Field: Far-Field Location Division Multiple Access in Downlink MIMO Systems

    Authors: Haoyan Liu, Caijian Jie, Min Yang, Chengguang Li

    Abstract: Exploring channel dimensions has been the driving force behind breakthroughs in successive generations of mobile communication systems. In 5G, space division multiple access (SDMA) leveraging massive MIMO has been crucial in enhancing system capacity through spatial differentiation of users. However, SDMA can only finely distinguish users at adjacent angles in ultra-dense networks by extremely lar…

    Submitted 18 June, 2024; originally announced June 2024.

  6. arXiv:2406.11519  [pdf, other]

    cs.CV eess.IV

    HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

    Authors: Di Wang, Meiqi Hu, Yao **, Yuchun Miao, Jiaqi Yang, Yichu Xu, Xiaolei Qin, Jiaqi Ma, Lingyu Sun, Chenxing Li, Chuan Fu, Hongruixuan Chen, Chengxi Han, Naoto Yokoya, **g Zhang, Minqiang Xu, Lin Liu, Lefei Zhang, Chen Wu, Bo Du, Dacheng Tao, Liangpei Zhang

    Abstract: Foundation models (FMs) are revolutionizing the analysis and understanding of remote sensing (RS) scenes, including aerial RGB, multispectral, and SAR images. However, hyperspectral images (HSIs), which are rich in spectral information, have not seen much application of FMs, with existing methods often restricted to specific tasks and lacking generality. To fill this gap, we introduce HyperSIGMA,…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: The code and models will be released at https://github.com/WHU-Sigma/HyperSIGMA

  7. arXiv:2406.09356  [pdf, other]

    cs.CV eess.IV

    CMC-Bench: Towards a New Paradigm of Visual Signal Compression

    Authors: Chunyi Li, Xiele Wu, Haoning Wu, Donghui Feng, Zicheng Zhang, Guo Lu, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai, Weisi Lin

    Abstract: Ultra-low bitrate image compression is a challenging and demanding topic. With the development of Large Multimodal Models (LMMs), a Cross Modality Compression (CMC) paradigm of Image-Text-Image has emerged. Compared with traditional codecs, this semantic-level compression can reduce image data size to 0.1\% or even lower, which has strong potential applications. However, CMC has certain defects in…

    Submitted 13 June, 2024; originally announced June 2024.

  8. arXiv:2406.09304  [pdf]

    physics.app-ph eess.SP

    Self-reconfigurable Multifunctional Memristive Nociceptor for Intelligent Robotics

    Authors: Shengbo Wang, Mingchao Fang, Lekai Song, Cong Li, Jian Zhang, Arokia Nathan, Guohua Hu, Shuo Gao

    Abstract: Artificial nociceptors, mimicking human-like stimuli perception, are of significance for intelligent robotics to work in hazardous and dynamic scenarios. One of the most essential characteristics of the human nociceptor is its self-adjustable attribute, which indicates that the threshold of determination of a potentially hazardous stimulus relies on environmental knowledge. This critical attribute…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 14 pages, 4 figures

  9. arXiv:2406.08634  [pdf, other]

    eess.IV cs.CV cs.LG

    Unveiling Incomplete Modality Brain Tumor Segmentation: Leveraging Masked Predicted Auto-Encoder and Divergence Learning

    Authors: Zhongao Sun, Jiameng Li, Yuhan Wang, Jiarong Cheng, Qing Zhou, Chun Li

    Abstract: Brain tumor segmentation remains a significant challenge, particularly in the context of multi-modal magnetic resonance imaging (MRI) where missing modality images are common in clinical settings, leading to reduced segmentation accuracy. To address this issue, we propose a novel strategy, which is called masked predicted pre-training, enabling robust feature learning from incomplete modality data…

    Submitted 12 June, 2024; originally announced June 2024.

  10. arXiv:2406.07773  [pdf]

    eess.IV

    Development of Focused X-ray Luminescence Compute Tomography Imaging

    Authors: Yile Fang, Yibing Zhang, Changqing Li

    Abstract: X-ray luminescence is produced when contrast agents absorb energy from X-ray photons and release a portion of that energy by emitting photons in the visible and near-infrared range. X-ray luminescence computed tomography (XLCT) was introduced in the past decade as a hybrid molecular imaging modality combining the merits of both X-ray imaging (high spatial resolution) and optical imaging (high sens…

    Submitted 11 June, 2024; originally announced June 2024.

  11. arXiv:2406.06434  [pdf, ps, other]

    eess.IV cs.CV

    Spatiotemporal Graph Neural Network Modelling Perfusion MRI

    Authors: Ruodan Yan, Carola-Bibiane Schönlieb, Chao Li

    Abstract: Perfusion MRI (pMRI) offers valuable insights into tumor vascularity and promises to predict tumor genotypes, thus benefiting prognosis for glioma patients, yet effective models tailored to 4D pMRI are still lacking. This study presents the first attempt to model 4D pMRI using a GNN-based spatiotemporal model PerfGAT, integrating spatial information and temporal kinetics to predict Isocitrate DeHy…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 11 pages, 2 figures

  12. arXiv:2406.06375  [pdf, other]

    cs.SD cs.AI eess.AS

    MOSA: Music Motion with Semantic Annotation Dataset for Cross-Modal Music Processing

    Authors: Yu-Fen Huang, Nikki Moran, Simon Coleman, Jon Kelly, Shun-Hwa Wei, Po-Yin Chen, Yun-Hsin Huang, Tsung-** Chen, Yu-Chia Kuo, Yu-Chi Wei, Chih-Hsuan Li, Da-Yu Huang, Hsuan-Kai Kao, Ting-Wei Lin, Li Su

    Abstract: In cross-modal music processing, translation between visual, auditory, and semantic content opens up new possibilities as well as challenges. The construction of such a transformative scheme depends upon a benchmark corpus with a comprehensive data infrastructure. In particular, the assembly of a large-scale cross-modal dataset presents major challenges. In this paper, we present the MOSA (Music m…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024. 14 pages, 7 figures. Dataset is available on: https://github.com/yufenhuang/MOSA-Music-mOtion-and-Semantic-Annotation-dataset/tree/main and https://zenodo.org/records/11393449

  13. arXiv:2406.04660  [pdf, other]

    eess.AS cs.SD

    URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement

    Authors: Wangyou Zhang, Robin Scheibler, Kohei Saijo, Samuele Cornell, Chenda Li, Zhaoheng Ni, Anurag Kumar, Jan Pirklbauer, Marvin Sach, Shinji Watanabe, Tim Fingscheidt, Yanmin Qian

    Abstract: The last decade has witnessed significant advancements in deep learning-based speech enhancement (SE). However, most existing SE research has limitations on the coverage of SE sub-tasks, data diversity and amount, and evaluation metrics. To fill this gap and promote research toward universal SE, we establish a new SE challenge, named URGENT, to focus on the universality, robustness, and generaliza…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 6 pages, 3 figures, 3 tables. Accepted by Interspeech 2024. An extended version of the accepted manuscript with appendix

  14. arXiv:2406.04350  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Prompt-guided Precise Audio Editing with Diffusion Models

    Authors: Manjie Xu, Chenxing Li, Duzhen zhang, Dan Su, Wei Liang, Dong Yu

    Abstract: Audio editing involves the arbitrary manipulation of audio content through precise control. Although text-guided diffusion models have made significant advancements in text-to-audio generation, they still face challenges in finding a flexible and precise way to modify target events within an audio track. We present a novel approach, referred to as PPAE, which serves as a general module for diffusi…

    Submitted 11 May, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024

  15. arXiv:2406.04281  [pdf, other]

    eess.AS

    Total-Duration-Aware Duration Modeling for Text-to-Speech Systems

    Authors: Sefik Emre Eskimez, Xiaofei Wang, Manthan Thakker, Chung-Hsien Tsai, Canrun Li, Zhen Xiao, Hemin Yang, Zirun Zhu, Min Tang, **yu Li, Sheng Zhao, Naoyuki Kanda

    Abstract: Accurate control of the total duration of generated speech by adjusting the speech rate is crucial for various text-to-speech (TTS) applications. However, the impact of adjusting the speech rate on speech quality, such as intelligibility and speaker characteristics, has been underexplored. In this work, we propose a novel total-duration-aware (TDA) duration model for TTS, where phoneme durations a…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  16. arXiv:2406.04269  [pdf, other]

    eess.AS cs.SD

    Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement

    Authors: Wangyou Zhang, Kohei Saijo, Jee-weon Jung, Chenda Li, Shinji Watanabe, Yanmin Qian

    Abstract: Deep learning-based speech enhancement (SE) models have achieved impressive performance in the past decade. Numerous advanced architectures have been designed to deliver state-of-the-art performance; however, their scalability potential remains unrevealed. Meanwhile, the majority of research focuses on small-sized datasets with restricted diversity, leading to a plateau in performance improvement.…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 5 pages, 3 figures, 4 tables, Accepted by Interspeech 2024

  17. arXiv:2406.03002  [pdf, other]

    eess.IV cs.CV

    Phy-Diff: Physics-guided Hourglass Diffusion Model for Diffusion MRI Synthesis

    Authors: Juanhua Zhang, Ruodan Yan, Alessandro Perelli, Xi Chen, Chao Li

    Abstract: Diffusion MRI (dMRI) is an important neuroimaging technique with high acquisition costs. Deep learning approaches have been used to enhance dMRI and predict diffusion biomarkers through undersampled dMRI. To generate more comprehensive raw dMRI, generative adversarial network based methods are proposed to include b-values and b-vectors as conditions, but they are limited by unstable training and l…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by MICCAI 2024

  18. arXiv:2406.02918  [pdf, other]

    eess.IV cs.CV

    U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation

    Authors: Chenxin Li, Xinyu Liu, Wuyang Li, Cheng Wang, Hengyu Liu, Yixuan Yuan

    Abstract: U-Net has become a cornerstone in various visual applications such as image segmentation and diffusion probability models. While numerous innovative designs and improvements have been introduced by incorporating transformers or MLPs, the networks are still limited to linearly modeling patterns as well as the deficient interpretability. To address these challenges, our intuition is inspired by the…

    Submitted 6 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  19. arXiv:2406.02430  [pdf, other]

    eess.AS cs.SD

    Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

    Authors: Philip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying Huang, Yuanyuan Huo, Dongya Jia, Chumin Li, Feiya Li, Hui Li, Jiaxin Li, Xiaoyang Li, Xingxing Li, Lin Liu, Shouda Liu, Sichao Liu, et al. (21 additional authors not shown)

    Abstract: We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and sub…

    Submitted 4 June, 2024; originally announced June 2024.

  20. arXiv:2406.00974  [pdf, other]

    eess.SY

    Large Language Model Assisted Optimal Bidding of BESS in FCAS Market: An AI-agent based Approach

    Authors: Borui Zhang, Chaojie Li, Guo Chen, Zhaoyang Dong

    Abstract: To incentivize flexible resources such as Battery Energy Storage Systems (BESSs) to offer Frequency Control Ancillary Services (FCAS), Australia's National Electricity Market (NEM) has implemented changes in recent years towards shorter-term bidding rules and faster service requirements. However, firstly, existing bidding optimization methods often overlook or oversimplify the key aspects of FCAS…

    Submitted 3 June, 2024; originally announced June 2024.

  21. arXiv:2405.17716  [pdf, ps, other]

    eess.SP

    Soft Multipath Information-Based UWB Tracking in Cluttered Scenarios: Preliminaries and Validations

    Authors: Chenglong Li, Zukun Lu, Long Huang, Shaojie Ni, Guangfu Sun, Emmeric Tanghe, Wout Joseph

    Abstract: In this paper, we investigate ultra-wideband (UWB) localization and tracking in cluttered environments. Instead of mitigating the multipath, we exploit the specular reflections to enhance the localizability and improve the positioning accuracy. With the assistance of the multipath, it is also possible to achieve localization purposes using fewer anchors or when the line-of-sight propagations are b…

    Submitted 28 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  22. arXiv:2405.16664  [pdf]

    eess.SP physics.med-ph

    Deep learning improved autofocus for motion artifact reduction and its application in quantitative susceptibility mapping

    Authors: Chao Li, **wei Zhang, Hang Zhang, Jiahao Li, Pascal Spincemaille, Thanh D. Nguyen, Yi Wang

    Abstract: Purpose: To develop a pipeline for motion artifact correction in mGRE and quantitative susceptibility mapping (QSM). Methods: Deep learning is integrated with autofocus to improve motion artifact suppression, which is applied QSM of patients with Parkinson's disease (PD). The estimation of affine motion parameters in the autofocus method depends on signal-to-noise ratio and lacks accuracy when dat…

    Submitted 26 May, 2024; originally announced May 2024.

  23. arXiv:2405.15863  [pdf, other]

    cs.SD cs.AI eess.AS

    Quality-aware Masked Diffusion Transformer for Enhanced Music Generation

    Authors: Chang Li, Ruoyu Wang, Lijuan Liu, Jun Du, Yixuan Sun, Zilu Guo, Zhenrong Zhang, Yuan Jiang

    Abstract: In recent years, diffusion-based text-to-music (TTM) generation has gained prominence, offering a novel approach to synthesizing musical content from textual descriptions. Achieving high accuracy and diversity in this generation process requires extensive, high-quality data, which often constitutes only a fraction of available datasets. Within open-source datasets, the prevalence of issues like mi…

    Submitted 24 May, 2024; originally announced May 2024.

  24. arXiv:2405.07994  [pdf]

    eess.IV cs.AI cs.CV cs.LG

    BubbleID: A Deep Learning Framework for Bubble Interface Dynamics Analysis

    Authors: Christy Dunlap, Changgen Li, Hari Pandey, Ngan Le, Han Hu

    Abstract: This paper presents BubbleID, a sophisticated deep learning architecture designed to comprehensively identify both static and dynamic attributes of bubbles within sequences of boiling images. By amalgamating segmentation powered by Mask R-CNN with SORT-based tracking techniques, the framework is capable of analyzing each bubble's location, dimensions, interface shape, and velocity over its lifetim…

    Submitted 20 March, 2024; originally announced May 2024.

    Comments: 16 pages, 4 figures

  25. arXiv:2405.07216  [pdf, other]

    eess.SY

    Magnetic-Guided Flexible Origami Robot toward Long-Term Phototherapy of H. pylori in the Stomach

    Authors: Sishen Yuan, Baijia Liang, Po Wa Wong, Ming**g Xu, Chi Hsuan Li, Zhen Li, Hongliang Ren

    Abstract: Helicobacter pylori, a pervasive bacterial infection associated with gastrointestinal disorders such as gastritis, peptic ulcer disease, and gastric cancer, impacts approximately 50% of the global population. The efficacy of standard clinical eradication therapies is diminishing due to the rise of antibiotic-resistant strains, necessitating alternative treatment strategies. Photodynamic therapy (P…

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: IEEE ICRA 2024

  26. arXiv:2405.06999  [pdf, other]

    eess.SY

    Large Language Model-aided Edge Learning in Distribution System State Estimation

    Authors: Renyou Xie, Xin Yin, Chaojie Li, Nian Liu, Bo Zhao, Zhaoyang Dong

    Abstract: Distribution system state estimation (DSSE) plays a crucial role in the real-time monitoring, control, and operation of distribution networks. Besides intensive computational requirements, conventional DSSE methods need high-quality measurements to obtain accurate states, whereas missing values often occur due to sensor failures or communication delays. To address these challenging issues, a forec…

    Submitted 11 May, 2024; originally announced May 2024.

  27. arXiv:2405.04867  [pdf, other]

    eess.IV cs.CV

    MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

    Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhi**g Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Hai** Zeng, Kai Feng, et al. (24 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

  28. arXiv:2405.04295  [pdf, other]

    eess.IV cs.CV

    Semi-Supervised Disease Classification based on Limited Medical Image Data

    Authors: Yan Zhang, Chun Li, Zhaoxia Liu, Ming Li

    Abstract: In recent years, significant progress has been made in the field of learning from positive and unlabeled examples (PU learning), particularly in the context of advancing image and text classification tasks. However, applying PU learning to semi-supervised disease classification remains a formidable challenge, primarily due to the limited availability of labeled medical images. In the realm of medi…

    Submitted 7 May, 2024; originally announced May 2024.

  29. arXiv:2404.17926  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Pre-training on High Definition X-ray Images: An Experimental Study

    Authors: Xiao Wang, Yuehang Li, Wentao Wu, Jiandong **, Yao Rong, Bo Jiang, Chuanfu Li, ** Tang

    Abstract: Existing X-ray based pre-trained vision models are usually conducted on a relatively small-scale dataset (less than 500k samples) with limited resolution (e.g., 224 $\times$ 224). However, the key to the success of self-supervised pre-training large models lies in massive training data, and maintaining high resolution in the field of X-ray images is the guarantee of effective solutions to difficul…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Technology Report

  30. arXiv:2404.12973  [pdf, other]

    eess.IV cs.CV cs.LG q-bio.QM

    Cross-modal Diffusion Modelling for Super-resolved Spatial Transcriptomics

    Authors: Xiaofei Wang, Xingxu Huang, Stephen J. Price, Chao Li

    Abstract: The recent advancement of spatial transcriptomics (ST) allows to characterize spatial gene expression within tissue for discovery research. However, current ST platforms suffer from low resolution, hindering in-depth understanding of spatial gene expression. Super-resolution approaches promise to enhance ST maps by integrating histology images with gene expressions of profiled tissue spots. Howeve…

    Submitted 27 May, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  31. arXiv:2404.11275  [pdf, other]

    cs.SD eess.AS

    Jointly Recognizing Speech and Singing Voices Based on Multi-Task Audio Source Separation

    Authors: Ye Bai, Chenxing Li, Hao Li, Yuanyuan Zhao, Xiaorui Wang

    Abstract: In short video and live broadcasts, speech, singing voice, and background music often overlap and obscure each other. This complexity creates difficulties in structuring and recognizing the audio content, which may impair subsequent ASR and music understanding applications. This paper proposes a multi-task audio source separation (MTASS) based ASR model called JRSV, which Jointly Recognizes Speech…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by ICME 2024

  32. arXiv:2404.10180  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE eess.AS

    Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR

    Authors: Zelin Wu, Gan Song, Christopher Li, Pat Rondon, Zhong Meng, Xavier Velez, Weiran Wang, Diamantino Caseiro, Golan Pundak, Tsendsuren Munkhdalai, Angad Chandorkar, Rohit Prabhavalkar

    Abstract: Contextual biasing enables speech recognizers to transcribe important phrases in the speaker's context, such as contact names, even if they are rare in, or absent from, the training data. Attention-based biasing is a leading approach which allows for full end-to-end cotraining of the recognizer and biasing system and requires no separate inference-time components. Such biasers typically consist of…

    Submitted 23 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: 9 pages, 3 figures, accepted by NAACL 2024 - Industry Track

    Journal ref: 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics - Industry Track

  33. arXiv:2404.05418  [pdf, other]

    eess.SY

    Joint Active and Passive Beamforming for IRS-Aided Wireless Energy Transfer Network Exploiting One-Bit Feedback

    Authors: Taotao Ji, Meng Hua, Chunguo Li, Yongming Huang, Luxi Yang

    Abstract: To reap the active and passive beamforming gain in an intelligent reflecting surface (IRS)-aided wireless network, a typical way is to first acquire the channel state information (CSI) relying on the pilot signal, and then perform the joint beamforming design. However, it is a great challenge when the receiver can neither send pilot signals nor have complex signal processing capabilities due to it…

    Submitted 8 April, 2024; originally announced April 2024.

  34. arXiv:2404.05149  [pdf, other]

    cs.ET eess.SP

    Intelligent Reflecting Surface Aided Target Localization With Unknown Transceiver-IRS Channel State Information

    Authors: Taotao Ji, Meng Hua, Xuanhong Yan, Chunguo Li, Yongming Huang, Luxi Yang

    Abstract: Integrating wireless sensing capabilities into base stations (BSs) has become a widespread trend in the future beyond fifth-generation (B5G)/sixth-generation (6G) wireless networks. In this paper, we investigate intelligent reflecting surface (IRS) enabled wireless localization, in which an IRS is deployed to assist a BS in locating a target in its non-line-of-sight (NLoS) region. In particular, w…

    Submitted 7 April, 2024; originally announced April 2024.

  35. arXiv:2404.03209  [pdf, other]

    eess.IV

    CSR-dMRI: Continuous Super-Resolution of Diffusion MRI with Anatomical Structure-assisted Implicit Neural Representation Learning

    Authors: Ruoyou Wu, Jian Cheng, Cheng Li, Juan Zou, **g Yang, Wenxin Fan, Shanshan Wang

    Abstract: Deep learning-based dMRI super-resolution methods can effectively enhance image resolution by leveraging the learning capabilities of neural networks on large datasets. However, these methods tend to learn a fixed scale mapping between low-resolution (LR) and high-resolution (HR) images, overlooking the need for radiologists to scale the images at arbitrary resolutions. Moreover, the pixel-wise lo…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: 10 pages

  36. arXiv:2404.01671  [pdf]

    eess.IV

    Automating Vessel Segmentation in the Heart and Brain: A Trend to Develop Multi-Modality and Label-Efficient Deep Learning Techniques

    Authors: Nazik Elsayed, Yousuf Babiker M. Osman, Cheng Li, Jiong Zhang, Shanshan Wang

    Abstract: Cardio-cerebrovascular diseases are the leading causes of mortality worldwide, whose accurate blood vessel segmentation is significant for both scientific research and clinical usage. However, segmenting cardio-cerebrovascular structures from medical images is very challenging due to the presence of thin or blurred vascular shapes, imbalanced distribution of vessel and non-vessel pixels, and inter…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 13 pages with 4 figures and 2 tables

  37. arXiv:2404.00780  [pdf, ps, other]

    eess.SP

    Cooperative Gradient Coding for Collaborative Federated Learning

    Authors: Shudi Weng, Chengxi Li, Ming Xiao, Mikael Skoglund

    Abstract: We investigate federated learning (FL) in the presence of stragglers, with emphasis on wireless scenarios where the power-constrained edge devices collaboratively train a global model on their local datasets and transmit local model updates through fading channels. To tackle stragglers resulting from link disruptions without requiring accurate prior information on connectivity or dataset sharing,…

    Submitted 22 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

  38. arXiv:2403.17598  [pdf]

    eess.SY

    Ultrafast Adaptive Primary Frequency Tuning and Secondary Frequency Identification for S/S WPT system

    Authors: Chang Liu, Wei Han, Guangyu Yan, Bowang Zhang, Chunlin Li

    Abstract: Magnetic resonance wireless power transfer (WPT) technology is increasingly being adopted across diverse applications. However, its effectiveness can be significantly compromised by parameter shifts within the resonance network, owing to its high system quality factor. Such shifts are inherent and challenging to mitigate during the manufacturing process. In response, this article introduces a rapi…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 11 pages, 16 figures, to be published in IEEE Transactions on Industrial Electronics

  39. arXiv:2403.15803  [pdf, other]

    eess.IV cs.CV

    Innovative Quantitative Analysis for Disease Progression Assessment in Familial Cerebral Cavernous Malformations

    Authors: Ruige Zong, Tao Wang, Chunwang Li, Xinlin Zhang, Yuanbin Chen, Longxuan Zhao, Qixuan Li, Qinquan Gao, Dezhi Kang, Fuxin Lin, Tong Tong

    Abstract: Familial cerebral cavernous malformation (FCCM) is a hereditary disorder characterized by abnormal vascular structures within the central nervous system. The FCCM lesions are often numerous and intricate, making quantitative analysis of the lesions a labor-intensive task. Consequently, clinicians face challenges in quantitatively assessing the severity of lesions and determining whether lesions ha…

    Submitted 23 March, 2024; originally announced March 2024.

  40. arXiv:2403.15418  [pdf, other]

    eess.SP

    Stochastic Analysis of Touch-Tone Frequency Recognition in Two-Way Radio Systems for Dialed Telephone Number Identification

    Authors: Liqiang Yu, Chen Li, Bo Liu, Chang Che

    Abstract: This paper focuses on recognizing dialed numbers in a touch-tone telephone system based on the Dual Tone MultiFrequency (DTMF) signaling technique with analysis of stochastic aspects during the noise and random duration of characters. Each dialed digit's acoustic profile is derived from a composite of two carrier frequencies, distinctly assigned to represent that digit. The identification of each…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: It is accepted by The 7th International Conference on Advanced Algorithms and Control Engineering (ICAACE 2024)

  41. arXiv:2403.14905  [pdf, other]

    eess.SP cs.CR cs.LG

    Adaptive Coded Federated Learning: Privacy Preservation and Straggler Mitigation

    Authors: Chengxi Li, Ming Xiao, Mikael Skoglund

    Abstract: In this article, we address the problem of federated learning in the presence of stragglers. For this problem, a coded federated learning framework has been proposed, where the central server aggregates gradients received from the non-stragglers and gradient computed from a privacy-preservation global coded dataset to mitigate the negative impact of the stragglers. However, when aggregating these…

    Submitted 21 March, 2024; originally announced March 2024.

  42. arXiv:2403.14523  [pdf, other]

    eess.IV cs.CV

    Invisible Needle Detection in Ultrasound: Leveraging Mechanism-Induced Vibration

    Authors: Chenyang Li, Dianye Huang, Angelos Karlas, Nassir Navab, Zhongliang Jiang

    Abstract: In clinical applications that involve ultrasound-guided intervention, the visibility of the needle can be severely impeded due to steep insertion and strong distractors such as speckle noise and anatomical occlusion. To address this challenge, we propose VibNet, a learning-based framework tailored to enhance the robustness and accuracy of needle detection in ultrasound images, even when the target…

    Submitted 21 March, 2024; originally announced March 2024.

  43. arXiv:2403.06700  [pdf, other]

    eess.IV

    Enhancing Adversarial Training with Prior Knowledge Distillation for Robust Image Compression

    Authors: Zhi Cao, Youneng Bao, Fanyang Meng, Chao Li, Wen Tan, Genhong Wang, Yongsheng Liang

    Abstract: Deep neural network-based image compression (NIC) has achieved excellent performance, but NIC method models have been shown to be susceptible to backdoor attacks. Adversarial training has been validated in image compression models as a common method to enhance model robustness. However, the improvement effect of adversarial training on model robustness is limited. In this paper, we propose a prior…

    Submitted 15 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  44. arXiv:2403.03809  [pdf, other]

    eess.SP

    Variational Bayesian Learning based Joint Localization and Channel Estimation with Distance-dependent Noise

    Authors: Yunfei Li, Yiting Luo, Weiqiang Tan, Chunguo Li, Shaodan Ma, Guanghua Yang

    Abstract: In the Industrial Internet of Things (IIoTs) and Ocean of Things (OoTs), the advent of massive intelligent services has imposed stringent requirements on both communication and localization, particularly emphasizing precise localization and channel information. This paper focuses on the challenge of jointly optimizing localization and communication in IoT networks. Departing from the conventional…

    Submitted 6 March, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  45. arXiv:2403.01529  [pdf, other]

    cs.RO eess.SY

    Deep Incremental Model Based Reinforcement Learning: A One-Step Lookback Approach for Continuous Robotics Control

    Authors: Cong Li

    Abstract: Model-based reinforcement learning (MBRL) attempts to use an available or a learned model to improve the data efficiency of reinforcement learning. This work proposes a one-step lookback approach that jointly learns the latent-space model and the policy to realize the sample-efficient continuous robotic control, wherein the control-theoretical knowledge is utilized to decrease the model learning d…

    Submitted 3 March, 2024; originally announced March 2024.

  46. arXiv:2402.17281  [pdf, other]

    eess.SP

    GAN Based Near-Field Channel Estimation for Extremely Large-Scale MIMO Systems

    Authors: Ming Ye, Xiao Liang, Cunhua Pan, Yinfei Xu, Ming Jiang, Chunguo Li

    Abstract: Extremely large-scale multiple-input-multiple-output (XL-MIMO) is a promising technique to achieve ultra-high spectral efficiency for future 6G communications. The mixed line-of-sight (LoS) and non-line-of-sight (NLoS) XL-MIMO near-field channel model is adopted to describe the XL-MIMO near-field channel accurately. In this paper, a generative adversarial network (GAN) variant based channel estima…

    Submitted 17 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: 13 pages, 9 figures, 3 tables, accepted by IEEE TGCN

  47. arXiv:2402.16749  [pdf, other]

    cs.CV cs.AI eess.IV

    MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model

    Authors: Chunyi Li, Guo Lu, Donghui Feng, Haoning Wu, Zicheng Zhang, Xiaohong Liu, Guangtao Zhai, Weisi Lin, Wenjun Zhang

    Abstract: With the evolution of storage and communication protocols, ultra-low bitrate image compression has become a highly demanding topic. However, existing compression algorithms must sacrifice either consistency with the ground truth or perceptual quality at ultra-low bitrate. In recent years, the rapid development of the Large Multimodal Model (LMM) has made it possible to balance these two goals. To…

    Submitted 17 April, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: 13 pages, 11 figures, 4 tables

  48. arXiv:2402.16747  [pdf]

    physics.bio-ph eess.SY q-bio.QM

    Recent progress in the physical principles of dynamic ground self-righting

    Authors: Chen Li

    Abstract: Animals and robots must self-right on the ground after overturning. Biology research described various strategies and motor patterns in many species. Robotics research devised many strategies. However, we do not understand well the physical principles of how the need to generate mechanical energy to overcome the potential energy barrier governs behavioral strategies and 3-D body rotations give…

    Submitted 10 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: In review at Integrative & Comparative Biology

  49. arXiv:2402.15335  [pdf, other]

    eess.IV cs.CV cs.LG

    Low-Rank Representations Meets Deep Unfolding: A Generalized and Interpretable Network for Hyperspectral Anomaly Detection

    Authors: Chenyu Li, Bing Zhang, Danfeng Hong, Jing Yao, Jocelyn Chanussot

    Abstract: Current hyperspectral anomaly detection (HAD) benchmark datasets suffer from low resolution, simple background, and small size of the detection data. These factors also limit the performance of the well-known low-rank representation (LRR) models in terms of robustness on the separation of background and target features and the reliance on manual parameter selection. To this end, we build a new set…

    Submitted 23 February, 2024; originally announced February 2024.

  50. arXiv:2402.14349  [pdf, other]

    eess.IV cs.CV cs.LG

    Uncertainty-driven and Adversarial Calibration Learning for Epicardial Adipose Tissue Segmentation

    Authors: Kai Zhao, Zhiming Liu, Jiaqi Liu, **gbiao Zhou, Bihong Liao, Huifang Tang, Qiuyu Wang, Chunquan Li

    Abstract: Epicardial adipose tissue (EAT) is a type of visceral fat that can secrete large amounts of adipokines to affect the myocardium and coronary arteries. EAT volume and density can be used as independent risk markers, and measurement of volume by noninvasive magnetic resonance images is the best method of assessing EAT. However, segmenting EAT is challenging due to the low contrast between EAT and pericar…

    Submitted 23 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: 13 pages, 7 figures